Trying to find a fast object-detection tool on GitHub (by "fast" I mean detection in less than 1 second on a mainstream CPU), I experimented with several repositories written in PyTorch, since that is the framework I am familiar with. Below are my conclusions:
1. detectron2
This is the official library from Facebook (FAIR). I downloaded and installed it successfully. The test Python code is:
import detectron2
from detectron2.utils.logger import setup_logger
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2 import model_zoo
setup_logger()
# import some common libraries
import numpy as np
import cv2
import sys
import time
cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"
predictor = DefaultPredictor(cfg)
img = cv2.imread(sys.argv[1])
begin = time.time()
outputs = predictor(img)
print("time:", time.time() - begin)
print(outputs)
Although it can't recognize all the birds in the image below, it takes more than 5 seconds on CPU (my MacBook Pro). The performance falls short of my expectation.
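For reference, the predicted instances can be drawn onto the image with detectron2's built-in Visualizer (a minimal sketch; the output filename and scale are my own choices):
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
# Visualizer expects RGB, while cv2 loads BGR, so flip the channels both ways
v = Visualizer(img[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.0)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imwrite("detectron2_result.jpg", out.get_image()[:, :, ::-1])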
2. efficientdet
According to the paper, EfficientDet should be both fast and accurate. But after I wrote a test program, it couldn't recognize any objects at all, so I gave up on this solution.
3. EfficientDet.Pytorch
I couldn't download the pretrained models from its model zoo.
4. ssd.pytorch
Finally, I came back to my sweet SSD (Single Shot MultiBox Detector). Since I have been studying it for more than half a year, I quickly wrote the snippet below:
import cv2
import numpy as np
import torch
from torch.autograd import Variable
from ssd import build_ssd  # from the ssd.pytorch repository

def base_transform(image, size, mean):
    # resize to the network input size and subtract the channel means
    x = cv2.resize(image, (size, size)).astype(np.float32)
    x -= mean
    x = x.astype(np.float32)
    return x

class BaseTransform:
    def __init__(self, size, mean):
        self.size = size
        self.mean = np.array(mean, dtype=np.float32)

    def __call__(self, image, boxes=None, labels=None):
        return base_transform(image, self.size, self.mean), boxes, labels

def detect(img, net, transform):
    FONT = cv2.FONT_HERSHEY_SIMPLEX
    COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    height, width = img.shape[:2]
    # HWC -> CHW and add a batch dimension
    x = torch.from_numpy(transform(img)[0]).permute(2, 0, 1)
    x = Variable(x.unsqueeze(0))
    y = net(x)  # forward pass
    detections = y.data[0]  # [num_classes, top_k, 5]: score, x1, y1, x2, y2
    # scale each detection back up to the original image size
    scale = torch.Tensor([width, height, width, height])
    for index, loc in enumerate(detections[3]):  # class index 3 is "bird" in VOC (0 is background)
        score = loc.numpy()[0]
        if score >= 0.5:
            loc = loc[1:]
            pt = loc * scale
            print(score, pt)
            cv2.rectangle(
                img,
                (int(pt[0]), int(pt[1])),
                (int(pt[2]), int(pt[3])),
                COLORS[index % 3],
                2,
            )
            cv2.putText(
                img,
                str(score),
                (int(pt[0]), int(pt[1])),
                FONT,
                1,
                (255, 255, 255),
                1,
                cv2.LINE_AA,
            )
    return img

img = cv2.imread("bird_matrix.jpg")
net = build_ssd("test", 300, 21)  # SSD300, 21 VOC classes (20 + background)
net.load_state_dict(torch.load("ssd300_mAP_77.43_v2.pth", map_location="cpu"))
net.eval()
transform = BaseTransform(net.size, (104 / 256.0, 117 / 256.0, 123 / 256.0))
img = detect(img, net, transform)
cv2.imwrite("result.jpg", img)
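Since my original goal was detection in under 1 second on a mainstream CPU, the detect() call can be timed the same way as in the detectron2 test. A minimal sketch, just wrapping the call above:
import time

begin = time.time()
img = detect(img, net, transform)  # same call as above, now timed
print("time:", time.time() - begin)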
The result is not perfect but good enough for my current situation.