Lightning-Fast Face Detection: Achieving 100+ FPS on CPU with an Ultra-Lightweight Model and OpenVINO

Luís Condados
5 min read · Aug 5, 2023


This article presents a highly impressive 😎 face-detection solution that is incredibly efficient and precise. What's more, it can achieve over 100 fps on an Intel CPU without the need for a GPU. 🚀

This is the first article in a series about my personal “OpenVINO exploration”. So let’s get started!

Are you curious about OpenVINO and what it can do? Here's a brief introduction for those who may not be familiar with it.

OpenVINO is a robust open-source toolkit for optimizing and deploying deep learning models across various devices 📱💻🖥️. Its primary aim is to simplify the process of developing and deploying AI-powered applications 🚀 by providing a streamlined and efficient framework that can be tailored to different use cases. Whether you're working on computer vision 👁️‍🗨️, natural language processing 🗣️, or any other AI-related field 🔬, OpenVINO can help you get the most out of your models and bring your ideas to fruition 💡.

So, let’s get back to the topic of this article, a very fast and accurate face detector that runs up to 100 fps on CPU!

OpenVINO's Open Model Zoo repository on GitHub has a section with nothing less than a bunch of public pre-trained models ready to be used!

Screenshot of the public pre-trained models section of the OpenVINO repo.

That page lists all the public pre-trained models, covering tasks such as classification, segmentation, style transfer, detection, and more. Today’s target model is in the “Object Detection Models” section.

Specifically, the one we are looking for today is the “Ultra Lightweight Face Detection RFB 320”, as shown in the figure below.

Following the documentation for this model we’re going to see something like this:

That model was trained on the WIDER FACE dataset, which is a challenging dataset, so we can expect it to be robust on in-the-wild scenes. Furthermore, it’s nice to look at the number of parameters (0.28 M) and the number of floating-point operations (0.17 GFLOPs), which confirm that the model really is “ultra-lightweight”.
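As a rough back-of-the-envelope check of what those two numbers mean (a sketch, not a benchmark): the footprint of the weights is roughly the parameter count times the bytes per weight, and the compute needed at a target frame rate is the per-frame operation count times the FPS. The 100 FPS target and the helper names are my own illustration; the parameter and GFLOPs figures are the ones quoted above.

```python
# Back-of-the-envelope numbers for the model described above.
PARAMS = 0.28e6           # parameter count quoted in the article
GFLOPS_PER_FRAME = 0.17   # operations per inference quoted in the article


def weights_size_mb(params: float, bytes_per_weight: int) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return params * bytes_per_weight / 1e6


def compute_budget_gflops(gflops_per_frame: float, fps: float) -> float:
    """Sustained throughput needed to reach a given frame rate."""
    return gflops_per_frame * fps


fp32_mb = weights_size_mb(PARAMS, 4)   # 32-bit floats
fp16_mb = weights_size_mb(PARAMS, 2)   # 16-bit floats
budget = compute_budget_gflops(GFLOPS_PER_FRAME, 100)

print(f"FP32 weights: ~{fp32_mb:.2f} MB")
print(f"FP16 weights: ~{fp16_mb:.2f} MB")
print(f"Compute at 100 FPS: ~{budget:.0f} GFLOP/s")
```

That works out to roughly 1.12 MB of weights in FP32, 0.56 MB in FP16, and about 17 GFLOP/s of sustained compute at 100 FPS, which helps explain why a CPU alone can keep up.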

Okay, but, how do I use that??

I strongly recommend creating a Python virtual environment with whatever tool you prefer; in my case, I’m using a Miniconda env.

Installing the OpenVINO dev library:

$ pip install openvino-dev

After that, you need to download the pre-trained model from the OpenVINO repository. You can do that as follows:

$ omz_downloader --name ultra-lightweight-face-detection-rfb-320 --output_dir model

That command will download an .onnx file. Now you can run the converter to turn this file into OpenVINO’s intermediate representation (IR):

$ omz_converter --name ultra-lightweight-face-detection-rfb-320 --download_dir model --output_dir model --precisions FP16

After those steps, you should have two files under the model folder: the IR's .xml file (network topology) and .bin file (weights).
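A quick way to confirm the conversion worked is to check for the two IR files from Python. The sketch below assumes the converter writes to `<output_dir>/public/<model name>/<precision>/`, which is the layout I would expect from omz_converter; `ir_paths` and `missing_ir_files` are hypothetical helpers of mine, so adjust them if your folder looks different.

```python
from pathlib import Path

MODEL_NAME = "ultra-lightweight-face-detection-rfb-320"


def ir_paths(output_dir, name=MODEL_NAME, precision="FP16"):
    """Return the expected (.xml, .bin) paths of the converted IR.

    Assumption: files live in <output_dir>/public/<name>/<precision>/.
    """
    base = Path(output_dir) / "public" / name / precision
    return base / f"{name}.xml", base / f"{name}.bin"


def missing_ir_files(output_dir, name=MODEL_NAME, precision="FP16"):
    """List the expected IR files that are not on disk (empty list = all good)."""
    return [p for p in ir_paths(output_dir, name, precision) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_ir_files("model")
    if missing:
        print("Missing IR files:")
        for path in missing:
            print("  ", path)
    else:
        print("Both IR files found.")
```

The path of the .xml file is what you later pass to the detector; the .bin file is expected to sit next to it.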

Almost done… Now we can look at a bit of Python code showing how to use this in our program. To keep things organized and make the face detector easy to plug into any application you want, I’m going to share a way to wrap this pre-trained model, converted to OpenVINO IR, in a Python object.

Here is a FaceDetector class that can use this model. Note that this structure is easy to adapt to other models.

from openvino.runtime import Core
import numpy as np
import cv2

# helper module from the author's repository (provides non_max_suppression)
from src import utils


class FaceDetector:
    def __init__(self,
                 model,
                 confidence_thr=0.5,
                 overlap_thr=0.7):
        # load and compile the model
        core = Core()
        model = core.read_model(model=model)
        compiled_model = core.compile_model(model=model)
        self.model = compiled_model

        # This model has more than one output, so we keep each output
        # layer in a human-friendly variable to recover it later.
        # Here, one output holds the scores/confidences and the other
        # the bounding boxes. Have a look at the OpenVINO documentation
        # for more information.
        self.output_scores_layer = self.model.output(0)
        self.output_boxes_layer = self.model.output(1)
        # confidence threshold
        self.confidence_thr = confidence_thr
        # threshold for the non-maximum suppression
        self.overlap_thr = overlap_thr

    def preprocess(self, image):
        """
        input image is a numpy array image representation,
        in the BGR format of any shape.
        """
        # resize to the (width, height) expected by the model
        input_image = cv2.resize(image, dsize=(320, 240))
        # change from [H, W, C] to [C, H, W] ("channels first")
        # and add the batch dimension: [1, C, H, W]
        input_image = np.expand_dims(input_image.transpose(2, 0, 1), axis=0)
        return input_image

    def posprocess(self, pred_scores, pred_boxes, image_shape):
        # get all predictions with more than confidence_thr of confidence
        filtered_indexes = np.argwhere(pred_scores[0, :, 1] > self.confidence_thr).tolist()
        filtered_boxes = pred_boxes[0, filtered_indexes, :]
        filtered_scores = pred_scores[0, filtered_indexes, 1]

        if len(filtered_scores) == 0:
            return [], []

        # convert all boxes from normalized to image (pixel) coordinates
        h, w = image_shape

        def _convert_bbox_format(*args):
            bbox = args[0]
            x_min, y_min, x_max, y_max = bbox
            x_min = int(w * x_min)
            y_min = int(h * y_min)
            x_max = int(w * x_max)
            y_max = int(h * y_max)
            return x_min, y_min, x_max, y_max

        bboxes_image_coord = np.apply_along_axis(_convert_bbox_format, axis=2, arr=filtered_boxes)

        # apply non-maximum suppression
        bboxes_image_coord, indexes = utils.non_max_suppression(
            bboxes_image_coord.reshape([-1, 4]),
            overlapThresh=self.overlap_thr)
        filtered_scores = filtered_scores[indexes]
        return bboxes_image_coord, filtered_scores

    def draw_bboxes(self, image, bboxes, color=(0, 255, 0)):
        # Just for visualization:
        # draw all bboxes on the input image (color is in BGR order)
        for bbox in bboxes:
            x_min, y_min, x_max, y_max = bbox
            pt1 = (x_min, y_min)
            pt2 = (x_max, y_max)
            cv2.rectangle(image, pt1, pt2, color=color, thickness=2, lineType=cv2.LINE_4)

    def inference(self, image):
        input_image = self.preprocess(image)
        # run the model once and read both outputs from the same result
        results = self.model([input_image])
        pred_scores = results[self.output_scores_layer]
        pred_boxes = results[self.output_boxes_layer]

        image_shape = image.shape[:2]
        faces, scores = self.posprocess(pred_scores, pred_boxes, image_shape)
        return faces, scores
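To make the shapes in posprocess easier to follow, here is a NumPy-only sketch of the filtering and coordinate conversion steps, run on fake model outputs. It assumes what the class above relies on: scores of shape [1, N, 2] with the face confidence in column 1, and boxes of shape [1, N, 4] given as normalized (x_min, y_min, x_max, y_max). The function name `filter_and_scale` and the sample numbers are mine, and it uses a boolean mask instead of `np.argwhere`, so treat it as an alternative formulation rather than the code from the repository.

```python
import numpy as np


def filter_and_scale(pred_scores, pred_boxes, image_shape, confidence_thr=0.5):
    """Keep confident predictions and convert boxes to pixel coordinates."""
    h, w = image_shape
    face_scores = pred_scores[0][:, 1]       # column 1 = "face" confidence
    keep = face_scores > confidence_thr      # boolean mask, shape [N]
    boxes = pred_boxes[0][keep]              # shape [K, 4], still normalized
    # Multiply x coordinates by the width and y coordinates by the height,
    # then truncate to integers, as int() does in the class above.
    scale = np.array([w, h, w, h], dtype=np.float32)
    boxes_px = (boxes * scale).astype(int)
    return boxes_px, face_scores[keep]


# Fake outputs for three candidate boxes (values made up for illustration).
scores = np.array([[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]], dtype=np.float32)
boxes = np.array([[[0.25, 0.25, 0.50, 0.50],
                   [0.00, 0.00, 0.10, 0.10],
                   [0.50, 0.50, 1.00, 1.00]]], dtype=np.float32)

boxes_px, kept_scores = filter_and_scale(scores, boxes, image_shape=(480, 640))
print(boxes_px)      # pixel boxes for the two predictions above the threshold
print(kept_scores)
```

Only the first and third candidates pass the 0.5 threshold here, and their boxes come back in pixel coordinates of the original 640x480 image, not of the 320x240 network input.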

Basically there are four main methods:

  • __init__: where I’m loading the OpenVINO compiled model and saving a few parameters to use later, such as the minimum confidence threshold and the overlap threshold for the non-maximum suppression algorithm;
  • preprocess: the method responsible for making sure the raw BGR image matches the requirements to be fed into the loaded model;
  • inference: the method we are going to use the most. This is where everything comes together: preprocessing first, then the raw model inference, and then the postprocessing step;
  • posprocess: converts the raw output to a more usual format, such as the top-left and bottom-right corners in pixel coordinates for the bbox part, and also applies non-maximum suppression to remove redundant bounding box predictions.
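The `utils.non_max_suppression` helper is not shown in this article. To give a rough idea of what such a function does, here is a minimal greedy NMS sketch in NumPy. It is my own illustration, not the code from the repository: it keeps the argument name `overlapThresh` and returns the kept boxes together with their indexes so that it matches how the class calls it, it uses intersection-over-union as the overlap measure, and it takes an optional `scores` argument (an assumption of mine) to decide which box wins. Without scores it ranks boxes by their bottom edge, a convention used by some popular implementations.

```python
import numpy as np


def non_max_suppression(boxes, overlapThresh=0.7, scores=None):
    """Greedy NMS over [N, 4] boxes given as (x_min, y_min, x_max, y_max).

    Returns (kept_boxes, kept_indexes).
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    if len(boxes) == 0:
        return boxes.reshape(0, 4).astype(int), []

    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    # Rank candidates so that the best one is last.
    order = np.argsort(y2 if scores is None else np.asarray(scores).ravel())

    keep = []
    while order.size > 0:
        best = order[-1]
        keep.append(int(best))
        rest = order[:-1]
        # Intersection of the best box with every remaining box.
        iw = np.maximum(0, np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]))
        ih = np.maximum(0, np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]))
        inter = iw * ih
        union = areas[best] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-9)
        # Drop everything that overlaps the best box too much.
        order = rest[iou <= overlapThresh]

    return boxes[keep].astype(int), keep


# Two near-duplicate boxes and one separate box (made-up values).
candidates = np.array([[10, 10, 110, 110],
                       [12, 12, 112, 112],
                       [200, 200, 300, 300]])
kept_boxes, kept_idx = non_max_suppression(candidates, overlapThresh=0.7,
                                           scores=[0.9, 0.8, 0.7])
print(kept_idx)  # [0, 2]: the second box overlaps the first too much
```

The first two boxes have an IoU of about 0.92, above the 0.7 threshold, so only the higher-scoring one survives.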

Here’s an example where I’m running the full demo script on a generic video. I could achieve almost 100 FPS using only the CPU (it goes a little higher when I turn off the recording program).
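If you want to reproduce a number like this on your own machine, a simple approach is to time only the call you care about and average over many frames. The sketch below is not the demo script from the repository; `measure_fps` and the warm-up count are choices of mine, and the dummy workload stands in for `detector.inference(frame)`.

```python
import time


def measure_fps(process_frame, frames, warmup=5):
    """Average frames per second of process_frame over an iterable of frames.

    The first `warmup` frames are processed but not timed, since the first
    few inferences are often slower than steady state.
    """
    timed = 0
    elapsed = 0.0
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        process_frame(frame)
        duration = time.perf_counter() - start
        if i >= warmup:
            timed += 1
            elapsed += duration
    if timed == 0 or elapsed == 0.0:
        raise ValueError("not enough timed frames to compute FPS")
    return timed / elapsed


# Dummy workload standing in for detector.inference(frame).
def fake_inference(frame):
    return sum(x * x for x in range(2000))


fps = measure_fps(fake_inference, frames=range(50))
print(f"{fps:.1f} FPS")
```

With the class above, this would become `measure_fps(detector.inference, frames)`, with the frames read beforehand (for example with cv2.VideoCapture) so that video decoding, drawing, and screen recording are not counted in the result.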

Feel free to get the full code on my GitHub.


Luís Condados

A Computer Engineer with a background in robotics, computer vision, and deep learning.