Source: https://www.pyimagesearch.com/2018/06/18/face-recognition-with-opencv-python-and-deep-learning/
The original author provides the source code; obtaining it requires purchasing a subscription.
## Preface
This post uses:
* opencv
* python
* deep learning
Deep-learning-based face recognition is (1) highly accurate and (2) fast, and it works on both still images and video streams.
## Face recognition with OpenCV, Python, and deep learning
In this tutorial, you will learn how to perform face recognition with OpenCV, Python, and deep learning.
We will briefly discuss how deep-learning-based face recognition works, including the concept of "deep metric learning".
Then we will install the required libraries.
Finally, we will implement face recognition for both still images and video streams.
As we will see, our implementation is capable of running in real time.
## Understanding deep learning face recognition
So how do deep learning and face recognition pull this off?
The secret is a technique called "deep metric learning".
If you have used other deep learning techniques before, the typical approach is to:
* Accept a single input image
* Output a classification/label for that image
Deep metric learning, however, is different.
Instead of a label, it outputs a real-valued feature vector.
The dlib face recognition network outputs a 128-d feature vector (that is, a list of 128 numbers) that quantifies the face. The network is trained using so-called triplets:
* Take three images: two of person A and one of person B.
* Tweak the network weights so that the feature vectors of A's two photos move closer to each other, while both move farther away from B's feature vector.
Applied to a concrete example, suppose we have three photos: one of Chad Smith and two of Will Ferrell.
Our network quantifies these faces, constructing a 128-d embedding (quantification) for each one.
From there, the general idea is to tweak the weights of our neural network so that the two Will Ferrell embeddings end up closer to each other and farther from the Chad Smith embedding.
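To make "closer" and "farther" concrete, here is a minimal sketch, assuming the `face_recognition` library introduced below and three hypothetical image files, that computes the 128-d embeddings and compares their Euclidean distances:
```
# a minimal sketch: compute 128-d embeddings and compare their distances
# (the image filenames are hypothetical placeholders)
import face_recognition
import numpy as np

def embed(path):
    # load an image and return the 128-d embedding of the first detected face
    image = face_recognition.load_image_file(path)
    return face_recognition.face_encodings(image)[0]

will_1 = embed("will_ferrell_1.jpg")
will_2 = embed("will_ferrell_2.jpg")
chad = embed("chad_smith.jpg")

# after triplet training, the same-person distance should be small
# and the different-person distance should be large
print("Will vs. Will:", np.linalg.norm(will_1 - will_2))
print("Will vs. Chad:", np.linalg.norm(will_1 - chad))
```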
Our face recognition network architecture is based on ResNet-34 from [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by He et al., but with fewer layers and half as many filters.
The network was trained by [Davis King](https://www.pyimagesearch.com/2017/03/13/an-interview-with-davis-king-creator-of-the-dlib-toolkit/) on a dataset of roughly 3 million images, and it achieves 99.38% accuracy on the [Labeled Faces in the Wild](http://vis-www.cs.umass.edu/lfw/) benchmark, on par with other state-of-the-art methods.
Davis King (author of [dlib](http://dlib.net/)) and [Adam Geitgey](https://www.adamgeitgey.com/) (author of the [face_recognition](https://github.com/ageitgey/face_recognition) module we will use shortly) have both written detailed articles on how deep-learning-based face recognition works:
* [High Quality Face Recognition with Deep Metric Learning](http://blog.dlib.net/2017/02/high-quality-face-recognition-with-deep.html) (Davis)
* [Modern Face Recognition with Deep Learning](https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78) (Adam)
I highly recommend reading both of these articles.
## Installing the face recognition libraries
Besides Python and OpenCV, we need two more libraries:
* [dlib](http://dlib.net/)
* [face_recognition](https://github.com/ageitgey/face_recognition)
dlib, maintained by [Davis King](https://www.pyimagesearch.com/2017/03/13/an-interview-with-davis-king-creator-of-the-dlib-toolkit/), contains the "deep metric learning" implementation we need for our face recognition work.
face_recognition, created by [Adam Geitgey](https://www.adamgeitgey.com/), wraps dlib's face recognition functionality and makes it much easier to use.
I assume you already have OpenCV installed; if not, my [OpenCV install tutorials](https://www.pyimagesearch.com/opencv-tutorials-resources-guides/) will walk you through it.
Next, let's install dlib and face_recognition.
> The original author strongly recommends using `virtualenv` together with `virtualenvwrapper` to avoid polluting your system packages.
### Installing dlib
> You may need cmake first, which can also be installed with `pip install cmake`.
> Recent dlib packages check at install time whether the environment has the required libraries and, if so, automatically compile a GPU-enabled build.
> For an Nvidia GPU this means the CUDA Development Tools and the cuDNN Library (the latter requires a free Nvidia developer account; an email address is all it takes).
Install it with pip:
`pip install dlib`
Done (how convenient things have become).
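To verify whether the installed build actually picked up CUDA support, a quick check like the following should work (`dlib.DLIB_USE_CUDA` is the flag dlib exposes for this):
```
# quick check whether dlib was compiled with CUDA (GPU) support
import dlib

print("dlib version:", dlib.__version__)
print("CUDA enabled:", dlib.DLIB_USE_CUDA)
```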
### Installing face_recognition
Install it with pip:
`pip install face_recognition`
Done.
### Installing imutils
[imutils](https://github.com/jrosebr1/imutils) is a convenience package that bundles common OpenCV operations into ready-to-use functions; the original author recommends it.
Install it with pip:
`pip install imutils`
## Our face recognition dataset
Since Jurassic Park (1993) is my favorite movie, and in honor of Jurassic World: Fallen Kingdom (2018) opening in the US, we will apply face recognition to several characters from the films:
* Alan Grant
* Claire Dearing
* Ellie Sattler
* Ian Malcolm
* John Hammond
* Owen Grady
The dataset can be built in about 30 minutes using my method; see [How to (quickly) build a deep learning image dataset](https://pyimagesearch.com/2018/04/09/how-to-quickly-build-a-deep-learning-image-dataset/).
Given this dataset, we will:
* Create a 128-d embedding for each face
* Use these embeddings to recognize the characters' faces in both still images and video streams
## Face recognition project structure
```
.
├── dataset
│ ├── alan_grant [22 entries]
│ ├── claire_dearing [53 entries]
│ ├── ellie_sattler [31 entries]
│ ├── ian_malcolm [41 entries]
│ ├── john_hammond [36 entries]
│ └── owen_grady [35 entries]
├── examples
│ ├── example_01.png
│ ├── example_02.png
│ └── example_03.png
├── output
│ └── lunch_scene_output.avi
├── videos
│ └── lunch_scene.mp4
├── search_bing_api.py
├── encode_faces.py
├── recognize_faces_image.py
├── recognize_faces_video.py
├── recognize_faces_video_file.py
└── encodings.pickle
```
Our project has four top-level directories:
* dataset/: face images of the six characters, organized into subdirectories by name
* examples/: three face images that are not in the dataset, used for testing
* output/: where the processed face recognition videos are stored
* videos/: input videos go here
We also have six files in the project root:
* search_bing_api.py: step one is building the dataset (the original author has already written this script; just run it). To learn how to build a dataset with the Bing API, see [this post](https://pyimagesearch.com/2018/04/09/how-to-quickly-build-a-deep-learning-image-dataset/).
* encode_faces.py: encodes faces into embeddings.
* recognize_faces_image.py: recognizes faces in a still image (based on the embeddings computed from your dataset).
* recognize_faces_video.py: recognizes faces in a live webcam stream and writes the result to a video file.
* recognize_faces_video_file.py: recognizes faces in a video file from disk and writes the result to a new video file. We won't discuss this one today because its skeleton is the same as the webcam version.
* encodings.pickle: the face encodings, produced by running encode_faces.py on your dataset and serialized to disk.
After building the dataset, we will run encode_faces.py to create the embeddings.
## Encoding the faces using OpenCV and deep learning
Before we can recognize faces, we first need to encode them. We are not actually training a recognition network here; instead, we use dlib's pre-trained model.
We could certainly train a model from scratch, or fine-tune an existing one, but that would be overkill for this project: training from scratch requires a very large number of images.
Then, at classification time, we use a simple k-NN model with a voting scheme to make the final face classification. Other traditional machine learning models could be used here as well.
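As a rough sketch of that voting idea (the function and the 0.6 distance threshold below are illustrative; the actual scripts later in this post use `face_recognition.compare_faces`, which applies the same default tolerance):
```
# a minimal sketch of k-NN-style voting over face embeddings
from collections import Counter
import numpy as np

def classify(unknown, known_encodings, known_names, tolerance=0.6):
    # distance from the unknown embedding to every known embedding
    distances = np.linalg.norm(np.array(known_encodings) - unknown, axis=1)
    # every known face within the tolerance casts a vote for its name
    votes = [name for name, d in zip(known_names, distances) if d <= tolerance]
    # the most common name wins; no votes at all means an unknown face
    return Counter(votes).most_common(1)[0][0] if votes else "Unknown"
```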
### Encoding the faces with encode_faces.py
```
# import the necessary packages
from imutils import paths
import face_recognition
import argparse
import pickle
import cv2
import os
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--dataset", required=True,
help="path to input directory of faces + images")
ap.add_argument("-e", "--encodings", required=True,
help="path to serialized db of facial encodings")
ap.add_argument("-d", "--detection-method", type=str, default="cnn",
help="face detection model to use: either `hog` or `cnn`")
args = vars(ap.parse_args())
# grab the paths to the input images in our dataset
print("[INFO] quantifying faces...")
imagePaths = list(paths.list_images(args["dataset"]))
# initialize the list of known encodings and known names
knownEncodings = []
knownNames = []
# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
    # extract the person name from the image path
    print("[INFO] processing image {}/{}".format(i + 1,
        len(imagePaths)))
    name = imagePath.split(os.path.sep)[-2]

    # load the input image and convert it from BGR (OpenCV ordering)
    # to dlib ordering (RGB)
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # detect the (x, y)-coordinates of the bounding boxes
    # corresponding to each face in the input image
    boxes = face_recognition.face_locations(rgb,
        model=args["detection_method"])

    # compute the facial embedding for the face
    encodings = face_recognition.face_encodings(rgb, boxes)

    # loop over the encodings
    for encoding in encodings:
        # add each encoding + name to our set of known names and
        # encodings
        knownEncodings.append(encoding)
        knownNames.append(name)
# dump the facial encodings + names to disk
print("[INFO] serializing encodings...")
data = {"encodings": knownEncodings, "names": knownNames}
f = open(args["encodings"], "wb")
f.write(pickle.dumps(data))
f.close()
```
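With the dataset in place, generating the encodings is just a matter of pointing the script at the dataset and an output file; an invocation along these lines should work (the paths simply match the project structure above):
```
$ python encode_faces.py --dataset dataset --encodings encodings.pickle
```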
### Recognizing characters in still images
recognize_faces_image.py
```
# import the necessary packages
import face_recognition
import argparse
import pickle
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--encodings", required=True,
help="path to serialized db of facial encodings")
ap.add_argument("-i", "--image", required=True,
help="path to input image")
ap.add_argument("-d", "--detection-method", type=str, default="cnn",
help="face detection model to use: either `hog` or `cnn`")
args = vars(ap.parse_args())
# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open(args["encodings"], "rb").read())
# load the input image and convert it from BGR to RGB
image = cv2.imread(args["image"])
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# detect the (x, y)-coordinates of the bounding boxes corresponding
# to each face in the input image, then compute the facial embeddings
# for each face
print("[INFO] recognizing faces...")
boxes = face_recognition.face_locations(rgb,
model=args["detection_method"])
encodings = face_recognition.face_encodings(rgb, boxes)
# initialize the list of names for each face detected
names = []
# loop over the facial embeddings
for encoding in encodings:
    # attempt to match each face in the input image to our known
    # encodings
    matches = face_recognition.compare_faces(data["encodings"],
        encoding)
    name = "Unknown"

    # check to see if we have found a match
    if True in matches:
        # find the indexes of all matched faces then initialize a
        # dictionary to count the total number of times each face
        # was matched
        matchedIdxs = [i for (i, b) in enumerate(matches) if b]
        counts = {}

        # loop over the matched indexes and maintain a count for
        # each recognized face
        for i in matchedIdxs:
            name = data["names"][i]
            counts[name] = counts.get(name, 0) + 1

        # determine the recognized face with the largest number of
        # votes (note: in the event of an unlikely tie Python will
        # select first entry in the dictionary)
        name = max(counts, key=counts.get)

    # update the list of names
    names.append(name)

# loop over the recognized faces
for ((top, right, bottom, left), name) in zip(boxes, names):
    # draw the predicted face name on the image
    cv2.rectangle(image, (left, top), (right, bottom), (0, 255, 0), 2)
    y = top - 15 if top - 15 > 15 else top + 15
    cv2.putText(image, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX,
        0.75, (0, 255, 0), 2)
# show the output image
cv2.imshow("Image", image)
cv2.waitKey(0)
```
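With the encodings on disk, recognizing the faces in one of the test images could look like this (example_01.png is one of the files in the examples/ directory):
```
$ python recognize_faces_image.py --encodings encodings.pickle \
    --image examples/example_01.png
```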
### Recognizing characters from a webcam
recognize_faces_video.py
```
# import the necessary packages
from imutils.video import VideoStream
import face_recognition
import argparse
import imutils
import pickle
import time
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--encodings", required=True,
help="path to serialized db of facial encodings")
ap.add_argument("-o", "--output", type=str,
help="path to output video")
ap.add_argument("-y", "--display", type=int, default=1,
help="whether or not to display output frame to screen")
ap.add_argument("-d", "--detection-method", type=str, default="cnn",
help="face detection model to use: either `hog` or `cnn`")
args = vars(ap.parse_args())
# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open(args["encodings"], "rb").read())
# initialize the video stream and pointer to output video file, then
# allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
writer = None
time.sleep(2.0)
# loop over frames from the video file stream
while True:
    # grab the frame from the threaded video stream
    frame = vs.read()

    # convert the input frame from BGR to RGB then resize it to have
    # a width of 750px (to speedup processing)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    rgb = imutils.resize(rgb, width=750)
    r = frame.shape[1] / float(rgb.shape[1])

    # detect the (x, y)-coordinates of the bounding boxes
    # corresponding to each face in the input frame, then compute
    # the facial embeddings for each face
    boxes = face_recognition.face_locations(rgb,
        model=args["detection_method"])
    encodings = face_recognition.face_encodings(rgb, boxes)
    names = []

    # loop over the facial embeddings
    for encoding in encodings:
        # attempt to match each face in the input image to our known
        # encodings
        matches = face_recognition.compare_faces(data["encodings"],
            encoding)
        name = "Unknown"

        # check to see if we have found a match
        if True in matches:
            # find the indexes of all matched faces then initialize a
            # dictionary to count the total number of times each face
            # was matched
            matchedIdxs = [i for (i, b) in enumerate(matches) if b]
            counts = {}

            # loop over the matched indexes and maintain a count for
            # each recognized face
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name, 0) + 1

            # determine the recognized face with the largest number
            # of votes (note: in the event of an unlikely tie Python
            # will select first entry in the dictionary)
            name = max(counts, key=counts.get)

        # update the list of names
        names.append(name)

    # loop over the recognized faces
    for ((top, right, bottom, left), name) in zip(boxes, names):
        # rescale the face coordinates
        top = int(top * r)
        right = int(right * r)
        bottom = int(bottom * r)
        left = int(left * r)

        # draw the predicted face name on the image
        cv2.rectangle(frame, (left, top), (right, bottom),
            (0, 255, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(frame, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX,
            0.75, (0, 255, 0), 2)

    # if the video writer is None *AND* we are supposed to write
    # the output video to disk initialize the writer
    if writer is None and args["output"] is not None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 20,
            (frame.shape[1], frame.shape[0]), True)

    # if the writer is not None, write the frame with recognized
    # faces to disk
    if writer is not None:
        writer.write(frame)

    # check to see if we are supposed to display the output frame to
    # the screen
    if args["display"] > 0:
        cv2.imshow("Frame", frame)
        key = cv2.waitKey(1) & 0xFF

        # if the `q` key was pressed, break from the loop
        if key == ord("q"):
            break
# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
# check to see if the video writer point needs to be released
if writer is not None:
    writer.release()
```
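Running the webcam script and writing the annotated stream into the output/ directory could look like this (the output filename below is illustrative):
```
$ python recognize_faces_video.py --encodings encodings.pickle \
    --output output/webcam_output.avi --display 1
```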
### Recognizing characters in a video file
As mentioned earlier, recognize_faces_video_file.py is essentially identical to the previous script; the only difference is that the frames come from a video file on disk instead of a webcam.
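A minimal sketch of that difference (assuming a hypothetical `--input` argument holding the video file path) swaps the `VideoStream` for `cv2.VideoCapture` and checks whether a frame was actually grabbed:
```
# a sketch of reading frames from a video file instead of a webcam;
# the --input argument is a hypothetical addition
stream = cv2.VideoCapture(args["input"])

while True:
    # grab the next frame; `grabbed` is False once the file is exhausted
    (grabbed, frame) = stream.read()
    if not grabbed:
        break
    # ... the rest of the per-frame processing is identical ...

stream.release()
```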
## Can these programs run on a Raspberry Pi?
Basically yes, but with some caveats:
1. The Raspberry Pi does not have enough memory for CNN-based face detection.
2. So you are limited to HOG face detection.
3. HOG is too slow on the Pi for real-time face detection.
4. So you need to fall back on OpenCV Haar cascades (a sketch follows below).
(Translator's note: even my 16 GB machine could not handle CNN-based face detection.)
Expect roughly 1-2 FPS on the Raspberry Pi. The good news is that I will come back to running these programs on the Raspberry Pi in a future post, so stay tuned.
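For reference, a minimal sketch of Haar cascade face detection might look like this (the cascade file ships with OpenCV; the detection parameters are illustrative):
```
# a sketch of Haar cascade face detection as a lighter-weight fallback
import cv2

# load the frontal face cascade bundled with OpenCV
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Haar cascades operate on grayscale images
image = cv2.imread("examples/example_01.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect faces; each detection is an (x, y, w, h) box
rects = detector.detectMultiScale(gray, scaleFactor=1.1,
    minNeighbors=5, minSize=(30, 30))
print("found {} face(s)".format(len(rects)))
```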
## Conclusion
In this tutorial you learned how to perform face recognition with OpenCV, Python, and deep learning.
We leveraged Davis King's dlib and Adam Geitgey's face_recognition library to make the implementation straightforward.
We also saw that the code presented here is both accurate and, with a GPU, capable of running in real time.
I hope you enjoyed this face recognition post.