At this point you have explored your webcam: you understand how to load the drivers and adjust the resolution and other settings on your camera, you are aware of the formats (encodings) that are supported, and you have an idea of how Video4Linux works.
Now it is time to start exercising some applications created using OpenCV.
As mentioned in the beginning of this chapter, the topic of OpenCV is worthy of a whole book, and there are several books available with this purpose. The idea here is to learn what is possible with Intel Galileo and OpenCV, to compare performance between C++ and Python, and to identify whether a problem is related to OpenCV or to the wrong settings in Video4Linux.
Note
The examples demonstrated in this chapter are in C++ and Python. OpenCV also supports C, which is not explored here. This is because the C++ interface created for OpenCV is simpler than the C language interface, which requires you to manage memory allocations.
Building Programs with OpenCV
To build programs that run OpenCV, you must follow the same process you followed when compiling the program for Video4Linux in the previous sections. In other words, it is necessary to set up the toolchain and run the proper command line.
The procedure is the same as the one outlined in the “Building and Transferring the Video Capture Program” section of this chapter, except the command line changes a little because the programs are written in C++ instead of C, and it is necessary to link against the OpenCV libraries instead of V4L2.
For example, to build the program used in the next section, shown in Listing 7-1 and named opencv_capimage.cpp, use the following line:
${CXX} -O2 `pkg-config --cflags --libs opencv` opencv_capimage.cpp -o opencv_capimage
${CXX} invokes the C++ compiler (g++) of the toolchain, and the backquoted pkg-config command provides the compiler and linker flags for the OpenCV libraries.
Once this compiles, transfer the program to Intel Galileo (unless the toolchain is installed directly on your Intel Galileo).
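For example, assuming the toolchain is installed on a host computer and the board is reachable at 192.168.1.10 (a hypothetical address; substitute your board’s actual IP), the binary can be copied with scp:
mcramon@ubuntu:∼/tmp/opencv$ scp opencv_capimage root@192.168.1.10:/home/root/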
Capturing an Image with OpenCV
Capturing an image using OpenCV is very simple because all the complexity is abstracted by OpenCV, which uses V4L2 as a baseline.
Figure 7-7 shows the flowchart used to capture images and videos and process the images.
Listing 7-1 shows an example of how to capture an image and store it in the file system as a JPEG file.
Listing 7-1. opencv_capimage.cpp
#include <opencv2/opencv.hpp>
using namespace cv;
using namespace std;
int main()
{
VideoCapture cap(-1);
//check if the file was opened properly
if(!cap.isOpened())
{
cout << "Webcam could not be opened succesfully" << endl;
exit(-1);
}
else
{
cout << "p n" << endl;
}
int w =
960
;
int h =
544
;
cap.set(CV_CAP_PROP_FRAME_WIDTH, w);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, h);
Mat frame;
cap >>frame;
imwrite("opencv.jpg", frame);
cap.release();
return 0;
}
Reviewing opencv_capimage.cpp
This first example uses a few objects based on the following classes. VideoCapture is used to create the capture objects that open and configure the devices, capture images and videos, and release the devices when they are not in use anymore. Mat receives the frames that are read and works with algorithms to process the images; it can apply filters, change colors, and transform the images according to mathematical and statistical algorithms. In this first example, Mat is used only to read the image; in the next couple of examples, Mat will be used to process images as well.
To understand the code in Listing 7-1, you need a quick overview of each class used.
VideoCapture::VideoCapture
The first thing to do is to use the VideoCapture class to create a video capture object and open the device or some video stored in the files system.
For more information regarding the VideoCapture class, see http://docs.opencv.org/modules/highgui/doc/reading_and_writing_images_and_video.html.
In the case of the webcam, you will create the object with the parameter -1 in the constructor, as follows:
VideoCapture cap(-1);
The value -1 means, “open the current device enumerated in the system,” so if you have the camera enumerated as /dev/video0 or /dev/video1, the webcam will be opened anyway. Otherwise, if you want to be specific regarding which device to open, you have to pass to the constructor the index of the enumerated device. For example, to open the device /dev/video0, you must pass the number 0 to the constructor like this:
VideoCapture cap(0);
If you’re using Intel Galileo and one camera, I recommend you use -1 to avoid problems with camera enumeration indexes versus the hardcoded number you use in the constructor.
VideoCapture::isOpened()
You can check whether the webcam was opened and initialized successfully by invoking the isOpened() method. It returns true if the webcam was opened and false if not.
VideoCapture::set(const int prop, int value)
This method sets a property (prop) to a specific value (value). You can set the image’s width, height, frames per second, and several other properties. In the code example, the video width and height are set to 960x544:
int w = 960;
int h = 544;
cap.set(CV_CAP_PROP_FRAME_WIDTH, w);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, h);
For more information about the properties supported, visit http://nullege.com/codes/search/opencv.highgui.CV_CAP_PROP_FPS.
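Note that Video4Linux may silently adjust a resolution the camera cannot deliver, so it can be useful to read the values back with VideoCapture::get(). A minimal sketch, using the same property constants:
cap.set(CV_CAP_PROP_FRAME_WIDTH, w);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, h);
// confirm what the driver actually accepted
cout << "actual width : " << cap.get(CV_CAP_PROP_FRAME_WIDTH) << endl;
cout << "actual height: " << cap.get(CV_CAP_PROP_FRAME_HEIGHT) << endl;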
VideoCapture::read(Mat & image) or operator >> (Mat & image)
This method reads the image from the device. It grabs the image in one single call. The frame is returned in a Mat object, which is explained shortly.
This example uses the operator >>:
Mat frame;
cap >>frame;
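The read() form is equivalent but returns a status flag, which is useful for checking whether the grab succeeded; a minimal sketch:
Mat frame;
if (!cap.read(frame)) // same as cap >> frame, but reports failure
{
    cout << "Failed to grab a frame" << endl;
}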
VideoCapture::release()
Once the video is captured, if the destructor of the object is not called, you must release the camera by invoking the release() method.
cap.release();
At a glance, you can see how simple this is, compared to the software used when we were focusing on Video4Linux.
cv::Mat::Mat
Mat is an awesome class used for matrix operations and it is constantly used in OpenCV applications. Mat organizes images as matrices that store the details of each pixel, including color intensity, position in the image, image dimensions, and so on.
The Mat class is organized into two parts—one part contains the image headers with generic information about the image and the second part contains the sequence of bytes representing the image.
In the code example, Mat is called only as Mat instead of as cv::Mat because the namespace was defined in the beginning of the code:
using namespace cv;
Also, in the code example, there is a Mat object created with the simple constructors available in the class:
Mat frame;
In the next examples, other methods will be used and properly discussed. For now, keep in mind what the Mat class is for and this simple constructor.
For more details regarding the Mat class, visit http://docs.opencv.org/modules/core/doc/basic_structures.html#mat-mat. The tutorial maintained by docs.opencv.org is also recommended, at http://docs.opencv.org/doc/tutorials/core/mat_the_basic_image_container/mat_the_basic_image_container.html.
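As a brief illustration of what Mat holds, the following sketch (the dimensions are arbitrary) creates a black three-channel image explicitly and queries its properties:
// 240 rows x 320 columns, 3 channels of 8-bit pixels, filled with black
Mat img(240, 320, CV_8UC3, Scalar(0, 0, 0));
cout << "rows: " << img.rows << " cols: " << img.cols << " channels: " << img.channels() << endl;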
cv::imwrite( const string& filename, InputArray img, const vector<int>& params=vector<int>())
This method saves an image to the file system. In the code example, the file is opencv.jpg, the input array is implicitly taken from the Mat object frame, and the optional params vector argument is omitted.
Mat frame;
cap >>frame;
imwrite("opencv.jpg", frame);
In this case, with the omission of the params vector, the encoding used to save the image is based on the file extension .jpg. Remember that the camera does not support capturing images in the JPEG format. It captures a Motion JPEG stream, but JPEG cannot simply be extracted from Motion JPEG because a segment called DHT is not present in this stream (check out http://www.digitalpreservation.gov/formats/fdd/fdd000063.shtml). You can extract a series of JPEG images from a Motion JPEG stream using ffmpeg, but they will not be viewable in any image software due to the missing DHT segment.
In other words, when the file extension is specified and not supported by the webcam, the OpenCV framework converts the file.
The extensions supported besides JPEG are PNG, PPM, PGM, and PBM.
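If you want to control the encoder, the params vector can be filled with property/value pairs. A minimal sketch for the JPEG encoder (the quality value is arbitrary):
vector<int> params;
params.push_back(CV_IMWRITE_JPEG_QUALITY);
params.push_back(95); // 0-100; higher means better quality and a larger file
imwrite("opencv_hq.jpg", frame, params);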
The docs.opencv.org site maintains a nice tutorial about how to load, modify, and save an image at http://docs.opencv.org/doc/tutorials/introduction/load_save_image/load_save_image.html.
Running opencv_capimage.cpp
Compile the code and transfer the file to Intel Galileo. Make sure the uvcvideo driver is loaded and the webcam is connected to the USB port (read the section called “Connecting the Webcam” in this chapter). Finally, smile at your webcam and run the software:
root@clanton:∼# ./opencv_capimage
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
Webcam is OK! I found it!
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
You should have a file named opencv.jpg in the same folder. Now, you might be asking what the VIDIOC_QUERYMENU: Invalid argument messages mean. Such messages are not related to OpenCV, and there is nothing wrong with the code. It is simply OpenCV using the Video4Linux framework to query the capabilities and controls offered by the webcam. When some control or capability is not offered, V4L informs you with these warning messages.
If you do not want to see these messages, you can redirect the stderr stream to the null device. For example:
root@clanton:∼# ./opencv_capimage 2> /dev/null
Webcam is OK! I found it!
The Same Software Written in Python
You can use Python with OpenCV because the Python OpenCV development packages are part of the BSP SD card images introduced in this chapter.
The program in Listing 7-1 can easily be converted to Python, as demonstrated by Listing 7-2.
Listing 7-2. opencv_capimage.py
import cv2
import cv
import sys
cap = cv2.VideoCapture(-1)
w, h = 960, 544
cap.set(cv.CV_CAP_PROP_FRAME_WIDTH, w)
cap.set(cv.CV_CAP_PROP_FRAME_HEIGHT, h)
if not cap.isOpened():
print "Webcam could not be opened successfully"
sys.exit(-1)
else:
print "Webcam is OK! I found it!"
ret, frame = cap.read()
cv2.imwrite('pythontest.jpg', frame)
cap.release()
As you can see, the objects are much the same. To run the software, transfer it to the Intel Galileo board and run the following in the terminal shell:
root@clanton:∼# python opencv_capimage.py 2> /dev/null
Webcam is OK! I found it!
However, the examples in this chapter are written in C++. This is because code written in C++ runs significantly faster than the same code written in Python.
Performance of OpenCV C++ versus OpenCV Python
To check for performance issues, suppose you have the Python program shown in Listing 7-2 and the C++ program shown in Listing 7-1 properly installed on Intel Galileo. You can measure performance using the bash terminal with the command date +%s, which returns the number of seconds passed since 00:00:00 1970-01-01 UTC. Execute the program and evaluate the time difference.
First, run the Python program with the following command:
root@clanton:∼# s=$(date +%s); python opencv_capimage.py; echo $(expr `date +%s` - $s)
Webcam is OK! I found it!
8
Python took eight seconds to take the picture. Do the same thing with the C++ program:
root@clanton:∼# s=$(date +%s); ./opencv_capimage 2> /dev/null; echo $(expr `date +%s` - $s)
Webcam is OK! I found it!
4
The same program written in C++ took only four seconds. Programs running in the userspace context suffer some variation in execution time because Linux is not a real-time system, but OpenCV applications created in C++ are consistently much faster than the same applications running in Python.
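If you want to time only part of a program rather than the whole process, OpenCV’s tick counter can be used inside the code. A minimal sketch that measures a single capture:
double t = (double)getTickCount();
Mat frame;
cap >> frame;
t = ((double)getTickCount() - t) / getTickFrequency();
cout << "capture took " << t << " seconds" << endl;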
Processing Images
In the previous section, you captured images from the webcam and saved them to the file system, but no image processing was done. The next examples explore some of the infinite possibilities of image processing using OpenCV. Some of them rely on complex algorithms in the background, and it is not in the scope of this book to discuss the details of each one. However, references are included for more information.
Detecting Edges
For the first example of image processing, you’ll learn how to detect edges in images using the Canny edge detection algorithm, developed by John F. Canny in 1986.
OpenCV has a function called Canny() that implements such an algorithm. For details about this algorithm, see http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/canny_detector/canny_detector.html.
With a few changes to Listing 7-1, the Canny algorithm is applied, as shown in Listing 7-3.
Listing 7-3. opencv_capimage_canny.cpp
#include <opencv2/opencv.hpp>
using namespace cv;
using namespace std;
int main()
{
VideoCapture cap(-1);
//check if the file was opened properly
if(!cap.isOpened())
{
cout << "Webcam could not be opened succesfully" << endl;
exit(-1);
}
else
{
cout << "Webcam is OK! I found it!\n" << endl;
}
int w = 960;
int h = 544;
cap.set(CV_CAP_PROP_FRAME_WIDTH, w);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, h);
Mat frame;
cap >>frame;
// converts the image to grayscale
Mat frame_in_gray;
cvtColor(frame, frame_in_gray, CV_BGR2GRAY);
// process the Canny algorithm
cout << "processing image with Canny..." << endl;
int threshold1 = 0;
int threshold2 = 28;
Canny(frame_in_gray, frame_in_gray, threshold1, threshold2);
// saving the images in the files system
cout << "Saving the images..." << endl;
imwrite("captured.jpg", frame);
imwrite("captured_with_edges.jpg", frame_in_gray);
// release the camera
cap.release();
return 0;
}
Reviewing opencv_capimage_canny.cpp
In this example, the following changes are made in Listing 7-3 relative to Listing 7-1:
1. A new function, cvtColor(), is added.
2. The Canny() function is used for image processing.
The image originally captured by the camera and the image processed with the Canny algorithm are both stored in the file system as captured.jpg and captured_with_edges.jpg, using the imwrite() function explained previously.
void cv::cvtColor(InputArray src, OutputArray dst, int code, int dstCn=0)
Converts an image from one color space to another. In the following code example:
Mat frame_in_gray;
cvtColor(frame, frame_in_gray, CV_BGR2GRAY);
The input image is the one captured by the webcam and stored in the Mat object frame. The frame_in_gray object was created to receive the image converted to grayscale, as requested by the conversion code CV_BGR2GRAY.
For more detail about the cvtColor() function and color conversions in general, visit http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#cvtcolor.
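Other conversion codes follow the same pattern; for example, a small sketch using codes from the same API:
Mat hsv, rgb;
cvtColor(frame, hsv, CV_BGR2HSV); // BGR to HSV color space
cvtColor(frame, rgb, CV_BGR2RGB); // swap the channel order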
void cv::Canny(InputArray image, OutputArray edges, double threshold1, double threshold2, int apertureSize=3, bool L2gradient=false)
The Canny function takes the input array image as a source image, detects its edges, and stores them in the output array edges. The input and output image in the example is the same object (frame_in_gray); for best effect, a grayscale image is used.
The apertureSize argument is the size of the Sobel operator used in the algorithm (see http://en.wikipedia.org/wiki/Sobel_operator for more details), and the code keeps the default value of 3.
The L2gradient argument is a Boolean; when it’s true, the more accurate L2 norm is used to compute the image gradient magnitude, and when it’s false, the faster L1 norm is used. This example uses the default value of false.
Two hysteresis thresholds are represented by the arguments threshold1 and threshold2, and the values 0 and 28 are used, respectively. These values are based on my experiments; I changed them until I got results I considered good. You can change these values and check the effects you get.
int threshold1 = 0;
int threshold2 = 28;
Canny(frame_in_gray, frame_in_gray, threshold1, threshold2);
The official documentation about the Canny() function is found at http://docs.opencv.org/modules/imgproc/doc/feature_detection.html?highlight=canny#canny.
Running opencv_capimage_canny.cpp
Compile the code and transfer the file to Intel Galileo. Make sure the uvcvideo driver is loaded and the webcam is connected to the USB port (read the section entitled “Connecting the Webcam” in this chapter). Point your webcam at some object rich in edges, like the image shown in Figures 7-8 and 7-9.
root@clanton:∼# ./opencv_capimage_canny 2> /dev/null
Webcam is OK! I found it!
processing image with Canny...
Saving the images...
You should have two images stored in the file system as captured.jpg and captured_with_edges.jpg.
Face and Eyes Detection
This next example detects multiple faces and eyes in a picture captured using the webcam. The class used to detect the faces and eyes is named CascadeClassifier.
The basic concept is that this class loads XML files that contain the classifier models. In the code, two files—haarcascade_frontalface_alt.xml and haarcascade_eye.xml—are loaded during the creation of the CascadeClassifier objects. Each file brings a series of models that define how specific objects are represented in the image, based on the sum of the intensity of pixels in a series of rectangles; the difference of these sums is evaluated across the image. The two files describe characteristics of faces and eyes, and the CascadeClassifier class performs the detections when the method detectMultiScale() is invoked.
For more information related to CascadeClassifier(), visit http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html?highlight=cascadeclassifier#cascadeclassifier.
Also read “Global Haar-Like Features: A New Extension of Classic Haar Features for Efficient Face Detection in Noisy Images,” 6th Pacific-Rim Symposium on Image and Video Technology (PSIVT 2013), by Mahdi Rezaei, Hossein Ziaei Nafchi, and Sandino Morales.
When a face is detected, a rectangle is drawn around the face, and when the eyes are detected, circles are drawn around the eyes. These drawings are done using very basic OpenCV drawing functions called rectangle() and circle().
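As a minimal sketch of these two drawing calls (the coordinates, BGR colors, and thicknesses are arbitrary values):
rectangle( img, Rect(10, 10, 100, 80), Scalar( 255, 100, 0 ), 4 ); // top-left (10,10), 100x80 box
circle( img, Point(60, 50), 20, Scalar( 255, 0, 0 ), 4 );          // center (60,50), radius 20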
Listing 7-4 shows the code for this example.
Listing 7-4. opencv_face_and_eyes_detection.cpp
#include <opencv2/opencv.hpp>
#include "opencv2/core/core.hpp"
using namespace cv;
using namespace std;
String face_cascade_name = "haarcascade_frontalface_alt.xml";
String eye_cascade_name = "haarcascade_eye.xml";
void faceDetect(Mat img);
CascadeClassifier face_cascade;
CascadeClassifier eyes_cascade;
int main(int argc, const char *argv[])
{
if( !face_cascade.load( face_cascade_name ) )
{
cout << face_cascade_name << " not found!! aborting..." << endl;
exit(-1);
};
if( !eyes_cascade.load( eye_cascade_name ) )
{
cout << eye_cascade_name << " not found!! aborting..." << endl;
exit(-1);
};
// -1 opens the first enumerated camera; change it if you want to use a specific camera
VideoCapture cap(-1);
//check if the file was opened properly
if(!cap.isOpened())
{
cout << "Capture could not be opened succesfully" << endl;
return -1;
}
else
{
cout << "camera is ok\n" << endl;
}
int w = 432;
int h = 240;
cap.set(CV_CAP_PROP_FRAME_WIDTH, w);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, h);
Mat frame;
cap >>frame;
cout << "processing the image...." << endl;
faceDetect(frame);
imwrite("face_and_eyes.jpg", frame);
// release the camera
cap.release();
cout << "done!" << endl;
return 0;
}
void faceDetect(Mat img)
{
std::vector<Rect> faces;
std::vector<Rect> eyes;
bool two_eyes = false;
bool any_eye_detected = false;
//detecting faces
face_cascade.detectMultiScale( img, faces, 1.1, 2, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
if (faces.size() == 0)
{
cout << "Try again.. I did not dectected any faces..." << endl;
return;
}
// it is possible to have more than one human face in the image
for( size_t i = 0; i < faces.size(); i++ )
{
// rectangle in the face
rectangle( img, faces[i], Scalar( 255, 100, 0 ), 4, 8, 0 );
Mat frame_gray;
cvtColor( img, frame_gray, CV_BGR2GRAY );
// cropping only the face in region defined by faces[i]
std::vector<Rect> eyes;
Mat faceROI = frame_gray( faces[i] );
// In each face, detect eyes
eyes_cascade.detectMultiScale( faceROI, eyes, 1.1, 2, 0 |CV_HAAR_SCALE_IMAGE, Size(30, 30) );
for( size_t j = 0; j < eyes.size(); j++ )
{
Point center( faces[i].x + eyes[j].x + eyes[j].width*0.5, faces[i].y + eyes[j].y + eyes[j].height*0.5 );
int radius = cvRound( (eyes[j].width + eyes[j].height)*0.25 );
circle( img, center, radius, Scalar( 255, 0, 0 ), 4, 8, 0 );
}
}
}
Reviewing opencv_face_and_eyes_detection.cpp
In this example there are a few new components:
- Introduction of the CascadeClassifier class
- Usage of the Point class
- The cvRound() function
- Usage of the rectangle() and circle() functions
- The Rect class and vectors
The following sections provide an explanation of each item used in the code.
cv::CascadeClassifier::CascadeClassifier()
Creates the CascadeClassifier object. In the example code, two objects are created, one to detect the face and the other to detect the eyes.
CascadeClassifier face_cascade;
CascadeClassifier eyes_cascade;
cv::CascadeClassifier::load(const string & filename)
Loads the file with the classifier to the object. In the code, two classifiers were used, one to detect the face and the other to detect the eyes.
if( !face_cascade.load( face_cascade_name ) )
{
cout << face_cascade_name << " not found!! aborting..." << endl;
exit(-1);
};
if( !eyes_cascade.load( eye_cascade_name ) )
{
cout << eye_cascade_name << " not found!! aborting..." << endl;
exit(-1);
};
void cv::CascadeClassifier::detectMultiScale(const Mat& image, vector<Rect>& objects, double scaleFactor=1.1, int minNeighbors=3, int flags=0, Size minSize=Size(), Size maxSize=Size())
The detectMultiScale() method is where the magic happens in terms of detection. A description of each argument follows:
- image is the source image.
- vector<Rect>& objects is a vector of rectangles where the detected objects are stored.
- scaleFactor specifies how much the image size is reduced at each image scale.
- minNeighbors determines how many neighbors each candidate rectangle must have. If 0 is passed, there is a risk of other objects in the image being detected incorrectly, which results in false positives. For example, a clock on your wall might be detected as a face (a false positive). In my practical experiments, specifying 2 or 3 works well. With more than 3, there is a risk of losing true positives and faces not being detected properly.
- flags is related to the type of optimization. CV_HAAR_SCALE_IMAGE tells the algorithm to be in charge of scaling the image. This argument also accepts CV_HAAR_DO_CANNY_PRUNING, which skips flat regions; CV_HAAR_FIND_BIGGEST_OBJECT, if there is interest in finding only the biggest object in the image; and CV_HAAR_DO_ROUGH_SEARCH, which must be used only with CV_HAAR_FIND_BIGGEST_OBJECT, like "0|CV_HAAR_DO_ROUGH_SEARCH|CV_HAAR_FIND_BIGGEST_OBJECT".
- minSize defines the minimum object size; objects smaller than this are ignored. If it’s not defined, this argument is not considered.
- maxSize defines the maximum object size; objects bigger than this are ignored. If it’s not defined, this argument is not considered.
//detecting faces
face_cascade.detectMultiScale( img, faces, 1.1, 2, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
...
...
...
//In each face, detect eyes
eyes_cascade.detectMultiScale( faceROI, eyes, 1.1, 2, 0 |CV_HAAR_SCALE_IMAGE, Size(30, 30) );
In this code, the scaling factor used is 1.1, minNeighbors is 2 (a kind of hint), the flags are optimized for performance using CV_HAAR_SCALE_IMAGE, and the minimum size of the object to detect is 30x30 pixels. No maximum size is defined, so you can put your face very close to the webcam.
The code detected the faces in the image. For each face that’s detected, a rectangle is drawn delimiting the region.
// rectangle in the face
rectangle( img, faces[i], Scalar( 255, 100, 0 ), 4, 8, 0 );
The resulting regions containing the detected faces are stored in vector<Rect> faces. For example, faces[0] is the first face in the picture. If there is more than one person, you will have faces[1], faces[2], and so on. The object type Rect means rectangle, so the faces vector is a group of rectangles without graphical objects. They are objects that store the initial coordinates (upper-left point) in (Rect.x, Rect.y) and the width (Rect.width) and height (Rect.height) of the rectangle.
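For instance, a small sketch that prints the geometry of the first detected face using these members:
Rect r = faces[0];
cout << "top-left: (" << r.x << "," << r.y << ")" << " size: " << r.width << "x" << r.height << endl;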
For each region detected, a new image is created with the image content delimited by the rectangle, which forms a small area. This small area is called the ROI (Region of Interest). For best performance, and to normalize the image for eye detection, the ROI is converted to grayscale using the cvtColor() function.
Mat frame_gray;
cvtColor( img, frame_gray, CV_BGR2GRAY );
// cropping only the face in region defined by faces[i]
std::vector<Rect> eyes;
Mat faceROI = frame_gray( faces[i] );
In this small area that contains only the face, the cascade classifier tries to identify the eyes. For each eye detected, a circle is drawn. So while the face is detected using the whole image, the eyes are detected only within the face regions. This optimizes the algorithm.
// In each face, detect eyes
eyes_cascade.detectMultiScale( faceROI, eyes, 1.1, 2, 0 |CV_HAAR_SCALE_IMAGE, Size(30, 30) );
The resultant regions containing the eyes are stored in vector<Rect> eyes.
This process is done with the for loops in this code:
for( size_t i = 0; i < faces.size(); i++ )
{
...
...
...
for( size_t j = 0; j < eyes.size(); j++ )
{
...
...
... }
}
To draw the circles around the eyes, the Point class is used. It extracts information from vector<Rect> eyes and stores the exact center of each eye (the central coordinates):
Point center( faces[i].x + eyes[j].x + eyes[j].width*0.5, faces[i].y + eyes[j].y + eyes[j].height*0.5 );
int radius = cvRound( (eyes[j].width + eyes[j].height)*0.25 );
circle( img, center, radius, Scalar( 255, 0, 0 ), 4, 8, 0 );
Thus, the Point center object holds the center point of the eye, based on the rectangle of the current face, and the cvRound() function determines the radius of the circle to be drawn around the eye. With these two pieces of information, it is possible to draw a circle using the circle() function.
Figure 7-10 shows this code’s sequence.
Running opencv_face_and_eyes_detection.cpp
Compile the code and transfer the file to Intel Galileo. Make sure the uvcvideo driver is loaded and the webcam is connected to the USB port (read the section called “Connecting the Webcam” in this chapter), and copy the haarcascade_frontalface_alt.xml and haarcascade_eye.xml files to the same location you transferred the executable program. Stay in front of the camera and look in the direction of the lens. Then run the software:
root@clanton:∼# ./opencv_face_and_eyes_detection 2> /dev/null
camera is ok
processing the image....
done!
An image named face_and_eyes.jpg is created in the file system with all faces and eyes detected, as shown in Figure 7-11.
Emotions Classification
The methods shown in this section and some of the scripts are based on the work of Philipp Wagner in the article “Gender Classification with OpenCV,” which you can find at http://docs.opencv.org/trunk/modules/contrib/doc/facerec/tutorial/facerec_gender_classification.html. Philipp Wagner kindly granted permission for the code adaptation and the techniques explored; all the code remains under the BSD license, as in his original work.
The original code was changed in order to:
- Run on Intel Galileo and classify emotions instead of genders.
- Use face and eyes detection directly on the images captured by the webcam.
- Crop the images dynamically based on human anatomy.
The emotion classifications in this example are divided into three categories: smile, surprised, and serious.
The idea is that you take pictures with the webcam, and Intel Galileo will try to describe your emotional state.
You need to create a database with images of you showing each emotional state. The images in this database are processed with specific algorithms explained later and are used as references that allow Intel Galileo, through a model named fisherface, to determine your emotions while you look at the webcam.
The database in this chapter is based on my face, but there are instructions for recreating the database based on your face. If you run the program with the database from this section, there is only a remote chance that it will recognize your emotions (if you are lucky enough to look like me). Okay, if you look like me, you are not necessarily lucky (ha ha).
Preparing the Desktop
You need to create a database with a few pictures of you. The process for generating this database is explained in detail in conjunction with some scripts that run in Python.
It’s necessary to have Python installed on your computer, with the pillow and setuptools modules installed. Pillow is used to manipulate images in Python scripts, and setuptools is a dependency that Pillow requires. You should install the setuptools module first.
Pillow can be downloaded from https://pypi.python.org/pypi/Pillow and the setuptools module can be downloaded from https://pypi.python.org/pypi/setuptools. Both sites include information on how to install these modules on Linux, Windows, and Mac OS X.
You will also need an image editor because it’s necessary to take some pictures of your face with different emotions and identify the coordinates of the center of each of your eyes. You can use Paint in Windows, Gimp on Linux/OSX and Windows, or any other software that allows you to move the mouse cursor in the image and obtain the coordinates.
You can download Gimp from http://www.gimp.org/.
Creating the Database
Follow these steps to create the database:
1. Obtain the initial images.
2. Crop the images.
3. Organize the images in directories.
4. Create the CSV file.
Let’s look at each step in more detail.
Obtaining the Initial Images
This example uses three emotions—smiling, surprised, and serious. That means the database must contain at least three pictures of you in each state.
Such pictures must be obtained using your webcam. It doesn’t matter if you obtain the images with Intel Galileo using the code examples described previously, or if you connect the webcam to your computer and take the pictures using other software. The most important thing is to take at least three pictures of each emotion—serious, surprised, and smiling—totaling nine pictures. I recommend you take these pictures at a resolution of 1280x1024 or 1280x720. The images will be cropped and reduced, and it is important that they keep good definition after these changes.
In the initial_pictures subfolder of the code folder of this chapter, there are some pictures of me of each emotion. For each picture the pixel coordinates of the center of my eyes were taken—see Table 7-2.
Table 7-2. Central Coordinate of Each Eye on Each Emotional State
Be expressive when you take the pictures. Otherwise, it will be more difficult for the program to guess your emotional states.
Cropping the Images
The next step is to crop the images, removing the ears and hair, and generate 70x70 images (cropped with 20% offsets) that show only the faces. The Python script that was initially created for gender classification was adapted to emotion classification, as shown in Listing 7-5.
Listing 7-5. align_faces.py
#!/usr/bin/env python
# Software License Agreement (BSD License)
#
# Copyright (c) 2012, Philipp Wagner
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above
# copyright notice, this list of conditions and the following
# disclaimer in the documentation and/or other materials provided
# with the distribution.
# * Neither the name of the author nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
# COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# Manoel Ramon 06/11/2014- changed the code to support images used
# as example of emotion classification
#
import sys, math, Image
def Distance(p1,p2):
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    return math.sqrt(dx*dx+dy*dy)

def ScaleRotateTranslate(image, angle, center = None, new_center = None, scale = None, resample=Image.BICUBIC):
    if (scale is None) and (center is None):
        return image.rotate(angle=angle, resample=resample)
    nx,ny = x,y = center
    sx=sy=1.0
    if new_center:
        (nx,ny) = new_center
    if scale:
        (sx,sy) = (scale, scale)
    cosine = math.cos(angle)
    sine = math.sin(angle)
    a = cosine/sx
    b = sine/sx
    c = x-nx*a-ny*b
    d = -sine/sy
    e = cosine/sy
    f = y-nx*d-ny*e
    return image.transform(image.size, Image.AFFINE, (a,b,c,d,e,f), resample=resample)

def CropFace(image, eye_left=(0,0), eye_right=(0,0), offset_pct=(0.2,0.2), dest_sz = (70,70)):
    # calculate offsets in original image
    offset_h = math.floor(float(offset_pct[0])*dest_sz[0])
    offset_v = math.floor(float(offset_pct[1])*dest_sz[1])
    # get the direction
    eye_direction = (eye_right[0] - eye_left[0], eye_right[1] - eye_left[1])
    # calc rotation angle in radians
    rotation = -math.atan2(float(eye_direction[1]),float(eye_direction[0]))
    # distance between them
    dist = Distance(eye_left, eye_right)
    # calculate the reference eye-width
    reference = dest_sz[0] - 2.0*offset_h
    # scale factor
    scale = float(dist)/float(reference)
    # rotate original around the left eye
    image = ScaleRotateTranslate(image, center=eye_left, angle=rotation)
    # crop the rotated image
    crop_xy = (eye_left[0] - scale*offset_h, eye_left[1] - scale*offset_v)
    crop_size = (dest_sz[0]*scale, dest_sz[1]*scale)
    image = image.crop((int(crop_xy[0]), int(crop_xy[1]), int(crop_xy[0]+crop_size[0]), int(crop_xy[1]+crop_size[1])))
    # resize it
    image = image.resize(dest_sz, Image.ANTIALIAS)
    return image

if __name__ == "__main__":
    #Serious_01.jpg
    #left  -> 528, 423
    #right -> 770, 431
    image = Image.open("serious_01.jpg")
    CropFace(image, eye_left=(528,423), eye_right=(770,431), offset_pct=(0.2,0.2)).save("serious01_20_20_70_70.jpg")

    #Serious_02.jpg
    #left  -> 522, 412
    #right -> 758, 415
    image = Image.open("serious_02.jpg")
    CropFace(image, eye_left=(522,412), eye_right=(758,415), offset_pct=(0.2,0.2)).save("serious02_20_20_70_70.jpg")

    #Serious_03.jpg
    #left  -> 518, 423
    #right -> 754, 425
    image = Image.open("serious_03.jpg")
    CropFace(image, eye_left=(518,423), eye_right=(754,425), offset_pct=(0.2,0.2)).save("serious03_20_20_70_70.jpg")

    #Smile_01.jpg
    #left  -> 516, 377
    #right -> 753, 379
    image = Image.open("smile_01.jpg")
    CropFace(image, eye_left=(516,377), eye_right=(753,379), offset_pct=(0.2,0.2)).save("smile01_20_20_70_70.jpg")

    #Smile_02.jpg
    #left  -> 533, 374
    #right -> 763, 380
    image = Image.open("smile_02.jpg")
    CropFace(image, eye_left=(533,374), eye_right=(763,380), offset_pct=(0.2,0.2)).save("smile02_20_20_70_70.jpg")

    #Smile_03.jpg
    #left  -> 518, 379
    #right -> 749, 381
    image = Image.open("smile_03.jpg")
    CropFace(image, eye_left=(518,379), eye_right=(749,381), offset_pct=(0.2,0.2)).save("smile03_20_20_70_70.jpg")

    #surprised_01.jpg
    #left  -> 516, 356
    #right -> 754, 355
    image = Image.open("surprised_01.jpg")
    CropFace(image, eye_left=(516,356), eye_right=(754,355), offset_pct=(0.2,0.2)).save("surprised01_20_20_70_70.jpg")

    #surprised_02.jpg
    #left  -> 548, 364
    #right -> 793, 364
    image = Image.open("surprised_02.jpg")
    CropFace(image, eye_left=(548,364), eye_right=(793,364), offset_pct=(0.2,0.2)).save("surprised02_20_20_70_70.jpg")

    #surprised_03.jpg
    #left  -> 528, 377
    #right -> 770, 378
    image = Image.open("surprised_03.jpg")
    CropFace(image, eye_left=(528,377), eye_right=(770,378), offset_pct=(0.2,0.2)).save("surprised03_20_20_70_70.jpg")
If you use the same filenames for your pictures, the only things you must change are the coordinates of your eyes in each picture. Then copy the script into the same folder your pictures are in and run this in the computer shell:
mcramon@ubuntu:∼/tmp/opencv/emotion/mypics$ python align_faces.py
A series of images with the suffix _20_20_70_70 is created:
mcramon@ubuntu:∼/tmp/opencv/emotion/mypics$ ls *20*
serious01_20_20_70_70.jpg smile01_20_20_70_70.jpg surprised01_20_20_70_70.jpg
serious02_20_20_70_70.jpg smile02_20_20_70_70.jpg surprised02_20_20_70_70.jpg
serious03_20_20_70_70.jpg smile03_20_20_70_70.jpg surprised03_20_20_70_70.jpg
If you use different filenames and a different number of pictures, you need to change the script accordingly.
Do not worry about the details of this code; only keep in mind that this script uses the pillow module to create an image object that, using the CropFace() function, crops and scales the image based on the eye coordinates and the given offsets. For example, to crop the image file surprised_02.jpg with offsets of 20% x 20%, the following lines of code are necessary:
image = Image.open("surprised_02.jpg")
CropFace(image, eye_left=(548,364), eye_right=(793,364), offset_pct=(0.2,0.2)).save("surprised02_20_20_70_70.jpg")
As a result, all the images will contain only your face, as shown in Figure 7-12.
The next step is to transfer these cropped images to Intel Galileo. A quick way to do that if you are using Linux, MacOSX, or Windows Cygwin and have Intel Galileo with a valid IP address on your network is to use scp. Run the following in the command line in the directory containing your images:
mcramon@ubuntu:∼/tmp/opencv/emotion/mypics$ for i in $(ls *20*); do scp $i root@192.254.1.1:/home/root/. ; done
All the images are transferred to the /home/root directory.
Organizing the Images in Directories
With the images transferred to Intel Galileo, organize them by creating a directory for each type of emotion and moving the pictures to the corresponding directory. For example, use the mkdir command to create the serious, smile, and surprised directories, and move each picture with the mv command to the corresponding directory. The result is something like this:
.
├── serious
│   ├── serious01_20_20_70_70.jpg
│   ├── serious02_20_20_70_70.jpg
│   └── serious03_20_20_70_70.jpg
├── smile
│   ├── smile01_20_20_70_70.jpg
│   ├── smile02_20_20_70_70.jpg
│   └── smile03_20_20_70_70.jpg
└── surprised
    ├── surprised01_20_20_70_70.jpg
    ├── surprised02_20_20_70_70.jpg
    └── surprised03_20_20_70_70.jpg
Creating the CSV File
The last step in creating the database is to create a CSV (comma-separated values) file. This is a simple text file that describes the exact location of each image and categorizes each image by emotion based on the directory.
An example of a CSV file is shown in Listing 7-6.
Listing 7-6. my_csv.csv
/home/root/emotion/pics/smile/smile01_20_20_70_70.jpg;0
/home/root/emotion/pics/smile/smile02_20_20_70_70.jpg;0
/home/root/emotion/pics/smile/smile03_20_20_70_70.jpg;0
/home/root/emotion/pics/surprised/surprised01_20_20_70_70.jpg;1
/home/root/emotion/pics/surprised/surprised02_20_20_70_70.jpg;1
/home/root/emotion/pics/surprised/surprised03_20_20_70_70.jpg;1
/home/root/emotion/pics/serious/serious01_20_20_70_70.jpg;2
/home/root/emotion/pics/serious/serious02_20_20_70_70.jpg;2
/home/root/emotion/pics/serious/serious03_20_20_70_70.jpg;2
Note that each image path is followed by a ; delimiter and an index that represents the emotional state in the picture. In Listing 7-6, 0 represents smiling, 1 represents surprise, and 2 represents seriousness.
The script that helps create CSV files is shown in Listing 7-7.
Listing 7-7. create_csv.py
#!/usr/bin/env python
# Software License Agreement (BSD License)
#
# Copyright (c) 2012, Philipp Wagner
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above
# copyright notice, this list of conditions and the following
# disclaimer in the documentation and/or other materials provided
# with the distribution.
# * Neither the name of the author nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
# COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
import sys
import os.path
# This is a tiny script to help you creating a CSV file from a face
# database with a similar hierarchie:
#
# philipp@mango:∼/facerec/data/at$ tree
# .
# |-- README
# |-- s1
# | |-- 1.pgm
# | |-- ...
# | |-- 10.pgm
# |-- s2
# | |-- 1.pgm
# | |-- ...
# | |-- 10.pgm
# ...
# |-- s40
# | |-- 1.pgm
# | |-- ...
# | |-- 10.pgm
#
if __name__ == "__main__":

    if len(sys.argv) != 2:
        print "usage: create_csv <base_path>"
        sys.exit(1)

    BASE_PATH=sys.argv[1]
    SEPARATOR=";"

    label = 0
    for dirname, dirnames, filenames in os.walk(BASE_PATH):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            for filename in os.listdir(subject_path):
                abs_path = "%s/%s" % (subject_path, filename)
                print "%s%s%d" % (abs_path, SEPARATOR, label)
            label = label + 1
Transfer this file to Intel Galileo and run the following command line:
python create_csv.py <the ABSOLUTE directory path> > <your file name>
For example:
root@clanton:∼/emotion# python create_csv.py $(pwd)/pics/ > my_csv.csv
And check the file:
root@clanton:∼/emotion# cat my_csv.csv
/home/root/emotion/pics/smile/smile01_20_20_70_70.jpg;0
/home/root/emotion/pics/smile/smile02_20_20_70_70.jpg;0
/home/root/emotion/pics/smile/smile03_20_20_70_70.jpg;0
/home/root/emotion/pics/surprised/surprised01_20_20_70_70.jpg;1
/home/root/emotion/pics/surprised/surprised02_20_20_70_70.jpg;1
/home/root/emotion/pics/surprised/surprised03_20_20_70_70.jpg;1
/home/root/emotion/pics/serious/serious01_20_20_70_70.jpg;2
/home/root/emotion/pics/serious/serious02_20_20_70_70.jpg;2
/home/root/emotion/pics/serious/serious03_20_20_70_70.jpg;2
The Code for Emotion Classification
The code for emotion classification uses a class called FaceRecognizer, which is responsible for reading your database. In other words, it reads the pictures and each state index in the database and, using a model called fisherface, feeds (or trains) the model so that it can predict emotions.
The code in this section is based on the face and eyes detection code presented in Listing 7-4. Listing 7-8 shows the complete code with the new parts.
Listing 7-8. opencv_emotion_classification.cpp
/*
* Copyright (c) 2011. Philipp Wagner <bytefish[at]gmx[dot]de>.
* Released to public domain under terms of the BSD Simplified license.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of the organization nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* See <http://www.opensource.org/licenses/bsd-license>
*
* Manoel Ramon - 06/15/2014
* manoel.ramon@gmail.com
* code changed from original facerec_fisherface.cpp
* added:
* - adaption to emotions detection instead gender
* - picture took from the default video device
* - added face and eyes recognition
* - crop images based in human anatomy
* - prediction based in face recognized
*
*/
#include <opencv2/opencv.hpp>
#include <stdio.h>
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/core/core.hpp"
#include "opencv2/contrib/contrib.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <iostream>
#include <fstream>
#include <sstream>
using namespace cv;
using namespace std;
String face_cascade_name = "haarcascade_frontalface_alt.xml";
String eye_cascade_name = "haarcascade_eye.xml";
Mat faceDetect(Mat img);
CascadeClassifier face_cascade;
CascadeClassifier eyes_cascade;
enum EmotionState_t {
SMILE =0, // 0
SURPRISED, // 1
SERIOUS, // 2
};
static void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';') {
std::ifstream file(filename.c_str(), ifstream::in);
if (!file) {
string error_message = "No valid input file was given, please check the given filename.";
CV_Error(CV_StsBadArg, error_message);
}
string line, path, classlabel;
while (getline(file, line)) {
stringstream liness(line);
getline(liness, path, separator);
getline(liness, classlabel);
if(!path.empty() && !classlabel.empty()) {
images.push_back(imread(path, 0));
labels.push_back(atoi(classlabel.c_str()));
}
}
}
int main(int argc, const char *argv[])
{
EmotionState_t emotion;
// Check for valid command line arguments, print usage
// if no arguments were given.
if (argc < 2) {
cout << "usage: " << argv[0] << " <csv.ext> <output_folder> " << endl;
exit(1);
}
if( !face_cascade.load( face_cascade_name ) ){ printf("--(!)Error loading\n"); return -1; };
if( !eyes_cascade.load( eye_cascade_name ) ){ printf("--(!)Error loading\n"); return -1; };
// -1 opens the first enumerated camera; change it if you want to use a specific camera
VideoCapture cap(-1);
//check if the file was opened properly
if(!cap.isOpened())
{
cout << "Capture could not be opened succesfully" << endl;
return -1;
}
else
{
cout << "camera is ok.. Stay 2 ft away from your camera\n" << endl;
}
int w = 432;
int h = 240;
cap.set(CV_CAP_PROP_FRAME_WIDTH, w);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, h);
Mat frame;
cap >>frame;
cout << "processing the image...." << endl;
Mat testSample = faceDetect(frame);
// Get the path to your CSV.
string fn_csv = string(argv[1]);
// These vectors hold the images and corresponding labels.
vector<Mat> images;
vector<int> labels;
// Read in the data. This can fail if no valid
// input filename is given.
try
{
read_csv(fn_csv, images, labels);
} catch (cv::Exception& e) {
cerr << "Error opening file \"" << fn_csv << "\". Reason: " << e.msg << endl;
// nothing more we can do
exit(1);
}
// Quit if there are not enough images for this demo.
if(images.size() <= 1)
{
string error_message = "This demo needs at least 2 images to work. Please add more images to your data set!";
CV_Error(CV_StsError, error_message);
}
// Get the height from the first image. We'll need this
// later in code to reshape the images to their original
// size:
int height = images[0].rows;
// The following lines create a Fisherfaces model for
// face recognition and train it with the images and
// labels read from the given CSV file.
// If you just want to keep 10 Fisherfaces, then call
// the factory method like this:
//
// cv::createFisherFaceRecognizer(10);
//
// However it is not useful to discard Fisherfaces! Please
// always try to use _all_ available Fisherfaces for
// classification.
//
// If you want to create a FaceRecognizer with a
// confidence threshold (e.g. 123.0) and use _all_
// Fisherfaces, then call it with:
//
// cv::createFisherFaceRecognizer(0, 123.0);
//
Ptr<FaceRecognizer> model = createFisherFaceRecognizer();
model->train(images, labels);
// The following line predicts the label of a given
// test image:
int predictedLabel = model->predict(testSample);
// To get the confidence of a prediction call the model with:
//
// int predictedLabel = -1;
// double confidence = 0.0;
// model->predict(testSample, predictedLabel, confidence);
//
string result_message = format("Predicted class = %d", predictedLabel);
cout << result_message << endl;
// giving the result
switch (predictedLabel)
{
case SMILE:
cout << "You are happy!" << endl;
break;
case SURPRISED:
cout << "You are surprised!" << endl;
break;
case SERIOUS:
cout << "You are serious!" << endl;
break;
}
cap.release();
return 0;
}
Mat faceDetect(Mat img)
{
std::vector<Rect> faces;
std::vector<Rect> eyes;
bool two_eyes = false;
bool any_eye_detected = false;
//detecting faces
face_cascade.detectMultiScale( img, faces, 1.1, 2, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
if (faces.size() == 0)
{
cout << "Try again.. I did not dectected any faces..." << endl;
exit(-1); // abort everything
}
Point p1 = Point(0,0);
for( size_t i = 0; i < faces.size(); i++ )
{
// we cannot draw in the image !!! otherwise will mess with the prediction
// rectangle( img, faces[i], Scalar( 255, 100, 0 ), 4, 8, 0 );
Mat frame_gray;
cvtColor( img, frame_gray, CV_BGR2GRAY );
// cropping only the face in region defined by faces[i]
std::vector<Rect> eyes;
Mat faceROI = frame_gray( faces[i] );
//In each face, detect eyes
eyes_cascade.detectMultiScale( faceROI, eyes, 1.1, 3, 0 |CV_HAAR_SCALE_IMAGE, Size(30, 30) );
for( size_t j = 0; j < eyes.size(); j++ )
{
Point center( faces[i].x + eyes[j].x + eyes[j].width*0.5, faces[i].y + eyes[j].y + eyes[j].height*0.5 );
// we cannot draw in the image !!! otherwise will mess with the prediction
// int radius = cvRound( (eyes[j].width + eyes[j].height)*0.25 );
// circle( img, center, radius, Scalar( 255, 0, 0 ), 4, 8, 0 );
if (j==0)
{
p1 = center;
any_eye_detected = true;
}
else
{
two_eyes = true;
}
}
}
cout << "SOME DEBUG" << endl;
cout << "-------------------------" << endl;
cout << "faces detected:" << faces.size() << endl;
cout << "x: " << faces[0].x << endl;
cout << "y: " << faces[0].y << endl;
cout << "w: " << faces[0].width << endl;
cout << "h: " << faces[0].height << endl << endl;
Mat imageInRectangle;
imageInRectangle = img(faces[0]);
Size recFaceSize = imageInRectangle.size();
cout << recFaceSize << endl;
// for debug
imwrite("imageInRectangle.jpg", imageInRectangle);
int rec_w = 0;
int rec_h = faces[0].height * 0.64;
// checking the (x,y) for cropped rectangle
// based in human anatomy
int px = 0;
int py = 2 * 0.125 * faces[0].height;
Mat cropImage;
cout << "faces[0].x:" << faces[0].x << endl;
p1.x = p1.x - faces[0].x;
cout << "p1.x:" << p1.x << endl;
if (any_eye_detected)
{
if (two_eyes)
{
cout << "two eyes detected" << endl;
// we have detected two eyes
// we have p1 and p2
// left eye
px = p1.x / 1.35;
}
else
{
// only one eye was found.. need to check if the
// left or right eye
// we have only p1
if (p1.x > recFaceSize.width/2)
{
// right eye
cout << "only right eye detected" << endl;
px = p1.x / 1.75;
}
else
{
// left eye
cout << "only left eye detected" << endl;
px = p1.x / 1.35;
}
}
}
else
{
// no eyes detected but we have a face
px = 25;
py = 25;
rec_w = recFaceSize.width-50;
rec_h = recFaceSize.height-30;
}
rec_w = (faces[0].width - px) * 0.75;
cout << "px :" << px << endl;
cout << "py :" << py << endl;
cout << "rec_w:" << rec_w << endl;
cout << "rec_h:" << rec_h << endl;
cropImage = imageInRectangle(Rect(px, py, rec_w, rec_h));
Size dstImgSize(70,70); // same image size of db
Mat finalSizeImg;
resize(cropImage, finalSizeImg, dstImgSize);
// for debug
imwrite("onlyface.jpg", finalSizeImg);
cvtColor( finalSizeImg, finalSizeImg, CV_BGR2GRAY );
return finalSizeImg;
}
Reviewing opencv_emotion_classification.cpp
In the beginning of the code, an enum is created to define the emotional states. Note that the value of each element in this enum matches the emotion index in the CSV file.
enum EmotionState_t {
SMILE =0, // 0
SURPRISED, // 1
SERIOUS, // 2
};
In the main() function, a variable of type EmotionState_t is created, and the program expects to receive the name of the CSV file as an argument.
int main(int argc, const char *argv[])
{
EmotionState_t emotion;
// Check for valid command line arguments, print usage
// if no arguments were given.
if (argc < 2) {
cout << "usage: " << argv[0] << " <csv.ext> <output_folder> " << endl;
exit(1);
}
When the webcam is opened, the picture is collected as before. The faceDetect() method changes compared to the version shown earlier:
Mat testSample = faceDetect(frame);
This new object stored in testSample contains the cropped face. The cropped image is the same size as the images in the database; it is returned in grayscale and is cropped like the images shown in Figure 7-12.
The frame contains a 432x240 image and the testSample image is 70x70. For now, let’s continue with the main() function; faceDetect() will be discussed in more detail later.
With the image prepared to be analyzed, new components are used to predict the emotional state:
Ptr<FaceRecognizer> model = createFisherFaceRecognizer();
model->train(images, labels);
// The following line predicts the label of a given
// test image:
int predictedLabel = model->predict(testSample);
class FaceRecognizer : public Algorithm
At a glance, FaceRecognizer looks very simple, but in fact it’s very powerful and complex. This class allows you to set different algorithms, including your own, to perform different kinds of image recognition.
The model used in the code is fisherface and it’s created by the line:
Ptr<FaceRecognizer> model = createFisherFaceRecognizer();
void FaceRecognizer::train(InputArrayOfArrays src, InputArray labels)
This method trains the model based on your database. The code passes the images and index (or labels):
model->train(images, labels);
int FaceRecognizer::predict(InputArray src) const = 0
This method predicts the classification index (label) based on the image cast as the input array src.
For example, if the emotion “happy” is labeled as 0 in the CSV file and the FaceRecognizer was trained, the prediction will return 0 if the image src is a picture of you smiling.
This is represented by the following snippet:
int predictedLabel = model->predict(testSample);
...
...
...
// giving the result
switch (predictedLabel)
{
case SMILE:
cout << "You are happy!" << endl;
break;
case SURPRISED:
cout << "You are surprised!" << endl;
break;
case SERIOUS:
cout << "You are serious!" << endl;
break;
}
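If you also want a confidence measure for the prediction, the overload shown commented out in Listing 7-8 can be used; a minimal sketch:
int predictedLabel = -1;
double confidence = 0.0;
model->predict(testSample, predictedLabel, confidence);
cout << "label: " << predictedLabel << " confidence: " << confidence << endl;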
If the image returned by faceDetect() is cropped properly and your expression is similar to the expression in the database, the algorithm will predict accurately.
The faceDetect() method basically does what was done before, as explained in the flowchart in Figure 7-10. In other words, it detects the face and eyes.
After this example detects the face and eyes, an algorithm based on a few simple concepts of human anatomy crops the image to the face area.
It tries to crop the image captured by the webcam dynamically, doing the same thing that was done manually by the script in Listing 7-5, but cropping exclusively the area where the face is detected.
To understand how the logic works, see Figure 7-13.
Take a look at Figure 7-13 and follow this logic. When a face is recognized, the whole face is typically captured, including the ears, head or hat, part of the neck, and part of the background image (imageInRectangle). However, these elements are not interesting to the emotion classifier and must be removed (the red arrow areas); only the portion containing the eyes, nose, and mouth is cropped (cropImage).
The cropped image has the initial px and py coordinates with the extensions rec_w and rec_h, which form a rectangle with the right dimensions for cropping the area. This rectangle corresponds to the ROI (Region of Interest) area.
To reach the ROI area, the eyes are detected, and then human proportions are used to find the px, py, rec_w, and rec_h values in the image and crop it.
When the eyes are detected, it is possible to define a point object p1 that corresponds to the center of an eye. The point object p1 has two members, x and y, that represent the position in pixels in the original image. There are a couple of problems, however. Sometimes only one eye is detected, and the algorithm must determine whether it’s the right or the left one. Other times, no eye is detected.
//detecting faces
face_cascade.detectMultiScale( img, faces, 1.1, 2, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
Point p1 = Point(0,0);
for( size_t i = 0; i < faces.size(); i++ )
{
...
...
...
// In each face, detect eyes
eyes_cascade.detectMultiScale( faceROI, eyes, 1.1, 3, 0 |CV_HAAR_SCALE_IMAGE, Size(30, 30) );
for( size_t j = 0; j < eyes.size(); j++ )
{
Point center( faces[i].x + eyes[j].x + eyes[j].width*0.5, faces[i].y + eyes[j].y + eyes[j].height*0.5 );
...
...
...
if (j==0)
{
p1 = center;
any_eye_detected = true;
}
else
{
two_eyes = true;
}
}
}
At this point, you might have the center of one of the eyes, and it is known whether one, two, or no eyes were detected. Now it is necessary to find the px and py coordinates, as well as the ROI dimensions rec_w and rec_h.
In human anatomy, the eyes are located on top of the horizontal line that splits the human face in half. If you divide this middle horizontal line equally into four parts, the eyes are separated from each other by half of the total width and are one-fourth of the width from each side.
The nose and mouth are centered in the middle of the face, with the inferior half divided into five equal parts. The eyebrows are 12.5% above the lines of the eyes, because they are at 50%/4 of the superior part of the face.
If no eyes are detected, the anatomical proportions cannot be applied, and the crop falls back to fixed values based only on the face rectangle. With these proportions in mind, the following lines were created:
int rec_w = 0;
int rec_h = faces[0].height * 0.64;
// checking the (x,y) for cropped rectangle
// based in human anatomy
int px = 0;
int py = 2 * 0.125 * faces[0].height;
Mat cropImage;
cout << "faces[0].x:" << faces[0].x << endl;
p1.x = p1.x - faces[0].x;
cout << "p1.x:" << p1.x << endl;
if (any_eye_detected)
{
if (two_eyes)
{
cout << "two eyes detected" << endl;
// we have detected two eyes
// we have p1 and p2
// left eye
px = p1.x / 1.35;
}
else
{
// only one eye was found.. need to check if the
// left or right eye
// we have only p1
if (p1.x > recFaceSize.width/2)
{
// right eye
cout << "only right eye detected" << endl;
px = p1.x / 1.75;
}
else
{
// left eye
cout << "only left eye detected" << endl;
px = p1.x / 1.35;
}
}
}
else
{
// no eyes detected but we have a face
px = 25;
py = 25;
rec_w = recFaceSize.width-50;
rec_h = recFaceSize.height-30;
}
rec_w = (faces[0].width - px) * 0.75;
cout << "px :" << px << endl;
cout << "py :" << py << endl;
cout << "rec_w:" << rec_w << endl;
cout << "rec_h:" << rec_h << endl;
cropImage = imageInRectangle(Rect(px, py, rec_w, rec_h));
For debugging purposes, the faceDetect() method saves two images in the file system every time the software runs. One is called onlyface.jpg and contains the cropped image. The other is called imageInRectangle.jpg and contains the detected face region.
Mat imageInRectangle;
imageInRectangle = img(faces[0]);
...
...
...
// for debug
imwrite("imageInRectangle.jpg", imageInRectangle);
cropImage = imageInRectangle(Rect(px, py, rec_w, rec_h));
...
...
...
Size dstImgSize(70,70); // same image size of db
Mat finalSizeImg;
resize(cropImage, finalSizeImg, dstImgSize);
Running opencv_emotion_classification.cpp
Compile the code and transfer the file to Intel Galileo. Make sure the uvcvideo driver is loaded and the webcam is connected to the USB port (read the “Connecting the Webcam” section in this chapter), and transfer the program to the same location as your CSV file. Stay in front of your camera, preferably two feet away, make an emotional expression, and then run the following command:
root@clanton:∼/emotion# ./opencv_emotion_classification my_csv.csv 2> /dev/null
camera is ok.. Stay 2 ft away from your camera
processing the image....
SOME DEBUG
-------------------------
faces detected:1
x: 172
y: 25
w: 132
h: 132
[132 x 132]
faces[0].x:172
p1.x:-172
px :25
py :25
rec_w:80
rec_h:102
Predicted class = 0
You are happy!
The software classifies the image as happy. Extracting the debug images onlyface.jpg and imageInRectangle.jpg from the file system, it is possible to observe my expression in the cropped image, shown in Figure 7-14.
Note in Figure 7-14 the areas that are cropped out, including the background, the hair, and the ears.
root@clanton:∼/emotion# ./opencv_emotion_classification my_csv.csv 2> /dev/null
camera is ok.. Stay 2 ft away from your camera
processing the image....
SOME DEBUG
-------------------------
faces detected:1
x: 178
y: 3
w: 143
h: 143
[143 x 143]
faces[0].x:178
p1.x:43
two eyes detected
px :31
py :35
rec_w:84
rec_h:91
Predicted class = 1
You are surprised!
The software classifies this image as surprised. Extracting the debug images onlyface.jpg and imageInRectangle.jpg from the file system, you can see my surprised expression and the cropped image, as shown in Figure 7-15.
Keep varying your expression and checking how accurately the captured images are classified.