Want to download the code instead of copying it from this website? Get it from HERE.
For those following this tutorial or buying the code from the link above, I want to clarify a few things:
1. The YouTube video you see was created with the code in these pages. I didn't leave anything out.
2. I did not create any kind of adaptive light filter in this code. Yes, that means the thresholds have to be set manually for each lighting condition to get good recognition.
3. This algorithm doesn't work well with still images. It was meant for real-time video face detection, and it doesn't scan the video for faces of different sizes.
4. Read all the comments on all pages. Mistakes have been found. However, even with the mistakes, this algorithm worked fine. Why? The mistakes are additions and subtractions in the Haar features. Technically, by adding the integral image sums instead of subtracting them, you are still creating a Haar feature.
5. I can't/won't sell the images with the code. You can download the images HERE.
6. I sell the code to help support my server costs. You don't have to buy it. You can follow the whole tutorial here. I just make it easier for both of us.
7. I try my best to respond to all e-mails, but lately I am receiving a lot of e-mails and I might not be able to respond to you.
IMPORTANT - This algorithm was written in MATLAB! Not the best language to write this algorithm in, but I do not think anybody else has done it in MATLAB, for a reason!
This is my first try at machine learning. I knew very little about it when I started, but I feel that I'm OK enough to give you a tutorial on what I did and how I think this algorithm should be implemented.
I decided to implement my own version of the Viola and Jones method for face detection. Using Haar-like features and adaptive boosting... blah blah, we don't care about the technicals, right? We just want to understand how this is done. So here is a video of my implementation. I still need to tweak it some more to make some improvements.
Yes that is me! I should have hired some beautiful model instead of having my face here, then maybe this page would get more hits.
Anyway, let's get started.
To do face detection, we first need a dataset of images of faces and non-faces. You need as many as you can get (in the thousands or tens of thousands). Because you will do supervised training, which means you tell the computer up front whether each picture is a face or a non-face, you need to get those images already sorted into faces and non-faces.
In my case, I obtained a database from MIT composed of 2429 faces and 4547 non-faces. The images are 19 x 19 (later I realized that I should have looked for something in the range of 25 x 25, which seems to capture features better). They are all black and white images. Here are some of them:
Once you have your own nice database, you need to understand the Haar features. They are rectangles that map over people's faces and tell you whether an image is a face or a non-face. I will explain how this is accomplished later. Here I will just focus on what took me a long time to figure out: why are there so many Haar features in a 19 x 19 image? So here is what they look like:
There are many more variations, but these are the only ones I used. So let's take the first one in the list here: a white rectangle above a black rectangle. If you see your image as a 19 x 19 matrix (that is, from x1 to x19 and y1 to y19), and you start with this feature at 1 x 2 in size and at position x1,y1, then you have your first feature:
You would do all your calculations based on this classifier, then move on to the next size, which would be a 1 x 4 in position x1, y1, your second feature:
And you get the point: it would keep increasing in size. It would be 1 x 8, 1 x 16 and, well, it cannot go farther than that.
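To make "do all your calculations" a bit more concrete, here is a minimal sketch of how one of these white-over-black features could be evaluated with an integral image. It is written in Python rather than the MATLAB used for the real code, and everything in it is an assumption for illustration: the function names, the 0-based positions (the text counts from x1, y1), the toy 4 x 4 image, and the white-minus-black sign convention.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w x h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_vertical(ii, x, y, w, h):
    """White top half minus black bottom half (the sign choice is an assumption)."""
    half = h // 2
    white = rect_sum(ii, x, y, w, half)
    black = rect_sum(ii, x, y + half, w, half)
    return white - black


# Toy 4 x 4 "image": bright top rows, dark bottom rows.
img = [
    [9, 9, 9, 9],
    [9, 9, 9, 9],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
ii = integral_image(img)
first = two_rect_vertical(ii, 0, 0, 1, 2)   # 1 x 2 feature in the top-left corner
second = two_rect_vertical(ii, 0, 0, 1, 4)  # 1 x 4 feature at the same position
print(first, second)
```

The point of the integral image is that each rectangle sum costs four lookups no matter how big the rectangle is, which matters once the same feature has to be tried at every size and position.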
Then the classifiers would start with a 2 x 2.
And it would continue 2 x 6, 2 x 8, and so on. Eventually, you will come to an end of this classifier. The code for this would be:
The feature matrix gives the starting size of each of the 5 classifiers.
- Each feature (5 total), this is i.
- They all must start at 1 x 2, this is sizeX and sizeY.
- They cannot go over the size of 19 x 19, this is x and y.
- winLength and winWidth are the parameters by which each feature increases in size across the image.
CalcBestThresh() is the function that is called to see whether the feature being tested is a good one or not.
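The loop described above can be sketched roughly like this. It is a Python illustration, not the original MATLAB listing, and it rests on a few assumptions: the five base shapes in BASE_SHAPES are the usual two-, three- and four-rectangle layouts, each feature grows in whole multiples of its base shape (which may not match the exact progression used in the real code), and positions are 0-based. The names i, x, y, size_x and size_y loosely mirror the ones in the list above. The call to CalcBestThresh() is left out because its signature isn't shown here.

```python
# Base shapes (width, height) for five feature types. These particular shapes
# are an assumption; the article's own feature matrix is not reproduced here.
BASE_SHAPES = [(1, 2), (2, 1), (1, 3), (3, 1), (2, 2)]


def enumerate_features(window=19, base_shapes=BASE_SHAPES):
    """Yield (i, x, y, size_x, size_y) for every feature that fits the window."""
    for i, (base_x, base_y) in enumerate(base_shapes):
        # Grow the feature in whole multiples of its base shape.
        for size_x in range(base_x, window + 1, base_x):
            for size_y in range(base_y, window + 1, base_y):
                # Slide the feature over every position where it still fits.
                for x in range(window - size_x + 1):
                    for y in range(window - size_y + 1):
                        # This is where each candidate would be scored.
                        yield i, x, y, size_x, size_y


features = list(enumerate_features())
total = len(features)
per_type = [sum(1 for f in features if f[0] == i) for i in range(len(BASE_SHAPES))]
print(total, per_type)
```

Under these assumptions a 19 x 19 window produces 63,960 candidate features, which is why there are so many of them: every shape gets tried at every size and every position where it still fits.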