Machine Learning in Python

Struggling to get started with machine learning using Python? In this step-by-step, hands-on tutorial you will learn how to perform machine learning using Python on numerical data and image data.

By the time you are finished reading this post, you will be able to get your start in machine learning.

To launch your machine learning in Python education, just keep reading!

Inside this tutorial, you will learn how to perform machine learning in Python on numerical data and image data.

You will learn how to operate popular Python machine learning and deep learning libraries, including two of my favorites:

  • scikit-learn
  • Keras

Specifically, you will learn how to:

  1. Examine your problem
  2. Prepare your data (raw data, feature extraction, feature engineering, etc.)
  3. Spot-check a set of algorithms
  4. Examine your results
  5. Double-down on the algorithms that worked best

Using this technique you will be able to get your start with machine learning and Python!

Along the way, you’ll discover popular machine learning algorithms that you can use in your own projects as well, including:

  1. k-Nearest Neighbors (k-NN)
  2. Naïve Bayes
  3. Logistic Regression
  4. Support Vector Machines (SVMs)
  5. Decision Trees
  6. Random Forests
  7. Perceptrons
  8. Multi-layer, feedforward neural networks
  9. Convolutional Neural Networks (CNNs)

This hands-on experience will give you the knowledge (and confidence) you need to apply machine learning in Python to your own projects.

Install the required Python machine learning libraries

Before we can get started with this tutorial you first need to make sure your system is configured for machine learning. Today’s code requires the following libraries:

  • NumPy: For numerical processing with Python.
  • PIL: A simple image processing library. OpenCV is not a requirement today!
  • scikit-learn: Contains the machine learning algorithms we’ll cover today (we’ll need version 0.20+, which is why you see the --upgrade flag below).
  • Keras and TensorFlow: For deep learning. The CPU version of TensorFlow is fine for today’s example.
  • imutils: My personal package of image processing/computer vision convenience functions

Each of these can be installed in your environment (virtual environments recommended) with pip:

$ pip install numpy
$ pip install pillow
$ pip install --upgrade scikit-learn
$ pip install tensorflow # or tensorflow-gpu
$ pip install keras
$ pip install --upgrade imutils

Datasets

In order to help you gain experience performing machine learning in Python, we’ll be working with two separate datasets.

The first one, the Iris dataset, is the machine learning practitioner’s equivalent of “Hello, World!” (likely one of the first pieces of software you wrote when learning how to program).

The second dataset, 3-scenes, is an example image dataset I put together — this dataset will help you gain experience working with image data, and most importantly, learn what techniques work best for numerical/categorical datasets vs. image datasets.

Let’s go ahead and get a more intimate look at these datasets.

The Iris dataset

Figure 1: The Iris dataset is a numerical dataset describing Iris flowers. It captures measurements of their sepal and petal length/width. Using these measurements we can attempt to predict flower species with Python and machine learning. (source)

The Iris dataset is arguably one of the simplest machine learning datasets — it is often used to help teach programmers and engineers the fundamentals of machine learning and pattern recognition.

We call this dataset the “Iris dataset” because it captures attributes of three Iris flower species:

  1. Iris Setosa
  2. Iris Versicolor
  3. Iris Virginica

Each species of flower is quantified via four numerical attributes, all measured in centimeters:

  1. Sepal length
  2. Sepal width
  3. Petal length
  4. Petal width

Our goal is to train a machine learning model to correctly predict the flower species from the measured attributes.
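
If you’d like a quick peek at the raw numbers before building any models, the Iris dataset ships with scikit-learn and can be inspected in just a few lines (a small exploratory sketch, not part of today’s scripts):
# sketch: a quick look at the Iris dataset
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)    # (150, 4) -- 150 flowers, 4 measurements each
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(iris.data[0])       # [5.1 3.5 1.4 0.2] -- sepal/petal length/width in cm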

It’s important to note that one of the classes is linearly separable from the other two — the latter are not linearly separable from each other.

In order to correctly classify these flower species, we will need a non-linear model.

It’s extremely common to need a non-linear model when performing machine learning with Python in the real world — the rest of this tutorial will help you gain this experience and be more prepared to conduct machine learning on your own datasets.

The 3-scenes image dataset

Figure 2: The 3-scenes dataset consists of pictures of coastlines, forests, and highways. We’ll use Python to train machine learning and deep learning models.

The second dataset we’ll be using to train machine learning models is called the 3-scenes dataset and includes 948 total images of 3 scenes:

  • Coast (360 images)
  • Forest (328 images)
  • Highway (260 images)

The 3-scenes dataset was created by sampling the 8-scenes dataset from Oliva and Torralba’s 2001 paper, Modeling the shape of the scene: a holistic representation of the spatial envelope.

Our goal will be to train machine learning and deep learning models with Python to correctly recognize each of these scenes.

I have included the 3-scenes dataset in the “Downloads” section of this tutorial. Make sure you download the dataset + code to this blog post before continuing.

Steps to perform machine learning in Python

Figure 3: Creating a machine learning model with Python is a process that should be approached systematically with an engineering mindset. These five steps are repeatable and will yield quality machine learning and deep learning models.

Whenever you perform machine learning in Python I recommend starting with a simple 5-step process:

  1. Examine your problem
  2. Prepare your data (raw data, feature extraction, feature engineering, etc.)
  3. Spot-check a set of algorithms
  4. Examine your results
  5. Double-down on the algorithms that worked best

This pipeline will evolve as your machine learning experience grows, but for beginners, this is the machine learning process I recommend for getting started.

To start, we must examine the problem.

Ask yourself:

  • What type of data am I working with? Numerical? Categorical? Images?
  • What is the end goal of my model?
  • How will I define and measure “accuracy”?
  • Given my current knowledge of machine learning, do I know any algorithms that work well on these types of problems?

The last question, in particular, is critical — the more you apply machine learning in Python, the more experience you will gain.

Based on your previous experience you may already know an algorithm that works well.

From there, you need to prepare your data.

Typically this step involves loading your data from disk, examining it, and deciding if you need to perform feature extraction or feature engineering.

Feature extraction is the process of applying an algorithm to quantify your data in some manner.

For example, when working with images we may wish to compute histograms to summarize the distribution of pixel intensities in the image — in this manner, we can characterize the color of the image.
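
As a quick, hedged illustration of that idea (this is not the feature extractor we will use later in this post), a grayscale intensity histogram can be computed with PIL and NumPy in a few lines; the image path below is only a placeholder:
# sketch: quantify an image by its grayscale intensity histogram
import numpy as np
from PIL import Image

image = np.array(Image.open("example.jpg").convert("L"))  # placeholder path
(hist, _) = np.histogram(image, bins=32, range=(0, 256))
hist = hist.astype("float") / hist.sum()  # normalize so the bins sum to 1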

Feature engineering, on the other hand, is the process of transforming your raw input data into a representation that better represents the underlying problem.

Feature engineering is a more advanced technique and one I recommend you explore once you already have some experience with machine learning and Python.

Next, you’ll want to spot-check a set of algorithms.

What do I mean by spot-checking?

Simply take a set of machine learning algorithms and apply them to the dataset!

You’ll likely want to stuff the following machine learning algorithms in your toolbox:

  1. A linear model (ex. Logistic Regression, Linear SVM),
  2. A few non-linear models (ex. RBF SVMs, SGD classifiers),
  3. Some tree and ensemble-based models (ex. Decision Trees, Random Forests).
  4. A few neural networks, if applicable (Multi-layer Perceptrons, Convolutional Neural Networks)

Try to bring a robust set of machine learning models to the problem — your goal here is to gain experience on your problem/project by identifying which machine learning algorithms performed well on the problem and which ones did not.

Once you’ve defined your set of models, train them and evaluate the results.

Which machine learning models worked well? Which models performed poorly?

Take your results and use them to double-down your efforts on the machine learning models that performed well while discarding the ones that didn’t.

Over time you will start to see patterns emerge across multiple experiments and projects.

You’ll start to develop a “sixth sense” of what machine learning algorithms perform well and in what situation.

For example, you may discover that Random Forests work very well when applied to projects that have many real-valued features.

On the other hand, you might note that Logistic Regression can handle sparse, high-dimensional spaces well.

You may even find that Convolutional Neural Networks work great for image classification (which they do).

Use your knowledge here to supplement traditional machine learning education — the best way to learn machine learning with Python is to simply roll up your sleeves and get your hands dirty!

A machine learning education based on practical experience (supplemented with some super basic theory) will take you a long way on your machine learning journey!

Let’s get our hands dirty!

Now that we have discussed the fundamentals of machine learning, including the steps required to perform machine learning in Python, let’s get our hands dirty.

In the next section, we’ll briefly review our directory and project structure for this tutorial.

Note: I recommend you use the “Downloads” section of the tutorial to download the source code and example data so you can easily follow along.

Once we’ve reviewed the directory structure for the machine learning project we will implement two Python scripts:

  1. The first script will be used to train machine learning algorithms on numerical data (i.e., the Iris dataset)
  2. The second Python script will be utilized to train machine learning on image data (i.e., the 3-scenes dataset)

As a bonus we’ll implement two more Python scripts, each of these dedicated to neural networks and deep learning:

  1. We’ll start by implementing a Python script that will train a neural network on the Iris dataset
  2. Secondly, you’ll learn how to train your first Convolutional Neural Network on the 3-scenes dataset

Let’s get started by first reviewing our project structure.

Our machine learning project structure

Be sure to grab the “Downloads” associated with this blog post.

From there you can unzip the archive and inspect the contents:

$ tree --dirsfirst --filelimit 10
.
├── 3scenes
│   ├── coast [360 entries]
│   ├── forest [328 entries]
│   └── highway [260 entries]
├── classify_iris.py
├── classify_images.py
├── nn_iris.py
└── basic_cnn.py

4 directories, 4 files

The Iris dataset is built into scikit-learn. The 3-scenes dataset, however, is not. I’ve included it in the 3scenes/ directory and, as you can see, there are three subdirectories (classes) of images.

We’ll be reviewing four Python machine learning scripts today:

  • classify_iris.py: Loads the Iris dataset and can apply any one of seven machine learning algorithms with a simple command line argument switch.
  • classify_images.py: Gathers our image dataset (3-scenes) and applies any one of seven Python machine learning algorithms.
  • nn_iris.py: Applies a simple multi-layer neural network to the Iris dataset.
  • basic_cnn.py: Builds a Convolutional Neural Network (CNN) and trains a model using the 3-scenes dataset.

Implementing Python machine learning for numerical data

Figure 4: Over time, many statistical machine learning approaches have been developed. You can use this map from the scikit-learn team as a guide for the most popular methods.

The first script we are going to implement is classify_iris.py — this script will be used to spot-check machine learning algorithms on the Iris dataset.

Once implemented, we’ll be able to use classify_iris.py to run a suite of machine learning algorithms on the Iris dataset, look at the results, and decide on which algorithm works best for the project.

Let’s get started — open up the classify_iris.py file and insert the following code:
# import the necessary packages
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.datasets import load_iris
import argparse

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, default="knn",
	help="type of python machine learning model to use")
args = vars(ap.parse_args())

Lines 2-12 import our required packages, specifically:

  • Our Python machine learning methods from scikit-learn (Lines 2-8)
  • A dataset splitting method used to separate our data into training and testing subsets (Line 9)
  • The classification report utility from scikit-learn which will print a summarization of our machine learning results (Line 10)
  • Our Iris dataset, built into scikit-learn (Line 11)
  • A tool for command line argument parsing called argparse (Line 12)

Using argparse, let’s parse a single command line argument flag, --model, on Lines 15-18. The --model switch allows us to choose from any of the following models:
# define the dictionary of models our script can use, where the key
# to the dictionary is the name of the model (supplied via command
# line argument) and the value is the model itself
models = {
	"knn": KNeighborsClassifier(n_neighbors=1),
	"naive_bayes": GaussianNB(),
	"logit": LogisticRegression(solver="lbfgs", multi_class="auto"),
	"svm": SVC(kernel="rbf", gamma="auto"),
	"decision_tree": DecisionTreeClassifier(),
	"random_forest": RandomForestClassifier(n_estimators=100),
	"mlp": MLPClassifier()
}

The models dictionary on Lines 23-31 defines the suite of models we will be spot-checking (we’ll review the results of each of these algorithms later in the post):
  • k-Nearest Neighbor (k-NN)
  • Naïve Bayes
  • Logistic Regression
  • Support Vector Machines (SVMs)
  • Decision Trees
  • Random Forests
  • Multi-layer Perceptrons (MLPs)

The keys can be entered directly in the terminal following the --model switch. Here’s an example:
$ python classify_iris.py --model knn

From there the KNeighborsClassifier will be loaded automatically. This conveniently allows us to call any one of 7 machine learning models one-at-a-time and on demand in a single Python script (no editing the code required)!

Moving on, let’s load and split our data:

# load the Iris dataset and perform a training and testing split,
# using 75% of the data for training and 25% for evaluation
print("[INFO] loading data...")
dataset = load_iris()
(trainX, testX, trainY, testY) = train_test_split(dataset.data,
	dataset.target, random_state=3, test_size=0.25)

Our dataset is easily loaded with the dedicated load_iris method on Line 36. Once the data is in memory, we go ahead and call train_test_split to separate the data into 75% for training and 25% for testing (Lines 37 and 38).

The final step is to train and evaluate our model:

# train the model
print("[INFO] using '{}' model".format(args["model"]))
model = models[args["model"]]
model.fit(trainX, trainY)

# make predictions on our data and show a classification report
print("[INFO] evaluating...")
predictions = model.predict(testX)
print(classification_report(testY, predictions,
	target_names=dataset.target_names))

Lines 42 and 43 train the Python machine learning model (also known as “fitting a model”, hence the call to .fit).

From there, we evaluate the model on the testing set (Line 47) and then print a classification_report to our terminal (Lines 48 and 49).

Implementing Python machine learning for images

Figure 5: A linear classifier example for implementing Python machine learning for image classification (Inspired by Karpathy’s example in the CS231n course).

The following script, classify_images.py, is used to train the same suite of machine learning algorithms above, only on the 3-scenes image dataset.

It is very similar to our previous Iris dataset classification script, so be sure to compare the two as you follow along.

Let’s implement this script now:

# import the necessary packages
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from PIL import Image
from imutils import paths
import numpy as np
import argparse
import os

First, we import our necessary packages on Lines 2-16. It looks like a lot, but you’ll recognize most of them from the previous script. The additional imports for this script include:

  • The LabelEncoder will be used to transform textual labels into numbers (Line 9).
  • A basic image processing tool called PIL/Pillow (Line 12). We’re using this in place of OpenCV today, mainly because it is easier to install.
  • My handy module, paths, for easily grabbing image paths from disk (Line 13). This is included in my personal imutils package which I’ve released to GitHub and PyPI.
  • NumPy will be used for numerical computations (Line 14).
  • Python’s built-in os module (Line 16). We’ll use it for accommodating path separators among different operating systems.

You’ll see how each of the imports is used in the coming lines of code.

Next let’s define a function called extract_color_stats:
def extract_color_stats(image):
	# split the input image into its respective RGB color channels
	# and then create a feature vector with 6 values: the mean and
	# standard deviation for each of the 3 channels, respectively
	(R, G, B) = image.split()
	features = [np.mean(R), np.mean(G), np.mean(B), np.std(R),
		np.std(G), np.std(B)]

	# return our set of features
	return features

Most machine learning algorithms perform very poorly on raw pixel data. Instead, we perform feature extraction to characterize the contents of the images.

Here we seek to quantify the color of the image by extracting the mean and standard deviation for each color channel in the image.

Given three channels of the image (Red, Green, and Blue), along with two features for each (mean and standard deviation), we have 3 x 2 = 6 total features to quantify the image. We form a feature vector by concatenating the values.

In fact, that’s exactly what the extract_color_stats function is doing:
  • We split the three color channels from the image on Line 22.
  • And then the feature vector is built on Lines 23 and 24 where you can see we’re using NumPy to calculate the mean and standard deviation for each channel.

We’ll be using this function to calculate a feature vector for each image in the dataset.
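
For example, calling the function on a single RGB image (the path below is just a placeholder) yields a 6-element list:
# sketch: a 6-d color feature vector for one image
image = Image.open("path/to/an/image.jpg")  # placeholder path
print(extract_color_stats(image))  # [meanR, meanG, meanB, stdR, stdG, stdB]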

Let’s go ahead and parse two command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, default="3scenes",
	help="path to directory containing the '3scenes' dataset")
ap.add_argument("-m", "--model", type=str, default="knn",
	help="type of python machine learning model to use")
args = vars(ap.parse_args())

Where the previous script had one argument, this script has two command line arguments:

  • --dataset: The path to the 3-scenes dataset residing on disk.
  • --model: The Python machine learning model to employ.

Again, we have seven machine learning models to choose from with the --model argument:
# define the dictionary of models our script can use, where the key
# to the dictionary is the name of the model (supplied via command
# line argument) and the value is the model itself
models = {
	"knn": KNeighborsClassifier(n_neighbors=1),
	"naive_bayes": GaussianNB(),
	"logit": LogisticRegression(solver="lbfgs", multi_class="auto"),
	"svm": SVC(kernel="linear"),
	"decision_tree": DecisionTreeClassifier(),
	"random_forest": RandomForestClassifier(n_estimators=100),
	"mlp": MLPClassifier()
}

After defining the models dictionary, we’ll need to go ahead and load our images into memory:
# grab all image paths in the input dataset directory, initialize our
# list of extracted features and corresponding labels
print("[INFO] extracting image features...")
imagePaths = paths.list_images(args["dataset"])
data = []
labels = []

# loop over our input images
for imagePath in imagePaths:
	# load the input image from disk, compute color channel
	# statistics, and then update our data list
	image = Image.open(imagePath)
	features = extract_color_stats(image)
	data.append(features)

	# extract the class label from the file path and update the
	# labels list
	label = imagePath.split(os.path.sep)[-2]
	labels.append(label)

Our imagePaths are extracted on Line 53. This is just a list of the paths themselves; we’ll load each actual image shortly.

I’ve defined two lists, data and labels (Lines 54 and 55). The data list will hold our image feature vectors and the class labels corresponding to them. Knowing the label for each image allows us to train our machine learning model to automatically predict class labels for our test images.

Lines 58-68 consist of a loop over the imagePaths in order to:
  1. Load each image (Line 61).
  2. Extract a color stats feature vector (mean and standard deviation of each channel) from the image using the function previously defined (Line 62).
  3. Then on Line 63 the feature vector is added to our data list.
  4. Finally, the class label is extracted from the path and appended to the corresponding labels list (Lines 67 and 68).

Now, let’s encode our labels and construct our data splits:
# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)

# perform a training and testing split, using 75% of the data for
# training and 25% for evaluation
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25)

Our textual labels are transformed into an integer representing the label using the LabelEncoder (Lines 71 and 72).

Just as in our Iris classification script, we split our data into 75% for training and 25% for testing (Lines 76 and 77).

Finally, we can train and evaluate our model:

# train the model
print("[INFO] using '{}' model".format(args["model"]))
model = models[args["model"]]
model.fit(trainX, trainY)

# make predictions on our data and show a classification report
print("[INFO] evaluating...")
predictions = model.predict(testX)
print(classification_report(testY, predictions,
	target_names=le.classes_))

These lines are nearly identical to the Iris classification script. We’re fitting (training) our model and evaluating it (Lines 81-86). A classification_report is printed in the terminal so that we can analyze the results (Lines 87 and 88).

Speaking of results, now that we’re finished implementing both classify_iris.py and classify_images.py, let’s put them to the test using each of our 7 Python machine learning algorithms.

k-Nearest Neighbor (k-NN)

Figure 6: The k-Nearest Neighbor (k-NN) method is one of the simplest machine learning algorithms.

The k-Nearest Neighbors classifier is by far the most simple image classification algorithm.

In fact, it’s so simple that it doesn’t actually “learn” anything. Instead, this algorithm relies on the distance between feature vectors. Simply put, the k-NN algorithm classifies unknown data points by finding the most common class among the k closest examples.

Each data point in the k closest data points casts a vote and the category with the highest number of votes wins!

Or, in plain English: “Tell me who your neighbors are, and I’ll tell you who you are.”

For example, in Figure 6 above we see three sets of our flowers:

  • Daisies
  • Pansies
  • Sunflowers

We have plotted each of the flower images according to the lightness of their petals (color) and the size of their petals (this is an arbitrary example, so excuse the informality).

We can clearly see that the new image is a sunflower, but what does k-NN think, given that our new image is equidistant from one pansy and two sunflowers?

Well, k-NN would examine the three closest neighbors (k=3) and since there are two votes for sunflowers versus one vote for pansies, the sunflower class would be selected.
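
If you’d like to see that voting idea spelled out in code, here is a bare-bones sketch using nothing but NumPy (in this post the actual work is done by scikit-learn’s KNeighborsClassifier):
# sketch: classify one point by a majority vote of its k nearest neighbors
import numpy as np
from collections import Counter

def knn_predict(trainX, trainY, x, k=3):
	# compute the Euclidean distance from x to every training example
	dists = np.linalg.norm(np.array(trainX) - np.array(x), axis=1)

	# grab the labels of the k closest examples and take a majority vote
	neighbors = [trainY[i] for i in np.argsort(dists)[:k]]
	return Counter(neighbors).most_common(1)[0][0]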

To put k-NN in action, make sure you’ve used the “Downloads” section of the tutorial to download the source code and example datasets.

From there, open up a terminal and execute the following command:

$ python classify_iris.py 
[INFO] loading data...
[INFO] using 'knn' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       0.92      0.92      0.92        12
   virginica       0.91      0.91      0.91        11

   micro avg       0.95      0.95      0.95        38
   macro avg       0.94      0.94      0.94        38

Here you can see that k-NN is obtaining 95% accuracy on the Iris dataset, not a bad start!

Let’s look at our 3-scenes dataset:

$ python classify_images.py --model knn
[INFO] extracting image features...
[INFO] using 'knn' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.84      0.68      0.75       105
      forest       0.78      0.77      0.77        78
     highway       0.56      0.78      0.65        54

   micro avg       0.73      0.73      0.73       237
   macro avg       0.72      0.74      0.72       237
weighted avg       0.75      0.73      0.73       237

On the 3-scenes dataset, the k-NN algorithm is obtaining 75% accuracy.

In particular, k-NN is struggling to recognize the “highway” class (~56% precision).

We’ll be exploring methods to improve our image classification accuracy in the rest of this tutorial.

For more information on how the k-Nearest Neighbors algorithm works, be sure to refer to this post.

Naïve Bayes

Figure 7: The Naïve Bayes machine learning algorithm is based upon Bayes’ theorem (source).

After k-NN, Naïve Bayes is often the first true machine learning algorithm a practitioner will study.

The algorithm itself has been around since the 1950s and is often used to obtain baselines for future experiments (especially in domains related to text retrieval).

The Naïve Bayes algorithm is made possible due to Bayes’ theorem (Figure 7).

Essentially, Naïve Bayes formulates classification as an expected probability.

Given our input data, D, we seek to compute the probability of a given class, C.

Formally, this becomes P(C | D).

To actually compute the probability we compute the numerator of Figure 7 (ignoring the denominator).

The expression can be interpreted as follows (see the sketch after this list):

  1. Computing the probability of our input data given the class (ex., the probability of a given flower being Iris Setosa having a sepal length of 4.9cm)
  2. Then multiplying by the probability of us encountering that class throughout the population of the data (ex. the probability of even encountering the Iris Setosa class in the first place)
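
Putting those two steps into code, the “numerator only” idea boils down to multiplying a per-class likelihood by a class prior and picking the largest value. Here is a heavily simplified Gaussian sketch (in this post the actual work is done by scikit-learn’s GaussianNB):
# sketch: choose the class C that maximizes P(D | C) * P(C)
import numpy as np

def gaussian_likelihood(x, means, stds):
	# x, means, and stds are NumPy arrays of per-feature values; we take the
	# product of per-feature Gaussian densities (the "naive" independence assumption)
	return np.prod(np.exp(-((x - means) ** 2) / (2 * stds ** 2)) /
		(np.sqrt(2 * np.pi) * stds))

def naive_bayes_predict(x, class_stats, priors):
	# class_stats[c] = (per-feature means, per-feature stds) and priors[c] = P(c)
	scores = {c: gaussian_likelihood(x, m, s) * priors[c]
		for (c, (m, s)) in class_stats.items()}
	return max(scores, key=scores.get)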

Let’s go ahead and apply the Naïve Bayes algorithm to the Iris dataset:

$ python classify_iris.py --model naive_bayes
[INFO] loading data...
[INFO] using 'naive_bayes' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.92      0.96        12
   virginica       0.92      1.00      0.96        11

   micro avg       0.97      0.97      0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38

We are now up to 98% accuracy, a marked increase from the k-NN algorithm!

Now let’s apply Naïve Bayes to the 3-scenes dataset for image classification:

$ python classify_images.py --model naive_bayes
[INFO] extracting image features...
[INFO] using 'naive_bayes' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.69      0.40      0.50        88
      forest       0.68      0.82      0.74        84
     highway       0.61      0.78      0.68        65

   micro avg       0.65      0.65      0.65       237
   macro avg       0.66      0.67      0.64       237
weighted avg       0.66      0.65      0.64       237

Uh oh!

It looks like we only obtained 66% accuracy here.

Does that mean that k-NN is better than Naïve Bayes and that we should always use k-NN for image classification?

Not so fast.

All we can say here is that for this particular project and for this particular set of extracted features the k-NN machine learning algorithm outperformed Naive Bayes.

We cannot say that k-NN is better than Naïve Bayes and that we should always use k-NN instead.

Thinking that one machine learning algorithm is always better than the other is a trap I see many new machine learning practitioners fall into — don’t make that mistake.

For more information on the Naïve Bayes machine learning algorithm, be sure to refer to this excellent article.

Logistic Regression

Figure 8: Logistic Regression is a machine learning algorithm based on the logistic function, whose output always lies in the range [0, 1]. Similar to linear regression, but based on a different function, it is an algorithm every machine learning and Python enthusiast needs to know (source).

The next machine learning algorithm we are going to explore is Logistic Regression.

Logistic Regression is a supervised classification algorithm often used to predict the probability of a class label (the output of a Logistic Regression algorithm is always in the range [0, 1]).

Logistic Regression is heavily used in machine learning and is an algorithm that any machine learning practitioner needs in their Python toolbox.
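
The [0, 1] output comes from the logistic (sigmoid) function itself, which squashes any real-valued input into that range. Here is a quick sketch:
# sketch: the logistic (sigmoid) function maps any real value into [0, 1]
import numpy as np

def sigmoid(z):
	return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(-4), sigmoid(0), sigmoid(4))  # ~0.018, 0.5, ~0.982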

Let’s apply Logistic Regression to the Iris dataset:

$ python classify_iris.py --model logit
[INFO] loading data...
[INFO] using 'logit' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.92      0.96        12
   virginica       0.92      1.00      0.96        11

   micro avg       0.97      0.97      0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38

Here we are able to obtain 98% classification accuracy!

And furthermore, note that both the Setosa and Versicolor classes achieve 100% precision!

Now let’s apply Logistic Regression to the task of image classification:

$ python classify_images.py --model logit
[INFO] extracting image features...
[INFO] using 'logit' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.67      0.67      0.67        92
      forest       0.79      0.82      0.80        82
     highway       0.61      0.57      0.59        63

   micro avg       0.70      0.70      0.70       237
   macro avg       0.69      0.69      0.69       237
weighted avg       0.69      0.70      0.69       237

Logistic Regression performs slightly better than Naive Bayes here, obtaining 69% accuracy but in order to beat k-NN we’ll need a more powerful Python machine learning algorithm.

Support Vector Machines (SVMs)

Figure 9: Python machine learning practitioners will often apply Support Vector Machines (SVMs) to their problems. SVMs are based on the concept of a hyperplane and the perpendicular distance to it as shown in 2-dimensions (the hyperplane concept applies to higher dimensions as well).

Support Vector Machines (SVMs) are extremely powerful machine learning algorithms capable of learning separating hyperplanes on non-linear datasets through the kernel trick.

If a set of data points are not linearly separable in an N-dimensional space we can project them to a higher dimension — and perhaps in this higher dimensional space the data points are linearly separable.

The problem with SVMs is that it can be a pain to tune the knobs on an SVM to get it to work properly, especially for a new Python machine learning practitioner.

When using SVMs it often takes many experiments with your dataset (see the sketch after this list) to determine:

  1. The appropriate kernel type (linear, polynomial, radial basis function, etc.)
  2. Any parameters to the kernel function (ex. degree of the polynomial)
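
One common way to run those experiments is a small grid search over kernels and parameters. Here is a hedged sketch using scikit-learn’s GridSearchCV and the trainX/trainY split from our Iris script (the grid values below are purely illustrative):
# sketch: spot-check SVM kernels and parameters with a grid search
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = [
	{"kernel": ["linear"], "C": [0.1, 1.0, 10.0]},
	{"kernel": ["rbf"], "C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]},
	{"kernel": ["poly"], "C": [1.0], "degree": [2, 3]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(trainX, trainY)
print(search.best_params_, search.best_score_)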

If, at first, your SVM is not obtaining reasonable accuracy you’ll want to go back and tune the kernel and associated parameters — tuning those knobs of the SVM is critical to obtaining a good machine learning model. With that said, let’s apply an SVM to our Iris dataset:

$ python classify_iris.py --model svm
[INFO] loading data...
[INFO] using 'svm' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.92      0.96        12
   virginica       0.92      1.00      0.96        11

   micro avg       0.97      0.97      0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38

Just like Logistic Regression, our SVM obtains 98% accuracy — in order to obtain 100% accuracy on the Iris dataset with an SVM, we would need to further tune the parameters to the kernel.

Let’s apply our SVM to the 3-scenes dataset:

$ python classify_images.py --model svm
[INFO] extracting image features...
[INFO] using 'svm' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.84      0.76      0.80        92
      forest       0.86      0.93      0.89        84
     highway       0.78      0.80      0.79        61

   micro avg       0.83      0.83      0.83       237
   macro avg       0.83      0.83      0.83       237

Wow, 83% accuracy!

That’s the best accuracy we’ve seen thus far!

Clearly, when tuned properly, SVMs lend themselves well to non-linearly separable datasets.

Decision Trees

Figure 10: The concept of Decision Trees for machine learning classification can easily be explained with this figure. Given a feature vector and “set of questions” the bottom leaf represents the class. As you can see we’ll either “Go to the movies” or “Go to the beach”. There are two leaves for “Go to the movies” (nearly all complex decision trees will have multiple paths to arrive at the same conclusion with some shortcutting others).

The basic idea behind a decision tree is to break classification down into a set of choices about each entry in our feature vector.

We start at the root of the tree and then progress down to the leaves where the actual classification is made.

Unlike many machine learning algorithms, which may appear as a “black box” (where the route to the decision can be hard to interpret and understand), decision trees can be quite intuitive — we can actually visualize and interpret the choices the tree is making and then follow the appropriate path to classification.

For example, let’s pretend we are going to the beach for our vacation. We wake up the first morning of our vacation and check the weather report — sunny and 90 degrees Fahrenheit.

That leaves us with a decision to make: “What should we do today? Go to the beach? Or see a movie?”

Subconsciously, we may solve the problem by constructing a decision tree of our own (Figure 10).

First, we need to know if it’s sunny outside.

A quick check of the weather app on our smartphone confirms that it is indeed sunny.

We then follow the Sunny=Yes branch and arrive at the next decision — is it warmer than 70 degrees out?

Again, after checking the weather app we can confirm that it will be > 70 degrees outside today.

Following the >70=Yes branch leads us to a leaf of the tree and the final decision — it looks like we are going to the beach!

Internally, decision trees examine our input data and look for the best possible nodes/values to split on using algorithms such as CART or ID3. The tree is then automatically built for us and we are able to make predictions.
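
Because decision trees are so interpretable, you can even export a fitted tree and inspect its splits by hand. A small sketch, assuming the dataset, trainX, and trainY variables from our Iris script (the output file name is just an example):
# sketch: export a fitted decision tree so its splits can be visualized with Graphviz
from sklearn.tree import DecisionTreeClassifier, export_graphviz

tree = DecisionTreeClassifier()
tree.fit(trainX, trainY)
export_graphviz(tree, out_file="iris_tree.dot",
	feature_names=list(dataset.feature_names),
	class_names=list(dataset.target_names))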

Let’s go ahead and apply the decision tree algorithm to the Iris dataset:

$ python classify_iris.py --model decision_tree
[INFO] loading data...
[INFO] using 'decision_tree' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       0.92      0.92      0.92        12
   virginica       0.91      0.91      0.91        11

   micro avg       0.95      0.95      0.95        38
   macro avg       0.94      0.94      0.94        38
weighted avg       0.95      0.95      0.95        38

Our decision tree is able to obtain 95% accuracy.

What about our image classification project?

$ python classify_images.py --model decision_tree
[INFO] extracting image features...
[INFO] using 'decision_tree' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.71      0.74      0.72        85
      forest       0.76      0.80      0.78        83
     highway       0.77      0.68      0.72        69

   micro avg       0.74      0.74      0.74       237
   macro avg       0.75      0.74      0.74       237
weighted avg       0.74      0.74      0.74       237

Here we obtain 74% accuracy — not the best but certainly not the worst either.

Random Forests

Figure 11: A Random Forest is a collection of decision trees. This machine learning method injects a level of “randomness” into the algorithm via bootstrapping and random node splits. The final classification result is calculated by tabulation/voting. Random Forests tend to be more accurate than decision trees. (source)

Since a forest is a collection of trees, a Random Forest is a collection of decision trees.

However, as the name suggests, Random Forests inject a level of “randomness” that is not present in decision trees — this randomness is applied at two points in the algorithm:

  • Bootstrapping — Random Forest classifiers train each individual decision tree on a bootstrapped sample from the original training data. Essentially, bootstrapping is sampling with replacement a total of D times. Bootstrapping is used to improve the accuracy of our machine learning algorithms while reducing the risk of overfitting.
  • Randomness in node splits — For each decision tree a Random Forest trains, the Random Forest will only give the decision tree a portion of the possible features.

In practice, injecting randomness into the Random Forest classifier by bootstrapping training samples for each tree, followed by only allowing a subset of the features to be used for each tree, typically leads to a more accurate classifier.

At prediction time, each decision tree is queried and then the meta-Random Forest algorithm tabulates the final results.
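
Both sources of randomness map directly onto RandomForestClassifier parameters. A hedged sketch using the trainX/trainY split from earlier (the values shown are illustrative rather than tuned):
# sketch: the two sources of randomness as RandomForestClassifier parameters
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
	n_estimators=100,     # number of decision trees in the forest
	bootstrap=True,       # train each tree on a bootstrapped sample of the data
	max_features="sqrt")  # consider only a random subset of features at each split
rf.fit(trainX, trainY)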

Let’s try our Random Forest on the Iris dataset:

$ python classify_iris.py --model random_forest
[INFO] loading data...
[INFO] using 'random_forest' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.83      0.91        12
   virginica       0.85      1.00      0.92        11

   micro avg       0.95      0.95      0.95        38
   macro avg       0.95      0.94      0.94        38
weighted avg       0.96      0.95      0.95        38

As we can see, our Random Forest obtains 96% accuracy, slightly better than using just a single decision tree.

But what about for image classification?

Do Random Forests work well for our 3-scenes dataset?

$ python classify_images.py --model random_forest
[INFO] extracting image features...
[INFO] using 'random_forest' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.80      0.83      0.81        84
      forest       0.92      0.84      0.88        90
     highway       0.77      0.81      0.79        63

   micro avg       0.83      0.83      0.83       237
   macro avg       0.83      0.83      0.83       237
weighted avg       0.84      0.83      0.83       237

Using a Random Forest we’re able to obtain 84% accuracy, a full 10% better than using just a decision tree.

In general, if you find that decision trees work well for your machine learning and Python project, you may want to try Random Forests as well!

Neural Networks

Figure 12: Neural Networks are machine learning algorithms which are inspired by how the brain works. The Perceptron, a linear model, accepts a set of inputs, computes the weighted sum, and then applies a step function to determine the class label.

One of the most common neural network models is the Perceptron, a linear model used for classification.

A Perceptron accepts a set of inputs, takes the dot product between the inputs and the weights (i.e., a weighted sum), and then applies a step function to determine the output class label.

We typically don’t use the original formulation of Perceptrons as we now have more advanced machine learning and deep learning models. Furthermore, since the advent of the backpropagation algorithm, we can train multi-layer Perceptrons (MLP).

Combined with non-linear activation functions, MLPs can solve non-linearly separable datasets as well.
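
In scikit-learn those ideas translate directly into MLPClassifier parameters. Our models dictionary simply uses the defaults, but here is a hedged sketch that spells the key knobs out explicitly (the layer sizes are illustrative):
# sketch: a Multi-layer Perceptron with explicit hidden layers and a non-linear activation
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu", max_iter=1000)
mlp.fit(trainX, trainY)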

Let’s apply a Multi-layer Perceptron machine learning algorithm to our Iris dataset using Python and scikit-learn:

$ python classify_iris.py --model mlp
[INFO] loading data...
[INFO] using 'mlp' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.92      0.96        12
   virginica       0.92      1.00      0.96        11

   micro avg       0.97      0.97      0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38

Our MLP performs well here, obtaining 98% classification accuracy.

Let’s move on to image classification with an MLP:

$ python classify_images.py --model mlp
[INFO] extracting image features...
[INFO] using 'mlp' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.72      0.91      0.80        86
      forest       0.92      0.89      0.90        79
     highway       0.79      0.58      0.67        72

   micro avg       0.80      0.80      0.80       237
   macro avg       0.81      0.79      0.79       237
weighted avg       0.81      0.80      0.80       237

The MLP reaches 81% accuracy here — quite respectable given the simplicity of the model!

Deep Learning and Deep Neural Networks

Figure 13: Python is arguably the most popular language for Deep Learning, a subfield of machine learning. Deep Learning consists of neural networks with many hidden layers. The process of backpropagation tunes the weights iteratively as data is passed through the network. (source)

If you’re interested in machine learning and Python then you’ve likely encountered the term deep learning as well.

What exactly is deep learning?

And what makes it different than standard machine learning?

Well, to start, it’s first important to understand that deep learning is a subfield of machine learning, which is, in turn, a subfield of the larger Artificial Intelligence (AI) field.

The term “deep learning” comes from training neural networks with many hidden layers.

In fact, in the 1990s it was extremely challenging to train neural networks with more than two hidden layers due to (paraphrasing Geoff Hinton):

  1. Our labeled datasets being too small
  2. Our computers being far too slow
  3. Not being able to properly initialize our neural network weights prior to training
  4. Using the wrong type of nonlinearity function

It’s a different story now. We now have:

  1. Faster computers
  2. Highly optimized hardware (i.e., GPUs)
  3. Large, labeled datasets
  4. A better understanding of weight initialization
  5. Superior activation functions

All of this has culminated at exactly the right time to give rise to the latest incarnation of deep learning.

And chances are, if you’re reading this tutorial on machine learning then you’re most likely interested in deep learning as well!

To gain some experience with neural networks, let’s implement one using Python and Keras.

Open up the nn_iris.py file and insert the following code:
# import the necessary packages
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.datasets import load_iris

# load the Iris dataset and perform a training and testing split,
# using 75% of the data for training and 25% for evaluation
print("[INFO] loading data...")
dataset = load_iris()
(trainX, testX, trainY, testY) = train_test_split(dataset.data,
	dataset.target, test_size=0.25)

# encode the labels as 1-hot vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

Let’s import our packages.

Our Keras imports are for creating and training our simple neural network (Lines 2-4). You should recognize the scikit-learn imports by this point (Lines 5-8).

We’ll go ahead and load + split our data and one-hot encode our labels on Lines 13-20. A one-hot encoded vector consists of binary elements where one of them is “hot”, such as [0, 0, 1] or [1, 0, 0] in the case of our three flower classes.

Now let’s build our neural network:

# define the 4-3-3-3 architecture using Keras
model = Sequential()
model.add(Dense(3, input_shape=(4,), activation="sigmoid"))
model.add(Dense(3, activation="sigmoid"))
model.add(Dense(3, activation="softmax"))

Our neural network consists of two fully connected hidden layers, each using a sigmoid activation.

The final layer is a “softmax classifier,” which essentially means that it has one output for each of our classes and the outputs are probabilities.
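
If you’re curious what the softmax actually does, it simply exponentiates the raw outputs and normalizes them so they sum to 1. A quick NumPy sketch:
# sketch: softmax turns raw scores into probabilities that sum to 1
import numpy as np

def softmax(z):
	e = np.exp(z - np.max(z))  # subtract the max for numerical stability
	return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # -> approximately [0.66, 0.24, 0.10]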

Let’s go ahead and train and evaluate our model:
# train the model using SGD
print("[INFO] training network...")
opt = SGD(lr=0.1, momentum=0.9, decay=0.1 / 250)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
H = model.fit(trainX, trainY, validation_data=(testX, testY),
	epochs=250, batch_size=16)

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=16)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=dataset.target_names))

Our model is compiled on Lines 30-32 and then the training is initiated on Lines 33 and 34.

Just as with our previous two scripts, we’ll want to check on the performance by evaluating our network. This is accomplished by making predictions on our testing data and then printing a classification report (Lines 38-40).

There’s a lot going on under the hood in these short 40 lines of code. For an in-depth walkthrough of neural network fundamentals, please refer to the Starter Bundle of Deep Learning for Computer Vision with Python or the PyImageSearch Gurus course.

We’re down to the moment of truth — how will our neural network perform on the Iris dataset?

$ python nn_iris.py 
Using TensorFlow backend.
[INFO] loading data...
[INFO] training network...
Train on 112 samples, validate on 38 samples
Epoch 1/250
2019-01-04 10:28:19.104933: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
112/112 [==============================] - 0s 2ms/step - loss: 1.1454 - acc: 0.3214 - val_loss: 1.1867 - val_acc: 0.2368
Epoch 2/250
112/112 [==============================] - 0s 48us/step - loss: 1.0828 - acc: 0.3929 - val_loss: 1.2132 - val_acc: 0.5000
Epoch 3/250
112/112 [==============================] - 0s 47us/step - loss: 1.0491 - acc: 0.5268 - val_loss: 1.0593 - val_acc: 0.4737
...
Epoch 248/250
112/112 [==============================] - 0s 46us/step - loss: 0.1319 - acc: 0.9554 - val_loss: 0.0407 - val_acc: 1.0000
Epoch 249/250
112/112 [==============================] - 0s 46us/step - loss: 0.1024 - acc: 0.9643 - val_loss: 0.1595 - val_acc: 0.8947
Epoch 250/250
112/112 [==============================] - 0s 47us/step - loss: 0.0795 - acc: 0.9821 - val_loss: 0.0335 - val_acc: 1.0000
[INFO] evaluating network...
             precision    recall  f1-score   support

     setosa       1.00      1.00      1.00         9
 versicolor       1.00      1.00      1.00        10
  virginica       1.00      1.00      1.00        19

avg / total       1.00      1.00      1.00        38

Wow, perfect! We hit 100% accuracy!

This neural network is the first Python machine learning algorithm we’ve applied that’s been able to hit 100% accuracy on the Iris dataset.

The reason our neural network performed well here is because we leveraged:

  1. Multiple hidden layers
  2. Non-linear activation functions (i.e., the sigmoid activation function)

Given that our neural network performed so well on the Iris dataset we should assume similar accuracy on the image dataset as well, right? Well, we actually have a trick up our sleeve — to obtain even higher accuracy on image datasets we can use a special type of neural network called a Convolutional Neural Network.

Convolutional Neural Networks

Figure 14: Deep learning Convolutional Neural Networks (CNNs) operate directly on the pixel intensities of an input image alleviating the need to perform feature extraction. Layers of the CNN are stacked and patterns are learned automatically. (source)

Convolutional Neural Networks, or CNNs for short, are special types of neural networks that lend themselves well to image understanding tasks. Unlike most machine learning algorithms, CNNs operate directly on the pixel intensities of our input image — no need to perform feature extraction!

Internally, each convolution layer in a CNN is learning a set of filters. These filters are convolved with our input images and patterns are automatically learned. We can also stack these convolution operations just like any other layer in a neural network.

Let’s go ahead and learn how to implement a simple CNN and apply it to basic image classification.

Open up the basic_cnn.py script and insert the following code:
# import the necessary packages
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.optimizers import Adam
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from PIL import Image
from imutils import paths
import numpy as np
import argparse
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, default="3scenes",
	help="path to directory containing the '3scenes' dataset")
args = vars(ap.parse_args())

In order to build a Convolutional Neural Network for machine learning with Python and Keras, we’ll need five additional Keras imports on Lines 2-8.

This time, we’re importing convolutional layer types, max pooling operations, different activation functions, and the ability to flatten. Additionally, we’re using the Adam optimizer rather than SGD as we did in the previous simple neural network script.

You should be acquainted with the names of the scikit-learn and other imports by this point.

This script has a single command line argument, --dataset. It represents the path to the 3-scenes directory on disk again.

Let’s load the data now:

# grab all image paths in the input dataset directory, then initialize
# our list of images and corresponding class labels
print("[INFO] loading images...")
imagePaths = paths.list_images(args["dataset"])
data = []
labels = []

# loop over our input images
for imagePath in imagePaths:
	# load the input image from disk, resize it to 32x32 pixels, scale
	# the pixel intensities to the range [0, 1], and then update our
	# images list
	image = Image.open(imagePath)
	image = np.array(image.resize((32, 32))) / 255.0
	data.append(image)

	# extract the class label from the file path and update the
	# labels list
	label = imagePath.split(os.path.sep)[-2]
	labels.append(label)

Similar to our classify_images.py script, we’ll go ahead and grab our imagePaths and build our data and labels lists.

There’s one caveat this time which you should not overlook:

We’re operating on the raw pixels themselves rather than a color statistics feature vector. Take the time to review classify_images.py once more and compare it to the lines of basic_cnn.py.

In order to operate on the raw pixel intensities, we go ahead and resize each image to 32×32 and scale it to the range [0, 1] by dividing by 255.0 (the maximum value of a pixel) on Lines 36 and 37. Then we add the resized and scaled image to the data list (Line 38).

Let’s one-hot encode our labels and split our training/testing data:

# encode the labels, converting them from strings to integers
lb = LabelBinarizer()
labels = lb.fit_transform(labels)

# perform a training and testing split, using 75% of the data for
# training and 25% for evaluation
(trainX, testX, trainY, testY) = train_test_split(np.array(data),
	np.array(labels), test_size=0.25)

And then build our image classification CNN with Keras:

# define our Convolutional Neural Network architecture
model = Sequential()
model.add(Conv2D(8, (3, 3), padding="same", input_shape=(32, 32, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(16, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(32, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(3))
model.add(Activation("softmax"))

Lines 55-67 demonstrate an elementary CNN architecture: three blocks of convolution, ReLU activation, and max pooling, followed by a flatten operation and a fully connected softmax classifier. The specifics aren’t important right now; the key point is that the network operates on raw pixels and learns its own filters.

Let’s go ahead and train + evaluate our CNN model:

# train the model using the Adam optimizer
print("[INFO] training network...")
opt = Adam(lr=1e-3, decay=1e-3 / 50)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
H = model.fit(trainX, trainY, validation_data=(testX, testY),
	epochs=50, batch_size=32)

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=lb.classes_))

Our model is trained and evaluated similarly to our previous script.

Let’s give our CNN a try, shall we?

$ python basic_cnn.py 
Using TensorFlow backend.
[INFO] loading images...
[INFO] training network...
Train on 711 samples, validate on 237 samples
Epoch 1/50
711/711 [==============================] - 0s 629us/step - loss: 1.0647 - acc: 0.4726 - val_loss: 0.9920 - val_acc: 0.5359
Epoch 2/50
711/711 [==============================] - 0s 313us/step - loss: 0.9200 - acc: 0.6188 - val_loss: 0.7778 - val_acc: 0.6624
Epoch 3/50
711/711 [==============================] - 0s 308us/step - loss: 0.6775 - acc: 0.7229 - val_loss: 0.5310 - val_acc: 0.7553
...
Epoch 48/50
711/711 [==============================] - 0s 307us/step - loss: 0.0627 - acc: 0.9887 - val_loss: 0.2426 - val_acc: 0.9283
Epoch 49/50
711/711 [==============================] - 0s 310us/step - loss: 0.0608 - acc: 0.9873 - val_loss: 0.2236 - val_acc: 0.9325
Epoch 50/50
711/711 [==============================] - 0s 307us/step - loss: 0.0587 - acc: 0.9887 - val_loss: 0.2525 - val_acc: 0.9114
[INFO] evaluating network...
             precision    recall  f1-score   support

      coast       0.85      0.96      0.90        85
     forest       0.99      0.94      0.97        88
    highway       0.91      0.80      0.85        64

avg / total       0.92      0.91      0.91       237

Using machine learning and our CNN we are able to obtain 92% accuracy, far better than any of the previous machine learning algorithms we’ve tried in this tutorial!

Clearly, CNNs lend themselves very well to image understanding problems.

What do our Python + Machine Learning results mean?

On the surface, you may be tempted to look at the results of this post and draw conclusions such as:

  • “Logistic Regression performed poorly on image classification, I should never use Logistic Regression.”
  • “k-NN did fairly well at image classification, I’ll always use k-NN!”

Be careful with those types of conclusions and keep in mind the 5-step machine learning process I detailed earlier in this post:

  1. Examine your problem
  2. Prepare your data (raw data, feature extraction, feature engineering, etc.)
  3. Spot-check a set of algorithms
  4. Examine your results
  5. Double-down on the algorithms that worked best

Each and every problem you encounter is going to be different in some manner.

Over time, and through lots of hands-on practice and experience, you will gain a “sixth sense” as to what machine learning algorithms will work well in a given situation.

However, until you reach that point you need to start by applying various machine learning algorithms, examining what works, and re-doubling your efforts on the algorithms that showed potential.

No two problems will be the same and, in some situations, a machine learning algorithm you once thought was “poor” will actually end up performing quite well!

Here’s how you can learn Machine Learning in Python

If you’ve made it this far in the tutorial, congratulate yourself!

It’s okay if you didn’t understand everything. That’s totally normal.

The goal of today’s post is to expose you to the world of machine learning and Python.

It’s also okay if you don’t have an intimate understanding of the machine learning algorithms covered today.

I’m a huge champion of “learning by doing” — rolling up your sleeves and doing hard work.

One of the best possible ways you can be successful in machine learning with Python is just to simply get started.

You don’t need a college degree in computer science or mathematics.

Sure, a degree like that can help at times but once you get deep into the machine learning field you’ll realize just how many people aren’t computer science/mathematics graduates.

They are ordinary people just like yourself who got their start in machine learning by installing a few Python packages, opening a text editor, and writing a few lines of code.

Ready to continue your education in machine learning, deep learning, and computer vision?

If so, click here to join the PyImageSearch Newsletter.

As a bonus, I’ll send you my FREE 17-page Computer Vision and OpenCV Resource Guide PDF.

Inside the guide, you’ll find my hand-picked tutorials, books, and courses to help you continue your machine learning education.

Sound good?

Just click the button below to get started!


Summary

In this tutorial, you learned how to get started with machine learning and Python.

Specifically, you learned how to train a total of nine different machine learning algorithms:

  1. k-Nearest Neighbors (k-NN)
  2. Naive Bayes
  3. Logistic Regression
  4. Support Vector Machines (SVMs)
  5. Decision Trees
  6. Random Forests
  7. Perceptrons
  8. Multi-layer, feedforward neural networks
  9. Convolutional Neural Networks

We then applied our set of machine learning algorithms to two different domains:

  1. Numerical data classification via the Iris dataset
  2. Image classification via the 3-scenes dataset

I would recommend you use the Python code and associated machine learning algorithms in this tutorial as a starting point for your own projects.

Finally, keep in mind our five-step process of approaching a machine learning problem with Python (you may even want to print out these steps and keep them next to you):

  1. Examine your problem
  2. Prepare your data (raw data, feature extraction, feature engineering, etc.)
  3. Spot-check a set of algorithms
  4. Examine your results
  5. Double-down on the algorithms that worked best

By using the code in today’s post you will be able to get your start in machine learning with Python — enjoy it and if you want to continue your machine learning journey, be sure to check out the PyImageSearch Gurus course, as well as my book, Deep Learning for Computer Vision with Python, where I cover machine learning, deep learning, and computer vision in detail.

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Machine Learning in Python appeared first on PyImageSearch.


Regression with Keras


In this tutorial, you will learn how to perform regression using Keras and Deep Learning. You will learn how to train a Keras neural network for regression and continuous value prediction, specifically in the context of house price prediction.

Today’s post kicks off a 3-part series on deep learning, regression, and continuous value prediction.

We’ll be studying Keras regression prediction in the context of house price prediction:

  • Part 1: Today we’ll be training a Keras neural network to predict house prices based on categorical and numerical attributes such as the number of bedrooms/bathrooms, square footage, zip code, etc.
  • Part 2: Next week we’ll train a Keras Convolutional Neural Network to predict house prices based on input images of the houses themselves (i.e., frontal view of the house, bedroom, bathroom, and kitchen).
  • Part 3: In two weeks we’ll define and train a neural network that combines our categorical/numerical attributes with our images, leading to better, more accurate house price prediction than the attributes or images alone.

Unlike classification (which predicts labels), regression enables us to predict continuous values.

For example, classification may be able to predict one of the following values: {cheap, affordable, expensive}.

Regression, on the other hand, will be able to predict an exact dollar amount, such as “The estimated price of this house is $489,121”.

In many real-world situations, such as house price prediction or stock market forecasting, applying regression rather than classification is critical to obtaining good predictions.

To learn how to perform regression with Keras, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Regression with Keras

In the first part of this tutorial, we’ll briefly discuss the difference between classification and regression.

We’ll then explore the house prices dataset we’re using for this series of Keras regression tutorials.

From there, we’ll configure our development environment and review our project structure.

Along the way, we will learn how to use Pandas to load our house price dataset and define a neural network for Keras regression prediction.

Finally, we’ll train our Keras network and then evaluate the regression results.

Classification vs. Regression

Figure 1: Classification networks predict labels (top). In contrast, regression networks can predict numerical values (bottom). We’ll be performing regression with Keras on a housing dataset in this blog post.

Typically on the PyImageSearch blog, we discuss Keras and deep learning in the context of classification — predicting a label to characterize the contents of an image or an input set of data.

Regression, on the other hand, enables us to predict continuous values. Let’s again consider the task of house price prediction.

As we know, classification is used to predict a class label.

For house price prediction we may define our categorical labels as:

labels = {very cheap, cheap, affordable, expensive, very expensive}

If we performed classification, our model could then learn to predict one of those five values based on a set of input features.

However, those labels are just that — categories that represent a potential range of prices for the house but do nothing to represent the actual cost of the home.

In order to predict the actual cost of a home, we need to perform regression.

Using regression we can train a model to predict a continuous value.

For example, while classification may only be able to predict a label, regression could say:

“Based on my input data, I estimate the cost of this house to be $781,993.”

Figure 1 above provides a visualization of performing both classification and regression.

In the rest of this tutorial, you’ll learn how to train a neural network for regression using Keras.

The House Prices Dataset

Figure 2: Performing regression with Keras on the house pricing dataset (Ahmed and Moustafa) will ultimately allow us to predict the price of a house given its image.

The dataset we’ll be using today is from the 2016 paper, House price estimation from visual and textual features, by Ahmed and Moustafa.

The dataset includes both numerical/categorical attributes along with images for 535 data points, making it an excellent dataset to study for regression and mixed data prediction.

The house dataset includes four numerical and categorical attributes:

  1. Number of bedrooms
  2. Number of bathrooms
  3. Area (i.e., square footage)
  4. Zip code

These attributes are stored on disk in CSV format.

We’ll be loading these attributes from disk later in this tutorial using pandas, a popular Python package used for data analysis.

A total of four images are also provided for each house:

  1. Bedroom
  2. Bathroom
  3. Kitchen
  4. Frontal view of the house

The end goal of the houses dataset is to predict the price of the home itself.

In today’s tutorial, we’ll be working with just the numerical and categorical data.

Next week’s blog post will discuss working with the image data.

And finally, two weeks from now we’ll combine the numerical/categorical data with the images to obtain our best performing model.

But before we can train our Keras model for regression, we first need to configure our development environment and grab the data.

Configuring Your Development Environment

Figure 3: To perform regression with Keras, we’ll be taking advantage of several popular Python libraries including Keras + TensorFlow, scikit-learn, and pandas.

For this 3-part series of blog posts, you’ll need to have the following packages installed:

  • NumPy
  • scikit-learn
  • pandas
  • Keras with the TensorFlow backend (CPU or GPU)
  • OpenCV (for the next two blog posts in the series)

Luckily most of these are easily installed with pip, a Python package manager.

Let’s install the packages now, ideally into a virtual environment as shown (you’ll need to create the environment):

$ workon house_prices
$ pip install numpy
$ pip install scikit-learn
$ pip install pandas
$ pip install tensorflow # or tensorflow-gpu

Notice that I haven’t instructed you to install OpenCV yet. The OpenCV install can be slightly involved — especially if you are compiling from source. Let’s look at our options:

  1. Compiling from source gives us the full install of OpenCV and provides access to optimizations, patented algorithms, custom software integrations, and more. The good news is that all of my OpenCV install tutorials are meticulously put together and updated regularly. With patience and attention to detail, you can compile from source just like I and many of my readers do.
  2. Using pip to install OpenCV is hands-down the fastest and easiest way to get started with OpenCV and essentially just checks prerequisites and places a precompiled binary that will work on most systems into your virtual environment site-packages. Optimizations may or may not be active. The big caveat is that the maintainer has elected not to include patented algorithms for fear of lawsuits. There’s nothing wrong with using patented algorithms for educational and research purposes, but you should use alternative algorithms commercially. Nevertheless, the pip method is a great option for beginners; just remember that you don’t have the full install.

Pip is sufficient for this 3-part series of blog posts. You can install OpenCV in your environment via:

$ workon house_prices
$ pip install opencv-contrib-python

Please reach out to me if you have any difficulties getting your environment established.

Downloading the House Prices Dataset

Before you download the dataset, go ahead and grab the source code to this post by using the “Downloads” section.

From there, unzip the file and navigate into the directory:

$ cd path/to/downloaded/zip
$ unzip keras-regression.zip
$ cd keras-regression

From there, you can download the House Prices Dataset using the following command:

$ git clone https://github.com/emanhamed/Houses-dataset

When we are ready to train our Keras regression network you’ll then need to supply the path to the Houses-dataset directory via command line argument.

Project structure

Now that you have the dataset, go ahead and use the tree command with the same arguments shown below to print a directory + file listing for the project:
$ tree --dirsfirst --filelimit 10
.
├── Houses-dataset
│   ├── Houses Dataset [2141 entries]
│   └── README.md
├── pyimagesearch
│   ├── __init__.py
│   ├── datasets.py
│   └── models.py
└── mlp_regression.py

3 directories, 5 files

The dataset downloaded from GitHub now resides in the Houses-dataset/ folder.

The pyimagesearch/ directory is actually a module included with the code “Downloads” where, inside, you’ll find:
  • datasets.py: Our script for loading the numerical/categorical data from the dataset
  • models.py: Our Multi-Layer Perceptron architecture implementation

These two scripts will be reviewed today. Additionally, we’ll be reusing both datasets.py and models.py (with modifications) in the next two tutorials to keep our code organized and reusable.

The regression + Keras script is contained in mlp_regression.py, which we’ll be reviewing as well.

Loading the House Prices Dataset

Figure 4: We’ll use Python and pandas to read a CSV file in this blog post.

Before we can train our Keras regression model we first need to load the numerical and categorical data for the houses dataset.

Open up the datasets.py file and insert the following code:
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import numpy as np
import glob
import cv2
import os

def load_house_attributes(inputPath):
	# initialize the list of column names in the CSV file and then
	# load it using Pandas
	cols = ["bedrooms", "bathrooms", "area", "zipcode", "price"]
	df = pd.read_csv(inputPath, sep=" ", header=None, names=cols)

We begin by importing libraries and modules from scikit-learn, pandas, NumPy and OpenCV. OpenCV will be used next week as we’ll be adding the ability to load images to this script.

On Line 10, we define the load_house_attributes function which accepts the path to the input dataset.

Inside the function we start off by defining the names of the columns in the CSV file (Line 13). From there, we use pandas’ read_csv function to load the CSV file into memory as a data frame (df) on Line 14.

Below you can see an example of our input data, including the number of bedrooms, number of bathrooms, area (i.e., square footage), zip code, and finally the target price our model should be trained to predict:

bedrooms  bathrooms  area  zipcode     price
0         4        4.0  4053    85255  869500.0
1         4        3.0  3343    36372  865200.0
2         3        4.0  3923    85266  889000.0
3         5        5.0  4022    85262  910000.0
4         3        4.0  4116    85266  971226.0

Let’s finish up the rest of the load_house_attributes function:
	# determine (1) the unique zip codes and (2) the number of data
	# points with each zip code
	zipcodes = df["zipcode"].value_counts().keys().tolist()
	counts = df["zipcode"].value_counts().tolist()

	# loop over each of the unique zip codes and their corresponding
	# count
	for (zipcode, count) in zip(zipcodes, counts):
		# the zip code counts for our housing dataset is *extremely*
		# unbalanced (some only having 1 or 2 houses per zip code)
		# so let's sanitize our data by removing any houses with less
		# than 25 houses per zip code
		if count < 25:
			idxs = df[df["zipcode"] == zipcode].index
			df.drop(idxs, inplace=True)

	# return the data frame
	return df

In the remaining lines, we:

  • Determine the unique set of zip codes and then count the number of data points with each unique zip code (Lines 18 and 19).
  • Filter out zip codes with low counts (Line 28). For some zip codes we only have one or two data points, making it extremely challenging, if not impossible, to obtain accurate house price estimates.
  • Return the data frame to the calling function (Line 33).

Now let’s create the process_house_attributes function used to preprocess our data:
def process_house_attributes(df, train, test):
	# initialize the column names of the continuous data
	continuous = ["bedrooms", "bathrooms", "area"]

	# perform min-max scaling on each continuous feature column to
	# the range [0, 1]
	cs = MinMaxScaler()
	trainContinuous = cs.fit_transform(train[continuous])
	testContinuous = cs.transform(test[continuous])

We define the function on Line 35. The process_house_attributes function accepts three parameters:
  • df: Our data frame generated by pandas (the previous function helps us to drop some records from the data frame)
  • train: Our training data for the House Prices Dataset
  • test: Our testing data.

Then on Line 37, we define the columns of our continuous data, including bedrooms, bathrooms, and size of the home.

We’ll take these values and use scikit-learn’s MinMaxScaler to scale the continuous features to the range [0, 1] (Lines 41-43).

Now we need to pre-process our categorical features, namely the zip code:

	# one-hot encode the zip code categorical data (by definition of
	# one-hot encoding, all output features are now in the range [0, 1])
	zipBinarizer = LabelBinarizer().fit(df["zipcode"])
	trainCategorical = zipBinarizer.transform(train["zipcode"])
	testCategorical = zipBinarizer.transform(test["zipcode"])

	# construct our training and testing data points by concatenating
	# the categorical features with the continuous features
	trainX = np.hstack([trainCategorical, trainContinuous])
	testX = np.hstack([testCategorical, testContinuous])

	# return the concatenated training and testing data
	return (trainX, testX)

First, we’ll one-hot encode the zip codes (Lines 47-49).

Then we’ll concatenate the categorical features with the continuous features using NumPy’s hstack function (Lines 53 and 54), returning the resulting training and testing sets as a tuple (Line 57).

Keep in mind that now both our categorical features and continuous features are all in the range [0, 1].
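
If you’re curious what the LabelBinarizer actually produces for the zip codes, here is a tiny, self-contained sketch using made-up zip codes (the values below are hypothetical and purely for illustration):

from sklearn.preprocessing import LabelBinarizer

# toy example with made-up zip codes: each zip code becomes its own
# binary column, so every encoded feature is already either 0 or 1
zipBinarizer = LabelBinarizer().fit([85255, 85266, 36372])
print(zipBinarizer.classes_)            # [36372 85255 85266]
print(zipBinarizer.transform([85266]))  # [[0 0 1]]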

Implementing a Neural Network for Regression

Figure 5: Our Keras regression architecture. The input to the network is a datapoint including a home’s # Bedrooms, # Bathrooms, Area/square footage, and zip code. The output of the network is a single neuron with a linear activation function. Linear activation allows the neuron to output the predicted price of the home.

Before we can train a Keras network for regression, we first need to define the architecture itself.

Today we’ll be using a simple Multilayer Perceptron (MLP) as shown in Figure 5.

Open up the models.py file and insert the following code:
# import the necessary packages
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras.layers import Flatten
from keras.layers import Input
from keras.models import Model

def create_mlp(dim, regress=False):
	# define our MLP network
	model = Sequential()
	model.add(Dense(8, input_dim=dim, activation="relu"))
	model.add(Dense(4, activation="relu"))

	# check to see if the regression node should be added
	if regress:
		model.add(Dense(1, activation="linear"))

	# return our model
	return model

First, we’ll import all of the necessary modules from Keras (Lines 2-11). We’ll be adding a Convolutional Neural Network to this file in next week’s tutorial, hence the additional imports that aren’t utilized here today.

Let’s define the MLP architecture by writing a function to generate it called create_mlp.

The function accepts two parameters:

  • dim: Defines our input dimensions
  • regress: A boolean defining whether or not our regression neuron should be added

We’ll go ahead and start constructing our MLP with a dim-8-4 architecture (Lines 15-17).

If we are performing regression, we add a Dense layer containing a single neuron with a linear activation function (Lines 20 and 21). Typically we use ReLU-based activations, but since we are performing regression we need a linear activation.

Finally, our model is returned on Line 24.

Implementing our Keras Regression Script

It’s now time to put all the pieces together!

Open up the mlp_regression.py file and insert the following code:
# import the necessary packages
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from pyimagesearch import datasets
from pyimagesearch import models
import numpy as np
import argparse
import locale
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, required=True,
	help="path to input dataset of house images")
args = vars(ap.parse_args())

We begin by importing necessary packages, modules, and libraries.

Namely, we’ll need the Adam optimizer from Keras, train_test_split from scikit-learn, and our datasets + models functions from the pyimagesearch module.

Additionally, we’ll use NumPy to compute statistics when we evaluate our model.

The argparse module is for parsing command line arguments.

Our script requires just one command line argument, --dataset (Lines 12-15). You’ll need to provide the --dataset switch and the actual path to the dataset when you go to run the training script in your terminal.

Let’s load the house dataset attributes and construct our training and testing splits:

# construct the path to the input .txt file that contains information
# on each house in the dataset and then load the dataset
print("[INFO] loading house attributes...")
inputPath = os.path.sep.join([args["dataset"], "HousesInfo.txt"])
df = datasets.load_house_attributes(inputPath)

# construct a training and testing split with 75% of the data used
# for training and the remaining 25% for evaluation
print("[INFO] constructing training/testing split...")
(train, test) = train_test_split(df, test_size=0.25, random_state=42)

Using our handy load_house_attributes function, and by passing the inputPath to the dataset itself, our data is loaded into memory (Lines 20 and 21).

Our training (75%) and testing (25%) data is constructed via Line 26 and scikit-learn’s train_test_split method.

Let’s scale our house pricing data:

# find the largest house price in the training set and use it to
# scale our house prices to the range [0, 1] (this will lead to
# better training and convergence)
maxPrice = train["price"].max()
trainY = train["price"] / maxPrice
testY = test["price"] / maxPrice

As stated in the comment, scaling our house prices to the range [0, 1] will allow our model to more easily train and converge. Scaling the output targets to [0, 1] will reduce the range of our output predictions (versus [0, maxPrice]) and make it not only easier and faster to train our network but enable our model to obtain better results as well.

Thus, we grab the maximum price in the training set (Line 31), and proceed to scale our training and testing data accordingly (Lines 32 and 33).

Let’s process the house attributes now:

# process the house attributes data by performing min-max scaling
# on continuous features, one-hot encoding on categorical features,
# and then finally concatenating them together
print("[INFO] processing data...")
(trainX, testX) = datasets.process_house_attributes(df, train, test)

Recall from the datasets.py script that the process_house_attributes function:
  • Pre-processes our categorical and continuous features.
  • Scales our continuous features to the range [0, 1] via min-max scaling.
  • One-hot encodes our categorical features.
  • Concatenates the categorical and continuous features to form the final feature vector.

Now let’s go ahead and fit our MLP model to the data:

# create our MLP and then compile the model using mean absolute
# percentage error as our loss, implying that we seek to minimize
# the absolute percentage difference between our price *predictions*
# and the *actual prices*
model = models.create_mlp(trainX.shape[1], regress=True)
opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss="mean_absolute_percentage_error", optimizer=opt)

# train the model
print("[INFO] training model...")
model.fit(trainX, trainY, validation_data=(testX, testY),
	epochs=200, batch_size=8)

Our model is initialized with the Adam optimizer (Lines 45 and 46) and then compiled (Line 47). Notice that we’re using mean absolute percentage error as our loss function, indicating that we seek to minimize the mean percentage difference between the predicted price and the actual price.

The actual training process is kicked off on Lines 51 and 52.

After training is complete we can evaluate our model and summarize our results:

# make predictions on the testing data
print("[INFO] predicting house prices...")
preds = model.predict(testX)

# compute the difference between the *predicted* house prices and the
# *actual* house prices, then compute the percentage difference and
# the absolute percentage difference
diff = preds.flatten() - testY
percentDiff = (diff / testY) * 100
absPercentDiff = np.abs(percentDiff)

# compute the mean and standard deviation of the absolute percentage
# difference
mean = np.mean(absPercentDiff)
std = np.std(absPercentDiff)

# finally, show some statistics on our model
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
print("[INFO] avg. house price: {}, std house price: {}".format(
	locale.currency(df["price"].mean(), grouping=True),
	locale.currency(df["price"].std(), grouping=True)))
print("[INFO] mean: {:.2f}%, std: {:.2f}%".format(mean, std))

Line 56 instructs Keras to make predictions on our testing set.

Using the predictions, we compute the:

  1. Difference between predicted house prices and the actual house prices (Line 61).
  2. Percentage difference (Line 62).
  3. Absolute percentage difference (Line 63).

From there, on Lines 67 and 68, we calculate the mean and standard deviation of the absolute percentage difference.

The results are printed via Lines 72-75.
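
Note that preds (and testY) are still in the scaled [0, 1] range. If you wanted to report an actual dollar figure for a given test house, a minimal sketch (reusing the preds, maxPrice, and locale setup from the script above) might look like this:

# hypothetical follow-up: convert the first scaled prediction back to
# dollars by multiplying by the training set's maximum price
firstPrediction = preds.flatten()[0] * maxPrice
print(locale.currency(firstPrediction, grouping=True))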

Regression with Keras wasn’t so tough, now was it?

Let’s train the model and analyze the results!

Keras Regression Results

Figure 6: For today’s blog post, our Keras regression model takes four numerical inputs, producing one numerical output: the predicted value of a home.

To train our own Keras network for regression and house price prediction make sure you have:

  1. Configured your development environment according to the guidance above.
  2. Used the “Downloads” section of this tutorial to download the source code.
  3. Downloaded the house prices dataset based on the instructions in the “The House Prices Dataset” section above.

From there, open up a terminal and supply the following command (making sure the --dataset command line argument points to where you downloaded the house prices dataset):
$ python mlp_regression.py --dataset Houses-dataset/Houses\ Dataset/
[INFO] loading house attributes...
[INFO] constructing training/testing split...
[INFO] processing data...
[INFO] training model...
Train on 271 samples, validate on 91 samples
Epoch 1/200
271/271 [==============================] - 0s 680us/step - loss: 84.0388 - val_loss: 61.7484
Epoch 2/200
271/271 [==============================] - 0s 110us/step - loss: 49.6822 - val_loss: 50.4747
Epoch 3/200
271/271 [==============================] - 0s 112us/step - loss: 42.8826 - val_loss: 43.5433
Epoch 4/200
271/271 [==============================] - 0s 112us/step - loss: 38.8050 - val_loss: 40.4323
Epoch 5/200
271/271 [==============================] - 0s 112us/step - loss: 36.4507 - val_loss: 37.1915
Epoch 6/200
271/271 [==============================] - 0s 112us/step - loss: 34.3506 - val_loss: 35.5639
Epoch 7/200
271/271 [==============================] - 0s 111us/step - loss: 33.2662 - val_loss: 37.5819
Epoch 8/200
271/271 [==============================] - 0s 108us/step - loss: 32.8633 - val_loss: 30.9948
Epoch 9/200
271/271 [==============================] - 0s 110us/step - loss: 30.4942 - val_loss: 30.6644
Epoch 10/200
271/271 [==============================] - 0s 107us/step - loss: 28.9909 - val_loss: 28.8961
...
Epoch 195/200
271/271 [==============================] - 0s 111us/step - loss: 20.8431 - val_loss: 21.4466
Epoch 196/200
271/271 [==============================] - 0s 109us/step - loss: 22.2301 - val_loss: 21.8503
Epoch 197/200
271/271 [==============================] - 0s 112us/step - loss: 20.5079 - val_loss: 21.5884
Epoch 198/200
271/271 [==============================] - 0s 108us/step - loss: 21.0525 - val_loss: 21.5993
Epoch 199/200
271/271 [==============================] - 0s 112us/step - loss: 20.4717 - val_loss: 23.7256
Epoch 200/200
271/271 [==============================] - 0s 107us/step - loss: 21.7630 - val_loss: 26.0129
[INFO] predicting house prices...
[INFO] avg. house price: $533,388.27, std house price: $493,403.08
[INFO] mean: 26.01%, std: 18.11%

As you can see from our output, our initial mean absolute percentage error starts off as high as 84% and then quickly drops to under 30%.

By the time we finish training we can see our network starting to overfit a bit. Our training loss is as low as ~21%; however, our validation loss is at ~26%.

Computing our final mean absolute percentage error we obtain a final value of 26.01%.

What does this value mean?

Our final mean absolute percentage error implies that, on average, our network will be ~26% off in its house price predictions, with a standard deviation of ~18%.
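
To make that metric concrete, here is a tiny, self-contained example of how the absolute percentage error is computed for a single, hypothetical house:

import numpy as np

# hypothetical example: a $500,000 house predicted at $370,000
actual = 500000.0
predicted = 370000.0
absPercentError = np.abs((predicted - actual) / actual) * 100
print(absPercentError)  # 26.0 -- the prediction is 26% off the true price

Averaging that value over every house in the testing set gives the 26.01% figure reported above.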

Limitations of the House Price Dataset

Being 26% off in a house price prediction is a good start but is certainly not the type of accuracy we are looking for.

That said, this prediction accuracy can also be seen as a limitation of the house price dataset itself.

Keep in mind that the dataset only includes four attributes:

  1. Number of bedrooms
  2. Number of bathrooms
  3. Area (i.e., square footage)
  4. Zip code

Most other house price datasets include many more attributes.

For example, the Boston House Prices Dataset includes a total of fourteen attributes which can be leveraged for house price prediction (although that dataset does contain a racially discriminatory attribute).

The Ames House Dataset includes over 79 different attributes which can be used to train regression models.

When you think about it, the fact that we are able to even obtain 26% mean absolute percentage error without the knowledge of an expert real estate agent is fairly reasonable given:

  1. There are only 535 total houses in the dataset (we only used 362 total houses for the purpose of this guide).
  2. We only have four attributes to train our regression model on.
  3. The attributes themselves, while important in describing the home itself, do little to characterize the area surrounding the house.
  4. The house prices are incredibly varied with a mean of $533K and a standard deviation of $493K (based on our filtered dataset of 362 homes).

With all that said, learning how to perform regression with Keras is an important skill!

In the next two posts in this series I’ll be showing you how to:

  1. Leverage the images provided with the house price dataset to train a CNN on them.
  2. Combine our numerical/categorical data with the house images, leading to a model that outperforms all of our previous Keras regression experiments.

Summary

In this tutorial, you learned how to use the Keras deep learning library for regression.

Specifically, we used Keras and regression to predict the price of houses based on four numerical and categorical attributes:

  • Number of bedrooms
  • Number of bathrooms
  • Area (i.e., square footage)
  • Zip code

Overall our neural network obtained a mean absolute percentage error of 26.01%, implying that, on average, our house price predictions will be off by 26.01%.

That raises the questions:

  • How can we better our house price prediction accuracy?
  • What if we leveraged images for each house? Would that improve accuracy?
  • Is there some way to combine both our categorical/numerical attributes with our image data?

To answer these questions you’ll need to stay tuned for the remaining two tutorials in this Keras regression series.

To download the source code to this post (and be notified when the next tutorial is published here on PyImageSearch), just enter your email address in the form below.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Regression with Keras appeared first on PyImageSearch.

Keras, Regression, and CNNs


In this tutorial, you will learn how to train a Convolutional Neural Network (CNN) for regression prediction with Keras. You’ll then train a CNN to predict house prices from a set of images.

Today is part two in our three-part series on regression prediction with Keras:

  • Part 1: Basic regression with Keras — predicting house prices from categorical and numerical data.
  • Part 2: Regression with Keras and CNNs — training a CNN to predict house prices from image data (today’s tutorial).
  • Part 3: Combining categorical, numerical, and image data into a single network (next week’s tutorial).

Today’s tutorial builds on last week’s basic Keras regression example, so if you haven’t read it yet make sure you go through it in order to follow along here today.

By the end of this guide, you’ll not only have a strong understanding of training CNNs for regression prediction with Keras, but you’ll also have a Python code template you can follow for your own projects.

To learn how to train a CNN for regression prediction with Keras, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras, Regression, and CNNs

In the first part of this tutorial, we’ll discuss our house prices dataset which consists of not only numerical/categorical data but also image data as well. From there we’ll briefly review our project structure.

We’ll then create two Python helper functions:

  1. The first one will be used to load our house price images from disk
  2. The second method will be used to construct our Keras CNN architecture

Finally, we’ll implement our training script and then train a Keras CNN for regression prediction.

We’ll also review our results and suggest further methods to improve our prediction accuracy.

Again, I want to reiterate that you should read last week’s tutorial on basic regression prediction before continuing — we’ll be building off not only the concepts from last week but the source code as well.

As you’ll find out in the rest of today’s tutorial, performing regression with CNNs and Keras is as simple as:

  1. Removing the fully-connected softmax classifier layer typically used for classification
  2. Replacing it with a fully-connected layer with a single node along with a linear activation function.
  3. Training the model with a continuous value prediction loss function such as mean squared error, mean absolute error, mean absolute percentage error, etc.

Let’s go ahead and get started!

Predicting house prices…with images?

Figure 1: Our CNN takes input from multiple images of the inside and outside of a home and outputs a predicted price using Keras and regression.

The dataset we’re using for this series of tutorials was curated by Ahmed and Moustafa in their 2016 paper, House price estimation from visual and textual features.

As far as I know, this is the first publicly available dataset that includes both numerical/categorical attributes along with images.

The numerical and categorical attributes include:

  1. Number of bedrooms
  2. Number of bathrooms
  3. Area (i.e., square footage)
  4. Zip code

Four images of each house are also provided:

  1. Bedroom
  2. Bathroom
  3. Kitchen
  4. Frontal view of the house

A total of 535 houses are included in the dataset, therefore there are 535 x 4 = 2,140 total images in the dataset.

We’ll be pruning that number down to 362 houses (1,448 images) during our data cleaning.

To download the house prices dataset you can just clone Ahmed and Moustafa’s GitHub repository:

$ cd ~
$ git clone https://github.com/emanhamed/Houses-dataset

That single command will download both the numerical/categorical data along with the images themselves.

Make note of where you downloaded the repository on the disk (I put it in my home folder) as you’ll need to supply the path to the repo via command line argument later in this tutorial.

For more information on the house prices dataset please refer to last week’s blog post.

Project structure

Let’s look at the structure of today’s project:

$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   ├── datasets.py
│   └── models.py
└── cnn_regression.py

1 directory, 4 files

We will be updating both datasets.py and models.py from last week’s tutorial with additional functionality.

Our training script, cnn_regression.py, is completely new this week and it will take advantage of the aforementioned updates.

Loading the house prices image dataset

Figure 2: Our CNN accepts a single image — a montage of four images from the home. Using the montage, our CNN then uses regression to predict the value of the home with the Keras framework.

As we know, our house prices dataset includes four images associated with each house:

  1. Bedroom
  2. Bathroom
  3. Kitchen
  4. Frontal view of the house

But how are we going to use these images to train our CNN?

We essentially have three options:

  1. Pass the images one at a time through the CNN and use the price of the house as the target value for each image
  2. Utilize multiple inputs with Keras and have four independent CNN-like branches that eventually merge into a single output
  3. Create a montage that combines/tiles all four images into a single image and then pass the montage through the CNN

The first option is a poor choice — we’ll have multiple images with the same target price.

If anything we’re just going to end up “confusing” our CNN, making it impossible for the network to learn how to correlate the prices with the input images.

The second option is also not a good idea — the network will be computationally wasteful and harder to train with four independent tensors as inputs. Each branch will then have its own set of CONV layers that will eventually need to be merged into a single output.

Instead, we should choose the third option where we combine all four images into a single image and then pass that image through the CNN (as depicted in Figure 2 above).

For each house in our dataset, we will create a corresponding tiled image that includes:

  1. The bathroom image in the top-left
  2. The bedroom image in the top-right
  3. The frontal view in the bottom-right
  4. The kitchen in the bottom-left

This tiled image will then be passed through the CNN using the house price as the target predicted value.

The benefit of this approach is that we are:

  1. Allowing the CNN to learn from all photos of the house rather than trying to pass the house photos through the CNN one at a time
  2. Enabling the CNN to learn discriminative filters from all house photos at once (i.e., not “confusing” the CNN with different images with identical target predicted values)

To learn how we can tile the images for each house, let’s take a look at the load_house_images function in our datasets.py file:
def load_house_images(df, inputPath):
	# initialize our images array (i.e., the house images themselves)
	images = []

	# loop over the indexes of the houses
	for i in df.index.values:
		# find the four images for the house and sort the file paths,
		# ensuring the four are always in the *same order*
		basePath = os.path.sep.join([inputPath, "{}_*".format(i + 1)])
		housePaths = sorted(list(glob.glob(basePath)))

The load_house_images function accepts two parameters:
  • df: The houses data frame.
  • inputPath: Our dataset path.

Using these parameters, we proceed by initializing a list of images that will be returned to the calling function, once processed.

From there we begin looping (Line 64) over the indexes in our data frame (i.e., one unique index for each house). In the loop we:

  • Construct the basePath to the four images for the current index (Line 67).
  • Use glob to grab the four image paths (Line 68).

The glob function uses our input path with the wildcard and then finds all input paths that match our pattern.
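
For example, for the first house in the data frame the wildcard pattern would be “1_*”, and glob would return something like the following (the exact file names here are hypothetical, shown only to illustrate the sorted ordering):

import glob

# hypothetical illustration: expand the "1_*" pattern for house #1
housePaths = sorted(glob.glob("Houses-dataset/Houses Dataset/1_*"))
# e.g. ['.../1_bathroom.jpg', '.../1_bedroom.jpg',
#       '.../1_frontal.jpg', '.../1_kitchen.jpg']
print(housePaths)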

In the next code block we’re going to populate a list containing the four images:

		# initialize our list of input images along with the output image
		# after *combining* the four input images
		inputImages = []
		outputImage = np.zeros((64, 64, 3), dtype="uint8")

		# loop over the input house paths
		for housePath in housePaths:
			# load the input image, resize it to be 32x32, and then
			# update the list of input images
			image = cv2.imread(housePath)
			image = cv2.resize(image, (32, 32))
			inputImages.append(image)

Continuing in the loop, we proceed to:

  • Initialize our inputImages list and allocate memory for our tiled image, outputImage (Lines 72 and 73).
  • Create a nested loop over housePaths (Line 76) to load each image, resize it to 32×32, and update the inputImages list (Lines 79-81).

And from there, we’ll tile the four images into one montage, eventually returning all of the montages:

		# tile the four input images in the output image such that the
		# first image goes in the top-left corner, the second image in
		# the top-right corner, the third image in the bottom-right
		# corner, and the final image in the bottom-left corner
		outputImage[0:32, 0:32] = inputImages[0]
		outputImage[0:32, 32:64] = inputImages[1]
		outputImage[32:64, 32:64] = inputImages[2]
		outputImage[32:64, 0:32] = inputImages[3]

		# add the tiled image to our set of images the network will be
		# trained on
		images.append(outputImage)

	# return our set of images
	return np.array(images)

To finish off the loop, we:

  • Tile the input images using NumPy array slicing (Lines 87-90).
  • Update the images list (Line 94).

Once the process of creating the tiles is done, we go ahead and return the set of images to the calling function on Line 97.

Using Keras to implement a CNN for regression

Figure 3: If we’re performing regression with a CNN, we’ll add a fully connected layer with linear activation.

Let’s go ahead and implement our Keras CNN for regression prediction.

Open up the models.py file and insert the following code:
def create_cnn(width, height, depth, filters=(16, 32, 64), regress=False):
	# initialize the input shape and channel dimension, assuming
	# TensorFlow/channels-last ordering
	inputShape = (height, width, depth)
	chanDim = -1

Our create_cnn function will return our CNN model which we will compile and train in our training script.

The create_cnn function accepts five parameters:
  • width: The width of the input images in pixels.
  • height: How many pixels tall the input images are.
  • depth: The number of channels (i.e., the depth) of the input images.
  • filters: A tuple of progressively larger filters so that our network can learn more discriminative features.
  • regress: A boolean indicating whether or not a fully-connected linear activation layer will be appended to the CNN for regression purposes.

The inputShape of our network is defined on Line 29. It assumes “channels last” ordering for the TensorFlow backend.
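
If you aren’t sure which ordering your Keras installation is configured for, a quick optional check is:

from keras import backend as K

# "channels_last" corresponds to (height, width, depth) tensors, which
# is the ordering the create_cnn function above assumes
print(K.image_data_format())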

Let’s go ahead and define the input to the model and begin creating our CONV => RELU => BN => POOL layer set:
	# define the model input
	inputs = Input(shape=inputShape)

	# loop over the number of filters
	for (i, f) in enumerate(filters):
		# if this is the first CONV layer then set the input
		# appropriately
		if i == 0:
			x = inputs

		# CONV => RELU => BN => POOL
		x = Conv2D(f, (3, 3), padding="same")(x)
		x = Activation("relu")(x)
		x = BatchNormalization(axis=chanDim)(x)
		x = MaxPooling2D(pool_size=(2, 2))(x)

Our model inputs are defined on Line 33.

From there, on Line 36, we loop over the filters and create a set of CONV => RELU => BN => POOL layers. Each iteration of the loop appends these layers. Be sure to check out Chapter 11 from the Starter Bundle of Deep Learning for Computer Vision with Python for more information on these layer types.

Let’s finish building our CNN:

	# flatten the volume, then FC => RELU => BN => DROPOUT
	x = Flatten()(x)
	x = Dense(16)(x)
	x = Activation("relu")(x)
	x = BatchNormalization(axis=chanDim)(x)
	x = Dropout(0.5)(x)

	# apply another FC layer, this one to match the number of nodes
	# coming out of the MLP
	x = Dense(4)(x)
	x = Activation("relu")(x)

	# check to see if the regression node should be added
	if regress:
		x = Dense(1, activation="linear")(x)

	# construct the CNN
	model = Model(inputs, x)

	# return the CNN
	return model

We Flatten the volume (Line 49) and then add a fully-connected layer with BatchNormalization and Dropout (Lines 50-53).

Another fully-connected layer is applied to match the four nodes coming out of the multi-layer perceptron (Lines 57 and 58).

On Lines 61 and 62, a check is made to see if the regression node should be appended; if so, it is added accordingly.

Finally, the model is constructed from our inputs and all the layers we’ve assembled together, x (Line 65).

We can then return the model to the calling function (Line 68).

Implementing the regression training script

Now that we’ve implemented our dataset loader utility function along with our Keras CNN for regression, let’s go ahead and create the training script.

Open up the cnn_regression.py file and insert the following code:
# import the necessary packages
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from pyimagesearch import datasets
from pyimagesearch import models
import numpy as np
import argparse
import locale
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, required=True,
	help="path to input dataset of house images")
args = vars(ap.parse_args())

The imports for our training script are taken care of on Lines 2-9. Most notably we’re importing our helper functions from datasets and models. The locale package will help us with formatting our currencies.

From there we parse a single argument using argparse: --dataset. This flag and the argument itself allow us to specify the path to the dataset from our terminal without modifying the script.

Now let’s load, preprocess, and split our data:

# construct the path to the input .txt file that contains information
# on each house in the dataset and then load the dataset
print("[INFO] loading house attributes...")
inputPath = os.path.sep.join([args["dataset"], "HousesInfo.txt"])
df = datasets.load_house_attributes(inputPath)

# load the house images and then scale the pixel intensities to the
# range [0, 1]
print("[INFO] loading house images...")
images = datasets.load_house_images(df, args["dataset"])
images = images / 255.0

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
split = train_test_split(df, images, test_size=0.25, random_state=42)
(trainAttrX, testAttrX, trainImagesX, testImagesX) = split

Our inputPath on Line 20 contains the path to our CSV file containing the numerical and categorical attributes along with the target price for each home.

Our dataset is loaded using the load_house_attributes convenience function we defined in last week’s tutorial (Line 21). The result is a pandas data frame, df, containing the numerical/categorical attributes.

The actual numerical and categorical attributes aren’t used in this tutorial, but we do use the data frame in order to load the images on Line 26 using the convenience function we defined earlier in today’s blog post.

We go ahead and scale our images’ pixel intensities to the range [0, 1] on Line 27.

Then our dataset training and testing splits are constructed using scikit-learn’s handy train_test_split function (Lines 31 and 32).

Again, we will not be using the numerical/categorical data here today, just the images themselves. The numerical/categorical data is used in part one (last week) and part three (next week) of this series.

Now let’s scale our pricing data and train our model:

# find the largest house price in the training set and use it to
# scale our house prices to the range [0, 1] (will lead to better
# training and convergence)
maxPrice = trainAttrX["price"].max()
trainY = trainAttrX["price"] / maxPrice
testY = testAttrX["price"] / maxPrice

# create our Convolutional Neural Network and then compile the model
# using mean absolute percentage error as our loss, implying that we
# seek to minimize the absolute percentage difference between our
# price *predictions* and the *actual prices*
model = models.create_cnn(64, 64, 3, regress=True)
opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss="mean_absolute_percentage_error", optimizer=opt)

# train the model
print("[INFO] training model...")
model.fit(trainImagesX, trainY, validation_data=(testImagesX, testY),
	epochs=200, batch_size=8)

Here we have:

  • Scaled the house prices to the range [0, 1] based on the maxPrice (Lines 37-39). Performing this scaling will lead to better training and faster convergence.
  • Created and compiled our model using the Adam optimizer (Lines 45-47). We are using mean absolute percentage error as our loss function and we’ve set regress=True, indicating that we want to perform regression.
  • Kicked off the training process (Lines 51 and 52).

Now let’s evaluate the results!

# make predictions on the testing data
print("[INFO] predicting house prices...")
preds = model.predict(testImagesX)

# compute the difference between the *predicted* house prices and the
# *actual* house prices, then compute the percentage difference and
# the absolute percentage difference
diff = preds.flatten() - testY
percentDiff = (diff / testY) * 100
absPercentDiff = np.abs(percentDiff)

# compute the mean and standard deviation of the absolute percentage
# difference
mean = np.mean(absPercentDiff)
std = np.std(absPercentDiff)

# finally, show some statistics on our model
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
print("[INFO] avg. house price: {}, std house price: {}".format(
	locale.currency(df["price"].mean(), grouping=True),
	locale.currency(df["price"].std(), grouping=True)))
print("[INFO] mean: {:.2f}%, std: {:.2f}%".format(mean, std))

In order to evaluate our house prices model based on image data using regression, we:

  • Make predictions on test data (Line 56).
  • Compute absolute percentage difference (Lines 61-63) and use that to derive our final metrics (Lines 67 and 68).
  • Display evaluation information in our terminal (Lines 72-75).

That’s a wrap, but…

Don’t be fooled by how succinct this training script is!

There is a lot going on under the hood with our convenience functions to load the data + create the CNN and the training process which tunes all the weights to the neurons. To brush up on convolutional neural networks, please refer to the Starter Bundle of Deep Learning for Computer Vision with Python.

Training our regression CNN

Ready to train your Keras CNN for regression prediction?

Make sure you have:

  1. Configured your development environment according to last week’s tutorial.
  2. Used the “Downloads” section of this tutorial to download the source code.
  3. Downloaded the house prices dataset using the instructions in the “Predicting house prices…with images?” section above.

From there, open up a terminal and execute the following command:

$ python cnn_regression.py --dataset ~/Houses-dataset/Houses\ Dataset/
[INFO] loading house attributes...
[INFO] loading house images...
[INFO] training model...
Train on 271 samples, validate on 91 samples
Epoch 1/200
271/271 [==============================] - 2s 8ms/step - loss: 2005.3643 - val_loss: 3911.4023
Epoch 2/200
271/271 [==============================] - 1s 5ms/step - loss: 1238.6622 - val_loss: 1440.2142
Epoch 3/200
271/271 [==============================] - 1s 5ms/step - loss: 1016.0744 - val_loss: 2473.1472
Epoch 4/200
271/271 [==============================] - 1s 5ms/step - loss: 822.4028 - val_loss: 1175.3730
Epoch 5/200
271/271 [==============================] - 1s 5ms/step - loss: 663.9282 - val_loss: 1278.4540
Epoch 6/200
271/271 [==============================] - 1s 5ms/step - loss: 670.1193 - val_loss: 860.3962
Epoch 7/200
271/271 [==============================] - 1s 5ms/step - loss: 555.5363 - val_loss: 313.4300
Epoch 8/200
271/271 [==============================] - 1s 5ms/step - loss: 395.9594 - val_loss: 182.3097
Epoch 9/200
271/271 [==============================] - 1s 5ms/step - loss: 347.1473 - val_loss: 217.1935
Epoch 10/200
271/271 [==============================] - 1s 5ms/step - loss: 345.0984 - val_loss: 219.0356
...
Epoch 195/200
271/271 [==============================] - 1s 5ms/step - loss: 29.3323 - val_loss: 73.7799
Epoch 196/200
271/271 [==============================] - 1s 5ms/step - loss: 31.5007 - val_loss: 71.6756
Epoch 197/200
271/271 [==============================] - 1s 5ms/step - loss: 31.0279 - val_loss: 56.3354
Epoch 198/200
271/271 [==============================] - 1s 5ms/step - loss: 31.5648 - val_loss: 63.1492
Epoch 199/200
271/271 [==============================] - 1s 5ms/step - loss: 36.0041 - val_loss: 62.7846
Epoch 200/200
271/271 [==============================] - 1s 5ms/step - loss: 30.4770 - val_loss: 56.9121
[INFO] predicting house prices...
[INFO] avg. house price: $533,388.27, std house price: $493,403.08
[INFO] mean: 56.91%, std: 58.98%

Our mean absolute percentage error starts off extremely high, in the order of 300-2,000% in the first ten epochs; however, by the time training is complete we are at a much lower training loss of 30%.

The problem though is that we’ve clearly overfit.

While our training loss is 30% our validation loss is at 56.91%, implying that, on average, our network will be ~57% off in its house price predictions.

How can we improve our prediction accuracy?

Overall, our CNN obtained a mean absolute percentage error of 56.91%, implying that, on average, our CNN will be nearly 57% off in its predicted house value.

That’s a pretty poor result given that our simple MLP trained on the numerical and categorical data obtained a mean absolute percentage error of 26.01%, far better than today’s 56.91%.

So, what does this mean?

Does it mean that CNNs are ill-suited for regression tasks and that we shouldn’t use them for regression?

Actually, no — it doesn’t mean that at all.

Instead, all it means is that the interior of a home doesn’t necessarily correlate with the price of a home.

For example, let’s suppose there is an ultra luxurious celebrity home in Beverly Hills, CA that is valued at $10,000,000.

Now, let’s take that same home and transplant it to Forest Park, one of the worst areas of Detroit.

In this neighborhood the median home price is $13,000 — do you think that gorgeous celebrity house with the decked out interior is still going to be worth $10,000,000?

Of course not.

There is more to the price of a home than just the interior. We also have to factor in the local real estate market itself.

There are a huge number of factors that go into the price of a home but, by and large, one of the most important attributes is the locale itself.

Therefore, it shouldn’t be much of a surprise that our CNN trained on house images didn’t perform as well as the simple MLP trained on the numerical and categorical attributes.

But that does raise the question:

  1. Is it possible to combine our numerical/categorical data with our image data and train a single end-to-end network?
  2. And if so, would our house price prediction accuracy improve?

I’ll answer that question next week, stay tuned.

Summary

In today’s tutorial, you learned how to train a Convolutional Neural Network (CNN) for regression prediction with Keras.

Implementing a CNN for regression prediction is as simple as:

  1. Removing the fully-connected softmax classifier layer typically used for classification
  2. Replacing it with a fully-connected layer containing a single node and a linear activation function.
  3. Training the model with a continuous value prediction loss function such as mean squared error, mean absolute error, or mean absolute percentage error.

What makes this method so powerful is that it implies that we can fine-tune existing models for regression prediction — simply remove the old FC + softmax layer, add in a single node FC layer with a linear activation, update your loss method, and start training!
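
For example, here is a minimal sketch of that idea using a pre-trained VGG16 as the base. Note that the choice of VGG16, the 224×224 input size, and freezing the convolutional base are my own assumptions for illustration — this is not the exact network covered in this post:

# a minimal sketch: repurpose a pre-trained VGG16 for regression
from keras.applications import VGG16
from keras.layers import Flatten, Dense
from keras.models import Model
from keras.optimizers import Adam

# load VGG16 *without* its FC + softmax classification head
baseModel = VGG16(weights="imagenet", include_top=False,
	input_shape=(224, 224, 3))

# freeze the convolutional base so only the new head is trained
for layer in baseModel.layers:
	layer.trainable = False

# new head: flatten the final volume and add a single-node linear output
x = Flatten()(baseModel.output)
x = Dense(1, activation="linear")(x)
model = Model(inputs=baseModel.input, outputs=x)

# compile with a continuous value prediction loss
model.compile(loss="mean_absolute_percentage_error",
	optimizer=Adam(lr=1e-3))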

If you’re interested in learning more about transfer learning and fine-tuning on pre-trained models, please refer to my book, Deep Learning for Computer Vision with Python, where I discuss transfer learning and fine-tuning in detail.

In next week’s tutorial, I’ll be showing you how to work with mixed data using Keras, including combining categorical, numerical, and image data into a single network.

To download the source code to this post, and be notified when next week’s blog post publishes, be sure to enter your email address in the form below!


The post Keras, Regression, and CNNs appeared first on PyImageSearch.

Keras: Multiple Inputs and Mixed Data


In this tutorial, you will learn how to use Keras for multi-input and mixed data.

You will learn how to define a Keras architecture capable of accepting multiple inputs, including numerical, categorical, and image data. We’ll then train a single end-to-end network on this mixed data.

Today is the final installment in our three part series on Keras and regression:

  1. Basic regression with Keras
  2. Training a Keras CNN for regression prediction
  3. Multiple inputs and mixed data with Keras (today’s post)

In this series of posts, we’ve explored regression prediction in the context of house price prediction.

The house price dataset we are using includes not only numerical and categorical data, but image data as well — we call multiple types of data mixed data as our model needs to be capable of accepting our multiple inputs (that are not of the same type) and computing a prediction on these inputs.

In the remainder of this tutorial you will learn how to:

  1. Define a Keras model capable of accepting multiple inputs, including numerical, categorical, and image data, all at the same time.
  2. Train an end-to-end Keras model on the mixed data inputs.
  3. Evaluate our model using the multi-inputs.

To learn more about multiple inputs and mixed data with Keras, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras: Multiple Inputs and Mixed Data

In the first part of this tutorial, we will briefly review the concept of both mixed data and how Keras can accept multiple inputs.

From there we’ll review our house prices dataset and the directory structure for this project.

Next, I’ll show you how to:

  1. Load the numerical, categorical, and image data from disk.
  2. Pre-process the data so we can train a network on it.
  3. Prepare the mixed data so it can be applied to a multi-input Keras network.

Once our data has been prepared you’ll learn how to define and train a multi-input Keras model that accepts multiple types of input data in a single end-to-end network.

Finally, we’ll evaluate our multi-input and mixed data model on our testing set and compare the results to our previous posts in this series.

What is mixed data?

Figure 1: With Keras’ flexible deep learning framework, it is possible to define a multi-input model that includes both CNN and MLP branches to handle mixed data.

In machine learning, mixed data refers to the concept of having multiple types of independent data.

For example, let’s suppose we are machine learning engineers working at a hospital to develop a system capable of classifying the health of a patient.

We would have multiple types of input data for a given patient, including:

  1. Numeric/continuous values, such as age, heart rate, blood pressure
  2. Categorical values, including gender and ethnicity
  3. Image data, such as any MRI, X-ray, etc.

All of these values constitute different data types; however, our machine learning model must be able to ingest this “mixed data” and make (accurate) predictions on it.

You will see the term “mixed data” in machine learning literature when working with multiple data modalities.

Developing machine learning systems capable of handling mixed data can be extremely challenging as each data type may require separate preprocessing steps, including scaling, normalization, and feature engineering.
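
To make that concrete, here is a minimal sketch of per-modality preprocessing using toy patient data (the column names and values below are made up purely for illustration):

# a minimal sketch: each modality gets its own preprocessing step
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import numpy as np

# toy patient records + images (purely illustrative)
df = pd.DataFrame({
	"age": [34, 51, 67],
	"heart_rate": [72, 88, 64],
	"ethnicity": ["groupA", "groupB", "groupC"]})
imgs = np.random.randint(0, 256, size=(3, 64, 64, 3))

# numeric/continuous values: min-max scale to [0, 1]
numeric = MinMaxScaler().fit_transform(df[["age", "heart_rate"]])

# categorical values: one-hot encode
categorical = LabelBinarizer().fit_transform(df["ethnicity"])

# image data: scale pixel intensities to [0, 1]
images = imgs.astype("float32") / 255.0

# the numeric + categorical features form one input; the images another
features = np.hstack([numeric, categorical])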

Working with mixed data is still very much an open area of research and is often heavily dependent on the specific task/end goal.

We’ll be working with mixed data in today’s tutorial to help you get a feel for some of the challenges associated with it.

How can Keras accept multiple inputs?

Figure 2: As opposed to its Sequential API, Keras’ functional API allows for much more complex models. In this blog post we use the functional API to support our goal of creating a model with multiple inputs and mixed data for house price prediction.

Keras is able to handle multiple inputs (and even multiple outputs) via its functional API.

The functional API, as opposed to the sequential API (which you almost certainly have used before via the

Sequential
  class), can be used to define much more complex models that are non-sequential, including:
  • Multi-input models
  • Multi-output models
  • Models that are both multiple input and multiple output
  • Directed acyclic graphs
  • Models with shared layers

For example, we may define a simple sequential neural network as:

model = Sequential()
model.add(Dense(8, input_shape=(10,), activation="relu"))
model.add(Dense(4, activation="relu"))
model.add(Dense(1, activation="linear"))

This network is a simple feedforward neural network with 10 inputs, a first hidden layer with 8 nodes, a second hidden layer with 4 nodes, and a final output layer used for regression.

We can define the same neural network using the functional API:

inputs = Input(shape=(10,))
x = Dense(8, activation="relu")(inputs)
x = Dense(4, activation="relu")(x)
x = Dense(1, activation="linear")(x)
model = Model(inputs, x)

Notice how we are no longer relying on the

Sequential
  class.

To see the power of Keras’ functional API, consider the following code where we create a model that accepts multiple inputs:

# define two sets of inputs
inputA = Input(shape=(32,))
inputB = Input(shape=(128,))

# the first branch operates on the first input
x = Dense(8, activation="relu")(inputA)
x = Dense(4, activation="relu")(x)
x = Model(inputs=inputA, outputs=x)

# the second branch operates on the second input
y = Dense(64, activation="relu")(inputB)
y = Dense(32, activation="relu")(y)
y = Dense(4, activation="relu")(y)
y = Model(inputs=inputB, outputs=y)

# combine the output of the two branches
combined = concatenate([x.output, y.output])

# apply a FC layer and then a regression prediction on the
# combined outputs
z = Dense(2, activation="relu")(combined)
z = Dense(1, activation="linear")(z)

# our model will accept the inputs of the two branches and
# then output a single value
model = Model(inputs=[x.input, y.input], outputs=z)

Here you can see we are defining two inputs to our Keras neural network:

  1. inputA
     : 32-dim
  2. inputB
     : 128-dim

Lines 21-23 define a simple

32-8-4
  network using Keras’ functional API.

Similarly, Lines 26-29 define a

128-64-32-4
  network.

We then combine the outputs of both

x
 and
y
 on Line 32. The outputs of 
x
 and
y
 are both 4-dim, so once we concatenate them we have an 8-dim vector.

We then apply two more fully-connected layers on Lines 36 and 37. The first layer has 2 nodes followed by a ReLU activation while the second layer has only a single node with a linear activation (i.e., our regression prediction).

The final step to building the multi-input model is to define a

Model
  object which:
  1. Accepts our two
    inputs
  2. Defines the
    outputs
      as the final set of FC layers (i.e.,
    z
     ).

If you were to use Keras to visualize the model architecture it would look like the following:

Figure 3: This model has two input branches that ultimately merge and produce one output. The Keras functional API allows for this type of architecture and others you can dream up.

Notice how our model has two distinct branches.

The first branch accepts our 128-d input while the second branch accepts the 32-d input. These branches operate independently of each other until they are concatenated. From there a single value is output from the network.
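
If you’d like to generate a diagram like Figure 3 for the model we just defined (or any Keras model), you can use Keras’ plot_model utility — a quick sketch, assuming the pydot and graphviz packages are installed:

# a quick sketch: render the multi-input architecture to disk
# (assumes pydot and graphviz are installed)
from keras.utils import plot_model

plot_model(model, to_file="multi_input_model.png", show_shapes=True)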

In the remainder of this tutorial, you will learn how to create multiple input networks using Keras.

The House Prices dataset

Figure 4: The House Prices dataset consists of both numerical/categorical data and image data. Using Keras, we’ll build a model supporting the multiple inputs and mixed data types. The result will be a Keras regression model which predicts the price/value of houses.

In this series of posts, we have been using the House Prices dataset from Ahmed and Moustafa’s 2016 paper, House price estimation from visual and textual features.

This dataset includes both numerical/categorical data along with image data for each of the 535 example houses in the dataset.

The numerical and categorical attributes include:

  1. Number of bedrooms
  2. Number of bathrooms
  3. Area (i.e., square footage)
  4. Zip code

A total of four images are provided for each house as well:

  1. Bedroom
  2. Bathroom
  3. Kitchen
  4. Frontal view of the house

In the first post in this series, you learned how to train a Keras regression network on the numerical and categorical data.

Then, last week, you learned how to perform regression with a Keras CNN.

Today we are going to work with multiple inputs and mixed data with Keras.

We are going to accept both the numerical/categorical data along with our image data to the network.

Two branches of a network will be defined to handle each type of data. The branches will then be combined at the end to obtain our final house price prediction.

In this manner, we will be able to leverage Keras to handle both multiple inputs and mixed data.

Obtaining the House Prices dataset

To grab the source code for today’s post, use the “Downloads” section. Once you have the zip file, navigate to where you downloaded it, and extract it:

$ cd path/to/zip
$ unzip keras-multi-input.zip
$ cd keras-multi-input

And from there you can download the House Prices dataset via:

$ git clone https://github.com/emanhamed/Houses-dataset

The House Prices dataset should now be in the

keras-multi-input
  directory which is the directory we are using for this project.

Project structure

Let’s take a look at how today’s project is organized:

$ tree --dirsfirst --filelimit 10
.
├── Houses-dataset
│   ├── Houses\ Dataset [2141 entries]
│   └── README.md
├── pyimagesearch
│   ├── __init__.py
│   ├── datasets.py
│   └── models.py
└── mixed_training.py

3 directories, 5 files

The Houses-dataset folder contains our House Prices dataset that we’re working with for this series. When we’re ready to run the

mixed_training.py
  script, you’ll just need to provide a path as a command line argument to the dataset (I’ll show you exactly how this is done in the results section).

Today we’ll be reviewing three Python scripts:

  • pyimagesearch/datasets.py
     : Handles loading and preprocessing our numerical/categorical data as well as our image data. We previously reviewed this script over the past two weeks, but I’ll be walking you through it again today.
  • pyimagesearch/models.py
     : Contains our Multi-layer Perceptron (MLP) and Convolutional Neural Network (CNN). These components are the input branches to our multi-input, mixed data model. We reviewed this script last week and we’ll briefly review it today as well.
  • mixed_training.py
     : Our training script will use the
    pyimagesearch
      module convenience functions to load + split the data and concatenate the two branches to our network + add the head. It will then train and evaluate the model.

Loading the numerical and categorical data

Figure 5: We use pandas, a Python package, to read CSV housing data.

We covered how to load the numerical and categorical data for the house prices dataset in our Keras regression post but as a matter of completeness, we will review the code (in less detail) here today.

Be sure to refer to the previous post if you want a detailed walkthrough of the code.

Open up the

datasets.py
  file and insert the following code:
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import numpy as np
import glob
import cv2
import os

def load_house_attributes(inputPath):
	# initialize the list of column names in the CSV file and then
	# load it using Pandas
	cols = ["bedrooms", "bathrooms", "area", "zipcode", "price"]
	df = pd.read_csv(inputPath, sep=" ", header=None, names=cols)

	# determine (1) the unique zip codes and (2) the number of data
	# points with each zip code
	zipcodes = df["zipcode"].value_counts().keys().tolist()
	counts = df["zipcode"].value_counts().tolist()

	# loop over each of the unique zip codes and their corresponding
	# count
	for (zipcode, count) in zip(zipcodes, counts):
		# the zip code counts for our housing dataset are *extremely*
		# unbalanced (some only having 1 or 2 houses per zip code)
		# so let's sanitize our data by removing any houses with fewer
		# than 25 houses per zip code
		if count < 25:
			idxs = df[df["zipcode"] == zipcode].index
			df.drop(idxs, inplace=True)

	# return the data frame
	return df

Our imports are handled on Lines 2-8.

From there we define the

load_house_attributes
  function on Lines 10-33. This function reads the numerical/categorical data from the House Prices dataset in the form of a CSV file via Pandas’
pd.read_csv
  on Lines 13 and 14.

The data is filtered to accommodate an imbalance. Some zip codes are represented by only 1 or 2 houses; therefore, we go ahead and

drop
  (Lines 23-30) any records where there are fewer than
25
  houses in that zip code. The result is a more accurate model later on.

Now let’s define the

process_house_attributes
  function:
def process_house_attributes(df, train, test):
	# initialize the column names of the continuous data
	continuous = ["bedrooms", "bathrooms", "area"]

	# perform min-max scaling on each continuous feature column to
	# the range [0, 1]
	cs = MinMaxScaler()
	trainContinuous = cs.fit_transform(train[continuous])
	testContinuous = cs.transform(test[continuous])

	# one-hot encode the zip code categorical data (by definition of
	# one-hot encoding, all output features are now in the range [0, 1])
	zipBinarizer = LabelBinarizer().fit(df["zipcode"])
	trainCategorical = zipBinarizer.transform(train["zipcode"])
	testCategorical = zipBinarizer.transform(test["zipcode"])

	# construct our training and testing data points by concatenating
	# the categorical features with the continuous features
	trainX = np.hstack([trainCategorical, trainContinuous])
	testX = np.hstack([testCategorical, testContinuous])

	# return the concatenated training and testing data
	return (trainX, testX)

This function applies min-max scaling to the continuous features via scikit-learn’s

MinMaxScaler
  (Lines 41-43).

Then, one-hot encoding for the categorical features is computed, this time via scikit-learn’s

LabelBinarizer
  (Lines 47-49).

The continuous and categorical features are then concatenated and returned (Lines 53-57).

Be sure to refer to the previous posts in this series for more details on the two functions we reviewed in this section:

  1. Regression with Keras
  2. Keras, Regression, and CNNs

Loading the image dataset

Figure 6: One branch of our model accepts a single image — a montage of four images from the home. Using the montage combined with the numerical/categorial data input to another branch, our model then uses regression to predict the value of the home with the Keras framework.

The next step is to define a helper function to load our input images. Again, open up the

datasets.py
  file and insert the following code:
def load_house_images(df, inputPath):
	# initialize our images array (i.e., the house images themselves)
	images = []

	# loop over the indexes of the houses
	for i in df.index.values:
		# find the four images for the house and sort the file paths,
		# ensuring the four are always in the *same order*
		basePath = os.path.sep.join([inputPath, "{}_*".format(i + 1)])
		housePaths = sorted(list(glob.glob(basePath)))

The

load_house_images
  function has three goals:
  1. Load all photos from the House Prices dataset. Recall that we have four photos per house (Figure 6).
  2. Generate a single montage image from the four photos. The montage will always be arranged as you see in the figure.
  3. Append all of these home montages to a list/array and return to the calling function.

Beginning on Line 59, we define the function which accepts a Pandas dataframe and dataset

inputPath
 .

From there, we proceed to:

  • Initialize the
    images
      list (Line 61). We’ll be populating this list with all of the montage images that we build.
  • Loop over houses in our data frame (Line 64). Inside the loop, we:
    • Grab the paths to the four photos for the current house (Lines 67 and 68).

Let’s keep making progress in the loop:

# initialize our list of input images along with the output image
		# after *combining* the four input images
		inputImages = []
		outputImage = np.zeros((64, 64, 3), dtype="uint8")

		# loop over the input house paths
		for housePath in housePaths:
			# load the input image, resize it to be 32 x 32, and then
			# update the list of input images
			image = cv2.imread(housePath)
			image = cv2.resize(image, (32, 32))
			inputImages.append(image)

		# tile the four input images in the output image such that the
		# first image goes in the top-left corner, the second image in
		# the top-right corner, the third image in the bottom-right
		# corner, and the final image in the bottom-left corner
		outputImage[0:32, 0:32] = inputImages[0]
		outputImage[0:32, 32:64] = inputImages[1]
		outputImage[32:64, 32:64] = inputImages[2]
		outputImage[32:64, 0:32] = inputImages[3]

		# add the tiled image to our set of images the network will be
		# trained on
		images.append(outputImage)

	# return our set of images
	return np.array(images)

The code so far has accomplished the first goal discussed above (grabbing the four house images per house). Let’s wrap up the 

load_house_images
  function:
  • Still inside the loop, we:
    • Perform initializations (Lines 72 and 73). Our
      inputImages
        will be in list form containing four photos of each record. Our
      outputImage
        will be the montage of the photos (like Figure 6).
    • Loop over 4 photos (Line 76):
      • Load, resize, and append each photo to
        inputImages
          (Lines 79-81).
    • Create the tiling (a montage) for the four house images (Lines 87-90) with:
      • The bathroom image in the top-left.
      • The bedroom image in the top-right.
      • The frontal view in the bottom-right.
      • The kitchen in the bottom-left.
    • Append the tiling/montage
      outputImage
        to
      images
        (Line 94).
  • Jumping out of the loop, we
    return
      all the 
    images
      in the form of a NumPy array (Line 97).

We’ll have as many

images
  as there are records we’re training with (remember, we dropped a few of them in the
load_house_attributes
  function).

Each of our tiled

images
  will look like Figure 6 (without the overlaid text of course). You can see the four photos therein have been arranged in a montage (I’ve used larger image dimensions so we can better visualize what the code is doing). Just as our numerical and categorical attributes represent the house, these four photos (tiled into a single image) will represent the visual aesthetics of the house.

If you need to review this process in further detail, be sure to refer to last week’s post.
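
If you’d like to eyeball one of the montages yourself, a quick sketch like the following will do it (it assumes the Houses-dataset repo has been cloned into the project directory as described earlier):

# a quick sketch: build the montages and display the first one
import cv2
from pyimagesearch import datasets

df = datasets.load_house_attributes(
	"Houses-dataset/Houses Dataset/HousesInfo.txt")
images = datasets.load_house_images(df, "Houses-dataset/Houses Dataset")

cv2.imshow("First house montage", images[0])
cv2.waitKey(0)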

Defining our Multi-layer Perceptron (MLP) and Convolutional Neural Network (CNN)

Figure 7: Our Keras multi-input + mixed data model has one branch that accepts the numerical/categorical data (left) and another branch that accepts image data in the form of a 4-photo montage (right).

As you’ve gathered thus far, we’ve had to massage our data carefully using multiple libraries: Pandas, scikit-learn, OpenCV, and NumPy.

We’ve organized and pre-processed the two modalities of our dataset at this point via

datasets.py
 :
  • Numeric and categorical data
  • Image data

The skills we’ve used in order to accomplish this have been developed through experience + practice, machine learning best practices, and behind the scenes of this blog post, a little bit of debugging. Please don’t overlook what we’ve discussed so far using our data massaging skills as it is key to the rest of our project’s success.

Let’s shift gears and discuss our multi-input and mixed data network that we’ll build with Keras’ functional API.

In order to build our multi-input network we will need two branches:

  • The first branch will be a simple Multi-layer Perceptron (MLP) designed to handle the categorical/numerical inputs.
  • The second branch will be a Convolutional Neural Network to operate over the image data.
  • These branches will then be concatenated together to form the final multi-input Keras model.

We’ll handle building the final concatenated multi-input model in the next section — our current task is to define the two branches.

Open up the

models.py
  file and insert the following code:
# import the necessary packages
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras.layers import Flatten
from keras.layers import Input
from keras.models import Model

def create_mlp(dim, regress=False):
	# define our MLP network
	model = Sequential()
	model.add(Dense(8, input_dim=dim, activation="relu"))
	model.add(Dense(4, activation="relu"))

	# check to see if the regression node should be added
	if regress:
		model.add(Dense(1, activation="linear"))

	# return our model
	return model

Lines 2-11 handle our Keras imports. You’ll see each of the imported functions/classes going forward in this script.

Our categorical/numerical data will be processed by a simple Multi-layer Perceptron (MLP).

The MLP is defined by

create_mlp
  on Lines 13-24.

Discussed in detail in the first post in this series, the MLP relies on the Keras

Sequential
  API. Our MLP is quite simple having:
  • A fully connected (
    Dense
     ) input layer with ReLU
    activation
      (Line 16).
  • A fully-connected hidden layer, also with ReLU
    activation
      (Line 17).
  • And finally, an optional regression output with linear activation (Lines 20 and 21).

While we used the regression output of the MLP in the first post, it will not be used in this multi-input, mixed data network. As you’ll soon see, we’ll be setting 

regress=False
  explicitly even though it is the default as well. Regression will actually be performed later on the head of the entire multi-input, mixed data network (the bottom of Figure 7).

The MLP branch is returned on Line 24.

Referring back to Figure 7, we’ve now built the top-left branch of our network.

Let’s now define the top-right branch of our network, a CNN:

def create_cnn(width, height, depth, filters=(16, 32, 64), regress=False):
	# initialize the input shape and channel dimension, assuming
	# TensorFlow/channels-last ordering
	inputShape = (height, width, depth)
	chanDim = -1

	# define the model input
	inputs = Input(shape=inputShape)

	# loop over the number of filters
	for (i, f) in enumerate(filters):
		# if this is the first CONV layer then set the input
		# appropriately
		if i == 0:
			x = inputs

		# CONV => RELU => BN => POOL
		x = Conv2D(f, (3, 3), padding="same")(x)
		x = Activation("relu")(x)
		x = BatchNormalization(axis=chanDim)(x)
		x = MaxPooling2D(pool_size=(2, 2))(x)

The

create_cnn
  function handles the image data and accepts five parameters:
  • width
     : The width of the input images in pixels.
  • height
     : How many pixels tall the input images are.
  • depth
     : The number of channels in our input images. For RGB color images, it is three.
  • filters
     : A tuple of progressively larger filters so that our network can learn more discriminative features.
  • regress
     : A boolean indicating whether or not a fully-connected linear activation layer will be appended to the CNN for regression purposes.

The

inputShape
  of our network is defined on Line 29. It assumes “channels last” ordering for the TensorFlow backend.

The

Input
  to the model is defined via the
inputShape
  on (Line 33).

From there we begin looping over the filters and create a set of

CONV => RELU => BN => POOL
 layers. Each iteration of the loop appends these layers. Be sure to check out Chapter 11 from the Starter Bundle of Deep Learning for Computer Vision with Python for more information on these layer types if you are unfamiliar.

Let’s finish building the CNN branch of our network:

# flatten the volume, then FC => RELU => BN => DROPOUT
	x = Flatten()(x)
	x = Dense(16)(x)
	x = Activation("relu")(x)
	x = BatchNormalization(axis=chanDim)(x)
	x = Dropout(0.5)(x)

	# apply another FC layer, this one to match the number of nodes
	# coming out of the MLP
	x = Dense(4)(x)
	x = Activation("relu")(x)

	# check to see if the regression node should be added
	if regress:
		x = Dense(1, activation="linear")(x)

	# construct the CNN
	model = Model(inputs, x)

	# return the CNN
	return model

We

Flatten
  the volume (Line 49) and then add a fully-connected layer with
BatchNormalization
  and
Dropout
  (Lines 50-53).

Another fully-connected layer is applied to match the four nodes coming out of the multi-layer perceptron (Lines 57 and 58). Matching the number of nodes is not a requirement but it does help balance the branches.

On Lines 61 and 62, a check is made to see if the regression node should be appended; it is then added in accordingly. Again, we will not be conducting regression at the end of this branch either. Regression will be performed on the head of the multi-input, mixed data network (the very bottom of Figure 7).

Finally, the model is constructed from our

inputs
  and all the layers we’ve assembled together,
x
  (Line 65).

We can then 

return
  the CNN branch to the calling function (Line 68).
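
As a quick sanity check before we wire the branches together in the next section, you could instantiate each branch on its own and print its summary — a small sketch, where the 10-dim attribute vector is just a placeholder:

# a small sketch: inspect each branch before combining them
from pyimagesearch.models import create_mlp, create_cnn

mlp = create_mlp(10)          # placeholder: a 10-dim attribute vector
cnn = create_cnn(64, 64, 3)   # 64x64 RGB montage images

mlp.summary()
cnn.summary()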

Now that we’ve defined both branches of the multi-input Keras model, let’s learn how we can combine them!

Multiple inputs with Keras

We are now ready to build our final Keras model capable of handling both multiple inputs and mixed data. This is where the branches come together and ultimately where the “magic” happens. Training will also happen in this script.

Create a new file named

mixed_training.py
 , open it up, and insert the following code:
# import the necessary packages
from pyimagesearch import datasets
from pyimagesearch import models
from sklearn.model_selection import train_test_split
from keras.layers.core import Dense
from keras.models import Model
from keras.optimizers import Adam
from keras.layers import concatenate
import numpy as np
import argparse
import locale
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, required=True,
	help="path to input dataset of house images")
args = vars(ap.parse_args())

Our imports and command line arguments are handled first.

Notable imports include:

  • datasets
     : Our three convenience functions for loading/processing the CSV data and loading/pre-processing the house photos from the Houses Dataset.
  • models
     : Our MLP and CNN input branches which will serve as our multi-input, mixed data.
  • train_test_split
     : A scikit-learn function to construct our training/testing data splits.
  • concatenate
     : A special Keras function which will accept multiple inputs.
  • argparse
     : Handles parsing command line arguments.

We have one command line argument to parse on Lines 15-18,

--dataset
 , which is the path to where you downloaded the House Prices dataset.

Let’s load our numerical/categorical data and image data:

# construct the path to the input .txt file that contains information
# on each house in the dataset and then load the dataset
print("[INFO] loading house attributes...")
inputPath = os.path.sep.join([args["dataset"], "HousesInfo.txt"])
df = datasets.load_house_attributes(inputPath)

# load the house images and then scale the pixel intensities to the
# range [0, 1]
print("[INFO] loading house images...")
images = datasets.load_house_images(df, args["dataset"])
images = images / 255.0

Here we’ve loaded the House Prices dataset as a Pandas dataframe (Lines 23 and 24).

Then we’ve loaded our

images
  and scaled them to the range [0, 1] (Lines 29-30).

Be sure to review the

load_house_attributes
  and
load_house_images
  functions above if you need a reminder on what these functions are doing under the hood.

Now that our data is loaded, we’re going to construct our training/testing splits, scale the prices, and process the house attributes:

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
print("[INFO] processing data...")
split = train_test_split(df, images, test_size=0.25, random_state=42)
(trainAttrX, testAttrX, trainImagesX, testImagesX) = split

# find the largest house price in the training set and use it to
# scale our house prices to the range [0, 1] (will lead to better
# training and convergence)
maxPrice = trainAttrX["price"].max()
trainY = trainAttrX["price"] / maxPrice
testY = testAttrX["price"] / maxPrice

# process the house attributes data by performing min-max scaling
# on continuous features, one-hot encoding on categorical features,
# and then finally concatenating them together
(trainAttrX, testAttrX) = datasets.process_house_attributes(df,
	trainAttrX, testAttrX)

Our training and testing splits are constructed on Lines 35 and 36. We’ve allocated 75% of our data for training and 25% of our data for testing.

From there, we find the

maxPrice
  from the training set (Line 41) and scale the training and testing data accordingly (Lines 42 and 43). Having the pricing data in the range [0, 1] leads to better training and convergence.

Finally, we go ahead and process our house attributes by performing min-max scaling on continuous features and one-hot encoding on categorical features. The

process_house_attributes
  function handles these actions and concatenates the continuous and categorical features together, returning the results (Lines 48 and 49).

Ready for some magic?

Okay, I lied. There isn’t actually any “magic” going on in this next code block! But we will

concatenate
  the branches of our network and finish our multi-input Keras network:
# create the MLP and CNN models
mlp = models.create_mlp(trainAttrX.shape[1], regress=False)
cnn = models.create_cnn(64, 64, 3, regress=False)

# create the input to our final set of layers as the *output* of both
# the MLP and CNN
combinedInput = concatenate([mlp.output, cnn.output])

# our final FC layer head will have two dense layers, the final one
# being our regression head
x = Dense(4, activation="relu")(combinedInput)
x = Dense(1, activation="linear")(x)

# our final model will accept categorical/numerical data on the MLP
# input and images on the CNN input, outputting a single value (the
# predicted price of the house)
model = Model(inputs=[mlp.input, cnn.input], outputs=x)

Handling multiple inputs with Keras is quite easy when you’ve organized your code and models.

On Lines 52 and 53, we create our

mlp
  and
cnn
  models. Notice that
regress=False
  — our regression head comes later on Line 62.

We’ll then

concatenate
  the
mlp.output
  and
cnn.output
  as shown on Line 57. I’m calling this our
combinedInput
 because it is the input to the rest of the network (from Figure 3 this is
concatenate_1
  where the two branches come together).

The

combinedInput
  to the final layers in the network is based on the output of both the MLP and CNN branches’ final
4-dim
  FC layers (since each of the 2 branches outputs a 4-dim FC layer and then we concatenate them to create an 8-dim vector).
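
If you want to verify that dimensionality yourself, a tiny sketch using the Keras backend will confirm it (added here purely for illustration):

# a tiny sketch: confirm the concatenated tensor is 8-dim
from keras import backend as K

print(K.int_shape(combinedInput))   # expected: (None, 8)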

We tack on a fully connected layer with four neurons to the

combinedInput
  (Line 61).

Then we add our

"linear"
 
activation
  regression head (Line 62), the output of which is the predicted price.

Our

Model
  is defined using the
inputs
  of both branches as our multi-input and the final set of layers
x
  as the
output
  (Line 67).

Let’s go ahead and compile, train, and evaluate our newly formed

model
 :
# compile the model using mean absolute percentage error as our loss,
# implying that we seek to minimize the absolute percentage difference
# between our price *predictions* and the *actual prices*
opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss="mean_absolute_percentage_error", optimizer=opt)

# train the model
print("[INFO] training model...")
model.fit(
	[trainAttrX, trainImagesX], trainY,
	validation_data=([testAttrX, testImagesX], testY),
	epochs=200, batch_size=8)

# make predictions on the testing data
print("[INFO] predicting house prices...")
preds = model.predict([testAttrX, testImagesX])

Our

model
  is compiled with
"mean_absolute_percentage_error"
 
loss
  and an
Adam
  optimizer with learning rate
decay
  (Lines 72 and 73).

Training is kicked off on Lines 77-80. This is known as fitting the model (and is also where all the weights are tuned by the process known as backpropagation).
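
As a side note, mixed_training.py doesn’t plot training curves, but if you wanted them you could capture the History object returned by model.fit — a minimal sketch, where the matplotlib usage and output filename are my own additions:

# a minimal sketch: capture the History object and plot the curves
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

H = model.fit(
	[trainAttrX, trainImagesX], trainY,
	validation_data=([testAttrX, testImagesX], testY),
	epochs=200, batch_size=8)

plt.figure()
plt.plot(H.history["loss"], label="train_loss")
plt.plot(H.history["val_loss"], label="val_loss")
plt.xlabel("Epoch #")
plt.ylabel("Mean Absolute Percentage Error")
plt.legend()
plt.savefig("mixed_training_plot.png")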

Calling

model.predict
  on our testing data (Line 84) allows us to grab predictions for evaluating our model. Let’s perform evaluation now:
# compute the difference between the *predicted* house prices and the
# *actual* house prices, then compute the percentage difference and
# the absolute percentage difference
diff = preds.flatten() - testY
percentDiff = (diff / testY) * 100
absPercentDiff = np.abs(percentDiff)

# compute the mean and standard deviation of the absolute percentage
# difference
mean = np.mean(absPercentDiff)
std = np.std(absPercentDiff)

# finally, show some statistics on our model
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
print("[INFO] avg. house price: {}, std house price: {}".format(
	locale.currency(df["price"].mean(), grouping=True),
	locale.currency(df["price"].std(), grouping=True)))
print("[INFO] mean: {:.2f}%, std: {:.2f}%".format(mean, std))

To evaluate our model, we have computed absolute percentage difference (Lines 89-91) and used it to derive our final metrics (Lines 95 and 96).

These metrics (price mean, price standard deviation, and mean + standard deviation of the absolute percentage difference) are printed to the terminal with proper currency locale formatting (Lines 100-103).
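
One last note on these numbers: because we scaled the prices by maxPrice before training, the raw predictions are also in the range [0, 1]. If you ever want actual dollar figures out of the model, just multiply back by maxPrice — a small sketch:

# a small sketch: convert scaled predictions back into dollar amounts
predDollars = preds.flatten() * maxPrice
print("[INFO] first predicted price: {}".format(
	locale.currency(float(predDollars[0]), grouping=True)))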

Multi-input and mixed data results

Figure 8: Real estate price prediction is a difficult task, but our Keras multi-input + mixed input regression model yields relatively good results on our limited House Prices dataset.

Finally, we are ready to train our multi-input network on our mixed data!

Make sure you have:

  1. Configured your dev environment according to the first tutorial in this series.
  2. Used the “Downloads” section of this tutorial to download the source code.
  3. Downloaded the house prices dataset using the instructions in the “Obtaining the House Prices dataset” section above.

From there, open up a terminal and execute the following command to kick off training the network:

$ python mixed_training.py --dataset Houses-dataset/Houses\ Dataset/
[INFO] training model...
Train on 271 samples, validate on 91 samples
Epoch 1/200
271/271 [==============================] - 2s 8ms/step - loss: 240.2516 - val_loss: 118.1782
Epoch 2/200
271/271 [==============================] - 1s 5ms/step - loss: 195.8325 - val_loss: 95.3750
Epoch 3/200
271/271 [==============================] - 1s 5ms/step - loss: 121.5940 - val_loss: 85.1037
Epoch 4/200
271/271 [==============================] - 1s 5ms/step - loss: 103.2910 - val_loss: 72.1434
Epoch 5/200
271/271 [==============================] - 1s 5ms/step - loss: 82.3916 - val_loss: 61.9368
Epoch 6/200
271/271 [==============================] - 1s 5ms/step - loss: 81.3794 - val_loss: 59.7905
Epoch 7/200
271/271 [==============================] - 1s 5ms/step - loss: 71.3617 - val_loss: 58.8067
Epoch 8/200
271/271 [==============================] - 1s 5ms/step - loss: 72.7032 - val_loss: 56.4613
Epoch 9/200
271/271 [==============================] - 1s 5ms/step - loss: 52.0019 - val_loss: 54.7461
Epoch 10/200
271/271 [==============================] - 1s 5ms/step - loss: 62.4559 - val_loss: 49.1401
...
Epoch 190/200
271/271 [==============================] - 1s 5ms/step - loss: 16.0892 - val_loss: 22.8415
Epoch 191/200
271/271 [==============================] - 1s 5ms/step - loss: 16.1908 - val_loss: 22.5139
Epoch 192/200
271/271 [==============================] - 1s 5ms/step - loss: 16.9099 - val_loss: 22.5922
Epoch 193/200
271/271 [==============================] - 1s 5ms/step - loss: 18.6216 - val_loss: 26.9679
Epoch 194/200
271/271 [==============================] - 1s 5ms/step - loss: 16.5341 - val_loss: 23.1445
Epoch 195/200
271/271 [==============================] - 1s 5ms/step - loss: 16.4120 - val_loss: 26.1224
Epoch 196/200
271/271 [==============================] - 1s 5ms/step - loss: 16.4939 - val_loss: 23.1224
Epoch 197/200
271/271 [==============================] - 1s 5ms/step - loss: 15.6253 - val_loss: 22.2930
Epoch 198/200
271/271 [==============================] - 1s 5ms/step - loss: 16.0514 - val_loss: 23.6948
Epoch 199/200
271/271 [==============================] - 1s 5ms/step - loss: 17.9525 - val_loss: 22.9743
Epoch 200/200
271/271 [==============================] - 1s 5ms/step - loss: 16.0377 - val_loss: 22.4130
[INFO] predicting house prices...
[INFO] avg. house price: $533,388.27, std house price: $493,403.08
[INFO] mean: 22.41%, std: 20.11%

Our mean absolute percentage error starts off very high but continues to fall throughout the training process.

By the end of training, we are obtaining a 22.41% mean absolute percentage error on our testing set, implying that, on average, our network will be ~22% off in its house price predictions.

Let’s compare this result to our previous two posts in the series:

  1. Using just an MLP on the numerical/categorical data: 26.01%
  2. Using just a CNN on the image data: 56.91%

As you can see, working with mixed data by:

  1. Combining our numerical/categorical data along with image data
  2. And training a multi-input model on the mixed data…

…has led to a better performing model!

Summary

In this tutorial, you learned how to define a Keras network capable of accepting multiple inputs.

You learned how to work with mixed data using Keras as well.

To accomplish these goals we defined a multiple input neural network capable of accepting:

  • Numerical data
  • Categorical data
  • Image data

The numerical data was min-max scaled to the range [0, 1] prior to training. Our categorical data was one-hot encoded (also ensuring the resulting integer vectors were in the range [0, 1]).

The numerical and categorical data were then concatenated into a single feature vector to form the first input to the Keras network.

Our image data was also scaled to the range [0, 1] — this data served as the second input to the Keras network.

One branch of the model included strictly fully-connected layers (for the concatenated numerical and categorical data) while the second branch of the multi-input model was essentially a small Convolutional Neural Network.

The outputs of both branches were combined and a single output (the regression prediction) was defined.

In this manner, we were able to train our multiple input network end-to-end, resulting in better accuracy than using just one of the inputs alone.

I hope you enjoyed today’s blog post — if you ever need to work with multiple inputs and mixed data in your own projects definitely consider using the code covered in this tutorial as a template.

From there you can modify the code to your own needs.

To download the source code, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!


The post Keras: Multiple Inputs and Mixed Data appeared first on PyImageSearch.

Fashion MNIST with Keras and Deep Learning


In this tutorial you will learn how to train a simple Convolutional Neural Network (CNN) with Keras on the Fashion MNIST dataset, enabling you to classify fashion images and categories.

The Fashion MNIST dataset is meant to be a (slightly more challenging) drop-in replacement for the (less challenging) MNIST dataset.

Similar to the MNIST digit dataset, the Fashion MNIST dataset includes:

  • 60,000 training examples
  • 10,000 testing examples
  • 10 classes
  • 28×28 grayscale/single channel images

The ten fashion class labels include:

  1. T-shirt/top
  2. Trouser/pants
  3. Pullover shirt
  4. Dress
  5. Coat
  6. Sandal
  7. Shirt
  8. Sneaker
  9. Bag
  10. Ankle boot

Throughout this tutorial, you will learn how to train a simple Convolutional Neural Network (CNN) with Keras on the Fashion MNIST dataset, giving you not only hands-on experience working with the Keras library but also your first taste of clothing/fashion classification.

To learn how to train a Keras CNN on the Fashion MNIST dataset, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Fashion MNIST with Keras and Deep Learning

In the first part of this tutorial, we will review the Fashion MNIST dataset, including how to download it to your system.

From there we’ll define a simple CNN network using the Keras deep learning library.

Finally, we’ll train our CNN model on the Fashion MNIST dataset, evaluate it, and review the results.

Let’s go ahead and get started!

The Fashion MNIST dataset

Figure 1: The Fashion MNIST dataset was created by e-commerce company, Zalando, as a drop-in replacement for MNIST Digits. It is a great dataset to practice with when using Keras for deep learning. (image source)

The Fashion MNIST dataset was created by e-commerce company, Zalando.

As they note on their official GitHub repo for the Fashion MNIST dataset, there are a few problems with the standard MNIST digit recognition dataset:

  1. It’s far too easy for standard machine learning algorithms to obtain 97%+ accuracy.
  2. It’s even easier for deep learning models to achieve 99%+ accuracy.
  3. The dataset is overused.
  4. MNIST cannot represent modern computer vision tasks.

Zalando, therefore, created the Fashion MNIST dataset as a drop-in replacement for MNIST.

The Fashion MNIST dataset is identical to the MNIST dataset in terms of training set size, testing set size, number of class labels, and image dimensions:

  • 60,000 training examples
  • 10,000 testing examples
  • 10 classes
  • 28×28 grayscale images

If you’ve ever trained a network on the MNIST digit dataset then you can essentially change one or two lines of code and train the same network on the Fashion MNIST dataset!
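
To make that concrete, the swap can be as small as changing which dataset module you import — a minimal sketch:

# a minimal sketch: swapping MNIST for Fashion MNIST is a one-line change
# from keras.datasets import mnist
# ((trainX, trainY), (testX, testY)) = mnist.load_data()
from keras.datasets import fashion_mnist
((trainX, trainY), (testX, testY)) = fashion_mnist.load_data()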

How to install Keras

If you’re reading this tutorial, I’ll be assuming you have Keras installed. If not, be sure to follow Installing Keras for deep learning.

You’ll also need OpenCV and imutils installed. Pip is suitable and you can follow my pip install opencv tutorial to get started.

The last tools you’ll need are scikit-learn and matplotlib:

$ pip install scikit-learn
$ pip install matplotlib

Obtaining the Fashion MNIST dataset

Figure 2: The Fashion MNIST dataset is built right into Keras. Alternatively, you can download it from GitHub. (image source)

There are two ways to obtain the Fashion MNIST dataset.

If you are using the Keras deep learning library, the Fashion MNIST dataset is actually built directly into the datasets module of Keras:

from keras.datasets import fashion_mnist
((trainX, trainY), (testX, testY)) = fashion_mnist.load_data()

Otherwise, if you are using another deep learning library, you can download it directly from the official Fashion MNIST GitHub repo.

A big thanks to Margaret Maynard-Reid for putting together the awesome illustration in Figure 2.

Project structure

To follow along, be sure to grab the “Downloads” for today’s blog post.

Once you’ve unzipped the files, your directory structure will look like this:

$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   └── minivggnet.py
├── fashion_mnist.py
└── plot.png

1 directory, 4 files

Our project today is rather straightforward — we’re reviewing two Python files:

  • pyimagesearch/minivggnet.py
     : Contains a simple CNN based on VGGNet.
  • fashion_mnist.py
     : Our training script for Fashion MNIST classification with Keras and deep learning. This script will load the data (remember, it is built into Keras), and train our MiniVGGNet model. A classification report and montage will be generated upon training completion.

Defining a simple Convolutional Neural Network (CNN)

Today we’ll be defining a very simple Convolutional Neural Network to train on the Fashion MNIST dataset.

We’ll call this CNN “MiniVGGNet” since:

  • The model is inspired by its bigger brother, VGGNet
  • The model has VGGNet characteristics, including:
    • Only using 3×3 CONV filters
    • Stacking multiple CONV layers before applying a max-pooling operation

We’ve used the MiniVGGNet model before a handful of times on the PyImageSearch blog but we’ll briefly review it here today as a matter of completeness.

Open up a new file, name it

minivggnet.py
, and insert the following code:
# import the necessary packages
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras import backend as K

class MiniVGGNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

Our Keras imports are listed on Lines 2-10. Our Convolutional Neural Network model is relatively simple, but we will be taking advantage of batch normalization and dropout which are two methods I nearly always recommend. For further reading please take a look at Deep Learning for Computer Vision with Python.

Our

MiniVGGNet
  class and its 
build
  method are defined on Lines 12-14. The
build
  function accepts four parameters:
  • width
     : Image width in pixels.
  • height
     : Image height in pixels.
  • depth
     : Number of channels. Typically for color this value is 
    3
      and for grayscale it is
    1
      (the Fashion MNIST dataset is grayscale).
  • classes
     : The number of types of fashion articles we can recognize. The number of classes affects the final fully-connected output layer. For the Fashion MNIST dataset there are a total of
    10
      classes.

Our

model
  is initialized on Line 17 using the
Sequential
  API.

From there, our

inputShape
  is defined (Line 18). We’re going to use
"channels_last"
  ordering since our backend is TensorFlow, but in case you’re using a different backend, Lines 23-25 will accommodate.

Now let’s add our layers to the CNN:

# first CONV => RELU => CONV => RELU => POOL layer set
		model.add(Conv2D(32, (3, 3), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(32, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

		# second CONV => RELU => CONV => RELU => POOL layer set
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

		# first (and only) set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(512))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# softmax classifier
		model.add(Dense(classes))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model

Our

model
  has two sets of
(CONV => RELU => BN) * 2 => POOL
  layers (Lines 28-46). These layer sets also include batch normalization and dropout.

Convolutional layers, including their parameters, are described in detail in this previous post.

Pooling layers help to progressively reduce the spatial dimensions of the input volume.

Batch normalization, as the name suggests, seeks to normalize the activations of a given input volume before passing it into the next layer. It has been shown to be effective at reducing the number of epochs required to train a CNN at the expense of an increase in per-epoch time.

Dropout is a form of regularization that aims to prevent overfitting. Random connections are dropped to ensure that no single node in the network is responsible for activating when presented with a given pattern.

What follows is a fully-connected layer and softmax classifier (Lines 49-57). The softmax classifier is used to obtain output classification probabilities.

The

model
  is then returned on Line 60.

For further reading about building models with Keras, please refer to my Keras Tutorial and Deep Learning for Computer Vision with Python.

Implementing the Fashion MNIST training script with Keras

Now that MiniVGGNet is implemented we can move on to the driver script which:

  1. Loads the Fashion MNIST dataset.
  2. Trains MiniVGGNet on Fashion MNIST + generates a training history plot.
  3. Evaluates the resulting model and outputs a classification report.
  4. Creates a montage visualization allowing us to see our results visually.

Create a new file named

fashion_mnist.py
, open it up, and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.minivggnet import MiniVGGNet
from sklearn.metrics import classification_report
from keras.optimizers import SGD
from keras.datasets import fashion_mnist
from keras.utils import np_utils
from keras import backend as K
from imutils import build_montages
import matplotlib.pyplot as plt
import numpy as np
import cv2

# initialize the number of epochs to train for, base learning rate,
# and batch size
NUM_EPOCHS = 25
INIT_LR = 1e-2
BS = 32

We begin by importing necessary packages, modules, and functions on Lines 2-15:

  • The
    "Agg"
      backend is used for Matplotlib so that we can save our training plot to disk (Line 3).
  • Our
    MiniVGGNet
      CNN (defined in
    minivggnet.py
      in the previous section) is imported on Line 6.
  • We’ll use scikit-learn’s
    classification_report
      to print final classification statistics/accuracies (Line 7).
  • Our Keras imports, including our
    fashion_mnist
      dataset, are grabbed on Lines 8-11.
  • The
    build_montages
      function from imutils will be used for visualization (Line 12).
  • Finally,
    matplotlib
     ,
    numpy
      and OpenCV (
    cv2
     ) are also imported (Lines 13-15).

Three hyperparameters are set on Lines 19-21, including our:

  1. Learning rate
  2. Batch size
  3. Number of epochs we’ll train for

Let’s go ahead and load the Fashion MNIST dataset and reshape it if necessary:

# grab the Fashion MNIST dataset (if this is your first time running
# this the dataset will be automatically downloaded)
print("[INFO] loading Fashion MNIST...")
((trainX, trainY), (testX, testY)) = fashion_mnist.load_data()

# if we are using "channels first" ordering, then reshape the design
# matrix such that the matrix is:
# 	num_samples x depth x rows x columns
if K.image_data_format() == "channels_first":
	trainX = trainX.reshape((trainX.shape[0], 1, 28, 28))
	testX = testX.reshape((testX.shape[0], 1, 28, 28))
 
# otherwise, we are using "channels last" ordering, so the design
# matrix shape should be: num_samples x rows x columns x depth
else:
	trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
	testX = testX.reshape((testX.shape[0], 28, 28, 1))

The Fashion MNIST dataset we’re using is loaded from disk on Line 26. If this is the first time you’ve used the Fashion MNIST dataset then Keras will automatically download and cache Fashion MNIST for you.

Additionally, Fashion MNIST is already organized into training/testing splits, so today we aren’t using scikit-learn’s

train_test_split
  function that you’d normally see here.

From there we go ahead and re-order our data based on

"channels_first"
  or
"channels_last"
  image data formats (Lines 31-39). The ordering largely depends upon your backend. I’m using TensorFlow as the backend to Keras, which I presume you are using as well.

Let’s go ahead and preprocess + prepare our data:

# scale data to the range of [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

# one-hot encode the training and testing labels
trainY = np_utils.to_categorical(trainY, 10)
testY = np_utils.to_categorical(testY, 10)

# initialize the label names
labelNames = ["top", "trouser", "pullover", "dress", "coat",
	"sandal", "shirt", "sneaker", "bag", "ankle boot"]

Here our pixel intensities are scaled to the range [0, 1] (Lines 42 and 43). We then one-hot encode the labels (Lines 46 and 47).

Here is an example of one-hot encoding based on the

labelNames
  on Lines 50 and 51:
  • “T-shirt/top”:
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  • “bag”:
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
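
Here is a quick sketch of how those vectors are produced in code (the two label values below are illustrative):

# Quick sketch of what np_utils.to_categorical does to integer labels
from keras.utils import np_utils
import numpy as np

labels = np.array([0, 8])                     # "top" (index 0) and "bag" (index 8)
oneHot = np_utils.to_categorical(labels, 10)
print(oneHot[0])   # [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
print(oneHot[1])   # [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]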

Let’s go ahead and fit our

model
 :
# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=INIT_LR, momentum=0.9, decay=INIT_LR / NUM_EPOCHS)
model = MiniVGGNet.build(width=28, height=28, depth=1, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
print("[INFO] training model...")
H = model.fit(trainX, trainY,
	validation_data=(testX, testY),
	batch_size=BS, epochs=NUM_EPOCHS)

On Lines 55-58 our

model
  is initialized and compiled with the Stochastic Gradient Descent (
SGD
 ) optimizer and learning rate decay.

From there the

model
  is trained via the call to
model.fit
  on Lines 62-64.

After training for

NUM_EPOCHS
 , we’ll go ahead and evaluate our network + generate a training plot:
# make predictions on the test set
preds = model.predict(testX)

# show a nicely formatted classification report
print("[INFO] evaluating network...")
print(classification_report(testY.argmax(axis=1), preds.argmax(axis=1),
	target_names=labelNames))

# plot the training loss and accuracy
N = NUM_EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig("plot.png")

To evaluate our network, we’ve made predictions on the testing set (Line 67) and then printed a

classification_report
  in our terminal (Lines 71 and 72).

Training history is plotted and output to disk (Lines 75-86).

As if what we’ve done so far hasn’t been fun enough, we’re now going to visualize our results!

# initialize our list of output images
images = []

# randomly select a few testing fashion items
for i in np.random.choice(np.arange(0, len(testY)), size=(16,)):
	# classify the clothing
	probs = model.predict(testX[np.newaxis, i])
	prediction = probs.argmax(axis=1)
	label = labelNames[prediction[0]]
 
	# extract the image from the testData if using "channels_first"
	# ordering
	if K.image_data_format() == "channels_first":
		image = (testX[i][0] * 255).astype("uint8")
 
	# otherwise we are using "channels_last" ordering
	else:
		image = (testX[i] * 255).astype("uint8")

To do so, we:

  • Sample a set of the testing images via
    random
      sampling, looping over them individually (Line 92).
  • Make a prediction on each of the
    random
      testing images and determine the 
    label
      name (Lines 94-96).
  • Based on channel ordering, grab the
    image
      itself (Lines 100-105).

Now let’s add a colored label to each image and arrange them in a montage:

# initialize the text label color as green (correct)
	color = (0, 255, 0)

	# otherwise, the class label prediction is incorrect
	if prediction[0] != np.argmax(testY[i]):
		color = (0, 0, 255)
 
	# merge the channels into one image and resize the image from
	# 28x28 to 96x96 so we can better see it and then draw the
	# predicted label on the image
	image = cv2.merge([image] * 3)
	image = cv2.resize(image, (96, 96), interpolation=cv2.INTER_LINEAR)
	cv2.putText(image, label, (5, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.75,
		color, 2)

	# add the image to our list of output images
	images.append(image)

# construct the montage for the images
montage = build_montages(images, (96, 96), (4, 4))[0]

# show the output montage
cv2.imshow("Fashion MNIST", montage)
cv2.waitKey(0)

Here we:

  • Initialize our label  
    color
      as green for “correct” and red for “incorrect” classification (Lines 108-112).
  • Create a 3-channel image by merging the grayscale
    image
      three times (Line 117).
  • Enlarge the
    image
      (Line 118) and draw a
    label
      on it (Lines 119-120).
  • Add each
    image
      to the
    images
      list (Line 123)

Once the

images
  have all been annotated via the steps in the
for
  loop, our OpenCV montage is built via Line 126.

Finally, the visualization is displayed until a keypress is detected (Lines 129 and 130).

Fashion MNIST results

We are now ready to train our Keras CNN on the Fashion MNIST dataset!

Make sure you have used the “Downloads” section of this blog post to download the source code and project structure.

From there, open up a terminal, navigate to where you downloaded the code, and execute the following command:

$ python fashion_mnist.py
Using TensorFlow backend.
[INFO] loading Fashion MNIST...
[INFO] compiling model...
[INFO] training model...
Train on 60000 samples, validate on 10000 samples
Epoch 1/25
60000/60000 [==============================] - 28s 460us/step - loss: 0.5227 - acc: 0.8241 - val_loss: 0.3165 - val_acc: 0.8837
Epoch 2/25
60000/60000 [==============================] - 26s 429us/step - loss: 0.3327 - acc: 0.8821 - val_loss: 0.2523 - val_acc: 0.9083
Epoch 3/25
60000/60000 [==============================] - 26s 429us/step - loss: 0.2870 - acc: 0.8955 - val_loss: 0.2464 - val_acc: 0.9107
...
Epoch 23/25
60000/60000 [==============================] - 26s 430us/step - loss: 0.1691 - acc: 0.9378 - val_loss: 0.1791 - val_acc: 0.9358
Epoch 24/25
60000/60000 [==============================] - 26s 430us/step - loss: 0.1693 - acc: 0.9374 - val_loss: 0.1819 - val_acc: 0.9349
Epoch 25/25
60000/60000 [==============================] - 26s 430us/step - loss: 0.1679 - acc: 0.9391 - val_loss: 0.1802 - val_acc: 0.9352
[INFO] evaluating network...
              precision    recall  f1-score   support

         top       0.88      0.89      0.89      1000
     trouser       1.00      0.99      0.99      1000
    pullover       0.90      0.92      0.91      1000
       dress       0.92      0.94      0.93      1000
        coat       0.92      0.89      0.90      1000
      sandal       0.99      0.99      0.99      1000
       shirt       0.81      0.80      0.81      1000
     sneaker       0.96      0.98      0.97      1000
         bag       0.99      0.99      0.99      1000
  ankle boot       0.98      0.97      0.97      1000

   micro avg       0.94      0.94      0.94     10000
   macro avg       0.94      0.94      0.94     10000
weighted avg       0.94      0.94      0.94     10000

Figure 3: Our Keras + deep learning Fashion MNIST training plot contains the accuracy/loss curves for training and validation.

Here you can see that our network obtained 94% accuracy on the testing set.

The model classified the “trouser” class 100% correctly but seemed to struggle quite a bit with the “shirt” class (~81% accurate).

According to our plot in Figure 3, there appears to be very little overfitting.

A deeper architecture with data augmentation would likely lead to higher accuracy.

Below I have included a sample of fashion classifications:

Figure 4: The results of training a Keras deep learning model (based on VGGNet, but smaller in size/complexity) using the Fashion MNIST dataset.

As you can see our network is performing quite well at fashion recognition.

Will this model work for fashion images outside the Fashion MNIST dataset?

Figure 5: In a previous tutorial I’ve shared a separate fashion-related tutorial about using Keras for multi-output deep learning classification — be sure to give it a look if you want to build a more robust fashion recognition model.

At this point, you are probably wondering whether the model we just trained on the Fashion MNIST dataset would be directly applicable to images outside the Fashion MNIST dataset.

The short answer is “No, unfortunately not.”

The longer answer requires a bit of explanation.

To start, keep in mind that the Fashion MNIST dataset is meant to be a drop-in replacement for the MNIST dataset, implying that our images have already been processed.

Each image has been:

  • Converted to grayscale.
  • Segmented, such that all background pixels are black and all foreground pixels are some gray, non-black pixel intensity.
  • Resized to 28×28 pixels.

For real-world fashion and clothing images, you would have to preprocess your data in the same manner as the Fashion MNIST dataset.
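
Below is a rough sketch of what that preprocessing might look like with OpenCV. The file name is hypothetical, and the naive Otsu thresholding used for "segmentation" will only work for photos with clean, uniform backgrounds:

# Rough preprocessing sketch (hypothetical file name, naive segmentation)
# to roughly match the Fashion MNIST format
import cv2

image = cv2.imread("shirt.jpg")                  # hypothetical real-world photo
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # 1. convert to grayscale
# 2. naive segmentation: assume a bright, uniform background and invert it
#    so background pixels become black (real photos need far more work)
_, mask = cv2.threshold(gray, 0, 255,
	cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
segmented = cv2.bitwise_and(gray, gray, mask=mask)
resized = cv2.resize(segmented, (28, 28))        # 3. resize to 28x28 pixels
data = resized.astype("float32") / 255.0         # scale to [0, 1]
data = data.reshape(1, 28, 28, 1)                # add batch + channel dimensions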

And furthermore, even if you could preprocess your dataset in the exact same manner, the model still might not be transferable to real-world images.

Instead, you should train a CNN on example images that will mimic the images the CNN “sees” when deployed to a real-world situation.

To do that you will likely need to utilize multi-label classification and multi-output networks.

For more details on both of these techniques be sure to refer to the following tutorials:

  1. Multi-label classification with Keras
  2. Keras: Multiple outputs and multiple losses

Summary

In this tutorial, you learned how to train a simple CNN on the Fashion MNIST dataset using Keras.
The Fashion MNIST dataset is meant to be a drop-in replacement for the standard MNIST digit recognition dataset, including:

  • 60,000 training examples
  • 10,000 testing examples
  • 10 classes
  • 28×28 grayscale images

While the Fashion MNIST dataset is slightly more challenging than the MNIST digit recognition dataset, unfortunately, it cannot be used directly in real-world fashion classification tasks, unless you preprocess your images in the exact same manner as Fashion MNIST (segmentation, thresholding, grayscale conversion, resizing, etc.).

In most real-world fashion applications mimicking the Fashion MNIST pre-processing steps will be near impossible.

You can and should use Fashion MNIST as a drop-in replacement for the MNIST digit dataset; however, if you are interested in actually recognizing fashion items in real-world images you should refer to the following two tutorials:

  1. Multi-label classification with Keras
  2. Keras: Multiple outputs and multiple losses

Both of the tutorials linked to above will guide you in building a more robust fashion classification system.

I hope you enjoyed today’s post!

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Fashion MNIST with Keras and Deep Learning appeared first on PyImageSearch.

Breast cancer classification with Keras and Deep Learning


In this tutorial, you will learn how to train a Keras deep learning model to predict breast cancer in breast histology images.

Back in 2012-2013, I was working for the National Institutes of Health (NIH) and the National Cancer Institute (NCI) to develop a suite of image processing and machine learning algorithms to automatically analyze breast histology images for cancer risk factors, a task that took trained pathologists hours to complete. Our work helped facilitate further advancements in breast cancer risk factor prediction.

Back then deep learning was not as popular and “mainstream” as it is now. For example, the ImageNet image classification challenge had only launched in 2009 and it wasn’t until 2012 that Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton won the competition with the now famous AlexNet architecture.

To analyze the cellular structures in the breast histology images we were instead leveraging basic computer vision and image processing algorithms, but combining them in a novel way. These algorithms worked really well — but also required quite a bit of work to put together.

Today I thought it would be worthwhile to explore deep learning in the context of breast cancer classification.

Just last year a close family member of mine was diagnosed with cancer. And similarly, I would be willing to bet that every single reader of this blog knows someone who has had cancer at some point as well.

As deep learning researchers, practitioners, and engineers it’s important for us to gain hands-on experience applying deep learning to medical and computer vision problems — this experience can help us develop deep learning algorithms to better aid pathologists in predicting cancer.

To learn how to train a Keras deep learning model for breast cancer prediction, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Breast cancer classification with Keras and Deep Learning

In the first part of this tutorial, we will be reviewing our breast cancer histology image dataset.

From there we’ll create a Python script to split the input dataset into three sets:

  1. A training set
  2. A validation set
  3. A testing set

Next, we’ll use Keras to define a Convolutional Neural Network which we’ll appropriately name “CancerNet”.

Finally, we’ll create a Python script to train CancerNet on our breast histology images.

We’ll wrap the blog post by reviewing our results.

The breast cancer histology image dataset

Figure 1: The Kaggle Breast Histopathology Images dataset was curated by Janowczyk and Madabhushi and Roa et al. The most common form of breast cancer, Invasive Ductal Carcinoma (IDC), will be classified with deep learning and Keras.

The dataset we are using for today’s post is for Invasive Ductal Carcinoma (IDC), the most common form of breast cancer.

The dataset was originally curated by Janowczyk and Madabhushi and Roa et al. but is available in public domain on Kaggle’s website.

The original dataset consisted of 162 slide images scanned at 40x.

Slide images are naturally massive (in terms of spatial dimensions), so in order to make them easier to work with, a total of 277,524 patches of 50×50 pixels were extracted, including:

  • 198,738 negative examples (i.e., no breast cancer)
  • 78,786 positive examples (i.e., indicating breast cancer was found in the patch)

There is clearly an imbalance in the class data, with over 2x more negative data points than positive data points.

Each image in the dataset has a specific filename structure. An example of an image filename in the dataset can be seen below:

10253_idx5_x1351_y1101_class0.png

We can interpret this filename as follows (a small parsing sketch is included after the list):

  • Patient ID: 10253_idx5
  • x-coordinate of the crop: 1,351
  • y-coordinate of the crop: 1,101
  • Class label: 0 (0 indicates no IDC while 1 indicates IDC)
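
If you ever need to pull these fields out programmatically, a tiny helper could look like the one below (this function is hypothetical and not part of today’s scripts; it is only here to make the naming scheme concrete):

# Hypothetical helper for parsing a Kaggle IDC patch filename -- not part
# of today's scripts, just shown to make the naming scheme concrete
def parse_idc_filename(filename):
	# example: "10253_idx5_x1351_y1101_class0.png"
	name = filename.rsplit(".", 1)[0]
	parts = name.split("_")
	patientID = "_".join(parts[:2])             # "10253_idx5"
	x = int(parts[2][1:])                       # 1351
	y = int(parts[3][1:])                       # 1101
	label = int(parts[4].replace("class", ""))  # 0 (no IDC) or 1 (IDC)
	return (patientID, x, y, label)

print(parse_idc_filename("10253_idx5_x1351_y1101_class0.png"))
# ('10253_idx5', 1351, 1101, 0)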

Figure 1 above shows examples of both positive and negative samples — our goal is to train a deep learning model capable of discerning the difference between the two classes.

Preparing your deep learning environment for Cancer classification

All of the Python packages you will use here today are installable via pip, a Python package manager.

I recommend that you install them into a virtual environment for this project, or that you add them to one of your existing data science environments. Virtual environments are outside the scope of today’s blog post, but all of my installation guides will show you how to set them up.

If you need to set up a full blown deep learning system using recent OS’es, including macOS Mojave or Ubuntu 18.04, visit the respective links.

Here’s the gist of what you’ll need after your system prerequisites and virtual environment are ready (provided you are using a Python virtual environment, of course):

$ workon <env_name> #if you are using a virtualenv
$ pip install numpy opencv-contrib-python
$ pip install tensorflow keras
$ pip install imutils
$ pip install scikit-learn matplotlib

Note: None of our scripts today require OpenCV, but

imutils
  has an OpenCV dependency.

Project structure

Go ahead and grab the “Downloads” for today’s blog post.

From there, unzip the file:

$ cd path/to/downloaded/zip
$ unzip breast-cancer-classification.zip

Now that you have the files extracted, it’s time to put the dataset inside of the directory structure.

Go ahead and make the following directories:

$ cd breast-cancer-classification
$ mkdir datasets
$ mkdir datasets/orig

Then, head on over to Kaggle’s website and log-in. From there you can click the following link to download the dataset into your project folder:

Click here to download the data from Kaggle.

Note: You will need to create an account on Kaggle’s website (if you don’t already have an account) to download the dataset.

Be sure to save the .zip file in the

breast-cancer-classification/datasets/orig
  folder.

Now head back to your terminal, navigate to the directory you just created, and unzip the data:

$ cd path/to/breast-cancer-classification/datasets/orig
$ unzip IDC_regular_ps50_idx5.zip

And from there, let’s go back to the project directory and use the

tree
  command to inspect our project structure:
$ cd ../..
$ tree --dirsfirst -L 4
.
├── datasets
│   └── orig
│       ├── 10253
│       │   ├── 0
│       │   └── 1
│       ├── 10254
│       │   ├── 0
│       │   └── 1
│       ├── 10255
│       │   ├── 0
│       │   └── 1
...[omitting similar folders]
│       ├── 9381
│       │   ├── 0
│       │   └── 1
│       ├── 9382
│       │   ├── 0
│       │   └── 1
│       ├── 9383
│       │   ├── 0
│       │   └── 1
│       └── IDC_regular_ps50_idx5.zip
├── pyimagesearch
│   ├── __init__.py
│   ├── config.py
│   └── cancernet.py
├── build_dataset.py
├── train_model.py
└── plot.png

840 directories, 7 files

As you can see, our dataset is in the

datasets/orig
  folder and is then broken out by faux patient ID. These images are separated into either benign (
0/
 ) or malignant (
1/
 ) directories.

Today’s

pyimagesearch/
  module contains our configuration and CancerNet.

Today we’ll review the following Python files in this order:

  • config.py
     : Contains our configuration that will be used by both our dataset builder and model trainer.
  • build_dataset.py
     : Builds our dataset by splitting images into training, validation, and testing sets.
  • cancernet.py
     : Contains our CancerNet breast cancer classification CNN.
  • train_model.py
     : Responsible for training and evaluating our Keras breast cancer classification model.

The configuration file

Before we can build our dataset and train our network let’s review our configuration file.

For deep learning projects that span multiple Python files (such as this one), I like to create a single Python configuration file that stores all relevant configurations.

Let’s go ahead and take a look at

config.py
 :
# import the necessary packages
import os

# initialize the path to the *original* input directory of images
ORIG_INPUT_DATASET = "datasets/orig"

# initialize the base path to the *new* directory that will contain
# our images after computing the training and testing split
BASE_PATH = "datasets/idc"

# derive the training, validation, and testing directories
TRAIN_PATH = os.path.sep.join([BASE_PATH, "training"])
VAL_PATH = os.path.sep.join([BASE_PATH, "validation"])
TEST_PATH = os.path.sep.join([BASE_PATH, "testing"])

# define the amount of data that will be used training
TRAIN_SPLIT = 0.8

# the amount of validation data will be a percentage of the
# *training* data
VAL_SPLIT = 0.1

First, our configuration file contains the path to the original input dataset downloaded from Kaggle (Line 5).

From there we specify the base path to where we’re going to store our image files after creating the training, testing, and validation splits (Line 9).

Using the

BASE_PATH
 , we derive paths to training, validation, and testing output directories (Lines 12-14).

Our

TRAIN_SPLIT
  is the percentage of data that will be used for training (Line 17). Here I’ve set it to 80%, where the remaining 20% will be used for testing.

Of the training data, we’ll reserve some images for validation. Line 21 specifies that 10% of the training data (after we’ve split off the testing data) will be used for validation.
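
Concretely, with the 277,524 images in the dataset, these two percentages work out as follows (this mirrors the index math our dataset builder script will use):

# Rough arithmetic behind the splits for the 277,524 images in the dataset
total = 277524
i = int(total * 0.8)        # 222,019 images in the training + validation pool
numTest = total - i         # 55,505 testing images
numVal = int(i * 0.1)       # 22,201 validation images
numTrain = i - numVal       # 199,818 training images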

We’re now armed with the information required to build our breast cancer image dataset, so let’s move on.

Building the breast cancer image dataset

Figure 2: We will split our deep learning breast cancer image dataset into training, validation, and testing sets. While this 5.8GB deep learning dataset isn’t large compared to most datasets, I’m going to treat it like it is so you can learn by example. Thus, we will use the opportunity to put the Keras ImageDataGenerator to work, yielding small batches of images. This eliminates the need to have the whole dataset in memory.

Our breast cancer image dataset consists of 277,524 images, each of which is 50×50 pixels.

If we were to try to load this entire dataset in memory at once we would need a little over 5.8GB.

For most modern machines, especially machines with GPUs, 5.8GB is a reasonable size; however, I’ll be making the assumption that your machine does not have that much memory.

Instead, we’ll organize our dataset on disk so we can use Keras’ ImageDataGenerator class to yield batches of images from disk without having to keep the entire dataset in memory.

But first we need to organize our dataset. Let’s build a script to do so now.

Open up the

build_dataset.py
  file and insert the following code:
# import the necessary packages
from pyimagesearch import config
from imutils import paths
import random
import shutil
import os

# grab the paths to all input images in the original input directory
# and shuffle them
imagePaths = list(paths.list_images(config.ORIG_INPUT_DATASET))
random.seed(42)
random.shuffle(imagePaths)

# compute the training and testing split
i = int(len(imagePaths) * config.TRAIN_SPLIT)
trainPaths = imagePaths[:i]
testPaths = imagePaths[i:]

# we'll be using part of the training data for validation
i = int(len(trainPaths) * config.VAL_SPLIT)
valPaths = trainPaths[:i]
trainPaths = trainPaths[i:]

# define the datasets that we'll be building
datasets = [
	("training", trainPaths, config.TRAIN_PATH),
	("validation", valPaths, config.VAL_PATH),
	("testing", testPaths, config.TEST_PATH)
]

This script requires that we

import
  our
config
  settings and
paths
  for collecting all the image paths. We also will use
random
  to randomly shuffle our paths,
shutil
  to copy images, and
os
  for joining paths and making directories. Each of these imports is listed on Lines 2-6.

To begin, we’ll grab all the

imagePaths
  for our dataset and
shuffle
  them (Lines 10-12).

We then compute the index of the training/testing split (Line 15). Using that index,

i
 , our
trainPaths
  and
testPaths
  are constructed via slicing the
imagePaths
  (Lines 16 and 17).

Our

trainPaths
  are further split, this time reserving a portion for validation,
valPaths
  (Lines 20-22).

Lines 25-29 define a list called

datasets
 . Inside are three tuples, each with the information required to organize all of our
imagePaths
  into training, validation, and testing data.

Let’s go ahead and loop over the

datasets
  list now:
# loop over the datasets
for (dType, imagePaths, baseOutput) in datasets:
	# show which data split we are creating
	print("[INFO] building '{}' split".format(dType))

	# if the output base output directory does not exist, create it
	if not os.path.exists(baseOutput):
		print("[INFO] 'creating {}' directory".format(baseOutput))
		os.makedirs(baseOutput)

	# loop over the input image paths
	for inputPath in imagePaths:
		# extract the filename of the input image and extract the
		# class label ("0" for "negative" and "1" for "positive")
		filename = inputPath.split(os.path.sep)[-1]
		label = filename[-5:-4]

		# build the path to the label directory
		labelPath = os.path.sep.join([baseOutput, label])

		# if the label output directory does not exist, create it
		if not os.path.exists(labelPath):
			print("[INFO] 'creating {}' directory".format(labelPath))
			os.makedirs(labelPath)

		# construct the path to the destination image and then copy
		# the image itself
		p = os.path.sep.join([labelPath, filename])
		shutil.copy2(inputPath, p)

On Line 32, we define a loop over our dataset splits. Inside, we:

  • Create the base output directory (Lines 37-39).
  • Implement a nested loop over all input images in the current split (Line 42):
    • Extract the
      filename
        from the input path (Line 45) and then extract the class
      label
        from the filename (Line 46).
    • Build our output
      labelPath
        as well as create the label output directory (Lines 49-54).
    • And finally, copy each file into its destination (Lines 58 and 59).

Now that our script is coded up, go ahead and create the training, testing, and validation split directory structure by executing the following command:

$ python build_dataset.py
[INFO] building 'training' split
[INFO] 'creating datasets/idc/training' directory
[INFO] 'creating datasets/idc/training/0' directory
[INFO] 'creating datasets/idc/training/1' directory
[INFO] building 'validation' split
[INFO] 'creating datasets/idc/validation' directory
[INFO] 'creating datasets/idc/validation/0' directory
[INFO] 'creating datasets/idc/validation/1' directory
[INFO] building 'testing' split
[INFO] 'creating datasets/idc/testing' directory
[INFO] 'creating datasets/idc/testing/0' directory
[INFO] 'creating datasets/idc/testing/1' directory
$ 
$ tree --dirsfirst --filelimit 10
.
├── datasets
│   ├── idc
│   │   ├── training
│   │   │   ├── 0 [143065 entries]
│   │   │   └── 1 [56753 entries]
│   │   ├── validation
│   │   │   ├── 0 [15962 entries]
│   │   │   └── 1 [6239 entries]
│   │   └── testing
│   │       ├── 0 [39711 entries]
│   │       └── 1 [15794 entries]
│   └── orig [280 entries]
├── pyimagesearch
│   ├── __init__.py
│   ├── config.py
│   └── cancernet.py
├── build_dataset.py
├── train_model.py
└── plot.png

14 directories, 8 files

The output of our script is shown under the command.

I’ve also executed the

tree
  command again so you can see how our dataset is now structured into our training, validation, and testing sets.

Note: I didn’t bother expanding our original

datasets/orig/
  structure — you can scroll up to the “Project Structure” section if you need a refresher.

CancerNet: Our breast cancer prediction CNN

Figure 3: Our Keras deep learning classification architecture for predicting breast cancer (click to expand)

The next step is to implement the CNN architecture we are going to use for this project.

To implement the architecture I used the Keras deep learning library and designed a network appropriately named “CancerNet” which:

  1. Uses exclusively 3×3 CONV filters, similar to VGGNet
  2. Stacks multiple 3×3 CONV filters on top of each other prior to performing max-pooling (again, similar to VGGNet)
  3. But unlike VGGNet, uses depthwise separable convolution rather than standard convolution layers

Depthwise separable convolution is not a “new” idea in deep learning.

In fact, they were first utilized by Laurent Sifre, a Google Brain intern, in 2013.

Andrew Howard utilized them in 2015 when working with MobileNet.

And perhaps most notably, Francois Chollet used them in 2016-2017 when creating the famous Xception architecture.

A detailed explanation of the differences between standard convolution layers and depthwise separable convolution is outside the scope of this tutorial (for that, refer to this guide), but the gist is that depthwise separable convolution (a quick parameter-count comparison follows the list below):

  1. Is more efficient.
  2. Requires less memory.
  3. Requires less computation.
  4. Can perform better than standard convolution in some situations.
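
To give a rough sense of the savings, here is a back-of-the-envelope comparison (biases ignored) for a single 3×3 layer mapping 32 input channels to 64 output channels:

# Back-of-the-envelope weight counts (biases ignored) for one 3x3 layer
# mapping 32 input channels to 64 output channels
standard = 3 * 3 * 32 * 64            # standard convolution: 18,432 weights
depthwise = 3 * 3 * 32                # depthwise step: one 3x3 filter per input channel
pointwise = 1 * 1 * 32 * 64           # pointwise (1x1) step mixes the channels
separable = depthwise + pointwise     # 288 + 2,048 = 2,336 weights
print(standard / float(separable))    # ~7.9x fewer weights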

I haven’t used depthwise separable convolution in any tutorials here on PyImageSearch so I thought it would be fun to play with it today.

With that said, let’s get started implementing CancerNet!

Open up the

cancernet.py
  file and insert the following code:
# import the necessary packages
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import SeparableConv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras import backend as K

class CancerNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

Our Keras imports are listed on Lines 2-10. We’ll be using Keras’

Sequential
  API to build
CancerNet
 .

An import you haven’t seen on the PyImageSearch blog is

SeparableConv2D
 . This convolutional layer type allows for depthwise convolutions. For further details, please refer to the documentation.

The remaining imports/layer types are all discussed in both my introductory Keras Tutorial and in even greater detail inside of Deep Learning for Computer Vision with Python.

Let’s go ahead and define our

CancerNet
  class on Line 12 and then proceed to
build
  it on Line 14.

The

build
  method requires four parameters:
  • width
     ,
    height
     , and
    depth
     : Here we specify the input image volume shape to our network, where
    depth
      is the number of color channels each image contains.
  • classes
     : The number of classes our network will predict (for
    CancerNet
     , it will be
    2
     ).

We go ahead and initialize our

model
  on Line 17 and subsequently, specify our
inputShape
  (Line 18). In the case of using TensorFlow as our backend, we’re now ready to add layers.

Other backends that specify

"channels_first"
  require that we place the
depth
  at the front of the
inputShape
  with the image dimensions following (Lines 23-24).

Let’s define our

DEPTHWISE_CONV => RELU => POOL
  layers:
# CONV => RELU => POOL
		model.add(SeparableConv2D(32, (3, 3), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

		# (CONV => RELU => POOL) * 2
		model.add(SeparableConv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(SeparableConv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

		# (CONV => RELU => POOL) * 3
		model.add(SeparableConv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(SeparableConv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(SeparableConv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

Three

DEPTHWISE_CONV => RELU => POOL
  blocks are defined here, each stacking more layers and using more filters than the last. I’ve applied 
BatchNormalization
  and
Dropout
  as well.

Let’s append our fully connected head:

# first (and only) set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(256))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# softmax classifier
		model.add(Dense(classes))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model

Our

FC => RELU
  layers and softmax classifier make the head of the network.

The output of the softmax classifier will be the prediction percentages for each class our model will predict.

Finally, our

model
  is returned to the training script.

Our training script

The last piece of the puzzle we need to implement is our actual training script.

Create a new file named

train_model.py
 , open it up, and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import LearningRateScheduler
from keras.optimizers import Adagrad
from keras.utils import np_utils
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from pyimagesearch.cancernet import CancerNet
from pyimagesearch import config
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

Our imports come from 7 places:

  1. matplotlib
     : A scientific plotting package that is the de-facto standard for Python. On Line 3 we set matplotlib to use the
    "Agg"
      backend so that we’re able to save our training plots to disk.
  2. keras
     : We’ll be taking advantage of the
    ImageDataGenerator
     ,
    LearningRateScheduler
     ,
    Adagrad
      optimizer, and
    np_utils
     .
  3. sklearn
     : From scikit-learn we’ll need its implementation of a
    classification_report
      and a
    confusion_matrix
     .
  4. pyimagesearch
     : We’re going to be putting our newly defined CancerNet to use (training and evaluating it). We’ll also need our config to grab the paths to our three data splits. This module is not pip-installable; it is included in the “Downloads” section of today’s post.
  5. imutils
     : I’ve made my convenience functions publicly available as a pip-installable package. We’ll be using the
    paths
      module to grab paths to each of our images.
  6. numpy
     : The typical tool used by data scientists for numerical processing with Python.
  7. Python: Both
    argparse
      and
    os
      are built into Python installations. We’ll use argparse to parse a command line argument.

Let’s parse our one and only command line argument,

--plot
 . With this argument provided in a terminal at runtime, our script will be able to dynamically accept different plot filenames. If you don’t specify a command line argument with the plot filename, a default of
plot.png
  will be used.
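
For example, to save the training plot to a hypothetical file named cancernet_plot.png instead of the default, you could launch the script like this:

$ python train_model.py --plot cancernet_plot.png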

Now that we’ve imported the required libraries and we’ve parsed command line arguments, let’s define training parameters including our training image paths and account for class imbalance:

# initialize our number of epochs, initial learning rate, and batch
# size
NUM_EPOCHS = 40
INIT_LR = 1e-2
BS = 32

# determine the total number of image paths in training, validation,
# and testing directories
trainPaths = list(paths.list_images(config.TRAIN_PATH))
totalTrain = len(trainPaths)
totalVal = len(list(paths.list_images(config.VAL_PATH)))
totalTest = len(list(paths.list_images(config.TEST_PATH)))

# account for skew in the labeled data
trainLabels = [int(p.split(os.path.sep)[-2]) for p in trainPaths]
trainLabels = np_utils.to_categorical(trainLabels)
classTotals = trainLabels.sum(axis=0)
classWeight = classTotals.max() / classTotals

Lines 28-30 define the number of training epochs, initial learning rate, and batch size.

From there, we grab our training image paths and determine the total number of images in each of the splits (Lines 34-37).

We’ll go ahead and compute the

classWeight
  for our training data to account for class imbalance/skew.
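
To make that concrete, here is roughly what those lines work out to for our training split (143,065 negative vs. 56,753 positive training images, per the earlier tree output):

# Roughly what the classWeight computation yields for our training split
# (143,065 negatives vs. 56,753 positives, per the earlier tree output)
import numpy as np

classTotals = np.array([143065.0, 56753.0])
classWeight = classTotals.max() / classTotals
print(classWeight)   # [1.0, ~2.52] -> positive samples weighted ~2.5x more heavily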

Let’s initialize our data augmentation object:

# initialize the training data augmentation object
trainAug = ImageDataGenerator(
	rescale=1 / 255.0,
	rotation_range=20,
	zoom_range=0.05,
	width_shift_range=0.1,
	height_shift_range=0.1,
	shear_range=0.05,
	horizontal_flip=True,
	vertical_flip=True,
	fill_mode="nearest")

# initialize the validation (and testing) data augmentation object
valAug = ImageDataGenerator(rescale=1 / 255.0)

Data augmentation, a form of regularization, is important for nearly all deep learning experiments to assist with model generalization. The method purposely perturbs training examples, changing their appearance slightly, before passing them into the network for training. This partially alleviates the need to gather more training data, though more training data will rarely hurt your model.

Our data augmentation object,

trainAug
  is initialized on Lines 46-55. As you can see, random rotations, shifts, shears, and flips will be applied to our data as it is generated. Rescaling our image pixel intensities to the range [0, 1] is handled by the
trainAug
  generator as well as the
valAug
  generator defined on Line 58.

Let’s initialize each of our generators now:

# initialize the training generator
trainGen = trainAug.flow_from_directory(
	config.TRAIN_PATH,
	class_mode="categorical",
	target_size=(48, 48),
	color_mode="rgb",
	shuffle=True,
	batch_size=BS)

# initialize the validation generator
valGen = valAug.flow_from_directory(
	config.VAL_PATH,
	class_mode="categorical",
	target_size=(48, 48),
	color_mode="rgb",
	shuffle=False,
	batch_size=BS)

# initialize the testing generator
testGen = valAug.flow_from_directory(
	config.TEST_PATH,
	class_mode="categorical",
	target_size=(48, 48),
	color_mode="rgb",
	shuffle=False,
	batch_size=BS)

Here we initialize the training, validation, and testing generator. Each generator will provide batches of images on demand, as is denoted by the

batch_size
  parameter.

Let’s go ahead and initialize our

model
  and start training!
# initialize our CancerNet model and compile it
model = CancerNet.build(width=48, height=48, depth=3,
	classes=2)
opt = Adagrad(lr=INIT_LR, decay=INIT_LR / NUM_EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# fit the model
H = model.fit_generator(
	trainGen,
	steps_per_epoch=totalTrain // BS,
	validation_data=valGen,
	validation_steps=totalVal // BS,
	class_weight=classWeight,
	epochs=NUM_EPOCHS)

Our model is initialized with the

Adagrad
  optimizer on Lines 88-90.

We then 

compile
  our model with a
"binary_crossentropy"
 
loss
  function (since we only have two classes of data), as well as learning rate decay (Lines 91 and 92).

Making a call to the Keras fit_generator method, our training process is initiated. Using this method, our image data can reside on disk and be yielded in batches rather than having the whole dataset in RAM throughout training. While not 100% necessary for today’s 5.8GB dataset, you can see how useful this is if you had a 200GB dataset, for example.

After training is complete, we’ll evaluate the model on the testing data:

# reset the testing generator and then use our trained model to
# make predictions on the data
print("[INFO] evaluating network...")
testGen.reset()
predIdxs = model.predict_generator(testGen,
	steps=(totalTest // BS) + 1)

# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)

# show a nicely formatted classification report
print(classification_report(testGen.classes, predIdxs,
	target_names=testGen.class_indices.keys()))

Lines 107 and 108 make predictions on all of our testing data (again using a generator object).

The highest prediction indices are grabbed for each sample (Line 112) and then a

classification_report
  is printed conveniently to the terminal (Lines 115 and 116).

Let’s gather additional evaluation metrics:

# compute the confusion matrix and and use it to derive the raw
# accuracy, sensitivity, and specificity
cm = confusion_matrix(testGen.classes, predIdxs)
total = sum(sum(cm))
acc = (cm[0, 0] + cm[1, 1]) / total
sensitivity = cm[0, 0] / (cm[0, 0] + cm[0, 1])
specificity = cm[1, 1] / (cm[1, 0] + cm[1, 1])

# show the confusion matrix, accuracy, sensitivity, and specificity
print(cm)
print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))

Here we compute the

confusion_matrix
  and then derive the accuracy,
sensitivity
 , and
specificity
  (Lines 120-124). The matrix and each of these values is then printed in our terminal (Lines 127-130).

Finally, let’s generate and store our training plot:

# plot the training loss and accuracy
N = NUM_EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

Our training history plot consists of training/validation loss and training/validation accuracy. These are plotted over time so that we can spot over/underfitting.

Breast cancer prediction results

We’ve now implemented all the necessary Python scripts!

Let’s go ahead and train CancerNet on our breast cancer dataset.

Before continuing, ensure you have:

  1. Configured your deep learning environment with the necessary libraries/packages listed in the “Preparing your deep learning environment for Cancer classification” section.
  2. Used the “Downloads” section of this tutorial to download the source code.
  3. Downloaded the breast cancer dataset from Kaggle’s website.
  4. Unzipped the dataset and executed the
    build_dataset.py
      script to create the necessary image + directory structure.

After you’ve ticked off the four items above, open up a terminal and execute the following command:

$ python train_model.py
Found 199818 images belonging to 2 classes.
Found 22201 images belonging to 2 classes.
Found 55505 images belonging to 2 classes.
Epoch 1/40
6244/6244 [==============================] - 255s 41ms/step - loss: 0.3648 - acc: 0.8453 - val_loss: 0.4504 - val_acc: 0.8062
Epoch 2/40
6244/6244 [==============================] - 254s 41ms/step - loss: 0.3382 - acc: 0.8563 - val_loss: 0.3790 - val_acc: 0.8410
Epoch 3/40
6244/6244 [==============================] - 253s 41ms/step - loss: 0.3341 - acc: 0.8577 - val_loss: 0.3941 - val_acc: 0.8348
...
Epoch 38/40
6244/6244 [==============================] - 252s 40ms/step - loss: 0.3230 - acc: 0.8636 - val_loss: 0.3565 - val_acc: 0.8520
Epoch 39/40
6244/6244 [==============================] - 252s 40ms/step - loss: 0.3237 - acc: 0.8629 - val_loss: 0.3565 - val_acc: 0.8515
Epoch 40/40
6244/6244 [==============================] - 252s 40ms/step - loss: 0.3234 - acc: 0.8636 - val_loss: 0.3594 - val_acc: 0.8507
[INFO] evaluating network...
              precision    recall  f1-score   support

           0       0.93      0.85      0.89     39808
           1       0.69      0.85      0.76     15697

   micro avg       0.85      0.85      0.85     55505
   macro avg       0.81      0.85      0.83     55505
weighted avg       0.86      0.85      0.85     55505

[[33847  5961]
 [ 2402 13295]]
acc: 0.8493
sensitivity: 0.8503
specificity: 0.8470

Figure 4: Our CancerNet classification model training plot generated with Keras.

Looking at our output you can see that our model achieved ~85% accuracy; however, that raw accuracy is heavily weighted by the fact that we classified “benign/no cancer” correctly 93% of the time.

To understand our model’s performance at a deeper level we compute the sensitivity and the specificity.

Our sensitivity measures the proportion of the true positives that were also predicted as positive (85.03%).

Conversely, specificity measures our true negatives (84.70%).
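
To make the arithmetic concrete, here is how those printed values follow from the confusion matrix above (using the same formulas as train_model.py):

# Verifying the printed metrics from the confusion matrix
# [[33847  5961]
#  [ 2402 13295]]
acc = (33847 + 13295) / 55505.0           # 0.8493
sensitivity = 33847 / (33847 + 5961.0)    # 0.8503
specificity = 13295 / (2402 + 13295.0)    # 0.8470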

We need to be really careful with our false negative here — we don’t want to classify someone as “No cancer” when they are in fact “Cancer positive”.

Our false positive rate is also important — we don’t want to mistakenly classify someone as “Cancer positive” and then subject them to painful, expensive, and invasive treatments when they don’t actually need them.

There is always a balance between sensitivity and specificity that a machine learning/deep learning engineer and practitioner must manage, but when it comes to deep learning and healthcare/health treatment, that balance becomes extremely important.

For more information on sensitivity, specificity, true positives, false negatives, true negatives, and false positives, refer to this guide.

Summary

In this tutorial, you learned how to use the Keras deep learning library to train a Convolutional Neural Network for breast cancer classification.

To accomplish this task, we leveraged a breast cancer histology image dataset curated by Janowczyk and Madabhushi and Roa et al.

The histology images themselves are massive (in terms of image size on disk and spatial dimensions when loaded into memory), so in order to make them easier to work with, Paul Mooney, part of the community advocacy team at Kaggle, converted the dataset to 50×50 pixel image patches and then uploaded the modified dataset directly to the Kaggle dataset archive.

A total of 277,524 images belonging to two classes are included in the dataset:

  1. Positive (+): 78,786
  2. Negative (-): 198,738

Here we can see there is a class imbalance in the data with over 2x more negative samples than positive samples.

The class imbalance, along with the challenging nature of the dataset, led to us obtaining ~85% classification accuracy, ~85% sensitivity, and ~85% specificity.

I invite you to use this code as a template for starting your own breast cancer classification experiments.

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Breast cancer classification with Keras and Deep Learning appeared first on PyImageSearch.

Black and white image colorization with OpenCV and Deep Learning


In this tutorial, you will learn how to colorize black and white images using OpenCV, Deep Learning, and Python.

Image colorization is the process of taking an input grayscale (black and white) image and then producing an output colorized image that represents the semantic colors and tones of the input (for example, an ocean on a clear sunny day must be plausibly “blue” — it can’t be colored “hot pink” by the model).

Previous methods for image colorization either:

  1. Relied on significant human interaction and annotation
  2. Produced desaturated colorization

The novel approach we are going to use here today instead relies on deep learning. We will utilize a Convolutional Neural Network capable of colorizing black and white images with results that can even “fool” humans!

To learn how to perform black and white image coloration with OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Black and white image colorization with OpenCV and Deep Learning

In the first part of this tutorial, we’ll discuss how deep learning can be utilized to colorize black and white images.

From there we’ll utilize OpenCV to colorize black and white images for both:

  1. Images
  2. Video streams

We’ll then explore some examples and demos of our work.

How can we colorize black and white images with deep learning?

Figure 1: Zhang et al.’s architecture for colorization of black and white images with deep learning.

The technique we’ll be covering here today is from Zhang et al.’s 2016 ECCV paper, Colorful Image Colorization.

Previous approaches to black and white image colorization relied on manual human annotation and often produced desaturated results that were not “believable” as true colorizations.

Zhang et al. decided to attack the problem of image colorization by using Convolutional Neural Networks to “hallucinate” what an input grayscale image would look like when colorized.

To train the network Zhang et al. started with the ImageNet dataset and converted all images from the RGB color space to the Lab color space.

Similar to the RGB color space, the Lab color space has three channels. But unlike the RGB color space, Lab encodes color information differently:

  • The L channel encodes lightness intensity only
  • The a channel encodes green-red.
  • And the b channel encodes blue-yellow

A full review of the Lab color space is outside the scope of this post (see this guide for more information on Lab), but the gist here is that Lab does a better job representing how humans see color.

Since the L channel encodes only the intensity, we can use the L channel as our grayscale input to the network.

From there the network must learn to predict the a and b channels. Given the input L channel and the predicted ab channels we can then form our final output image.

The entire (simplified) process can be summarized as follows (a small data-preparation sketch is included after the list):

  1. Convert all training images from the RGB color space to the Lab color space.
  2. Use the L channel as the input to the network and train the network to predict the ab channels.
  3. Combine the input L channel with the predicted ab channels.
  4. Convert the Lab image back to RGB.
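
As a small sketch of steps 1 and 2 for a single training image (the file name is hypothetical, and the real Zhang et al. training pipeline is considerably more involved):

# Small sketch (hypothetical file name): convert one image to Lab, then
# split off the L channel (network input) and the ab channels (target)
import cv2

image = cv2.imread("example.jpg")              # OpenCV loads images as BGR
scaled = image.astype("float32") / 255.0
lab = cv2.cvtColor(scaled, cv2.COLOR_BGR2LAB)
L, a, b = cv2.split(lab)                       # L = lightness, (a, b) = color
# the network would be trained to predict (a, b) from L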

To produce more plausible black and white image colorizations the authors also utilize a few additional techniques including mean annealing and a specialized loss function for color rebalancing (both of which are outside the scope of this post).

For more details on the image colorization algorithm and deep learning model, be sure to refer to the official publication of Zhang et al.

Project structure

Go ahead and download the source code, model, and example images using the “Downloads” section of this post.

Once you’ve extracted the zip, you should navigate into the project directory.

From there, let’s use the

tree
  command to inspect the project structure:
$ tree --dirsfirst
.
├── images
│   ├── adrian_and_janie.png
│   ├── albert_einstein.jpg
│   ├── mark_twain.jpg
│   └── robin_williams.jpg
├── model
│   ├── colorization_deploy_v2.prototxt
│   ├── colorization_release_v2.caffemodel
│   └── pts_in_hull.npy
├── bw2color_image.py
└── bw2color_video.py

2 directories, 9 files

We have four sample black and white images in the

images/
  directory.

Our Caffe model and prototxt are inside the

model/
  directory along with the cluster points NumPy file.

We’ll be reviewing two scripts today:

  • bw2color_image.py
  • bw2color_video.py

The image script can process any black and white (also known as grayscale) image you pass in.

Our video script will either use your webcam or accept an input video file and then perform colorization.

Colorizing black and white images with OpenCV

Let’s go ahead and implement black and white image colorization script with OpenCV.

Open up the

bw2color_image.py
file and insert the following code:
# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str, required=True,
	help="path to input black and white image")
ap.add_argument("-p", "--prototxt", type=str, required=True,
	help="path to Caffe prototxt file")
ap.add_argument("-m", "--model", type=str, required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--points", type=str, required=True,
	help="path to cluster center points")
args = vars(ap.parse_args())

Our colorizer script only requires three imports: NumPy, OpenCV, and

argparse
 .

Let’s go ahead and use argparse to parse command line arguments. This script requires that these four arguments be passed to the script directly from the terminal:

  • --image
     : The path to our input black/white image.
  • --prototxt
     : Our path to the Caffe prototxt file.
  • --model
     . Our path to the Caffe pre-trained model.
  • --points
     : The path to a NumPy cluster center points file.

With the above four flags and corresponding arguments, the script will be able to run with different inputs without changing any code.

Let’s go ahead and load our model and cluster centers into memory:

# load our serialized black and white colorizer model and cluster
# center points from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
pts = np.load(args["points"])

# add the cluster centers as 1x1 convolutions to the model
class8 = net.getLayerId("class8_ab")
conv8 = net.getLayerId("conv8_313_rh")
pts = pts.transpose().reshape(2, 313, 1, 1)
net.getLayer(class8).blobs = [pts.astype("float32")]
net.getLayer(conv8).blobs = [np.full([1, 313], 2.606, dtype="float32")]

Line 21 loads our Caffe model directly from the command line argument values. OpenCV can read Caffe models via the 

cv2.dnn.readNetFromCaffe
 function.

Line 22 then loads the cluster center points directly from the command line argument path to the points file. This file is in NumPy format so we’re using

np.load
 .

From there, Lines 25-29:

  • Load centers for ab channel quantization used for rebalancing.
  • Treat each of the points as 1×1 convolutions and add them to the model.

Now let’s load, scale, and convert our image:

# load the input image from disk, scale the pixel intensities to the
# range [0, 1], and then convert the image from the BGR to Lab color
# space
image = cv2.imread(args["image"])
scaled = image.astype("float32") / 255.0
lab = cv2.cvtColor(scaled, cv2.COLOR_BGR2LAB)

To load our input image from the file path, we use

cv2.imread
  on Line 34.

Preprocessing steps include:

  • Scaling pixel intensities to the range [0, 1] (Line 35).
  • Converting from BGR to Lab color space (Line 36).

Let’s continue with our preprocessing:

# resize the Lab image to 224x224 (the dimensions the colorization
# network accepts), split channels, extract the 'L' channel, and then
# perform mean centering
resized = cv2.resize(lab, (224, 224))
L = cv2.split(resized)[0]
L -= 50

We’ll go ahead and resize the input image to 224×224 (Line 41), the required input dimensions for the network.

Then we grab the

L
  channel only (i.e., the input) and perform mean subtraction (Lines 42 and 43).

Now we can pass the input L channel through the network to predict the ab channels:

# pass the L channel through the network which will *predict* the 'a'
# and 'b' channel values
'print("[INFO] colorizing image...")'
net.setInput(cv2.dnn.blobFromImage(L))
ab = net.forward()[0, :, :, :].transpose((1, 2, 0))

# resize the predicted 'ab' volume to the same dimensions as our
# input image
ab = cv2.resize(ab, (image.shape[1], image.shape[0]))

A forward pass of the

L
  channel through the network takes place on Lines 48 and 49 (here is a refresher on OpenCV’s blobFromImage if you need it).

Notice that after we called

net.forward
 , on the same line, we went ahead and extracted the predicted
ab
  volume. I make it look easy here, but refer to the Zhang et al. documentation and demo on GitHub if you would like more details.

From there, we resize the predicted

ab
  volume to be the same dimensions as our input image (Line 53).

Now comes the time for post-processing. Stay with me here as we essentially go in reverse for some of our previous steps:

# grab the 'L' channel from the *original* input image (not the
# resized one) and concatenate the original 'L' channel with the
# predicted 'ab' channels
L = cv2.split(lab)[0]
colorized = np.concatenate((L[:, :, np.newaxis], ab), axis=2)

# convert the output image from the Lab color space to BGR, then
# clip any values that fall outside the range [0, 1]
colorized = cv2.cvtColor(colorized, cv2.COLOR_LAB2BGR)
colorized = np.clip(colorized, 0, 1)

# the current colorized image is represented as a floating point
# data type in the range [0, 1] -- let's convert to an unsigned
# 8-bit integer representation in the range [0, 255]
colorized = (255 * colorized).astype("uint8")

# show the original and output colorized images
cv2.imshow("Original", image)
cv2.imshow("Colorized", colorized)
cv2.waitKey(0)

Post processing includes:

  • Grabbing the
    L
      channel from the original input image (Line 58) and concatenating the original
    L
      channel and predicted
    ab
      channels together forming
    colorized
      (Line 59).
  • Converting the
    colorized
     image from the Lab color space back to BGR (Line 63).
  • Clipping any pixel intensities that fall outside the range [0, 1] (Line 64).
  • Bringing the pixel intensities back into the range [0, 255] (Line 69). During the preprocessing steps (Line 35) we divided by
    255
      and now we are multiplying by
    255
     . I’ve also found that this scaling and
    "uint8"
      conversion isn’t a requirement but that it helps the code work between OpenCV 3.4.x and 4.x versions.

Finally, both our original

image
  and
colorized
  images are displayed on the screen!

Image colorization results

Now that we’ve implemented our image colorization script, let’s give it a try.

Make sure you’ve used the “Downloads” section of this blog post to download the source code, colorization model, and example images.

From there, open up a terminal, navigate to where you downloaded the source code, and execute the following command:

$ python bw2color_image.py \
	--prototxt model/colorization_deploy_v2.prototxt \
	--model model/colorization_release_v2.caffemodel \
	--points model/pts_in_hull.npy \
	--image images/robin_williams.jpg
[INFO] loading model...

Figure 2: Grayscale image colorization with OpenCV and deep learning. This is a picture of famous late actor, Robin Williams.

On the left, you can see the original input image of Robin Williams, a famous actor and comedian who passed away ~5 years ago.

On the right, you can see the output of the black and white colorization model.

Let’s try another image, this one of Albert Einstein:

$ python bw2color_image.py \
	--prototxt model/colorization_deploy_v2.prototxt \
	--model model/colorization_release_v2.caffemodel \
	--points model/pts_in_hull.npy \
	--image images/albert_einstein.jpg
[INFO] loading model...

Figure 3: Image colorization using deep learning and OpenCV. This is an image of Albert Einstein.

I’m particularly impressed by this image colorization.

Notice how the water is an appropriate shade of blue while Einstein’s shirt is white and his pants are khaki — all of these are plausible colorizations.

Here is another example image, this one of Mark Twain, one of my all-time favorite authors:

$ python bw2color_image.py \
	--prototxt model/colorization_deploy_v2.prototxt \
	--model model/colorization_release_v2.caffemodel \
	--points model/pts_in_hull.npy \
	--image images/mark_twain.jpg
[INFO] loading model...

Figure 4: A black/white image of Mark Twain has undergone colorization via OpenCV and deep learning.

Here we can see that the grass and foliage are correctly colored a shade of green, although you can see these shades of green blending into Twain’s shoes and hands.

The final image demonstrates a not-so-great black and white image colorization with OpenCV:

$ python bw2color_image.py \
	--prototxt model/colorization_deploy_v2.prototxt \
	--model model/colorization_release_v2.caffemodel \
	--points model/pts_in_hull.npy \
	--image images/adrian_and_janie.png
[INFO] loading model...

Figure 5: Janie is the puppers we recently adopted into our family. This is her first snow day. Black and white cameras/images are great for snow, but I wanted to see how image colorization would turn out with OpenCV and deep learning.

This photo is of myself and Janie, my beagle puppy, during a snowstorm a few weeks ago.

Here you can see that while the snow, Janie, my jacket, and even the gazebo in the background are correctly colored, my blue jeans are actually red.

Not all image colorizations will be perfect but the results here today do demonstrate the plausibility of the Zhang et al. approach.

Real-time black and white video colorization with OpenCV

We’ve already seen how we can apply black and white image colorization to images — but can we do the same with video streams?

You bet we can.

This script follows the same process as above except we’ll be processing frames of a video stream. I’ll be reviewing it in less detail and focusing on the frame grabbing + processing aspects.

Open up the

bw2color_video.py
file and insert the following code:
# import the necessary packages
from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", type=str,
	help="path to optional input video (webcam will be used otherwise)")
ap.add_argument("-p", "--prototxt", type=str, required=True,
	help="path to Caffe prototxt file")
ap.add_argument("-m", "--model", type=str, required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--points", type=str, required=True,
	help="path to cluster center points")
ap.add_argument("-w", "--width", type=int, default=500,
	help="input width dimension of frame")
args = vars(ap.parse_args())

Our video script requires three additional imports:

  • VideoStream
     allows us to grab frames from a webcam or video file
  • imutils
     provides the
    resize
      convenience function we'll use on each frame
  • time
      will be used to pause to allow a webcam to warm up

Let’s initialize our

VideoStream
  now:
# initialize a boolean used to indicate if either a webcam or input
# video is being used
webcam = not args.get("input", False)

# if a video path was not supplied, grab a reference to the webcam
if webcam:
	print("[INFO] starting video stream...")
	vs = VideoStream(src=0).start()
	time.sleep(2.0)

# otherwise, grab a reference to the video file
else:
	print("[INFO] opening video file...")
	vs = cv2.VideoCapture(args["input"])

Depending on whether we’re working with a

webcam
  or video file, we’ll create our
vs
  (i.e., “video stream”) object here.

From there, we’ll load the colorizer deep learning model and cluster centers (the same way we did in our previous script):

# load our serialized black and white colorizer model and cluster
# center points from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
pts = np.load(args["points"])

# add the cluster centers as 1x1 convolutions to the model
class8 = net.getLayerId("class8_ab")
conv8 = net.getLayerId("conv8_313_rh")
pts = pts.transpose().reshape(2, 313, 1, 1)
net.getLayer(class8).blobs = [pts.astype("float32")]
net.getLayer(conv8).blobs = [np.full([1, 313], 2.606, dtype="float32")]

Now we’ll start an infinite

while
  loop over incoming frames. We’ll process the frames directly in the loop:
# loop over frames from the video stream
while True:
	# grab the next frame and handle if we are reading from either
	# VideoCapture or VideoStream
	frame = vs.read()
	frame = frame if webcam else frame[1]

	# if we are viewing a video and we did not grab a frame then we
	# have reached the end of the video
	if not webcam and frame is None:
		break

	# resize the input frame, scale the pixel intensities to the
	# range [0, 1], and then convert the frame from the BGR to Lab
	# color space
	frame = imutils.resize(frame, width=args["width"])
	scaled = frame.astype("float32") / 255.0
	lab = cv2.cvtColor(scaled, cv2.COLOR_BGR2LAB)

	# resize the Lab frame to 224x224 (the dimensions the colorization
	# network accepts), split channels, extract the 'L' channel, and
	# then perform mean centering
	resized = cv2.resize(lab, (224, 224))
	L = cv2.split(resized)[0]
	L -= 50

Each frame from our

vs
  is grabbed on Lines 55 and 56. A check is made for a
None
  type
frame
  — when this occurs, we’ve reached the end of a video file (if we’re processing a video file) and we can
break
  from the loop (Lines 60 and 61).

Preprocessing (just as before) is conducted on Lines 66-75. This is where we resize, scale, and convert to Lab. Then we grab the

L
  channel, and perform mean subtraction.

Let’s now apply deep learning colorization and post-process the result:

# pass the L channel through the network which will *predict* the
	# 'a' and 'b' channel values
	net.setInput(cv2.dnn.blobFromImage(L))
	ab = net.forward()[0, :, :, :].transpose((1, 2, 0))

	# resize the predicted 'ab' volume to the same dimensions as our
	# input frame, then grab the 'L' channel from the *original* input
	# frame (not the resized one) and concatenate the original 'L'
	# channel with the predicted 'ab' channels
	ab = cv2.resize(ab, (frame.shape[1], frame.shape[0]))
	L = cv2.split(lab)[0]
	colorized = np.concatenate((L[:, :, np.newaxis], ab), axis=2)

	# convert the output frame from the Lab color space to BGR, clip
	# any values that fall outside the range [0, 1], and then convert
	# to an 8-bit unsigned integer ([0, 255] range)
	colorized = cv2.cvtColor(colorized, cv2.COLOR_LAB2BGR)
	colorized = np.clip(colorized, 0, 1)
	colorized = (255 * colorized).astype("uint8")

Our deep learning forward pass of

L
 through the network results in the predicted
ab
  channel.

Then we’ll post-process the result to form our

colorized
  image (Lines 86-95). This is where we resize, grab our original
L
 , and concatenate our predicted
ab
 . From there, we convert from Lab back to BGR, clip, and scale.

If you followed along closely above, you’ll remember that all we do next is display the results:

# show the original and final colorized frames
	cv2.imshow("Original", frame)
	cv2.imshow("Grayscale", cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
	cv2.imshow("Colorized", colorized)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# if we are using a webcam, stop the camera video stream
if webcam:
	vs.stop()

# otherwise, release the video file pointer
else:
	vs.release()

# close any open windows
cv2.destroyAllWindows()

Our original webcam

frame
  is shown along with our grayscale image and
colorized
  result.

If the

"q"
 
key
  is pressed, we’ll
break
  from the loop and cleanup.

That’s all there is to it!

Video colorization results

Let’s go ahead and give our video black and white colorization script a try.

Make sure you use the “Downloads” section of this tutorial to download the source code and colorization model.

From there, open up a terminal and execute the following command to have the colorizer run on your webcam:

$ python bw2color_video.py \
	--prototxt model/colorization_deploy_v2.prototxt \
	--model model/colorization_release_v2.caffemodel \
	--points model/pts_in_hull.npy

Figure 6: Black and white image colorization in video with OpenCV and deep learning demo.

If you want to run the colorizer on a video file you can use the following command:

$ python bw2color_video.py \
	--prototxt model/colorization_deploy_v2.prototxt \
	--model model/colorization_release_v2.caffemodel \
	--points model/pts_in_hull.npy \
	--input video/jurassic_park_intro.mp4

The model here is running in close to real-time on my 3GHz Intel Xeon W.

With a GPU, real-time performance could certainly be obtained; however, keep in mind that GPU support for OpenCV’s “dnn” module is currently a bit limited and it, unfortunately, does not yet support NVIDIA GPUs.

Summary

In today’s tutorial, you learned how to colorize black and white images using OpenCV and Deep Learning.

The image colorization model we used here today was first introduced by Zhang et al. in their 2016 publication, Colorful Image Colorization.

Using this model, we were able to colorize both:

  1. Black and white images
  2. Black and white videos

Our results, while not perfect, demonstrated the plausibility of automatically colorizing black and white images and videos.

According to Zhang et al., their approach was able to “fool” humans 32% of the time!

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Black and white image colorization with OpenCV and Deep Learning appeared first on PyImageSearch.

Holistically-Nested Edge Detection with OpenCV and Deep Learning


In this tutorial, you will learn how to apply Holistically-Nested Edge Detection (HED) with OpenCV and Deep Learning. We’ll apply Holistically-Nested Edge Detection to both images and video streams, followed by comparing the results to OpenCV’s standard Canny edge detector.

Edge detection enables us to find the boundaries of objects in images and was one of the first applied use cases of image processing and computer vision.

When it comes to edge detection with OpenCV you’ll most likely utilize the Canny edge detector; however, there are a few problems with the Canny edge detector, namely:

  1. Setting the lower and upper values to the hysteresis thresholding is a manual process which requires experimentation and visual validation (see the short sketch following this list).
  2. Hysteresis thresholding values that work well for one image may not work well for another (this is nearly always true for images captured in varying lighting conditions).
  3. The Canny edge detector often requires a number of preprocessing steps (i.e. conversion to grayscale, blurring/smoothing, etc.) in order to obtain a good edge map.
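
To make the first point concrete, here is a quick sketch (the image path and threshold values are arbitrary choices of mine, not values taken from this post) showing how two different hysteresis threshold pairs produce two very different Canny edge maps for the same image:

# import the necessary packages
import cv2

# load an example image, convert it to grayscale, and smooth it
gray = cv2.cvtColor(cv2.imread("images/cat.jpg"), cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# two different hysteresis threshold pairs yield noticeably different
# edge maps for the exact same input
wide = cv2.Canny(blurred, 10, 200)
tight = cv2.Canny(blurred, 225, 250)

# display both edge maps so they can be compared visually
cv2.imshow("Wide", wide)
cv2.imshow("Tight", tight)
cv2.waitKey(0)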

Holistically-Nested Edge Detection (HED) attempts to address the limitations of the Canny edge detector through an end-to-end deep neural network.

This network accepts an RGB image as an input and then produces an edge map as an output. Furthermore, the edge map produced by HED does a better job preserving object boundaries in the image.

To learn more about Holistically-Nested Edge Detection with OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Holistically-Nested Edge Detection with OpenCV and Deep Learning

In this tutorial we will learn about Holistically-Nested Edge Detection (HED) using OpenCV and Deep Learning.

We’ll start by discussing the Holistically-Nested Edge Detection algorithm.

From there we’ll review our project structure and then utilize HED for edge detection in both images and video.

Let’s go ahead and get started!

What is Holistically-Nested Edge Detection?

Figure 1: Holistically-Nested Edge Detection with OpenCV and Deep Learning (source: 2015 Xie and Tu Figure 1)

The algorithm we’ll be using here today is from Xie and Tu’s 2015 paper, Holistically-Nested Edge Detection, or simply “HED” for short.

The work of Xie and Tu describes a deep neural network capable of automatically learning rich hierarchical edge maps that are capable of determining the edge/object boundary of objects in images.

This edge detection network is capable of obtaining state-of-the-art results on the Berkeley BSDS500 and NYU Depth datasets.

A full review of the network architecture and algorithm is outside the scope of this post, so please refer to the official publication for more details.

Project structure

Go ahead and grab today’s “Downloads” and unzip the files.

From there, you can inspect the project directory with the following command:

$ tree --dirsfirst
.
├── hed_model
│   ├── deploy.prototxt
│   └── hed_pretrained_bsds.caffemodel
├── images
│   ├── cat.jpg
│   ├── guitar.jpg
│   └── janie.jpg
├── detect_edges_image.py
└── detect_edges_video.py

2 directories, 7 files

Our HED Caffe model is included in the

hed_model/
  directory.

I’ve provided a number of sample

images/
  including one of myself, my dog, and a sample cat image I found on the internet.

Today we’re going to review the

detect_edges_image.py
  and
detect_edges_video.py
  scripts. Both scripts share the same edge detection process, so we’ll be spending most of our time on the HED image script.

Holistically-Nested Edge Detection in Images

The Python and OpenCV Holistically-Nested Edge Detection example we are reviewing today is very similar to the HED example in OpenCV’s official repo.

My primary contribution here is to:

  1. Provide some additional documentation (when appropriate)
  2. And most importantly, show you how to use Holistically-Nested Edge Detection in your own projects.

Let’s go ahead and get started — open up the

detect_edges_image.py
file and insert the following code:
# import the necessary packages
import argparse
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--edge-detector", type=str, required=True,
	help="path to OpenCV's deep learning edge detector")
ap.add_argument("-i", "--image", type=str, required=True,
	help="path to input image")
args = vars(ap.parse_args())

Our imports are handled on Lines 2-4. We’ll be using argparse to parse command line arguments. OpenCV functions and methods are accessed through the

cv2
  import. Our
os
  import will allow us to build file paths regardless of operating system.

This script requires two command line arguments:

  • --edge-detector
     : The path to OpenCV’s deep learning edge detector. The path contains two Caffe files that will be used to initialize our model later.
  • --image
     : The path to the input image for testing. Like I said previously — I’ve provided a few images in the “Downloads”, but you should try the script on your own images as well.

Let’s define our

CropLayer
  class:
class CropLayer(object):
	def __init__(self, params, blobs):
		# initialize our starting and ending (x, y)-coordinates of
		# the crop
		self.startX = 0
		self.startY = 0
		self.endX = 0
		self.endY = 0

In order to utilize the Holistically-Nested Edge Detection model with OpenCV, we need to define a custom layer cropping class — we appropriately name this class

CropLayer
 .

In the constructor of this class, we store the starting and ending (x, y)-coordinates of where the crop will start and end, respectively (Lines 15-21).

The next step when applying HED with OpenCV is to define the

getMemoryShapes
function, the method responsible for computing the volume size of the
inputs
 :
def getMemoryShapes(self, inputs):
		# the crop layer will receive two inputs -- we need to crop
		# the first input blob to match the shape of the second one,
		# keeping the batch size and number of channels
		(inputShape, targetShape) = (inputs[0], inputs[1])
		(batchSize, numChannels) = (inputShape[0], inputShape[1])
		(H, W) = (targetShape[2], targetShape[3])

		# compute the starting and ending crop coordinates
		self.startX = int((inputShape[3] - targetShape[3]) / 2)
		self.startY = int((inputShape[2] - targetShape[2]) / 2)
		self.endX = self.startX + W
		self.endY = self.startY + H

		# return the shape of the volume (we'll perform the actual
		# crop during the forward pass)
		return [[batchSize, numChannels, H, W]]

Line 27 derives the shape of the input volume as well as the target shape.

Line 28 extracts the batch size and number of channels from the

inputs
as well.

Finally, Line 29 extracts the height and width of the target shape, respectively.

Given these variables, we can compute the starting and ending crop (x, y)-coordinates on Lines 32-35.

We then return the shape of the volume to the calling function on Line 39.

The final method we need to define is the

forward
function. This function is responsible for performing the crop during the forward pass (i.e., inference/edge prediction) of the network:
def forward(self, inputs):
		# use the derived (x, y)-coordinates to perform the crop
		return [inputs[0][:, :, self.startY:self.endY,
				self.startX:self.endX]]

Lines 43 and 44 take advantage of Python and NumPy’s convenient list/array slicing syntax.

Given our

CropLayer
class we can now load our HED model from disk and register
CropLayer
with the
net
:
# load our serialized edge detector from disk
print("[INFO] loading edge detector...")
protoPath = os.path.sep.join([args["edge_detector"],
	"deploy.prototxt"])
modelPath = os.path.sep.join([args["edge_detector"],
	"hed_pretrained_bsds.caffemodel"])
net = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

# register our new layer with the model
cv2.dnn_registerLayer("Crop", CropLayer)

Our prototxt path and model path are built up using the

--edge-detector
  command line argument available via
args["edge_detector"]
  (Lines 48-51).

From there, both the

protoPath
  and
modelPath
  are used to load and initialize our Caffe model on Line 52.

Let’s go ahead and load our input

image
 :
# load the input image and grab its dimensions
image = cv2.imread(args["image"])
(H, W) = image.shape[:2]

# convert the image to grayscale, blur it, and perform Canny
# edge detection
print("[INFO] performing Canny edge detection...")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
canny = cv2.Canny(blurred, 30, 150)

Our original

image
  is loaded and spatial dimensions (width and height) are extracted on Lines 58 and 59.

We also compute the Canny edge map (Lines 64-66) so we can compare our edge detection results to HED.

Finally, we’re ready to apply HED:

# construct a blob out of the input image for the Holistically-Nested
# Edge Detector
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(W, H),
	mean=(104.00698793, 116.66876762, 122.67891434),
	swapRB=False, crop=False)

# set the blob as the input to the network and perform a forward pass
# to compute the edges
print("[INFO] performing holistically-nested edge detection...")
net.setInput(blob)
hed = net.forward()
hed = cv2.resize(hed[0, 0], (W, H))
hed = (255 * hed).astype("uint8")

# show the output edge detection results for Canny and
# Holistically-Nested Edge Detection
cv2.imshow("Input", image)
cv2.imshow("Canny", canny)
cv2.imshow("HED", hed)
cv2.waitKey(0)

To apply Holistically-Nested Edge Detection (HED) with OpenCV and deep learning, we:

  • Construct a
    blob
      from our image (Lines 70-72).
  • Pass the blob through the HED net, obtaining the
    hed
      output (Lines 77 and 78).
  • Resize the output to our original image dimensions (Line 79).
  • Scale our image pixels back to the range [0, 255] and ensure the type is
    "uint8"
      (Line 80).

Finally, we’ll display:

  1. The original input image
  2. The Canny edge detection image
  3. Our Holistically-Nested Edge detection results

Image and HED Results

To apply Holistically-Nested Edge Detection to your own images with OpenCV, make sure you use the “Downloads” section of this tutorial to grab the source code, trained HED model, and example image files. From there, open up a terminal and execute the following command:

$ python detect_edges_image.py --edge-detector hed_model --image images/cat.jpg
[INFO] loading edge detector...
[INFO] performing Canny edge detection...
[INFO] performing holistically-nested edge detection...

Figure 2: Edge detection via the HED approach with OpenCV and deep learning (input image source).

On the left we have our input image.

In the center we have the Canny edge detector.

And on the right is our final output after applying Holistically-Nested Edge Detection.

Notice how the Canny edge detector is not able to preserve the object boundary of the cat, mountains, or the rock the cat is sitting on.

HED, on the other hand, is able to preserve all of those object boundaries.

Let’s try another image:

$ python detect_edges_image.py --edge-detector hed_model --image images/guitar.jpg
[INFO] loading edge detector...
[INFO] performing Canny edge detection...
[INFO] performing holistically-nested edge detection...

Figure 3: Me playing guitar in my office (left). Canny edge detection (center). Holistically-Nested Edge Detection (right).

In Figure 3 above we can see an example image of myself playing guitar. With the Canny edge detector there is a lot of “noise” caused by the texture and pattern of the carpet; HED, on the other hand, has no such noise.

Furthermore, HED does a better job of capturing the object boundaries of my shirt, my jeans (including the hole in my jeans), and my guitar.

Let’s do one final example:

$ python detect_edges_image.py --edge-detector hed_model --image images/janie.jpg
[INFO] loading edge detector...
[INFO] performing Canny edge detection...
[INFO] performing holistically-nested edge detection...

Figure 4: My beagle, Janie, undergoes Canny and Holistically-Nested Edge Detection (HED) with OpenCV and deep learning.

There are two objects in this image: (1) Janie, the dog, and (2) the chair behind her.

The Canny edge detector (center) does a reasonable job highlighting the outline of the chair but isn’t able to properly capture the object boundary of the dog, primarily due to the light/dark and dark/light transitions in her coat.

HED (right) is able to capture the entire outline of Janie more easily.

Holistically-Nested Edge Detection in Video

We’ve applied Holistically-Nested Edge Detection to images with OpenCV — is it possible to do the same for videos?

Let’s find out.

Open up the

detect_edges_video.py
file and insert the following code:
# import the necessary packages
from imutils.video import VideoStream
import argparse
import imutils
import time
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--edge-detector", type=str, required=True,
	help="path to OpenCV's deep learning edge detector")
ap.add_argument("-i", "--input", type=str,
	help="path to optional input video (webcam will be used otherwise)")
args = vars(ap.parse_args())

Our video script requires three additional imports:

  • VideoStream
     : Reads frames from an input source such as a webcam, video file, or another source.
  • imutils
     : My package of convenience functions that I’ve made available on GitHub and PyPi. We’re using my
    resize
      function.
  • time
     : This module allows us to place a sleep command to allow our video stream to establish and “warm up”.

The two command line arguments on Lines 10-15 are quite similar:

  • --edge-detector
     : The path to OpenCV’s HED edge detector.
  • --input
     : An optional path to an input video file. If a path isn’t provided then the webcam will be used.

Our

CropLayer
  class is identical to the one we defined previously:
class CropLayer(object):
	def __init__(self, params, blobs):
		# initialize our starting and ending (x, y)-coordinates of
		# the crop
		self.startX = 0
		self.startY = 0
		self.endX = 0
		self.endY = 0

	def getMemoryShapes(self, inputs):
		# the crop layer will receive two inputs -- we need to crop
		# the first input blob to match the shape of the second one,
		# keeping the batch size and number of channels
		(inputShape, targetShape) = (inputs[0], inputs[1])
		(batchSize, numChannels) = (inputShape[0], inputShape[1])
		(H, W) = (targetShape[2], targetShape[3])

		# compute the starting and ending crop coordinates
		self.startX = int((inputShape[3] - targetShape[3]) / 2)
		self.startY = int((inputShape[2] - targetShape[2]) / 2)
		self.endX = self.startX + W
		self.endY = self.startY + H

		# return the shape of the volume (we'll perform the actual
		# crop during the forward pass)
		return [[batchSize, numChannels, H, W]]

	def forward(self, inputs):
		# use the derived (x, y)-coordinates to perform the crop
		return [inputs[0][:, :, self.startY:self.endY,
				self.startX:self.endX]]

After defining our identical

CropLayer
  class, we’ll go ahead and initialize our video stream and HED model:
# initialize a boolean used to indicate if either a webcam or input
# video is being used
webcam = not args.get("input", False)

# if a video path was not supplied, grab a reference to the webcam
if webcam:
	print("[INFO] starting video stream...")
	vs = VideoStream(src=0).start()
	time.sleep(2.0)

# otherwise, grab a reference to the video file
else:
	print("[INFO] opening video file...")
	vs = cv2.VideoCapture(args["input"])

# load our serialized edge detector from disk
print("[INFO] loading edge detector...")
protoPath = os.path.sep.join([args["edge_detector"],
	"deploy.prototxt"])
modelPath = os.path.sep.join([args["edge_detector"],
	"hed_pretrained_bsds.caffemodel"])
net = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

# register our new layer with the model
cv2.dnn_registerLayer("Crop", CropLayer)

Whether we elect to use our

webcam
  or a video file, the script will dynamically work for either (Lines 51-62).

Our HED model is loaded and the

CropLayer
  is registered on Lines 65-73.

Let’s acquire frames in a loop and apply edge detection!

# loop over frames from the video stream
while True:
	# grab the next frame and handle if we are reading from either
	# VideoCapture or VideoStream
	frame = vs.read()
	frame = frame if webcam else frame[1]

	# if we are viewing a video and we did not grab a frame then we
	# have reached the end of the video
	if not webcam and frame is None:
		break

	# resize the frame and grab its dimensions
	frame = imutils.resize(frame, width=500)
	(H, W) = frame.shape[:2]

We begin looping over frames on Lines 76-80. If we reach the end of a video file (which happens when a frame is

None
 ), we’ll break from the loop (Lines 84 and 85).

Lines 88 and 89 resize our frame so that it has a width of 500 pixels. We then grab the dimensions of the frame after resizing.

Now let’s process the frame exactly as in our previous script:

# convert the frame to grayscale, blur it, and perform Canny
	# edge detection
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
	blurred = cv2.GaussianBlur(gray, (5, 5), 0)
	canny = cv2.Canny(blurred, 30, 150)

	# construct a blob out of the input frame for the Holistically-Nested
	# Edge Detector, set the blob, and perform a forward pass to
	# compute the edges
	blob = cv2.dnn.blobFromImage(frame, scalefactor=1.0, size=(W, H),
		mean=(104.00698793, 116.66876762, 122.67891434),
		swapRB=False, crop=False)
	net.setInput(blob)
	hed = net.forward()
	hed = cv2.resize(hed[0, 0], (W, H))
	hed = (255 * hed).astype("uint8")

Canny edge detection (Lines 93-95) and HED edge detection (Lines 100-106) are computed over the input frame.

From there, we’ll display the edge detection results:

# show the output edge detection results for Canny and
	# Holistically-Nested Edge Detection
	cv2.imshow("Frame", frame)
	cv2.imshow("Canny", canny)
	cv2.imshow("HED", hed)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# if we are using a webcam, stop the camera video stream
if webcam:
	vs.stop()

# otherwise, release the video file pointer
else:
	vs.release()

# close any open windows
cv2.destroyAllWindows()

Our three output frames are displayed on Lines 110-112: (1) the original, resized frame, (2) the Canny edge detection result, and (3) the HED result.

Keypresses are captured via Line 113. If

"q"
  is pressed, we’ll break from the loop and cleanup (Lines 116-128).

Video and HED Results

So, how does Holistically-Nested Edge Detection perform in real-time with OpenCV?

Let’s find out.

Be sure to use the “Downloads” section of this blog post to download the source code and HED model.

From there, open up a terminal and execute the following command:

$ python detect_edges_video.py --edge-detector hed_model
[INFO] starting video stream...
[INFO] loading edge detector...

In the short GIF demo above you can see a demonstration of the HED model in action.

Notice in particular how the boundary of the lamp in the background is completely lost when using the Canny edge detector; however, when using HED the boundary is preserved.

In terms of performance, I was using my 3GHz Intel Xeon W when gathering the demo above. We are obtaining close to real-time performance on the CPU using the HED model.

To obtain true real-time performance you would need to utilize a GPU; however, keep in mind that GPU support for OpenCV’s “dnn” module is particularly limited (specifically NVIDIA GPUs are not currently supported).

In the meantime, you may want to consider using the Caffe + Python bindings if you need real-time performance.

Summary

In this tutorial, you learned how to perform Holistically-Nested Edge Detection (HED) using OpenCV and Deep Learning.

Unlike the Canny edge detector, which requires preprocessing steps, manual tuning of parameters, and often does not perform well on images captured using varying lighting conditions, Holistically-Nested Edge Detection seeks to create an end-to-end deep learning edge detector.

As our results show, the output edge maps produced by HED do a better job of preserving object boundaries than the simple Canny edge detector. Holistically-Nested Edge Detection can potentially replace Canny edge detection in applications where the environment and lighting conditions are potentially unknown or simply not controllable.

The downside is that HED is significantly more computationally expensive than Canny. The Canny edge detector can run in super real-time on a CPU; however, real-time performance with HED would require a GPU.

I hope you enjoyed today’s post!

To download the source code to this guide, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Holistically-Nested Edge Detection with OpenCV and Deep Learning appeared first on PyImageSearch.


Liveness Detection with OpenCV


In this tutorial, you will learn how to perform liveness detection with OpenCV. You will create a liveness detector capable of spotting fake faces and performing anti-face spoofing in face recognition systems.

Over the past year, I have authored a number of face recognition tutorials.

However, a common question I get asked over email and in the comments sections of the face recognition posts is:

How do I spot real versus fake faces?

Consider what would happen if a nefarious user tried to purposely circumvent your face recognition system.

Such a user could try to hold up a photo of another person. Maybe they even have a photo or video on their smartphone that they could hold up to the camera responsible for performing face recognition (such as in the image at the top of this post).

In those situations it’s entirely possible for the face held up to the camera to be correctly recognized… ultimately leading to an unauthorized user bypassing your face recognition system!

How would you go about spotting these “fake” versus “real/legitimate” faces? How could you apply anti-face spoofing algorithms into your facial recognition applications?

The answer is to apply liveness detection with OpenCV which is exactly what I’ll be covering today.

To learn how to incorporate liveness detection with OpenCV into your own face recognition systems, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Liveness Detection with OpenCV

In the first part of this tutorial, we’ll discuss liveness detection, including what it is and why we need it to improve our face recognition systems.

From there we’ll review the dataset we’ll be using to perform liveness detection, including:

  • How to build a dataset for liveness detection
  • Our example real versus fake face images

We’ll also review our project structure for the liveness detector project as well.

In order to create the liveness detector, we’ll be training a deep neural network capable of distinguishing between real versus fake faces.

We’ll, therefore, need to:

  1. Build the image dataset itself.
  2. Implement a CNN capable of performing liveness detection (we’ll call this network “LivenessNet”).
  3. Train the liveness detector network.
  4. Create a Python + OpenCV script capable of taking our trained liveness detector model and applying it to real-time video.

Let’s go ahead and get started!

What is liveness detection and why do we need it?

Figure 1: Liveness detection with OpenCV. On the left is a live (real) video of me and on the right you can see I am holding my iPhone (fake/spoofed).

Face recognition systems are becoming more prevalent than ever. From face recognition on your iPhone/smartphone, to face recognition for mass surveillance in China, face recognition systems are being utilized everywhere.

However, face recognition systems are easily fooled by “spoofing” and “non-real” faces.

Face recognition systems can be circumvented simply by holding up a photo of a person (whether printed, on a smartphone, etc.) to the face recognition camera.

In order to make face recognition systems more secure, we need to be able to detect such fake/non-real faces — liveness detection is the term used to refer to such algorithms.

There are a number of approaches to liveness detection, including:

  • Texture analysis, including computing Local Binary Patterns (LBPs) over face regions and using an SVM to classify the faces as real or spoofed (a short sketch of this idea follows the list).
  • Frequency analysis, such as examining the Fourier domain of the face.
  • Variable focusing analysis, such as examining the variation of pixel values between two consecutive frames.
  • Heuristic-based algorithms, including eye movement, lip movement, and blink detection. These algorithms attempt to track eye movement and blinks to ensure the user is not holding up a photo of another person (since a photo will not blink or move its lips).
  • Optical Flow algorithms, namely examining the differences and properties of optical flow generated from 3D objects and 2D planes.
  • 3D face shape, similar to what is used on Apple’s iPhone face recognition system, enabling the face recognition system to distinguish between real faces and printouts/photos/images of another person.
  • Combinations of the above, enabling a face recognition system engineer to pick and choose the liveness detections models appropriate for their particular application.
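
As a concrete (if simplified) illustration of the texture analysis approach, here is a sketch of how a uniform LBP histogram could be computed for each face ROI and fed to an SVM. The helper name, parameter values, and the commented training snippet are my own assumptions and are not part of today’s code download:

# import the necessary packages (scikit-image and scikit-learn)
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC
import numpy as np

def lbp_histogram(gray_face, points=24, radius=8):
	# compute a uniform LBP representation of the grayscale face ROI
	lbp = local_binary_pattern(gray_face, points, radius,
		method="uniform")

	# build a normalized histogram of the LBP codes to serve as the
	# feature vector for this face
	(hist, _) = np.histogram(lbp.ravel(),
		bins=np.arange(0, points + 3),
		range=(0, points + 2))
	hist = hist.astype("float")
	hist /= (hist.sum() + 1e-7)
	return hist

# assuming `real_faces` and `fake_faces` are lists of grayscale face
# ROIs gathered ahead of time, training could then look like:
# features = [lbp_histogram(f) for f in real_faces + fake_faces]
# labels = [1] * len(real_faces) + [0] * len(fake_faces)
# model = SVC(kernel="linear", C=1.0).fit(features, labels)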

A full review of liveness detection algorithms can be found in Chakraborty and Das’ 2014 paper, An Overview of Face liveness Detection.

For the purposes of today’s tutorial, we’ll be treating liveness detection as a binary classification problem.

Given an input image, we’ll train a Convolutional Neural Network capable of distinguishing real faces from fake/spoofed faces.

But before we get to training our liveness detection model, let’s first examine our dataset.

Our liveness detection videos

Figure 2: An example of gathering real versus fake/spoofed faces. The video on the left is a legitimate recording of my face. The video on the right is that same video played back while my laptop records it.

To keep our example straightforward, the liveness detector we are building in this blog post will focus on distinguishing real faces versus spoofed faces on a screen.

This algorithm can easily be extended to other types of spoofed faces, including print outs, high-resolution prints, etc.

In order to build the liveness detection dataset, I:

  1. Took my iPhone and put it in portrait/selfie mode.
  2. Recorded a ~25-second video of myself walking around my office.
  3. Replayed the same 25-second video, this time facing my iPhone towards my desktop where I recorded the video replaying.
  4. This resulted in two example videos, one for “real” faces and another for “fake/spoofed” faces.
  5. Finally, I applied face detection to both sets of videos to extract individual face ROIs for both classes.

I have provided you with both my real and fake video files in the “Downloads” section of the post.

You can use these videos as a starting point for your dataset but I would recommend gathering more data to help make your liveness detector more robust and accurate.

With testing, I determined that the model is slightly biased towards my own face, which makes sense because that is all the model was trained on. Furthermore, since I am white/caucasian, I wouldn’t expect this same dataset to work as well with other skin tones.

Ideally, you would train a model with faces of multiple people and include faces of multiple ethnicities. Be sure to refer to the “Limitations and further work” section below for additional suggestions on improving your liveness detection models.

In the rest of the tutorial, you will learn how to take the dataset I recorded and turn it into an actual liveness detector with OpenCV and deep learning.

Project structure

Go ahead and grab the code, dataset, and liveness model using the “Downloads” section of this post and then unzip the archive.

Once you navigate into the project directory, you’ll notice the following structure:

$ tree --dirsfirst --filelimit 10
.
├── dataset
│   ├── fake [150 entries]
│   └── real [161 entries]
├── face_detector
│   ├── deploy.prototxt
│   └── res10_300x300_ssd_iter_140000.caffemodel
├── pyimagesearch
│   ├── __init__.py
│   └── livenessnet.py
├── videos
│   ├── fake.mp4
│   └── real.mov
├── gather_examples.py
├── train_liveness.py
├── liveness_demo.py
├── le.pickle
├── liveness.model
└── plot.png

6 directories, 12 files

There are four main directories inside our project:

  • dataset/
     : Our dataset directory consists of two classes of images:
    • Fake images of me from a camera aimed at my screen while playing a video of my face.
    • Real images of me captured from a selfie video with my phone.
  • face_detector/
     : Consists of our pretrained Caffe face detector to locate face ROIs.
  • pyimagesearch/
     : This module contains our LivenessNet class.
  • videos/
     : I’ve provided two input videos for training our LivenessNet classifier.

Today we’ll be reviewing three Python scripts in detail. By the end of the post you’ll be able to run them on your own data and input video feeds as well. In order of appearance in this tutorial, the three scripts are:

  1. gather_examples.py
     : This script grabs face ROIs from input video files and helps us to create a deep learning face liveness dataset.
  2. train_liveness.py
     : As the filename indicates, this script will train our LivenessNet classifier. We’ll use Keras and TensorFlow to train the model. The training process results in a few files:
    • le.pickle
       : Our class label encoder.
    • liveness.model
       : Our serialized Keras model which detects face liveness.
    • plot.png
       : The training history plot shows accuracy and loss curves so we can assess our model (i.e. over/underfitting).
  3. liveness_demo.py
     : Our demonstration script will fire up your webcam to grab frames to conduct face liveness detection in real-time.

Detecting and extracting face ROIs from our training (video) dataset

Figure 3: Detecting face ROIs in video for the purposes of building a liveness detection dataset.

Now that we’ve had a chance to review both our initial dataset and project structure, let’s see how we can extract both real and fake face images from our input videos.

The end goal of this script is to populate two directories:

  1. dataset/fake/
    : Contains face ROIs from the
    fake.mp4
    file
  2. dataset/real/
    : Holds face ROIs from the
    real.mov
    file.

Given these frames, we’ll later train a deep learning-based liveness detector on the images.

Open up the

gather_examples.py
file and insert the following code:
# import the necessary packages
import numpy as np
import argparse
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", type=str, required=True,
	help="path to input video")
ap.add_argument("-o", "--output", type=str, required=True,
	help="path to output directory of cropped faces")
ap.add_argument("-d", "--detector", type=str, required=True,
	help="path to OpenCV's deep learning face detector")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
ap.add_argument("-s", "--skip", type=int, default=16,
	help="# of frames to skip before applying face detection")
args = vars(ap.parse_args())

Lines 2-5 import our required packages. This script only requires OpenCV and NumPy in addition to built-in Python modules.

From there Lines 8-19 parse our command line arguments:

  • --input
     : The path to our input video file.
  • --output
     : The path to the output directory where each of the cropped faces will be stored.
  • --detector
     : The path to the face detector. We’ll be using OpenCV’s deep learning face detector. This Caffe model is included with today’s “Downloads” for your convenience.
  • --confidence
     : The minimum probability to filter weak face detections. By default, this value is 50%.
  • --skip
     : We don’t need to detect and store every image because adjacent frames will be similar. Instead, we’ll skip N frames between detections. You can alter the default of 16 using this argument.

Let’s go ahead and load the face detector and initialize our video stream:

# load our serialized face detector from disk
print("[INFO] loading face detector...")
protoPath = os.path.sep.join([args["detector"], "deploy.prototxt"])
modelPath = os.path.sep.join([args["detector"],
	"res10_300x300_ssd_iter_140000.caffemodel"])
net = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

# open a pointer to the video file stream and initialize the total
# number of frames read and saved thus far
vs = cv2.VideoCapture(args["input"])
read = 0
saved = 0

Lines 23-26 load OpenCV’s deep learning face detector.

From there we open our video stream on Line 30.

We also initialize two variables for the number of frames read as well as the number of frames saved while our loop executes (Lines 31 and 32).

Let’s go ahead and create a loop to process the frames:

# loop over frames from the video file stream
while True:
	# grab the frame from the file
	(grabbed, frame) = vs.read()

	# if the frame was not grabbed, then we have reached the end
	# of the stream
	if not grabbed:
		break

	# increment the total number of frames read thus far
	read += 1

	# check to see if we should process this frame
	if read % args["skip"] != 0:
		continue

Our

while
  loop begins on Line 35.

From there we grab and verify a

frame
  (Lines 37-42).

At this point, since we’ve read a

frame
 , we’ll increment our 
read
  counter (Line 48). If we are skipping this particular frame, we’ll continue without further processing (Lines 48 and 49).

Let’s go ahead and detect faces:

# grab the frame dimensions and construct a blob from the frame
	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
		(300, 300), (104.0, 177.0, 123.0))

	# pass the blob through the network and obtain the detections and
	# predictions
	net.setInput(blob)
	detections = net.forward()

	# ensure at least one face was found
	if len(detections) > 0:
		# we're making the assumption that each image has only ONE
		# face, so find the bounding box with the largest probability
		i = np.argmax(detections[0, 0, :, 2])
		confidence = detections[0, 0, i, 2]

In order to perform face detection, we need to create a blob from the image (Lines 53 and 54). This

blob
  has a 300×300 width and height to accommodate our Caffe face detector. Scaling the bounding boxes will be necessary later, so Line 52 grabs the frame dimensions.

Lines 58 and 59 perform a

forward
  pass of the
blob
  through the deep learning face detector.

Our script makes the assumption that there is only one face in each frame of the video (Lines 62-65). This helps prevent false positives. If you’re working with a video containing more than one face, I recommend that you adjust the logic accordingly.

Thus, Line 65 grabs the highest probability face detection index. Line 66 extracts the confidence of the detection using the index.
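
As an aside, if your own videos do contain multiple faces per frame, the adjusted logic might look something like the following sketch, which would live inside the frame-processing loop in place of the single-face logic. It loops over every detection rather than keeping only the most confident one, reusing the imports and variables from the script above (it is not part of the code download):

# loop over *all* of the detections instead of only the single most
# confident one
for i in range(0, detections.shape[2]):
	# extract the confidence and filter out weak detections
	confidence = detections[0, 0, i, 2]
	if confidence < args["confidence"]:
		continue

	# compute the bounding box, extract the face ROI, and write it
	# to disk
	box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
	(startX, startY, endX, endY) = box.astype("int")
	face = frame[startY:endY, startX:endX]
	p = os.path.sep.join([args["output"], "{}.png".format(saved)])
	cv2.imwrite(p, face)
	saved += 1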

Let’s filter weak detections and write the face ROI to disk:

# ensure that the detection with the largest probability also
		# meets our minimum probability test (thus helping filter out
		# weak detections)
		if confidence > args["confidence"]:
			# compute the (x, y)-coordinates of the bounding box for
			# the face and extract the face ROI
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")
			face = frame[startY:endY, startX:endX]

			# write the frame to disk
			p = os.path.sep.join([args["output"],
				"{}.png".format(saved)])
			cv2.imwrite(p, face)
			saved += 1
			print("[INFO] saved {} to disk".format(p))

# do a bit of cleanup
vs.release()
cv2.destroyAllWindows()

Line 71 ensures that our face detection ROI meets the minimum threshold to reduce false positives.

From there we extract the face ROI bounding

box
  coordinates and face ROI itself (Lines 74-76).

We generate a path + filename for the face ROI and write it to disk on Lines 79-81. At this point, we can increment the number of

saved
  faces.

Once processing is complete, we’ll perform cleanup on Lines 86 and 87.

Building our liveness detection image dataset

Figure 4: Our OpenCV face liveness detection dataset. We’ll use Keras and OpenCV to train and demo a liveness model.

Now that we’ve implemented the

gather_examples.py
script, let’s put it to work.

Make sure you use the “Downloads” section of this tutorial to grab the source code and example input videos.

From there, open up a terminal and execute the following command to extract faces for our “fake/spoofed” class:

$ python gather_examples.py --input videos/fake.mp4 --output dataset/fake \
	--detector face_detector --skip 1
[INFO] loading face detector...
[INFO] saved dataset/fake/0.png to disk
[INFO] saved dataset/fake/1.png to disk
[INFO] saved dataset/fake/2.png to disk
[INFO] saved dataset/fake/3.png to disk
[INFO] saved dataset/fake/4.png to disk
[INFO] saved dataset/fake/5.png to disk
...
[INFO] saved dataset/fake/145.png to disk
[INFO] saved dataset/fake/146.png to disk
[INFO] saved dataset/fake/147.png to disk
[INFO] saved dataset/fake/148.png to disk
[INFO] saved dataset/fake/149.png to disk

Similarly, we can do the same for the “real” class as well:

$ python gather_examples.py --input videos/real.mov --output dataset/real \
	--detector face_detector --skip 4
[INFO] loading face detector...
[INFO] saved dataset/real/0.png to disk
[INFO] saved dataset/real/1.png to disk
[INFO] saved dataset/real/2.png to disk
[INFO] saved dataset/real/3.png to disk
[INFO] saved dataset/real/4.png to disk
...
[INFO] saved dataset/real/156.png to disk
[INFO] saved dataset/real/157.png to disk
[INFO] saved dataset/real/158.png to disk
[INFO] saved dataset/real/159.png to disk
[INFO] saved dataset/real/160.png to disk

Since the “real” video file is longer than the “fake” video file, we’ll use a longer skip frames value to help balance the number of output face ROIs for each class.

After executing the scripts you should have the following image counts:

  • Fake: 150 images
  • Real: 161 images
  • Total: 311 images
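
If you’d like to sanity-check those counts from a Python shell, a quick snippet of my own (assuming it is run from the project directory) using the paths module from imutils will do the trick:

# count the face ROIs that were extracted for each class
from imutils import paths
fake = len(list(paths.list_images("dataset/fake")))
real = len(list(paths.list_images("dataset/real")))
print("fake: {}, real: {}, total: {}".format(fake, real, fake + real))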

Implementing “LivenessNet”, our deep learning liveness detector

Figure 5: Deep learning architecture for LivenessNet, a CNN designed to detect face liveness in images and videos.

The next step is to implement “LivenessNet”, our deep learning-based liveness detector.

At the core,

LivenessNet
  is actually just a simple Convolutional Neural Network.

We’ll be purposely keeping this network as shallow and with as few parameters as possible for two reasons:

  1. To reduce the chances of overfitting on our small dataset.
  2. To ensure our liveness detector is fast, capable of running in real-time (even on resource-constrained devices, such as the Raspberry Pi).

Let’s implement LivenessNet now — open up

livenessnet.py
and insert the following code:
# import the necessary packages
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras import backend as K

class LivenessNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

All of our imports are from Keras (Lines 2-10). For an in-depth review of each of these layers and functions, be sure to refer to Deep Learning for Computer Vision with Python.

Our

LivenessNet
  class is defined on Line 12. It consists of one static method,
build
  (Line 14). The
build
  method accepts four parameters:
  • width
     : How wide the image/volume is.
  • height
     : How tall the image is.
  • depth
     : The number of channels for the image (in this case 3 since we’ll be working with RGB images).
  • classes
     : The number of classes. We have two total classes: “real” and “fake”.

Our

model
  is initialized on Line 17.

The

inputShape
  to our model is defined on Line 18 while channel ordering is determined on Lines 23-25.

Let’s begin adding layers to our CNN:

# first CONV => RELU => CONV => RELU => POOL layer set
		model.add(Conv2D(16, (3, 3), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(16, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

		# second CONV => RELU => CONV => RELU => POOL layer set
		model.add(Conv2D(32, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(32, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

Our CNN exhibits VGGNet-esque qualities. It is very shallow with only a few learned filters. Ideally, we won’t need a deep network to distinguish between real and spoofed faces.

The first

CONV => RELU => CONV => RELU => POOL
  layer set is specified on Lines 28-36 where batch normalization and dropout are also added.

Another

CONV => RELU => CONV => RELU => POOL
  layer set is appended on Lines 39-46.

Finally, we’ll add our

FC => RELU
  layers:
# first (and only) set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(64))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# softmax classifier
		model.add(Dense(classes))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model

Lines 49-57 consist of fully connected and ReLU activated layers with a softmax classifier head.

The model is returned to the training script on Line 60.

Creating the liveness detector training script

Figure 6: The process of training LivenessNet. Using both “real” and “spoofed/fake” images as our dataset, we can train a liveness detection model with OpenCV, Keras, and deep learning.

Given our dataset of real/spoofed images as well as our implementation of LivenessNet, we are now ready to train the network.

Open up the

train_liveness.py
file and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.livenessnet import LivenessNet
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.utils import np_utils
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import pickle
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset")
ap.add_argument("-m", "--model", type=str, required=True,
	help="path to trained model")
ap.add_argument("-l", "--le", type=str, required=True,
	help="path to label encoder")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

Our face liveness training script consists of a number of imports (Lines 2-19). Let’s review them now:

  • matplotlib
     : Used to generate a training plot. We specify the
    "Agg"
      backend so we can easily save our plot to disk on Line 3.
  • LivenessNet
     : The liveness CNN that we defined in the previous section.
  • train_test_split
     : A function from scikit-learn which constructs splits of our data for training and testing.
  • classification_report
     : Also from scikit-learn, this tool will generate a brief statistical report on our model’s performance.
  • ImageDataGenerator
     : Used for performing data augmentation, providing us with batches of randomly mutated images.
  • Adam
     : An optimizer that worked well for this model (alternatives include SGD, RMSprop, etc.).
  • paths
     : From my imutils package, this module will help us to gather the paths to all of our image files on disk.
  • pyplot
     : Used to generate a nice training plot.
  • numpy
     : A numerical processing library for Python. It is an OpenCV requirement as well.
  • argparse
     : For processing command line arguments.
  • pickle
     : Used to serialize our label encoder to disk.
  • cv2
     : Our OpenCV bindings.
  • os
     : This module can do quite a lot, but we’ll just be using it for its operating system path separator.

That was a mouthful, but now that you know what the imports are for, reviewing the rest of the script should be more straightforward.

This script accepts four command line arguments:

  • --dataset
     : The path to the input dataset. Earlier in the post we created the dataset with the
    gather_examples.py
      script.
  • --model
     : Our script will generate an output model file — here you supply the path to it.
  • --le
     : The path to our output serialized label encoder file also needs to be supplied.
  • --plot
     : The training script will generate a plot. If you wish to override the default value of
    "plot.png"
     , you should specify this value on the command line.

This next code block will perform a number of initializations and build our data:

# initialize the initial learning rate, batch size, and number of
# epochs to train for
INIT_LR = 1e-4
BS = 8
EPOCHS = 50

# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class images
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []

for imagePath in imagePaths:
	# extract the class label from the filename, load the image and
	# resize it to be a fixed 32x32 pixels, ignoring aspect ratio
	label = imagePath.split(os.path.sep)[-2]
	image = cv2.imread(imagePath)
	image = cv2.resize(image, (32, 32))

	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)

# convert the data into a NumPy array, then preprocess it by scaling
# all pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0

Training parameters including initial learning rate, batch size, and number of epochs are set on Lines 35-37.

From there, our

imagePaths
  are grabbed. We also initialize two lists to hold our
data
  and class
labels
  (Lines 42-44).

The loop on Lines 46-55 builds our

data
  and
labels
  lists. The
data
  consists of our images which are loaded and resized to be 32×32 pixels. Each image has a corresponding label stored in the
labels
  list.

All pixel intensities are scaled to the range [0, 1] while the list is made into a NumPy array via Line 50.

Now let’s encode our labels and partition our data:

# encode the labels (which are currently strings) as integers and then
# one-hot encode them
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = np_utils.to_categorical(labels, 2)

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, random_state=42)

Lines 63-65 one-hot encode the labels.

We utilize scikit-learn to partition our data — 75% is used for training while 25% is reserved for testing (Lines 69 and 70).

Next, we’ll initialize our data augmentation object and compile + train our face liveness model:

# construct the training image generator for data augmentation
aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
	width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
	horizontal_flip=True, fill_mode="nearest")

# initialize the optimizer and model
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model = LivenessNet.build(width=32, height=32, depth=3,
	classes=len(le.classes_))
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
print("[INFO] training network for {} epochs...".format(EPOCHS))
H = model.fit_generator(aug.flow(trainX, trainY, batch_size=BS),
	validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS,
	epochs=EPOCHS)

Lines 73-75 construct a data augmentation object which will generate images with random rotations, zooms, shifts, shears, and flips. To read more about data augmentation, read my previous blog post.

Our

LivenessNet
  model is built and compiled on Lines 79-83.

We then commence training on Lines 87-89. This process will be relatively quick considering our shallow network and small dataset.

Once the model is trained we can evaluate the results and generate a training plot:

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BS)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=le.classes_))

# save the network to disk
print("[INFO] serializing network to '{}'...".format(args["model"]))
model.save(args["model"])

# save the label encoder to disk
f = open(args["le"], "wb")
f.write(pickle.dumps(le))
f.close()

# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, EPOCHS), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, EPOCHS), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, EPOCHS), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, EPOCHS), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

Predictions are made on the testing set (Line 93). From there a

classification_report
  is generated and printed to the terminal (Lines 94 and 95).

The

LivenessNet
  model is serialized to disk along with the label encoder on Lines 99-104.

The remaining Lines 107-117 generate a training history plot for later inspection.

Training our liveness detector

We are now ready to train our liveness detector.

Make sure you’ve used the “Downloads” section of the tutorial to download the source code and dataset — from there, execute the following command:

$ python train_liveness.py --dataset dataset --model liveness.model --le le.pickle
[INFO] loading images...
[INFO] compiling model...
[INFO] training network for 50 epochs...
Epoch 1/50
29/29 [==============================] - 2s 58ms/step - loss: 1.0113 - acc: 0.5862 - val_loss: 0.4749 - val_acc: 0.7436
Epoch 2/50
29/29 [==============================] - 1s 21ms/step - loss: 0.9418 - acc: 0.6127 - val_loss: 0.4436 - val_acc: 0.7949
Epoch 3/50
29/29 [==============================] - 1s 21ms/step - loss: 0.8926 - acc: 0.6472 - val_loss: 0.3837 - val_acc: 0.8077
...
Epoch 48/50
29/29 [==============================] - 1s 21ms/step - loss: 0.2796 - acc: 0.9094 - val_loss: 0.0299 - val_acc: 1.0000
Epoch 49/50
29/29 [==============================] - 1s 21ms/step - loss: 0.3733 - acc: 0.8792 - val_loss: 0.0346 - val_acc: 0.9872
Epoch 50/50
29/29 [==============================] - 1s 21ms/step - loss: 0.2660 - acc: 0.9008 - val_loss: 0.0322 - val_acc: 0.9872
[INFO] evaluating network...
              precision    recall  f1-score   support

        fake       0.97      1.00      0.99        35
        real       1.00      0.98      0.99        43

   micro avg       0.99      0.99      0.99        78
   macro avg       0.99      0.99      0.99        78
weighted avg       0.99      0.99      0.99        78

[INFO] serializing network to 'liveness.model'...

Figure 6: A plot of training a face liveness model using OpenCV, Keras, and deep learning.

As our results show, we are able to obtain 99% liveness detection accuracy on our validation set!

Putting the pieces together: Liveness detection with OpenCV

Figure 7: Face liveness detection with OpenCV and deep learning.

The final step is to combine all the pieces:

  1. We’ll access our webcam/video stream
  2. Apply face detection to each frame
  3. For each face detected, apply our liveness detector model

Open up the

liveness_demo.py
and insert the following code:
# import the necessary packages
from imutils.video import VideoStream
from keras.preprocessing.image import img_to_array
from keras.models import load_model
import numpy as np
import argparse
import imutils
import pickle
import time
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, required=True,
	help="path to trained model")
ap.add_argument("-l", "--le", type=str, required=True,
	help="path to label encoder")
ap.add_argument("-d", "--detector", type=str, required=True,
	help="path to OpenCV's deep learning face detector")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

Lines 2-11 import our required packages. Notably, we’ll use

  • VideoStream
      to access our camera feed.
  • img_to_array
      so that our frame will be in a compatible array format.
  • load_model
      to load our serialized Keras model.
  • imutils
      for its convenience functions.
  • cv2
      for our OpenCV bindings.

Let’s parse our command line arguments via Lines 14-23:

  • --model
     : The path to our pretrained Keras model for liveness detection.
  • --le
     : Our path to the label encoder.
  • --detector
     : The path to OpenCV’s deep learning face detector, used to find the face ROIs.
  • --confidence
     : The minimum probability threshold to filter out weak detections.

Now let’s go ahead and initialize the face detector, LivenessNet model + label encoder, and our video stream:

# load our serialized face detector from disk
print("[INFO] loading face detector...")
protoPath = os.path.sep.join([args["detector"], "deploy.prototxt"])
modelPath = os.path.sep.join([args["detector"],
	"res10_300x300_ssd_iter_140000.caffemodel"])
net = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

# load the liveness detector model and label encoder from disk
print("[INFO] loading liveness detector...")
model = load_model(args["model"])
le = pickle.loads(open(args["le"], "rb").read())

# initialize the video stream and allow the camera sensor to warmup
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

The OpenCV face detector is loaded via Lines 27-30.

From there we load our serialized, pretrained model (

LivenessNet
 ) and the label encoder (Lines 34 and 35).

Our

VideoStream
  object is instantiated and our camera is allowed two seconds to warm up (Lines 39 and 40).

At this point, it’s time to start looping over frames to detect real versus fake/spoofed faces:

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream and resize it
	# to have a maximum width of 600 pixels
	frame = vs.read()
	frame = imutils.resize(frame, width=600)

	# grab the frame dimensions and convert it to a blob
	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
		(300, 300), (104.0, 177.0, 123.0))

	# pass the blob through the network and obtain the detections and
	# predictions
	net.setInput(blob)
	detections = net.forward()

Line 43 opens an infinite

while
  loop block where we begin by capturing + resizing individual frames (Lines 46 and 47).

After resizing, dimensions of the frame are grabbed so that we can later perform scaling (Line 50).

Using OpenCV’s blobFromImage function we generate a

blob
  (Lines 51 and 52) and then proceed to perform inference by passing it through the face detector network (Lines 56 and 57).

Now we’re ready for the fun part — liveness detection with OpenCV and deep learning:

# loop over the detections
	for i in range(0, detections.shape[2]):
		# extract the confidence (i.e., probability) associated with the
		# prediction
		confidence = detections[0, 0, i, 2]

		# filter out weak detections
		if confidence > args["confidence"]:
			# compute the (x, y)-coordinates of the bounding box for
			# the face and extract the face ROI
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")

			# ensure the detected bounding box does not fall outside
			# the dimensions of the frame
			startX = max(0, startX)
			startY = max(0, startY)
			endX = min(w, endX)
			endY = min(h, endY)

			# extract the face ROI and then preprocess it in the exact
			# same manner as our training data
			face = frame[startY:endY, startX:endX]
			face = cv2.resize(face, (32, 32))
			face = face.astype("float") / 255.0
			face = img_to_array(face)
			face = np.expand_dims(face, axis=0)

			# pass the face ROI through the trained liveness detector
			# model to determine if the face is "real" or "fake"
			preds = model.predict(face)[0]
			j = np.argmax(preds)
			label = le.classes_[j]

			# draw the label and bounding box on the frame
			label = "{}: {:.4f}".format(label, preds[j])
			cv2.putText(frame, label, (startX, startY - 10),
				cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
			cv2.rectangle(frame, (startX, startY), (endX, endY),
				(0, 0, 255), 2)

On Line 60, we begin looping over face detections. Inside we:

  • Filter out weak detections (Lines 63-66).
  • Extract the face bounding
    box
      coordinates and ensure they do not fall outside the dimensions of the frame (Lines 69-77).
  • Extract the face ROI and preprocess it in the same manner as our training data (Lines 81-85).
  • Employ our liveness detector model to determine if the face is “real” or “fake/spoofed” (Lines 89-91).
  • Line 91 is where you would insert your own code to perform face recognition, but only on real images. The pseudo code would be similar to
    if label == "real": run_face_recognition()
      placed directly after Line 91 (see the sketch after this list).
  • Finally (for this demo), we draw the
    label
      text and a
    rectangle
      around the face (Lines 94-98).
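
Here is a minimal, hypothetical sketch of that idea. The run_face_recognition helper is a placeholder for whatever face recognition routine you already have (it is not defined anywhere in this post), and the check belongs right after the label is obtained, before the label string is re-formatted for display:

# hypothetical sketch: only attempt face recognition when the liveness
# model predicts a "real" face (run_face_recognition is a placeholder
# for your own recognition routine -- it is not part of this post's code)
if label == "real":
	name = run_face_recognition(frame[startY:endY, startX:endX])
	print("[INFO] recognized face: {}".format(name))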

Let’s display our results and clean up:

# show the output frame and wait for a key press
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

The output frame is displayed on each iteration of the loop while keypresses are captured (Lines 101-102). Whenever the user presses “q” (“quit”) we’ll break out of the loop and release pointers and close windows (Lines 105-110).

Deploying our liveness detector to real-time video

To follow along with our liveness detection demo make sure you have used the “Downloads” section of the blog post to download the source code and pre-trained liveness detection model.

From there, open up a terminal and execute the following command:

$ python liveness_demo.py --model liveness.model --le le.pickle \
	--detector face_detector
Using TensorFlow backend.
[INFO] loading face detector...
[INFO] loading liveness detector...
[INFO] starting video stream...

Here you can see that our liveness detector is successfully distinguishing real from fake/spoofed faces.

I have included a longer demo in the video below:

Limitations, improvements, and further work

The primary restriction of our liveness detector is really our limited dataset — there are only a total of 311 images (161 belonging to the “real” class and 150 to the “fake” class, respectively).

One of the first extensions to this work would be to simply gather additional training data, and more specifically, images/frames that are not simply of me or yourself.

Keep in mind that the example dataset used here today includes faces for only one person (myself). I am also white/caucasian — you should gather training faces for other ethnicities and skin tones as well.

Our liveness detector was only trained on spoof attacks from holding up a screen — it was not trained on images or photos that were printed out. Therefore, my third recommendation is to invest in additional image/face sources outside of simple screen recording playbacks.

Finally, I want to mention that there is no silver bullet to liveness detection.

Some of the best liveness detectors incorporate multiple methods of liveness detection (be sure to refer to the “What is liveness detection and why do we need it?” section above).

Take the time to consider and assess your own project, guidelines, and requirements — in some cases, all you may need is basic eye blink detection heuristics.

In other cases, you’ll need to combine deep learning-based liveness detection with other heuristics.

Don’t rush into face recognition and liveness detection — take the time and discipline to consider your own unique project requirements. Doing so will ensure you obtain better, more accurate results.

Summary

In this tutorial, you learned how to perform liveness detection with OpenCV.

Using this liveness detector you can now spot fake faces and perform anti-face spoofing in your own face recognition systems.

To create our liveness detector we utilized OpenCV, Deep Learning, and Python.

The first step was to gather our real vs. fake dataset. To accomplish this task, we:

  1. First recorded a video of ourselves using our smartphone (i.e., “real” faces).
  2. Held our smartphone up to our laptop/desktop, replayed the same video, and then recorded the replaying using our webcam (i.e., “fake” faces).
  3. Applied face detection to both sets of videos to form our final liveness detection dataset.

After building our dataset we implemented “LivenessNet”, a Keras + deep learning CNN.

This network is purposely shallow, ensuring that:

  1. We reduce the chances of overfitting on our small dataset.
  2. The model itself is capable of running in real-time (including on the Raspberry Pi).

Overall, our liveness detector was able to obtain 99% accuracy on our validation set.

To demonstrate the full liveness detection pipeline in action we created a Python + OpenCV script that loaded our liveness detector and applied it to real-time video streams.

As our demo showed, our liveness detector was capable of distinguishing between real and fake faces.

I hope you enjoyed today’s post on liveness detection with OpenCV.

To download the source code to this post and apply liveness detection to your own projects (plus be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Liveness Detection with OpenCV appeared first on PyImageSearch.

Building a Raspberry Pi security camera with OpenCV


In this tutorial, you will learn how to build a Raspberry Pi security camera using OpenCV and computer vision. The Pi security camera will be IoT capable, making it possible for our Raspberry Pi to send TXT/MMS message notifications, images, and video clips when the security camera is triggered.

Back in my undergrad years, I had an obsession with hummus. Hummus and pita/vegetables were my lunch of choice.

I loved it.

I lived on it.

And I was very protective of my hummus — college kids are notorious for raiding each other’s fridges and stealing each other’s food. No one was to touch my hummus.

But — I was a victim of such hummus theft on more than one occasion…and I never forgot it!

I never figured out who stole my hummus, and even though my wife and I are the only ones who live in our house, I often hide the hummus in the back of the fridge (where no one will look) or under fruits and vegetables (which most people wouldn’t want to eat).

Of course, back then I wasn’t as familiar with computer vision and OpenCV as I am now. Had I known then what I know now, I would have built a Raspberry Pi security camera to capture the hummus heist in action!

Today I’m channeling my inner undergrad-self and laying rest to the chickpea bandit. And if he ever returns again, beware, my fridge is monitored!

To learn how to build a security camera with a Raspberry Pi and OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Building a Raspberry Pi security camera with OpenCV

In the first part of this tutorial, we’ll briefly review how we are going to build an IoT-capable security camera with the Raspberry Pi.

Next, we’ll review our project/directory structure and install the libraries/packages to successfully build the project.

We’ll also briefly review both Amazon AWS/S3 and Twilio, two services that when used together will enable us to:

  1. Upload an image/video clip when the security camera is triggered.
  2. Send the image/video clip directly to our smartphone via text message.

From there we’ll implement the source code for the project.

And finally, we’ll put all the pieces together and put our Raspberry Pi security camera into action!

An IoT security camera with the Raspberry Pi

Figure 1: Raspberry Pi + Internet of Things (IoT). Our project today will use two cloud services: Twilio and AWS S3. Twilio is an SMS/MMS messaging service. S3 is a file storage service to help facilitate the video messages.

We’ll be building a very simple IoT security camera with the Raspberry Pi and OpenCV.

The security camera will be capable of recording a video clip when the camera is triggered, uploading the video clip to the cloud, and then sending a TXT/MMS message which includes the video itself.

We’ll be building this project specifically with the goal of detecting when a refrigerator is opened and when the fridge is closed — everything in between will be captured and recorded.

Therefore, this security camera will work best in any similar “open” and “closed” environment where there is a large difference in lighting. For example, you could also deploy it inside a mailbox that opens/closes.

You can easily extend this method to work with other forms of detection, including simple motion detection and home surveillance, object detection, and more. I’ll leave that as an exercise for you, the reader, to implement — in that case, you can use this project as a “template” for implementing any additional computer vision functionality.

Project structure

Go ahead and grab the “Downloads” for today’s blog post.

Once you’ve unzipped the files, you’ll be presented with the following directory structure:

$ tree --dirsfirst
.
├── config
│   └── config.json
├── pyimagesearch
│   ├── notifications
│   │   ├── __init__.py
│   │   └── twilionotifier.py
│   ├── utils
│   │   ├── __init__.py
│   │   └── conf.py
│   └── __init__.py
└── detect.py

4 directories, 7 files

Today we’ll be reviewing four files:

  • config/config.json
     : This commented JSON file holds our configuration. I’m providing you with this file, but you’ll need to insert your API keys for both Twilio and S3.
  • pyimagesearch/notifications/twilionotifier.py
     : Contains the
    TwilioNotifier
      class for sending SMS/MMS messages. This is the same exact class I use for sending text, picture, and video messages with Python inside my upcoming Raspberry Pi book.
  • pyimagesearch/utils/conf.py
     : The
    Conf
      class is responsible for loading the commented JSON configuration.
  • detect.py
     : The heart of today’s project is contained in this driver script. It watches for significant light change, starts recording video, and alerts me when someone steals my hummus or anything else I’m hiding in the fridge.

Now that we understand the directory structure and files therein, let’s move on to configuring our machine and learning about S3 + Twilio. From there, we’ll begin reviewing the four key files in today’s project.

Installing package/library prerequisites

Today’s project requires that you install a handful of Python libraries on your Raspberry Pi.

In my upcoming book, all of these packages will be preinstalled in a custom Raspbian image. All you’ll have to do is download the Raspbian .img file, flash it to your micro-SD card, and boot! From there you’ll have a pre-configured dev environment with all the computer vision + deep learning libraries you need!

Note: If you want my custom Raspbian images right now (with both OpenCV 3 and OpenCV 4), you should grab a copy of either the Quickstart Bundle or Hardcopy Bundle of Practical Python and OpenCV + Case Studies which includes the Raspbian .img file.

This introductory book will also teach you OpenCV fundamentals so that you can learn how to confidently build your own projects. These fundamentals and concepts will go a long way if you’re planning to grab my upcoming Raspberry Pi for Computer Vision book.

In the meantime, you can get by with this minimal installation of packages to replicate today’s project:

  • opencv-contrib-python
     : The OpenCV library.
  • imutils
     : My package of convenience functions and classes.
  • twilio
     : The Twilio package allows you to send text/picture/video messages.
  • boto3
     : The
    boto3
      package will communicate with the Amazon S3 files storage service. Our videos will be stored in S3.
  • json-minify
     : Allows for commented JSON files (because we all love documentation!)

To install these packages, I recommend that you follow my pip install opencv guide to setup a Python virtual environment.

You can then pip install all required packages:

$ workon <env_name> # insert your environment name such as cv or py3cv4
$ pip install opencv-contrib-python
$ pip install imutils
$ pip install twilio
$ pip install boto3
$ pip install json-minify

Now that our environment is configured, each time you want to activate it, simply use the

workon
  command.

Let’s review S3, boto3, and Twilio!

What is Amazon AWS and S3?

Figure 2: Amazon’s Simple Storage Service (S3) will be used to store videos captured from our IoT Raspberry Pi. We will use the boto3 Python package to work with S3.

Amazon Web Services (AWS) has a service called Simple Storage Service, commonly known as S3.

The S3 service is highly popular for storing files. I actually use it to host some larger files such as GIFs on this blog.

Today we’ll be using S3 to host our video files generated by the Raspberry Pi Security camera.

S3 is organized by “buckets”. A bucket contains files and folders. It also can be set up with custom permissions and security settings.

A package called

boto3
  will help us to transfer the files from our Internet of Things Raspberry Pi to AWS S3.

Before we dive into

boto3
 , we need to set up an S3 bucket.

Let’s go ahead and create a bucket, resource group, and user. We’ll give the resource group permissions to access the bucket and then we’ll add the user to the resource group.

Step #1: Create a bucket

Amazon has great documentation on how to create an S3 bucket here.

Step #2: Create a resource group + user. Add the user to the resource group.

After you create your bucket, you’ll need to create an IAM user + resource group and define permissions.

  • Visit the resource groups page to create a group. I named my example “s3pi”.
  • Visit the users page to create a user. I named my example “raspberrypisecurity”.

Step #3: Grab your access keys. You’ll need to paste them into today’s config file.

Watch these slides to walk you through Steps 1-3, but refer to the documentation as well because slides become out of date rapidly:

Figure 3: The steps to gain API access to Amazon S3. We’ll use boto3 along with the access keys in our Raspberry Pi IoT project.

Obtaining your Twilio API keys

Figure 4: Twilio is a popular SMS/MMS platform with a great API.

Twilio, a phone number service with an API, allows for voice, SMS, MMS, and more.

Twilio will serve as the bridge between our Raspberry Pi and our cell phone. I want to know exactly when the chickpea bandit is opening my fridge so that I can take countermeasures.

Let’s set up Twilio now.

Step #1: Create an account and get a free number.

Go ahead and sign up for Twilio and you’ll be assigned a temporary trial number. You can purchase a number + quota later if you choose to do so.

Step #2: Grab your API keys.

Now we need to obtain our API keys. Here’s a screenshot showing where to create one and copy it:

Figure 5: The Twilio API keys are necessary to send text messages with Python.

A final note about Twilio is that it does support the popular WhatsApp messaging platform. Support for WhatsApp is welcomed by the international community; however, it is currently in beta. Today we’ll be demonstrating standard SMS/MMS only. I’ll leave it up to you to explore Twilio in conjunction with WhatsApp.

Our JSON configuration file

There are a number of variables that need to be specified for this project, and instead of hardcoding them, I decided to keep our code more modular and organized by putting them in a dedicated JSON configuration file.

Since JSON doesn’t natively support comments, our

Conf
  class will take advantage of JSON-minify to parse out the comments. If JSON isn’t your config file of choice, you can try YAML or XML as well.

Let’s take a look at the commented JSON file now:

{
	// two constants, first threshold for detecting if the
	// refrigerator is open, and a second threshold for the number of
	// seconds the refrigerator is open
	"thresh": 50,
	"open_threshold_seconds": 60,

Lines 5 and 6 contain two settings. The first is the light threshold for determining when the refrigerator is open. The second is a threshold for the number of seconds until it is determined that someone left the door open.

Now let’s handle AWS + S3 configs:

// variables to store your aws account credentials
	"aws_access_key_id": "YOUR_AWS_ACCESS_KEY_ID",
	"aws_secret_access_key": "YOUR_AWS_SECRET_ACCESS_KEY",
	"s3_bucket": "YOUR_AWS_S3_BUCKET",

Each of the values on Lines 9-11 is available in your AWS console (we just generated them in the “What is Amazon AWS and S3?” section above).

And finally our Twilio configs:

// variables to store your twilio account credentials
	"twilio_sid": "YOUR_TWILIO_SID",
	"twilio_auth": "YOUR_TWILIO_AUTH_ID",
	"twilio_to": "YOUR_PHONE_NUMBER",
	"twilio_from": "YOUR_TWILIO_PHONE_NUMBER"
}

Twilio security settings are on Lines 14 and 15. The

"twilio_from"
  value must match one of your Twilio phone numbers. If you’re using the trial, you only have one number. If you use the wrong number, are out of quota, etc., Twilio will likely send an error message to your email address.

Phone numbers can be formatted like this in the U.S.:

"+1-555-555-5555"
 .

Loading the JSON configuration file

Our configuration file includes comments (for documentation purposes) which unfortunately means we cannot use Python’s built-in

json
  package which cannot load files with comments.

Instead, we’ll use a combination of JSON-minify and a custom 

Conf
  class to load our JSON file as a Python dictionary.

Let’s take a look at how to implement the

Conf
  class now:
# import the necessary packages
from json_minify import json_minify
import json

class Conf:
	def __init__(self, confPath):
		# load and store the configuration and update the object's
		# dictionary
		conf = json.loads(json_minify(open(confPath).read()))
		self.__dict__.update(conf)

	def __getitem__(self, k):
		# return the value associated with the supplied key
		return self.__dict__.get(k, None)

This class is relatively straightforward. Notice that in the constructor, we use

json_minify
  (Line 9) to parse out the comments prior to passing the file contents to
json.loads
 .

The

__getitem__
  method will grab any value from the configuration with dictionary syntax. In other words, we won’t call this method directly — rather, we’ll simply use dictionary syntax in Python to grab a value associated with a given key.
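
As a quick usage sketch (the key names come from the config.json file we just reviewed, so the printed values assume the defaults shown above):

# example usage of the Conf class -- load the commented JSON config
# and read values with standard dictionary syntax
from pyimagesearch.utils import Conf

conf = Conf("config/config.json")
print(conf["thresh"])                  # 50 with the default config above
print(conf["open_threshold_seconds"])  # 60 with the default config above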

Uploading key video clips and sending them via text message

Once our security camera is triggered we’ll need methods to:

  • Upload the images/video to the cloud (since the Twilio API cannot directly serve “attachments”).
  • Utilize the Twilio API to actually send the text message.

To keep our code neat and organized we’ll be encapsulating this functionality inside a class named

TwilioNotifier
  — let’s review this class now:
# import the necessary packages
from twilio.rest import Client
import boto3
from threading import Thread

class TwilioNotifier:
	def __init__(self, conf):
		# store the configuration object
		self.conf = conf

	def send(self, msg, tempVideo):
		# start a thread to upload the file and send it
		t = Thread(target=self._send, args=(msg, tempVideo,))
		t.start()

On Lines 2-4, we import the Twilio

Client
 , Amazon’s 
boto3
 , and Python’s built-in 
Thread
 .

From there, our

TwilioNotifier
  class and constructor are defined on Lines 6-9. Our constructor accepts a single parameter, the configuration, which we presume has been loaded from disk via the
Conf
  class.

This project only demonstrates sending messages. We’ll be demonstrating receiving messages with Twilio in an upcoming blog post as well as in the Raspberry Pi Computer Vision book.

The

send
  method is defined on Lines 11-14. This method accepts two key parameters:
  • The string text
    msg
  • The video file,
    tempVideo
     . Once the video is successfully stored in S3, it will be removed from the Pi to save space. Hence it is a temporary video.

The

send
  method kicks off a
Thread
  to actually send the message, ensuring the main thread of execution is not blocked.

Thus, the core text message sending logic is in the next method,

_send
 :
def _send(self, msg, tempVideo):
		# create a s3 client object
		s3 = boto3.client("s3",
			aws_access_key_id=self.conf["aws_access_key_id"],
			aws_secret_access_key=self.conf["aws_secret_access_key"],
		)

		# get the filename and upload the video in public read mode
		filename = tempVideo.path[tempVideo.path.rfind("/") + 1:]
		s3.upload_file(tempVideo.path, self.conf["s3_bucket"],
			filename, ExtraArgs={"ACL": "public-read",
			"ContentType": "video/mp4"})

The

_send
  method is defined on Line 16. It operates as an independent thread so as not to impact the driver script flow.

Parameters (

msg
  and
tempVideo
 ) are passed in when the thread is launched.

The

_send
  method will first upload the video to AWS S3 via:
  • Initializing the
    s3
      client with the access key and secret access key (Lines 18-21).
  • Uploading the file (Lines 25-27).

Line 24 simply extracts the

filename
  from the video path since we’ll need it later.

Let’s go ahead and send the message:

# get the bucket location and build the url
		location = s3.get_bucket_location(
			Bucket=self.conf["s3_bucket"])["LocationConstraint"]
		url = "https://s3-{}.amazonaws.com/{}/{}".format(location,
			self.conf["s3_bucket"], filename)

		# initialize the twilio client and send the message
		client = Client(self.conf["twilio_sid"],
			self.conf["twilio_auth"])
		client.messages.create(to=self.conf["twilio_to"], 
			from_=self.conf["twilio_from"], body=msg, media_url=url)
		
		# delete the temporary file
		tempVideo.cleanup()

To send the message and have the video show up in a cell phone messaging app, we need to send the actual text string along with a URL to the video file in S3.

Note: This must be a publicly accessible URL, so ensure that your S3 settings are correct.

The URL is generated on Lines 30-33.

From there, we’ll create a Twilio

client
  (not to be confused with our boto3
s3
  client) on Lines 36 and 37.

Lines 38 and 39 actually send the message. Notice the

to
 ,
from_
 ,
body
 , and
media_url
  parameters.

Finally, we’ll remove the temporary video file to save some precious space (Line 42). If we don’t do this it’s possible that your Pi may run out of space if your disk space is already low.

The Raspberry Pi security camera driver script

Now that we have (1) our configuration file, (2) a method to load the config, and (3) a class to interact with the S3 and Twilio APIs, let’s create the main driver script for the Raspberry Pi security camera.

The way this script works is relatively simple:

  • It monitors the average amount of light seen by the camera.
  • When the refrigerator door opens, the light comes on, the Pi detects the light, and the Pi starts recording.
  • When the refrigerator door is closed, the light turns off, the Pi detects the absence of light, and the Pi stops recording + sends me or you a video message.
  • If someone leaves the refrigerator open for longer than the specified seconds in the config file, I’ll receive a separate text message indicating that the door was left open.

Let’s go ahead and implement these features.

Open up the

detect.py
  file and insert the following code:
# import the necessary packages
from __future__ import print_function
from pyimagesearch.notifications import TwilioNotifier
from pyimagesearch.utils import Conf
from imutils.video import VideoStream
from imutils.io import TempFile
from datetime import datetime
from datetime import date
import numpy as np
import argparse
import imutils
import signal
import time
import cv2
import sys

Lines 2-15 import our necessary packages. Notably, we’ll be using our

TwilioNotifier
 ,
Conf
  class,
VideoStream
 ,
imutils
 , and OpenCV.

Let’s define an interrupt signal handler and parse for our config file path argument:

# function to handle keyboard interrupt
def signal_handler(sig, frame):
	print("[INFO] You pressed `ctrl + c`! Closing refrigerator monitor" \
		" application...")
	sys.exit(0)

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--conf", required=True, 
	help="Path to the input configuration file")
args = vars(ap.parse_args())

Our script will run headless because we don’t need an HDMI screen inside the fridge.

On Lines 18-21, we define a

signal_handler
  function to capture “ctrl + c” events from the keyboard gracefully. It isn’t always necessary to do this, but if you need anything to execute before the script exits (such as someone disabling your security camera!), you can put it in this function.

We have a single command line argument to parse. The

--conf
  flag (the path to the config file) can be provided directly in the terminal or in a launch-on-reboot script. You may learn more about command line arguments here.

Let’s perform our initializations:

# load the configuration file and initialize the Twilio notifier
conf = Conf(args["conf"])
tn = TwilioNotifier(conf)

# initialize the flags for fridge open and notification sent
fridgeOpen = False
notifSent = False

# initialize the video stream and allow the camera sensor to warmup
print("[INFO] warming up camera...")
# vs = VideoStream(src=0).start()
vs = VideoStream(usePiCamera=True).start()
time.sleep(2.0)

# signal trap to handle keyboard interrupt
signal.signal(signal.SIGINT, signal_handler)
print("[INFO] Press `ctrl + c` to exit, or 'q' to quit if you have" \
	" the display option on...")

# initialize the video writer and the frame dimensions (we'll set
# them as soon as we read the first frame from the video)
writer = None
W = None
H = None

Our initializations take place on Lines 30-52. Let’s review them:

  • Lines 30 and 31 instantiate our
    Conf
      and
    TwilioNotifier
      objects.
  • Two status variables are initialized to determine when the fridge is open and when a notification has been sent (Lines 34 and 35).
  • We’ll start our
    VideoStream
      on Lines 39-41. I’ve elected to use a PiCamera, so Line 39 (USB webcam) is commented out. You can easily swap these if you are using a USB webcam.
  • Line 44 starts our
    signal_handler
      thread to run in the background.
  • Our video
    writer
      and frame dimensions are initialized on Lines 50-52.

It’s time to begin looping over frames:

# loop over the frames of the stream
while True:
	# grab both the next frame from the stream and the previous
	# refrigerator status
	frame = vs.read()
	fridgePrevOpen = fridgeOpen

	# quit if there was a problem grabbing a frame
	if frame is None:
		break

	# resize the frame and convert the frame to grayscale
	frame = imutils.resize(frame, width=200)
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
	
	# if the frame dimensions are empty, set them
	if W is None or H is None:
		(H, W) = frame.shape[:2]

Our

while
  loop begins on Line 55. We proceed to
read
  a
frame
  from our video stream (Line 58). The
frame
  undergoes a sanity check on Lines 62 and 63 to determine if we have a legitimate image from our camera.

Line 59 sets our

fridgePrevOpen
  flag. The previous value must always be set at the beginning of the loop and it is based on the current value which will be determined later.

Our

frame
  is resized to a dimension that will look reasonable on a smartphone and also make for a smaller filesize for our MMS video (Line 66).

On Line 67, we create a grayscale image from

frame
  — we’ll need this soon to determine the average amount of light in the frame.

Our dimensions are set via Lines 70 and 71 during the first iteration of the loop.

Now let’s determine if the refrigerator is open:

# calculate the average of all pixels where a higher mean
	# indicates that there is more light coming into the refrigerator
	mean = np.mean(gray)

	# determine if the refrigerator is currently open
	fridgeOpen = mean > conf["thresh"]

Determining if the refrigerator is open is a dead-simple, two-step process:

  1. Average all pixel intensities of our grayscale image (Line 75).
  2. Compare the average to the threshold value in our configuration (Line 78). I’m confident that a value of
    50
      (in the
    config.json
      file) will be an appropriate threshold for most refrigerators with a light that turns on and off as the door is opened and closed. That said, you may want to experiment with tweaking that value yourself.

The

fridgeOpen
  variable is simply a boolean indicating if the refrigerator is open or not.

Let’s now determine if we need to start capturing a video:

# if the fridge is open and previously it was closed, it means
	# the fridge has been just opened
	if fridgeOpen and not fridgePrevOpen:
		# record the start time
		startTime = datetime.now()

		# create a temporary video file and initialize the video
		# writer object
		tempVideo = TempFile(ext=".mp4")
		writer = cv2.VideoWriter(tempVideo.path, 0x21, 30, (W, H),
			True)

As shown by the conditional on Line 82, so long as the refrigerator was just opened (i.e. it was not previously opened), we will initialize our video

writer
 .

We’ll go ahead and grab the

startTime
 , create a
tempVideo
 , and initialize our video
writer
  with the temporary file path (Lines 84-90).

Now we’ll handle the case where the refrigerator was previously open:

# if the fridge is open then there are 2 possibilities,
	# 1) it's left open for more than the *threshold* seconds. 
	# 2) it's closed in less than or equal to the *threshold* seconds.
	elif fridgePrevOpen:
		# calculate the time different between the current time and
		# start time
		timeDiff = (datetime.now() - startTime).seconds

		# if the fridge is open and the time difference is greater
		# than threshold, then send a notification
		if fridgeOpen and timeDiff > conf["open_threshold_seconds"]:
			# if a notification has not been sent yet, then send a 
			# notification
			if not notifSent:
				# build the message and send a notification
				msg = "Intruder has left your fridge open!!!"

				# release the video writer pointer and reset the
				# writer object
				writer.release()
				writer = None
				
				# send the message and the video to the owner and
				# set the notification sent flag
				tn.send(msg, tempVideo)
				notifSent = True

If the refrigerator was previously open, let’s check to ensure it wasn’t left open long enough to trigger an “Intruder has left your fridge open!” alert.

Kids can leave the refrigerator open by accident, or maybe after a holiday, you have a lot of food preventing the refrigerator door from closing all the way. You don’t want your food to spoil, so you may want these alerts!

For this message to be sent, the

timeDiff
  must be greater than the threshold set in the config (Lines 98-102).

This message will include a

msg
  and a video, sent to you as shown on Lines 107-117. The
msg
  is defined, the
writer
  is released, and the notification sent flag is set.

Let’s now take care of the most common scenario where the refrigerator was previously open, but now it is closed (i.e. some thief stole your food, or maybe it was you when you became hungry):

# check to see if the fridge is closed
		elif not fridgeOpen:
			# if a notification has already been sent, then just set 
			# the notifSent to false for the next iteration
			if notifSent:
				notifSent = False

			# if a notification has not been sent, then send a 
			# notification
			else:
				# record the end time and calculate the total time in
				# seconds
				endTime = datetime.now()
				totalSeconds = (endTime - startTime).seconds
				dateOpened = date.today().strftime("%A, %B %d %Y")

				# build the message and send a notification
				msg = "Your fridge was opened on {} at {} " \
					"at {} for {} seconds.".format(dateOpened
					startTime.strftime("%I:%M%p"), totalSeconds)

				# release the video writer pointer and reset the
				# writer object
				writer.release()
				writer = None
				
				# send the message and the video to the owner
				tn.send(msg, tempVideo)

The case beginning on Line 120 will send a video message indicating, “Your fridge was opened on {{ day }} at {{ time }} for {{ seconds }}.”

On Lines 123 and 124, our

notifSent
  flag is reset if needed. If the notification was already sent, we set this value to
False
 , effectively resetting it for the next iteration of the loop.

Otherwise, if the notification has not been sent, we’ll calculate the

totalSeconds
  the refrigerator was open (Lines 131 and 132). We’ll also record the date the door was opened (Line 133).

Our

msg
  string is populated with these values (Lines 136-138).

Then the video

writer
  is released and the message and video are sent (Lines 142-147).

Our final block finishes out the loop and performs cleanup:

# check to see if we should write the frame to disk
	if writer is not None:
		writer.write(frame)

# check to see if we need to release the video writer pointer
if writer is not None:
	writer.release()

# cleanup the camera and close any open windows
cv2.destroyAllWindows()
vs.stop()

To finish the loop, we’ll write the

frame
  to the video
writer
  object and then go back to the top to grab the next frame.

When the loop exits, the

writer
  is released, and the video stream is stopped.

Great job! You made it through a simple IoT project using a Raspberry Pi and camera.

It’s now time to place the bait. I know my thief likes hummus as much as I do, so I ran to the store and came back to put it in the fridge.

RPi security camera results

Figure 6: My refrigerator is armed with an Internet of Things (IoT) Raspberry Pi, PiCamera, and Battery Pack. And of course, I’ve placed some hummus in there for me and the thief. I’ll also know if someone takes a New Belgium Dayblazer beer of mine.

When deploying the Raspberry Pi security camera in your refrigerator to catch the hummus bandit, you’ll need to ensure that it will continue to run without a wireless connection to your laptop.

There are two great options for deployment:

  1. Run the computer vision Python script on reboot.
  2. Leave a
    screen
      session running with the Python computer vision script executing within.

Be sure to visit the first link if you just want your Pi to run the script when you plug in power.

While this blog post isn’t the right place for a full screen demo, here are the basics:

  • Install screen via:
    sudo apt-get install screen
  • Open an SSH connection to your Pi and run it:
    screen
  • If the connection from your laptop to your Pi ever dies or is closed, don’t panic! The screen session is still running. You can reconnect by SSH’ing into the Pi again and then running
    screen -r
     . You’ll be back in your virtual window.
  • Keyboard shortcuts for screen:
    • “ctrl + a, c”: Creates a new “window”.
    • “ctrl + a, p” and “ctrl + a, n”: Cycles through “previous” and “next” windows, respectively.
  • For a more in-depth review of
    screen
     , see the documentation. Here’s a screen keyboard shortcut cheat sheet.

Once you’re comfortable with starting a script on reboot or working with

screen
 , grab a USB battery pack that can source enough current. Shown in Figure 6, we’re using a RavPower 2200mAh battery pack connected to the Pi power input. The product specs claim to charge an iPhone 6+ times, and it seems to run a Raspberry Pi for approximately 10 hours (depending on the algorithm) as well.

Go ahead and plug in the battery pack, connect, and deploy the script (if you didn’t set it up to start on boot).

The commands are:

$ screen
# wait for screen to start
$ source ~/.profile
$ workon <env_name> # insert the name of your virtual environment
$ python detect.py --conf config/config.json

If you aren’t familiar with command line arguments, please read this tutorial. The command line argument is also required if you are deploying the script upon reboot.

Let’s see it in action!

Figure 7: Me testing the Pi Security Camera notifications with my iPhone.

I’ve included a full demo of the Raspberry Pi security camera below:

Interested in building more projects with the Raspberry Pi, OpenCV, and computer vision?

Figure 8: Catching a furry little raccoon with an infrared light/camera connected to the Raspberry Pi.

Are you interested in using your Raspberry Pi to build practical, real-world computer vision and deep learning applications, including:

  • Computer vision and IoT projects on the Pi
  • Servos, PID, and controlling the Pi with computer vision
  • Human activity, home surveillance, and facial applications
  • Deep learning on the Raspberry Pi
  • Fast, efficient deep learning with the Movidius NCS and OpenVINO toolkit
  • Self-driving car applications on the Raspberry Pi
  • Tips, suggestions, and best practices when performing computer vision and deep learning with the Raspberry Pi

If so, you’ll definitely want to check out my upcoming book, Raspberry Pi for Computer Vision. To learn more about the book (including release date information), just click the link below and enter your email address:

From there I’ll ensure you’re kept in the know on the RPi + Computer Vision book, including updates, behind the scenes looks, and release date information.

Summary

In this tutorial, you learned how to build a Raspberry Pi security camera from scratch using OpenCV and computer vision.

Specifically, you learned how to:

  • Access the Raspberry Pi camera module or USB webcam.
  • Setup your Amazon AWS/S3 account so you can upload images/video when your security camera is triggered (other services such as Dropbox, Box, Google Drive, etc. will work as well, provided you can obtain a public-facing URL of the media).
  • Obtain Twilio API keys used to send text messages with the uploaded images/video.
  • Create a Raspberry Pi security camera using OpenCV and computer vision.

Finally, we put all the pieces together and deployed the security camera to monitor a refrigerator:

  • Each time the door was opened we started recording
  • After the door was closed the recording stopped
  • The recording was then uploaded to the cloud
  • And finally, a text message was sent to our phone showing the activity

You can extend the security camera to include other components as well. My first suggestion would be to take a look at how to build a home surveillance system using a Raspberry Pi where we use a more advanced motion detection technique. It would be fun to implement Twilio SMS/MMS notifications into the home surveillance project as well.

I hope you enjoyed this tutorial!

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Building a Raspberry Pi security camera with OpenCV appeared first on PyImageSearch.

Pan/tilt face tracking with a Raspberry Pi and OpenCV


Inside this tutorial, you will learn how to perform pan and tilt object tracking using a Raspberry Pi, Python, and computer vision.

One of my favorite features of the Raspberry Pi is the huge amount of additional hardware you can attach to the Pi. Whether it’s cameras, temperature sensors, gyroscopes/accelerometers, or even touch sensors, the community surrounding the Raspberry Pi has enabled it to accomplish nearly anything.

But one of my favorite add-ons to the Raspberry Pi is the pan and tilt camera.

Using two servos, this add-on enables our camera to move left-to-right and up-and-down simultaneously, allowing us to detect and track objects, even if they were to go “out of frame” (as would happen if an object approached the boundaries of a frame with a traditional camera).

Today we are going to use the pan and tilt camera for object tracking and more specifically, face tracking.

To learn how to perform pan and tilt tracking with the Raspberry Pi and OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Pan/tilt face tracking with a Raspberry Pi and OpenCV

In the first part of this tutorial, we’ll briefly describe what pan and tilt tracking is and how it can be accomplished using servos.

From there we’ll also review the concept of a PID controller, a control loop feedback mechanism often used in control systems.

We’ll then implement our PID controller, face detector + object tracker, and driver script used to perform pan/tilt tracking.

I’ll also cover manual PID tuning basics — an essential skill.

Let’s go ahead and get started!

What is pan/tilt object tracking?

Figure 1: The Raspberry Pi pan-tilt servo HAT by Pimoroni.

The goal of pan and tilt object tracking is for the camera to stay centered upon an object.

Typically this tracking is accomplished with two servos. In our case, we have one servo for panning left and right. We have a separate servo for tilting up and down.

Each of our servos and the fixture itself has a range of 180 degrees (some systems have a greater range than this).

Hardware requirements for today’s project

You will need the following hardware to replicate today’s project:

  • Pimoroni pan tilt HAT full kit – The Pimoroni kit is a quality product and it hasn’t let me down. Budget about 30 minutes for assembly. I do not recommend the SparkFun kit as it requires soldering and additional assembly.
  • 2.5A, 5V power supply – If you supply less than 2.5A, your Pi might not have enough current, causing it to reset. Why? Because the servos draw the necessary current away from the Pi. Get a power supply and dedicate it to this project’s hardware.
  • HDMI Screen – Placing an HDMI screen next to your camera as you move around will allow you to visualize and debug, essential for manual tuning. Do not try X11 forwarding — it is simply too slow for video applications. VNC is possible if you don’t have an HDMI screen but I haven’t found an easy way to start VNC without having an actual screen plugged in as well.
  • Keyboard/mouse – Obvious reasons.

What is a PID controller?

A common feedback control loop is what is called a PID or Proportional-Integral-Derivative controller.

PIDs are typically used in automation such that a mechanical actuator can reach an optimum value (read by the feedback sensor) quickly and accurately.

They are used in manufacturing, power plants, robotics, and more.

The PID controller calculates an error term (the difference between desired set point and sensor reading) and has a goal of compensating for the error.

The PID calculation outputs a value that is used as an input to a “process” (an electromechanical process, not what us computer science/software engineer types think of as a “computer process”).

The sensor output is known as the “process variable” and serves as input to the equation. Throughout the feedback loop, timing is captured and it is input to the equation as well.

Wikipedia has a great diagram of a PID controller:

Figure 2: A Proportional Integral Derivative (PID) control loop will be used for each of our panning and tilting processes (image source).

Notice how the output loops back into the input. Also notice how the Proportional, Integral, and Derivative values are each calculated and summed.

The figure can be written in equation form as:

u(t) = K_\text{p} e(t) + K_\text{i} \int_0^t e(t') \,dt' + K_\text{d} \frac{de(t)}{dt}

Let’s review P, I, and D:

  • P (proportional): If the current error is large, the output will be proportionally large to cause a significant correction.
  • I (integral): Historical values of the error are integrated over time. Less significant corrections are made to reduce the error. If the error is eliminated, this term won’t grow.
  • D (derivative): This term anticipates the future. In effect, it is a dampening method. If either P or I will cause a value to overshoot (i.e. a servo was turned past an object or a steering wheel was turned too far), D will dampen the effect before it gets to the output.
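
In discrete form (which is how the pid.py script later in this post implements it), the same equation becomes:

u_k = K_\text{p} e_k + K_\text{i} \sum_{j=1}^{k} e_j \, \Delta t_j + K_\text{d} \frac{e_k - e_{k-1}}{\Delta t_k}

where e_k is the error at the current update and Δt_k is the time elapsed since the previous update.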

Do I need to learn more about PIDs and where is the best place?

PIDs are a fundamental control theory concept.

There are tons of resources. Some are heavy on mathematics, some conceptual. Some are easy to understand, some not.

That said, as a software programmer, you just need to know how to implement one and tune one. Even if you think the mathematical equation looks complex, when you see the code, you will be able to follow and understand.

PIDs are easier to tune if you understand how they work, but as long as you follow the manual tuning guidelines demonstrated later in this post, you don’t have to be intimate with the equations above at all times.

Just remember:

  • P – proportional, present (large corrections)
  • I – integral, “in the past” (historical)
  • D – derivative, dampening (anticipates the future)

For more information, the Wikipedia PID controller page is really great and also links to other great guides.

Project structure

Once you’ve grabbed today’s “Downloads” and extracted them, you’ll be presented with the following directory structure:

$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   ├── objcenter.py
│   └── pid.py
├── haarcascade_frontalface_default.xml
└── pan_tilt_tracking.py

1 directory, 5 files

Today we’ll be reviewing three Python files:

  • objcenter.py
     : Calculates the center of a face bounding box using the Haar Cascade face detector. If you wish, you may detect a different type of object and place the logic in this file.
  • pid.py
     : Discussed above, this is our control loop. I like to keep the PID in a class so that I can create new
    PID
      objects as needed. Today we have two: (1) panning and (2) tilting.
  • pan_tilt_tracking.py
     : This is our pan/tilt object tracking driver script. It uses multiprocessing with four independent processes (two of which are for panning and tilting, one is for finding an object, and one is for driving the servos with fresh angle values).

The

haarcascade_frontalface_default.xml
  is our pre-trained Haar Cascade face detector. Haar works great with the Raspberry Pi as it requires fewer computational resources than HOG or Deep Learning.

Creating the PID controller

The following PID script is based on Erle Robotics GitBook‘s example as well as the Wikipedia pseudocode. I added my own style and formatting that readers (like you) of my blog have come to expect.

Go ahead and open 

pid.py
. Let’s review:
# import necessary packages
import time

class PID:
	def __init__(self, kP=1, kI=0, kD=0):
		# initialize gains
		self.kP = kP
		self.kI = kI
		self.kD = kD

This script implements the PID formula. It is heavy in basic math. We don’t need to import advanced math libraries, but we do need to import

time
  on Line 2 (our only import).

We define a class called

PID
  on Line 4.

The

PID
  class has three methods:
  • __init__
     : The constructor.
  • initialize
     : Initializes values. This logic could be in the constructor, but then you wouldn’t have the convenient option of reinitializing at any time.
  • update
     : This is where the calculation is made.

Our constructor is defined on Lines 5-9 accepting three parameters,

kP
 ,
kI
 , and
kD
 . These values are constants and are specified in our driver script. Three corresponding instance variables are defined in the method body.

Now let’s review

initialize
 :
def initialize(self):
		# initialize the current and previous time
		self.currTime = time.time()
		self.prevTime = self.currTime

		# initialize the previous error
		self.prevError = 0

		# initialize the term result variables
		self.cP = 0
		self.cI = 0
		self.cD = 0

The

initialize
  method sets our current timestamp and previous timestamp on Lines 13 and 14 (so we can calculate the time delta in our
update
  method).

Our self-explanatory previous error term is defined on Line 17.

The P, I, and D variables are established on Lines 20-22.

Let’s move on to the heart of the PID class — the

update
  method:
def update(self, error, sleep=0.2):
		# pause for a bit
		time.sleep(sleep)

		# grab the current time and calculate delta time
		self.currTime = time.time()
		deltaTime = self.currTime - self.prevTime

		# delta error
		deltaError = error - self.prevError

		# proportional term
		self.cP = error

		# integral term
		self.cI += error * deltaTime

		# derivative term and prevent divide by zero
		self.cD = (deltaError / deltaTime) if deltaTime > 0 else 0

		# save previous time and error for the next update
		self.prevTime = self.currTime
		self.prevError = error

		# sum the terms and return
		return sum([
			self.kP * self.cP,
			self.kI * self.cI,
			self.kD * self.cD])

Our update method accepts two parameters: the

error
  value and
sleep
  in seconds.

Inside the

update
  method, we:
  • Sleep for a predetermined amount of time on Line 26, thereby preventing updates so fast that our servos (or another actuator) can’t respond fast enough. The
    sleep
      value should be chosen wisely based on knowledge of mechanical, computational, and even communication protocol limitations. Without prior knowledge, experiment to find what seems to work best.
  • Calculate
    deltaTime
     (Line 30). Updates won’t always come in at the exact same time (we have no control over it). Thus, we calculate the time difference between the previous update and now (this current update). This will affect our
    cI
      and
    cD
      terms.
  • Compute 
    deltaError
     (Line 33), the difference between the provided
    error
      and
    prevError
     .

Then we calculate our

PID
  control terms:
  • cP
     : Our proportional term is equal to the
    error
      term.
  • cI
     : Our integral term is simply the
    error
      multiplied by
    deltaTime
     .
  • cD
     : Our derivative term is
    deltaError
      over
    deltaTime
     . Division by zero is accounted for.

Finally, we:

  • Set the
    prevTime
      and
    prevError
      (Lines 45 and 46). We’ll need these values during our next
    update
     .
  • Return the summation of calculated terms multiplied by constant terms (Lines 49-52).

Keep in mind that updates will be happening in a fast-paced loop. Depending on your needs, you should adjust the

sleep
  parameter (as previously mentioned).
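
If you’d like to sanity check the PID class in isolation before wiring it into the driver script, here is a minimal sketch (the error values below are made up for illustration and are not part of the project code):

# import our PID class (from the "Downloads" of this post)
from pyimagesearch.pid import PID

# create a controller with small example gains and initialize it
p = PID(kP=0.09, kI=0.08, kD=0.002)
p.initialize()

# feed the controller a handful of hypothetical pixel errors and
# print the correction it produces for each one
for error in [120, 80, 40, 10, 0]:
	correction = p.update(error, sleep=0.2)
	print("error={}, correction={:.3f}".format(error, correction))

Watching the correction shrink as the error shrinks is a quick way to convince yourself the three terms are behaving sensibly before you attach real servos.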

Implementing the face detector and object center tracker

Figure 3: Panning and tilting with a Raspberry Pi camera to keep the camera centered on a face.

The goal of our pan and tilt tracker will be to keep the camera centered on the object itself.

To accomplish this goal, we need to:

  • Detect the object itself.
  • Compute the center (x, y)-coordinates of the object.

Let’s go ahead and implement our

ObjCenter
class which will accomplish both of these goals:
# import necessary packages
import imutils
import cv2

class ObjCenter:
	def __init__(self, haarPath):
		# load OpenCV's Haar cascade face detector
		self.detector = cv2.CascadeClassifier(haarPath)

This script requires

imutils
  and
cv2
  to be imported.

Our

ObjCenter
  class is defined on Line 5.

On Line 6, the constructor accepts a single argument — the path to the Haar Cascade face detector.

We’re using the Haar method to find faces. Keep in mind that the Raspberry Pi (even a 3B+) is a resource-constrained device. If you elect to use a slower (but more accurate) HOG or a CNN, keep in mind that you’ll want to slow down the PID calculations so they aren’t firing faster than you’re actually detecting new face coordinates.

Note: You may also elect to use a Movidius NCS or Google Coral TPU USB Accelerator for face detection. We’ll be covering that concept in a future tutorial/in the Raspberry Pi for Computer Vision book.

The

detector
  is initialized on Line 8.

Let’s define the

update
  method which will find the center (x, y)-coordinate of a face:
def update(self, frame, frameCenter):
		# convert the frame to grayscale
		gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

		# detect all faces in the input frame
		rects = self.detector.detectMultiScale(gray, scaleFactor=1.05,
			minNeighbors=9, minSize=(30, 30),
			flags=cv2.CASCADE_SCALE_IMAGE)

		# check to see if a face was found
		if len(rects) > 0:
			# extract the bounding box coordinates of the face and
			# use the coordinates to determine the center of the
			# face
			(x, y, w, h) = rects[0]
			faceX = int(x + (w / 2.0))
			faceY = int(y + (h / 2.0))

			# return the center (x, y)-coordinates of the face
			return ((faceX, faceY), rects[0])

		# otherwise no faces were found, so return the center of the
		# frame
		return (frameCenter, None)

Today’s project has two

update
  methods so I’m taking the time here to explain the difference:

  1. We previously reviewed the
    PID
     
    update
      method. This method performs the PID calculations to help calculate a servo angle to keep the face in the center of the camera’s view.
  2. Now we are reviewing the
    ObjCenter
     
    update
      method. This method simply finds a face and returns its center coordinates.

The

update
  method (for finding the face) is defined on Line 10 and accepts two parameters:
  • frame
     : An image ideally containing one face.
  • frameCenter
     : The center coordinates of the frame.

The frame is converted to grayscale on Line 12.

From there we perform face detection using the Haar Cascade

detectMultiScale
  method.

On Lines 20-26 we check that faces have been detected and from there calculate the center (x, y)-coordinates of the face itself.

Lines 20-24 make an important assumption: we assume that only one face is in the frame at all times and that it can be accessed by the 0-th index of

rects
 .

Note: Without this assumption holding true, additional logic would be required to determine which face to track. See the “Improvements for pan/tilt tracking with the Raspberry Pi” section of this post, where I describe how to handle multiple face detections with Haar.

The center of the face, as well as the bounding box coordinates, are returned on Line 29. We’ll use the bounding box coordinates to draw a box around the face for display purposes.

Otherwise, when no faces are found, we simply return the center of the frame (so that the servos stop and do not make any corrections until a face is found again).
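
If you’d like to test ObjCenter on its own before running the full driver script, a minimal sketch looks like this (the image filename is hypothetical; any photo containing a single face will do):

# import the necessary packages
from pyimagesearch.objcenter import ObjCenter
import cv2

# load a test image and compute the center of the frame
frame = cv2.imread("test_face.jpg")
(H, W) = frame.shape[:2]
frameCenter = (W // 2, H // 2)

# locate the face (falls back to the frame center if none is found)
obj = ObjCenter("haarcascade_frontalface_default.xml")
((objX, objY), rect) = obj.update(frame, frameCenter)
print("object center: ({}, {}), bounding box: {}".format(objX, objY, rect))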

Our pan and tilt driver script

Let’s put the pieces together and implement our pan and tilt driver script!

Open up the

pan_tilt_tracking.py
file and insert the following code:
# import necessary packages
from multiprocessing import Manager
from multiprocessing import Process
from imutils.video import VideoStream
from pyimagesearch.objcenter import ObjCenter
from pyimagesearch.pid import PID
import pantilthat as pth
import argparse
import signal
import time
import sys
import cv2

# define the range for the motors
servoRange = (-90, 90)

On Lines 2-12 we import the necessary libraries. Notably, we’ll use:

  • Process
      and
    Manager
      will help us with
    multiprocessing
      and shared variables.
  • VideoStream
      will allow us to grab frames from our camera.
  • ObjCenter
      will help us locate the object in the frame while 
    PID
      will help us keep the object in the center of the frame by calculating our servo angles.
  • pantilthat
      is the library used to interface with the Raspberry Pi Pimoroni pan tilt HAT.

Our servos on the pan tilt HAT have a range of 180 degrees (-90 to 90) as is defined on Line 15. These values should reflect the limitations of your servos.

Let’s define a “ctrl + c”

signal_handler
 :
# function to handle keyboard interrupt
def signal_handler(sig, frame):
	# print a status message
	print("[INFO] You pressed `ctrl + c`! Exiting...")

	# disable the servos
	pth.servo_enable(1, False)
	pth.servo_enable(2, False)

	# exit
	sys.exit()

This multiprocessing script can be tricky to exit from. There are a number of ways to accomplish it, but I decided to go with a

signal_handler
  approach.

The

signal_handler
  is a callback function that will be invoked by the
signal
  module of Python. It accepts two arguments,
sig
  and the
frame
 . The
sig
  is the signal itself (generally “ctrl + c”). The
frame
  is not a video frame and is actually the execution frame.

We’ll need to start the

signal_handler
  thread inside of each process.

Line 20 prints a status message. Lines 23 and 24 disable our servos. And Line 27 exits from our program.

You might look at this script as a whole and think “If I have four processes, and

signal_handler
  is running in each of them, then this will occur four times.”

You are absolutely right, but this is a compact and understandable way to go about killing off our processes, short of pressing “ctrl + c” as many times as you can in a sub-second period to try to get all processes to die off. Imagine if you had 10 processes and were trying to kill them with the “ctrl + c” approach.

Now that we know how our processes will exit, let’s define our first process:

def obj_center(args, objX, objY, centerX, centerY):
	# signal trap to handle keyboard interrupt
	signal.signal(signal.SIGINT, signal_handler)

	# start the video stream and wait for the camera to warm up
	vs = VideoStream(usePiCamera=True).start()
	time.sleep(2.0)

	# initialize the object center finder
	obj = ObjCenter(args["cascade"])

	# loop indefinitely
	while True:
		# grab the frame from the threaded video stream and flip it
		# vertically (since our camera was upside down)
		frame = vs.read()
		frame = cv2.flip(frame, 0)

		# calculate the center of the frame as this is where we will
		# try to keep the object
		(H, W) = frame.shape[:2]
		centerX.value = W // 2
		centerY.value = H // 2

		# find the object's location
		objectLoc = obj.update(frame, (centerX.value, centerY.value))
		((objX.value, objY.value), rect) = objectLoc

		# extract the bounding box and draw it
		if rect is not None:
			(x, y, w, h) = rect
			cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0),
				2)

		# display the frame to the screen
		cv2.imshow("Pan-Tilt Face Tracking", frame)
		cv2.waitKey(1)

	# stop the video stream
	vs.stop()

Our

obj_center
  thread begins on Line 29 and accepts five variables:
  • args
     : Our command line arguments dictionary (created in our main thread).
  • objX
      and
    objY
     : The  (x, y)-coordinates of the object. We’ll continuously calculate this.
  • centerX
      and
    centerY
     : The center of the frame.

On Line 31 we start our

signal_handler
 .

Then, on Lines 34 and 35, we start our

VideoStream
  for our
PiCamera
 , allowing it to warm up for two seconds.

Our

ObjCenter
  is instantiated as
obj
  on Line 38. Our cascade path is passed to the constructor.

From here, our process enters an infinite loop on Line 41. The only way to escape the loop is for the user to press “ctrl + c”, as you’ll notice there is no

break
  command.

Our

frame
  is grabbed and flipped on Lines 44 and 45. We must
flip
  the
frame
  because the
PiCamera
  is physically upside down in the pan tilt HAT fixture by design.

Lines 49-51 set our frame width and height as well as calculate the center point of the frame. You’ll notice that we are using

.value
  to access our center point variables — this is required with the
Manager
  method of sharing data between processes.

To calculate where our object is, we’ll simply call the

update
  method on
obj
  while passing the video
frame
 . The reason we also pass the center coordinates is because we’ll just have the
ObjCenter
  class return the frame center if it doesn’t see a Haar face. Effectively, this makes the PID error
0
  and thus, the servos stop moving and remain in their current positions until a face is found.

Note: I chose to return the frame center if the face could not be detected. Alternatively, you may wish to return the coordinates of the last location a face was detected, as shown in the sketch below. That is an implementation choice that I will leave up to you.
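
If you want to experiment with that alternative, here is a minimal sketch of one way to do it. The LastKnownObjCenter subclass and its lastCenter attribute are my own additions for illustration and are not part of the project code:

# a sketch of the alternative: fall back to the last known face
# location instead of the frame center
from pyimagesearch.objcenter import ObjCenter

class LastKnownObjCenter(ObjCenter):
	def __init__(self, haarPath):
		super(LastKnownObjCenter, self).__init__(haarPath)
		self.lastCenter = None

	def update(self, frame, frameCenter):
		# run the normal Haar-based detector
		(center, rect) = super(LastKnownObjCenter, self).update(
			frame, frameCenter)

		# if a face was found, remember where it was; otherwise fall
		# back to the last known location (if we have one)
		if rect is not None:
			self.lastCenter = center
		elif self.lastCenter is not None:
			center = self.lastCenter

		return (center, rect)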

The result of the

update
  is parsed on Line 55 where our object coordinates and the bounding box are assigned.

The last steps are to draw a rectangle around our face (Lines 58-61) and to display the video frame (Lines 64 and 65).

Let’s define our next process,

pid_process
 :
def pid_process(output, p, i, d, objCoord, centerCoord):
	# signal trap to handle keyboard interrupt
	signal.signal(signal.SIGINT, signal_handler)

	# create a PID and initialize it
	p = PID(p.value, i.value, d.value)
	p.initialize()

	# loop indefinitely
	while True:
		# calculate the error
		error = centerCoord.value - objCoord.value

		# update the value
		output.value = p.update(error)

Our

pid_process
  is quite simple as the heavy lifting is taken care of by the
PID
  class. Two of these processes will be running at any given time (panning and tilting). If you have a complex robot, you might have many more PID processes running.

The method accepts six parameters:

  • output
     : The servo angle that is calculated by our PID controller. This will be a pan or tilt angle.
  • p
     ,
    i
     , and
    d
     : Our PID constants.
  • objCoord
     : This value is passed to the process so that the process has access to keep track of where the object is. For panning, it is an x-coordinate. Similarly, for tilting, it is a y-coordinate.
  • centerCoord
     : Used to calculate our
    error
     , this value is just the center of the frame (either x or y depending on whether we are panning or tilting).

Be sure to trace each of the parameters back to where the process is started in the main thread of this program.

On Line 69, we start our special

signal_handler
 .

Then we instantiate our PID on Line 72, passing each of the P, I, and D values.

Subsequently, the

PID
  object is initialized (Line 73).

Now comes the fun part in just two lines of code:

  • Calculate the
    error
     on Line 78. For example, this could be the frame’s y-center minus the object’s y-location for tilting.
  • Call
    update
     (Line 81), passing the new error (and a sleep time if necessary). The returned value is the
    output.value
     . Continuing our example, this would be the tilt angle in degrees.

We have another thread that “watches” each

output.value
  to drive the servos.

Speaking of driving our servos, let’s implement a servo range checker and our servo driver now:

def in_range(val, start, end):
	# determine if the input value is in the supplied range
	return (val >= start and val <= end)

def set_servos(pan, tlt):
	# signal trap to handle keyboard interrupt
	signal.signal(signal.SIGINT, signal_handler)

	# loop indefinitely
	while True:
		# the pan and tilt angles are reversed
		panAngle = -1 * pan.value
		tiltAngle = -1 * tlt.value

		# if the pan angle is within the range, pan
		if in_range(panAngle, servoRange[0], servoRange[1]):
			pth.pan(panAngle)

		# if the tilt angle is within the range, tilt
		if in_range(tiltAngle, servoRange[0], servoRange[1]):
			pth.tilt(tiltAngle)

Lines 83-85 define an

in_range
  method to determine if a value is within a particular range.

From there, we’ll drive our servos to specific pan and tilt angles in the

set_servos
  method.

Our

set_servos
  method will be running in another process. It accepts
pan
  and
tlt
  values and will watch the values for updates. The values themselves are constantly being adjusted via our
pid_process
 .

We establish our

signal_handler
  on Line 89.

From there, we’ll start our infinite loop until a signal is caught:

  • Our
    panAngle
      and
    tiltAngle
      values are made negative to accommodate the orientation of the servos and camera (Lines 94 and 95).
  • Then we check each value ensuring it is in the range as well as drive the servos to the new angle (Lines 98-103).

That was easy.

Now let’s parse command line arguments:

# check to see if this is the main body of execution
if __name__ == "__main__":
	# construct the argument parser and parse the arguments
	ap = argparse.ArgumentParser()
	ap.add_argument("-c", "--cascade", type=str, required=True,
		help="path to input Haar cascade for face detection")
	args = vars(ap.parse_args())

The main body of execution begins on Line 106.

We parse our command line arguments on Lines 108-111. We only have one — the path to the Haar Cascade on disk.

Now let’s work with process safe variables and start our processes:

# start a manager for managing process-safe variables
	with Manager() as manager:
		# enable the servos
		pth.servo_enable(1, True)
		pth.servo_enable(2, True)

		# set integer values for the object center (x, y)-coordinates
		centerX = manager.Value("i", 0)
		centerY = manager.Value("i", 0)

		# set integer values for the object's (x, y)-coordinates
		objX = manager.Value("i", 0)
		objY = manager.Value("i", 0)

		# pan and tilt values will be managed by independent PIDs
		pan = manager.Value("i", 0)
		tlt = manager.Value("i", 0)

Inside the

Manager
  block, our process safe variables are established. We have quite a few of them.

First, we enable the servos on Lines 116 and 117. Without these lines, the hardware won’t work.

Let’s look at our first handful of process safe variables:

  • The frame center coordinates are integers (denoted by
    "i"
     ) and initialized to
    0
     (Lines 120 and 121).
  • The object center coordinates, also integers and initialized to
    0
     (Lines 124 and 125).
  • Our
    pan
      and
    tlt
      angles (Lines 128 and 129) are integers that I’ve set to start in the center pointing towards a face (angles of
    0
      degrees).

Now is where we’ll set the P, I, and D constants:

# set PID values for panning
		panP = manager.Value("f", 0.09)
		panI = manager.Value("f", 0.08)
		panD = manager.Value("f", 0.002)

		# set PID values for tilting
		tiltP = manager.Value("f", 0.11)
		tiltI = manager.Value("f", 0.10)
		tiltD = manager.Value("f", 0.002)

Our panning and tilting PID constants (process safe) are set on Lines 132-139. These are floats. Be sure to review the PID tuning section next to learn how we found suitable values. To get the most value out of this project, I would recommend setting each to zero and following the tuning method/process (not to be confused with a computer science method/process).

With all of our process safe variables ready to go, let’s launch our processes:

# we have 4 independent processes
		# 1. objectCenter  - finds/localizes the object
		# 2. panning       - PID control loop determines panning angle
		# 3. tilting       - PID control loop determines tilting angle
		# 4. setServos     - drives the servos to proper angles based
		#                    on PID feedback to keep object in center
		processObjectCenter = Process(target=obj_center,
			args=(args, objX, objY, centerX, centerY))
		processPanning = Process(target=pid_process,
			args=(pan, panP, panI, panD, objX, centerX))
		processTilting = Process(target=pid_process,
			args=(tlt, tiltP, tiltI, tiltD, objY, centerY))
		processSetServos = Process(target=set_servos, args=(pan, tlt))

		# start all 4 processes
		processObjectCenter.start()
		processPanning.start()
		processTilting.start()
		processSetServos.start()

		# join all 4 processes
		processObjectCenter.join()
		processPanning.join()
		processTilting.join()
		processSetServos.join()

		# disable the servos
		pth.servo_enable(1, False)
		pth.servo_enable(2, False)

Each process is kicked off on Lines 147-153, passing required process safe values. We have four processes:

  1. A process which finds the object in the frame. In our case, it is a face.
  2. A process which calculates panning (left and right) angles with a PID.
  3. A process which calculates tilting (up and down) angles with a PID.
  4. A process which drives the servos.

Each of the processes is started and then joined (Lines 156-165).

Servos are disabled when all processes exit (Lines 168 and 169). This also occurs in the

signal_handler
  just in case.

Tuning the pan and tilt PIDs independently, a critical step

That was a lot of work!

Now that we understand the code, we need to perform manual tuning of our two independent PIDs (one for panning and one for tilting).

Tuning a PID ensures that our servos will track the object (in our case, a face) smoothly.

Be sure to refer to the manual tuning section in the PID Wikipedia article.

The article instructs you to follow this process to tune your PID:

  1. Set
    kI
      and
    kD
      to zero.
  2. Increase
    kP
      from zero until the output oscillates (i.e. the servo goes back and forth or up and down). Then set kP to approximately half of that value.
  3. Increase
    kI
      until offsets are corrected quickly, knowing that too high of a value will cause instability.
  4. Increase
    kD
      until the output settles on the desired output reference quickly after a load disturbance (i.e. if you move your face somewhere really fast). Too much
    kD
      will cause excessive response and make your output overshoot where it needs to be.

I cannot stress this enough: Make small changes while tuning.

Let’s prepare to tune the values manually.

Even if you coded along through the previous sections, make sure you use the “Downloads” section of this tutorial to download the source code to this guide.

Transfer the zip to your Raspberry Pi using SCP or another method. Once on your Pi, unzip the files.

We will be tuning our PIDs independently, first by tuning the tilting process.

Go ahead and comment out the panning process in the driver script:

# start all 4 processes
		processObjectCenter.start()
		#processPanning.start()
		processTilting.start()
		processSetServos.start()

		# join all 4 processes
		processObjectCenter.join()
		#processPanning.join()
		processTilting.join()
		processSetServos.join()

From there, open up a terminal and execute the following command:

$ python pan_tilt_tracking.py --cascade haarcascade_frontalface_default.xml

You will need to follow the manual tuning guide above to tune the tilting process.

While doing so, you’ll need to:

  • Start the program and move your face up and down, causing the camera to tilt. I recommend doing squats at your knees and looking directly at the camera.
  • Stop the program + adjust values per the tuning guide.
  • Repeat until you’re satisfied with the result (and thus, the values). It should tilt well for both small displacements and large changes in where your face is. Be sure to test both.

At this point, let’s switch to the other PID. The values will be similar, but it is necessary to tune them as well.

Go ahead and comment out the tilting process (which is fully tuned).

From there uncomment the panning process:

# start all 4 processes
		processObjectCenter.start()
		processPanning.start()
		#processTilting.start()
		processSetServos.start()

		# join all 4 processes
		processObjectCenter.join()
		processPanning.join()
		#processTilting.join()
		processSetServos.join()

And once again, execute the following command:

$ python pan_tilt_tracking.py --cascade haarcascade_frontalface_default.xml

Now follow the steps above again to tune the panning process.

Pan/tilt tracking with a Raspberry Pi and OpenCV

With our freshly tuned PID constants, let’s put our pan and tilt camera to the test.

Assuming you followed the section above, ensure that both processes (panning and tilting) are uncommented and ready to go.

From there, open up a terminal and execute the following command:

$ python pan_tilt_tracking.py --cascade haarcascade_frontalface_default.xml

Once the script is up and running you can walk in front of your camera.

If all goes well you should see your face being detected and tracked, similar to the GIF below:

Figure 4: Raspberry Pi pan tilt face tracking in action.

As you can see, the pan/tilt camera tracks my face well.

Improvements for pan/tilt tracking with the Raspberry Pi

There are times when the camera will encounter a false positive face, causing the control loop to go haywire. Don’t be fooled! Your PID is working just fine, but your computer vision environment is impacting the system with false information.

We chose Haar because it is fast, however just remember Haar can lead to false positives:

  • Haar isn’t as accurate as HOG. HOG is great but is resource-hungry compared to Haar.
  • Haar is far less accurate than a deep learning face detection method, but the DL method is too slow to run on the Pi in real-time. If you tried to use it, panning and tilting would be pretty jerky.

My recommendation is that you set up your pan/tilt camera in a new environment and see if that improves the results. For example, when we were testing face tracking, we found that it didn’t work well in a kitchen due to reflections off the floor, refrigerator, etc. However, when we aimed the camera out the window and I stood outside, the tracking improved drastically because

ObjCenter
  was providing legitimate values for the face and thus our PID could do its job.

What if there are two faces in the frame?

Or what if I’m the only face in the frame, but consistently there is a false positive?

This is a great question. In general, you’d want to track only one face, so there are a number of options:

  • Use the confidence value and take the face with the highest confidence. This is not possible using the default Haar detector code as it doesn’t report confidence values. Instead, let’s explore other options.
  • Try to get the
    rejectLevels
      and
    rejectWeights
     
    . I’ve never tried this, but the following links may help:
  • Grab the largest bounding box — easy and simple.
  • Select the face closest to the center of the frame. Since the camera tries to keep the face closest to the center, we could compute the Euclidean distance between all centroid bounding boxes and the center (x, y)-coordinates of the frame. The bounding box closest to the centroid would be selected.
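
For example, here is a minimal sketch of that last option. The closest_to_center helper is my own and is not part of the project code; you would call it inside ObjCenter’s update method whenever more than one face is detected:

# import the necessary packages
import numpy as np

def closest_to_center(rects, frameCenter):
	# compute the center (x, y)-coordinates of each detected box
	centers = [(x + (w / 2.0), y + (h / 2.0)) for (x, y, w, h) in rects]

	# compute the Euclidean distance from each box center to the
	# center of the frame and keep the closest detection
	dists = [np.linalg.norm(np.array(c) - np.array(frameCenter))
		for c in centers]
	i = int(np.argmin(dists))
	return (rects[i], centers[i])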

Interested in building more projects with the Raspberry Pi, OpenCV, and computer vision?

Are you interested in using your Raspberry Pi to build practical, real-world computer vision and deep learning applications, including:

  • Computer vision and IoT projects on the Pi
  • Servos, PID, and controlling the Pi with computer vision
  • Human activity, home surveillance, and facial applications
  • Deep learning on the Raspberry Pi
  • Fast, efficient deep learning with the Movidius NCS and OpenVINO toolkit
  • Self-driving car applications on the Raspberry Pi
  • Tips, suggestions, and best practices when performing computer vision and deep learning with the Raspberry Pi

If so, you’ll definitely want to check out my upcoming book, Raspberry Pi for Computer Vision. To learn more about the book (including release date information), just click the link below and enter your email address:

From there I’ll ensure you’re kept in the know on the RPi + Computer Vision book, including updates, behind the scenes looks, and release date information.

Summary

In this tutorial, you learned how to perform pan and tilt tracking using a Raspberry Pi, OpenCV, and Python.

To accomplish this task, we first required a pan and tilt camera.

From there we implemented our PID used in our feedback control loop.

Once we had our PID controller we were able to implement the face detector itself.

The face detector had one goal — to detect the face in the input image and then return the center (x, y)-coordinates of the face bounding box, enabling us to pass these coordinates into our pan and tilt system.

From there the servos would center the camera on the object itself.

I hope you enjoyed today’s tutorial!

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Pan/tilt face tracking with a Raspberry Pi and OpenCV appeared first on PyImageSearch.

OpenVINO, OpenCV, and Movidius NCS on the Raspberry Pi


Inside this tutorial, you will learn how to utilize the OpenVINO toolkit with OpenCV for faster deep learning inference on the Raspberry Pi.

Raspberry Pis are great — I love the quality hardware and the supportive community built around the device.

That said, for deep learning, the current Raspberry Pi hardware is inherently resource-constrained and you’ll be lucky to get more than a few FPS (using the RPi CPU alone) out of most state-of-the-art models (especially object detection and instance/semantic segmentation).

We know from my previous posts that Intel’s Movidius Neural Compute Stick allows for faster inference with the deep learning coprocessor that you plug into the USB socket:

Since 2017, the Movidius team has been hard at work on their Myriad processors and their consumer grade USB deep learning sticks.

The first version of the API that came with the sticks worked well and demonstrated the power of the Myriad, but left a lot to be desired.

Then, the Movidius APIv2 was released and welcomed by the Movidius + Raspberry Pi community. It was easier/more reliable than the APIv1 but had its fair share of issues as well.

But now, it’s become easier than ever to work with the Movidius NCS, especially with OpenCV.

Meet OpenVINO, an Intel library for hardware optimized computer vision designed to replace the V1 and V2 APIs.

Intel’s shift to support the Movidius hardware with OpenVINO software makes the Movidius shine in all of its metallic blue glory.

OpenVINO is extremely simple to use — just set the target processor (a single function call) and let OpenVINO-optimized OpenCV handle the rest.

But the question remains:

How can I install OpenVINO on the Raspberry Pi?

Today we’ll learn just that, along with a practical object detection demo (spoiler alert: it is dead simple to use the Movidius coprocessor now).

To learn how to install OpenVINO on the Raspberry Pi (and perform object detection with the Movidius Neural Compute Stick), just follow this tutorial!

Looking for the source code to this post?
Jump right to the downloads section.

OpenVINO, OpenCV, and Movidius NCS on the Raspberry Pi

In this blog post we’re going to cover three main topics.

  1. First, we’ll learn what OpenVINO is and how it is a very welcome paradigm shift for the Raspberry Pi.
  2. We’ll then cover how to install OpenCV and OpenVINO on your Raspberry Pi.
  3. Finally, we’ll develop a real-time object detection script using OpenVINO, OpenCV, and the Movidius NCS.

Note: There are many Raspberry Pi install guides on my blog, most unrelated to Movidius. Before you begin, be sure to check out the available install tutorials on my OpenCV installation guides page and choose the one that best fits your needs.

Let’s get started.

What is OpenVINO?

Figure 1: The Intel OpenVINO toolkit optimizes your computer vision apps for Intel hardware such as the Movidius Neural Compute Stick. Real-time object detection with OpenVINO and OpenCV using Raspberry Pi and Movidius NCS sees a significant speedup. (source)

Intel’s OpenVINO is an acceleration library for optimized computing with Intel’s hardware portfolio.

OpenVINO supports Intel CPUs, GPUs, FPGAs, and VPUs.

Deep learning libraries you’ve come to rely upon such as TensorFlow, Caffe, and mxnet are supported by OpenVINO.

Figure 2: The Intel OpenVINO Toolkit supports Intel CPUs, GPUs, FPGAs, and VPUs. TensorFlow, Caffe, mxnet, and OpenCV’s DNN module are all optimized and accelerated for Intel hardware. The Movidius line of vision processing units (VPUs) is supported by OpenVINO and pairs well with the Raspberry Pi. (source: OpenVINO Product Brief)

Intel has even optimized OpenCV’s DNN module to support its hardware for deep learning.

In fact, many newer smart cameras use Intel’s hardware along with the OpenVINO toolkit. OpenVINO is edge computing and IoT at its finest — it enables resource-constrained devices like the Raspberry Pi to work with the Movidius coprocessor to perform deep learning at speeds that are useful for real-world applications.

We’ll be installing OpenVINO on the Raspberry Pi so it can be used with the Movidius VPU (Vision Processing Unit) in the next section.

Be sure to read the OpenVINO product brief PDF for more information.

Installing OpenVINO’s optimized OpenCV on the Raspberry Pi

In this section, we’ll cover prerequisites and all steps required to install OpenCV and OpenVINO on your Raspberry Pi.

Be sure to read this entire section before you begin so that you are familiar with the steps required.

Let’s begin.

Hardware, assumptions, and prerequisites

In this tutorial, I am going to assume that you have the following hardware:

  • Raspberry Pi 3B+ (or Raspberry Pi 3B)
  • Movidius NCS 2 (or Movidius NCS 1)
  • PiCamera V2 (or USB webcam)
  • 32GB microSD card with Raspbian Stretch freshly flashed (16GB would likely work as well)
  • HDMI screen + keyboard/mouse (at least for the initial WiFi configuration)
  • 5V power supply (I recommend a 2.5A supply because the Movidius NCS is a power hog)

If you don’t have a microSD with a fresh burn of Raspbian Stretch, you may download it here. I recommend the full install:

Figure 3: The Raspbian Stretch operating system is required for OpenVINO and the Movidius on the Raspberry Pi.

From there, use Etcher (or a suitable alternative) to flash the card.

Once you’re ready, insert the microSD card into your Raspberry Pi and boot it up.

Enter your WiFi credentials and enable SSH, VNC, and the camera interface.

From here you will need one of the following:

  • Physical access to your Raspberry Pi so that you can open up a terminal and execute commands
  • Remote access via SSH or VNC

I’ll be doing the majority of this tutorial via SSH, but as long as you have access to a terminal, you can easily follow along.

Can’t SSH? If you see your Pi on your network, but can’t ssh to it, you may need to enable SSH. This can easily be done via the Raspberry Pi desktop preferences menu or using the

raspi-config
  command.

After you’ve changed the setting and rebooted, you can test SSH directly on the Pi with the localhost address.

Open a terminal and type

ssh pi@127.0.0.1
  to see if it is working. To SSH from another computer you’ll need the Pi’s IP address — you can determine the IP address by looking at your router’s clients page or by running
ifconfig
  on the Pi itself.

Is your Raspberry Pi keyboard layout giving you problems? Change your keyboard layout by going to the Raspberry Pi desktop preferences menu. I use the standard US Keyboard layout, but you’ll want to select the one appropriate for you.

Step #0: Expand filesystem on your Raspberry Pi

To get the OpenVINO party started, fire up your Raspberry Pi and open an SSH connection (alternatively use the Raspbian desktop with a keyboard + mouse and launch a terminal).

If you’ve just flashed Raspbian Stretch, I always recommend that you first check to ensure your filesystem is using all available space on the microSD card.

To check your disk space usage execute the 

df -h
 command in your terminal and examine the output:
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        30G  4.2G   24G  15% /
devtmpfs        434M     0  434M   0% /dev
tmpfs           438M     0  438M   0% /dev/shm
tmpfs           438M   12M  427M   3% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           438M     0  438M   0% /sys/fs/cgroup
/dev/mmcblk0p1   42M   21M   21M  51% /boot
tmpfs            88M     0   88M   0% /run/user/1000

As you can see, my Raspbian filesystem has been automatically expanded to include all 32GB of the micro-SD card. This is denoted by the fact that the size is 30GB (nearly 32GB) and I have 24GB available (15% usage).

If you see that you aren’t using your entire memory card’s capacity, you can find instructions below on how to expand the filesystem.

Open up the Raspberry Pi configuration in your terminal:

$ sudo raspi-config

And then select the “Advanced Options” menu item:

Figure 4: Selecting the “Advanced Options” from the raspi-config menu to expand the Raspbian file system on your Raspberry Pi is important before installing OpenVINO and OpenCV. Next, we’ll actually expand the filesystem.

Followed by selecting “Expand filesystem”:

Figure 5: The Raspberry Pi “Expand Filesystem” menu allows us to take advantage of our entire flash memory card. This will give us the space necessary to install OpenVINO, OpenCV, and other packages.

Once prompted, you should select the first option, “A1. Expand File System”, hit Enter on your keyboard, arrow down to the “<Finish>” button, and then reboot your Pi — you will be prompted to reboot. Alternatively, you can reboot from the terminal:

$ sudo reboot

Be sure to run the

df -h
  command again to check that your file system is expanded.

Step #1: Reclaim space on your Raspberry Pi

One simple way to free up space on your Raspberry Pi is to delete both LibreOffice and the Wolfram Engine:

$ sudo apt-get purge wolfram-engine
$ sudo apt-get purge libreoffice*
$ sudo apt-get clean
$ sudo apt-get autoremove

After removing the Wolfram Engine and LibreOffice, you can reclaim almost 1GB!

Step #2: Install OpenVINO + OpenCV dependencies on your Raspberry Pi

This step shows some dependencies which I install on every OpenCV system. While you’ll soon see that OpenVINO is already compiled, I recommend that you go ahead and install these packages anyway in case you end up compiling OpenCV from scratch at any time going forward.

Let’s update our system:

$ sudo apt-get update && sudo apt-get upgrade

And then install developer tools including CMake:

$ sudo apt-get install build-essential cmake unzip pkg-config

Next, it is time to install a selection of image and video libraries — these are key to being able to work with image and video files:

$ sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
$ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
$ sudo apt-get install libxvidcore-dev libx264-dev

From there, let’s install GTK, our GUI backend:

$ sudo apt-get install libgtk-3-dev

And now let’s install a package which may help to reduce GTK warnings:

$ sudo apt-get install libcanberra-gtk*

The asterisk ensures we will grab the ARM-specific GTK. It is required.

Now we need two packages which contain numerical optimizations for OpenCV:

$ sudo apt-get install libatlas-base-dev gfortran

And finally, let’s install the Python 3 development headers:

$ sudo apt-get install python3-dev

Once you have all of these prerequisites installed you can move on to the next step.

Step #3: Download and unpack OpenVINO for your Raspberry Pi

Figure 6: Download and install the OpenVINO toolkit for Raspberry Pi and Movidius computer vision apps (source: Intel’s OpenVINO Product Brief).

From here forward, our install instructions are largely based upon Intel’s Raspberry Pi OpenVINO guide. There are a few “gotchas,” which is why I decided to write this guide. We’ll also use virtual environments, as PyImageSearch readers have come to expect.

Our next step is to download OpenVINO.

Let’s navigate to our home folder and create a new directory:

$ cd ~
$ mkdir openvino
$ cd openvino

From there, go ahead and grab the OpenVINO toolkit download for the Raspberry Pi. You may try wget as I did, but beware of the problem noted in the subsequent code block:

$ wget http://download.01.org/openvinotoolkit/2018_R5/packages/l_openvino_toolkit_ie_p_2018.5.445.tgz

At this point, through trial and error, I found that

wget
  actually only grabbed an HTML file which seems to be a really strange server error at Intel’s download site.

Ensure that you actually have a tar file using this command:

$ file l_openvino_toolkit_ie_p_2018.5.445.tgz

# bad output
l_openvino_toolkit_ie_p_2018.5.445.tgz: HTML document text, UTF-8 Unicode text, with very long lines

# good output
l_openvino_toolkit_ie_p_2018.5.445.tgz: gzip compressed data, was "l_openvino_toolkit_ie_p_2018.5.445.tar", last modified: Wed Dec 19 12:49:53 2018, max compression, from FAT filesystem (MS-DOS, OS/2, NT)

If the output matches the highlighted “good output”, then you can safely proceed to extract the archive. Otherwise, remove the file and try again.

Once you have successfully downloaded the OpenVINO toolkit, you can unarchive it using the following command:

$ tar -xf l_openvino_toolkit_ie_p_2018.5.445.tgz

The result of untarring the archive is a folder called

inference_engine_vpu_arm
 .

Step #4: Configure OpenVINO on your Raspberry Pi

Let’s modify the

setupvars.sh
  script with the absolute path to our OpenVINO directory.

To do so, we’re going to use the nano terminal text file editor:

$ nano openvino/inference_engine_vpu_arm/bin/setupvars.sh

The file will look like this:

Figure 7: Intel’s OpenVINO setupvars.sh file requires that you insert the path to the OpenVINO installation directory on the Raspberry Pi.

You need to replace

<INSTALLDIR>
  with the following:

/home/pi/openvino/inference_engine_vpu_arm

It should now look just like so:

Figure 8: The installation directory has been updated for OpenVINO’s setupvars.sh on the Raspberry Pi.

To save the file, press “ctrl + o” and then “enter”, followed by “ctrl + x” to exit.

From there, let’s use

nano
  again to edit our
~/.bashrc
 . We will add a line to load OpenVINO’s 
setupvars.sh
  each time you invoke a Pi terminal. Go ahead and open the file:
$ nano ~/.bashrc

Scroll to the bottom and add the following lines:

# OpenVINO
source ~/openvino/inference_engine_vpu_arm/bin/setupvars.sh

Now save and exit from nano as we did previously.

Then, go ahead and

source
  your
~/.bashrc
  file:
$ source ~/.bashrc

Step #5: Configure USB rules for your Movidius NCS and OpenVINO on Raspberry Pi

OpenVINO requires that we set custom USB rules. It is quite straightforward, so let’s get started.

First, enter the following command to add the current user to the Raspbian “users” group:

$ sudo usermod -a -G users "$(whoami)"

Then logout and log back in. If you’re on SSH, you can type

exit
  and then re-establish your SSH connection. Rebooting is also an option via
sudo reboot now
 .

Once you’re back at your terminal, run the following script to set the USB rules:

$ cd ~
$ sh openvino/inference_engine_vpu_arm/install_dependencies/install_NCS_udev_rules.sh

Step #6: Create an OpenVINO virtual environment on Raspberry Pi

Let’s grab and install pip, a Python Package Manager.

To install pip, simply enter the following in your terminal:

$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python3 get-pip.py

We’ll be making use of virtual environments for Python development with OpenCV and OpenVINO.

If you aren’t familiar with virtual environments, please take a moment to look at this article on RealPython or read the first half of this blog post on PyImageSearch.

Virtual environments will allow you to run independent, sequestered Python environments in isolation on your system. Today we’ll be setting up just one environment, but you could easily have an environment for each project.

Let’s go ahead and install  

virtualenv
  and
virtualenvwrapper
  now — they allow for Python virtual environments:
$ sudo pip install virtualenv virtualenvwrapper
$ sudo rm -rf ~/get-pip.py ~/.cache/pip

To finish the install of these tools, we need to update our 

~/.bashrc
  again:
$ nano ~/.bashrc

Then add the following lines:

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

Figure 9: Our Raspberry Pi ~/.bashrc profile has been updated to accommodate OpenVINO and virtualenvwrapper. Now we’ll be able to create a virtual environment for Python packages.

Alternatively, you can append the lines directly via bash commands:

$ echo -e "\n# virtualenv and virtualenvwrapper" >> ~/.bashrc
$ echo "export WORKON_HOME=$HOME/.virtualenvs" >> ~/.bashrc
$ echo "export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3" >> ~/.bashrc
$ echo "source /usr/local/bin/virtualenvwrapper.sh" >> ~/.bashrc

Next, source the

~/.bashrc
  profile:
$ source ~/.bashrc

Let’s now create a virtual environment to hold OpenVINO, OpenCV and related packages:

$ mkvirtualenv openvino -p python3

This command simply creates a Python 3 virtual environment named

openvino
 .

You can (and should) name your environment(s) whatever you’d like — I like to keep them short and sweet while also providing enough information so I’ll remember what they are for.

Let’s verify that we’re “in” the

openvino
  environment by taking a look at the bash prompt. It should show
(openvino)
  at the beginning of the prompt as shown in the image:

If your virtual environment is not active, you can simply use the

workon
  command:
$ workon openvino

Figure 10: The workon openvino command activates our OpenVINO Python 3 virtual environment. We’re now ready to install Python packages and run computer vision code with Movidius and the Raspberry Pi.

Step #7: Install packages into your OpenVINO environment

Let’s install a handful of packages required for today’s demo script:

$ workon openvino
$ pip install numpy
$ pip install "picamera[array]"
$ pip install imutils

Now that we’ve installed these packages in the

openvino
  virtual environment, they are only available in the
openvino
  environment. This is your sequestered area to work on OpenVINO projects (we use Python virtual environments here so we don’t risk ruining your system install of Python).

Additional packages for Caffe, TensorFlow, and mxnet may be installed via requirements.txt files using pip. You can read more about it at this Intel documentation link. This is not required for today’s tutorial.

Step #8: Link OpenVINO’s OpenCV into your Python 3 virtual environment

OpenCV is ready to go outside our virtual environment, but it is bad practice to use the system environment. Let’s instead link the OpenVINO version of OpenCV into our Python virtual environment so we have it at our fingertips for today’s demo (and whatever future projects you dream up).

Here we are going to create a “symbolic link”. A symbolic link creates a special linkage between two places on your system (in our case it is a

.so
  file). Think of a sym-link as a “shortcut” that points to another file.

When running the commands below, you’ll notice that we navigate into the directory where the link will live and create our sym-link pointing back to where the file actually lives.

I had a hard time finding OpenVINO’s OpenCV

.so
  file, so I used the
find
  command:
$ find / -name "cv2*.so"
...
/home/pi/openvino/inference_engine_vpu_arm/python/python3.5/cv2.cpython-35m-arm-linux-gnueabihf.so

I had to scroll through a bunch of output to find the OpenCV binary filepath. Thus, I’ve omitted the unneeded output above.

Ensure that you copy the Python 3.5 path and not the Python 2.7 one since we’re using Python 3.

From there, with the path in our clipboard, let’s create our sym-link into the

openvino
  virtual environment
site-packages
 :
$ cd ~/.virtualenvs/openvino/lib/python3.5/site-packages/
$ ln -s /home/pi/openvino/inference_engine_vpu_arm/python/python3.5/cv2.cpython-35m-arm-linux-gnueabihf.so cv2.so
$ cd ~

Take care to notice that the 2nd line wraps as it is especially long. I cannot stress this step enough — this step is critical. If you don’t create a symbolic link, you won’t be able to import OpenCV in your OpenVINO Python scripts. Also, ensure that the paths and filenames in the above commands are correct for your Raspberry Pi.  I suggest tab-completion.

Step #9: Test your OpenVINO install on your Raspberry Pi

Let’s do a quick sanity test to see if OpenCV is ready to go before we try an OpenVINO example.

Open a terminal and perform the following:

$ workon openvino
$ python
>>> import cv2
>>> cv2.__version__
'4.0.1-openvino'
>>> exit()

The first command activates our OpenVINO virtual environment. From there we fire up the Python 3 binary in the environment and import OpenCV.

The version of OpenCV indicates that it is an OpenVINO optimized install!

Real-time object detection with Raspberry Pi and OpenVINO

Installing OpenVINO was pretty easy and didn’t even require a compile of OpenCV. The Intel team did a great job!

Now let’s put the Movidius Neural Compute Stick to work using OpenVINO.

For comparison’s sake, we’ll run the MobileNet SSD object detector with and without the Movidius to benchmark our FPS. We’ll compare the values to previous results of using Movidius NCS APIv1 (the non-OpenVINO method that I wrote about in early 2018).

Let’s get started!

Project structure

Go ahead and grab the “Downloads” for today’s blog post.

Once you’ve extracted the zip, you can use the

tree
  command to inspect the project directory:
$ tree
.
├── MobileNetSSD_deploy.caffemodel
├── MobileNetSSD_deploy.prototxt
├── openvino_real_time_object_detection.py
└── real_time_object_detection.py

0 directories, 4 files

Our MobileNet SSD object detector files include the .caffemodel and .prototxt files. These are pretrained (we will not be training MobileNet SSD today).

We’re going to review the

openvino_real_time_object_detection.py
  script and compare it to the original real-time object detection script (
real_time_object_detection.py
 ).

Real-time object detection with OpenVINO, Movidius NCS, and Raspberry Pi

To demonstrate the power of OpenVINO on the Raspberry Pi with Movidius, we’re going to perform real-time deep learning object detection.

The Movidius/Myriad coprocessor will perform the actual deep learning inference, reducing the load on the Pi’s CPU.

We’ll still use the Raspberry Pi CPU to process the results and tell the Movidius what to do, but we’re reserving deep learning inference for the Myriad as its hardware is optimized and designed for deep learning inference.

As previously discussed in the “What is OpenVINO?” section, OpenVINO with OpenCV allows us to specify the processor for inference when using the OpenCV “DNN” module.

In fact, it only requires one line of code (typically) to use the Movidius NCS Myriad processor.

From there, the rest of the code is the same!

Typically, on the PyImageSearch blog, I provide a detailed walkthrough of all Python scripts.

This is one of the few posts where I’ve decided to deviate from my typical format.

This post is first and foremost an install + configuration post. Therefore I’m going to skip over the details and instead demonstrate the power of OpenVINO by highlighting new lines of code inserted into a previous blog post (where all details are provided).

Please review that post, Real-time object detection with deep learning and OpenCV, if you want to get into the weeds; there I demonstrated the concept of using OpenCV's DNN module in just 100 lines of code.

Today, we’re adding just one line of code that performs computation (and a comment + blank line). This brings the new total to 103 lines of code without using the previous complex Movidius APIv1 (215 lines of code).

If this is your first foray into OpenVINO, I think you’ll be just as astounded and pleased as I was when I learned how easy it is.

Let’s learn the changes necessary to accommodate OpenVINO’s API with OpenCV and Movidius.

Go ahead and open a file named

openvino_real_time_object_detection.py
  and insert the following lines, paying close attention to Lines 33-35 (highlighted in yellow):
# import the necessary packages
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
	help="minimum probability to filter weak detections")
ap.add_argument("-u", "--movidius", type=bool, default=0,
	help="boolean indicating if the Movidius should be used")
args = vars(ap.parse_args())

# initialize the list of class labels MobileNet SSD was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
	"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
	"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
	"sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# specify the target device as the Myriad processor on the NCS
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

# initialize the video stream, allow the cammera sensor to warmup,
# and initialize the FPS counter
print("[INFO] starting video stream...")
vs = VideoStream(usePiCamera=True).start()
time.sleep(2.0)
fps = FPS().start()

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream and resize it
	# to have a maximum width of 400 pixels
	frame = vs.read()
	frame = imutils.resize(frame, width=400)

	# grab the frame dimensions and convert it to a blob
	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)

	# pass the blob through the network and obtain the detections and
	# predictions
	net.setInput(blob)
	detections = net.forward()

	# loop over the detections
	for i in np.arange(0, detections.shape[2]):
		# extract the confidence (i.e., probability) associated with
		# the prediction
		confidence = detections[0, 0, i, 2]

		# filter out weak detections by ensuring the `confidence` is
		# greater than the minimum confidence
		if confidence > args["confidence"]:
			# extract the index of the class label from the
			# `detections`, then compute the (x, y)-coordinates of
			# the bounding box for the object
			idx = int(detections[0, 0, i, 1])
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")

			# draw the prediction on the frame
			label = "{}: {:.2f}%".format(CLASSES[idx],
				confidence * 100)
			cv2.rectangle(frame, (startX, startY), (endX, endY),
				COLORS[idx], 2)
			y = startY - 15 if startY - 15 > 15 else startY + 15
			cv2.putText(frame, label, (startX, y),
				cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

	# show the output frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

	# update the FPS counter
	fps.update()

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

Lines 33-35 (highlighted in yellow) are new. But only one of those lines is interesting.

On Line 35, we tell OpenCV’s DNN module to use the Myriad coprocessor using

net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)
 .

The Myriad processor is built into the Movidius Neural Compute Stick. You can use this same method if you’re running OpenVINO + OpenCV on a device with an embedded Myriad chip (i.e. without the bulky USB stick).
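As an aside, the script parses a --movidius flag but, as written, always targets the Myriad. If you want to toggle between the NCS and the Pi CPU at runtime, a minimal sketch (reusing the args dictionary and net object from the script above) could look like this:

# a sketch, not part of the downloadable script: pick the inference
# target based on the --movidius flag parsed earlier
if args["movidius"]:
	# run inference on the Movidius NCS (Myriad VPU)
	net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)
else:
	# fall back to the Raspberry Pi CPU
	net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)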

For a detailed explanation on the code, be sure to refer to this post.

Also, be sure to refer to this Movidius APIv1 blog post from early 2018 where I demonstrated object detection using Movidius and the Raspberry Pi. It’s incredible that 215 lines of significantly more complicated code are required for the previous Movidius API, in comparison to 103 lines of much easier to follow code using OpenVINO.

I think those line number differences speak for themselves in terms of reduced complexity, time, and development cost savings, but what are the actual results? How fast is OpenVINO with Movidius?

Let’s find out in the next section.

OpenVINO object detection results

Figure 11: Object detection with OpenVINO, OpenCV, and the Raspberry Pi.

To run today’s script, first, you’ll need to grab the “Downloads” associated with this post.

From there, unpack the zip and navigate into the directory.

To perform object detection with OpenVINO, just execute the following command:

$ python openvino_real_time_object_detection.py \
	--prototxt MobileNetSSD_deploy.prototxt \
	--model MobileNetSSD_deploy.caffemodel
[INFO] loading model...
[INFO] starting video stream...
[INFO] elapsed time: 55.35
[INFO] approx. FPS: 8.31

As you can see, we're reaching 8.31 FPS over approximately one minute.
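For context, the FPS counter from imutils simply divides the number of processed frames by the elapsed time, so a quick back-of-the-envelope check on the numbers above looks like this:

# sanity check the benchmark output above (values copied from the log)
elapsed = 55.35   # seconds reported by fps.elapsed()
fps = 8.31        # frames per second reported by fps.fps()
print("approx. frames processed: {:.0f}".format(fps * elapsed))  # ~460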

I’ve gathered additional results using MobileNet SSD as shown in the table below:

Figure 12: A benchmark comparison of the MobileNet SSD deep learning object detector using OpenVINO with the Movidius Neural Compute Stick.

OpenVINO and the Movidius NCS 2 are very fast, a huge speedup from previous versions.

It’s amazing that the results are > 8x in comparison to using only the RPi 3B+ CPU (no Movidius coprocessor).

The two rightmost columns (light blue columns 3 and 4) show the OpenVINO comparison between the NCS1 and the NCS2.

Note that the 2nd column statistic is with the RPi 3B (not the 3B+). It was taken in February 2018 using the previous API and previous RPi hardware.

So, what’s next?

Figure 13: The Raspberry Pi for Computer Vision book Kickstarter begins on Wednesday, April 10, 2019 at 10am EDT.

I’m currently getting code and materials together to start writing a new Raspberry Pi for Computer Vision book.

The book will cover everything needed to maximize computer vision + deep learning capability on resource-constrained devices such as the Raspberry Pi single board computer (SBC).

You’ll learn and develop your skills using techniques that I’ve amassed through my years of working with computer vision on the Raspberry Pi and other devices.

The book will come with over 40 chapters with tons of working code.

Included are preconfigured Raspbian .img files (for the Raspberry Pi 3B+/3B and Raspberry Pi Zero W) so you can skip the tedious installation headaches and get to the fun part (code and deployment).

Sound interesting?

So what do you say?

Are you interested in learning how to use the Raspberry Pi for computer vision and deep learning?

If so, be sure to click the button below and enter your email address to receive book updates to your inbox:

Troubleshooting and Frequently Asked Questions (FAQ)

Did you encounter an error installing OpenCV and OpenVINO on your Raspberry Pi?

Don’t throw the blue USB stick into the toilet just yet.

The first time you install the software on your Raspberry Pi it can be very frustrating. The last thing I want for you to do is give up!

Here are some common questions and answers — be sure to read them and see if they apply to you.

Q. How do I flash an operating system on to my Raspberry Pi memory card?

A. I recommend that you:

  • Grab a 16GB or 32GB memory card.
  • Flash Raspbian Stretch with Etcher to the card. Etcher is supported by most major operating systems.
  • Insert the card into your Raspberry Pi and begin with the “Assumptions” and “Step 1” sections in this blog post.

Q. Can I use Python 2.7?

A. I don’t recommend using Python 2.7 as it’s rapidly approaching its end of life. Python 3 is the standard now. I also haven’t tested OpenVINO with Python 2.7. But if you insist…

Here’s how to get up and running with Python 2.7:

$ sudo apt-get install python2.7 python2.7-dev

Then, before you create your virtual environment in Step #4, first install pip for Python 2.7:

$ sudo python2.7 get-pip.py

Also in Step #4: when you create your virtual environment, simply use the relevant Python version flag:

$ mkvirtualenv openvino_py27 -p python2.7

From there everything should be the same.

Q. Why can’t I just apt-get install OpenCV and have OpenVINO support?

A. Avoid this “solution” at all costs even though it might work. First, this method likely won’t install OpenVINO until it is more popular. Secondly, apt-get doesn’t play nice with virtual environments and you won’t have control over your compile and build.

Q. The  

mkvirtualenv
  and 
workon
 commands yield a “command not found error”. I’m not sure what to do next.

A. There are a number of reasons why you could be seeing this error message, all of which stem from Step #4:

  1. First, ensure you have installed
    virtualenv
      and
    virtualenvwrapper
      properly using the
    pip
      package manager. Verify by running
    pip freeze
      and ensure that you see both
    virtualenv
      and
    virtualenvwrapper
      are in the list of installed packages.
  2. Your
    ~/.bashrc
      file may have mistakes. Examine the contents of your
    ~/.bashrc
      file to see the proper
    export
      and
    source
      commands are present (check Step #4 for the commands that should be appended to
    ~/.bashrc
     ).
  3. You might have forgotten to
    source
      your
    ~/.bashrc
     . Make sure you run 
    source ~/.bashrc
      after editing it to ensure you have access to the
    mkvirtualenv
      and
    workon
      commands.

Q. When I open a new terminal, logout, or reboot my Raspberry Pi, I cannot execute the

mkvirtualenv
  or
workon
  commands.

A. If you’re on the Raspbian desktop, this will likely occur. The default profile that is loaded when you launch a terminal, for some reason, doesn’t source the

~/.bashrc
  file. Please refer to #2 from the previous question. Over SSH, you probably won’t run into this.

Q. When I try to import OpenCV, I encounter this message: 

Import Error: No module named cv2
 .

A. There are several reasons this could be happening and unfortunately, it is hard to diagnose. I recommend the following suggestions to help diagnose and resolve the error:

  1. Ensure your 
    openvino
      virtual environment is active by using the
    workon openvino
      command. If this command gives you an error, then verify that
    virtualenv
      and
    virtualenvwrapper
      are properly installed.
  2. Try investigating the contents of the
    site-packages
      directory in your
    openvino
      virtual environment. You can find the
    site-packages
      directory in
    ~/.virtualenvs/openvino/lib/python3.5/site-packages/
     . Ensure (1) there is a
    cv2
      sym-link directory in the 
    site-packages
      directory and (2) it’s properly sym-linked.
  3. Be sure to
    find
      the
    cv2*.so
      file as demonstrated in Step #6.

Q. What if my question isn’t listed here?

A. Please leave a comment below or send me an email. If you post a comment below, just be aware that code doesn’t format well in the comment form and I may have to respond to you via email instead.

Summary

Today we learned about Intel’s OpenVINO toolkit and how it can be used to improve deep learning inference speed on the Raspberry Pi.

You also learned how to install the OpenVINO toolkit, including the OpenVINO-optimized version of OpenCV on the Raspberry Pi.

We then ran a simple MobileNet SSD deep learning object detection model. It only required one line of code to set the target device to the Myriad processor on the Movidius stick.

We also demonstrated that the Movidius NCS + OpenVINO is quite fast, dramatically outperforming object detection speed on the Raspberry Pi’s CPU.

And if you’re interested in learning more about how to build real-world computer vision + deep learning projects on the Raspberry Pi, be sure to check out my upcoming book, Raspberry Pi for Computer Vision. I’ll be launching a Kickstarter pre-sale which begins just two days from now on Wednesday, April 10th at 10AM EDT.

Mark your calendar to take advantage of pre-sale bargain prices on the RPi book — see you then!

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just drop your email in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post OpenVINO, OpenCV, and Movidius NCS on the Raspberry Pi appeared first on PyImageSearch.

Live video streaming over network with OpenCV and ImageZMQ


In today’s tutorial, you’ll learn how to stream live video over a network with OpenCV. Specifically, you’ll learn how to implement Python + OpenCV scripts to capture and stream video frames from a camera to a server.

Every week or so I receive a comment on a blog post or a question over email that goes something like this:

Hi Adrian, I’m working on a project where I need to stream frames from a client camera to a server for processing using OpenCV. Should I use an IP camera? Would a Raspberry Pi work? What about RTSP streaming? Have you tried using FFMPEG or GStreamer? How do you suggest I approach the problem?

It’s a great question — and if you’ve ever attempted live video streaming with OpenCV then you know there are a ton of different options.

You could go with the IP camera route. But IP cameras can be a pain to work with. Some IP cameras don’t even allow you to access the RTSP (Real-time Streaming Protocol) stream. Other IP cameras simply don’t work with OpenCV’s

cv2.VideoCapture
  function. An IP camera may be too expensive for your budget as well.

In those cases, you are left with using a standard webcam — the question then becomes, how do you stream the frames from that webcam using OpenCV?

Using FFMPEG or GStreamer is definitely an option. But both of those can be a royal pain to work with.

Today I am going to show you my preferred solution using message passing libraries, specifically ZMQ and ImageZMQ, the latter of which was developed by PyImageConf 2018 speaker, Jeff Bass. Jeff has put a ton of work into ImageZMQ and his efforts really show.

As you’ll see, this method of OpenCV video streaming is not only reliable but incredibly easy to use, requiring only a few lines of code.

To learn how to perform live network video streaming with OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Live video streaming over network with OpenCV and ImageZMQ

In the first part of this tutorial, we’ll discuss why, and under which situations, we may choose to stream video with OpenCV over a network.

From there we’ll briefly discuss message passing along with ZMQ, a library for high performance asynchronous messaging for distributed systems.

We’ll then implement two Python scripts:

  1. A client that will capture frames from a simple webcam
  2. And a server that will take the input frames and run object detection on them

We'll be using Raspberry Pis as our clients to demonstrate how cheaper hardware can be used to build a distributed network of cameras capable of piping frames to a more powerful machine for additional processing.

By the end of this tutorial, you’ll be able to apply live video streaming with OpenCV to your own applications!

Why stream videos/frames over a network?

Figure 1: A great application of video streaming with OpenCV is a security camera system. You could use Raspberry Pis and a library called ImageZMQ to stream from the Pi (client) to the server.

There are a number of reasons why you may want to stream frames from a video stream over a network with OpenCV.

To start, you could be building a security application that requires all frames to be sent to a central hub for additional processing and logging.

Or, your client machine may be highly resource constrained (such as a Raspberry Pi) and lack the necessary computational horsepower required to run computationally expensive algorithms (such as deep neural networks, for example).

In these cases, you need a method to take input frames captured from a webcam with OpenCV and then pipe them over the network to another system.

There are a variety of methods to accomplish this task (discussed in the introduction of the post), but today we are going to specifically focus on message passing.

What is message passing?

Figure 2: The concept of sending a message from a process, through a message broker, to other processes. With this method/concept, we can stream video over a network using OpenCV and ZMQ with a library called ImageZMQ.

Message passing is a programming paradigm/concept typically used in multiprocessing, distributed, and/or concurrent applications.

Using message passing, one process can communicate with one or more other processes, typically using a message broker.

Whenever a process wants to communicate with another process, including all other processes, it must first send its request to the message broker.

The message broker receives the request and then handles sending the message to the other process(es).

If necessary, the message broker also sends a response to the originating process.

As an example of message passing let’s consider a tremendous life event, such as a mother giving birth to a newborn child (process communication depicted in Figure 2 above). Process A, the mother, wants to announce to all other processes (i.e., the family), that she had a baby. To do so, Process A constructs the message and sends it to the message broker.

The message broker then takes that message and broadcasts it to all processes.

All other processes then receive the message from the message broker.

These processes want to show their support and happiness to Process A, so they construct a message saying their congratulations:

Figure 3: Each process sends an acknowledgment (ACK) message back through the message broker to notify Process A that the message is received. The ImageZMQ video streaming project by Jeff Bass uses this approach.

These responses are sent to the message broker which in turn sends them back to Process A (Figure 3).

This example is a dramatic simplification of message passing and message broker systems but should help you understand the general algorithm and the type of communication the processes are performing.

You can very easily get into the weeds studying these topics, including various distributed programming paradigms and types of messages/communication (1:1 communication, 1:many, broadcasts, centralized, distributed, broker-less etc.).

As long as you understand the basic concept that message passing allows processes to communicate (including processes on different machines) then you will be able to follow along with the rest of this post.

What is ZMQ?

Figure 4: The ZMQ library serves as the backbone for message passing in the ImageZMQ library. ImageZMQ is used for video streaming with OpenCV. Jeff Bass designed it for his Raspberry Pi network at his farm.

ZeroMQ, or simply ZMQ for short, is a high-performance asynchronous message passing library used in distributed systems.

RabbitMQ and ZeroMQ are two of the most widely used message passing systems.

However, ZeroMQ specifically focuses on high throughput and low latency applications — which is exactly how you can frame live video streaming.

When building a system to stream live videos over a network using OpenCV, you would want a system that focuses on:

  • High throughput: There will be new frames from the video stream coming in quickly.
  • Low latency: As we’ll want the frames distributed to all nodes on the system as soon as they are captured from the camera.

ZeroMQ also has the benefit of being extremely easy to both install and use.

Jeff Bass, the creator of ImageZMQ (which builds on ZMQ), chose to use ZMQ as the message passing library for these reasons — and I couldn’t agree with him more.
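To make this concrete, below is a minimal request/reply sketch using plain ZMQ (the pyzmq package); it mimics the send-a-message, receive-an-ACK flow described above. The port number is just a placeholder:

# zmq_reqrep_sketch.py: a minimal ZMQ request/reply example (a sketch
# only; ImageZMQ wraps this same kind of send + acknowledgment flow)
import threading
import zmq

def receiver():
	# the receiving process: bind a REP socket and acknowledge each message
	context = zmq.Context()
	socket = context.socket(zmq.REP)
	socket.bind("tcp://*:5566")  # 5566 is an arbitrary placeholder port
	for _ in range(3):
		message = socket.recv_string()
		print("[receiver] got: {}".format(message))
		socket.send_string("OK")

# run the receiver in the background, then act as the sender
threading.Thread(target=receiver, daemon=True).start()

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://127.0.0.1:5566")

for i in range(3):
	socket.send_string("message {}".format(i))
	print("[sender] reply: {}".format(socket.recv_string()))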

The ImageZMQ library

Figure 5: The ImageZMQ library is designed for streaming video efficiently over a network. It is a Python package and integrates with OpenCV.

Jeff Bass is the owner of Yin Yang Ranch, a permaculture farm in Southern California. He was one of the first people to join PyImageSearch Gurus, my flagship computer vision course. In the course and community he has been an active participant in many discussions around the Raspberry Pi.

Jeff has found that Raspberry Pis are perfect for computer vision and other tasks on his farm. They are inexpensive, readily available, and astoundingly resilient/reliable.

At PyImageConf 2018 Jeff spoke about his farm and more specifically about how he used Raspberry Pis and a central computer to manage data collection and analysis.

The heart of his project is a library that he put together called ImageZMQ.

ImageZMQ solves the problem of real-time streaming from the Raspberry Pis on his farm. It is based on ZMQ and works really well with OpenCV.

Plain and simple, it just works. And it works really reliably.

I’ve found it to be more reliable than alternatives such as GStreamer or FFMPEG streams. I’ve also had better luck with it than using RTSP streams.

You can learn the details of ImageZMQ by studying Jeff’s code on GitHub.

Jeff’s slides from PyImageConf 2018 are also available here.

In a few days, I’ll be posting my interview with Jeff Bass on the blog as well.

Let's configure our clients and server with ImageZMQ and put them to work!

Configuring your system and installing required packages

Figure 6: To install ImageZMQ for video streaming, you’ll need Python, ZMQ, and OpenCV.

Installing ImageZMQ is quite easy.

First, let’s pip install a few packages into your Python virtual environment (assuming you’re using one):

$ workon <env_name> # my environment is named py3cv4
$ pip install opencv-contrib-python
$ pip install zmq
$ pip install imutils

From there, clone the

imagezmq
  repo:
$ cd ~
$ git clone https://github.com/jeffbass/imagezmq.git

You may then (1) copy or (2) sym-link the source directory into your virtual environment site-packages.

Let’s go with the sym-link option:

$ cd ~/.virtualenvs/py3cv4/lib/python3.5/site-packages # swap py3cv4 for your virtual environment name
$ ln -s ~/imagezmq/imagezmq imagezmq

As a third alternative to the two options discussed, you may place

imagezmq
  into each project folder in which you plan to use it.
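Whichever option you choose, a quick check from inside your virtual environment (a two-line sketch) confirms that imagezmq resolves and shows which copy of the source is being used:

# confirm that imagezmq is importable and see where it lives
import imagezmq
print(imagezmq.__file__)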

Preparing clients for ImageZMQ

ImageZMQ must be installed on each client and the central server.

In this section, we’ll cover one important difference for clients.

Our code is going to use the hostname of the client to identify it. You could use the IP address in a string for identification, but setting a client’s hostname allows you to more easily identify the purpose of the client.

In this example, we’ll assume you are using a Raspberry Pi running Raspbian. Of course, your client could run Windows Embedded, Ubuntu, macOS, etc., but since our demo uses Raspberry Pis, let’s learn how to change the hostname on the RPi.

To change the hostname on your Raspberry Pi, fire up a terminal (this could be over an SSH connection if you’d like).

Then run the

raspi-config
  command:
$ sudo raspi-config

You’ll be presented with this terminal screen:

Figure 7: Configuring a Raspberry Pi hostname with raspi-config. Shown is the raspi-config home screen.

Navigate to “2 Network Options” and press enter.

Figure 8: Raspberry Pi raspi-config network settings page.

Then choose the option “N1 Hostname”.

Figure 9: Setting the Raspberry Pi hostname to something easily identifiable/memorable. Our video streaming with OpenCV and ImageZMQ script will use the hostname to identify Raspberry Pi clients.

You can now change your hostname and select “<Ok>”.

You will be prompted to reboot; a reboot is required.

I recommend naming your Raspberry Pis like this:

pi-location
 . Here are a few examples:
  • pi-garage
  • pi-frontporch
  • pi-livingroom
  • pi-driveway
  • …you get the idea.

This way when you pull up your router page on your network, you’ll know what the Pi is for and its corresponding IP address. On some networks, you could even connect via SSH without providing the IP address like this:

$ ssh pi@pi-frontporch

As you can see, it will likely save some time later.

Defining the client and server relationship

Figure 10: The client/server relationship for ImageZMQ video streaming with OpenCV.

Before we actually implement network video streaming with OpenCV, let’s first define the client/server relationship to ensure we’re on the same page and using the same terms:

  • Client: Responsible for capturing frames from a webcam using OpenCV and then sending the frames to the server.
  • Server: Accepts frames from all input clients.

You could argue back and forth as to which system is the client and which is the server.

For example, a system that is capturing frames via a webcam and then sending them elsewhere could be considered a server — the system is undoubtedly serving up frames.

Similarly, a system that accepts incoming data could very well be the client.

However, we are assuming:

  1. There is at least one system (and likely many more) responsible for capturing frames.
  2. There is only a single system used for actually receiving and processing those frames.

For these reasons, I prefer to think of the system sending the frames as the client and the system receiving/processing the frames as the server.

You may disagree with me, but that is the client-server terminology we’ll be using throughout the remainder of this tutorial.

Project structure

Be sure to grab the “Downloads” for today’s project.

From there, unzip the files and navigate into the project directory.

You may use the

tree
  command to inspect the structure of the project:
$ tree
.
├── MobileNetSSD_deploy.caffemodel
├── MobileNetSSD_deploy.prototxt
├── client.py
└── server.py

0 directories, 4 files

Note: If you’re going with the third alternative discussed above, then you would need to place the

imagezmq
  source directory in the project as well.

The first two files listed in the project are the pre-trained Caffe MobileNet SSD object detection files. The server (

server.py
 ) will take advantage of these Caffe files using OpenCV’s DNN module to perform object detection.

The

client.py
  script will reside on each device which is sending a stream to the server. Later on, we’ll upload
client.py
  onto each of the Pis (or another machine) on your network so they can send video frames to the central location.

Implementing the client OpenCV video streamer (i.e., video sender)

Let’s start by implementing the client which will be responsible for:

  1. Capturing frames from the camera (either USB or the RPi camera module)
  2. Sending the frames over the network via ImageZMQ

Open up the

client.py
  file and insert the following code:
# import the necessary packages
from imutils.video import VideoStream
import imagezmq
import argparse
import socket
import time

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-s", "--server-ip", required=True,
	help="ip address of the server to which the client will connect")
args = vars(ap.parse_args())

# initialize the ImageSender object with the socket address of the
# server
sender = imagezmq.ImageSender(connect_to="tcp://{}:5555".format(
	args["server_ip"]))

We start off by importing packages and modules on Lines 2-6:

  • Pay close attention here to see that we’re importing
    imagezmq
      in our client-side script.
  • VideoStream
      will be used to grab frames from our camera.
  • Our
    argparse
      import will be used to process a command line argument containing the server’s IP address (
    --server-ip
      is parsed on Lines 9-12).
  • The
    socket
      module of Python is simply used to grab the hostname of the Raspberry Pi.
  • Finally,
    time
      will be used to allow our camera to warm up prior to sending frames.

Lines 16 and 17 simply create the

imagezmq
 
sender
  object and specify the IP address and port of the server. The IP address will come from the command line argument that we already established. I’ve found that port
5555
  doesn’t usually have conflicts, so it is hardcoded. You could easily turn it into a command line argument if you need to as well.
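If you do want the port to be configurable, a sketch with a hypothetical --server-port flag (the script above hardcodes 5555) would look like this; keep in mind the server's ImageHub listens on 5555 by default, so it would need a matching change as well:

# a sketch only: add an optional port argument to the sender setup
import imagezmq
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-s", "--server-ip", required=True,
	help="ip address of the server to which the client will connect")
ap.add_argument("-o", "--server-port", type=int, default=5555,
	help="port of the server to which the client will connect")
args = vars(ap.parse_args())

# build the connection string from both the IP and port arguments
sender = imagezmq.ImageSender(connect_to="tcp://{}:{}".format(
	args["server_ip"], args["server_port"]))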

Let’s initialize our video stream and start sending frames to the server:

# get the host name, initialize the video stream, and allow the
# camera sensor to warmup
rpiName = socket.gethostname()
vs = VideoStream(usePiCamera=True).start()
#vs = VideoStream(src=0).start()
time.sleep(2.0)
 
while True:
	# read the frame from the camera and send it to the server
	frame = vs.read()
	sender.send_image(rpiName, frame)

Now, we’ll grab the hostname, storing the value as

rpiName
  (Line 21). Refer to “Preparing clients for ImageZMQ” above to set your hostname on a Raspberry Pi.

From there, our

VideoStream
  object is created to grab frames from our PiCamera. Alternatively, you can use any USB camera connected to the Pi by commenting Line 22 and uncommenting Line 23.

This is the point where you should also set your camera resolution. We are just going to use the maximum resolution so the argument is not provided. But if you find that there is a lag, you are likely sending too many pixels. If that is the case, you may reduce your resolution quite easily. Just pick from one of the resolutions available for the PiCamera V2 here: PiCamera ReadTheDocs. The second table is for V2.

Once you’ve chosen the resolution, edit Line 22 like this:

vs = VideoStream(usePiCamera=True, resolution=(320, 240)).start()

Note: The resolution argument won’t make a difference for USB cameras since they are all implemented differently. As an alternative, you can insert a

frame = imutils.resize(frame, width=320)
  between Lines 28 and 29 to resize the
frame
  manually (you'll also need to add import imutils to the top of client.py).

From there, a warmup sleep time of

2.0
  seconds is set (Line 24).

Finally, our

while
  loop on Lines 26-29 grabs and sends the frames.

As you can see, the client is quite simple and straightforward!

Let’s move on to the actual server.

Implementing the OpenCV video server (i.e., video receiver)

The live video server will be responsible for:

  1. Accepting incoming frames from multiple clients.
  2. Applying object detection to each of the incoming frames.
  3. Maintaining an “object count” for each of the frames (i.e., count the number of objects).

Let's go ahead and implement the server — open up the

server.py
  file and insert the following code:
# import the necessary packages
from imutils import build_montages
from datetime import datetime
import numpy as np
import imagezmq
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
	help="minimum probability to filter weak detections")
ap.add_argument("-mW", "--montageW", required=True, type=int,
	help="montage frame width")
ap.add_argument("-mH", "--montageH", required=True, type=int,
	help="montage frame height")
args = vars(ap.parse_args())

On Lines 2-8 we import packages and libraries. In this script, most notably we’ll be using:

  • build_montages
     : To build a montage of all incoming frames.
  • imagezmq
     : For streaming video from clients. In our case, each client is a Raspberry Pi.
  • imutils
     : My package of OpenCV and other image processing convenience functions available on GitHub and PyPi.
  • cv2
     : OpenCV’s DNN module will be used for deep learning object detection inference.

Are you wondering where

imutils.video.VideoStream
  is? We usually use my

VideoStream
  class to read frames from a webcam. However, don’t forget that we’re using
imagezmq
  for streaming frames from clients. The server doesn’t have a camera directly wired to it.

Let’s process five command line arguments with argparse:

  • --prototxt
     : The path to our Caffe deep learning prototxt file.
  • --model
     : The path to our pre-trained Caffe deep learning model. I’ve provided MobileNet SSD in the “Downloads” but with some minor changes, you could elect to use an alternative model.
  • --confidence
     : Our confidence threshold to filter weak detections.
  • --montageW
     : This is not width in pixels. Rather, this is the number of columns for our montage. We're going to stream from four Raspberry Pis today, so you could do 2×2, 4×1, or 1×4. You could also do, for example, 3×3 for nine clients, but five of the boxes would be empty.
  • --montageH
     : The number of rows for your montage. See the
    --montageW
      explanation.

Let’s initialize our

ImageHub
  object along with our deep learning object detector:
# initialize the ImageHub object
imageHub = imagezmq.ImageHub()

# initialize the list of class labels MobileNet SSD was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
	"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
	"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
	"sofa", "train", "tvmonitor"]

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

Our server needs an

ImageHub
  to accept connections from each of the Raspberry Pis. It essentially uses sockets and ZMQ for receiving frames across the network (and sending back acknowledgments).

Our MobileNet SSD object

CLASSES
  are specified on Lines 29-32. If you aren’t familiar with the MobileNet Single Shot Detector, please refer to this blog post or Deep Learning for Computer Vision with Python.

From there we’ll instantiate our Caffe object detector on Line 36.

Initializations come next:

# initialize the consider set (class labels we care about and want
# to count), the object count dictionary, and the frame  dictionary
CONSIDER = set(["dog", "person", "car"])
objCount = {obj: 0 for obj in CONSIDER}
frameDict = {}

# initialize the dictionary which will contain  information regarding
# when a device was last active, then store the last time the check
# was made was now
lastActive = {}
lastActiveCheck = datetime.now()

# stores the estimated number of Pis, active checking period, and
# calculates the duration seconds to wait before making a check to
# see if a device was active
ESTIMATED_NUM_PIS = 4
ACTIVE_CHECK_PERIOD = 10
ACTIVE_CHECK_SECONDS = ESTIMATED_NUM_PIS * ACTIVE_CHECK_PERIOD

# assign montage width and height so we can view all incoming frames
# in a single "dashboard"
mW = args["montageW"]
mH = args["montageH"]
print("[INFO] detecting: {}...".format(", ".join(obj for obj in
	CONSIDER)))

In today’s example, I’m only going to

CONSIDER
  three types of objects from the MobileNet SSD list of
CLASSES
 . We’re considering (1) dogs, (2) persons, and (3) cars on Line 40.

We’ll soon use this

CONSIDER
  set to filter out other classes that we don’t care about such as chairs, plants, monitors, or sofas which don’t typically move and aren’t interesting for this security type project.

Line 41 initializes a dictionary for our object counts to be tracked in each video feed. Each count is initialized to zero.

A separate dictionary,

frameDict
  is initialized on Line 42. The
frameDict
  dictionary will contain the hostname key and the associated latest frame value.

Lines 47 and 48 initialize variables which help us determine when a Pi last sent a frame to the server. If it has been a while (i.e., there is a problem), we can get rid of the static, out-of-date image in our montage. The

lastActive
  dictionary will have hostname keys and timestamps for values.

Lines 53-55 define constants which help us to calculate whether a Pi is active. Line 55 itself calculates that our check for activity will be

40
  seconds. You can reduce this period of time by adjusting
ESTIMATED_NUM_PIS
  and
ACTIVE_CHECK_PERIOD
  on Lines 53 and 54.

Our

mW
  and
mH
  variables on Lines 59 and 60 represent the width and height (columns and rows) for our montage. These values are pulled directly from the command line
args
  dictionary.

Let’s loop over incoming streams from our clients and processing the data!

# start looping over all the frames
while True:
	# receive RPi name and frame from the RPi and acknowledge
	# the receipt
	(rpiName, frame) = imageHub.recv_image()
	imageHub.send_reply(b'OK')

	# if a device is not in the last active dictionary then it means
	# that its a newly connected device
	if rpiName not in lastActive.keys():
		print("[INFO] receiving data from {}...".format(rpiName))

	# record the last active time for the device from which we just
	# received a frame
	lastActive[rpiName] = datetime.now()

We begin looping on Line 65.

Lines 68 and 69 grab an image from the

imageHub
  and send an ACK message. The result of
imageHub.recv_image
  is
rpiName
 , in our case the hostname, and the video
frame
 itself.

It is really as simple as that to receive frames from an ImageZMQ video stream!

Lines 73-78 perform housekeeping duties to determine when a Raspberry Pi was

lastActive
 .

Let’s perform inference on a given incoming

frame
 :
# resize the frame to have a maximum width of 400 pixels, then
	# grab the frame dimensions and construct a blob
	frame = imutils.resize(frame, width=400)
	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
		0.007843, (300, 300), 127.5)

	# pass the blob through the network and obtain the detections and
	# predictions
	net.setInput(blob)
	detections = net.forward()

	# reset the object count for each object in the CONSIDER set
	objCount = {obj: 0 for obj in CONSIDER}

Lines 82-90 perform object detection on the

frame
 :

From there, on Line 93 we reset the object counts to zero (we will be populating the dictionary with fresh count values shortly).

Let’s loop over the detections with the goal of (1) counting, and (2) drawing boxes around objects that we are considering:

# loop over the detections
	for i in np.arange(0, detections.shape[2]):
		# extract the confidence (i.e., probability) associated with
		# the prediction
		confidence = detections[0, 0, i, 2]

		# filter out weak detections by ensuring the confidence is
		# greater than the minimum confidence
		if confidence > args["confidence"]:
			# extract the index of the class label from the
			# detections
			idx = int(detections[0, 0, i, 1])

			# check to see if the predicted class is in the set of
			# classes that need to be considered
			if CLASSES[idx] in CONSIDER:
				# increment the count of the particular object
				# detected in the frame
				objCount[CLASSES[idx]] += 1

				# compute the (x, y)-coordinates of the bounding box
				# for the object
				box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
				(startX, startY, endX, endY) = box.astype("int")

				# draw the bounding box around the detected object on
				# the frame
				cv2.rectangle(frame, (startX, startY), (endX, endY),
					(255, 0, 0), 2)

On Line 96 we begin looping over each of the

detections
 . Inside the loop, we proceed to:
  • Extract the object
    confidence
      and filter out weak detections (Lines 99-103).
  • Grab the label
    idx
      (Line 106) and ensure that the label is in the
    CONSIDER
      set (Line 110). For each detection that has passed the two checks (
    confidence
      threshold and in
    CONSIDER
     ), we will:
    • Increment the
      objCount
        for the respective object (Line 113).
    • Draw a
      rectangle
        around the object (Lines 117-123).

Next, let’s annotate each frame with the hostname and object counts. We’ll also build a montage to display them in:

# draw the sending device name on the frame
	cv2.putText(frame, rpiName, (10, 25),
		cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)

	# draw the object count on the frame
	label = ", ".join("{}: {}".format(obj, count) for (obj, count) in
		objCount.items())
	cv2.putText(frame, label, (10, h - 20),
		cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255,0), 2)

	# update the new frame in the frame dictionary
	frameDict[rpiName] = frame

	# build a montage using images in the frame dictionary
	montages = build_montages(frameDict.values(), (w, h), (mW, mH))

	# display the montage(s) on the screen
	for (i, montage) in enumerate(montages):
		cv2.imshow("Home pet location monitor ({})".format(i),
			montage)

	# detect any keypresses
	key = cv2.waitKey(1) & 0xFF

On Lines 126-133 we make two calls to

cv2.putText
  to draw the Raspberry Pi hostname and object counts.

From there we update our

frameDict
  with the
frame
  corresponding to the RPi hostname.

Lines 139-144 create and display a montage of our client frames. The montage will be

mW
  frames wide and
mH
  frames tall.

Keypresses are captured via Line 147.

The last block is responsible for checking our

lastActive
  timestamps for each client feed and removing frames from the montage that have stalled. Let’s see how it works:
# if current time *minus* last time when the active device check
	# was made is greater than the threshold set then do a check
	if (datetime.now() - lastActiveCheck).seconds > ACTIVE_CHECK_SECONDS:
		# loop over all previously active devices
		for (rpiName, ts) in list(lastActive.items()):
			# remove the RPi from the last active and frame
			# dictionaries if the device hasn't been active recently
			if (datetime.now() - ts).seconds > ACTIVE_CHECK_SECONDS:
				print("[INFO] lost connection to {}".format(rpiName))
				lastActive.pop(rpiName)
				frameDict.pop(rpiName)

		# set the last active check time as current time
		lastActiveCheck = datetime.now()

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# do a bit of cleanup
cv2.destroyAllWindows()

There’s a lot going on in Lines 151-162. Let’s break it down:

  • We only perform a check if at least
    ACTIVE_CHECK_SECONDS
      have passed (Line 151).
  • We loop over each key-value pair in
    lastActive
      (Line 153):
    • If the device hasn’t been active recently (Line 156) we need to remove data (Lines 158 and 159). First we remove (
      pop
       ) the
      rpiName
        and timestamp from
      lastActive
       . Then the
      rpiName
        and frame are removed from the
      frameDict
       .
  • The
    lastActiveCheck
      is updated to the current time on Line 162.

Effectively this will help us get rid of expired frames (i.e. frames that are no longer real-time). This is really important if you are using the ImageHub server for a security application. Perhaps you are saving key motion events like a Digital Video Recorder (DVR). The worst thing that could happen if you don’t get rid of expired frames is that an intruder kills power to a client and you don’t realize the frame isn’t updating. Think James Bond or Jason Bourne sort of spy techniques.

Last in the loop is a check to see if the

"q"
  key has been pressed — if so we
break
  from the loop and destroy all active montage windows (Lines 165-169).

Streaming video over network with OpenCV

Now that we’ve implemented both the client and the server, let’s put them to the test.

Make sure you use the “Downloads” section of this post to download the source code.

From there, upload the client to each of your Pis using SCP:

$ scp client.py pi@192.168.1.10:~
$ scp client.py pi@192.168.1.11:~
$ scp client.py pi@192.168.1.12:~
$ scp client.py pi@192.168.1.13:~

In this example, I’m using four Raspberry Pis, but four aren’t required — you can use more or less. Be sure to use applicable IP addresses for your network.

You also need to follow the installation instructions to install ImageZMQ on each Raspberry Pi. See the “Configuring your system and installing required packages” section in this blog post.

Before we start the clients, we must start the server. Let’s fire it up with the following command:

$ python server.py --prototxt MobileNetSSD_deploy.prototxt \
	--model MobileNetSSD_deploy.caffemodel --montageW 2 --montageH 2

Once your server is running, go ahead and start each client pointing to the server. Here is what you need to do on each client, step-by-step:

  1. Open an SSH connection to the client:
    ssh pi@192.168.1.10
  2. Start screen on the client:
    screen
  3. Source your profile:
    source ~/.profile
  4. Activate your environment:
    workon py3cv4
  5. Install ImageZMQ using instructions in “Configuring your system and installing required packages”.
  6. Run the client:
    python client.py --server-ip 192.168.1.5

As an alternative to these steps, you may start the client script on reboot.

Automagically, your server will start bringing in frames from each of your Pis. Each frame that comes in is passed through the MobileNet SSD. Here’s a quick demo of the result:

A full video demo can be seen below:

What’s next?

Is your brain spinning with new Raspberry Pi project ideas right now?

The Raspberry Pi is my favorite community driven product for Computer Vision, IoT, and Edge Computing.

The possibilities with the Raspberry Pi are truly endless:

  • Maybe you have a video streaming idea based on this post.
  • Or perhaps you want to learn about deep learning with the Raspberry Pi.
  • Interested in robotics? Why not build a small computer vision-enabled robot or self-driving RC car?
  • Face recognition, classroom attendance, and security? All possible.

I’ve been so excited about the Raspberry Pi that I decided to write a book with over 40 practical, hands-on chapters that you’ll be able to learn from and hack with.

Inside the book, I’ll be sharing my personal tips and tricks for working with the Raspberry Pi (you can apply them to other resource-constrained devices too). You can view the full Raspberry Pi for Computer Vision table of contents here.

The book is currently in development. That said, you can reserve your copy by pre-ordering now and get a great deal on my other books/courses.

The pre-order sale ends on Friday, May 10th, 2019 at 10:00AM EDT. Don’t miss out on these huge savings!

Summary

In this tutorial, you learned how to stream video over a network using OpenCV and the ImageZMQ library.

Instead of relying on IP cameras or FFMPEG/GStreamer, we used a simple webcam and a Raspberry Pi to capture input frames and then stream them to a more powerful machine for additional processing using a distributed system concept called message passing.

Thanks to Jeff Bass’ hard work (the creator of ImageZMQ) our implementation required only a few lines of code.

If you are ever in a situation where you need to stream live video over a network, definitely give ImageZMQ a try — I think you’ll find it super intuitive and easy to use.

I’ll be back in a few days with an interview with Jeff Bass as well!

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Live video streaming over network with OpenCV and ImageZMQ appeared first on PyImageSearch.

Getting started with Google Coral’s TPU USB Accelerator


In this tutorial, you will learn how to configure your Google Coral TPU USB Accelerator on Raspberry Pi and Ubuntu. You’ll then learn how to perform classification and object detection using Google Coral’s USB Accelerator.

A few weeks ago, Google released “Coral”, a super fast, “no internet required” development board and USB accelerator that enables deep learning practitioners to deploy their models “on the edge” and “closer to the data”.

Using Coral, deep learning developers are no longer required to have an internet connection, meaning that the Coral TPU is fast enough to perform inference directly on the device rather than sending the image/frame to the cloud for inference and prediction.

The Google Coral comes in two flavors:

  1. A single-board computer with an onboard Edge TPU. The dev board could be thought of as an "advanced Raspberry Pi for AI" or a competitor to NVIDIA's Jetson Nano.
  2. A USB accelerator that plugs into a device (such as a Raspberry Pi). The USB stick includes an Edge TPU built into it. Think of Google’s Coral USB Accelerator as a competitor to Intel’s Movidius NCS.

Today we’ll be focusing on the Coral USB Accelerator as it’s easier to get started with (and it fits nicely with our theme of Raspberry Pi-related posts the past few weeks).

To learn how to configure your Google Coral USB Accelerator (and perform classification + object detection), just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Getting started with Google Coral’s TPU USB Accelerator

Figure 1: The Google Coral TPU Accelerator adds deep learning capability to resource-constrained devices like the Raspberry Pi (source).

In this post I’ll be assuming that you have:

  • Your Google Coral USB Accelerator stick
  • A fresh install of a Debian-based Linux distribution (i.e., Raspbian, Ubuntu, etc.)
  • An understanding of basic Linux commands and file paths

If you don’t already own a Google Coral Accelerator, you can purchase one via Google’s official website.

I’ll be configuring the Coral USB Accelerator on Raspbian, but again, provided that you have a Debian-based OS, these commands will still work.

Let’s get started!

Downloading and installing Edge TPU runtime library

If you are using a Raspberry Pi, you first need to install

feh
, used by the Edge TPU runtime example scripts to display output images:
$ sudo apt-get install feh

The next step is to download the Edge TPU runtime and Python library. The easiest way to download the package is to simply use the command line +

wget
:
$ wget http://storage.googleapis.com/cloud-iot-edge-pretrained-models/edgetpu_api.tar.gz

Now that the TPU runtime has been downloaded, we can extract it, change directory into

python-tflite-source
, and then install it (notice that
sudo
permissions are not required):
$ tar xzf edgetpu_api.tar.gz
$ cd python-tflite-source
$ bash ./install.sh
...
Creating /home/pi/.local/lib/python3.5/site-packages/edgetpu.egg-link (link to .)
Adding edgetpu 1.2.0 to easy-install.pth file

Installed /home/pi/python-tflite-source
Processing dependencies for edgetpu==1.2.0
Searching for Pillow==4.0.0
Best match: Pillow 4.0.0
Adding Pillow 4.0.0 to easy-install.pth file

Using /usr/lib/python3/dist-packages
Searching for numpy==1.12.1
Best match: numpy 1.12.1
Adding numpy 1.12.1 to easy-install.pth file

Using /usr/lib/python3/dist-packages
Finished processing dependencies for edgetpu==1.2.0

During the install you’ll be prompted “Would you like to enable the maximum operating frequency?” — be careful with this setting!

According to Google’s official getting started guide, enabling this option will:

  1. Improve your inference speed…
  2. …but cause the USB Accelerator to become very hot.

If you were to touch it/brush up against the USB stick, it may burn you, so be careful with it!

My personal recommendation is to select

N
(for “No, I don’t want maximum operating frequency”), at least for your first install. You can always increase the operating frequency later.

Secondly, it’s important to note that you need at least Python 3.5 for the Edge TPU runtime library.

You cannot use Python 2.7 or any Python 3 version below Python 3.5.
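A quick way to confirm you meet that requirement is a small sketch you can run on your Pi:

# check_python_version.py: the Edge TPU runtime needs Python 3.5+
import sys

assert sys.version_info >= (3, 5), \
	"the Edge TPU runtime library requires Python 3.5 or newer"
print("Python {}.{}.{} is OK".format(*sys.version_info[:3]))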

The

install.sh
script assumes you're using Python 3.5, so if you're not, you'll want to open up the
install.sh
script, scroll down to the final line of the file (i.e., the
setup.py
) where you’ll see this line:
python3.5 setup.py develop --user

If you’re using Python 3.6 you’ll simply want to change the Python version number:

python3.6 setup.py develop --user

After that, you’ll be able to successfully run the

install.sh
script.

Overall, the entire install process on a Raspberry Pi took just over one minute. If you’re using a more powerful system than the RPi then the install should be even faster.

Classification, object detection, and face detection using the Google Coral USB Accelerator

Now that we’ve installed the TPU runtime library, let’s put the Coral USB Accelerator to the test!

First, make sure you are in the

python-tflite-source/edgetpu
directory. If you followed my instructions and put
python-tflite-source
in your home directory then the following command will work for you:
$ cd ~/python-tflite-source/edgetpu

The next step is to download the pre-trained classification and object detection models. The full list of pre-trained models Google provides can be found here, including:

  • MobileNet V1 and V2 trained on ImageNet, iNat Insects, iNat Plants, and iNat Birds
  • Inception V1, V2, V3, and V4, all trained on ImageNet
  • MobileNet + SSD V1 and V2 trained on COCO
  • MobileNet + SSD V2 for face detection

Again, refer to this link for the pre-trained models Google Coral provides.

For the sake of this tutorial, we’ll be using the following models:

  1. MobileNet V2 trained on ImageNet
  2. MobileNet + SSD V2 for face detection
  3. MobileNet + SSD V2 trained on COCO

You can use the following commands to download the models and follow along with this tutorial:

$ mkdir ~/edgetpu_models
$ wget https://storage.googleapis.com/cloud-iot-edge-pretrained-models/canned_models/mobilenet_v2_1.0_224_quant_edgetpu.tflite -P ~/edgetpu_models
$ wget http://storage.googleapis.com/cloud-iot-edge-pretrained-models/canned_models/imagenet_labels.txt -P ~/edgetpu_models
$ wget http://storage.googleapis.com/cloud-iot-edge-pretrained-models/canned_models/mobilenet_ssd_v2_face_quant_postprocess_edgetpu.tflite -P ~/edgetpu_models
$ wget http://storage.googleapis.com/cloud-iot-edge-pretrained-models/canned_models/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite -P ~/edgetpu_models
$ wget http://storage.googleapis.com/cloud-iot-edge-pretrained-models/canned_models/coco_labels.txt -P ~/edgetpu_models

For convenience, I’ve included all models + example images used in this tutorial in the “Downloads” section — I would recommend using the downloads to ensure you can follow along with the guide.

Again, notice how the models are downloaded to the ~/edgetpu_models directory — that is important as it ensures the paths used in the examples below will work out of the box for you.

Let’s start by performing a simple image classification example:

$ cd python-tflite-source/edgetpu
$ python3 demo/classify_image.py \
    --model ~/edgetpu_models/mobilenet_v2_1.0_224_quant_edgetpu.tflite \
    --label ~/edgetpu_models/imagenet_labels.txt \
    --image test_data/parrot.jpg 
---------------------------
macaw
Score :  0.99609375

Figure 2: The Google Coral has made a deep learning classification inference on a Macaw/parrot.

As you can see, MobileNet (trained on ImageNet) has correctly labeled the image as “Macaw”, a type of parrot.

Note: If you are using a Python virtual environment (covered below) you would want to use python rather than python3 as the Python binary.

Now let’s try performing face detection using the Google Coral USB Accelerator:

$ cd python-tflite-source/edgetpu
$ python3 demo/object_detection.py \
    --model ~/edgetpu_models/mobilenet_ssd_v2_face_quant_postprocess_edgetpu.tflite \
    --input test_data/face.jpg
-----------------------------------------
score =  0.99609375
box =  [474.22854804992676, 38.03488787482766, 738.8013491630554, 353.5309683683231]
-----------------------------------------
score =  0.9921875
box =  [205.4297697544098, 110.28378465056959, 487.75309658050537, 439.73802454331343]
-----------------------------------------
score =  0.83203125
box =  [6.2277887016534805, 182.35811898071842, 127.13575917482376, 326.5376813379348]
-----------------------------------------
score =  0.5
box =  [859.8422718048096, 213.5472493581642, 1008.978108882904, 383.9367261515483]

Figure 3: Deep learning face detection with the Google Coral and Raspberry Pi.

Here the MobileNet + SSD face detector was able to detect all four faces in the image. This is especially impressive given the poor lighting conditions and the partially obscured face on the far right.

The next example shows how to perform object detection using a MobileNet + SSD trained on the COCO dataset:

$ cd python-tflite-source/edgetpu
$ python3 demo/object_detection.py \
    --model ~/edgetpu_models/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite \
    --label ~/edgetpu_models/coco_labels.txt \
    --input test_data/owl.jpg
-----------------------------------------
bird
score =  0.9921875
box =  [474.58224296569824, 40.04487991333008, 1063.5828018188477, 1135.0372314453125]
-----------------------------------------
bird
score =  0.06640625
box =  [208.7918758392334, 288.1847858428955, 1408.0253601074219, 1200.0]
-----------------------------------------
bird
score =  0.06640625
box =  [159.07530784606934, 0.0, 1473.2084274291992, 934.4905853271484]

Figure 4: Deep learning object detection with the Raspberry Pi and Google Coral.

Notice there are three detections but only one bird in the image — why is that?

The reason is that the object_detection.py script is not filtering on a minimum probability. You could easily modify the script to ignore detections with < 50% probability (I’ll leave that as an exercise to you, the reader, to implement).
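As a hint for that exercise, here is a minimal, hypothetical sketch (not part of the demo script) of the general idea: keep only detections whose score meets a minimum confidence before reporting them. The tuples below simply mirror the demo output above and are illustrative only.

# hypothetical post-processing sketch: filter (label, score, box) results
# by a minimum confidence before printing/drawing them
MIN_CONFIDENCE = 0.5

detections = [
	("bird", 0.9921875, [474.58, 40.04, 1063.58, 1135.04]),
	("bird", 0.06640625, [208.79, 288.18, 1408.03, 1200.0]),
	("bird", 0.06640625, [159.08, 0.0, 1473.21, 934.49]),
]

# keep only detections at or above the confidence threshold
filtered = [(label, score, box) for (label, score, box) in detections
	if score >= MIN_CONFIDENCE]

for (label, score, box) in filtered:
	print("{}: score = {}, box = {}".format(label, score, box))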

For fun, I decided to try an image that was not included in the example TPU runtime library demos.

Here’s an example of applying the face detector to a custom image:

$ python3 demo/object_detection.py \
    --model ~/edgetpu_models/mobilenet_ssd_v2_face_quant_postprocess_edgetpu.tflite \
    --input ~/IMG_7687.jpg
-----------------------------------------
score =  0.98046875
box =  [190.66683948040009, 0.0, 307.4474334716797, 125.00646710395813]

Figure 5: Testing face detection (using my own face) with the Google Coral and Raspberry Pi.

Sure enough, my face is detected!

Finally, here’s an example of running the MobileNet + SSD on the same image:

$ python3 demo/object_detection.py \
    --model ~/edgetpu_models/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite \
    --label ~/edgetpu_models/coco_labels.txt \
    --input ~/IMG_7687.jpg
-----------------------------------------
person
score =  0.87890625
box =  [58.70787799358368, 10.639026761054993, 371.2196350097656, 494.61638927459717]
-----------------------------------------
dog
score =  0.58203125
box =  [50.500258803367615, 358.102411031723, 162.57299482822418, 500.0]
-----------------------------------------
dog
score =  0.33984375
box =  [13.502731919288635, 287.04309463500977, 152.83603966236115, 497.8201985359192]
-----------------------------------------
couch
score =  0.26953125
box =  [0.0, 88.88640999794006, 375.0, 423.55993390083313]
-----------------------------------------
couch
score =  0.16015625
box =  [3.753773868083954, 64.79595601558685, 201.68977975845337, 490.678071975708]
-----------------------------------------
dog
score =  0.12109375
box =  [65.94736874103546, 335.2701663970947, 155.95845878124237, 462.4992609024048]
-----------------------------------------
dog
score =  0.12109375
box =  [3.5936199128627777, 335.3758156299591, 118.05401742458344, 497.33099341392517]
-----------------------------------------
couch
score =  0.12109375
box =  [49.873560667037964, 97.65596687793732, 375.0, 247.15487658977509]
-----------------------------------------
dog
score =  0.12109375
box =  [92.47469902038574, 338.89272809028625, 350.16247630119324, 497.23270535469055]
-----------------------------------------
couch
score =  0.12109375
box =  [20.54794132709503, 99.93192553520203, 375.0, 369.604617357254]

Figure 6: An example of running the MobileNet SSD object detector on the Google Coral + Raspberry Pi.

Again, we can improve results by filtering on a minimum probability to remove the extraneous detections. Doing so would leave only two detections: person (87.89%) and dog (58.20%).

Installing the edgetpu runtime into Python virtual environments

Figure 7: Importing edgetpu in Python inside of my coral virtual environment on the Raspberry Pi.

It’s a best practice to use Python virtual environments for development, and as you know, we make heavy use of Python virtual environments on the PyImageSearch blog.

Installing the edgetpu library into a Python virtual environment definitely requires a few more steps, but it is well worth it to ensure your libraries are kept in sequestered, independent environments.

The first step is to install both virtualenv and virtualenvwrapper:
$ sudo pip3 install virtualenv virtualenvwrapper

You’ll notice that I’m using sudo here — this is super important because when installing the TPU runtime, the install.sh script created the ~/.local directory. If we try to install virtualenv and virtualenvwrapper via pip they would actually go into the ~/.local/bin directory (which is not what we want).

The solution is to use sudo with pip3 (like we did above) so virtualenv and virtualenvwrapper end up in /usr/local/bin.

The next step is to open our ~/.bashrc file:
$ nano ~/.bashrc

Then, scroll down to the bottom and insert the following lines into ~/.bashrc:
# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

You can then re-load the .bashrc using source:
$ source ~/.bashrc

We can now create our Python 3 virtual environment:

$ mkvirtualenv coral -p python3

I’m naming my virtual environment coral but you can call it whatever you like.

Finally, sym-link the edgetpu library into your Python virtual environment:
$ cd ~/.virtualenvs/coral/lib/python3.5/site-packages/
$ ln -s ~/python-tflite-source/edgetpu edgetpu
$ cd ~

Assuming you followed my exact instructions, your path to the edgetpu directory should match mine. If you didn’t follow my exact instructions then you’ll want to double-check and triple-check your paths.

As a sanity test, let’s try to import the edgetpu library into our Python virtual environment:
$ workon coral
$ python
>>> import edgetpu
>>>

As you can see, everything is working and we can now execute the demo scripts above using our Python virtual environment!

What about custom models on Google’s Coral?

You’ll notice that I’m only using pre-trained deep learning models on the Google Coral in this post — what about custom models that you train yourself?

Google does provide some documentation on that but it’s much more advanced, far too much for me to include in this blog post.

If you’re interested in learning how to train your own custom models for Google’s Coral I would recommend you take a look at my upcoming book, Raspberry Pi for Computer Vision where I’ll be covering the Google Coral in detail.

How do I use Google Coral’s Python runtime library in my own custom scripts?

Using the edgetpu library in conjunction with OpenCV and your own custom Python scripts is outside the scope of this post.

I’ll be covering how to use Google Coral in your own Python scripts in a future blog post as well as in my Raspberry Pi for Computer Vision book.

Thoughts, tips, and suggestions when using Google’s TPU USB Accelerator

Overall, I really liked the Coral USB Accelerator. I thought it was super easy to configure and install, and while not all the demos ran out of the box, with some basic knowledge of file paths, I was able to get them running in a few minutes.

In the future, I would like to see the Google TPU runtime library more compatible with Python virtual environments.

Technically, I could create a Python virtual environment and then edit the install.sh script to install into that virtual environment, but editing the install.sh script shouldn’t be a strict requirement — instead, I’d like to see that script detect my Python binary/environment and then install for that specific Python binary.

I’ll also add that inference on the Raspberry Pi is a bit slower than what’s advertised by the Google Coral TPU Accelerator — that’s actually not a problem with the TPU Accelerator, but rather the Raspberry Pi.

What do I mean by that?

Keep in mind that the Raspberry Pi 3B+ uses USB 2.0, but the Google Coral USB Accelerator recommends USB 3 for optimal inference speeds.

Since the RPi 3B+ doesn’t have USB 3, there’s not much we can do about that until the RPi 4 comes out — once it does, we’ll have even faster inference on the Pi using the Coral USB Accelerator.

Finally, I’ll note that once or twice during the object detection examples it appeared that the Coral USB Accelerator “locked up” and wouldn’t perform inference (I think it got “stuck” trying to load the model), forcing me to ctrl + c out of the script.

Killing the script must have prevented a critical “shut down” routine from running on the Coral — any subsequent executions of the demo Python scripts would result in an error.

To fix the problem I had to unplug the Coral USB accelerator and then plug it back in. Again, I’m not sure why that happened and I couldn’t find any documentation on the Google Coral site that referenced the issue.

Interested in using the Google Coral in your own projects?

I bet you’re just as excited about the Google Coral as me. Along with the Movidius NCS and Jetson Nano, these devices are bringing computer vision and deep learning to resource constrained systems such as embedded devices and the Raspberry Pi.

In my opinion, embedded CV and DL is the next big wave in the AI community. It’s so big that it may even be a tsunami — will you be riding that wave?

To help you get your start in embedded Computer Vision and Deep Learning, I have decided to write a brand new book — Raspberry Pi for Computer Vision.

Inside this book you will learn how to:

  • Build practical, real-world computer vision applications on the Pi
  • Create computer vision and Internet of Things (IoT) projects and applications with the RPi
  • Optimize your OpenCV code and algorithms on the resource constrained Pi
  • Perform Deep Learning on the Raspberry Pi (including utilizing the Movidius NCS and OpenVINO toolkit)
  • Configure your Google Coral, perform image classification and object detection, and even train + deploy your own custom models to the Coral Edge TPU!
  • Utilize the NVIDIA Jetson Nano to run multiple deep neural networks on a single board, including image classification, object detection, segmentation, and more!

I’m running a Kickstarter campaign to fund the creation of the new book, and to celebrate, I’m offering 25% OFF my existing books and courses if you pre-order a copy of RPi for CV.

In fact, the Raspberry Pi for Computer Vision book is practically free if you pre-order it with Deep Learning for Computer Vision with Python or the PyImageSearch Gurus course.

The clock is ticking and these discounts won’t last — the Kickstarter pre-sale shuts down on May 10th at 10AM EDT, after which I’m taking the deals down.

Reserve your pre-sale book now and while you are there, grab another course or book at a discounted rate.

Summary

In this tutorial, you learned how to get started with the Google Coral USB Accelerator.

We started by installing the Edge TPU runtime library on your Debian-based operating system (we specifically used Raspbian for the Raspberry Pi).

After that, we learned how to run the example demo scripts included in the Edge TPU library download.

We also learned how to install the edgetpu library into a Python virtual environment (that way we can keep our packages/projects nice and tidy).

We wrapped up the tutorial by discussing some of my thoughts, feedback, and suggestions when using the Coral USB Accelerator (be sure to refer to that section first if you have any questions).

I hope you enjoyed this tutorial!

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Getting started with Google Coral’s TPU USB Accelerator appeared first on PyImageSearch.

Detecting Parkinson’s Disease with OpenCV, Computer Vision, and the Spiral/Wave Test


In this tutorial, you will learn how to use OpenCV and machine learning to automatically detect Parkinson’s disease in hand-drawn images of spirals and waves.

Today’s tutorial is inspired from PyImageSearch reader, Joao Paulo Folador, a PhD student from Brazil.

Joao is interested in utilizing computer vision and machine learning to automatically detect and predict Parkinson’s disease based on geometric drawings (i.e., spirals and sine waves).

While I am familiar with Parkinson’s disease, I had not heard of the geometric drawing test — a bit of research led me to a 2017 paper, Distinguishing Different Stages of Parkinson’s Disease Using Composite Index of Speed and Pen-Pressure of Sketching a Spiral, by Zham et al.

The researchers found that the drawing speed was slower and the pen pressure lower among Parkinson’s patients — this was especially pronounced for patients with more acute/advanced forms of the disease.

One of the symptoms of Parkinson’s is tremors and rigidity in the muscles, making it harder to draw smooth spirals and waves.

Joao postulated that it might be possible to detect Parkinson’s disease using the drawings alone rather than having to measure the speed and pressure of the pen on paper.

Reducing the requirement of tracking pen speed and pressure:

  1. Eliminates the need for additional hardware when performing the test.
  2. Makes it far easier to automatically detect Parkinson’s.

Graciously, Joao and his advisor allowed me access to the dataset they collected of both spirals and waves drawn by (1) patients with Parkinson’s, and (2) healthy participants.

I took a look at the dataset and considered our options.

Originally, Joao wanted to apply deep learning to the project, but after consideration, I carefully explained that deep learning, while powerful, isn’t always the right tool for the job! You wouldn’t want to use a hammer to drive in a screw, for instance.

Instead, you look at your toolbox, carefully consider your options, and grab the right tool.

I explained this to Joao and then demonstrated how we can predict Parkinson’s in images with 83.33% accuracy using standard computer vision and machine learning algorithms.

To learn how to apply computer vision and OpenCV to detect Parkinson’s based on geometric drawings, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Detecting Parkinson’s with OpenCV, Computer Vision, and the Spiral/Wave Test

In the first part of this tutorial, we’ll briefly discuss Parkinson’s disease, including how geometric drawings can be used to detect and predict Parkinson’s.

We’ll then examine our dataset of drawings gathered from both patients with and without Parkinson’s.

After reviewing the dataset, I’ll show you how to use the HOG image descriptor to quantify the input images and then how we can train a Random Forest classifier on top of the extracted features.

We’ll wrap up by examining our results.

What is Parkinson’s disease?

Figure 1: Patients with Parkinson’s disease have nervous system issues. Symptoms include movement issues such as tremors and rigidity. In this blog post, we’ll use OpenCV and machine learning to detect Parkinson’s disease from hand drawings consisting of spirals and waves.

Parkinson’s disease is a nervous system disorder that affects movement. The disease is progressive and is marked by five different stages (source).

  1. Stage 1: Mild symptoms that do not typically interfere with daily life, including tremors and movement issues on only one side of the body.
  2. Stage 2: Symptoms continue to become worse with both tremors and rigidity now affecting both sides of the body. Daily tasks become challenging.
  3. Stage 3: Loss of balance and movements with falls becoming frequent and common. The patient is still capable of (typically) living independently.
  4. Stage 4: Symptoms become severe and constraining. The patient is unable to live alone and requires help to perform daily activities.
  5. Stage 5: Likely impossible to walk or stand. The patient is most likely wheelchair bound and may even experience hallucinations.

While Parkinson’s cannot be cured, early detection along with proper medication can significantly improve symptoms and quality of life, making it an important topic for computer vision and machine learning practitioners to explore.

Drawing spirals and waves to detect Parkinson’s disease

Figure 2: A 2017 study by Zham et al. concluded that it is possible to detect Parkinson’s by asking the patient to draw a spiral while tracking the speed of pen movement and pressure. No image processing was conducted in this study. (image source)

A 2017 study by Zham et al. found that it was possible to detect Parkinson’s by asking the patient to draw a spiral and then track:

  1. Speed of drawing
  2. Pen pressure

The researchers found that the drawing speed was slower and the pen pressure lower among Parkinson’s patients — this was especially pronounced for patients with more acute/advanced forms of the disease.

We’ll be leveraging the fact that two of the most common Parkinson’s symptoms include tremors and muscle rigidity which directly impact the visual appearance of a hand drawn spiral and wave.

The variation in visual appearance will enable us to train a computer vision + machine learning algorithm to automatically detect Parkinson’s disease.

The spiral and wave dataset

Figure 3: Today’s Parkinson’s image dataset is curated by Andrade and Folador from the NIATS of Federal University of Uberlândia. We will use Python and OpenCV to train a model for automatically classifying Parkinson’s from similar spiral/wave drawings.

The dataset we’ll be using here today was curated by Adriano de Oliveira Andrade and Joao Paulo Folador from the NIATS of Federal University of Uberlândia.

The dataset itself consists of 204 images and is pre-split into a training set and a testing set, consisting of:

  • Spiral: 102 images, 72 training, and 30 testing
  • Wave: 102 images, 72 training, and 30 testing

Figure 3 above shows examples of each of the drawings and corresponding classes.

While it would be challenging, if not impossible, for a person to classify Parkinson’s vs. healthy in some of these drawings, others show a clear deviation in visual appearance — our goal is to quantify the visual appearance of these drawings and then train a machine learning model to classify them.

Preparing a computing environment for today’s project

Today’s environment is straightforward to get up and running on your system.

You will need the following software:

  • OpenCV
  • NumPy
  • Scikit-learn
  • Scikit-image
  • imutils

Each package can be installed with pip, Python’s package manager.

But before you dive into pip, read this tutorial to set up your virtual environment and to install OpenCV with pip.

Below you can find the commands you’ll need to configure your development environment.

$ workon cv # insert your virtual environment name such as `cv`
$ pip install opencv-contrib-python # see the tutorial linked above
$ pip install scikit-learn
$ pip install scikit-image
$ pip install imutils

Project structure

Go ahead and grab today’s “Downloads” associated with today’s post. The .zip file contains the spiral and wave dataset along with a single Python script.

You may use the tree command in a terminal to inspect the structure of the files and folders:
$ tree --dirsfirst --filelimit 10
.
├── dataset
│   ├── spiral
│   │   ├── testing
│   │   │   ├── healthy [15 entries]
│   │   │   └── parkinson [15 entries]
│   │   └── training
│   │       ├── healthy [36 entries]
│   │       └── parkinson [36 entries]
│   └── wave
│       ├── testing
│       │   ├── healthy [15 entries]
│       │   └── parkinson [15 entries]
│       └── training
│           ├── healthy [36 entries]
│           └── parkinson [36 entries]
└── detect_parkinsons.py

15 directories, 1 file

Our dataset/ is first broken down into spiral/ and wave/. Each of those folders is further split into testing/ and training/. Finally, our images reside in healthy/ or parkinson/ folders.

We’ll be reviewing a single Python script today: detect_parkinsons.py. This script will read all of the images, extract features, and train a machine learning model. Finally, results will be displayed in a montage.

Implementing the Parkinson’s detector script

To implement our Parkinson’s detector you may be tempted to throw deep learning and Convolutional Neural Networks (CNNs) at the problem — there’s a problem with that approach though.

To start, we don’t have much training data, only 72 images for training. When confronted with a lack of training data we typically apply data augmentation — but data augmentation in this context is also problematic.

You would need to be extremely careful as improper use of data augmentation could potentially make a healthy patient’s drawing look like a Parkinson’s patient’s drawing (or vice versa).

And more to the point, effectively applying computer vision to a problem is all about bringing the right tool to the job — you wouldn’t use a screwdriver to bang in a nail, for instance.

Just because you may know how to apply deep learning to a problem doesn’t necessarily mean that deep learning is “always” the best choice for the problem.

In this example, I’ll show you how the Histogram of Oriented Gradients (HOG) image descriptor along with a Random Forest classifier can perform quite well given the limited amount of training data.

Open up a new file, name it detect_parkinsons.py, and insert the following code:
# import the necessary packages
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from skimage import feature
from imutils import build_montages
from imutils import paths
import numpy as np
import argparse
import cv2
import os

We begin with our imports on Lines 2-11:

  • We’ll be making heavy use of scikit-learn as is evident in the first three imports:
    • The classifier we are using is the RandomForestClassifier.
    • We’ll use a LabelEncoder to encode labels as integers.
    • A confusion_matrix will be built so that we can derive raw accuracy, sensitivity, and specificity.
  • Histogram of Oriented Gradients (HOG) will come from the feature import of scikit-image.
  • Two modules from imutils will be put to use:
    • We will use build_montages for visualization.
    • Our paths import will help us to extract the file paths to each of the images in our dataset.
  • NumPy will help us calculate statistics and grab random indices.
  • The argparse import will allow us to parse command line arguments.
  • OpenCV (cv2) will be used to read, process, and display images.
  • Our program will accommodate both Unix and Windows file paths with the os module.

Let’s define a function to quantify a wave/spiral image with the HOG method:
def quantify_image(image):
	# compute the histogram of oriented gradients feature vector for
	# the input image
	features = feature.hog(image, orientations=9,
		pixels_per_cell=(10, 10), cells_per_block=(2, 2),
		transform_sqrt=True, block_norm="L1")

	# return the feature vector
	return features

We will extract features from each input image with the quantify_image function.

First introduced by Dalal and Triggs in their CVPR 2005 paper, Histogram of Oriented Gradients for Human Detection, HOG will be used to quantify our image.

HOG is a structural descriptor that will capture and quantify changes in local gradient in the input image. HOG will naturally be able to quantify how the directions of both spirals and waves change.

And furthermore, HOG will be able to capture if these drawings have more of a “shake” to them, as we might expect from a Parkinson’s patient.

Another application of HOG is this PyImageSearch Gurus sample lesson. Be sure to refer to the sample lesson for a full explanation of the feature.hog parameters.

The resulting features are a 12,996-dim feature vector (list of numbers) quantifying the wave or spiral. We’ll train a Random Forest classifier on top of the features from all images in the dataset.
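As a quick sanity check (a minimal sketch, assuming scikit-image and NumPy are installed), you can verify where that number comes from: a 200×200 input with 10×10 pixel cells yields a 20×20 grid of cells, 19×19 overlapping 2×2 blocks, and 9 orientations per cell, giving 19 x 19 x 2 x 2 x 9 = 12,996 values:

# sanity check: HOG feature vector length for a 200x200 input
import numpy as np
from skimage import feature

dummy = np.zeros((200, 200), dtype="uint8")
features = feature.hog(dummy, orientations=9, pixels_per_cell=(10, 10),
	cells_per_block=(2, 2), transform_sqrt=True, block_norm="L1")
print(features.shape)
# (12996,)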

Moving on, let’s load our data and extract features:

def load_split(path):
	# grab the list of images in the input directory, then initialize
	# the list of data (i.e., images) and class labels
	imagePaths = list(paths.list_images(path))
	data = []
	labels = []

	# loop over the image paths
	for imagePath in imagePaths:
		# extract the class label from the filename
		label = imagePath.split(os.path.sep)[-2]

		# load the input image, convert it to grayscale, and resize
		# it to 200x200 pixels, ignoring aspect ratio
		image = cv2.imread(imagePath)
		image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
		image = cv2.resize(image, (200, 200))

		# threshold the image such that the drawing appears as white
		# on a black background
		image = cv2.threshold(image, 0, 255,
			cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

		# quantify the image
		features = quantify_image(image)

		# update the data and labels lists, respectively
		data.append(features)
		labels.append(label)

	# return the data and labels
	return (np.array(data), np.array(labels))

The load_split function has a goal of accepting a dataset path and returning all feature data and associated class labels. Let’s break it down step by step:
  • The function is defined to accept a path to the dataset (either waves or spirals) on Line 23.
  • From there we grab input imagePaths, taking advantage of imutils (Line 26).
  • Both data and labels lists are initialized (Lines 27 and 28).
  • From there we loop over all imagePaths beginning on Line 31:
    • Each label is extracted from the path (Line 33).
    • Each image is loaded and preprocessed (Lines 37-44). The thresholding step segments the drawing from the input image, making the drawing appear as white foreground on a black background.
    • Features are extracted via our quantify_image function (Line 47).
    • The features and label are appended to the data and labels lists respectively (Lines 50-51).
  • Finally, data and labels are converted to NumPy arrays and returned conveniently in a tuple (Line 54).

Let’s go ahead and parse our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset")
ap.add_argument("-t", "--trials", type=int, default=5,
	help="# of trials to run")
args = vars(ap.parse_args())

Our script handles two command line arguments:

  • --dataset: The path to the input dataset (either waves or spirals).
  • --trials: The number of trials to run (by default we run 5 trials).

To prepare for training we’ll perform initializations:

# define the path to the training and testing directories
trainingPath = os.path.sep.join([args["dataset"], "training"])
testingPath = os.path.sep.join([args["dataset"], "testing"])

# loading the training and testing data
print("[INFO] loading data...")
(trainX, trainY) = load_split(trainingPath)
(testX, testY) = load_split(testingPath)

# encode the labels as integers
le = LabelEncoder()
trainY = le.fit_transform(trainY)
testY = le.transform(testY)

# initialize our trials dictionary
trials = {}

Here we are building paths to training and testing input directories (Lines 65 and 66).

From there we load our training and testing splits by passing each path to load_split (Lines 70 and 71).

Our trials dictionary is initialized on Line 79 (recall that by default we will run 5 trials).

Let’s start our trials now:

# loop over the number of trials to run
for i in range(0, args["trials"]):
	# train the model
	print("[INFO] training model {} of {}...".format(i + 1,
		args["trials"]))
	model = RandomForestClassifier(n_estimators=100)
	model.fit(trainX, trainY)

	# make predictions on the testing data and initialize a dictionary
	# to store our computed metrics
	predictions = model.predict(testX)
	metrics = {}

	# compute the confusion matrix and and use it to derive the raw
	# accuracy, sensitivity, and specificity
	cm = confusion_matrix(testY, predictions).flatten()
	(tn, fp, fn, tp) = cm
	metrics["acc"] = (tp + tn) / float(cm.sum())
	metrics["sensitivity"] = tp / float(tp + fn)
	metrics["specificity"] = tn / float(tn + fp)

	# loop over the metrics
	for (k, v) in metrics.items():
		# update the trials dictionary with the list of values for
		# the current metric
		l = trials.get(k, [])
		l.append(v)
		trials[k] = l

On Line 82, we loop over each trial. In each trial, we:

  • Initialize our Random Forest classifier and train the model (Lines 86 and 87). For more information about Random Forests, including how they are used in context of computer vision, be sure to refer to PyImageSearch Gurus.
  • Make predictions on testing data (Line 91).
  • Compute accuracy, sensitivity, and specificity metrics (Lines 96-100).
  • Update our trials dictionary (Lines 103-108).

Looping over each of our metrics, we’ll print statistical information:

# loop over our metrics
for metric in ("acc", "sensitivity", "specificity"):
	# grab the list of values for the current metric, then compute
	# the mean and standard deviation
	values = trials[metric]
	mean = np.mean(values)
	std = np.std(values)

	# show the computed metrics for the statistic
	print(metric)
	print("=" * len(metric))
	print("u={:.4f}, o={:.4f}".format(mean, std))
	print("")

On Line 111, we loop over each metric.

Then we proceed to grab the values from the trials dictionary (Line 114).

Using the values, the mean and standard deviation are computed for each metric (Lines 115 and 116).

From there, the statistics are shown in the terminal.

Now comes the eye candy — we’re going to create a montage so that we can share our work visually:

# randomly select a few images and then initialize the output images
# for the montage
testingPaths = list(paths.list_images(testingPath))
idxs = np.arange(0, len(testingPaths))
idxs = np.random.choice(idxs, size=(25,), replace=False)
images = []

# loop over the testing samples
for i in idxs:
	# load the testing image, clone it, and resize it
	image = cv2.imread(testingPaths[i])
	output = image.copy()
	output = cv2.resize(output, (128, 128))

	# pre-process the image in the same manner we did earlier
	image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
	image = cv2.resize(image, (200, 200))
	image = cv2.threshold(image, 0, 255,
		cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

First, we randomly sample images from our testing set (Lines 126-128).

Our images list will hold each wave or spiral image along with annotations added via OpenCV drawing functions (Line 129).

We proceed to loop over the random image indices on Line 132.

Inside the loop, each image is processed in the same manner as during training (Lines 134-142).

From there we’ll automatically classify the image using our new HOG + Random Forest based classifier and add color-coded annotations:

# quantify the image and make predictions based on the extracted
	# features using the last trained Random Forest
	features = quantify_image(image)
	preds = model.predict([features])
	label = le.inverse_transform(preds)[0]

	# draw the colored class label on the output image and add it to
	# the set of output images
	color = (0, 255, 0) if label == "healthy" else (0, 0, 255)
	cv2.putText(output, label, (3, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
		color, 2)
	images.append(output)

# create a montage using 128x128 "tiles" with 5 rows and 5 columns
montage = build_montages(images, (128, 128), (5, 5))[0]

# show the output montage
cv2.imshow("Output", montage)
cv2.waitKey(0)

Each image is quantified with HOG features (Line 146).

Then the image is classified by passing those features to model.predict (Lines 147 and 148).

The class label is colored green for "healthy" and red otherwise (Line 152). The label is drawn in the top left corner of the image (Lines 153 and 154).

Each output image is then added to an images list (Line 155) so that we can develop a montage (Line 158). You can learn more about creating Montages with OpenCV.

The montage is then displayed via Line 161 until a key is pressed.

Training the Parkinson’s detector model

Figure 4: Using Python, OpenCV, and machine learning (Random Forests), we have classified Parkinson’s patients using their hand-drawn spirals with 83.33% accuracy.

Let’s put our Parkinson’s disease detector to the test!

Use the “Downloads” section of this tutorial to download the source code and dataset.

From there, navigate to where you downloaded the .zip file, unarchive it, and execute the following command to train our “wave” model:

$ python detect_parkinsons.py --dataset dataset/wave
[INFO] loading data...
[INFO] training model 1 of 5...
[INFO] training model 2 of 5...
[INFO] training model 3 of 5...
[INFO] training model 4 of 5...
[INFO] training model 5 of 5...
acc
===
u=0.7133, o=0.0452

sensitivity
===========
u=0.6933, o=0.0998

specificity
===========
u=0.7333, o=0.0730

Examining our output you’ll see that we obtained 71.33% classification accuracy on the testing set, with a sensitivity of 69.33% (true-positive rate) and specificity of 73.33% (true-negative rate).

It’s important that we measure both sensitivity and specificity as:

  1. Sensitivity measures the true positives that were also predicted as positives.
  2. Specificity measures the true negatives that were also predicted as negatives.

Machine learning models, especially machine learning models in the medical space, need to take utmost care when balancing true positives and true negatives:

  • We don’t want to classify someone as “No Parkinson’s” when they are in fact positive for Parkinson’s.
  • And similarly, we don’t want to classify someone as “Parkinson’s positive” when in fact they don’t have the disease.
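To make those definitions concrete, here is a small worked example (with dummy labels, not the tutorial’s data) that derives accuracy, sensitivity, and specificity from a 2×2 confusion matrix the same way the training script does:

# worked example with dummy ground-truth/predicted labels
# 0 = healthy (negative class), 1 = parkinson (positive class)
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

(tn, fp, fn, tp) = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / float(tn + fp + fn + tp)   # (5 + 3) / 10 = 0.80
sensitivity = tp / float(tp + fn)                 # 5 / 6 = 0.83 (true-positive rate)
specificity = tn / float(tn + fp)                 # 3 / 4 = 0.75 (true-negative rate)
print(accuracy, sensitivity, specificity)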

Let’s now train our model on the “spiral” drawings:

$ python detect_parkinsons.py --dataset dataset/spiral
[INFO] loading data...
[INFO] training model 1 of 5...
[INFO] training model 2 of 5...
[INFO] training model 3 of 5...
[INFO] training model 4 of 5...
[INFO] training model 5 of 5...
acc
===
u=0.8333, o=0.0298

sensitivity
===========
u=0.7600, o=0.0533

specificity
===========
u=0.9067, o=0.0327

This time we reach 83.33% accuracy on the testing set, with a sensitivity of 76.00% and specificity of 90.67%.

Looking at the standard deviations we can also see significantly less variance and a more compact distribution.

When automatically detecting Parkinson’s disease in hand drawings, at least when utilizing this particular dataset, the “spiral” drawing seems to be much more useful and informative.

Fill your toolbox with the right computer vision tools for the job

Deep learning methods are all the rage right now, and yes, they are super powerful, but deep learning doesn’t make other computer vision techniques obsolete.

Instead, you need to bring the right tool to the job. You wouldn’t try to bang in a screw with a hammer, you would instead use a screwdriver. The same concept is true with computer vision — you bring the right tool to the job.

In order to help build your toolbox of computer vision algorithms I have put together the PyImageSearch Gurus course.

Inside the course you’ll learn:

  • Machine learning and image classification
  • Automatic License/Number Plate Recognition (ANPR)
  • Face recognition
  • How to train HOG + Linear SVM object detectors
  • Content-based Image Retrieval (i.e., image search engines)
  • Processing image datasets with Hadoop and MapReduce
  • Hand gesture recognition
  • Deep learning fundamentals
  • …and much more!

PyImageSearch Gurus is the most comprehensive computer vision education online today, covering 13 modules broken out into 168 lessons, with over 2,161 pages of content. You won’t find a more detailed computer vision course anywhere else online, I guarantee it.

The PyImageSearch Gurus course also includes private community forums. I participate in the Gurus forum virtually every day, so it’s a great way to get expert advice, both from me and from the other advanced students, on a daily basis.

To learn more about the PyImageSearch Gurus course + community (and grab 10 FREE sample lessons), just click the button below:

Click here to learn more about PyImageSearch Gurus!

Summary

In this tutorial, you learned how to detect Parkinson’s disease in geometric drawings (specifically spirals and waves) using OpenCV and computer vision. We utilized the Histogram of Oriented Gradients image descriptor to quantify each of the input images.

After extracting features from the input images we trained a Random Forest classifier with 100 total decision trees in the forest, obtaining:

  • 83.33% accuracy for spiral
  • 71.33% accuracy for the wave

It’s also interesting to note that the Random Forest trained on the spiral dataset obtained 76.00% sensitivity, meaning that the model was capable of predicting a true positive (i.e., “Yes, the patient has Parkinson’s”) nearly 76% of the time.

This tutorial serves as yet another example of how computer vision can be applied to the medical domain (click here for more medical tutorials on PyImageSearch).

I hope you enjoyed it and find it helpful when performing your own research or building your own medical computer vision applications.

To download the source code to this post, and be notified when future tutorials are published on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Detecting Parkinson’s Disease with OpenCV, Computer Vision, and the Spiral/Wave Test appeared first on PyImageSearch.


Getting started with the NVIDIA Jetson Nano


In this tutorial, you will learn how to get started with your NVIDIA Jetson Nano, including:

  • First boot
  • Installing system packages and prerequisites
  • Configuring your Python development environment
  • Installing Keras and TensorFlow on the Jetson Nano
  • Changing the default camera
  • Classification and object detection with the Jetson Nano

I’ll also provide my commentary along the way, including what tripped me up when I set up my Jetson Nano, ensuring you avoid the same mistakes I made.

By the time you’re done with this tutorial, your NVIDIA Jetson Nano will be configured and ready for deep learning!

To learn how to get started with the NVIDIA Jetson Nano, just keep reading!

Getting started with the NVIDIA Jetson Nano

Figure 1: In this blog post, we’ll get started with the NVIDIA Jetson Nano, an AI edge device capable of 472 GFLOPS of computation. At around $100 USD, the device is packed with capability including a Maxwell architecture 128 CUDA core GPU covered up by the massive heatsink shown in the image. (image source)

In the first part of this tutorial, you will learn how to download and flash the NVIDIA Jetson Nano .img file to your micro-SD card. I’ll then show you how to install the required system packages and prerequisites.

From there you will configure your Python development library and learn how to install the Jetson Nano-optimized version of Keras and TensorFlow on your device.

I’ll then show you how to access the camera on your Jetson Nano and even perform image classification and object detection on the Nano as well.

We’ll then wrap up the tutorial with a brief discussion on the Jetson Nano — a full benchmark and comparison between the NVIDIA Jetson Nano, Google Coral, and Movidius NCS will be published in a future blog post.

Before you get started with the Jetson Nano

Before you can even boot up your NVIDIA Jetson Nano you need three things:

  1. A micro-SD card (minimum 16GB)
  2. A 5V 2.5A MicroUSB power supply
  3. An ethernet cable

I really want to stress the minimum of a 16GB micro-SD card. The first time I configured my Jetson Nano I used a 16GB card, but that space was eaten up fast, particularly when I installed the Jetson Inference library which will download a few gigabytes of pre-trained models.

I, therefore, recommend a 32GB micro-SD card for your Nano.

Secondly, when it comes to your 5V 2.5A MicroUSB power supply, in their documentation NVIDIA specifically recommends this one from Adafruit.

Finally, you will need an ethernet cable when working with the Jetson Nano which I find really, really frustrating.

The NVIDIA Jetson Nano is marketed as being a powerful IoT and edge computing device for Artificial Intelligence…

…and if that’s the case, why is there not a WiFi adapter on the device?

I don’t understand NVIDIA’s decision there and I don’t believe it should be up to the end user of the product to “bring their own WiFi adapter”.

If the goal is to bring AI to IoT and edge computing then there should be WiFi.

But I digress.

You can read more about NVIDIA’s recommendations for the Jetson Nano here.

Download and flash the .img file to your micro-SD card

Before we can get started installing any packages or running any demos on the Jetson Nano, we first need to download the Jetson Nano Developer Kit SD Card Image from NVIDIA’s website.

NVIDIA provides documentation for flashing the .img file to a micro-SD card for Windows, macOS, and Linux — you should choose the flash instructions appropriate for your particular operating system.

First boot of the NVIDIA Jetson Nano

After you’ve downloaded and flashed the .img file to your micro-SD card, insert the card into the micro-SD card slot.

I had a hard time finding the card slot — it’s actually underneath the heat sink, right where my finger is pointing to:

Figure 2: Where is the microSD card slot on the NVIDIA Jetson Nano? The microSD receptacle is hidden under the heatsink as shown in the image.

I think NVIDIA could have made the slot a bit more obvious, or at least better documented it on their website.

After sliding the micro-SD card home, connect your power supply and boot.

Assuming your Jetson Nano is connected to an HDMI output, you should see the following (or similar) displayed to your screen:

Figure 3: To get started with the NVIDIA Jetson Nano AI device, just flash the .img (preconfigured with Jetpack) and boot. From here we’ll be installing TensorFlow and Keras in a virtual environment.

The Jetson Nano will then walk you through the install process, including setting your username/password, timezone, keyboard layout, etc.

Installing system packages and prerequisites

In the remainder of this guide, I’ll be showing you how to configure your NVIDIA Jetson Nano for deep learning, including:

  • Installing system package prerequisites.
  • Installing Keras and TensorFlow on the Jetson Nano.
  • Installing the Jetson Inference engine.

Let’s get started by installing the required system packages:

$ sudo apt-get install git cmake
$ sudo apt-get install libatlas-base-dev gfortran
$ sudo apt-get install libhdf5-serial-dev hdf5-tools
$ sudo apt-get install python3-dev

Provided you have a good internet connection, the above commands should only take a few minutes to finish up.

Configuring your Python environment

The next step is to configure our Python development environment.

Let’s first install pip, Python’s package manager:
$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python3 get-pip.py
$ rm get-pip.py

We’ll be using Python virtual environments in this guide to keep our Python development environments independent and separate from each other.

Using Python virtual environments is a best practice and will help you avoid having to maintain a micro-SD card for each development environment you want to use on your Jetson Nano.

To manage our Python virtual environments we’ll be using virtualenv and virtualenvwrapper which we can install using the following command:

$ sudo pip install virtualenv virtualenvwrapper

Once we’ve installed virtualenv and virtualenvwrapper we need to update our ~/.bashrc file. I’m choosing to use nano but you can use whatever editor you are most comfortable with:
$ nano ~/.bashrc

Scroll down to the bottom of the ~/.bashrc file and add the following lines:
# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

After adding the above lines, save and exit the editor.

Next, we need to reload the contents of the ~/.bashrc file using the source command:
$ source ~/.bashrc

We can now create a Python virtual environment using the mkvirtualenv command — I’m naming my virtual environment deep_learning, but you can name it whatever you would like:
$ mkvirtualenv deep_learning -p python3

Installing TensorFlow and Keras on the NVIDIA Jetson Nano

Before we can install TensorFlow and Keras on the Jetson Nano, we first need to install NumPy.

First, make sure you are inside the deep_learning virtual environment by using the workon command:
$ workon deep_learning

From there, you can install NumPy:

$ pip install numpy

Installing NumPy on my Jetson Nano took ~10-15 minutes as it had to be compiled on the system (there are currently no pre-built versions of NumPy for the Jetson Nano).

The next step is to install Keras and TensorFlow on the Jetson Nano. You may be tempted to do a simple pip install tensorflow-gpu — do not do this!

Instead, NVIDIA has provided an official release of TensorFlow for the Jetson Nano.

You can install the official Jetson Nano TensorFlow by using the following command:

$ pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.3

Installing NVIDIA’s tensorflow-gpu package took ~40 minutes on my Jetson Nano.

The final step here is to install SciPy and Keras:

$ pip install scipy
$ pip install keras

These installs took ~35 minutes.

Compiling and installing Jetson Inference on the Nano

The Jetson Nano .img already has JetPack installed so we can jump immediately to building the Jetson Inference engine.

The first step is to clone down the jetson-inference repo:
$ git clone https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ git submodule update --init

We can then configure the build using cmake:
$ mkdir build
$ cd build
$ cmake ..

There are two important things to note when running cmake:

  1. The cmake command will ask for root permissions so don’t walk away from the Nano until you’ve provided your root credentials.
  2. During the configure process, cmake will also download a few gigabytes of pre-trained sample models. Make sure you have a few GB to spare on your micro-SD card! (This is also why I recommend a 32GB microSD card instead of a 16GB card.)

After cmake has finished configuring the build, we can compile and install the Jetson Inference engine:
$ make
$ sudo make install

Compiling and installing the Jetson Inference engine on the Nano took just over 3 minutes.

What about installing OpenCV?

I decided to cover installing OpenCV on a Jetson Nano in a future tutorial. There are a number of cmake configurations that need to be set to take full advantage of OpenCV on the Nano, and frankly, this post is long enough as is.

Again, I’ll be covering how to configure and install OpenCV on a Jetson Nano in a future tutorial.

Running the NVIDIA Jetson Nano demos

When using the NVIDIA Jetson Nano you have two options for input camera devices:

  1. A CSI camera module, such as the Raspberry Pi camera module (which is compatible with the Jetson Nano, by the way)
  2. A USB webcam

I’m currently using all of my Raspberry Pi camera modules for my upcoming book, Raspberry Pi for Computer Vision so I decided to use my Logitech C920 which is plug-and-play compatible with the Nano (you could use the newer Logitech C960 as well).

The examples included with the Jetson Nano Inference library can be found in jetson-inference:

  • detectnet-camera: Performs object detection using a camera as an input.
  • detectnet-console: Also performs object detection, but using an input image rather than a camera.
  • imagenet-camera: Performs image classification using a camera.
  • imagenet-console: Classifies an input image using a network pre-trained on the ImageNet dataset.
  • segnet-camera: Performs semantic segmentation from an input camera.
  • segnet-console: Also performs semantic segmentation, but on an image.
  • A few other examples are included as well, including deep homography estimation and super resolution.

However, in order to run these examples, we need to slightly modify the source code for the respective cameras.

In each example you’ll see that the DEFAULT_CAMERA value is set to -1, implying that an attached CSI camera should be used.

However, since we are using a USB camera, we need to change the DEFAULT_CAMERA value from -1 to 0 (or whatever the correct /dev/video V4L2 camera is).

Luckily, this change is super easy to do!

Let’s start with image classification as an example.

First, change directory into ~/jetson-inference/imagenet-camera:
$ cd ~/jetson-inference/imagenet-camera

From there, open up imagenet-camera.cpp:
$ nano imagenet-camera.cpp

You’ll then want to scroll down to approximately Line 37 where you’ll see the DEFAULT_CAMERA value:
#define DEFAULT_CAMERA -1        // -1 for onboard camera, or change to index of /dev/video V4L2 camera (>=0)

Simply change that value from -1 to 0:
#define DEFAULT_CAMERA 0        // -1 for onboard camera, or change to index of /dev/video V4L2 camera (>=0)

From there, save and exit the editor.

After editing the C++ file you will need to recompile the example which is as simple as:

$ cd ../build
$ make
$ sudo make install

Keep in mind that make is smart enough to not recompile the entire library. It will only recompile files that have changed (in this case, the ImageNet classification example).

Once compiled, change to the aarch64/bin directory and execute the imagenet-camera binary:
$ cd aarch64/bin/
$ ./imagenet-camera
imagenet-camera
  args (1):  0 [./imagenet-camera]  

[gstreamer] initialized gstreamer, version 1.14.1.0
[gstreamer] gstCamera attempting to initialize with GST_SOURCE_NVCAMERA
[gstreamer] gstCamera pipeline string:
v4l2src device=/dev/video0 ! video/x-raw, width=(int)1280, height=(int)720, format=YUY2 ! videoconvert ! video/x-raw, format=RGB ! videoconvert !appsink name=mysink
[gstreamer] gstCamera successfully initialized with GST_SOURCE_V4L2

imagenet-camera:  successfully initialized video device
    width:  1280
   height:  720
    depth:  24 (bpp)


imageNet -- loading classification network model from:
         -- prototxt     networks/googlenet.prototxt
         -- model        networks/bvlc_googlenet.caffemodel
         -- class_labels networks/ilsvrc12_synset_words.txt
         -- input_blob   'data'
         -- output_blob  'prob'
         -- batch_size   2

[TRT]  TensorRT version 5.0.6
[TRT]  detected model format - caffe  (extension '.caffemodel')
[TRT]  desired precision specified for GPU: FASTEST
[TRT]  requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]  native precisions detected for GPU:  FP32, FP16
[TRT]  selecting fastest native precision for GPU:  FP16
[TRT]  attempting to open engine cache file networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  loading network profile from engine cache... networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  device GPU, networks/bvlc_googlenet.caffemodel loaded

Here you can see that the GoogLeNet is loaded into memory, after which inference starts:

Image classification is running at ~10 FPS on the Jetson Nano at 1280×720.

IMPORTANT: If this is the first time you are loading a particular model then it could take 5-15 minutes to load the model.

Internally, the Jetson Nano Inference library is optimizing and preparing the model for inference. This only has to be done once so subsequent runs of the program will be significantly faster (in terms of model loading time, not inference).

Now that we’ve tried image classification, let’s look at the object detection example on the Jetson Nano which is located in ~/jetson-inference/detectnet-camera/detectnet-camera.cpp.

Again, if you are using a USB webcam you’ll want to edit approximately Line 39 of detectnet-camera.cpp and change DEFAULT_CAMERA from -1 to 0 and then recompile via make (again, only necessary if you are using a USB webcam).

After compiling you can find the detectnet-camera binary in ~/jetson-inference/build/aarch64/bin.

Let’s go ahead and run the object detection demo on the Jetson Nano now:

$ ./detectnet-camera 
detectnet-camera
  args (1):  0 [./detectnet-camera]  

[gstreamer] initialized gstreamer, version 1.14.1.0
[gstreamer] gstCamera attempting to initialize with GST_SOURCE_NVCAMERA
[gstreamer] gstCamera pipeline string:
v4l2src device=/dev/video0 ! video/x-raw, width=(int)1280, height=(int)720, format=YUY2 ! videoconvert ! video/x-raw, format=RGB ! videoconvert !appsink name=mysink
[gstreamer] gstCamera successfully initialized with GST_SOURCE_V4L2

detectnet-camera:  successfully initialized video device
    width:  1280
   height:  720
    depth:  24 (bpp)


detectNet -- loading detection network model from:
          -- prototxt     networks/ped-100/deploy.prototxt
          -- model        networks/ped-100/snapshot_iter_70800.caffemodel
          -- input_blob   'data'
          -- output_cvg   'coverage'
          -- output_bbox  'bboxes'
          -- mean_pixel   0.000000
          -- class_labels networks/ped-100/class_labels.txt
          -- threshold    0.500000
          -- batch_size   2

[TRT]  TensorRT version 5.0.6
[TRT]  detected model format - caffe  (extension '.caffemodel')
[TRT]  desired precision specified for GPU: FASTEST
[TRT]  requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]  native precisions detected for GPU:  FP32, FP16
[TRT]  selecting fastest native precision for GPU:  FP16
[TRT]  attempting to open engine cache file networks/ped-100/snapshot_iter_70800.caffemodel.2.1.GPU.FP16.engine
[TRT]  loading network profile from engine cache... networks/ped-100/snapshot_iter_70800.caffemodel.2.1.GPU.FP16.engine
[TRT]  device GPU, networks/ped-100/snapshot_iter_70800.caffemodel loaded

Here you can see that we are loading a model named

ped-100
used for pedestrian detection (I’m actually not sure what the specific architecture is as it’s not documented on NVIDIA’s website — if you know what architecture is being used, please leave a comment on this post).

Below you can see an example of myself being detected using the Jetson Nano object detection demo:

According to the output of the program, we’re obtaining ~5 FPS for object detection on 1280×720 frames when using the Jetson Nano. Not too bad!

How does the Jetson Nano compare to the Movidius NCS or Google Coral?

This tutorial is simply meant to be a getting started guide for your Jetson Nano — it is not meant to compare the Nano to the Coral or NCS.

I’m in the process of comparing each of the respective embedded systems and will be providing a full benchmark/comparison in a future blog post.

In the meantime, take a look at my existing guides to help you configure your embedded devices and start running benchmarks of your own.

How do I deploy custom models to the Jetson Nano?

One of the benefits of the Jetson Nano is that once you compile and install a library with GPU support (compatible with the Nano, of course), your code will automatically use the Nano’s GPU for inference.

For example:

Earlier in this tutorial, we installed Keras + TensorFlow on the Nano. Any Python scripts that leverage Keras/TensorFlow will automatically use the GPU.

And similarly, any pre-trained Keras/TensorFlow models we use will also automatically use the Jetson Nano GPU for inference.

Pretty awesome, right?

Provided the Jetson Nano supports a given deep learning library (Keras, TensorFlow, Caffe, Torch/PyTorch, etc.), we can easily deploy our models to the Jetson Nano.
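
As a quick sanity check that this works, here is a minimal sketch (not part of the jetson-inference demos) that loads a pre-trained Keras model and classifies a single image; with the Keras + TensorFlow install from earlier in this tutorial, the Nano's GPU is used automatically. The image path is a placeholder you would swap for one of your own:

# minimal sketch: classify one image with a pre-trained Keras model on
# the Nano ("example.jpg" is a placeholder path, not part of the demos)
from keras.applications import MobileNetV2
from keras.applications.mobilenet_v2 import preprocess_input
from keras.applications.mobilenet_v2 import decode_predictions
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
import numpy as np

# load MobileNetV2 with ImageNet weights (downloaded on first run)
model = MobileNetV2(weights="imagenet")

# load and preprocess a single 224x224 image
image = load_img("example.jpg", target_size=(224, 224))
image = img_to_array(image)
image = preprocess_input(np.expand_dims(image, axis=0))

# run inference -- TensorFlow places this on the Nano's GPU for us
preds = model.predict(image)
print(decode_predictions(preds, top=3)[0])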

The problem here is OpenCV.

OpenCV’s Deep Neural Network (

dnn
) module does not support NVIDIA GPUs, including the Jetson Nano.

OpenCV is working to provide NVIDIA GPU support for their

dnn
module. Hopefully, it will be released by the end of the summer/autumn.

But until then we cannot leverage OpenCV’s easy to use

cv2.dnn
functions.

If using the

cv2.dnn
module is an absolute must for you right now, then I would suggest taking a look at Intel’s OpenVINO toolkit, the Movidius NCS, and their other OpenVINO-compatible products, all of which are optimized to work with OpenCV’s deep neural network module.

If you’re interested in learning more about the Movidius NCS and OpenVINO (including benchmark examples), be sure to refer to this tutorial.

Interested in using the NVIDIA Jetson Nano in your own projects?

I bet you’re just as excited about the NVIDIA Jetson Nano as I am. In contrast to pairing the Raspberry Pi with either the Movidius NCS or Google Coral, the Jetson Nano has it all built right in (minus WiFi) to powerfully conduct computer vision and deep learning at the edge.

In my opinion, embedded CV and DL is the next big wave in the AI community. It’s so big that it may even be a tsunami — will you be riding that wave?

To help you get your start in embedded Computer Vision and Deep Learning, I have decided to write a brand new book — Raspberry Pi for Computer Vision.

I’ve chosen to focus on the Raspberry Pi as it is the best entry-level device for getting started into the world of computer vision for IoT.

But I’m not stopping there. Inside the book, we’ll:

  • Augment the Raspberry Pi with the Google Coral and Movidius NCS coprocessors.
  • Apply the same skills we learn with the RPi to a device with more horsepower: NVIDIA’s Jetson Nano.

Additionally, you’ll learn how to:

  • Build practical, real-world computer vision applications on the Pi.
  • Create computer vision and Internet of Things (IoT) projects and applications with the RPi.
  • Optimize your OpenCV code and algorithms on the resource-constrained Pi.
  • Perform Deep Learning on the Raspberry Pi (including utilizing the Movidius NCS and OpenVINO toolkit).
  • Configure your Google Coral, perform image classification and object detection, and even train + deploy your own custom models to the Coral Edge TPU!
  • Utilize the NVIDIA Jetson Nano to run multiple deep neural networks on a single board, including image classification, object detection, segmentation, and more!

I’m running a Kickstarter campaign to fund the creation of the new book, and to celebrate, I’m offering 25% OFF my existing books and courses if you pre-order a copy of RPi for CV.

In fact, the Raspberry Pi for Computer Vision book is practically free if you pre-order it with Deep Learning for Computer Vision with Python or the PyImageSearch Gurus course.

The clock is ticking and these discounts won’t last — the Kickstarter pre-sale shuts down this Friday (May 10th) at 10AM EDT, after which I’m taking the deals down.

Reserve your pre-sale book now and while you are there, grab another course or book at a discounted rate.

Summary

In this tutorial, you learned how to get started with the NVIDIA Jetson Nano.

Specifically, you learned how to install the required system packages, configure your development environment, and install Keras and TensorFlow on the Jetson Nano.

We wrapped up learning how to change the default camera and perform image classification and object detection on the Jetson Nano using the pre-supplied scripts.

I’ll be providing a full comparison and benchmarks of the NVIDIA Jetson Nano, Google Coral, and Movidius NCS in a future tutorial.

To be notified when future tutorials are published here on PyImageSearch (including the Jetson Nano vs. Google Coral vs. Movidius NCS benchmark), just enter your email address in the form below!

The post Getting started with the NVIDIA Jetson Nano appeared first on PyImageSearch.

Object detection and image classification with Google Coral USB Accelerator


A few weeks ago I published a tutorial on how to get started with the Google Coral USB Accelerator. That tutorial was meant to help you configure your device and run your first demo script.

Today we are going to take it a step further and learn how to utilize the Google Coral in your own custom Python scripts!

Inside today’s tutorial you will learn:

  • Image classification with the Coral USB Accelerator
  • Image classification in video with the Google Coral Accelerator
  • Object detection with the Google Coral
  • Object detection in video with the Coral USB Accelerator

After reading this guide, you will have a strong understanding of how to utilize the Google Coral for image classification and object detection in your own applications.

To learn how to perform image classification and object detection with the Google Coral USB Accelerator, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Object detection and image classification with Google Coral USB Accelerator

For this guide I will be making the following assumptions:

  1. You already own a Google Coral USB Accelerator.
  2. You have followed my previous tutorial on how to install and configure Google Coral.

If you haven’t followed my install guide, please refer to it before continuing. Finally, I’ll note that I’m connecting my Google Coral USB Accelerator to my Raspberry Pi to gather results — I’m doing this for two reasons:

  1. I’m currently writing a book on using the Raspberry Pi for Computer Vision which will also cover the Google Coral.
  2. I cover the Raspberry Pi quite often on the PyImageSearch blog and I know many readers are interested in how they can leverage it for computer vision.

If you don’t have a Raspberry Pi but still want to use your Google Coral USB Accelerator, that’s okay, but make sure you are running a Debian-based OS.

Again, refer to my previous Google Coral getting started guide for more information.

Project structure

Let’s review the project included in today’s “Downloads”:

$ tree --dirsfirst
.
├── inception_v4
│   ├── imagenet_labels.txt
│   └── inception_v4_299_quant_edgetpu.tflite
├── mobilenet_ssd_v2
│   ├── coco_labels.txt
│   └── mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite
├── mobilenet_v2
│   ├── imagenet_labels.txt
│   └── mobilenet_v2_1.0_224_quant_edgetpu.tflite
├── classify_image.py
├── classify_video.py
├── detect_image.py
├── detect_video.py
├── janie.jpg
└── thanos.jpg

3 directories, 12 files

Today we will be reviewing four Python scripts:

  1. classify_image.py
      – Classifies a single image with the Google Coral.
  2. classify_video.py
      – Real-time classification of every frame from a webcam video stream using the Coral.
  3. detect_image.py
      – Performs object detection using Google’s Coral deep learning coprocessor.
  4. detect_video.py
      – Real-time object detection using Google Coral and a webcam.

We have three pre-trained TensorFlow Lite models + labels available in the “Downloads”:

  • Classification (trained on ImageNet):
    • inception_v4/
        – The Inception V4 classifier.
    • mobilenet_v2/
        – MobileNet V2 classifier.
  • Object detection (trained on COCO):
    • mobilenet_ssd_v2/
        – MobileNet V2 Single Shot Detector (SSD).

If you are curious about how to train your own classification and object detection models, be sure to refer to Deep Learning for Computer Vision with Python.

For both classify_image.py and detect_image.py, I’ve provided two testing images in the “Downloads”:
  • janie.jpg
      – My adorable beagle.
  • thanos.jpg
      – Character from Avengers: End Game.

For the classify_video.py and detect_video.py scripts, we’ll be capturing frames directly from a camera connected to the Raspberry Pi. You can use one of the following with today’s example scripts:
  • PiCamera V2 – The official Raspberry Pi Foundation camera.
  • USB Webcam – Any USB camera that supports V4L will work, such as a Logitech branded webcam.

 

Image classification with the Coral USB Accelerator

Figure 1: Image classification using Python with the Google Coral TPU USB Accelerator and the Raspberry Pi.

Let’s get started with image classification on the Google Coral!

Open up the

classify_image.py
  file and insert the following code:
# import the necessary packages
from edgetpu.classification.engine import ClassificationEngine
from PIL import Image
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
	help="path to TensorFlow Lite classification model")
ap.add_argument("-l", "--labels", required=True,
	help="path to labels file")
ap.add_argument("-i", "--image", required=True,
	help="path to input image")
args = vars(ap.parse_args())

We start off by importing packages. Most notably, we are importing

ClassificationEngine
  from
edgetpu
  on Line 2.

From there we’ll parse three command line arguments via Lines 10-17:

  • --model
     : The path to our TensorFlow Lite classifier.
  • --labels
     : Class labels file path associated with our model.
  • --image
     : Our input image path.

Using these three command line arguments, our script will be able to handle compatible pre-trained models and any image you throw at it, all from the command line. Command line arguments are one of the most common problems people e-mail me about, so be sure to review my tutorial on argparse and command line arguments if you need a refresher.

Let’s go ahead and load the

labels
 :
# initialize the labels dictionary
print("[INFO] parsing class labels...")
labels = {}

# loop over the class labels file
for row in open(args["labels"]):
	# unpack the row and update the labels dictionary
	(classID, label) = row.strip().split(" ", maxsplit=1)
	labels[int(classID)] = label.strip()

Lines 21-27 facilitate loading class

labels
  from a text file into a Python dictionary. Later on, the Coral API will return the predicted
classID
  (an integer). We can then take that integer class ID and look up the associated
label
  value in this dictionary.
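
If you have never seen this classID-then-text format before, here is a tiny self-contained sketch of the same parsing logic run on two made-up rows (the real rows live in the imagenet_labels.txt file from the “Downloads”):

# tiny sketch of the labels-file parsing above, using two made-up rows
# instead of reading imagenet_labels.txt from disk
rows = ["0  tench", "1  goldfish"]

labels = {}
for row in rows:
	# split each row into (classID, label) on the first space
	(classID, label) = row.strip().split(" ", maxsplit=1)
	labels[int(classID)] = label.strip()

print(labels[1])   # prints "goldfish"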

Moving on, now let’s load our classification

model
  with the
edgetpu
  API:
# load the Google Coral classification model
print("[INFO] loading Coral model...")
model = ClassificationEngine(args["model"])

Our pre-trained TensorFlow Lite classification

model
  is instantiated via the
ClassificationEngine
  class (Line 31) where we pass in the path to our model via command line argument.

Let’s go ahead and load + preprocess our

image
 :
# load the input image
image = cv2.imread(args["image"])
image = imutils.resize(image, width=500)
orig = image.copy()

# prepare the image for classification by converting (1) it from BGR
# to RGB channel ordering and then (2) from a NumPy array to PIL
# image format
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = Image.fromarray(image)

Our

image
  is loaded (Line 34) and then preprocessed (Lines 35-42).

Take note that we made an original

copy
  of the image — we’ll be annotating this copy of the image with the output predictions later in the script.

How easy is it to perform classification inference on an image with the Google Coral Python API?

Let’s find out now:

# make predictions on the input image
print("[INFO] making predictions...")
start = time.time()
results = model.ClassifyWithImage(image, top_k=5)
end = time.time()
print("[INFO] classification took {:.4f} seconds...".format(
	end - start))

On Line 47, we make classification predictions on the input

image
 using the
ClassifyWithImage
  function (a super easy one-liner function call). I really like how the
edgetpu
  API allows us to specify that we only want the top results with the
top_k
  parameter.

Timestamps are sandwiched around this classification line and the elapsed time is then printed via Lines 49 and 50.

From here we’ll process the

results
 :
# loop over the results
for (i, (classID, score)) in enumerate(results):
	# check to see if this is the top result, and if so, draw the
	# label on the image
	if i == 0:
		text = "Label: {}, {:.2f}%".format(labels[classID],
			score * 100)
		cv2.putText(orig, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
			0.8, (0, 0, 255), 2)

	# display the classification result to the terminal
	print("{}. {}: {:.2f}%".format(i + 1, labels[classID],
		score * 100))

# show the output image
cv2.imshow("Image", orig)
cv2.waitKey(0)

Looping over the

results
  (Line 53) we first find the top result and annotate the image with the label and percentage score (Lines 56-60).

For good measure, we’ll also print the other results and scores (but only in our terminal) via Lines 63 and 64.

Finally, the annotated original (OpenCV format) image is displayed to the screen (Lines 67 and 68).


That was straightforward. Let’s put our classification script to the test!

To see image classification in action with the Google Coral, make sure you use the “Downloads” section of this guide to download the code + pre-trained models — from there, execute the following command:

$ python classify_image.py --model inception_v4/inception_v4_299_quant_edgetpu.tflite --labels inception_v4/imagenet_labels.txt --image janie.jpg
[INFO] parsing class labels...
[INFO] loading Coral model...
W0507 08:04:36.445022    5885 package_registry.cc:65] Minimum runtime version required by package (5) is lower than expected (10).
[INFO] making predictions...
[INFO] classification took 1.2446 seconds...
1. beagle: 97.27%

The output of the image classification script can be seen in Figure 1 at the top of this section.

Here you can see that Janie, my dog, is correctly classified as “beagle”.

Image classification in video with the Google Coral Accelerator

Figure 2: Real-time classification with the Google Coral TPU USB Accelerator and Raspberry Pi using Python. OpenCV was used for preprocessing, annotation, and display.

In the previous section, we learned how to perform image classification to a single image — but what if we wanted to perform image classification to a video stream?

I’ll be showing you how to accomplish exactly that.

Open up a new file, name it

classify_video.py
  and insert the following code:
# import the necessary packages
from edgetpu.classification.engine import ClassificationEngine
from imutils.video import VideoStream
from PIL import Image
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
	help="path to TensorFlow Lite classification model")
ap.add_argument("-l", "--labels", required=True,
	help="path to labels file")
args = vars(ap.parse_args())

There are two differences in our first code block for real-time classification compared to our previous single image classification script:

  1. On Line 2 we’ve added the
    VideoStream
      import for working with our webcam.
  2. We no longer have a
    --image
      argument since by default we will be using our webcam.

Just as before, let’s load the

labels
  and
model
 , but now we also need to instantiate our
VideoStream
 :
# initialize the labels dictionary
print("[INFO] parsing class labels...")
labels = {}

# loop over the class labels file
for row in open(args["labels"]):
	# unpack the row and update the labels dictionary
	(classID, label) = row.strip().split(" ", maxsplit=1)
	label = label.strip().split(",", maxsplit=1)[0]
	labels[int(classID)] = label

# load the Google Coral classification model
print("[INFO] loading Coral model...")
model = ClassificationEngine(args["model"])

# initialize the video stream and allow the camera sensor to warmup
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
#vs = VideoStream(usePiCamera=False).start()
time.sleep(2.0)

Lines 19-31 are nearly identical to our previous script where we load our class labels and store them in a dictionary; the only difference here is that we keep just the text before the first comma of each label.

On Line 35 we instantiate our

VideoStream
  object so that we can read frames in our webcam (covered in the next code block). A
2.0
  second sleep is added so our camera has time to warm up (Line 37).

Note: By default, this script will use a USB webcam. If you would like to use a Raspberry Pi camera module, simply comment out Line 35 and uncomment Line 36.

Let’s begin our loop:

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream and resize it
	# to have a maximum width of 500 pixels
	frame = vs.read()
	frame = imutils.resize(frame, width=500)
	orig = frame.copy()

	# prepare the frame for classification by converting (1) it from
	# BGR to RGB channel ordering and then (2) from a NumPy array to
	# PIL image format
	frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
	frame = Image.fromarray(frame)

We start looping on Line 40.

Line 43 grabs a

frame
  from the threaded video stream.

We go ahead and preprocess it exactly as we did in the previous script (Lines 44-51).

With the

frame
  in the correct PIL format, now we can make predictions and draw our annotations:
# make predictions on the input frame
	start = time.time()
	results = model.ClassifyWithImage(frame, top_k=1)
	end = time.time()

	# ensure at least one result was found
	if len(results) > 0:
		# draw the predicted class label, probability, and inference
		# time on the output frame
		(classID, score) = results[0]
		text = "{}: {:.2f}% ({:.4f} sec)".format(labels[classID],
			score * 100, end - start)
		cv2.putText(orig, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
			0.5, (0, 0, 255), 2)

	# show the output frame and wait for a key press
	cv2.imshow("Frame", orig)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

Just as before, Line 55 performs inference.

From there, the top result is extracted and the classification label +

score
  is annotated on the
orig
  frame (Lines 59-66).

The frame is displayed on the screen (Line 69).

If the

"q"
  key is pressed, we’ll break from the loop and clean up (Lines 70-78).

Let’s give image classification in video streams with the Google Coral a try!

Make sure you use the “Downloads” section of this guide to download the code + pre-trained models — from there, execute the following command:

$ python classify_video.py --model mobilenet_v2/mobilenet_v2_1.0_224_quant_edgetpu.tflite --labels mobilenet_v2/imagenet_labels.txt 
[INFO] parsing class labels...
[INFO] loading Coral model...
W0507 07:52:49.077803    2830 package_registry.cc:65] Minimum runtime version required by package (5) is lower than expected (10).
[INFO] starting video stream...

An example of real-time image classification can be seen above in Figure 2.

Using the Google Coral USB Accelerator, the MobileNet classifier (trained on ImageNet) is fully capable of running in real-time on the Raspberry Pi.
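
If you want to put a number on that yourself, imutils includes an FPS helper you can wrap around any frame loop. The sketch below shows the general pattern (it is not part of classify_video.py, and the 100-frame count is arbitrary):

# minimal sketch: estimating the frames-per-second of a video loop
# with imutils' FPS helper (the 100-frame count is arbitrary)
from imutils.video import VideoStream
from imutils.video import FPS
import time

# start the video stream and let the camera sensor warm up
vs = VideoStream(src=0).start()
time.sleep(2.0)

# time a fixed number of frames
fps = FPS().start()
for _ in range(100):
	frame = vs.read()
	# ...per-frame classification/detection would go here...
	fps.update()
fps.stop()

print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
vs.stop()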

Object detection with the Google Coral

Figure 3: Deep learning-based object detection of an image using Python, Google Coral, and the Raspberry Pi.

We’ve already learned how to apply image classification with the Google Coral — but what if we not only wanted to classify an object in an image but also detect where in the image the object is?

Such a task is called object detection, a technique I’ve covered quite a few times on the PyImageSearch blog (refer to this deep learning-based object detection guide if you are new to the concept).

Open up the

detect_image.py
  file and let’s get coding:
# import the necessary packages
from edgetpu.detection.engine import DetectionEngine
from PIL import Image
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
	help="path to TensorFlow Lite object detection model")
ap.add_argument("-l", "--labels", required=True,
	help="path to labels file")
ap.add_argument("-i", "--image", required=True,
	help="path to input image")
ap.add_argument("-c", "--confidence", type=float, default=0.3,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

Our packages are imported on Lines 2-7. For Google Coral object detection with Python, we use the

DetectionEngine
  from the
edgetpu
  API.

Our command line arguments are similar to the

classify_image.py
  script with one exception — we’re also going to supply a
--confidence
  argument representing the minimum probability to filter out weak detections (Lines 17 and 18).

Now we’ll load the labels in the same manner as in our classification scripts:

# initialize the labels dictionary
print("[INFO] parsing class labels...")
labels = {}

# loop over the class labels file
for row in open(args["labels"]):
	# unpack the row and update the labels dictionary
	(classID, label) = row.strip().split(maxsplit=1)
	labels[int(classID)] = label.strip()

And from there we’ll load our object detection 

model
 :
# load the Google Coral object detection model
print("[INFO] loading Coral model...")
model = DetectionEngine(args["model"])

We can now load our input image and perform preprocessing:

# load the input image
image = cv2.imread(args["image"])
image = imutils.resize(image, width=500)
orig = image.copy()

# prepare the image for object detection by converting (1) it from
# BGR to RGB channel ordering and then (2) from a NumPy array to PIL
# image format
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = Image.fromarray(image)

After preprocessing, it is time to perform object detection inference:

# make predictions on the input image
print("[INFO] making predictions...")
start = time.time()
results = model.DetectWithImage(image, threshold=args["confidence"],
	keep_aspect_ratio=True, relative_coord=False)
end = time.time()
print("[INFO] object detection took {:.4f} seconds...".format(
	end - start))

Lines 49 and 50 use Google Coral’s object detection API to make predictions.

Being able to pass our confidence threshold (via the

threshold
  parameter) is extremely convenient in this API. Honestly, I wish OpenCV’s DNN API would follow suit. It saves an if-statement later on as you can imagine.
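
For comparison, here is roughly what that extra step looks like when an API hands back every detection and leaves the filtering to you (a generic sketch with made-up detections, not the edgetpu or cv2.dnn API):

# generic sketch: filtering weak detections manually when an API does
# not accept a confidence threshold (the detections below are made up)
MIN_CONFIDENCE = 0.3

# each entry: (classID, confidence, (startX, startY, endX, endY))
detections = [
	(1, 0.92, (34, 50, 210, 300)),
	(17, 0.12, (5, 5, 60, 80)),
]

# keep only predictions at or above the minimum probability
filtered = [d for d in detections if d[1] >= MIN_CONFIDENCE]
print(filtered)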

Let’s process our

results
 :
# loop over the results
for r in results:
	# extract the bounding box and predicted class label
	box = r.bounding_box.flatten().astype("int")
	(startX, startY, endX, endY) = box
	label = labels[r.label_id]

	# draw the bounding box and label on the image
	cv2.rectangle(orig, (startX, startY), (endX, endY),
		(0, 255, 0), 2)
	y = startY - 15 if startY - 15 > 15 else startY + 15
	text = "{}: {:.2f}%".format(label, r.score * 100)
	cv2.putText(orig, text, (startX, y),
		cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

# show the output image
cv2.imshow("Image", orig)
cv2.waitKey(0)

Looping over the

results
  (Line 56), we first extract the bounding
box
  coordinates (Lines 58 and 59). Conveniently, the
box
  is already scaled relative to our input image dimensions (from any behind the scenes resizing the API does to fit the image into the CNN).
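
If an API instead returned coordinates relative to the input (values in the range [0, 1]), you would scale them back to pixels yourself before drawing. A minimal sketch of that conversion with a made-up box (not needed here since we passed relative_coord=False):

# minimal sketch: converting a relative [0, 1] bounding box back to
# pixel coordinates (the box and frame size here are made up)
import numpy as np

(h, w) = (375, 500)                        # frame height and width
relBox = np.array([0.1, 0.2, 0.6, 0.9])    # startX, startY, endX, endY

(startX, startY, endX, endY) = (relBox * np.array([w, h, w, h])).astype("int")
print(startX, startY, endX, endY)          # prints 50 75 300 337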

From there we can easily extract the class

label
  via Line 60.

Next, we draw the bounding box rectangle (Lines 63 and 64) and draw the predicted object 

text
  on the image (Lines 65-68).

Our

orig
  image (with object detection annotations) is then displayed via Lines 71 and 72.

Let’s put object detection with the Google Coral USB Accelerator to the test!

Use the “Downloads” section of this tutorial to download the source code + pre-trained models.

From there, open up a terminal and execute the following command:

$ python detect_image.py \
	--model mobilenet_ssd_v2/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite \
	--labels mobilenet_ssd_v2/coco_labels.txt --image thanos.jpg 
[INFO] parsing class labels...
[INFO] loading Coral model...
W0507 08:00:58.843066    4919 package_registry.cc:65] Minimum runtime version required by package (5) is lower than expected (10).
[INFO] making predictions...
[INFO] object detection took 0.2318 seconds...

Just for fun, I decided to apply object detection to a screen capture of Avengers: Endgame movie (don’t worry, there aren’t any spoilers!)

Here we can see that Thanos, a character from the film, is detected (Figure 3)…although I’m not sure he’s an actual “person” if you know what I mean.

Object detection in video with the Coral USB Accelerator

Figure 4: Real-time object detection with Google’s Coral USB deep learning coprocessor, the perfect companion for the Raspberry Pi.

Our final script will cover how to perform object detection in real-time video with the Google Coral.

Open up a new file, name it

detect_video.py
 , and insert the following code:
# import the necessary packages
from edgetpu.detection.engine import DetectionEngine
from imutils.video import VideoStream
from PIL import Image
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
	help="path to TensorFlow Lite object detection model")
ap.add_argument("-l", "--labels", required=True,
	help="path to labels file")
ap.add_argument("-c", "--confidence", type=float, default=0.3,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

To start, we import our required packages (Lines 2-8) and parse our command line arguments. Again, we’re using

VideoStream
  so we can access our webcam (since we’re performing object detection on webcam frames, we don’t have a
--image
  command line argument).

Next, we’ll load our

labels
  and instantiate both our
model
  and video stream:
# initialize the labels dictionary
print("[INFO] parsing class labels...")
labels = {}

# loop over the class labels file
for row in open(args["labels"]):
	# unpack the row and update the labels dictionary
	(classID, label) = row.strip().split(maxsplit=1)
	labels[int(classID)] = label.strip()

# load the Google Coral object detection model
print("[INFO] loading Coral model...")
model = DetectionEngine(args["model"])

# initialize the video stream and allow the camera sensor to warmup
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
#vs = VideoStream(usePiCamera=False).start()
time.sleep(2.0)

From there, we’ll loop over frames from the video stream:

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream and resize it
	# to have a maximum width of 500 pixels
	frame = vs.read()
	frame = imutils.resize(frame, width=500)
	orig = frame.copy()

	# prepare the frame for object detection by converting (1) it
	# from BGR to RGB channel ordering and then (2) from a NumPy
	# array to PIL image format
	frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
	frame = Image.fromarray(frame)

	# make predictions on the input frame
	start = time.time()
	results = model.DetectWithImage(frame, threshold=args["confidence"],
		keep_aspect_ratio=True, relative_coord=False)
	end = time.time()

Our frame processing loop begins on Line 41. We proceed to:

  • Grab and preprocess our frame (Lines 44-52).
  • Perform object detection inference with the Google Coral (Lines 56 and 57).

From there we’ll process the results and display our output:

# loop over the results
	for r in results:
		# extract the bounding box and predicted class label
		box = r.bounding_box.flatten().astype("int")
		(startX, startY, endX, endY) = box
		label = labels[r.label_id]

		# draw the bounding box and label on the image
		cv2.rectangle(orig, (startX, startY), (endX, endY),
			(0, 255, 0), 2)
		y = startY - 15 if startY - 15 > 15 else startY + 15
		text = "{}: {:.2f}%".format(label, r.score * 100)
		cv2.putText(orig, text, (startX, y),
			cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

	# show the output frame and wait for a key press
	cv2.imshow("Frame", orig)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

Here we loop over each of the detected objects, grab the bounding box + class label, and annotate the frame (Lines 61-73).

The frame (with object detection annotations) is displayed via Line 76.

We’ll continue to process more frames unless the

"q"
  (quit) key is pressed at which point we break and clean up (Lines 77-85).

Let’s put this Python + Coral object detection script to work!

To perform video object detection with the Google Coral, make sure you use the “Downloads” section of the guide to download the code + pre-trained models.

From there you can execute the following command to start the object detection script:

$ python detect_video.py \
	--model mobilenet_ssd_v2/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite \
	--labels mobilenet_ssd_v2/coco_labels.txt 
[INFO] parsing class labels...
[INFO] loading Coral model...
W0507 07:43:19.420830     377 package_registry.cc:65] Minimum runtime version required by package (5) is lower than expected (10).
[INFO] starting video stream...

For our final example of applying real-time object detection with the Google Coral, I decided to let Janie in my office for a bit as I recorded a demo (and even decided to sing her a little song) — you can see the result in Figure 4 above.

The problem with the Raspberry Pi 3B+ and Google Coral USB Accelerator

Figure 5: USB 3.0 is much faster than USB 2.0. To take full advantage of Google Coral’s deep learning capabilities a USB 3.0 port is required, however, the Raspberry Pi 3B+ does not include USB 3.0 capability. (image source)

You might have noticed that our inference results are pretty similar to what we obtain with the Movidius NCS — doesn’t Google advertise the Coral USB Accelerator as being faster than the NCS?

What’s the problem here?

Is it the Google Coral?

Is it our code?

Is our device configured incorrectly?

Actually, it’s none of the above.

The problem here is the Raspberry Pi 3B+ only supports USB 2.0.

The bottleneck is the I/O taking place from the CPU, to USB, to the Coral USB Accelerator, and back.

Inference speed will dramatically improve once the Raspberry Pi 4 is released (which will certainly support USB 3, giving us the fastest possible inference speeds with the Coral USB Accelerator).

What about custom models?

This tutorial has focused on state-of-the-art deep learning models that have been pre-trained on popular image datasets, including ImageNet (for classification) and COCO (for object detection).

But what if you wanted to run your own pre-trained models on the Google Coral?

Is this possible?

And if so, how can we do it?

I’ll be answering that exact question inside my upcoming book, Raspberry Pi for Computer Vision.

The book will be released in Autumn 2019, but if you pre-order your copy now, you’ll be getting a discount (the price of the book will increase once it officially releases later this year).

If you’re interested in computer vision + deep learning on embedded devices such as the:

  • Raspberry Pi
  • Movidius NCS
  • Google Coral
  • Jetson Nano

…then you should definitely pre-order your copy now.

Summary

In this tutorial, you learned how to utilize your Google Coral USB Accelerator for:

  1. Image classification
  2. Image classification in video
  3. Object detection
  4. Object detection in video

Specifically, we used pre-trained deep learning models, including:

  • Inception V4 (trained on ImageNet)
  • MobileNet V2 (trained on ImageNet)
  • MobileNet SSD V2 (trained on COCO)

Our results were far, far better than trying to use the Raspberry Pi CPU alone for deep learning inference.

Overall, I was very impressed with how easy it is to use the Google Coral and the

edgetpu
  library in my own custom Python scripts.

I’m looking forward to seeing how the package develops (and hope they make it this easy to convert and run custom deep learning models on the Coral).

To download the source code and pre-trained models for this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Object detection and image classification with Google Coral USB Accelerator appeared first on PyImageSearch.

Transfer Learning with Keras and Deep Learning


In this tutorial, you will learn how to perform transfer learning with Keras, Deep Learning, and Python on your own custom datasets.

Imagine this:

You’re just hired by Yelp to work in their computer vision department.

Yelp has just launched a new feature on its website that allows reviewers to take photos of their food/dishes and then associate them with particular items on a restaurant’s menu.

It’s a neat feature…

…but they are getting a lot of unwanted image spam.

Certain nefarious users aren’t taking photos of their dishes…instead, they are taking photos of… (well, you can probably guess).

Your task?

Figure out how to create an automated computer vision application that can distinguish between “food” and “not food”, thereby allowing Yelp to continue with their new feature launch and provide value to their users.

So, how are you going to build such an application?

The answer lies in transfer learning via deep learning.

Today marks the start of a brand new set of tutorials on transfer learning using Keras. Transfer learning is the process of:

  1. Taking a network pre-trained on a dataset
  2. And utilizing it to recognize image/object categories it was not trained on

Essentially, we can utilize the robust, discriminative filters learned by state-of-the-art networks on challenging datasets (such as ImageNet or COCO), and then apply these networks to recognize objects the model was never trained on.

In general, there are two types of transfer learning in the context of deep learning:

  1. Transfer learning via feature extraction
  2. Transfer learning via fine-tuning

When performing feature extraction, we treat the pre-trained network as an arbitrary feature extractor, allowing the input image to propagate forward, stopping at pre-specified layer, and taking the outputs of that layer as our features.

Fine-tuning, on the other hand, requires that we update the model architecture itself by removing the previous fully-connected layer heads, providing new, freshly initialized ones, and then training the new FC layers to predict our input classes.

We’ll be covering both techniques in this series here on the PyImageSearch blog, but today we are going to focus on feature extraction.

To learn how to perform transfer learning via feature extraction with Keras, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Transfer learning with Keras and Deep Learning

Note: Many of the transfer learning concepts I’ll be covering in this series tutorials also appear in my book, Deep Learning for Computer Vision with Python. Inside the book, I go into much more detail (and include more of my tips, suggestions, and best practices). If you would like more detail on transfer learning after going through this guide, definitely take a look at my book.

In the first part of this tutorial, we will review two methods of transfer learning: feature extraction and fine-tuning.

I’ll then provide a detailed discussion of how to perform transfer learning via feature extraction (the primary focus of this tutorial).

From there, we’ll review the Food-5K dataset, which contains 5,000 images falling into two classes: “food” and “non-food”.

We’ll utilize transfer learning via feature extraction to recognize both of these classes in this tutorial.

Once we have a good handle on the dataset, we’ll start coding.

We’ll have a number of Python files to review, each accomplishing a specific step, including:

  1. Creating a configuration file.
  2. Building our dataset (i.e., putting the images in the proper directory structure).
  3. Extracting features from our input images using Keras and pre-trained CNNs.
  4. Training a Logistic Regression model on top of the extracted features.

Parts of the code we’ll be reviewing here today will also be utilized in the rest of the transfer learning series — if you intend on following along with the tutorials, take the time now to ensure you understand the code.

Two types of transfer learning: feature extraction and fine-tuning

Figure 1: Via “transfer learning”, we can utilize a pre-existing model such as one trained to classify dogs vs. cats. Using that pre-trained model we can break open the CNN and then apply “transfer learning” to another, completely different dataset (such as bears). We’ll learn how to apply transfer learning with Keras and deep learning in the rest of this blog post.

Note: The following section has been adapted from my book, Deep Learning for Computer Vision with Python. For the full set of chapters on transfer learning, please refer to the text.

Consider a traditional machine learning scenario where we are given two classification challenges.

In the first challenge, our goal is to train a Convolutional Neural Network to recognize dogs vs. cats in an image.

Then, in the second project, we are tasked with recognizing three separate species of bears: grizzly bears, polar bears, and giant pandas.

Using standard practices in machine learning/deep learning, we could treat these challenges as two separate problems:

  • First, we would gather a sufficient labeled dataset of dogs and cats, followed by training a model on the dataset
  • We would then repeat the process a second time, only this time, gathering images of our bear breeds, and then training a model on top of the labeled dataset.

Transfer learning proposes a different paradigm: what if we could utilize an existing pre-trained classifier as a starting point for a new classification, object detection, or instance segmentation task?

Using transfer learning in the context of the proposed challenges above, we would:

  • First train a Convolutional Neural Network to recognize dogs versus cats
  • Then, use the same CNN trained on the dog and cat data and use it to distinguish between the bear classes, even though no bear data was mixed with the dog and cat data during the initial training

Does this sound too good to be true?

It’s actually not.

Deep neural networks trained on large-scale datasets such as ImageNet and COCO have proven to be excellent at the task of transfer learning.

These networks learn a set of rich, discriminative features capable of recognizing 100s to 1,000s of object classes — it only makes sense that these filters can be reused for tasks other than what the CNN was originally trained on.

In general, there are two types of transfer learning when applied to deep learning for computer vision:

  1. Treating networks as arbitrary feature extractors.
  2. Removing the fully-connected layers of an existing network, placing a new set of FC layers on top of the CNN, and then fine-tuning these weights (and optionally previous layers) to recognize the new object classes.

In this blog post, we’ll focus primarily on the first method of transfer learning, treating networks as feature extractors.

We’ll discuss fine-tuning networks later in this series on transfer learning with deep learning.

Transfer learning via feature extraction

Figure 2: Left: The original VGG16 network architecture that outputs probabilities for each of the 1,000 ImageNet class labels. Right: Removing the FC layers from VGG16 and instead of returning the final POOL layer. This output will serve as our extracted features.

Note: The following section has been adapted from my book, Deep Learning for Computer Vision with Python. For the full set of chapters on feature extraction, please refer to the text.

Typically, you’ll treat a Convolutional Neural Network as an end-to-end image classifier:

  1. We input an image to the network.
  2. The image forward propagates through the network.
  3. We obtain our final classification probabilities at the end of the network.

However, there is no “rule” that says we must allow the image to forward propagate through the entire network.

Instead, we can:

  1. Stop propagation at an arbitrary, but pre-specified layer (such as an activation or pooling layer).
  2. Extract the values from the specified layer.
  3. Treat the values as a feature vector.

For example, let’s consider the VGG16 network by Simonyan and Zisserman in Figure 2 (left) at the top of this section.

Along with the layers in the network, I have also included the input and output volume shapes for each layer.

When treating networks as feature extractors, we essentially “chop off” the network at our pre-specified layer (typically prior to the fully-connected layers, but it really depends on your particular dataset).

If we were to stop propagation before the fully-connected layers in VGG16, the last layer in the network would become the max-pooling layer (Figure 2, right), which will have an output shape of 7 x 7 x 512. Flattening this volume into a feature vector, we would obtain a list of 7 x 7 x 512 = 25,088 values — this list of numbers serves as our feature vector used to quantify the input image.
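
A quick sketch of that idea in Keras: load VGG16 without its fully-connected head, push a batch of (here random) 224×224 inputs through it, and flatten the resulting 7 x 7 x 512 volume into 25,088-dim feature vectors. We will do exactly this, with real images, in extract_features.py later in this post:

# quick sketch: VGG16 as a feature extractor (random arrays stand in
# for real images; extract_features.py below does this properly)
from keras.applications import VGG16
import numpy as np

# load VGG16 without the fully-connected head
model = VGG16(weights="imagenet", include_top=False)

# a dummy batch of two 224x224 RGB "images"
batch = np.random.rand(2, 224, 224, 3).astype("float32")

# forward propagate and flatten the 7x7x512 output volume
features = model.predict(batch)
features = features.reshape((features.shape[0], 7 * 7 * 512))
print(features.shape)   # prints (2, 25088)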

We can then repeat the process for our entire dataset of images.

Given a total of N images in our dataset, we would now have a list of N feature vectors, each 25,088-dim.

Once we have our feature vectors, we can train off-the-shelf machine learning models such as Linear SVM, Logistic Regression, Decision Trees, or Random Forests on top of these features to obtain a classifier that can recognize new classes of images.

That said, the two most common machine learning models you’ll see for transfer learning via feature extraction are:

  1. Logistic Regression
  2. Linear SVM

Why those two models?

First, keep in mind our feature extractor is a CNN.

CNNs are non-linear models capable of learning non-linear features — we are assuming that the features learned by the CNN are already robust and discriminative.

The second, and perhaps arguably more important reason, is that our feature vectors tend to be very large and have high dimensionality.

We, therefore, need a fast model that can be trained on top of the features — linear models tend to be very fast to train.

For example, a Logistic Regression model can be trained on our dataset of 5,000 images, each represented by a 25,088-dim feature vector, in just a few seconds.
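
As a rough sketch of that final step (randomly generated vectors stand in for the real CNN features here; train.py later in this series handles the real extracted features):

# rough sketch: training a Logistic Regression model on top of
# high-dimensional feature vectors (randomly generated for this demo)
from sklearn.linear_model import LogisticRegression
import numpy as np

# fake "extracted features": 200 samples, 25,088 dimensions, 2 classes
X = np.random.rand(200, 25088)
y = np.random.randint(0, 2, size=200)

# linear models train quickly even on very high-dimensional features
model = LogisticRegression(solver="lbfgs", max_iter=150)
model.fit(X, y)
print(model.predict(X[:5]))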

To wrap up this section, I want you to keep in mind that the CNN itself is not capable of recognizing these new classes.

Instead, we are using the CNN as an intermediary feature extractor.

The downstream machine learning classifier will take care of learning the underlying patterns of the features extracted from the CNN.

The Food-5K dataset

Figure 3: We will apply transfer learning to the Food-5K dataset using Python, Keras, and Deep Learning.

The dataset we’ll be using here today is the Food-5K dataset, curated by the Multimedia Signal Processing Group (MSPG) of the Swiss Federal Institute of Technology.

The dataset, as the name suggests, consists of 5,000 images, belonging to two classes:

  1. Food
  2. Non-food

Our goal is to train a classifier that can distinguish between these two classes.

MSPG has provided us with pre-split training, validation, and testing splits. We’ll be using these splits both in this guide on transfer learning via feature extraction and in the rest of our transfer learning tutorials.

Downloading the Food-5K dataset

Go ahead and grab the zip associated with today’s “Downloads”.

Once you’ve downloaded the source code, change directory into

transfer-learning-keras
 :
$ unzip transfer-learning-keras.zip
$ cd transfer-learning-keras
$ mkdir Food-5K
$ cd Food-5K

In my experience, I’ve found that downloading the Food-5K dataset can be unreliable.

Therefore I’m presenting two options to download the dataset:

Option 1: Use wget in your terminal

The wget application comes pre-installed on Ubuntu and other Linux distros. On macOS, you must install it:

$ brew install wget

To download the Food-5K dataset, let’s use

wget
  in our terminal:
$ wget --passive-ftp --ftp-user FoodImage@grebvm2.epfl.ch \
	--ftp-password Cahc1moo ftp://tremplin.epfl.ch/Food-5K.zip

Note: At least on macOS, I’ve found that if the

wget
  command fails once, just run it again and then the download will start.

Option 2: Use FileZilla

FileZilla is a GUI application for FTP and SCP connections. You may download it for your OS here.

Once you’ve installed and launched the application, enter the credentials:

  • Host: tremplin.epfl.ch
  • Username: FoodImage@grebvm2.epfl.ch
  • Password: Cahc1moo

You can then connect and download the file into the appropriate destination.

Figure 4: Downloading the Food-5K dataset with FileZilla.

The username and password combination were obtained from the official Food-5K dataset website. If the username/password combination stops working for you, check to see if the dataset curators changed the login credentials.

Once downloaded, we can go ahead and unzip the dataset:

$ unzip Food-5k.zip

Project structure

Now that we have today’s zip and the dataset, let’s inspect the entire project directory.

First, navigate back up to the project’s root:

$ cd ..

Then, use the

tree
  command with arguments as shown:
$ tree --dirsfirst --filelimit 10
.
├── Food-5K
│   ├── evaluation [1000 entries]
│   ├── training [3000 entries]
│   ├── validation [1000 entries]
│   └── Food-5K.zip
├── dataset
├── output
├── pyimagesearch
│   ├── __init__.py
│   └── config.py
├── build_dataset.py
├── extract_features.py
└── train.py

7 directories, 6 files

As you can see, the

Food-5K/
  contains
evaluation/
 ,
training/
 , and
validation/
  sub-directories. Each sub-directory contains 1,000
.jpg
  image files.

Our

dataset/
  directory, while empty now, will soon contain the Food-5K images in a more organized form (to be discussed in the section, “Building our dataset for feature extraction”).

Upon successfully executing today’s Python scripts, the 

output/
  directory will house our extracted features (stored in three separate
.csv
  files) as well as our label encoder and model (both of which are in
.cpickle
  format).

Our Python scripts include:

  • pyimagesearch/config.py
     : Our custom configuration file will help us manage our dataset, class names, and paths. It is written in Python directly so that we can use
    os.path
      to build OS-specific formatted file paths directly in the script.
  • build_dataset.py
     : Using the configuration, this script will  create an organized dataset on disk, making it easy to extract features from.
  • extract_features.py
     : The transfer learning magic begins here. This Python script will use a pre-trained CNN to extract raw features, storing the results in a
    .csv
      file. The label encoder
    .cpickle
      file will also be output via this script.
  • train.py
     : Our training script will train a Logistic Regression model on top of the previously computed features. We will evaluate and save the resulting model as a
    .cpickle
     .

The

config.py
  and
build_dataset.py
scripts will be re-used in the rest of the series on transfer learning so make sure you pay close attention to them!

Our configuration file

Let’s get started by reviewing our configuration file.

Open up

config.py
in the
pyimagesearch
submodule and insert the following code:
# import the necessary packages
import os

# initialize the path to the *original* input directory of images
ORIG_INPUT_DATASET = "Food-5K"

# initialize the base path to the *new* directory that will contain
# our images after computing the training and testing split
BASE_PATH = "dataset"

We begin with a single import. We’ll use the

os
  module (Line 2) in this config to concatenate paths properly.

The

ORIG_INPUT_DATASET
  is the path to the original input dataset (i.e., where you downloaded and unarchived the Food-5K dataset).

The next path,

BASE_PATH
 , will be where our dataset is organized (the result of executing 
build_dataset.py
 ).

Note: The directory structure is not especially useful for this particular post, but it will be later in the series once we get to fine-tuning. Again, I consider organizing datasets in this manner a “best practice” for reasons you’ll see in this series.

Let’s specify more dataset configs as well as our class labels and batch size:

# define the names of the training, testing, and validation
# directories
TRAIN = "training"
TEST = "evaluation"
VAL = "validation"

# initialize the list of class label names
CLASSES = ["non_food", "food"]

# set the batch size
BATCH_SIZE = 32

The path to output training, evaluation, and validation directories is specified on Lines 13-15.

The

CLASSES
  are specified in list form on Line 18. As previously mentioned, we’ll be working with
"food"
  and
"non_food"
  images.

When extracting features, we’ll break our data into bite-sized chunks called batches. The

BATCH_SIZE
  is specified on Line 21.
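
The feature extraction script later in this post will walk the list of image paths in steps of BATCH_SIZE. The chunking pattern itself is simple; here is a minimal sketch with a toy list of fake paths:

# minimal sketch: walking a list in BATCH_SIZE chunks (the fake image
# paths below stand in for the real imagePaths list)
BATCH_SIZE = 32
imagePaths = ["img_{}.jpg".format(i) for i in range(100)]

for i in range(0, len(imagePaths), BATCH_SIZE):
	# slice out the next chunk of (up to) BATCH_SIZE paths
	batchPaths = imagePaths[i:i + BATCH_SIZE]
	print("batch of {} paths starting at index {}".format(
		len(batchPaths), i))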

Finally, we can build the rest of our paths:

# initialize the label encoder file path and the output directory to
# where the extracted features (in CSV file format) will be stored
LE_PATH = os.path.sep.join(["output", "le.cpickle"])
BASE_CSV_PATH = "output"

# set the path to the serialized model after training
MODEL_PATH = os.path.sep.join(["output", "model.cpickle"])

Our label encoder path is concatenated on Line 25 where the result of joining the paths is

output/le.cpickle
  on Linux/Mac or
output\le.cpickle
  on Windows.

The extracted features will live in a CSV file in the path specified in

BASE_CSV_PATH
 .

Lastly, we assemble the path to our exported model file in

MODEL_PATH
 .

Building our dataset for feature extraction

Before we can extract features from our set of input images, let’s take the time to organize our images on disk.

I prefer to have my dataset on disk organized in the format of:

dataset_name/class_label/example_of_class_label.jpg

Maintaining this directory structure:

  • Not only keeps our dataset organized on disk…
  • …but also enables us to utilize Keras’
    flow_from_directory
    function when we get to fine-tuning later in this series of tutorials.

Since the Food-5K dataset also provides pre-supplied data splits, our final directory structure will have the form:

dataset_name/split_name/class_label/example_of_class_label.jpg

Let’s go ahead and build our dataset + directory structure now.

Open up the

build_dataset.py
file and insert the following code:
# import the necessary packages
from pyimagesearch import config
from imutils import paths
import shutil
import os

# loop over the data splits
for split in (config.TRAIN, config.TEST, config.VAL):
	# grab all image paths in the current split
	print("[INFO] processing '{} split'...".format(split))
	p = os.path.sep.join([config.ORIG_INPUT_DATASET, split])
	imagePaths = list(paths.list_images(p))

Our packages are imported on Lines 2-5. We’ll use our

config
  (Line 2) throughout this script to recall our settings. The other three imports —
paths
 ,
shutil
 , and
os
  — will allow us to traverse directories, create folders, and copy files.

On Line 8 we begin looping over our training, testing, and validation splits.

Lines 11 and 12 create a list of all

imagePaths
  in the split.

From there we’ll go ahead and loop over the

imagePaths
 :
# loop over the image paths
	for imagePath in imagePaths:
		# extract class label from the filename
		filename = imagePath.split(os.path.sep)[-1]
		label = config.CLASSES[int(filename.split("_")[0])]

		# construct the path to the output directory
		dirPath = os.path.sep.join([config.BASE_PATH, split, label])

		# if the output directory does not exist, create it
		if not os.path.exists(dirPath):
			os.makedirs(dirPath)

		# construct the path to the output image file and copy it
		p = os.path.sep.join([dirPath, filename])
		shutil.copy2(imagePath, p)

For each imagePath in the split, we proceed to:

  • Extract the class label from the filename (Lines 17 and 18; see the sketch after this list).
  • Construct the path to the output directory based on the BASE_PATH, split, and label (Line 21).
  • Create dirPath (if necessary) via Lines 24 and 25.
  • Copy the image into the destination path (Lines 28 and 29).
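
To make that first step concrete, here is a small sketch of the filename-to-label mapping using the CLASSES list from our config and a made-up filename (Food-5K filenames are prefixed with their class index):

# small sketch of the label-extraction step above, using the CLASSES
# list from config.py and a made-up Food-5K-style filename
CLASSES = ["non_food", "food"]

filename = "1_302.jpg"                        # hypothetical example
label = CLASSES[int(filename.split("_")[0])]  # prefix "1" -> "food"
print(label)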

Now that

build_dataset.py
  has been coded, use the “Downloads” section of the tutorial to download an archive of the source code.

You can then execute

build_dataset.py
  using the following command:
$ python build_dataset.py
[INFO] processing 'training split'...
[INFO] processing 'evaluation split'...
[INFO] processing 'validation split'...

Here you can see that our script executed successfully.

To verify your directory structure on disk, use the

ls
  command:
$ ls dataset/
evaluation  training  validation

Inside the dataset directory, we have our training, evaluation, and validation splits.

And inside each of those directories, we have class labels directories:

$ ls dataset/training/
food  non_food

Extracting features from our dataset using Keras and pre-trained CNNs

Let’s move on to the actual feature extraction component of transfer learning.

All code used for feature extraction using a pre-trained CNN will live inside

extract_features.py
— open up that file and insert the following code:
# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from keras.applications import VGG16
from keras.applications import imagenet_utils
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
from pyimagesearch import config
from imutils import paths
import numpy as np
import pickle
import random
import os

# load the VGG16 network and initialize the label encoder
print("[INFO] loading network...")
model = VGG16(weights="imagenet", include_top=False)
le = None

On Lines 2-12, all the packages necessary for extracting features are imported. Most notably this includes

VGG16
 .

VGG16 is the convolutional neural network (CNN) we are using for transfer learning (Line 3).

On Line 16, we load the model while specifying two parameters:

  • weights="imagenet": Pre-trained ImageNet weights are loaded for transfer learning.
  • include_top=False: We do not include the fully-connected head with the softmax classifier. In other words, we chop off the head of the network.

With weights dialed in and by loading our model without the head, we are now ready for transfer learning. We will use the output values of the network directly, storing the results as feature vectors.

Finally, our label encoder is initialized on Line 17.
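If you'd like to convince yourself of the output volume size of this headless network before moving on, a quick standalone check looks like the following (the dummy input is just zeros, and weights=None is used here only to skip the ImageNet download — the output shape is the same either way):

# quick sanity check of the headless VGG16 output volume
from keras.applications import VGG16
import numpy as np

model = VGG16(weights=None, include_top=False)
dummy = np.zeros((1, 224, 224, 3), dtype="float32")
print(model.predict(dummy).shape)  # (1, 7, 7, 512)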

Let’s loop over our data splits:

# loop over the data splits
for split in (config.TRAIN, config.TEST, config.VAL):
	# grab all image paths in the current split
	print("[INFO] processing '{} split'...".format(split))
	p = os.path.sep.join([config.BASE_PATH, split])
	imagePaths = list(paths.list_images(p))

	# randomly shuffle the image paths and then extract the class
	# labels from the file paths
	random.shuffle(imagePaths)
	labels = [p.split(os.path.sep)[-2] for p in imagePaths]

	# if the label encoder is None, create it
	if le is None:
		le = LabelEncoder()
		le.fit(labels)

	# open the output CSV file for writing
	csvPath = os.path.sep.join([config.BASE_CSV_PATH,
		"{}.csv".format(split)])
	csv = open(csvPath, "w")

Looping over each split (training, testing, and validation) begins on Line 20.

First, we grab all imagePaths for the split (Lines 23 and 24).

Paths are randomly shuffled via Line 28, and from there, our class labels are extracted from the paths themselves (Line 29).

If necessary, our label encoder is instantiated and fitted (Lines 32-34), ensuring we can convert the string class labels to integers.
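In case the label encoder is new to you, here is a minimal sketch of what fitting and transforming looks like on a couple of toy labels standing in for the strings parsed from our image paths:

# LabelEncoder maps string class names to integer indices
from sklearn.preprocessing import LabelEncoder

labels = ["food", "non_food", "food", "non_food"]
le = LabelEncoder()
le.fit(labels)

print(le.classes_)                         # ['food' 'non_food']
print(le.transform(["non_food", "food"]))  # [1 0]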

Next, we construct the path to output our CSV files (Lines 37-39). We will have three CSV files — one for each data split. Each CSV will have N number of rows — one for each of the images in the data split.

The next step is to loop over our

imagePaths
  in
BATCH_SIZE
  chunks:
# loop over the images in batches
	for (b, i) in enumerate(range(0, len(imagePaths), config.BATCH_SIZE)):
		# extract the batch of images and labels, then initialize the
		# list of actual images that will be passed through the network
		# for feature extraction
		print("[INFO] processing batch {}/{}".format(b + 1,
			int(np.ceil(len(imagePaths) / float(config.BATCH_SIZE)))))
		batchPaths = imagePaths[i:i + config.BATCH_SIZE]
		batchLabels = le.transform(labels[i:i + config.BATCH_SIZE])
		batchImages = []

To create our batches of imagePaths, we use Python’s range function. The function accepts three parameters: start, stop, and step. You can read more about range in this detailed explanation.

Our batches will step through the entire list of imagePaths. The step is our batch size (32 unless you adjust it in the configuration settings).
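A tiny, self-contained sketch of that chunking pattern (the ten image paths and the batch size of 4 are made up purely for illustration):

# hypothetical list of ten image paths and a batch size of 4
imagePaths = ["img_{}.jpg".format(i) for i in range(10)]
bs = 4

for i in range(0, len(imagePaths), bs):
	print(imagePaths[i:i + bs])

# ['img_0.jpg', 'img_1.jpg', 'img_2.jpg', 'img_3.jpg']
# ['img_4.jpg', 'img_5.jpg', 'img_6.jpg', 'img_7.jpg']
# ['img_8.jpg', 'img_9.jpg']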

On Lines 48 and 49 the current batch of image paths and labels are extracted using array slicing. Our

batchImages
  list is then initialized on Line 50.

Let’s go ahead and populate our

batchImages
  now:
# loop over the images and labels in the current batch
		for imagePath in batchPaths:
			# load the input image using the Keras helper utility
			# while ensuring the image is resized to 224x224 pixels
			image = load_img(imagePath, target_size=(224, 224))
			image = img_to_array(image)

			# preprocess the image by (1) expanding the dimensions and
			# (2) subtracting the mean RGB pixel intensity from the
			# ImageNet dataset
			image = np.expand_dims(image, axis=0)
			image = imagenet_utils.preprocess_input(image)

			# add the image to the batch
			batchImages.append(image)

Looping over

batchPaths
  (Line 53), we will load each
image
 , preprocess it, and gather it into 
batchImages
 .

The

image
  itself is loaded on Line 56.

Preprocessing includes:

  • Resizing to 224×224 pixels via the
    target_size
      parameter on Line 56.
  • Converting to array format (Line 57).
  • Adding a batch dimension (Line 62).
  • Mean subtraction (Line 63).

If these preprocessing steps appear foreign, please refer to Deep Learning for Computer Vision with Python.

Finally, the

image
  is added to the batch via Line 66.

Now we will pass the batch of images through our network to extract features:

# pass the images through the network and use the outputs as
		# our actual features, then reshape the features into a
		# flattened volume
		batchImages = np.vstack(batchImages)
		features = model.predict(batchImages, batch_size=config.BATCH_SIZE)
		features = features.reshape((features.shape[0], 7 * 7 * 512))

Our batch of images is sent through the network via Lines 71 and 72. 

Keep in mind that we have removed the fully-connected layer head of the network. Instead, the forward propagation stops at the max-pooling layer. We will treat the output of the max-pooling layer as a list of

features
 , also known as a “feature vector”.

The output dimension of the max-pooling layer is (batch_size, 7, 7, 512). We can thus reshape the features into a NumPy array of shape (batch_size, 7 * 7 * 512), treating the output of the CNN as a flattened feature vector.
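A minimal NumPy sketch of that reshape, using a fake batch of zeros in place of real activations:

import numpy as np

# a fake batch of 32 feature maps with the same shape VGG16 produces
features = np.zeros((32, 7, 7, 512), dtype="float32")
features = features.reshape((features.shape[0], 7 * 7 * 512))
print(features.shape)  # (32, 25088)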

Let’s wrap up this script:

# loop over the class labels and extracted features
		for (label, vec) in zip(batchLabels, features):
			# construct a row that exists of the class label and
			# extracted features
			vec = ",".join([str(v) for v in vec])
			csv.write("{},{}\n".format(label, vec))

	# close the CSV file
	csv.close()

# serialize the label encoder to disk
f = open(config.LE_PATH, "wb")
f.write(pickle.dumps(le))
f.close()

Maintaining our batch efficiency, the features and associated class labels are written to our CSV file (Lines 76-80).

Inside the CSV file, the class label is the first field in each row (enabling us to easily extract it from the row during training). The feature vec follows.
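For illustration, here is what building one such row looks like with a hypothetical (heavily truncated) feature vector — real rows contain 25,088 feature values:

# a made-up encoded label and a tiny stand-in feature vector
label = 1
vec = [0.0, 0.3174, 2.5401]

row = "{},{}\n".format(label, ",".join([str(v) for v in vec]))
print(row)  # 1,0.0,0.3174,2.5401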

Each CSV file will be closed via Line 83. Recall that upon completion we will have one CSV file per data split.

Finally, we can dump the label encoder to disk (Lines 86-88).


Let’s go ahead and extract features from our dataset using the VGG16 network pre-trained on ImageNet.

Use the “Downloads” section of this tutorial to download the source code, and from there, execute the following command:

$ python extract_features.py
[INFO] loading network...
[INFO] processing 'training split'...
...
[INFO] processing batch 92/94
[INFO] processing batch 93/94
[INFO] processing batch 94/94
[INFO] processing 'evaluation split'...
...
[INFO] processing batch 30/32
[INFO] processing batch 31/32
[INFO] processing batch 32/32
[INFO] processing 'validation split'...
...
[INFO] processing batch 30/32
[INFO] processing batch 31/32
[INFO] processing batch 32/32

On an NVIDIA K80 GPU it took 2m55s to extract features from the 5,000 images in the Food-5K dataset.

You can use a CPU instead, but it will take quite a bit longer.

Implementing our training script

The final step for transfer learning via feature extraction is to implement a Python script that will take our extracted features from the CNN and then train a Logistic Regression model on top of the features.

Again, keep in mind that our CNN did not predict anything! Instead, the CNN was treated as an arbitrary feature extractor.

We inputted an image to the network, it was forward propagated, and then we extracted the layer outputs from the max-pooling layer — these outputs serve as our feature vectors.

To see how we can train a model on these feature vectors, open up the

train.py
file and let’s get to work:
# import the necessary packages
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from pyimagesearch import config
import numpy as np
import pickle
import os

def load_data_split(splitPath):
	# initialize the data and labels
	data = []
	labels = []

	# loop over the rows in the data split file
	for row in open(splitPath):
		# extract the class label and features from the row
		row = row.strip().split(",")
		label = row[0]
		features = np.array(row[1:], dtype="float")

		# update the data and label lists
		data.append(features)
		labels.append(label)

	# convert the data and labels to NumPy arrays
	data = np.array(data)
	labels = np.array(labels)

	# return a tuple of the data and labels
	return (data, labels)

On Lines 2-7 we import our required packages. Notably, we’ll use

LogisticRegression
  as our machine learning classifier. Fewer imports are required for our training script as compared to extracting features. This is partly because the training script itself is actually simpler.

Let’s define a function named

load_data_split
  on Line 9. This function is responsible for loading all data and labels given the path of a data split CSV file (the
splitPath
  parameter).

Inside the function, we start by initializing our data and labels lists (Lines 11 and 12).

From there we open the CSV and loop over all rows beginning on Line 15. In the loop, we:

  • Load all comma separated values from the
    row
      into a list (Line 17).
  • Grab the class
    label
      via Line 18 (it is the first value in the list).
  • Extract all
    features
      from the row (Line 19). These are all values in the list except the class label. The result is our feature vector.
  • From there, we append the feature vector and
    label
      to the
    data
      and
    labels
      lists respectively (Lines 22 and 23).

Finally, the

data
  and
labels
  are returned to the calling function (Line 30).

With the load_data_split function ready to go, let’s put it to work by loading our data:
# derive the paths to the training and testing CSV files
trainingPath = os.path.sep.join([config.BASE_CSV_PATH,
	"{}.csv".format(config.TRAIN)])
testingPath = os.path.sep.join([config.BASE_CSV_PATH,
	"{}.csv".format(config.TEST)])

# load the data from disk
print("[INFO] loading data...")
(trainX, trainY) = load_data_split(trainingPath)
(testX, testY) = load_data_split(testingPath)

# load the label encoder from disk
le = pickle.loads(open(config.LE_PATH, "rb").read())

Lines 33-41 load our training and testing feature data from disk. We’re using our function from the previous code block to handle the loading process.

Line 44 loads our label encoder.

With our data in memory, we’re now ready to train our machine learning classifier:

# train the model
print("[INFO] training model...")
model = LogisticRegression(solver="lbfgs", multi_class="auto")
model.fit(trainX, trainY)

# evaluate the model
print("[INFO] evaluating...")
preds = model.predict(testX)
print(classification_report(testY, preds, target_names=le.classes_))

# serialize the model to disk
print("[INFO] saving model...")
f = open(config.MODEL_PATH, "wb")
f.write(pickle.dumps(model))
f.close()

Lines 48 and 49 are responsible for initializing and training our Logistic Regression

model
 .

Note: To learn more about Logistic Regression and other machine learning algorithms in detail, be sure to refer to PyImageSearch Gurus, my flagship computer vision course and community.

Lines 53 and 54 facilitate evaluating the

model
  on the testing set and printing classification statistics in the terminal.

Finally, the

model
  is output in Python’s pickle format (Lines 58-60).

That’s a wrap for our training script! As you’ve learned, writing code for training a Logistic Regression model on top of feature data is very straightforward. In the next section, we will run the training script.

If you are wondering how we would handle so much feature data that it can’t fit into memory all at once, stay tuned for next week’s tutorial.

Note: This tutorial is long enough as is, so I haven’t covered how to tune the hyperparameters of the Logistic Regression model, something I definitely recommend doing to ensure you obtain the highest accuracy possible. If you’re interested in learning more about transfer learning, and how to tune hyperparameters during feature extraction, be sure to refer to Deep Learning for Computer Vision with Python where I cover the techniques in more detail.

Training a model on the extracted features

At this point, we are ready to perform the final step on transfer learning via feature extraction with Keras.

Let’s briefly review what we have done so far:

  1. Downloaded the Food-5K dataset (5,000 images belonging to two classes, “food” and “non-food”, respectively).
  2. Restructured the original directory structure of the dataset in a format more suitable for transfer learning (in particular, fine-tuning which we’ll be covering later in this series).
  3. Extracted features from the images using VGG16 pre-trained on ImageNet.

And now, we’re going to train a Logistic Regression model on top of these extracted features.

Again, keep in mind that VGG16 was not trained to recognize the “food” versus “non-food” classes. Instead, it was trained to recognize 1,000 ImageNet classes.

But, by leveraging:

  1. Feature extraction with VGG16
  2. And applying a Logistic Regression classifier on top of those extracted features

…we will be able to recognize the new classes, even though VGG16 was never trained to recognize them!

Go ahead and use the “Downloads” section of this tutorial to download the source code to this guide.

From there, open up a terminal and execute the following command:

$ python train.py
[INFO] loading data...
[INFO] training model...
[INFO] evaluating...
              precision    recall  f1-score   support

        food       0.99      0.98      0.98       500
    non_food       0.98      0.99      0.99       500

   micro avg       0.98      0.98      0.98      1000
   macro avg       0.99      0.98      0.98      1000
weighted avg       0.99      0.98      0.98      1000

[INFO] saving model...

Training on my machine took only 27 seconds, and as you can see from our output, we are obtaining 98-99% accuracy on the testing set!

When should I use transfer learning and feature extraction?

Transfer learning via feature extraction is often one of the easiest methods to obtain a baseline accuracy in your own projects.

Whenever I am confronted with a new deep learning project, I often throw feature extraction with Keras at it just to see what happens:

  • In some cases, the accuracy is sufficient.
  • In others, it requires me to tune the hyperparameters to my Logistic Regression model or try another pre-trained CNN.
  • And in other situations, I need to explore fine-tuning or even training from scratch with a custom CNN architecture.

Regardless, in the best case transfer learning via feature extraction gives me good accuracy and the project can be completed.

And in the worst case I’ll gain a baseline to beat with my future experiments.

What’s next — where do I learn more about transfer learning and feature extraction?

In this tutorial, you learned how to perform transfer learning via feature extraction and then train a model on top of the extracted features.

But I know as soon as this post is published I’m going to receive emails and questions in the comments regarding:

  • “How do I classify images outside my training/testing set?”
  • “How do I load an image from disk, extract features from it using a CNN, and then classify it using the Logistic Regression model?”
  • “How do I correctly preprocess my input image before classification?”

Today’s tutorial is long enough as it is. I can’t include those sections of Deep Learning for Computer Vision with Python inside this post.

If you’d like to learn more about transfer learning, including:

  1. More details on the concept of transfer learning
  2. How to perform feature extraction
  3. How to fine-tune networks
  4. How to classify images outside your training/testing set using both feature extraction and fine-tuning

…then you’ll definitely want to refer to Deep Learning for Computer Vision with Python.

Besides chapters on transfer learning, you’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

Summary

Today marked the start of our series on transfer learning with Keras and Deep Learning.

The two primary forms of feature extraction via deep learning are:

  1. Feature extraction
  2. Fine-tuning

The focus of today’s tutorial was on feature extraction, the process of treating a pre-trained network as an arbitrary feature extractor.

The steps to perform transfer learning via feature extraction include:

  1. Starting with a pre-trained network (typically on a dataset such as ImageNet or COCO; large enough for the model to learn discriminative filters).
  2. Allowing an input image to forward propagate to an arbitrary (pre-specified) layer.
  3. Taking the output of that layer and treating it as a feature vector.
  4. Training a “standard” machine learning model on the dataset of extracted features.

The benefit of performing transfer learning via feature extraction is that we do not need to train (or re-train) our neural network.

Instead, the network serves as a black box feature extractor.

Those extracted features, which are assumed to be non-linear in nature (since they were extracted from a CNN), are then passed into a linear model for classification.

If you’re interested in learning more about transfer learning, feature extraction, and fine-tuning, be sure to refer to my book, Deep Learning for Computer Vision with Python where I cover the topic in more detail.

I hope you enjoyed today’s post! Stay tuned for next week when we discuss how to work with feature extraction when our dataset is too large too fit into memory.

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Transfer Learning with Keras and Deep Learning appeared first on PyImageSearch.

Keras: Feature extraction on large datasets with Deep Learning


In this tutorial, you will learn how to use Keras for feature extraction on image datasets too big to fit into memory. You’ll utilize ResNet-50 (pre-trained on ImageNet) to extract features from a large image dataset, and then use incremental learning to train a classifier on top of the extracted features.

Today is part two in our three-part series on transfer learning with Keras:

Last week we discussed how to perform transfer learning using Keras — inside that tutorial we focused primarily on transfer learning via feature extraction.

Using this method we were able to utilize CNNs to recognize classes it was never trained on!

The problem with that method is that it assumes all of our extracted features can fit into memory, and that may not always be the case!

For example, suppose we have a dataset of 50,000 images and wanted to utilize the ResNet-50 network for feature extraction via the final layer prior to the FC layers — that output volume would be of size 7 x 7 x 2048 = 100,352-dim.

If we had 50,000 of such 100,352-dim feature vectors (stored as 64-bit floats, NumPy’s default), then we would need a total of 40.14GB of RAM to store the entire set of feature vectors in memory!

Most people don’t have 40GB+ of RAM in their machines, so in those situations, we need to be able to perform incremental learning and train our model on incremental subsets of the data.
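As a quick back-of-the-envelope check on that figure (the 8 bytes per value is the 64-bit float assumption noted above):

# 50,000 vectors x 100,352 dimensions x 8 bytes per 64-bit float
num_images = 50000
dims = 7 * 7 * 2048            # 100,352
bytes_per_value = 8            # assumes NumPy's default float64
total_gb = num_images * dims * bytes_per_value / 1e9
print(round(total_gb, 2))      # 40.14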

The rest of today’s tutorial will show you how to do exactly that.

To learn how to utilize Keras for feature extraction on large datasets, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras: Feature extraction on large datasets with Deep Learning

In the first part of this tutorial, we’ll briefly discuss the concept of treating networks as feature extractors (which was covered in more detail in last week’s tutorial).

From there we’ll investigate the scenario in which your extracted feature dataset is too large to fit into memory — in those situations, we’ll need to apply incremental learning to our dataset.

Next, we’ll implement Python source code that can be used for:

  1. Keras feature extraction
  2. Followed by incremental learning on the extracted features

Let’s get started!

Networks as feature extractors

Figure 1: Left: The original VGG16 network architecture that outputs probabilities for each of the 1,000 ImageNet class labels. Right: Removing the FC layers from VGG16 and instead returning the final POOL layer. This output will serve as our extracted features.

When performing deep learning feature extraction, we treat the pre-trained network as an arbitrary feature extractor, allowing the input image to propagate forward, stopping at pre-specified layer, and taking the outputs of that layer as our features.

Doing so, we can still utilize the robust, discriminative features learned by the CNN. We can also use them to recognize classes the CNN was never trained on!

An example of feature extraction via deep learning can be seen in Figure 1 at the top of this section.

Here we take the VGG16 network, allow an image to forward propagate to the final max-pooling layer (prior to the fully-connected layers), and extract the activations at that layer.

The output of the max-pooling layer has a volume shape of 7 x 7 x 512, which we flatten into a feature vector of 7 x 7 x 512 = 25,088-dim.

Given a dataset of N images, we can repeat the process of feature extraction for all images in the dataset, leaving us with a total of N x 25,088-dim feature vectors.

Given these features, we can train a “standard” machine learning model (such as Logistic Regression or Linear SVM) on these features.

Note: Feature extraction via deep learning was covered in much more detail in last week’s post — refer to it if you have any questions on how feature extraction works.

What if your extracted features are too large to fit into memory?

Feature extraction via deep learning is all fine and good…

…but what happens when your extracted features are too large to fit into memory?

Keep in mind that most implementations of Logistic Regression and SVMs (including scikit-learn’s) require your entire dataset to be accessible all at once for training (i.e., the entire dataset must fit into RAM).

That’s great, but if you have 50GB, 100GB, or even 1TB of extracted features, what are you going to do?

Most people don’t have access to machines with so much memory.

So, what do you do then?

Solution: Incremental learning (i.e., “online learning”)

Figure 2: The process of incremental learning plays a role in deep learning feature extraction on large datasets.

When your entire dataset does not fit into memory you need to perform incremental learning (sometimes called “online learning”).

Incremental learning enables you to train your model on small subsets of the data called batches.

Using incremental learning the training process becomes:

  1. Load a small batch of data from the dataset
  2. Train the model on the batch
  3. Repeat looping through the dataset in batches, training as we go, until we reach convergence

But wait — doesn’t that process sound familiar?

It should.

It’s exactly how we train neural networks.

Neural networks are excellent examples of incremental learners.

And in fact, if you check out the scikit-learn documentation, you’ll find that the classification models for incremental learning are either NNs themselves or directly related to NNs (i.e., Perceptron and SGDClassifier).

Instead of using scikit-learn’s incremental learning models, we are going to implement our own neural network using Keras.

This NN will be trained on top of our extracted features from the CNN.

Our training process now becomes:

  1. Extract all features from our image dataset using a CNN.
  2. Train a simple, feedforward neural network on top of the extracted features.

The Food-5K dataset

Figure 3: The Foods-5K dataset will be used for this example of deep learning feature extraction with Keras.

The dataset we’ll be using here today is the Food-5K dataset, curated by the Multimedia Signal Processing Group (MSPG) of the Swiss Federal Institute of Technology.

This dataset consists of 5,000 images, each belonging to one of two classes:

  1. Food
  2. Non-food

Our goal today is to:

  1. Utilize Keras feature extraction to extract features from the Food-5K dataset using ResNet-50 pre-trained on ImageNet.
  2. Train a simple neural network on top of these features to recognize classes the CNN was never trained to recognize.

It’s worth noting that the entire Food-5K dataset, after feature extraction, will only occupy ~2GB of RAM if loaded all at once — that’s not the point.

The point of today’s post is to show you how to use incremental learning to train a model on the extracted features.

That way, regardless of whether you are working with 1GB of data or 100GB of data, you will know the exact steps to train a model on top of features extracted via deep learning.

Downloading the Food-5K dataset

To start, make sure you grab the source code for today’s tutorial using the “Downloads” section of the blog post.

Once you’ve downloaded the source code, change directory into

transfer-learning-keras
 :
$ unzip keras-feature-extraction.zip
$ cd keras-feature-extraction
$ mkdir Food-5K
$ cd Food-5K

In my experience, I’ve found that downloading the Food-5K dataset to be a bit unreliable.

Therefore I’m presenting two options to download the dataset:

Option 1: Use

wget
  in your terminal

The

wget
  application comes on Ubuntu and other Linux distros. On macOS, you must install it:
$ brew install wget

To download the Food-5K dataset, let’s use

wget
  in our terminal:
$ wget --passive-ftp --ftp-user FoodImage@grebvm2.epfl.ch \
	--ftp-password Cahc1moo ftp://tremplin.epfl.ch/Food-5K.zip

Note: At least on macOS, I’ve found that if the

wget
  command fails once, just run it again and then the download will start.

Option 2: Use FileZilla

FileZilla is a GUI application for FTP and SCP connections. You may download it for your OS here.

Once you’ve installed and launched the application, enter the credentials:

  • Host: tremplin.epfl.ch
  • Username: FoodImage@grebvm2.epfl.ch
  • Password: Cahc1moo

You can then connect and download the file into the appropriate destination.

Figure 4: Downloading the Food-5K dataset using FileZilla.

The username and password combination was obtained from the official Food-5K dataset website. If the username/password combination stops working for you, check to see if the dataset curators changed the login credentials.

Once downloaded, we can go ahead and unzip the dataset (ensuring that you are in the

Food-5K/
  directory that we previously used the cd command to move into):
$ unzip Food-5k.zip

Project structure

Go ahead and navigate back to the root directory:

$ cd ..

From there, we’re able to analyze our project structure with the

tree
  command:
$ tree --dirsfirst --filelimit 10
.
├── Food-5K
│   ├── evaluation [1000 entries]
│   ├── training [3000 entries]
│   ├── validation [1000 entries]
│   └── Food-5K.zip
├── dataset
├── output
├── pyimagesearch
│   ├── __init__.py
│   └── config.py
├── build_dataset.py
├── extract_features.py
└── train.py

8 directories, 6 files

The

config.py
  file contains our configuration settings in Python form. Our other Python scripts will take advantage of the config.

Using our

build_dataset.py
  script, we’ll organize and output the contents of the
Food-5K/
  directory to the dataset folder.

From there, the

extract_features.py
  script will use transfer learning via feature extraction to compute feature vectors for each image. These features will be output to a CSV file.

Both

build_dataset.py
  and
extract_features.py
  were reviewed in detail last week; however, we’ll briefly walk through them again today.

Finally, we’ll review

train.py
 . In this Python script, we will use incremental learning to train a simple neural network on the extracted features. This script is different than last week’s tutorial and we will focus our energy here.

Our configuration file

Let’s get started by reviewing our

config.py
  file where we’ll store our configurations, namely the paths to our input dataset of images along with our output paths of extracted features.

Open up the

config.py
file and insert the following code:
# import the necessary packages
import os

# initialize the path to the *original* input directory of images
ORIG_INPUT_DATASET = "Food-5K"

# initialize the base path to the *new* directory that will contain
# our images after computing the training and testing split
BASE_PATH = "dataset"

# define the names of the training, testing, and validation
# directories
TRAIN = "training"
TEST = "evaluation"
VAL = "validation"

# initialize the list of class label names
CLASSES = ["non_food", "food"]

# set the batch size
BATCH_SIZE = 32

# initialize the label encoder file path and the output directory to
# where the extracted features (in CSV file format) will be stored
LE_PATH = os.path.sep.join(["output", "le.cpickle"])
BASE_CSV_PATH = "output"

Take the time to read through the

config.py
  script paying attention to the comments.

Most of the settings are related to directory and file paths which are used in the rest of our scripts.

For a full review of the configuration, be sure to refer to last week’s post.

Building the image dataset

Whenever I’m performing machine learning on a dataset (and especially Keras/deep learning), I prefer to have my dataset in the format of:

dataset_name/class_label/example_of_class_label.jpg

Maintaining this directory structure not only keeps our dataset organized on disk but also enables us to utilize Keras’

flow_from_directory
function when we get to fine-tuning later in this series of tutorials.

Since the Food-5K dataset provides pre-supplied data splits our final directory structure will have the form:

dataset_name/split_name/class_label/example_of_class_label.jpg

Again, this step isn’t always necessary, but it is a best practice (in my opinion), and one that I suggest you do as well.

At the very least it will give you experience writing Python code to organize images on disk.

Let’s use the

build_dataset.py
  file to build our directory structure now:
# import the necessary packages
from pyimagesearch import config
from imutils import paths
import shutil
import os

# loop over the data splits
for split in (config.TRAIN, config.TEST, config.VAL):
	# grab all image paths in the current split
	print("[INFO] processing '{} split'...".format(split))
	p = os.path.sep.join([config.ORIG_INPUT_DATASET, split])
	imagePaths = list(paths.list_images(p))

	# loop over the image paths
	for imagePath in imagePaths:
		# extract class label from the filename
		filename = imagePath.split(os.path.sep)[-1]
		label = config.CLASSES[int(filename.split("_")[0])]

		# construct the path to the output directory
		dirPath = os.path.sep.join([config.BASE_PATH, split, label])

		# if the output directory does not exist, create it
		if not os.path.exists(dirPath):
			os.makedirs(dirPath)

		# construct the path to the output image file and copy it
		p = os.path.sep.join([dirPath, filename])
		shutil.copy2(imagePath, p)

After importing our packages on Lines 2-5, we proceed to loop over the training, testing, and validation splits (Line 8).

We create our split + class label directory structure (detailed above) and then populate the directories with the Food-5K images. The result is organized data which we can use for extracting features.

Let’s execute the script and review our directory structure once more.

You can use the “Downloads” section of this tutorial to download the source code — from there, open up a terminal and execute the following command:

$ python build_dataset.py 
[INFO] processing 'training split'...
[INFO] processing 'evaluation split'...
[INFO] processing 'validation split'...

After doing so, you will encounter the following directory structure:

$ tree --dirsfirst --filelimit 10
.
├── Food-5K
│   ├── evaluation [1000 entries]
│   ├── training [3000 entries]
│   ├── validation [1000 entries]
│   └── Food-5K.zip
├── dataset
│   ├── evaluation
│   │   ├── food [500 entries]
│   │   └── non_food [500 entries]
│   ├── training
│   │   ├── food [1500 entries]
│   │   └── non_food [1500 entries]
│   └── validation
│       ├── food [500 entries]
│       └── non_food [500 entries]
├── output
├── pyimagesearch
│   ├── __init__.py
│   └── config.py
├── build_dataset.py
├── extract_features.py
└── train.py

16 directories, 6 files

Notice that our dataset/ directory is now populated. Each subdirectory then has the following format:

split_name/class_label

With our data organized, we’re ready to move on to feature extraction.

Using Keras for deep learning feature extraction

Now that we’ve built our dataset directory structure for the project, we can:

  1. Use Keras to extract features via deep learning from each image in the dataset.
  2. Write the class labels + extracted features to disk in CSV format.

To accomplish these tasks we’ll need to implement the

extract_features.py
  file.

This file was covered in detail in last week’s post so we’ll only briefly review the script here as a matter of completeness:

# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from keras.applications import ResNet50
from keras.applications import imagenet_utils
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
from pyimagesearch import config
from imutils import paths
import numpy as np
import pickle
import random
import os

# load the ResNet50 network and initialize the label encoder
print("[INFO] loading network...")
model = ResNet50(weights="imagenet", include_top=False)
le = None

# loop over the data splits
for split in (config.TRAIN, config.TEST, config.VAL):
	# grab all image paths in the current split
	print("[INFO] processing '{} split'...".format(split))
	p = os.path.sep.join([config.BASE_PATH, split])
	imagePaths = list(paths.list_images(p))

	# randomly shuffle the image paths and then extract the class
	# labels from the file paths
	random.shuffle(imagePaths)
	labels = [p.split(os.path.sep)[-2] for p in imagePaths]

	# if the label encoder is None, create it
	if le is None:
		le = LabelEncoder()
		le.fit(labels)

	# open the output CSV file for writing
	csvPath = os.path.sep.join([config.BASE_CSV_PATH,
		"{}.csv".format(split)])
	csv = open(csvPath, "w")

On Line 16, ResNet is loaded while excluding the head. Pre-trained ImageNet weights are loaded into the network as well. Feature extraction via transfer learning is now possible using this pre-trained, headless network.

From there, we proceed to loop over the data splits on Line 20.

Inside, we grab all

imagePaths
  for the particular
split
  and fit our label encoder (Lines 23-39).

A CSV file is opened for writing (Lines 37-39) so that we can write our class labels and extracted features to disk.

Now that our initializations are all set, we can start looping over images in batches:

# loop over the images in batches
	for (b, i) in enumerate(range(0, len(imagePaths), config.BATCH_SIZE)):
		# extract the batch of images and labels, then initialize the
		# list of actual images that will be passed through the network
		# for feature extraction
		print("[INFO] processing batch {}/{}".format(b + 1,
			int(np.ceil(len(imagePaths) / float(config.BATCH_SIZE)))))
		batchPaths = imagePaths[i:i + config.BATCH_SIZE]
		batchLabels = le.transform(labels[i:i + config.BATCH_SIZE])
		batchImages = []

		# loop over the images and labels in the current batch
		for imagePath in batchPaths:
			# load the input image using the Keras helper utility
			# while ensuring the image is resized to 224x224 pixels
			image = load_img(imagePath, target_size=(224, 224))
			image = img_to_array(image)

			# preprocess the image by (1) expanding the dimensions and
			# (2) subtracting the mean RGB pixel intensity from the
			# ImageNet dataset
			image = np.expand_dims(image, axis=0)
			image = imagenet_utils.preprocess_input(image)

			# add the image to the batch
			batchImages.append(image)

Each

image
  in the batch is loaded and preprocessed. From there it is appended to
batchImages
 .

We’ll now send the batch through ResNet to extract features:

# pass the images through the network and use the outputs as
		# our actual features, then reshape the features into a
		# flattened volume
		batchImages = np.vstack(batchImages)
		features = model.predict(batchImages, batch_size=config.BATCH_SIZE)
		features = features.reshape((features.shape[0], 7 * 7 * 2048))

		# loop over the class labels and extracted features
		for (label, vec) in zip(batchLabels, features):
			# construct a row that exists of the class label and
			# extracted features
			vec = ",".join([str(v) for v in vec])
			csv.write("{},{}\n".format(label, vec))

	# close the CSV file
	csv.close()

# serialize the label encoder to disk
f = open(config.LE_PATH, "wb")
f.write(pickle.dumps(le))
f.close()

Feature extraction for the batch takes place on Line 72. Using ResNet, our output layer has a volume size of 7 x 7 x 2,048. Treating the output as a feature vector, we simply flatten it into a list of 7 x 7 x 2,048 = 100,352-dim (Line 73).

The batch of feature vectors is then output to a CSV file with the first entry of each row being the class

label
  and the rest of the values making up the feature
vec
 .

We’ll repeat this process for all batches inside each split until we finish. Finally, our label encoder is dumped to disk.

For a more detailed, line-by-line review, refer to last week’s tutorial.


To extract features from our dataset, make sure you use the “Downloads” section of the guide to download the source code to this post.

From there, open up a terminal and execute the following command:

$ python extract_features.py
[INFO] loading network...
[INFO] processing 'training split'...
...
[INFO] processing batch 92/94
[INFO] processing batch 93/94
[INFO] processing batch 94/94
[INFO] processing 'evaluation split'...
...
[INFO] processing batch 30/32
[INFO] processing batch 31/32
[INFO] processing batch 32/32
[INFO] processing 'validation split'...
...
[INFO] processing batch 30/32
[INFO] processing batch 31/32
[INFO] processing batch 32/32

On an NVIDIA K80 GPU the entire feature extraction process took 5m11s.

You can also run

extract_features.py
 on a CPU but it will take much longer.

After feature extraction is complete, you should have three CSV files in your output directory, one for each of our data splits, respectively:

$ ls -l output/
total 2655188
-rw-rw-r-- 1 ubuntu ubuntu  502570423 May 13 17:17 evaluation.csv
-rw-rw-r-- 1 ubuntu ubuntu 1508474926 May 13 17:16 training.csv
-rw-rw-r-- 1 ubuntu ubuntu  502285852 May 13 17:18 validation.csv

Implementing the incremental learning training script

Finally, we are now ready to utilize incremental learning to apply transfer learning via feature extraction on large datasets.

The Python script we’re implementing in this section will be responsible for:

  1. Constructing the simple feedforward NN architecture.
  2. Implementing a CSV data generator used to yield batches of labels + feature vectors to the NN.
  3. Training the simple NN using the data generator.
  4. Evaluating the feature extractor.

Open up the

train.py
script and let’s get started:
# import the necessary packages
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
from keras.utils import to_categorical
from sklearn.metrics import classification_report
from pyimagesearch import config
import numpy as np
import pickle
import os

On Lines 2-10 we import our required packages. Our most notable import is Keras’ Sequential API, which we will use to build a simple feedforward neural network.

Several months ago I wrote a tutorial on implementing custom Keras data generators, and more specifically, yielding data from a CSV file to train a neural network with Keras.

At the time, I found that readers were a bit confused on practical applications where you would use such a generator — today is a great example of such a practical application.

Again, keep in mind that we’re assuming that the entire CSV file of extracted features will not fit into memory. Therefore, we need a custom Keras generator to yield batches of labels + data to the network so it can be trained.

Let’s implement the generator now:

def csv_feature_generator(inputPath, bs, numClasses, mode="train"):
	# open the input file for reading
	f = open(inputPath, "r")

	# loop indefinitely
	while True:
		# initialize our batch of data and labels
		data = []
		labels = []

		# keep looping until we reach our batch size
		while len(data) < bs:
			# attempt to read the next row of the CSV file
			row = f.readline()

Our csv_feature_generator accepts four parameters:

  • inputPath: The path to our input CSV file containing the extracted features.
  • bs: The batch size (or length) of each chunk of data.
  • numClasses: An integer value representing the number of classes in our data.
  • mode: Whether we are training or evaluating/testing.

On Line 14, we open our CSV file for reading.

Beginning on Line 17, we loop indefinitely, starting by initializing our data and labels lists (Lines 19 and 20).

From there, we’ll loop until the length of data equals the batch size, starting on Line 23.

We proceed by reading a line from the CSV (Line 25). Once we have the line we’ll go ahead and process it:

# check to see if the row is empty, indicating we have
			# reached the end of the file
			if row == "":
				# reset the file pointer to the beginning of the file
				# and re-read the row
				f.seek(0)
				row = f.readline()

				# if we are evaluating we should now break from our
				# loop to ensure we don't continue to fill up the
				# batch from samples at the beginning of the file
				if mode == "eval":
					break

			# extract the class label and features from the row
			row = row.strip().split(",")
			label = row[0]
			label = to_categorical(label, num_classes=numClasses)
			features = np.array(row[1:], dtype="float")

			# update the data and label lists
			data.append(features)
			labels.append(label)

		# yield the batch to the calling function
		yield (np.array(data), np.array(labels))

If the

row
  is empty, we will restart at the beginning of the file (Lines 29-32). And if we are in evaluation mode, we will
break
  from our loop, ensuring that we don’t fill the batch from the start of the file (Lines 38 and 39).

Assuming we are continuing on, the

label
  and
features
  are extracted from the
row
  (Lines 42-45).

We then append the feature vector (

features
 ) and
label
  to the
data
  and
labels
  lists, respectively, until the lists reach the specified batch size (Lines 48 and 49).

When the batch is ready, Line 52 yields the data and labels as a tuple. Python’s yield keyword is critical to making our function operate as a generator.
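If you want to poke at the generator interactively, a quick (temporary) snippet you could drop at the bottom of train.py after the function definition might look like this — it assumes the training CSV from the feature extraction step is already on disk:

# pull a single batch from the generator and inspect its shapes
gen = csv_feature_generator("output/training.csv", config.BATCH_SIZE,
	len(config.CLASSES), mode="train")
data, labels = next(gen)
print(data.shape, labels.shape)  # e.g. (32, 100352) (32, 2)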

Let’s continue — we have a few more steps before we will train the model:

# load the label encoder from disk
le = pickle.loads(open(config.LE_PATH, "rb").read())

# derive the paths to the training, validation, and testing CSV files
trainPath = os.path.sep.join([config.BASE_CSV_PATH,
	"{}.csv".format(config.TRAIN)])
valPath = os.path.sep.join([config.BASE_CSV_PATH,
	"{}.csv".format(config.VAL)])
testPath = os.path.sep.join([config.BASE_CSV_PATH,
	"{}.csv".format(config.TEST)])

# determine the total number of images in the training and validation
# sets
totalTrain = sum([1 for l in open(trainPath)])
totalVal = sum([1 for l in open(valPath)])

# extract the testing labels from the CSV file and then determine the
# number of testing images
testLabels = [int(row.split(",")[0]) for row in open(testPath)]
totalTest = len(testLabels)

Our label encoder is loaded from disk on Line 54. We then derive the paths to the training, validation, and testing CSV files (Lines 58-63).

Lines 67 and 68 handle counting the number of images that are in the training and validation sets. With this information, we will be able to tell the .fit_generator function how many batch_size steps are in each epoch.
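For concreteness, with the 3,000 training images in our split and our batch size of 32, that works out to:

# steps per epoch for the training generator
totalTrain = 3000
print(totalTrain // 32)  # 93 -- matching the "93/93" progress bars in the training output below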

Let’s construct a generator for each data split:

# construct the training, validation, and testing generators
trainGen = csv_feature_generator(trainPath, config.BATCH_SIZE,
	len(config.CLASSES), mode="train")
valGen = csv_feature_generator(valPath, config.BATCH_SIZE,
	len(config.CLASSES), mode="eval")
testGen = csv_feature_generator(testPath, config.BATCH_SIZE,
	len(config.CLASSES), mode="eval")

Lines 76-81 initialize our CSV feature generators.

We’re now ready to build a simple neural network:

# define our simple neural network
model = Sequential()
model.add(Dense(256, input_shape=(7 * 7 * 2048,), activation="relu"))
model.add(Dense(16, activation="relu"))
model.add(Dense(len(config.CLASSES), activation="softmax"))

Contrary to last week’s tutorial where we used a Logistic Regression machine learning model, today we will build a simple neural network for classification.

Lines 84-87 define a simple 100352-256-16-2 feedforward neural network architecture using Keras.

How did I come up with the values of 256 and 16 for the two hidden layers?

A good rule of thumb is to take the square root of the number of nodes in the previous layer and then find the closest power of 2.

In this case, the square root of 100,352 is approximately 316.8, and the closest power of 2 is 256. The square root of 256 is then 16, giving us our architecture definition.

Let’s go ahead and compile our model:
# compile the model
opt = SGD(lr=1e-3, momentum=0.9, decay=1e-3 / 25)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

We compile our model using stochastic gradient descent (SGD) with an initial learning rate of 1e-3 (which will decay over 25 epochs).

We’re using "binary_crossentropy" for our loss function here as we only have two classes. If you have more than two classes then you should use "categorical_crossentropy".

With our model compiled, we are now ready to train and evaluate:
# train the network
print("[INFO] training simple network...")
H = model.fit_generator(
	trainGen,
	steps_per_epoch=totalTrain // config.BATCH_SIZE,
	validation_data=valGen,
	validation_steps=totalVal // config.BATCH_SIZE,
	epochs=25)

# make predictions on the testing images, finding the index of the
# label with the corresponding largest predicted probability, then
# show a nicely formatted classification report
print("[INFO] evaluating network...")
predIdxs = model.predict_generator(testGen,
	steps=(totalTest // config.BATCH_SIZE) + 1)
predIdxs = np.argmax(predIdxs, axis=1)
print(classification_report(testLabels, predIdxs,
	target_names=le.classes_))

Lines 96-101 fit our model using our training and validation generators (trainGen and valGen). Using generators with our model allows for incremental learning.

Using incremental learning we are no longer required to have all of our data loaded into memory at one time. Instead, batches of data flow through our network making it easy to work with massive datasets.

Of course, CSV data isn’t exactly an efficient use of space, nor is it fast. Inside of Deep Learning for Computer Vision with Python, I teach how to use HDF5 for storage more efficiently.

Evaluation of the model takes place on Lines 107-109, where testGen generates our feature vectors in batches. A classification report is then printed in the terminal (Lines 110 and 111).

Keras feature extraction results

Finally, we are ready to train our simple NN on the extracted features from ResNet!

Make sure you use the “Downloads” section of this tutorial to download the source code.

From there, open up a terminal and execute the following command:

$ python train.py
Using TensorFlow backend.
[INFO] training simple network...
Epoch 1/25
93/93 [==============================] - 78s 842ms/step - loss: 0.0764 - acc: 0.9724 - val_loss: 0.0565 - val_acc: 0.9869
Epoch 2/25
93/93 [==============================] - 72s 771ms/step - loss: 0.0087 - acc: 0.9963 - val_loss: 0.0354 - val_acc: 0.9917
Epoch 3/25
93/93 [==============================] - 72s 771ms/step - loss: 0.0013 - acc: 0.9993 - val_loss: 0.0448 - val_acc: 0.9897
Epoch 4/25
93/93 [==============================] - 72s 773ms/step - loss: 1.8864e-04 - acc: 1.0000 - val_loss: 0.0445 - val_acc: 0.9907
Epoch 5/25
93/93 [==============================] - 72s 772ms/step - loss: 1.0165e-04 - acc: 1.0000 - val_loss: 0.0451 - val_acc: 0.9907
...
Epoch 21/25
93/93 [==============================] - 71s 765ms/step - loss: 2.6889e-05 - acc: 1.0000 - val_loss: 0.0421 - val_acc: 0.9917
Epoch 22/25
93/93 [==============================] - 71s 768ms/step - loss: 2.5603e-05 - acc: 1.0000 - val_loss: 0.0482 - val_acc: 0.9907
Epoch 23/25
93/93 [==============================] - 71s 762ms/step - loss: 2.5084e-05 - acc: 1.0000 - val_loss: 0.0480 - val_acc: 0.9907
Epoch 24/25
93/93 [==============================] - 71s 766ms/step - loss: 2.3940e-05 - acc: 1.0000 - val_loss: 0.0484 - val_acc: 0.9907
Epoch 25/25
93/93 [==============================] - 71s 761ms/step - loss: 2.3282e-05 - acc: 1.0000 - val_loss: 0.0485 - val_acc: 0.9907
[INFO] evaluating network...
              precision    recall  f1-score   support 

        food       0.98      0.99      0.99       500 
    non_food       0.99      0.98      0.98       500 

   micro avg       0.98      0.98      0.98      1000 
   macro avg       0.99      0.98      0.98      1000 
weighted avg       0.99      0.98      0.98      1000

Training on an NVIDIA K80 took approximately 30 minutes. You could train on a CPU as well, but it will take considerably longer.

And as our output shows, we are able to obtain ~98-99% accuracy on the Food-5K dataset, even though ResNet-50 was never trained on food/non-food classes!

As you can see, transfer learning is a very powerful technique, enabling you to take the features extracted from CNNs and recognize classes they were not trained on.

Later in this series of tutorials on transfer learning with Keras and deep learning, I’ll be showing you how to perform fine-tuning, another transfer learning method.

What’s next — where do I learn more about transfer learning and feature extraction?

In this tutorial, you learned how to utilize a CNN to recognize class labels it was never trained on.

You also learned how to use incremental learning to accomplish this task.

Incremental learning is critical when your dataset is too large to fit into memory.

But I know as soon as this post is published I’m going to get emails and questions in the comments regarding:

  • “How do I classify images outside my training/testing set?”
  • “How do I load an image from disk, extract features from it using a CNN, and then classify it using the neural network?”
  • “How do I correctly preprocess my input image before classification?”

Today’s tutorial is long enough as it is. I can’t, therefore, include those sections of Deep Learning for Computer Vision with Python inside this post.

If you’d like to learn more about transfer learning, including:

  1. More details on the concept of transfer learning
  2. How to perform feature extraction
  3. How to fine-tune networks
  4. How to classify images outside your training/testing set using both feature extraction and fine-tuning

…then you’ll definitely want to refer to my book, Deep Learning for Computer Vision with Python.

Besides chapters on transfer learning, you’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision, but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

Summary

In this tutorial you learned how to:

  1. Utilize Keras for deep learning feature extraction.
  2. Perform incremental learning on the extracted features.

Utilizing incremental learning enables us to train models on datasets too large to fit into memory.

Neural networks are a great example of incremental learners as we can load data via batches, ensuring the entire network does not have to fit into RAM at once. Using incremental learning we were able to obtain ~98% accuracy.

I would suggest using this code as a template for whenever you need to use Keras for feature extraction on large datasets.

I hope you enjoyed the tutorial!

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Keras: Feature extraction on large datasets with Deep Learning appeared first on PyImageSearch.

Fine-tuning with Keras and Deep Learning


In this tutorial, you will learn how to perform fine-tuning with Keras and Deep Learning.

We will take a CNN pre-trained on the ImageNet dataset and fine-tune it to perform image classification and recognize classes it was never trained on.

Today is the final post in our three-part series on fine-tuning:

  1. Part #1: Transfer learning with Keras and Deep Learning
  2. Part #2: Feature extraction on large datasets with Keras and Deep Learning
  3. Part #3: Fine-tuning with Keras and Deep Learning (today’s post)

I would strongly encourage you to read the previous two tutorials in the series if you haven’t yet — understanding the concept of transfer learning, including performing feature extraction via a pre-trained CNN, will better enable you to understand (and appreciate) fine-tuning.

When performing feature extraction we did not re-train the original CNN. Instead, we treated the CNN as an arbitrary feature extractor and then trained a simple machine learning model on top of the extracted features.

Fine-tuning, on the other hand, requires that we not only update the CNN architecture but also re-train it to learn new object classes.

Fine-tuning is a multi-step process (a compact code sketch follows this list):

  1. Remove the fully connected nodes at the end of the network (i.e., where the actual class label predictions are made).
  2. Replace the fully connected nodes with freshly initialized ones.
  3. Freeze the CONV layers earlier in the network (ensuring that any robust features previously learned by the CNN are not destroyed).
  4. Start training, but only train the FC layer heads.
  5. Optionally unfreeze some/all of the CONV layers in the network and perform a second pass of training.
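
To make these steps concrete, here is a minimal sketch of what they look like in Keras, assuming a VGG16 base pre-trained on ImageNet and a 224×224×3 input (the same choices made later in this post). Treat it as a preview rather than a replacement for the full train.py walkthrough below:

# a compact sketch of the five fine-tuning steps above, assuming Keras with
# a VGG16 base pre-trained on ImageNet (the full implementation comes later)
from keras.applications import VGG16
from keras.layers.core import Dropout
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.layers import Input
from keras.models import Model

numClasses = 11  # e.g., the 11 Food-11 categories used later in this tutorial

# steps 1 + 2: cut off the old FC head and attach a freshly initialized one
baseModel = VGG16(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))
headModel = Flatten()(baseModel.output)
headModel = Dense(512, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(numClasses, activation="softmax")(headModel)
model = Model(inputs=baseModel.input, outputs=headModel)

# step 3: freeze the CONV layers in the body of the network
for layer in baseModel.layers:
	layer.trainable = False

# step 4: compile and train only the FC head so it can "warm up"
# step 5 (optional): unfreeze some CONV layers, re-compile with a very small
# learning rate, and run a second pass of training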

If you are new to deep learning and CNNs, I would recommend you stop here and learn how to train your first CNN.

Fine-tuning with Keras is a more advanced technique with plenty of gotchas and pitfalls that will trip you up along the way (for example, it tends to be very easy to overfit a network when performing fine-tuning if you are not careful).

To learn how to perform fine-tuning with Keras and deep learning, just keep reading.

Looking for the source code to this post?
Jump right to the downloads section.

Fine-tuning with Keras and Deep Learning

Note: Many of the fine-tuning concepts I’ll be covering in this post also appear in my book, Deep Learning for Computer Vision with Python. Inside the book, I go into considerably more detail (and include more of my tips, suggestions, and best practices). If you would like more detail on fine-tuning with Keras after going through this guide, definitely take a look at my book.

In the first part of this tutorial, we’ll discuss the concept of fine-tuning and how we can re-train a neural network to recognize classes it was not originally trained to recognize.

From there we’ll review the dataset we are using for fine-tuning.

I’ll then discuss our project directory structure.

Once we have a good handle on the dataset we’ll then switch to implementing fine-tuning with Keras.

After you have finished going through this tutorial you will be able to:

  1. Fine-tune networks with Keras.
  2. Make predictions using the fine-tuned network.

Let’s get started!

What is fine-tuning?

Figure 1: Fine-tuning with Keras and deep learning using Python involves retraining the head of a network to recognize classes it was not originally intended for.

Note: The following section has been adapted from my book, Deep Learning for Computer Vision with Python. For the full set of chapters on transfer learning and fine-tuning, please refer to the text.

Earlier in this series of posts on transfer learning, we learned how to treat a pre-trained Convolutional Neural Network as a feature extractor.

Using this feature extractor, we forward propagated our dataset of images through the network, extracted the activations at a given layer (treating the activations as a feature vector), and then saved the values to disk.

A standard machine learning classifier (in our case, Logistic Regression) was trained on top of the CNN features, exactly as we would do with hand-engineered features such as SIFT, HOG, LBPs, etc.

This approach to transfer learning is called feature extraction.

But there is another type of transfer learning, one that can actually outperform the feature extraction method. This method is called fine-tuning and requires us to perform “network surgery”.

First, we take a scalpel and cut off the final set of fully connected layers (i.e., the “head” of the network where the class label predictions are returned) from a pre-trained CNN (typically VGG, ResNet, or Inception).

We then replace the head with a new set of fully connected layers with random initializations.

From there, all layers below the head are frozen so their weights cannot be updated (i.e., the backward pass in back propagation does not reach them).

We then train the network using a very small learning rate so the new set of fully connected layers can learn patterns from the previously learned CONV layers earlier in the network — this process is called allowing the FC layers to “warm up”.

Optionally, we may unfreeze the rest of the network and continue training. Applying fine-tuning allows us to utilize pre-trained networks to recognize classes they were not originally trained on.

And furthermore, this method can lead to higher accuracy than transfer learning via feature extraction.

Fine-tuning and network surgery

Note: The following section has been adapted from my book, Deep Learning for Computer Vision with Python. For the full set of chapters on transfer learning and fine-tuning, please refer to the text.

As we discussed earlier in this series on transfer learning via feature extraction, pre-trained networks (such as ones trained on the ImageNet dataset) contain rich, discriminative filters. The filters can be used on datasets to predict class labels outside the ones the network has already been trained on.

However, instead of simply applying feature extraction, we are going to perform network surgery and modify the actual architecture so we can re-train parts of the network.

If this sounds like something out of a bad horror movie, don’t worry: there won’t be any blood and gore — but we’ll have some fun and learn a lot about transfer learning via our Dr. Frankenstein-esque network experiments.

To understand how fine-tuning works, consider the following figure:

Figure 2: Left: The original VGG16 network architecture. Middle: Removing the FC layers from VGG16 and treating the final POOL layer as a feature extractor. Right: Removing the original FC Layers and replacing them with a brand new FC head. These FC layers can then be fine-tuned to a specific dataset (the old FC Layers are no longer used).

On the left we have the layers of the VGG16 network.

As we know, the final set of layers (i.e., the “head”) are our fully connected layers along with our softmax classifier.

When performing fine-tuning, we actually sever the head of the network, just as in feature extraction (Figure 2, middle).

However, unlike feature extraction, when we perform fine-tuning we actually build a new fully connected head and place it on top of the original architecture (Figure 2, right).

The new FC layer head is randomly initialized (just like any other layer in a new network) and connected to the body of the original network.

However, there is a problem:

Our CONV layers have already learned rich, discriminative filters while our FC layers are brand new and totally random.

If we allow the gradient to backpropagate from these random values all the way through the network, we risk destroying these powerful features.

To circumvent this problem, we instead let our FC head “warm up” by (ironically) “freezing” all layers in the body of the network (I told you the horror/cadaver analogy works well here) as depicted in Figure 3 (left).

Figure 3: Left: When we start the fine-tuning process, we freeze all CONV layers in the network and only allow the gradient to backpropagate through the FC layers. Doing this allows our network to “warm up”. Right: After the FC layers have had a chance to warm up, we may choose to unfreeze all or some of the layers earlier in the network and allow each of them to be fine-tuned as well.

Training data is forward propagated through the network as we usually would; however, the backpropagation is stopped after the FC layers, which allows these layers to start to learn patterns from the highly discriminative CONV layers.

In some cases, we may decide to never unfreeze the body of the network as our new FC head may obtain sufficient accuracy.

However, for some datasets it is often advantageous to allow the original CONV layers to be modified during the fine-tuning process as well (Figure 3, right).

After the FC head has started to learn patterns in our dataset, we can pause training, unfreeze the body, and continue training, but with a very small learning rate — we do not want to alter our CONV filters dramatically.

Training is then allowed to continue until sufficient accuracy is obtained.

Fine-tuning is a super-powerful method to obtain image classifiers on your own custom datasets from pre-trained CNNs (and is even more powerful than transfer learning via feature extraction).

If you’d like to learn more about transfer learning via deep learning, including:

  • Deep learning-based feature extraction
  • Training models on top of extracted features
  • Fine-tuning networks on your own custom datasets
  • My personal tips, suggestions, and best practices for transfer learning

…then you’ll want to take a look at my book, Deep Learning for Computer Vision with Python, where I cover these algorithms and techniques in detail.

The Food-11 Dataset

Figure 4: The Food-11 dataset is curated by the Multimedia Signal Processing Group (MSPG) of the Swiss Federal Institute of Technology. (image source)

The dataset we’ll be using for fine-tuning is the Food-11 dataset from the Multimedia Signal Processing Group (MSPG) of the Swiss Federal Institute of Technology.

The dataset consists of 16,643 images belonging to 11 major food categories:

  1. Bread (1,724 images)
  2. Dairy product (721 images)
  3. Dessert (2,500 images)
  4. Egg (1,648 images)
  5. Fried food (1,461 images)
  6. Meat (2,206 images)
  7. Noodles/pasta (734 images)
  8. Rice (472 images)
  9. Seafood (1,505 images)
  10. Soup (2,500 images)
  11. Vegetable/fruit (1,172 images)

Using the Food-11 dataset we can train a deep learning model capable of recognizing each major food group — such a model could be used, for example, in a mobile fitness application that automatically tracks estimated food group and caloric intake.

To train such a model, we’ll be utilizing fine-tuning with the Keras deep learning library.

Downloading the Food-11 dataset

Go ahead and grab the zip from the “Downloads” section of this blog post.

Once you’ve downloaded the source code, change directory into

fine-tuning-keras
 :
$ unzip fine-tuning-keras.zip
$ cd fine-tuning-keras

Now let’s create a

Food-11/
  directory to house our unaltered dataset:
$ mkdir Food-11
$ cd Food-11

In my experience, I’ve found that downloading the Food-11 dataset is unreliable.

Therefore I’m presenting two options to download the dataset:

Option 1: Use

wget
  in your terminal

The

wget
  application comes pre-installed on Ubuntu and other Linux distros. On macOS, you must install it:
$ brew install wget

To download the Food-11 dataset, let’s use

wget
  in our terminal:
$ wget --passive-ftp --ftp-user FoodImage@grebvm2.epfl.ch \
	--ftp-password Cahc1moo ftp://tremplin.epfl.ch/Food-11.zip

Note: At least on macOS, I’ve found that if the

wget
  command fails once, just run it again and then the download will start.

Option 2: Use FileZilla

FileZilla is a GUI application for FTP and SCP connections. You may download it for your OS here.

Once you’ve installed and launched the application, enter the credentials:

  • Host: tremplin.epfl.ch
  • Username: FoodImage@grebvm2.epfl.ch
  • Password: Cahc1moo

You can then connect and download the file into the appropriate destination.

Figure 5: Downloading the Food-11 dataset with FileZilla.

The username and password combination was obtained from the official Food-11 dataset website. If the username/password combination stops working for you, check to see if the dataset curators changed the login credentials.

Once downloaded (hopefully with no issues), we can go ahead and unzip the dataset inside of the

Food-11/
  directory:
$ unzip Food-11.zip

Project structure

Now that we’ve downloaded the project and dataset, go ahead and navigate back to the project root. From there let’s analyze the project structure:

$ cd ..
$ tree --dirsfirst --filelimit 10
.
├── Food-11
│   ├── evaluation [3347 entries]
│   ├── training [9866 entries]
│   ├── validation [3430 entries]
│   └── Food-11.zip
├── dataset
├── output
│   ├── unfrozen.png
│   └── warmup.png
├── pyimagesearch
│   ├── __init__.py
│   └── config.py
├── build_dataset.py
├── predict.py
└── train.py

7 directories, 8 files

Our project structure is similar to last week’s.

Our original dataset is in the

Food-11/
  directory.

Executing

build_dataset.py
  enables us to organize the Food-11 images into the
dataset/
  directory.

From there, we’ll use

train.py
  to perform fine tuning.

Finally, we’ll use

predict.py
  to make predictions on sample images using our fine-tuned network.

Each of the aforementioned scripts takes advantage of a configuration file named

config.py
 . Let’s go ahead and learn more about the configuration script now.

Understanding our configuration file

Before we can actually fine-tune our network, we first need to create our configuration file to store important variables, including:

  • Paths to the input dataset
  • Class labels
  • Batch size/training parameters
  • Output paths, including model files, label encoders, plot histories, etc.

Since there are so many parameters that we need, I’ve opted to use a configuration file to keep our code nice and organized (versus having to utilize many command line arguments).

Our configuration file,

config.py
, lives in a Python module named
pyimagesearch
 .

We keep the

config.py
  file there for two reasons:
  1. To ensure we can import the configuration into our own Python scripts
  2. To keep our code tidy and organized

Note: This config file is similar to the one in last week’s and the prior week’s tutorials.

Let’s fill our

config.py
  file now — open it up in your favorite code editor and insert the following lines:
# import the necessary packages
import os

# initialize the path to the *original* input directory of images
ORIG_INPUT_DATASET = "Food-11"

# initialize the base path to the *new* directory that will contain
# our images after computing the training and testing split
BASE_PATH = "dataset"

First, we import

os
 , enabling us to build file/directory paths directly in this config.

The original dataset path where we extracted the Food-11 dataset is contained in

ORIG_INPUT_DATASET
 .

Then we specify the

BASE_PATH
  where our organized dataset will soon reside.

From there we’ll define the names of our

TRAIN
 ,
TEST
 , and
VAL
  directories:
# define the names of the training, testing, and validation
# directories
TRAIN = "training"
TEST = "evaluation"
VAL = "validation"

Followed by listing the eleven

CLASSES
  of our Food-11 dataset:
# initialize the list of class label names
CLASSES = ["Bread", "Dairy product", "Dessert", "Egg", "Fried food",
	"Meat", "Noodles/Pasta", "Rice", "Seafood", "Soup",
	"Vegetable/Fruit"]

Finally, we’ll specify our batch size and model + plot paths:

# set the batch size when fine-tuning
BATCH_SIZE = 32

# set the path to the serialized model after training
MODEL_PATH = os.path.sep.join(["output", "food11.model"])

# define the path to the output training history plots
UNFROZEN_PLOT_PATH = os.path.sep.join(["output", "unfrozen.png"])
WARMUP_PLOT_PATH = os.path.sep.join(["output", "warmup.png"])

Our

BATCH_SIZE
  of
32
  represents the size of the chunks of data that will flow through our CNN.

We’ll store our fine-tuned serialized Keras model in the 

MODEL_PATH
 .

Similarly, we specify the paths where our warmup and unfrozen plot images will be stored.

Building our image dataset for fine-tuning

If we were to store the entire Food-11 dataset in memory, it would occupy ~10GB of RAM.

Most deep learning rigs should be able to handle that amount of data, but nevertheless, I’ll be showing you how to use the

.flow_from_directory
  function with Keras to only load small batches of data from disk at a time.

However, before we can actually get to fine-tuning and re-training a network, we first must (correctly) organize our dataset of images on disk.

In order to use the

.flow_from_directory
  function, Keras requires that we have our dataset organized using the following template:

dataset_name/class_label/example_of_class_label.jpg

And since the Food-11 dataset also provides pre-supplied data splits, our final directory structure will have the form:

dataset_name/split_name/class_label/example_of_class_label.jpg

Having the above directory structure ensures that:

  1. The
    .flow_from_directory
      function will properly work.
  2. Our dataset is organized into a neat, easy to follow directory structure.
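
As a concrete example, once build_dataset.py (covered below) has run, the very first “Bread” training image (class index 0) will live at:

dataset/training/Bread/0_0.jpg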

In order to take the original Food-11 images and then copy them into our desired directory structure, we need the

build_dataset.py
  script.

Let’s review that script now:

# import the necessary packages
from pyimagesearch import config
from imutils import paths
import shutil
import os

# loop over the data splits
for split in (config.TRAIN, config.TEST, config.VAL):
	# grab all image paths in the current split
	print("[INFO] processing '{} split'...".format(split))
	p = os.path.sep.join([config.ORIG_INPUT_DATASET, split])
	imagePaths = list(paths.list_images(p))

	# loop over the image paths
	for imagePath in imagePaths:
		# extract class label from the filename
		filename = imagePath.split(os.path.sep)[-1]
		label = config.CLASSES[int(filename.split("_")[0])]

		# construct the path to the output directory
		dirPath = os.path.sep.join([config.BASE_PATH, split, label])

		# if the output directory does not exist, create it
		if not os.path.exists(dirPath):
			os.makedirs(dirPath)

		# construct the path to the output image file and copy it
		p = os.path.sep.join([dirPath, filename])
		shutil.copy2(imagePath, p)

Lines 2-5 import our necessary packages, in particular, our

config
 .

From there we loop over data splits beginning on Line 8. Inside, we:

  • Extract
    imagePaths
      and each class
    label
      (Lines 11-18; see the example after this list).
  • Create a directory structure for our organized image files (Lines 21-25).
  • Copy the image files into the appropriate destination (Lines 28 and 29).
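
To see how the label extraction works, consider this small example, which uses the same expression as the script above on a hypothetical Food-11 filename (the class index is encoded before the underscore):

# hypothetical example: "3_25.jpg" -> class index 3 -> "Egg"
from pyimagesearch import config  # same config module imported by build_dataset.py

filename = "3_25.jpg"  # hypothetical Food-11 filename: "<class index>_<image id>.jpg"
label = config.CLASSES[int(filename.split("_")[0])]
print(label)  # "Egg"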

This script has been reviewed in more detail inside the Transfer learning with Keras and deep learning post. If you would like more detail on the inner-workings of

build_dataset.py
 , please refer to the previous tutorial.

Before continuing, make sure you have used the “Downloads” section of the tutorial to download the source code associated with this blog post.

From there, open up a terminal and execute the following command:

$ python build_dataset.py 
[INFO] processing 'training split'...
[INFO] processing 'evaluation split'...
[INFO] processing 'validation split'...

If you investigate the

dataset/
  directory you’ll see three directories, one for each of our respective data splits:
$ ls dataset/
evaluation	training	validation

Inside each of the data split directories you’ll also find class label subdirectories:

$ ls -l dataset/training/
Bread
Dairy product
Dessert
Egg
Fried food
Meat
Noodles
Rice
Seafood
Soup
Vegetable

And inside each of the class label subdirectories you’ll find images associated with that label:

$ ls -l dataset/training/Bread/*.jpg | head -n 5
dataset/training/Bread/0_0.jpg
dataset/training/Bread/0_1.jpg
dataset/training/Bread/0_10.jpg
dataset/training/Bread/0_100.jpg
dataset/training/Bread/0_101.jpg

Implementing fine-tuning with Keras

Now that our images are in the proper directory structure, we can perform fine-tuning with Keras.

Let’s implement the fine-tuning script inside

train.py
 :
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import VGG16
from keras.layers.core import Dropout
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.layers import Input
from keras.models import Model
from keras.optimizers import SGD
from sklearn.metrics import classification_report
from pyimagesearch import config
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import pickle
import os

Lines 2-20 import required packages. Let’s briefly review those that are most important to the fine-tuning concepts in today’s post:

  • matplotlib
     : We’ll be plotting our frozen and unfrozen training efforts. Line 3 sets the backend ensuring that we can save our plots to disk as image files.
  • ImageDataGenerator
     : Allows for data augmentation. Be sure to refer to DL4CV and this blog post for more information on this class.
  • VGG16
     : The seminal network trained on ImageNet that we’ll be slicing and dicing with our scalpel for the purposes of fine-tuning.
  • classification_report
     : Calculates basic statistical information upon evaluation of our model.
  • config
     : Our custom configuration file which we reviewed in the “Understanding our configuration file” section.

Be sure to familiarize yourself with the rest of the imports as well.

With the packages at our fingertips, we’re now ready to move on. Let’s start by defining a function for plotting training history:

def plot_training(H, N, plotPath):
	# construct a plot that plots and saves the training history
	plt.style.use("ggplot")
	plt.figure()
	plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
	plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
	plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
	plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
	plt.title("Training Loss and Accuracy")
	plt.xlabel("Epoch #")
	plt.ylabel("Loss/Accuracy")
	plt.legend(loc="lower left")
	plt.savefig(plotPath)

The

plot_training
  function is defined on Lines 22-34. This helper function will be used to construct and save a plot of our training history.

Let’s determine the total number of images in each of our splits:

# derive the paths to the training, validation, and testing
# directories
trainPath = os.path.sep.join([config.BASE_PATH, config.TRAIN])
valPath = os.path.sep.join([config.BASE_PATH, config.VAL])
testPath = os.path.sep.join([config.BASE_PATH, config.TEST])

# determine the total number of image paths in training, validation,
# and testing directories
totalTrain = len(list(paths.list_images(trainPath)))
totalVal = len(list(paths.list_images(valPath)))
totalTest = len(list(paths.list_images(testPath)))

Lines 38-40 define paths to training, validation, and testing directories, respectively.

Then, we determine the total number of images for each split via Lines 44-46 — these values will enable us to calculate the steps per epoch.
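
For example, with the 9,866 images in our training split and a batch size of 32, each epoch of the warm-up phase will run:

steps_per_epoch = totalTrain // config.BATCH_SIZE  # 9866 // 32 = 308 batches per epoch

which matches the “308/308” progress bars you’ll see in the training output later in this post.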

Let’s initialize our data augmentation object and establish our mean subtraction value:

# initialize the training data augmentation object
trainAug = ImageDataGenerator(
	rotation_range=30,
	zoom_range=0.15,
	width_shift_range=0.2,
	height_shift_range=0.2,
	shear_range=0.15,
	horizontal_flip=True,
	fill_mode="nearest")

# initialize the validation/testing data augmentation object (which
# we'll be adding mean subtraction to)
valAug = ImageDataGenerator()

# define the ImageNet mean subtraction (in RGB order) and set the
# the mean subtraction value for each of the data augmentation
# objects
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
trainAug.mean = mean
valAug.mean = mean

The process of data augmentation is important for small datasets. In fact, it is nearly always recommended. Lines 49-56 define our training data augmentation object. The parameters specify random rotations, zooms, translations, shears, and flips to the training data as we train.

Note: A common misconception I see about data augmentation is that the random transforms of the images are then added to the original training data — that’s not the case. The random transformations are instead applied in-place, on the fly, during training, meaning the dataset size does not increase.

Although our validation data augmentation object (Line 60) uses the same class, we do not supply any parameters (we don’t apply data augmentation to validation or testing data). The validation

ImageDataGenerator
  will only be used for mean subtraction which is why no parameters are needed.

Next, we set the ImageNet mean subtraction values on Line 65. In this pre-processing technique, we perform a pixel-wise subtraction for all images. Mean subtraction is one of several scaling techniques I explain in the Practitioner Bundle of Deep Learning for Computer Vision with Python. In the text, we’ll even build a custom preprocessor to more efficiently accomplish mean subtraction.

Given the pixel-wise subtraction values, we prepare each of our data augmentation objects for mean subtraction (Lines 66 and 67).
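
If you’d like to see exactly what pixel-wise mean subtraction does, here is a tiny toy example (not part of train.py) using a hypothetical 2×2 RGB image whose pixels are all (150, 150, 150):

import numpy as np

# toy example: subtract the per-channel ImageNet mean from every pixel
image = np.full((2, 2, 3), 150.0, dtype="float32")
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
image -= mean  # broadcasts over the spatial dimensions
print(image[0, 0])  # [26.32  33.221  46.061] -- each channel shifted by its mean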

Our data augmentation generators will generate data directly from their respective directories:

# initialize the training generator
trainGen = trainAug.flow_from_directory(
	trainPath,
	class_mode="categorical",
	target_size=(224, 224),
	color_mode="rgb",
	shuffle=True,
	batch_size=config.BATCH_SIZE)

# initialize the validation generator
valGen = valAug.flow_from_directory(
	valPath,
	class_mode="categorical",
	target_size=(224, 224),
	color_mode="rgb",
	shuffle=False,
	batch_size=config.BATCH_SIZE)

# initialize the testing generator
testGen = valAug.flow_from_directory(
	testPath,
	class_mode="categorical",
	target_size=(224, 224),
	color_mode="rgb",
	shuffle=False,
	batch_size=config.BATCH_SIZE)

Lines 70-94 define generators that will load batches of images from their respective training, validation, and testing splits.

Using these generators ensures that our machine will not run out of RAM by trying to load all of the data at once.
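
If you ever want to sanity check what a generator is producing, an optional snippet like the following (not part of train.py) pulls a single batch and prints its dimensions:

# optional sanity check: grab one batch from the training generator
images, labels = next(trainGen)
print(images.shape)  # (32, 224, 224, 3) -- BATCH_SIZE x height x width x channels
print(labels.shape)  # (32, 11) -- one-hot labels for the 11 Food-11 classes

# since shuffle=True this won't harm training, but you can call
# trainGen.reset() afterwards if you prefer to start from the beginning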

Let’s go ahead and perform network surgery:

# load the VGG16 network, ensuring the head FC layer sets are left
# off
baseModel = VGG16(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))

# construct the head of the model that will be placed on top of the
# the base model
headModel = baseModel.output
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(512, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(len(config.CLASSES), activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

First, we’ll load the VGG16 architecture (with pre-trained ImageNet weights) from disk, leaving off the fully connected layers (Lines 98 and 99). By omitting the fully connected layers, we have effectively taken the guillotine to our network and beheaded it, as in Figure 2.

From there, we define a new fully connected layer head (Lines 103-107).

Note: If you are unfamiliar with the contents on Lines 103-107, I recommend that you read my Keras tutorial or CNN tutorial. And if you would like to immerse yourself completely into the world of deep learning, be sure to check out my highly rated deep learning book.

On Line 111 we place the new FC layer head on top of the VGG16 base network. You can think of this as adding sutures to sew the head back on to the network body after surgery.

Take the time to review the above code block carefully as it is where the heart of fine-tuning with Keras begins.
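
If you’d like to verify the surgery before moving on, an optional call to model.summary() (not part of train.py) will print the VGG16 body followed by the brand new flatten/dense/dropout/dense head:

# optional: inspect the assembled network to confirm the new FC head sits
# on top of the VGG16 body
model.summary()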

Continuing on with fine-tuning, let’s freeze all of the CONV layers in the body of VGG16:

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False

Lines 115-116 freeze all CONV layers in the VGG16 base model.
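
As an optional sanity check (again, not part of train.py), you can count how many layers are now frozen; with VGG16’s 19 layers (input + CONV + POOL) frozen, only the 4 newly added head layers should remain trainable:

# optional sanity check: confirm the freeze took hold before compiling
frozen = sum(1 for layer in model.layers if not layer.trainable)
print("frozen layers: {} / {}".format(frozen, len(model.layers)))  # expect 19 / 23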

Given that the base is now frozen, we’ll go ahead and train our network (only the head weights will be updated):

# compile our model (this needs to be done after setting our layers
# to be non-trainable)
print("[INFO] compiling model...")
opt = SGD(lr=1e-4, momentum=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the head of the network for a few epochs (all other layers
# are frozen) -- this will allow the new FC layers to start to become
# initialized with actual "learned" values versus pure random
print("[INFO] training head...")
H = model.fit_generator(
	trainGen,
	steps_per_epoch=totalTrain // config.BATCH_SIZE,
	validation_data=valGen,
	validation_steps=totalVal // config.BATCH_SIZE,
	epochs=50)

# reset the testing generator and evaluate the network after
# fine-tuning just the network head
print("[INFO] evaluating after fine-tuning network head...")
testGen.reset()
predIdxs = model.predict_generator(testGen,
	steps=(totalTest // config.BATCH_SIZE) + 1)
predIdxs = np.argmax(predIdxs, axis=1)
print(classification_report(testGen.classes, predIdxs,
	target_names=testGen.class_indices.keys()))
plot_training(H, 50, config.WARMUP_PLOT_PATH)

In this block, we train our

model
 , keeping in mind that no weight updates will occur in the base. Only the head of the network will be tuned at this point.

In this code block, we:

  • Compile the
    model
      (Lines 121-123). We use
    "categorical_crossentropy"
      for our
    loss
      function. If you are performing classification with only two classes, be sure to use
    "binary_crossentropy"
     .
  • Train our network while applying data augmentation, only updating the weights for the head of the network (Lines 129-134)
  • Reset our testing generator (Line 139).
  • Evaluate our network on our testing data (Lines 140-142). We’ll print classification statistics in our terminal via Lines 143 and 144.
  • Plot the training history via our
    plot_training
      function (Line 145).

Now let’s proceed to unfreeze the final set of CONV layers in the base model layers:

# reset our data generators
trainGen.reset()
valGen.reset()

# now that the head FC layers have been trained/initialized, lets
# unfreeze the final set of CONV layers and make them trainable
for layer in baseModel.layers[15:]:
	layer.trainable = True

# loop over the layers in the model and show which ones are trainable
# or not
for layer in baseModel.layers:
	print("{}: {}".format(layer, layer.trainable))

We start by resetting our training and validation generators (Lines 148 and 149).

We then unfreeze the final CONV layer block in VGG16 (Lines 153 and 154). Again, only the final CONV block of VGG16 is unfrozen (not the rest of the network).

Just so there is no confusion about what is going on in our network, Lines 158 and 159 will show us which layers are frozen/not frozen (i.e., trainable). The information will print out in our terminal.

Continuing on, let’s fine-tune both the final set of CONV layers and our set of FC layers:

# for the changes to the model to take effect we need to recompile
# the model, this time using SGD with a *very* small learning rate
print("[INFO] re-compiling model...")
opt = SGD(lr=1e-4, momentum=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the model again, this time fine-tuning *both* the final set
# of CONV layers along with our set of FC layers
H = model.fit_generator(
	trainGen,
	steps_per_epoch=totalTrain // config.BATCH_SIZE,
	validation_data=valGen,
	validation_steps=totalVal // config.BATCH_SIZE,
	epochs=20)

Since we’ve unfrozen additional layers, we must re-compile the model (Lines 164-166).

We then train the model again, this time fine-tuning both the FC layer head and the final CONV block (Lines 170-175).

Wrapping up, let’s evaluate the network once more:

# reset the testing generator and then use our trained model to
# make predictions on the data
print("[INFO] evaluating after fine-tuning network...")
testGen.reset()
predIdxs = model.predict_generator(testGen,
	steps=(totalTest // config.BATCH_SIZE) + 1)
predIdxs = np.argmax(predIdxs, axis=1)
print(classification_report(testGen.classes, predIdxs,
	target_names=testGen.class_indices.keys()))
plot_training(H, 20, config.UNFROZEN_PLOT_PATH)

# serialize the model to disk
print("[INFO] serializing network...")
model.save(config.MODEL_PATH)

Here we:

  • Make predictions on the testing data (Lines 180-183).
  • Print a new classification report (Lines 184 and 185).
  • Save the unfrozen training plot to disk (Line 186).
  • And serialize the model to disk, allowing us to recall the model in our
    predict.py
      script (Line 190).

Great job sticking with me on our fine-tuning journey. We’re going to put our script to work next!

Training a network via fine-tuning with Keras

Now that we’ve implemented our Python script to perform fine-tuning, let’s give it a try and see what happens.

Make sure you’ve used the “Downloads” section of this tutorial to download the source code to this post, and from there, execute the following command:

$ python train.py
Using TensorFlow backend.
Found 9866 images belonging to 11 classes.
Found 3430 images belonging to 11 classes.
Found 3347 images belonging to 11 classes.
[INFO] compiling model...
[INFO] training head...
Epoch 1/50
308/308 [==============================] - 246s 799ms/step - loss: 10.7644 - acc: 0.2883 - val_loss: 8.0234 - val_acc: 0.4614
Epoch 2/50
308/308 [==============================] - 237s 768ms/step - loss: 8.3090 - acc: 0.4336 - val_loss: 6.3494 - val_acc: 0.5556
Epoch 3/50
308/308 [==============================] - 233s 757ms/step - loss: 7.0419 - acc: 0.4963 - val_loss: 5.2425 - val_acc: 0.6071
...
Epoch 48/50
308/308 [==============================] - 238s 771ms/step - loss: 0.8755 - acc: 0.7085 - val_loss: 0.8004 - val_acc: 0.7663
Epoch 49/50
308/308 [==============================] - 236s 765ms/step - loss: 0.8473 - acc: 0.7127 - val_loss: 0.7725 - val_acc: 0.7743
Epoch 50/50
308/308 [==============================] - 235s 763ms/step - loss: 0.8434 - acc: 0.7169 - val_loss: 0.7893 - val_acc: 0.7599
[INFO] evaluating after fine-tuning network head...
               precision    recall  f1-score   support

        Bread       0.79      0.52      0.62       368
Dairy product       0.75      0.55      0.64       148
      Dessert       0.71      0.68      0.69       500
          Egg       0.68      0.78      0.72       335
   Fried food       0.64      0.74      0.68       287
         Meat       0.73      0.88      0.79       432
      Noodles       0.94      0.95      0.95       147
         Rice       0.92      0.89      0.90        96
      Seafood       0.80      0.82      0.81       303
         Soup       0.92      0.94      0.93       500
    Vegetable       0.89      0.84      0.86       231

    micro avg       0.78      0.78      0.78      3347
    macro avg       0.80      0.78      0.78      3347
 weighted avg       0.78      0.78      0.77      3347

Figure 6: Our Keras fine-tuning network is allowed to “warm up” prior to unfreezing only the final block of CONV layers in VGG16.

After fine-tuning just our newly initialized FC layer head and allowing the FC Layers to warm up, we are obtaining ~78% accuracy which is quite respectable.

Next, we see that we have unfrozen the final block of CONV layers in VGG16 while leaving the rest of the network weights frozen:

<keras.engine.input_layer.InputLayer object at 0x7f95da8baf60>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da880128>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da87ac18>: False
<keras.layers.pooling.MaxPooling2D object at 0x7f95da87c588>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da87c438>: False
<keras.layers.convolutional.Conv2D object at 0x7f95d84e0da0>: False
<keras.layers.pooling.MaxPooling2D object at 0x7f95da5c0080>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da5c00b8>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da5cd470>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da5dd048>: False
<keras.layers.pooling.MaxPooling2D object at 0x7f95da57c080>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da57c0b8>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da58b4a8>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da59b780>: False
<keras.layers.pooling.MaxPooling2D object at 0x7f95da53a0f0>: False
<keras.layers.convolutional.Conv2D object at 0x7f95da53a128>: True
<keras.layers.convolutional.Conv2D object at 0x7f95da548518>: True
<keras.layers.convolutional.Conv2D object at 0x7f95da5590f0>: True
<keras.layers.pooling.MaxPooling2D object at 0x7f95da4f6198>: True

Once we’ve unfrozen the final CONV block, we resume fine-tuning:

[INFO] re-compiling model...
Epoch 1/20
308/308 [==============================] - 245s 795ms/step - loss: 0.8553 - acc: 0.7201 - val_loss: 0.7468 - val_acc: 0.7766
Epoch 2/20
308/308 [==============================] - 234s 759ms/step - loss: 0.7736 - acc: 0.7461 - val_loss: 0.7006 - val_acc: 0.8031
Epoch 3/20
308/308 [==============================] - 233s 756ms/step - loss: 0.7246 - acc: 0.7680 - val_loss: 0.7132 - val_acc: 0.8034
Epoch 4/20
308/308 [==============================] - 232s 753ms/step - loss: 0.6738 - acc: 0.7820 - val_loss: 0.6806 - val_acc: 0.8072
Epoch 5/20
308/308 [==============================] - 230s 746ms/step - loss: 0.6533 - acc: 0.7905 - val_loss: 0.6465 - val_acc: 0.8096
...
Epoch 16/20
308/308 [==============================] - 231s 749ms/step - loss: 0.3888 - acc: 0.8703 - val_loss: 0.6178 - val_acc: 0.8434
Epoch 17/20
308/308 [==============================] - 232s 753ms/step - loss: 0.3993 - acc: 0.8671 - val_loss: 0.6077 - val_acc: 0.8434
Epoch 18/20
308/308 [==============================] - 233s 755ms/step - loss: 0.3665 - acc: 0.8758 - val_loss: 0.6093 - val_acc: 0.8405
Epoch 19/20
308/308 [==============================] - 233s 756ms/step - loss: 0.3575 - acc: 0.8801 - val_loss: 0.5789 - val_acc: 0.8508
Epoch 20/20
308/308 [==============================] - 236s 766ms/step - loss: 0.3536 - acc: 0.8840 - val_loss: 0.6020 - val_acc: 0.8464
[INFO] evaluating after fine-tuning network...
               precision    recall  f1-score   support

        Bread       0.86      0.78      0.82       368
Dairy product       0.85      0.65      0.74       148
      Dessert       0.83      0.79      0.81       500
          Egg       0.84      0.84      0.84       335
   Fried food       0.75      0.92      0.82       287
         Meat       0.89      0.88      0.88       432
      Noodles       0.99      0.95      0.97       147
         Rice       0.88      0.95      0.91        96
      Seafood       0.86      0.91      0.88       303
         Soup       0.97      0.95      0.96       500
    Vegetable       0.86      0.96      0.91       231

    micro avg       0.87      0.87      0.87      3347
    macro avg       0.87      0.87      0.87      3347
 weighted avg       0.87      0.87      0.87      3347

[INFO] serializing network...

Figure 7: We have unfrozen the final CONV block and resumed fine-tuning with Keras and deep learning. Training and validation loss are starting to diverge, indicating the start of overfitting, so fine-tuning stops at epoch 20.

I decided to not train past epoch 20 for fear of overfitting. If you take a look at Figure 7 you can see our training and validation loss starting to rapidly diverge. When you see training loss falling quickly while validation loss stagnates or even increases, you know you are overfitting.
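
One optional safeguard against this (not used in train.py, but worth knowing about) is Keras’ EarlyStopping callback, which halts training once validation loss stops improving; note that the restore_best_weights argument requires a reasonably recent Keras release:

# optional: stop training automatically when val_loss plateaus
from keras.callbacks import EarlyStopping

es = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
# then pass callbacks=[es] to model.fit_generator(...) during the unfrozen phase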

That said, at the end of our fine-tuning process, we are now obtaining 87% accuracy, a significant increase from just fine-tuning the FC layer heads alone!

Making predictions with fine-tuning and Keras

Now that we’ve fine-tuned our Keras model, let’s see how we can use it to make predictions on images outside the training/testing set (i.e., our own custom images).

Open up

predict.py
  and insert the following code:
# import the necessary packages
from keras.models import load_model
from pyimagesearch import config
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str, required=True,
	help="path to our input image")
args = vars(ap.parse_args())

Lines 2-7 import our required packages. We’re going to use

load_model
  to recall our Keras fine-tuned model from disk and make predictions. This is also the first time today that we will use OpenCV (
cv2
 ).

On Lines 10-13 we parse our command line argument. The

--image
  argument allows us to supply any image from our terminal at runtime with no modifications to the code. It makes sense to take advantage of a command line argument rather than hard-coding the value here or in our config.

Let’s go ahead and load that image from disk and preprocess it:

# load the input image and then clone it so we can draw on it later
image = cv2.imread(args["image"])
output = image.copy()
output = imutils.resize(output, width=400)

# our model was trained on RGB ordered images but OpenCV represents
# images in BGR order, so swap the channels, and then resize to
# 224x224 (the input dimensions for VGG16)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (224, 224))

# convert the image to a floating point data type and perform mean
# subtraction
image = image.astype("float32")
mean = np.array([123.68, 116.779, 103.939][::1], dtype="float32")
image -= mean

Lines 16-30 load and preprocess our

image
 . The preprocessing steps are identical to training and include:
  • Making a
    copy
      of the image and resizing it for
    output
      purposes (Lines 17 and 18).
  • Swapping color channels since we trained with RGB images and OpenCV loaded this
    image
      in BGR order (Line 23).
  • Resizing the
    image
      to 224×224 pixels for inference (Line 24).
  • Converting the
    image
      to floating point (Line 28).
  • Performing mean subtraction (Lines 29 and 30).

Note: When we perform inference using a custom prediction script, if the results are unsatisfactory nine times out of ten it is due to improper preprocessing. Typically having color channels in the wrong order or forgetting to perform mean subtraction altogether will lead to unfavorable results. Keep this in mind when writing your own scripts.

Now that our image is ready, let’s predict its class label:

# load the trained model from disk
print("[INFO] loading model...")
model = load_model(config.MODEL_PATH)

# pass the image through the network to obtain our predictions
preds = model.predict(np.expand_dims(image, axis=0))[0]
i = np.argmax(preds)
label = config.CLASSES[i]

# draw the prediction on the output image
text = "{}: {:.2f}%".format(label, preds[i] * 100)
cv2.putText(output, text, (3, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
	(0, 255, 0), 2)

# show the output image
cv2.imshow("Output", output)
cv2.waitKey(0)

We load our fine-tuned

model
  via Line 34 and then perform inference. The top prediction class
label
  is extracted on Lines 37-39.

Finally, we annotate the

output
  image and display it on screen (Lines 42-48). The
text
  annotation contains the highest prediction along with its associated confidence.

On to the fun part — testing our script on food! I’m hungry just thinking about it and I bet you may be too.

Keras fine-tuning results

To see our fine-tuned Keras model in action, make sure you use the “Downloads” section of this tutorial to download the source code and example images.

From there, open up a terminal and execute the following command:

$ python predict.py --image dataset/evaluation/Seafood/8_186.jpg

Figure 8: Our fine-tuned Keras deep learning network correctly recognizes oysters as “seafood”.

As you can see from Figure 8, we have correctly classified the input image as “Seafood”.

Let’s try another example:

$ python predict.py --image dataset/evaluation/Meat/5_293.jpg

Figure 9: With 64% confidence, this image of chicken wings is classified as “fried food”. We have applied the process of fine-tuning to a pre-trained model to recognize new classes with Keras and deep learning.

Our fine-tuned network has labeled the image as “Fried food” despite it being in the “Meat” class in our dataset.

Chicken wings are typically fried and these ones clearly are. They are both “Meat” and “Fried food”, which is why the model is pulled in two directions. Therefore, I’m still declaring it a “correct” classification. A fun experiment would be to apply fine-tuning with multi-label classification (sketched briefly below); I’ll leave the full implementation as an exercise for you.
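
For reference, a multi-label variant would only require two changes to the code we reviewed earlier. This is just a hedged sketch reusing the headModel and opt names from train.py, not something implemented in this post:

# sketch only: swap the softmax head for a sigmoid head so each label is
# predicted independently...
headModel = Dense(len(config.CLASSES), activation="sigmoid")(headModel)

# ...and use a per-label loss when compiling
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])

You would also need multi-hot ground-truth labels (e.g., an image marked as both “Meat” and “Fried food”), which the categorical class_mode of flow_from_directory does not provide out of the box.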

Below I have included a few additional results from my fine-tuning experiments:

Figure 10: Fine-tuning with Keras and deep learning on the Food-11 dataset.

What’s next — where do I learn more about transfer learning, feature extraction, and fine-tuning?

Over the past few weeks since we started this series on transfer learning with Keras, I’ve received a number of emails and comments that are some variation of the following:

  • “How can I determine the number of nodes to put in my fully connected layer head when fine-tuning?”
  • “What optimizer and learning rate should I use for fine-tuning?”
  • “Which CONV layers (and when) should I freeze and unfreeze?”
  • “How do I classify images outside my training/testing set?”
  • “How do I load an image from disk, extract features from it using a CNN, and then classify it using the neural network?”
  • “How do I correctly preprocess my input image before classification?”

Today’s tutorial is long enough as it is, so I can’t include those sections of Deep Learning for Computer Vision with Python inside this post.

If you’d like to learn more about transfer learning, including:

  1. More details on the concept of transfer learning
  2. How to perform feature extraction
  3. How to fine-tune networks
  4. How to classify images outside your training/testing set using both feature extraction and fine-tuning

…then you’ll definitely want to refer to Deep Learning for Computer Vision with Python.

Besides chapters on transfer learning, you’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

Summary

In this tutorial, you learned how to perform fine-tuning with Keras and deep learning.

To perform fine-tuning, we:

  1. Loaded the VGG16 network architecture from disk with weights pre-trained on ImageNet.
  2. Ensured the original fully connected layer heads were removed (i.e., where the output predictions from the network are made).
  3. Replaced the original fully connected layers with brand new, freshly initialized ones.
  4. Froze all CONV layers in VGG16.
  5. Trained only the fully connected layer heads.
  6. Unfroze the final set of CONV layer blocks in VGG16.
  7. Continued training.

Overall, we were able to obtain 87% accuracy on the Food-11 dataset.

Further accuracy can be obtained by applying additional data augmentation and adjusting the parameters to our optimizer and number of FC layer nodes.

If you’re interested in learning more about fine-tuning with Keras, including my tips, suggestions, and best practices, be sure to take a look at Deep Learning for Computer Vision with Python where I cover fine-tuning in more detail.

I hope you enjoyed today’s tutorial on fine-tuning!

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Fine-tuning with Keras and Deep Learning appeared first on PyImageSearch.
