
Keras Mask R-CNN

In this tutorial, you will learn how to use Keras and Mask R-CNN to perform instance segmentation (both with and without a GPU).

Using Mask R-CNN we can perform both:

  1. Object detection, giving us the (x, y)-bounding box coordinates for each object in an image.
  2. Instance segmentation, enabling us to obtain a pixel-wise mask for each individual object in an image.

An example of instance segmentation via Mask R-CNN can be seen in the image at the top of this tutorial — notice how we not only have the bounding box of the objects in the image, but we also have pixel-wise masks for each object as well, enabling us to segment each individual object (something that object detection alone does not give us).

Instance segmentation, along with Mask R-CNN, powers some of the recent advances in the “magic” we see in computer vision, including self-driving cars, robotics, and more.

In the remainder of this tutorial, you will learn how to use Mask R-CNN with Keras, including how to perform instance segmentation on your own images.

To learn more about Keras and Mask R-CNN, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras Mask R-CNN

In the first part of this tutorial, we’ll briefly review the Mask R-CNN architecture. From there, we’ll review our directory structure for this project and then install Keras + Mask R-CNN on our system.

I’ll then show you how to implement Mask R-CNN and Keras using Python.

Finally, we’ll apply Mask R-CNN to our own images and examine the results.

I’ll also share resources on how to train a Mask R-CNN model on your own custom dataset.

The History of Mask R-CNN

Figure 1: The Mask R-CNN architecture by He et al. enables object detection and pixel-wise instance segmentation. This blog post uses Keras to work with a Mask R-CNN model trained on the COCO dataset.

The Mask R-CNN model for instance segmentation has evolved from three preceding architectures for object detection:

  • R-CNN: An input image is presented to the network, Selective Search is run on the image, and then the output regions from Selective Search are used for feature extraction and classification using a pre-trained CNN.
  • Fast R-CNN: Still uses the Selective Search algorithm to obtain region proposals, but adds the Region of Interest (ROI) Pooling module. Extracts a fixed-size window from the feature map and uses the features to obtain the final class label and bounding box. The benefit is that the network is now end-to-end trainable.
  • Faster R-CNN: Introduces the Region Proposal Network (RPN) that bakes the region proposal directly into the architecture, alleviating the need for the Selective Search algorithm.

The Mask R-CNN algorithm builds on the previous Faster R-CNN, enabling the network to not only perform object detection but pixel-wise instance segmentation as well!

I’ve covered Mask R-CNN in-depth inside both:

  1. The “What is Mask R-CNN?” section of the Mask R-CNN with OpenCV post.
  2. My book, Deep Learning for Computer Vision with Python.

Please refer to those resources for more in-depth details on how the architecture works, including the ROI Align module and how it facilitates instance segmentation.

Project structure

Go ahead and use the “Downloads” section of today’s blog post to download the code and pre-trained model. Let’s inspect our Keras Mask R-CNN project structure:

$ tree --dirsfirst
.
├── images
│   ├── 30th_birthday.jpg
│   ├── couch.jpg
│   ├── page_az.jpg
│   └── ybor_city.jpg
├── coco_labels.txt
├── mask_rcnn_coco.h5
└── maskrcnn_predict.py

1 directory, 7 files

Our project consists of a testing images/ directory as well as three files:

  • coco_labels.txt: A line-by-line listing of 81 class labels. The first label is the “background” class, so typically we say there are 80 classes.
  • mask_rcnn_coco.h5: Our pre-trained Mask R-CNN model weights file, which will be loaded from disk.
  • maskrcnn_predict.py: The Mask R-CNN demo script loads the labels and model/weights. From there, an inference is made on a testing image provided via a command line argument. You may test with one of your own images or any in the images/ directory included with the “Downloads”.

Before we review today’s script, we’ll install Keras + Mask R-CNN and then we’ll briefly review the COCO dataset.

Installing Keras Mask R-CNN

The Keras + Mask R-CNN installation process is quite straightforward with pip, git, and setup.py. I recommend you install these packages in a dedicated virtual environment for today’s project so you don’t complicate your system’s package tree.

First, install the required Python packages:

$ pip install numpy scipy
$ pip install pillow scikit-image matplotlib
$ pip install "IPython[all]"
$ pip install tensorflow # or tensorflow-gpu
$ pip install keras h5py

Be sure to install tensorflow-gpu if you have a GPU, CUDA, and cuDNN installed on your machine.

From there, go ahead and install OpenCV, either via pip or compiling from source:

$ pip install opencv-contrib-python

Next, we’ll install the Matterport implementation of Mask R-CNN in Keras:

$ git clone https://github.com/matterport/Mask_RCNN.git
$ cd Mask_RCNN
$ python setup.py install

Finally, fire up a Python interpreter in your virtual environment to verify that Mask R-CNN + Keras and OpenCV have been successfully installed:

$ python
>>> import mrcnn
>>> import cv2
>>>

Provided that there are no import errors, your environment is now ready for today’s blog post.

Mask R-CNN and COCO

The Mask R-CNN model we’ll be using here today is pre-trained on the COCO dataset.

This dataset includes a total of 80 classes (plus one background class) that you can detect and segment from an input image. I have included the labels file named coco_labels.txt in the “Downloads” associated with this post, but out of convenience, I have included them here for you:
  1. BG
  2. person
  3. bicycle
  4. car
  5. motorcycle
  6. airplane
  7. bus
  8. train
  9. truck
  10. boat
  11. traffic light
  12. fire hydrant
  13. stop sign
  14. parking meter
  15. bench
  16. bird
  17. cat
  18. dog
  19. horse
  20. sheep
  21. cow
  22. elephant
  23. bear
  24. zebra
  25. giraffe
  26. backpack
  27. umbrella
  28. handbag
  29. tie
  30. suitcase
  31. frisbee
  32. skis
  33. snowboard
  34. sports ball
  35. kite
  36. baseball bat
  37. baseball glove
  38. skateboard
  39. surfboard
  40. tennis racket
  41. bottle
  42. wine glass
  43. cup
  44. fork
  45. knife
  46. spoon
  47. bowl
  48. banana
  49. apple
  50. sandwich
  51. orange
  52. broccoli
  53. carrot
  54. hot dog
  55. pizza
  56. donut
  57. cake
  58. chair
  59. couch
  60. potted plant
  61. bed
  62. dining table
  63. toilet
  64. tv
  65. laptop
  66. mouse
  67. remote
  68. keyboard
  69. cell phone
  70. microwave
  71. oven
  72. toaster
  73. sink
  74. refrigerator
  75. book
  76. clock
  77. vase
  78. scissors
  79. teddy bear
  80. hair drier
  81. toothbrush

In the next section, we’ll learn how to use Keras and Mask R-CNN to detect and segment each of these classes.

Implementing Mask R-CNN with Keras and Python

Let’s get started implementing our Mask R-CNN segmentation script.

Open up the maskrcnn_predict.py file and insert the following code:
# import the necessary packages
from mrcnn.config import Config
from mrcnn import model as modellib
from mrcnn import visualize
import numpy as np
import colorsys
import argparse
import imutils
import random
import cv2
import os

Lines 2-11 import our required packages.

The mrcnn imports are from Matterport’s implementation of Mask R-CNN. From mrcnn, we’ll use Config to create a custom subclass for our configuration, modellib to load our model, and visualize to draw our masks.

Let’s go ahead and parse our command line arguments:

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-w", "--weights", required=True,
	help="path to Mask R-CNN model weights pre-trained on COCO")
ap.add_argument("-l", "--labels", required=True,
	help="path to class labels file")
ap.add_argument("-i", "--image", required=True,
	help="path to input image to apply Mask R-CNN to")
args = vars(ap.parse_args())

Our script requires three command line arguments:

  • --weights: The path to our Mask R-CNN model weights pre-trained on COCO.
  • --labels: The path to our COCO class labels text file.
  • --image: Our input image path. We’ll be performing instance segmentation on the image provided via the command line.

Using the second argument, let’s go ahead and load our CLASS_NAMES and COLORS for each:
# load the class label names from disk, one label per line
CLASS_NAMES = open(args["labels"]).read().strip().split("\n")

# generate random (but visually distinct) colors for each class label
# (thanks to Matterport Mask R-CNN for the method!)
hsv = [(i / len(CLASS_NAMES), 1, 1.0) for i in range(len(CLASS_NAMES))]
COLORS = list(map(lambda c: colorsys.hsv_to_rgb(*c), hsv))
random.seed(42)
random.shuffle(COLORS)

Line 24 loads the COCO class label names directly from the text file into a list.

From there, Lines 28-31 generate random, distinct COLORS for each class label. The method comes from Matterport’s Mask R-CNN implementation on GitHub.

Let’s go ahead and construct our SimpleConfig class:
class SimpleConfig(Config):
	# give the configuration a recognizable name
	NAME = "coco_inference"

	# set the number of GPUs to use along with the number of images
	# per GPU
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

	# number of classes (we would normally add +1 for the background
	# but the background class is *already* included in the class
	# names)
	NUM_CLASSES = len(CLASS_NAMES)

Our SimpleConfig class inherits from Matterport’s Mask R-CNN Config (Line 33).

The configuration is given a NAME (Line 35).

From there we set the GPU_COUNT and IMAGES_PER_GPU (i.e., the batch size). If you have a GPU and tensorflow-gpu installed, then Keras + Mask R-CNN will automatically use your GPU. If not, your CPU will be used instead.

Note: I performed today’s experiment on a machine using a single Titan X GPU, so I set my GPU_COUNT = 1. While my 12GB GPU could technically handle more than one image at a time (either during training or during prediction as in this script), I decided to set IMAGES_PER_GPU = 1 as most readers will not have a GPU with as much memory. Feel free to increase this value if your GPU can handle it.
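
If you do increase the batch size, keep in mind that Matterport’s detect method expects exactly GPU_COUNT * IMAGES_PER_GPU images per call. As a hypothetical sketch (assuming the imports and CLASS_NAMES from the script above; the values are chosen purely for illustration):

class BatchConfig(Config):
	# a hypothetical configuration for batched inference; only use
	# this if your GPU has the memory to spare
	NAME = "coco_inference_batch"
	GPU_COUNT = 1
	IMAGES_PER_GPU = 2
	NUM_CLASSES = len(CLASS_NAMES)

# rebuild the model with the new config, then pass exactly two
# images per call; detect() returns one result dict per image
# results = model.detect([image1, image2], verbose=1)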

Our NUM_CLASSES is then set equal to the length of the CLASS_NAMES list (Line 45).

Next, we’ll initialize our config and load our model:

# initialize the inference configuration
config = SimpleConfig()

# initialize the Mask R-CNN model for inference and then load the
# weights
print("[INFO] loading Mask R-CNN model...")
model = modellib.MaskRCNN(mode="inference", config=config,
	model_dir=os.getcwd())
model.load_weights(args["weights"], by_name=True)

Line 48 instantiates our config.

Then, using our config, Lines 53-55 load our Mask R-CNN model pre-trained on the COCO dataset.

Let’s go ahead and perform instance segmentation:

# load the input image, convert it from BGR to RGB channel
# ordering, and resize the image
image = cv2.imread(args["image"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = imutils.resize(image, width=512)

# perform a forward pass of the network to obtain the results
print("[INFO] making predictions with Mask R-CNN...")
r = model.detect([image], verbose=1)[0]

Lines 59-61 load and preprocess our image. Our model expects images in RGB format, so we use cv2.cvtColor to swap the color channels (in contrast to OpenCV’s default BGR channel ordering).

Line 65 then performs a forward pass of the image through the network to make both object detection and pixel-wise mask predictions.

The remaining two code blocks will process the results so that we can visualize the objects’ bounding boxes and masks using OpenCV:

# loop over the detected objects' bounding boxes and masks
for i in range(0, r["rois"].shape[0]):
	# extract the class ID and mask for the current detection, then
	# grab the color to visualize the mask (in BGR format)
	classID = r["class_ids"][i]
	mask = r["masks"][:, :, i]
	color = COLORS[classID][::-1]

	# visualize the pixel-wise mask of the object
	image = visualize.apply_mask(image, mask, color, alpha=0.5)

In order to visualize the results, we begin by looping over object detections (Line 68). Inside the loop, we:

  • Grab the unique classID integer (Line 71).
  • Extract the mask for the current detection (Line 72).
  • Determine the color used to visualize the mask (Line 73).
  • Apply/draw our predicted pixel-wise mask on the object using a semi-transparent alpha channel (Line 76).

From here, we’ll draw bounding boxes and class label + score texts for each object in the image:

# convert the image back to BGR so we can use OpenCV's drawing
# functions
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

# loop over the predicted scores and class labels
for i in range(0, len(r["scores"])):
	# extract the bounding box information, class ID, label, predicted
	# probability, and visualization color
	(startY, startX, endY, endX) = r["rois"][i]
	classID = r["class_ids"][i]
	label = CLASS_NAMES[classID]
	score = r["scores"][i]
	color = [int(c) for c in np.array(COLORS[classID]) * 255]

	# draw the bounding box, class label, and score of the object
	cv2.rectangle(image, (startX, startY), (endX, endY), color, 2)
	text = "{}: {:.3f}".format(label, score)
	y = startY - 10 if startY - 10 > 10 else startY + 10
	cv2.putText(image, text, (startX, y), cv2.FONT_HERSHEY_SIMPLEX,
		0.6, color, 2)

# show the output image
cv2.imshow("Output", image)
cv2.waitKey()

Line 80 converts our image back to BGR (OpenCV’s default color channel ordering).

On Line 83 we begin looping over objects. Inside the loop, we:

  • Extract the bounding box coordinates, classID, label, and score (Lines 86-89).
  • Compute the color for the bounding box and text (Line 90).
  • Draw each bounding box (Line 93).
  • Concatenate the class/probability text (Line 94) and then draw it at the top of the image (Lines 95-97).

Once the process is complete, the resulting output image is displayed to the screen until a key is pressed (Lines 100-101).
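
If you are working on a headless server, or simply want to keep a copy of the results, you can swap the display call for a write to disk (the filename below is an arbitrary, illustrative choice):

# a minimal alternative for headless environments: save the
# visualization to disk instead of displaying it
cv2.imwrite("output.jpg", image)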

Mask R-CNN and Keras results

Now that our Mask R-CNN script has been implemented, let’s give it a try.

Make sure you have used the “Downloads” section of this tutorial to download the source code.

You will need to know the concept of command line arguments to run the code. If it is unfamiliar to you, read up on argparse and command line arguments before you try to execute the code.

When you’re ready, open up a terminal and execute the following command:

$ python maskrcnn_predict.py --weights mask_rcnn_coco.h5 --labels coco_labels.txt \
	--image images/30th_birthday.jpg
Using TensorFlow backend.
[INFO] loading Mask R-CNN model...
[INFO] making predictions with Mask R-CNN...
Processing 1 images
image                    shape: (682, 512, 3)         min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32

Figure 2: The Mask R-CNN model trained on COCO created a pixel-wise map of the Jurassic Park jeep (truck), my friend, and me while we celebrated my 30th birthday.

For my 30th birthday, my wife found a person to drive us around Philadelphia in a replica Jurassic Park jeep — here my best friend and I are outside The Academy of Natural Sciences.

Notice how not only are bounding boxes produced for each object (i.e., both people and the jeep), but pixel-wise masks are as well!

Let’s give another image a try:

$ python maskrcnn_predict.py --weights mask_rcnn_coco.h5 --labels coco_labels.txt \
	--image images/couch.jpg
Using TensorFlow backend.
[INFO] loading Mask R-CNN model...
[INFO] making predictions with Mask R-CNN...
Processing 1 images
image                    shape: (682, 512, 3)         min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32

Figure 3: My dog, Janie, has been segmented from the couch and chair using a Keras and Mask R-CNN deep learning model.

Here is a super adorable photo of my dog, Janie, lying on the couch. Notice that:

  1. Despite the vast majority of the couch not being visible, the Mask R-CNN is still able to label it as such.
  2. The Mask R-CNN is correctly able to label the dog in the image.
  3. And even though my coffee cup is barely visible, Mask R-CNN is able to label the cup as well (if you look really closely you’ll see that my coffee cup is a Jurassic Park mug!)

The only part of the image that Mask R-CNN is not able to correctly label is the back part of the couch which it mistakes as a chair — looking at the image closely, you can see how Mask R-CNN made the mistake (the region does look quite chair-like versus being part of the couch).

Here’s another example of using Keras + Mask R-CNN for instance segmentation:

$ python maskrcnn_predict.py --weights mask_rcnn_coco.h5 --labels coco_labels.txt \
	--image images/page_az.jpg
Using TensorFlow backend.
[INFO] loading Mask R-CNN model...
[INFO] making predictions with Mask R-CNN...
Processing 1 images
image                    shape: (682, 512, 3)         min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  149.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32

Figure 4: A Mask R-CNN segmented image (created with Keras, TensorFlow, and Matterport’s Mask R-CNN implementation). This picture is of me in Page, AZ.

A few years ago, my wife and I made a trip out to Page, AZ (this particular photo was taken just outside Horseshoe Bend) — you can see how the Mask R-CNN has not only detected me but also constructed a pixel-wise mask for my body.

Let’s apply Mask R-CNN to one final image:

$ python maskrcnn_predict.py --weights mask_rcnn_coco.h5 --labels coco_labels.txt \
	--image images/ybor_city.jpg
Using TensorFlow backend.
[INFO] loading Mask R-CNN model...
[INFO] making predictions with Mask R-CNN...
Processing 1 images
image                    shape: (688, 512, 3)         min:    5.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32

Figure 5: Keras + Mask R-CNN with Python of a picture from Ybor City.

One of my favorite cities to visit in the United States is Ybor City — there’s just something I like about the area (and perhaps it’s that roosters are protected in the city and free to roam around).

Here you can see me alongside one such rooster — notice how each of us is correctly labeled and segmented by the Mask R-CNN. You’ll also notice that the Mask R-CNN model was able to localize each of the individual cars and label the bus!

Can Mask R-CNN run in real-time?

At this point you’re probably wondering if it’s possible to run Keras + Mask R-CNN in real-time, right?

As you know from the “The History of Mask R-CNN” section above, Mask R-CNN is based on the Faster R-CNN object detector.

Faster R-CNNs are incredibly computationally expensive, and when you add instance segmentation on top of object detection, the model only becomes more computationally expensive. Therefore:

  • On a CPU, a Mask R-CNN cannot run in real-time.
  • But on a GPU, Mask R-CNN can get up to 5-8 FPS.

If you would like to run Mask R-CNN in semi-real-time, you will need a GPU.
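
If you’d like to measure throughput on your own machine, a rough benchmark is straightforward to sketch. The snippet below (an illustrative sketch, assuming the model and image from the script above) times repeated calls to model.detect and reports the average frames per second:

import time

# a rough throughput benchmark (illustrative): time N forward
# passes and report the average frames per second
N = 10
start = time.time()
for _ in range(N):
	model.detect([image], verbose=0)
elapsed = time.time() - start
print("[INFO] approx. {:.2f} FPS".format(N / elapsed))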

How can I train a Mask R-CNN model on my own custom dataset?

Figure 6: Inside my book, Deep Learning for Computer Vision with Python, you will learn how to annotate your own training data, train your custom Mask R-CNN, and apply it to your own images. I also provide two case studies on (1) skin lesion/cancer segmentation and (2) prescription pill segmentation, a first step in pill identification.

The Mask R-CNN model we used in this tutorial was pre-trained on the COCO dataset…

…but what if you wanted to train a Mask R-CNN on your own custom dataset?

Inside my book, Deep Learning for Computer Vision with Python, I:

  1. Teach you how to train a Mask R-CNN to automatically detect and segment cancerous skin lesions — a first step in building an automatic cancer risk factor classification system.
  2. Provide you with my favorite image annotation tools, enabling you to create masks for your input images.
  3. Show you how to train a Mask R-CNN on your custom dataset.
  4. Provide you with my best practices, tips, and suggestions when training your own Mask R-CNN.

All of the Mask R-CNN chapters include a detailed explanation of both the algorithms and code, ensuring you will be able to successfully train your own Mask R-CNNs.

To learn more about my book (and grab your free set of sample chapters and table of contents), just click here.

Summary

In this tutorial, you learned how to use Keras + Mask R-CNN to perform instance segmentation.

Unlike object detection, which only gives you the bounding box (x, y)-coordinates for an object in an image, instance segmentation takes it a step further, yielding pixel-wise masks for each object.

Using instance segmentation we can actually segment an object from an image.

To perform instance segmentation we used the Matterport Keras + Mask R-CNN implementation.

We then created a Python script that:

  1. Constructed a configuration class for Mask R-CNN (both with and without a GPU).
  2. Loaded the Keras + Mask R-CNN architecture from disk
  3. Preprocessed our input image
  4. Detected objects/masks in the image
  5. Visualized the results

If you are interested in how to:

  1. Label and annotate your own custom image dataset
  2. And then train a Mask R-CNN model on top of your annotated dataset…

…then you’ll want to take a look at my book, Deep Learning for Computer Vision with Python, where I cover Mask R-CNN and annotation in detail.

I hope you enjoyed today’s post!

To download the source code (including the pre-trained Keras + Mask R-CNN model), just enter your email address in the form below! I’ll be sure to let you know when future tutorials are published here on PyImageSearch.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!



Change input shape dimensions for fine-tuning with Keras

In this tutorial, you will learn how to change the input shape tensor dimensions for fine-tuning using Keras. After going through this guide you’ll understand how to apply transfer learning to images with different image dimensions than what the CNN was originally trained on.

A few weeks ago I published a tutorial on transfer learning with Keras and deep learning — soon after the tutorial was published, I received a question from Francesca Maepa who asked the following:

Do you know of a good blog or tutorial that shows how to implement transfer learning on a dataset that has a smaller shape than the pre-trained model?

I created a really good pre-trained model, and would like to use some features for the pre-trained model and transfer them to a target domain that is missing certain feature training datasets and I’m not sure if I’m doing it right.

Francesca asks a great question.

Typically we think of Convolutional Neural Networks as accepting fixed size inputs (i.e., 224×224, 227×227, 299×299, etc.).

But what if you wanted to:

  1. Utilize a pre-trained network for transfer learning…
  2. …and then update the input shape dimensions to accept images with different dimensions than what the original network was trained on?

Why might you want to utilize different image dimensions?

There are two common reasons:

  • Your input image dimensions are considerably smaller than what the CNN was trained on and increasing their size introduces too many artifacts and dramatically hurts loss/accuracy.
  • Your images are high resolution and contain small objects that are hard to detect. Resizing to the original input dimensions of the CNN hurts accuracy and you postulate increasing resolution will help improve your model.

In these scenarios, you would wish to update the input shape dimensions of the CNN and then be able to perform transfer learning.

The question then becomes, is such an update possible?

Yes, in fact, it is.

Looking for the source code to this post?
Jump right to the downloads section.

Change input shape dimensions for fine-tuning with Keras

In the first part of this tutorial, we’ll discuss the concept of an input shape tensor and the role it plays with input image dimensions to a CNN.

From there we’ll discuss the example dataset we’ll be using in this blog post. I’ll then show you how to:

  1. Update the input image dimensions to pre-trained CNN using Keras.
  2. Fine-tune the updated CNN.

Let’s get started!

What is an input shape tensor?

Figure 1: Convolutional Neural Networks built with Keras for deep learning have different input shape expectations. In this blog post, you’ll learn how to change input shape dimensions for fine-tuning with Keras.

When working with Keras and deep learning, you’ve probably either utilized or run into code that loads a pre-trained network via:

model = VGG16(weights="imagenet")

The code above is initializing the VGG16 architecture and then loading the weights for the model (pre-trained on ImageNet).

We would typically use this code when our project needs to classify input images that have class labels inside ImageNet (as this tutorial demonstrates).

When performing transfer learning or fine-tuning you may use the following code to leave off the fully-connected (FC) layer heads:

model = VGG16(weights="imagenet", include_top=False)

We’re still indicating that the pre-trained ImageNet weights should be used, but now we’re setting include_top=False, indicating that the FC head should not be loaded.

This code would typically be utilized when you’re performing transfer learning either via feature extraction or fine-tuning.

Finally, we can update our code to include an input_tensor dimension:

model = VGG16(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))

We’re still loading VGG16 with weights pre-trained on ImageNet and we’re still leaving off the FC layer heads…but now we’re specifying an input shape of 224×224×3 (which are the input image dimensions that VGG16 was originally trained on, as seen in Figure 1, left).

That’s all fine and good — but what if we now wanted to fine-tune our model on 128×128px images?

That’s actually just a simple update to our model initialization:

model = VGG16(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(128, 128, 3)))

Figure 1 (right) provides a visualization of the network updating the input tensor dimensions — notice how the input volume is now 128×128×3 (our updated, smaller dimensions) versus the previous 224×224×3 (the original, larger dimensions).

Updating the input shape dimensions of a CNN via Keras is that simple!

But there are a few caveats to look out for.

Can I make the input dimensions anything I want?

Figure 2: Updating a Keras CNN’s input shape is straightforward; however, there are a few caveats to take into consideration.

There are limits to how much you can update the image dimensions, both from an accuracy/loss perspective and from limitations of the network itself.

Consider the fact that CNNs reduce volume dimensions via two methods:

  1. Pooling (such as max-pooling in VGG16)
  2. Strided convolutions (such as in ResNet)

If your input image dimensions are too small then the CNN will naturally reduce volume dimensions during the forward propagation and then effectively “run out” of data.

In that case your input dimensions are too small.

I’ve included an example of what happens in that scenario below — when using 48×48 input images, I received this error message:

ValueError: Negative dimension size caused by subtracting 4 from 1 for 'average_pooling2d_1/AvgPool' (op: 'AvgPool') with input shapes: [?,1,1,512].

Notice how Keras is complaining that our volume is too small. You will encounter similar errors for other pre-trained networks as well. When you see this type of error, you know you need to increase your input image dimensions.
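
You can sanity-check the output dimensions yourself before ever loading weights: VGG16 halves the spatial dimensions five times (once per max-pooling layer), so the final volume is roughly the input size divided by 32. A quick illustrative sketch:

# a quick sanity check (illustrative): VGG16 contains five 2x2
# max-pooling layers, each of which halves the spatial dimensions
size = 48
for block in range(5):
	size //= 2
print(size)  # prints 1: a 1x1x512 volume, too small for a 4x4 average pool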

You can also make your input dimensions too large.

You won’t run into any errors per se, but you may see your network fail to obtain reasonable accuracy due to the fact that there are not enough layers in the network to:

  1. Learn robust, discriminative filters.
  2. Naturally reduce volume size via pooling or strided convolution.

If that happens, you have a few options:

  • Explore other (pre-trained) network architectures that are trained on larger input dimensions.
  • Tune your hyperparameters exhaustively, focusing first on learning rate.
  • Add additional layers to the network. For VGG16 you’ll use 3×3 CONV layers and max-pooling. For ResNet you’ll include residual layers with strided convolution.

The final suggestion will require you to update the network architecture and then perform fine-tuning on the newly initialized layers.

To learn more about fine-tuning and transfer learning, along with my tips, suggestions, and best practices when training networks, make sure you refer to my book, Deep Learning for Computer Vision with Python.

Our example dataset

Figure 3: A subset of the Kaggle Dogs vs. Cats dataset is used for this Keras input shape example. Using a smaller dataset not only proves the point more quickly, but also allows just about any computer hardware to be used (i.e. no expensive GPU machine/instance necessary).

The dataset we’ll be using here today is a small subset of Kaggle’s Dogs vs. Cats dataset.

We also use this dataset inside Deep Learning for Computer Vision with Python to teach the fundamentals of training networks, ensuring that readers with either CPUs or GPUs can follow along and learn best practices when training models.

The dataset itself contains 2,000 images belonging to 2 classes (“cat” and “dog”):

  • Cat: 1,000 images
  • Dog: 1,000 images

A visualization of the dataset can be seen in Figure 3 above.

In the remainder of this tutorial you’ll learn how to take this dataset and:

  1. Update the input shape dimensions for a pre-trained CNN.
  2. Fine-tune the CNN with the smaller image dimensions.

Installing necessary packages

All of today’s packages can be installed via pip.

I recommend that you create a Python virtual environment for today’s project, but it is not necessarily required. To learn how to create a virtual environment quickly and to install OpenCV into it, refer to my pip install opencv tutorial.

To install the packages for today’s project, just enter the following commands:

$ workon <env_name>
$ pip install opencv-contrib-python # includes numpy
$ pip install imutils
$ pip install matplotlib
$ pip install scikit-learn
$ pip install tensorflow # or tensorflow-gpu
$ pip install keras

Project structure

Go ahead and grab the code + dataset from the “Downloads” section of today’s blog post.

Once you’ve extracted the .zip archive, you may inspect the project structure using the tree command:
$ tree --dirsfirst --filelimit 10
.
├── dogs_vs_cats_small
│   ├── cats [1000 entries]
│   └── dogs [1000 entries]
├── plot.png
└── train.py

3 directories, 2 files

Our dataset is contained within the dogs_vs_cats_small/ directory. The two subdirectories contain images of our classes. If you’re working with a different dataset, be sure the structure is <dataset>/<class_name>.

Today we’ll be reviewing the train.py script. The training script generates plot.png containing our accuracy/loss curves.

Updating the input shape dimensions with Keras

It’s now time to update our input image dimensions with Keras and a pre-trained CNN.

Open up the train.py file in your project structure and insert the following code:
# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.pooling import AveragePooling2D
from keras.applications import VGG16
from keras.layers.core import Dropout
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.layers import Input
from keras.models import Model
from keras.optimizers import Adam
from keras.utils import np_utils
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
import os

Lines 2-20 import required packages:

  • keras and sklearn are for deep learning/machine learning. Be sure to refer to my extensive deep learning book, Deep Learning for Computer Vision with Python, to become more familiar with the classes and functions we use from these tools.
  • paths from imutils traverses a directory and enables us to list all images in a directory.
  • matplotlib will allow us to plot our training accuracy/loss history.
  • numpy is a Python package for numerical operations; one of the ways we’ll put it to work is for “mean subtraction”, a scaling/normalization technique.
  • cv2 is OpenCV.
  • argparse will be used to read and parse command line arguments.

Let’s go ahead and parse the command line arguments now:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset")
ap.add_argument("-e", "--epochs", type=int, default=25,
	help="# of epochs to train our network for")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

Our script accepts three command line arguments via Lines 23-30:

  • --dataset: The path to our input dataset. We’re using a condensed version of Dogs vs. Cats, but you could use other binary, 2-class datasets with little or no modification as well (provided they follow a similar structure).
  • --epochs: The number of times we’ll pass our data through the network during training; by default, we’ll train for 25 epochs unless a different value is supplied.
  • --plot: The path to our output accuracy/loss plot. Unless otherwise specified, the file will be named plot.png and placed in the project directory. If you are conducting multiple experiments, be sure to give your plots a different name each time for future comparison purposes.

Next, we will load and preprocess our images:

# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []

# loop over the image paths
for imagePath in imagePaths:
	# extract the class label from the filename
	label = imagePath.split(os.path.sep)[-2]

	# load the image, swap color channels, and resize it to be a fixed
	# 128x128 pixels while ignoring aspect ratio
	image = cv2.imread(imagePath)
	image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
	image = cv2.resize(image, (128, 128))

	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)

First, we grab our imagePaths on Line 35 and then initialize our data and labels lists (Lines 36 and 37).

Lines 40-52 loop over the imagePaths while first extracting the labels. Each image is loaded, the color channels are swapped, and the image is resized. The images and labels are added to the data and labels lists, respectively.

VGG16 was trained on 224×224px images; however, I’d like to draw your attention to Line 48. Notice how we’ve resized our images to 128×128px. This resizing is the first step in applying transfer learning to images with dimensions different from those the network was originally trained on.

Although Line 48 doesn’t fully answer Francesca Maepa’s question yet, we’re getting close.

Let’s go ahead and one-hot encode our labels as well as split our data:

# convert the data and labels to NumPy arrays
data = np.array(data)
labels = np.array(labels)

# perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = np_utils.to_categorical(labels)

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, stratify=labels, random_state=42)

Lines 55 and 56 convert our data and labels to NumPy array format.

Then, Lines 59-61 perform one-hot encoding on our labels. Essentially, this process converts our two labels (“cat” and “dog”) to arrays indicating which label is active/hot. If a training image is representative of a dog, then the value would be [0, 1] where “dog” is hot. Otherwise, for a “cat”, the value would be [1, 0].

To reinforce the point: if, for example, we had 5 classes of data, a one-hot encoded array may look like [0, 0, 0, 1, 0] where the 4th element is hot, indicating that the image is from the 4th class. For further details, please refer to Deep Learning for Computer Vision with Python.
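
If you’d like to see the encoding in action, here is a tiny standalone sketch (illustrative only) of what LabelBinarizer and to_categorical produce for our two-class case. LabelBinarizer yields a single 0/1 column for two classes, which is why to_categorical is needed to expand it into a proper one-hot array:

from sklearn.preprocessing import LabelBinarizer
from keras.utils import np_utils

# a tiny illustrative sketch of the two-step one-hot encoding
labels = ["cat", "dog", "dog", "cat"]
lb = LabelBinarizer()
binarized = lb.fit_transform(labels)  # [[0], [1], [1], [0]]
onehot = np_utils.to_categorical(binarized)
print(onehot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]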

Lines 65 and 66 mark 75% of our data for training and the remaining 25% for testing via the train_test_split function.

Let’s now initialize our data augmentation generator. We’ll also establish our ImageNet mean for mean subtraction:

# initialize the training data augmentation object
trainAug = ImageDataGenerator(
	rotation_range=30,
	zoom_range=0.15,
	width_shift_range=0.2,
	height_shift_range=0.2,
	shear_range=0.15,
	horizontal_flip=True,
	fill_mode="nearest")

# initialize the validation/testing data augmentation object (which
# we'll be adding mean subtraction to)
valAug = ImageDataGenerator()

# define the ImageNet mean subtraction (in RGB order) and set the
# the mean subtraction value for each of the data augmentation
# objects
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
trainAug.mean = mean
valAug.mean = mean

Lines 69-76 initialize a data augmentation object for performing random manipulations on our input images during training.

Line 80 also takes advantage of the ImageDataGenerator class for validation, but without any parameters — we won’t manipulate validation images with the exception of performing mean subtraction.

Both training and validation/testing generators will conduct mean subtraction. Mean subtraction is a scaling/normalization technique proven to increase accuracy. Line 85 contains the mean for each respective RGB channel, while Lines 86 and 87 then set this value on each of our data augmentation objects. Later, our data generators will automatically perform the mean subtraction on our training/validation data.

Note: I’ve covered data augmentation in detail in this blog post as well as in the Practitioner Bundle of Deep Learning for Computer Vision with Python. Scaling and normalization techniques such as mean subtraction are covered in DL4CV as well.
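
To make the operation concrete: mean subtraction simply subtracts the per-channel ImageNet mean from every pixel. A minimal NumPy sketch of the arithmetic (illustrative, not part of train.py):

import numpy as np

# per-channel mean subtraction via NumPy broadcasting; the dummy
# image below stands in for a real 128x128 RGB training image
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
image = np.random.randint(0, 256, (128, 128, 3)).astype("float32")
normalized = image - mean  # the mean is subtracted from each channel
print(normalized.shape)  # (128, 128, 3)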

We’re performing transfer learning with VGG16. Let’s initialize the base model now:

# load VGG16, ensuring the head FC layer sets are left off, while at
# the same time adjusting the size of the input image tensor to the
# network
baseModel = VGG16(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(128, 128, 3)))

# show a summary of the base model
print("[INFO] summary for base model...")
print(baseModel.summary())

Lines 92 and 93 load VGG16 with an input shape dimension of 128×128 using 3 channels.

Remember, VGG16 was originally trained on 224×224 images — now we’re updating the input shape dimensions to handle 128×128 images.

Effectively, we have now fully answered Francesca Maepa’s question! We accomplished changing the input dimensions via two steps:

  1. We resized all of our input images to 128×128.
  2. Then we set the input shape=(128, 128, 3).

Line 97 will print a model summary in our terminal so that we can inspect it. Alternatively, you may visualize the model graphically by studying Chapter 19 “Visualizing Network Architectures” of Deep Learning for Computer Vision with Python.

Since we’re performing transfer learning, the include_top parameter is set to False (Line 92) — we chopped off the head!

Now we’re going to perform surgery by erecting a new head and suturing it onto the CNN:

# construct the head of the model that will be placed on top of the
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(4, 4))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False

Line 101 takes the output from the baseModel and sets it as input to the headModel.

From there, Lines 102-106 construct the rest of the head.

The baseModel is already initialized with ImageNet weights per Line 92. On Lines 114 and 115, we set the base layers in VGG16 as not trainable (i.e., they will not be updated during the backpropagation phase). Be sure to read my previous fine-tuning tutorial for further explanation.

We’re now ready to compile and train the model with our data:

# compile our model (this needs to be done after our setting our
# layers to being non-trainable)
print("[INFO] compiling model...")
opt = Adam(lr=1e-4)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the head of the network for a few epochs (all other layers
# are frozen) -- this will allow the new FC layers to start to become
# initialized with actual "learned" values versus pure random
print("[INFO] training head...")
H = model.fit_generator(
	trainAug.flow(trainX, trainY, batch_size=32),
	steps_per_epoch=len(trainX) // 32,
	validation_data=valAug.flow(testX, testY),
	validation_steps=len(testX) // 32,
	epochs=args["epochs"])

Our model is compiled with the Adam optimizer and a 1e-4 learning rate (Lines 120-122).

We use "binary_crossentropy" for 2-class classification. If you have more than two classes of data, be sure to use "categorical_crossentropy".
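
As a hypothetical example, if you adapted this script to a 5-class dataset, you would change the final Dense layer and the loss function (an illustrative sketch: the Dense(2, ...) line from the script is replaced by the Dense(5, ...) line below, and the class count of 5 is arbitrary):

# hypothetical multi-class variant: the final layer's unit count
# matches the number of classes, and the loss becomes categorical
headModel = Dense(5, activation="softmax")(headModel)
model = Model(inputs=baseModel.input, outputs=headModel)
opt = Adam(lr=1e-4)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])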

Lines 128-133 then train our transfer learning network. Our training and validation generators are put to work in the process.

Upon training completion, we’ll evaluate the network and plot the training history:

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=lb.classes_))

# plot the training loss and accuracy
N = args["epochs"]
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

Lines 137-139 evaluate our model and print a classification report for statistical analysis.

We then employ matplotlib to plot our accuracy and loss history during training (Lines 142-152). The plot figure is saved to disk via Line 153.

Fine-tuning a CNN using the updated input dimensions

Figure 4: Changing Keras input shape dimensions for fine-tuning produced the following accuracy/loss training plot.

To fine-tune our CNN using the updated input dimensions first make sure you’ve used the “Downloads” section of this guide to download the (1) source code and (2) example dataset.

From there, open up a terminal and execute the following command:

$ python train.py --dataset dogs_vs_cats_small --epochs 25
Using TensorFlow backend.
[INFO] loading images...
[INFO] summary for base model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 128, 128, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 128, 128, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 128, 128, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 64, 64, 64)        0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 64, 64, 128)       73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 64, 64, 128)       147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 32, 32, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 32, 32, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 32, 32, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 32, 32, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 16, 16, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 16, 16, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 8, 8, 512)         0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 8, 8, 512)         2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 8, 8, 512)         2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 8, 8, 512)         2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0

Our first set of output shows our updated input shape dimensions.

Notice how our input_1 layer (i.e., the InputLayer) has input dimensions of 128×128×3 versus the normal 224×224×3 for VGG16.

The input image will then forward propagate through the network until the final MaxPooling2D layer (i.e., block5_pool).

At this point, our output volume has dimensions of 4×4×512 (for reference, VGG16 with a 224×224×3 input volume would have the shape 7×7×512 after this layer).

Note: If your input image dimensions are too small then you risk the model, effectively, reducing the tensor volume into “nothing” and then running out of data, leading to an error. See the “Can I make the input dimensions anything I want?” section of this post for more details.

We then flatten that volume and apply the FC layers from the headModel, ultimately leading to our final classification.

Once our model is constructed we can then fine-tune it:

[INFO] compiling model...
[INFO] training head...
Epoch 1/25
46/46 [==============================] - 9s 199ms/step - loss: 3.9048 - acc: 0.5897 - val_loss: 1.7824 - val_acc: 0.7438
Epoch 2/25
46/46 [==============================] - 8s 182ms/step - loss: 2.7716 - acc: 0.6815 - val_loss: 1.1589 - val_acc: 0.8248
Epoch 3/25
46/46 [==============================] - 6s 130ms/step - loss: 2.2621 - acc: 0.7312 - val_loss: 1.0531 - val_acc: 0.8504
Epoch 4/25
46/46 [==============================] - 6s 130ms/step - loss: 1.8732 - acc: 0.7665 - val_loss: 0.6999 - val_acc: 0.8718
Epoch 5/25
46/46 [==============================] - 6s 130ms/step - loss: 1.5874 - acc: 0.7684 - val_loss: 0.8887 - val_acc: 0.8697
...
Epoch 21/25
46/46 [==============================] - 6s 130ms/step - loss: 0.5067 - acc: 0.8722 - val_loss: 0.3941 - val_acc: 0.9060
Epoch 22/25
46/46 [==============================] - 6s 131ms/step - loss: 0.3869 - acc: 0.8822 - val_loss: 0.2982 - val_acc: 0.9274
Epoch 23/25
46/46 [==============================] - 6s 131ms/step - loss: 0.4629 - acc: 0.8767 - val_loss: 0.3193 - val_acc: 0.9252
Epoch 24/25
46/46 [==============================] - 6s 130ms/step - loss: 0.4304 - acc: 0.8822 - val_loss: 0.4016 - val_acc: 0.9103
Epoch 25/25
46/46 [==============================] - 6s 130ms/step - loss: 0.3874 - acc: 0.8800 - val_loss: 0.2538 - val_acc: 0.9466
[INFO] evaluating network...
              precision    recall  f1-score   support

        cats       0.94      0.92      0.93       250
        dogs       0.92      0.94      0.93       250

   micro avg       0.93      0.93      0.93       500
   macro avg       0.93      0.93      0.93       500
weighted avg       0.93      0.93      0.93       500

At the end of fine-tuning we see that our model has obtained 93% accuracy, respectable given our small image dataset.

As Figure 4 demonstrates, our training is quite stable as well, with no signs of overfitting.

More importantly, you now know how to change the input image shape dimensions of a pre-trained network and then apply feature extraction/fine-tuning using Keras!

Be sure to use this tutorial as a template for whenever you need to apply transfer learning to a pre-trained network with different image dimensions than what it was originally trained on.

Summary

In this tutorial, you learned how to change input shape dimensions for fine-tuning with Keras.

We typically perform such an operation when we want to apply transfer learning, including both feature extraction and fine-tuning.

Using the methods in this guide, you can update your input image dimensions for your pre-trained CNN and then perform transfer learning; however, there are two caveats you need to look out for:

  1. If your input images are too small, Keras will error out.
  2. If your input images are too large, you may not obtain your desired accuracy.

Be sure to refer to the “Can I make the input dimensions anything I want?” section of this post for more details on these caveats, including suggestions on how to solve them.

I hope you enjoyed this tutorial!

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


Keras ImageDataGenerator and Data Augmentation

In today’s tutorial, you will learn how to use Keras’ ImageDataGenerator class to perform data augmentation. I’ll also dispel common confusions surrounding what data augmentation is, why we use data augmentation, and what it does/does not do.

Knowing that I was going to write a tutorial on data augmentation, two weekends ago I decided to have some fun and purposely post a semi-trick question on my Twitter feed.

The question was simple — data augmentation does which of the following?

  1. Adds more training data
  2. Replaces training data
  3. Does both
  4. I don’t know

Here are the results:

Figure 1: My @PyImageSearch twitter poll on the concept of Data Augmentation.

Only 5% of respondents answered this trick question “correctly” (at least if you’re using Keras’ ImageDataGenerator class).

Again, it’s a trick question so that’s not exactly a fair assessment, but here’s the deal:

While the word “augment” means to make something “greater” or “increase” something (in this case, data), the Keras ImageDataGenerator class actually works by:

  1. Accepting a batch of images used for training.
  2. Taking this batch and applying a series of random transformations to each image in the batch (including random rotation, resizing, shearing, etc.).
  3. Replacing the original batch with the new, randomly transformed batch.
  4. Training the CNN on this randomly transformed batch (i.e., the original data itself is not used for training).

That’s right — the Keras ImageDataGenerator class is not an “additive” operation. It’s not taking the original data, randomly transforming it, and then returning both the original data and transformed data.

Instead, the ImageDataGenerator accepts the original data, randomly transforms it, and returns only the new, transformed data.
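
You can verify this behavior with a few lines of code. The sketch below (illustrative, using a dummy one-image “dataset”) pulls a single batch from flow() and shows that only the transformed version comes back:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# a small illustrative sketch: flow() yields randomly transformed
# batches *in place of* the originals, not in addition to them
data = np.random.randint(0, 256, (1, 64, 64, 3)).astype("float32")
aug = ImageDataGenerator(rotation_range=30, horizontal_flip=True)
gen = aug.flow(data, batch_size=1)

batch = next(gen)  # one batch of transformed images
print(batch.shape)                  # (1, 64, 64, 3)
print(np.array_equal(batch, data))  # almost surely False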

But remember how I said this was a trick question?

Technically, all the answers are correct — but the only way you know if a given definition of data augmentation is correct is via the context of its application.

I’ll help you clear up some of the confusion regarding data augmentation (and give you the context you need to successfully apply it).

Inside the rest of today’s tutorial you will:

  • Learn about three types of data augmentation.
  • Dispel any confusion you have surrounding data augmentation.
  • Learn how to apply data augmentation with Keras and the ImageDataGenerator class.

To learn more about data augmentation, including using Keras’ ImageDataGenerator class, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras ImageDataGenerator and Data Augmentation

We’ll start this tutorial with a discussion of data augmentation and why we use it.

I’ll then cover the three types of data augmentation you’ll see when training deep neural networks:

  1. Dataset generation and data expansion via data augmentation (less common)
  2. In-place/on-the-fly data augmentation (most common)
  3. Combining dataset generation and in-place augmentation

From there I’ll teach you how to apply data augmentation to your own datasets (using all three methods) using Keras’ ImageDataGenerator class.

What is data augmentation?

Data augmentation encompasses a wide range of techniques used to generate “new” training samples from the original ones by applying random jitters and perturbations (but at the same time ensuring that the class labels of the data are not changed).

Our goal when applying data augmentation is to increase the generalizability of the model.

Given that our network is constantly seeing new, slightly modified versions of the input data, the network is able to learn more robust features.

At testing time we do not apply data augmentation and simply evaluate our trained network on the unmodified testing data — in most cases, you’ll see an increase in testing accuracy, perhaps at the expense of a slight dip in training accuracy.

A simple data augmentation example

Figure 2: Left: A sample of 250 data points that follow a normal distribution exactly. Right: Adding a small amount of random “jitter” to the distribution. This type of data augmentation increases the generalizability of our networks.

Let’s consider Figure 2 (left) of a normal distribution with zero mean and unit variance.

Training a machine learning model on this data may result in us modeling the distribution exactly — however, in real-world applications, data rarely follows such a nice, neat distribution.

Instead, to increase the generalizability of our classifier, we may first randomly jitter points along the distribution by adding some random values ε drawn from a random distribution (right).

Our plot still follows an approximately normal distribution, but it’s not a perfect distribution as on the left.

A model trained on this modified, augmented data is more likely to generalize to example data points not included in the training set.
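
If you want to reproduce the idea behind Figure 2 yourself, a few lines of NumPy and matplotlib suffice. This is just an illustrative sketch (the jitter range here is my own choice, not a value from the post):

# sketch: add a small random jitter to points drawn from a normal distribution
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
points = np.random.normal(loc=0.0, scale=1.0, size=250)   # "left" distribution
epsilon = np.random.uniform(-0.25, 0.25, size=250)        # small random jitter
jittered = points + epsilon                               # "right" distribution

plt.hist(points, bins=25, alpha=0.5, label="original")
plt.hist(jittered, bins=25, alpha=0.5, label="jittered")
plt.legend()
plt.savefig("jitter_example.png")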

Computer vision and data augmentation

Figure 3: In computer vision, data augmentation performs random manipulations on images. It is typically applied in three scenarios discussed in this blog post.

In the context of computer vision, data augmentation lends itself naturally.

For example, we can obtain augmented data from the original images by applying simple geometric transforms, such as random:

  1. Translations
  2. Rotations
  3. Changes in scale
  4. Shearing
  5. Horizontal (and in some cases, vertical) flips

Applying a (small) amount of the transformations to an input image will change its appearance slightly, but it does not change the class label — thereby making data augmentation a very natural, easy method to apply for computer vision tasks.

Three types of data augmentation

There are three types of data augmentation you will likely encounter when applying deep learning in the context of computer vision applications.

Exactly which definition of data augmentation is “correct” is entirely dependent on the context of your project/set of experiments.

Take the time to read this section carefully as I see many deep learning practitioners confuse what data augmentation does and does not do.

Type #1: Dataset generation and expanding an existing dataset (less common)

Figure 4: Type #1 of data augmentation consists of dataset generation/dataset expansion. This is a less common form of data augmentation.

The first type of data augmentation is what I call dataset generation or dataset expansion.

As you know, machine learning models, and especially neural networks, can require quite a bit of training data — but what if you don’t have very much training data in the first place?

Let’s examine the most trivial case where you only have one image and you want to apply data augmentation to create an entire dataset of images, all based on that one image.

To accomplish this task, you would:

  1. Load the original input image from disk.
  2. Randomly transform the original image via a series of random translations, rotations, etc.
  3. Take the transformed image and write it back out to disk.
  4. Repeat steps 2 and 3 a total of N times.

After performing this process you would have a directory full of randomly transformed “new” images that you could use for training, all based on that single input image.

This is, of course, an incredibly simplified example.

You more than likely have more than a single image — you probably have 10s or 100s of images and now your goal is to turn that smaller set into 1000s of images for training.

In those situations, dataset expansion and dataset generation may be worth exploring.

But there’s a problem with this approach — we haven’t exactly increased the ability of our model to generalize.

Yes, we have increased our training data by generating additional examples, but all of these examples are based on a super small dataset.

Keep in mind that our neural network is only as good as the data it was trained on.

We cannot expect to train a NN on a small amount of data and then expect it to generalize to data it was never trained on and has never seen before.

If you find yourself seriously considering dataset generation and dataset expansion, you should take a step back and instead invest your time gathering additional data or looking into methods of behavioral cloning (and then applying the type of data augmentation covered in the “Combining dataset generation and in-place augmentation” section below).

Type #2: In-place/on-the-fly data augmentation (most common)

Figure 5: Type #2 of data augmentation consists of on-the-fly image batch manipulations. This is the most common form of data augmentation with Keras.

The second type of data augmentation is called in-place data augmentation or on-the-fly data augmentation. This type of data augmentation is what Keras’ ImageDataGenerator class implements.

Using this type of data augmentation we want to ensure that our network, when trained, sees new variations of our data at each and every epoch.

Figure 5 demonstrates the process of applying in-place data augmentation:

  1. Step #1: An input batch of images is presented to the ImageDataGenerator.
  2. Step #2: The ImageDataGenerator transforms each image in the batch by a series of random translations, rotations, etc.
  3. Step #3: The randomly transformed batch is then returned to the calling function.

There are two important points that I want to draw your attention to:

  1. The ImageDataGenerator is not returning both the original data and the transformed data — the class only returns the randomly transformed data.
  2. We call this “in-place” and “on-the-fly” data augmentation because this augmentation is done at training time (i.e., we are not generating these examples ahead of time/prior to training).

When our model is being trained, we can think of our ImageDataGenerator class as “intercepting” the original data, randomly transforming it, and then returning it to the neural network for training, all the while the NN has no idea the data was modified!

I’ve written previous tutorials on the PyImageSearch blog where readers think that Keras’ ImageDataGenerator class is an “additive operation”, similar to the following (incorrect) figure:

Figure 6: How Keras data augmentation does not work.

In the above illustration the ImageDataGenerator accepts an input batch of images, randomly transforms the batch, and then returns both the original batch and modified data — again, this is not what the Keras ImageDataGenerator does. Instead, the ImageDataGenerator class will return just the randomly transformed data.

When I explain this concept to readers the next question is often:

But Adrian, what about the original training data? Why is it not used? Isn’t the original training data still useful for training?

Keep in mind that the entire point of the data augmentation technique described in this section is to ensure that the network sees “new” images that it has never “seen” before at each and every epoch.

If we included the original training data along with the augmented data in each batch, then the network would “see” the original training data multiple times, effectively defeating the purpose. Secondly, recall that the overall goal of data augmentation is to increase the generalizability of the model.

To accomplish this goal we “replace” the training data with randomly transformed, augmented data.

In practice, this leads to a model that performs better on our validation/testing data but perhaps performs slightly worse on our training data (due to the variations in data caused by the random transforms).

You’ll learn how to use the Keras ImageDataGenerator class later in this tutorial.

Type #3: Combining dataset generation and in-place augmentation

The final type of data augmentation seeks to combine both dataset generation and in-place augmentation — you may see this type of data augmentation when performing behavioral cloning.

A great example of behavioral cloning can be seen in self-driving car applications.

Creating self-driving car datasets can be extremely time consuming and expensive — a way around the issue is to instead use video games and car driving simulators.

Video game graphics have become so life-like that it’s now possible to use them as training data.

Therefore, instead of driving an actual vehicle, you can instead:

  • Play a video game
  • Write a program to play a video game
  • Use the underlying rendering engine of the video game

…all to generate actual data that can be used for training.

Once you have your training data you can go back and apply Type #2 data augmentation (i.e., in-place/on-the-fly data augmentation) to the data you gathered via your simulation.

Project structure

Before we dive into the code let’s first review our directory structure for the project:

$ tree --dirsfirst --filelimit 10
.
├── dogs_vs_cats_small
│   ├── cats [1000 entries]
│   └── dogs [1000 entries]
├── generated_dataset
│   ├── cats [100 entries]
│   └── dogs [100 entries]
├── pyimagesearch
│   ├── __init__.py
│   └── resnet.py
├── cat.jpg
├── dog.jpg
├── plot_dogs_vs_cats_no_aug.png
├── plot_dogs_vs_cats_with_aug.png
├── plot_generated_dataset.png
├── train.py
└── generate_images.py

7 directories, 9 files

First, there are two dataset directories which are not to be confused:

  • dogs_vs_cats_small/: A subset of the popular Kaggle Dogs vs. Cats competition dataset. In my curated subset, only 2,000 images (1,000 per class) are present (as opposed to the 25,000 images for the challenge).
  • generated_dataset/: We’ll create this generated dataset using the cat.jpg and dog.jpg images which are in the parent directory. We’ll utilize data augmentation Type #1 to generate this dataset automatically and fill this directory with images.

Next, we have our pyimagesearch module which contains our implementation of the ResNet CNN classifier.

Today we’ll review two Python scripts:

  • train.py: Used to train models for both Type #1 and Type #2 (and optionally Type #3 if the user so wishes) data augmentation techniques. We’ll perform three training experiments resulting in each of the three plot*.png files in the project folder.
  • generate_images.py: Used to generate a dataset from a single image using Type #1.

Let’s begin.

Implementing our training script

In the remainder of this tutorial we’ll be performing three experiments:

  1. Experiment #1: Generate a dataset via dataset expansion and train a CNN on it.
  2. Experiment #2: Use a subset of the Kaggle Dogs vs. Cats dataset and train a CNN without data augmentation.
  3. Experiment #3: Repeat the second experiment, but this time with data augmentation.

All of these experiments will be accomplished using the same Python script.

Open up the train.py script and let’s get started:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.resnet import ResNet
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
from keras.utils import np_utils
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
import os

On Lines 2-18 our necessary packages are imported. Line 10 is our ImageDataGenerator import from the Keras library — a class for data augmentation.

Let’s go ahead and parse our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset")
ap.add_argument("-a", "--augment", type=int, default=-1,
	help="whether or not 'on the fly' data augmentation should be used")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

Our script accepts three command line arguments via the terminal:

  • --dataset: The path to the input dataset.
  • --augment: Whether “on-the-fly” data augmentation should be used (refer to Type #2 above). By default, this method is not performed.
  • --plot: The path to the output training history plot.

Let’s proceed to initialize hyperparameters and load our image data:

# initialize the initial learning rate, batch size, and number of
# epochs to train for
INIT_LR = 1e-1
BS = 8
EPOCHS = 50

# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []

# loop over the image paths
for imagePath in imagePaths:
	# extract the class label from the filename, load the image, and
	# resize it to be a fixed 64x64 pixels, ignoring aspect ratio
	label = imagePath.split(os.path.sep)[-2]
	image = cv2.imread(imagePath)
	image = cv2.resize(image, (64, 64))

	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)

Training hyperparameters, including initial learning rate, batch size, and number of epochs to train for, are initialized on Lines 32-34.

From there Lines 39-53 grab imagePaths, load images, and populate our data and labels lists. The only image preprocessing we perform at this point is to resize each image to 64×64px.

Next, let’s finish preprocessing, encode our labels, and partition our data:

# convert the data into a NumPy array, then preprocess it by scaling
# all pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0

# encode the labels (which are currently strings) as integers and then
# one-hot encode them
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = np_utils.to_categorical(labels, 2)

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, random_state=42)

On Line 57, we convert data to a NumPy array as well as scale all pixel intensities to the range [0, 1]. This completes our preprocessing.

From there we perform “one-hot encoding” of our labels (Lines 61-63). This method of encoding our labels results in an array that may look like this:
array([[0., 1.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.]], dtype=float32)

For this sample of data, there are two cats ([1., 0.]) and five dogs ([0., 1.]) where the label corresponding to the image is marked as “hot”.

From there we partition our data into training and testing splits marking 75% of our data for training and the remaining 25% for testing (Lines 67 and 68).

Now, we are ready to initialize our data augmentation object:

# initialize our data augmenter as an "empty" image data generator
aug = ImageDataGenerator()

Line 71 initializes our empty data augmentation object (i.e., no augmentation will be performed). This is the default operation of this script.

Let’s check if we’re going to override the default with the --augment command line argument:
# check to see if we are applying "on the fly" data augmentation, and
# if so, re-instantiate the object
if args["augment"] > 0:
	print("[INFO] performing 'on the fly' data augmentation")
	aug = ImageDataGenerator(
		rotation_range=20,
		zoom_range=0.15,
		width_shift_range=0.2,
		height_shift_range=0.2,
		shear_range=0.15,
		horizontal_flip=True,
		fill_mode="nearest")

Line 75 checks to see if we are performing data augmentation. If so, we re-initialize the data augmentation object with random transformation parameters (Lines 77-84). As the parameters indicate, random rotations, zooms, shifts, shears, and flips will be performed during in-place/on-the-fly data augmentation.
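
If you’d like to sanity-check what these parameters actually do to an image before kicking off training, the generator’s random_transform method gives a quick preview. The snippet below is my own sketch (the output file name is a placeholder), not part of train.py:

# sketch: preview a single randomly transformed image
import cv2
from keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
	width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
	horizontal_flip=True, fill_mode="nearest")

image = cv2.imread("cat.jpg")                      # any test image
transformed = aug.random_transform(image.astype("float32"))
cv2.imwrite("transformed.jpg", transformed.astype("uint8"))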

Let’s compile and train our model:

# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=INIT_LR, momentum=0.9, decay=INIT_LR / EPOCHS)
model = ResNet.build(64, 64, 3, 2, (2, 3, 4),
	(32, 64, 128, 256), reg=0.0001)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
print("[INFO] training network for {} epochs...".format(EPOCHS))
H = model.fit_generator(
	aug.flow(trainX, trainY, batch_size=BS),
	validation_data=(testX, testY),
	steps_per_epoch=len(trainX) // BS,
	epochs=EPOCHS)

Lines 88-92 construct our ResNet model using Stochastic Gradient Descent optimization and learning rate decay. We use "binary_crossentropy" loss for this 2-class problem. If you have more than two classes, be sure to use "categorical_crossentropy".

Lines 96-100 then train our model. The aug object handles data augmentation in batches (although be sure to recall that the aug object will only perform data augmentation if the --augment command line argument was set).

Finally, we’ll evaluate our model, print statistics, and generate a training history plot:

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BS)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=le.classes_))

# plot the training loss and accuracy
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["acc"], label="train_acc")
plt.plot(N, H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

Line 104 makes predictions on the test set for evaluation purposes. A classification report is printed via Lines 105 and 106.

From there, Lines 109-120 generate and save an accuracy/loss training plot.

Generating a dataset/dataset expansion with data augmentation and Keras

In our first experiment, we will perform dataset expansion via data augmentation with Keras.

Our dataset will contain 2 classes and initially, the dataset will trivially contain only 1 image per class:

  • Cat: 1 image
  • Dog: 1 image

We’ll utilize Type #1 data augmentation (see the “Type #1: Dataset generation and expanding an existing dataset” section above) to generate a new dataset with 100 images per class:

  • Cat: 100 images
  • Dog: 100 images

Again, this is meant to be an example — in a real-world application you would have 100s of example images, but we’re keeping it simple here so you can learn the concept.

Generating the example dataset

Figure 7: Data augmentation with Keras performs random manipulations on images.

Before we can train our CNN we first need to generate an example dataset.

From our “Project Structure” section above you know that we have two example images in our root directory: cat.jpg and dog.jpg. We will use these example images to generate 100 new training images per class (200 images in total).

To see how we can use data augmentation to generate new examples, open up the generate_images.py file and follow along:
# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
import numpy as np
import argparse

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to the input image")
ap.add_argument("-o", "--output", required=True,
	help="path to output directory to store augmentation examples")
ap.add_argument("-t", "--total", type=int, default=100,
	help="# of training samples to generate")
args = vars(ap.parse_args())

Lines 2-6 import our necessary packages. Our ImageDataGenerator is imported on Line 2 and will handle our data augmentation with Keras.

From there, we’ll parse three command line arguments:

  • --image: The path to the input image. We’ll generate additional random, mutated versions of this image.
  • --output: The path to the output directory to store the data augmentation examples.
  • --total: The number of sample images to generate.

Let’s go ahead and load our image and initialize our data augmentation object:
# load the input image, convert it to a NumPy array, and then
# reshape it to have an extra dimension
print("[INFO] loading example image...")
image = load_img(args["image"])
image = img_to_array(image)
image = np.expand_dims(image, axis=0)

# construct the image generator for data augmentation then
# initialize the total number of images generated thus far
aug = ImageDataGenerator(
	rotation_range=30,
	zoom_range=0.15,
	width_shift_range=0.2,
	height_shift_range=0.2,
	shear_range=0.15,
	horizontal_flip=True,
	fill_mode="nearest")
total = 0

Our image is loaded and prepared for data augmentation via Lines 21-23. Image loading and processing is handled via Keras functionality (i.e., we aren’t using OpenCV).

From there, we initialize the ImageDataGenerator object. This object will facilitate performing random rotations, zooms, shifts, shears, and flips on our input image.

Next, we’ll construct a Python generator and put it to work until all of our images have been produced:

# construct the actual Python generator
print("[INFO] generating images...")
imageGen = aug.flow(image, batch_size=1, save_to_dir=args["output"],
	save_prefix="image", save_format="jpg")

# loop over examples from our image data augmentation generator
for image in imageGen:
	# increment our counter
	total += 1

	# if we have reached the specified number of examples, break
	# from the loop
	if total == args["total"]:
		break

We will use the imageGen to randomly transform the input image (Lines 39 and 40). This generator saves images as .jpg files to the specified output directory contained within args["output"].

Finally, we’ll loop over examples from our image data generator and count them until we’ve reached the required total number of images.

To run the generate_images.py script make sure you have used the “Downloads” section of the tutorial to download the source code and example images.

From there open up a terminal and execute the following command:

$ python generate_images.py --image cat.jpg --output generated_dataset/cats
[INFO] loading example image...
[INFO] generating images...

Check the output of the generated_dataset/cats directory and you will now see 100 images:
$ ls generated_dataset/cats/*.jpg | wc -l
     100

Let’s do the same now for the “dogs” class:

$ python generate_images.py --image dog.jpg --output generated_dataset/dogs
[INFO] loading example image...
[INFO] generating images...

And now check for the dog images:

$ ls generated_dataset/dogs/*.jpg | wc -l
     100

A visualization of the dataset generation via data augmentation can be seen in Figure 7 at the top of this section — notice how we have accepted a single input image (of me — not of a dog or cat) and then created 100 new training examples (48 of which are visualized) from that single image.

Experiment #1: Dataset generation results

We are now ready to perform our first experiment:

$ python train.py --dataset generated_dataset --plot plot_generated_dataset.png
[INFO] loading images...
[INFO] compiling model...
[INFO] training network for 50 epochs...
Epoch 1/50
18/18 [==============================] - 6s 319ms/step - loss: 1.1509 - acc: 0.8403 - val_loss: 8.2228 - val_acc: 0.5000
Epoch 2/50
18/18 [==============================] - 1s 43ms/step - loss: 0.2832 - acc: 0.9791 - val_loss: 0.7010 - val_acc: 0.8800
Epoch 3/50
18/18 [==============================] - 1s 38ms/step - loss: 1.0534 - acc: 0.9166 - val_loss: 4.2728 - val_acc: 0.7000
...
Epoch 48/50
18/18 [==============================] - 1s 38ms/step - loss: 0.2277 - acc: 1.0000 - val_loss: 0.2203 - val_acc: 1.0000
Epoch 49/50
18/18 [==============================] - 1s 38ms/step - loss: 0.2216 - acc: 1.0000 - val_loss: 0.2196 - val_acc: 1.0000
Epoch 50/50
18/18 [==============================] - 1s 38ms/step - loss: 0.2199 - acc: 1.0000 - val_loss: 0.2191 - val_acc: 1.0000
[INFO] evaluating network...
              precision    recall  f1-score   support

        cats       1.00      1.00      1.00        25
        dogs       1.00      1.00      1.00        25

   micro avg       1.00      1.00      1.00        50
   macro avg       1.00      1.00      1.00        50
weighted avg       1.00      1.00      1.00        50

Figure 8: Data augmentation with Keras Experiment #1 training accuracy/loss results.

Our results show that we were able to obtain 100% accuracy with little effort.

Of course, this is a trivial, contrived example. In practice, you would not be taking only a single image and then building a dataset of 100s or 1000s of images via data augmentation. Instead, you would have a dataset of 100s of images and then you would apply dataset generation to that dataset — but again, the point of this section was to demonstrate on a simple example so you could understand the process.

Training a network with in-place data augmentation

The more popular form of (image-based) data augmentation is called in-place data augmentation (see the “Type #2: In-place/on-the-fly data augmentation” section of this post for more details).

When performing in-place augmentation our Keras ImageDataGenerator will:
  1. Accept a batch of input images.
  2. Randomly transform the input batch.
  3. Return the transformed batch to the network for training.

We’ll explore how data augmentation can reduce overfitting and increase the ability of our model to generalize via two experiments.

To accomplish this task we’ll be using a subset of the Kaggle Dogs vs. Cats dataset:

  • Cats: 1,000 images
  • Dogs: 1,000 images

We’ll then train a variation of ResNet, from scratch, on this dataset with and without data augmentation.

Experiment #2: Obtaining a baseline (no data augmentation)

In our first experiment we’ll perform no data augmentation:

$ python train.py --dataset dogs_vs_cats_small --plot plot_dogs_vs_cats_no_aug.png
[INFO] loading images...
[INFO] compiling model...
[INFO] training network for 50 epochs...
Epoch 1/50
187/187 [==============================] - 13s 69ms/step - loss: 1.0943 - acc: 0.5087 - val_loss: 0.8961 - val_acc: 0.5500
Epoch 2/50
187/187 [==============================] - 7s 39ms/step - loss: 0.9141 - acc: 0.5194 - val_loss: 0.8928 - val_acc: 0.5300
Epoch 3/50
187/187 [==============================] - 7s 39ms/step - loss: 0.9090 - acc: 0.5207 - val_loss: 0.8842 - val_acc: 0.5560
...
Epoch 48/50
187/187 [==============================] - 7s 38ms/step - loss: 0.2570 - acc: 0.9639 - val_loss: 1.3453 - val_acc: 0.6680
Epoch 49/50
187/187 [==============================] - 7s 38ms/step - loss: 0.2609 - acc: 0.9666 - val_loss: 1.5542 - val_acc: 0.6200
Epoch 50/50
187/187 [==============================] - 7s 38ms/step - loss: 0.2539 - acc: 0.9699 - val_loss: 1.5584 - val_acc: 0.6420
[INFO] evaluating network...
              precision    recall  f1-score   support

        cats       0.64      0.69      0.67       257
        dogs       0.64      0.59      0.62       243

   micro avg       0.64      0.64      0.64       500
   macro avg       0.64      0.64      0.64       500
weighted avg       0.64      0.64      0.64       500

Looking at the raw classification report you’ll see that we’re obtaining 64% accuracy, but there’s a problem!

Take a look at the plot associated with our training:

Figure 9: For Experiment #2 we did not perform data augmentation. The result is a plot with strong indications of overfitting.

There is dramatic overfitting occurring — at approximately epoch 15 we see our validation loss start to rise while training loss continues to fall. By epoch 20 the rise in validation loss is especially pronounced.

This type of behavior is indicative of overfitting.

The solution is to (1) reduce model capacity, and/or (2) perform regularization.

Experiment #3: Improving our results (with data augmentation)

Let’s now investigate how data augmentation can act as a form of regularization:

$ python train.py --dataset dogs_vs_cats_small --augment 1 --plot plot_dogs_vs_cats_with_aug.png
[INFO] loading images...
[INFO] performing 'on the fly' data augmentation
[INFO] compiling model...
[INFO] training network for 50 epochs...
Epoch 1/50
187/187 [==============================] - 13s 69ms/step - loss: 1.0667 - acc: 0.5428 - val_loss: 0.9553 - val_acc: 0.5100
Epoch 2/50
187/187 [==============================] - 7s 40ms/step - loss: 0.9167 - acc: 0.5267 - val_loss: 0.8919 - val_acc: 0.5760
Epoch 3/50
187/187 [==============================] - 7s 40ms/step - loss: 0.8932 - acc: 0.5582 - val_loss: 0.8866 - val_acc: 0.5160
...
Epoch 48/50
187/187 [==============================] - 7s 40ms/step - loss: 0.6695 - acc: 0.7326 - val_loss: 0.7505 - val_acc: 0.6800
Epoch 49/50
187/187 [==============================] - 8s 40ms/step - loss: 0.6958 - acc: 0.7152 - val_loss: 0.7361 - val_acc: 0.7040
Epoch 50/50
187/187 [==============================] - 7s 40ms/step - loss: 0.6888 - acc: 0.7206 - val_loss: 0.7555 - val_acc: 0.6840
[INFO] evaluating network...
              precision    recall  f1-score   support

        cats       0.69      0.71      0.70       257
        dogs       0.68      0.66      0.67       243

   micro avg       0.68      0.68      0.68       500
   macro avg       0.68      0.68      0.68       500
weighted avg       0.68      0.68      0.68       500

We’re now up to 68% accuracy, an increase from our previous 64% accuracy.

But more importantly, we are no longer overfitting:

Figure 10: For Experiment #3, we performed data augmentation with Keras on batches of images in-place. Our training plot shows no signs of overfitting with this form of regularization.

Note how validation and training loss are falling together with little divergence. Similarly, classification accuracy for both the training and validation splits is growing together as well.

By using data augmentation we were able to combat overfitting!

In nearly all situations, unless you have very good reason not to, you should be performing data augmentation when training your own neural networks.

What’s next?

Figure 11: Deep Learning for Computer Vision with Python is the book I wish I had when I was getting started in the field of deep learning a number of years ago.

If you’d like to learn more about data augmentation, including:

  1. More details on the concept of data augmentation.
  2. How to perform data augmentation on your own datasets.
  3. Other forms of regularization to improve your model accuracy.
  4. My tips/tricks, suggestions, and best practices for training CNNs.

…then you’ll definitely want to refer to Deep Learning for Computer Vision with Python.

Data augmentation is just one of the sixty-three chapters in the book. You’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image classification, object detection, and segmentation.

To learn more about the book, and to grab the table of contents + free sample chapters, just click here!

Summary

In this tutorial, you learned about data augmentation and how to apply data augmentation via Keras’ ImageDataGenerator class.

You also learned about three types of data augmentation, including:

  1. Dataset generation and data expansion via data augmentation (less common).
  2. In-place/on-the-fly data augmentation (most common).
  3. Combining the dataset generator and in-place augmentation.

By default, Keras’ ImageDataGenerator class performs in-place/on-the-fly data augmentation, meaning that the class:

  1. Accepts a batch of images used for training.
  2. Takes this batch and applies a series of random transformations to each image in the batch.
  3. Replaces the original batch with the new, randomly transformed batch.
  4. Trains the CNN on this randomly transformed batch (i.e., the original data itself is not used for training).

All that said, we actually can take the ImageDataGenerator class and use it for dataset generation/expansion as well — we just need to use it to generate our dataset before training.

The final method of data augmentation, combining both in-place and dataset expansion, is rarely used. In those situations, you likely have a small dataset, need to generate additional examples via data augmentation, and then apply an additional round of augmentation/preprocessing at training time.

We wrapped up the guide by performing a number of experiments with data augmentation, noting that data augmentation is a form of regularization, enabling our network to generalize better to our testing/validation set.

This claim of data augmentation as regularization was verified in our experiments when we found that:

  1. Not applying data augmentation at training caused overfitting, while
  2. Applying data augmentation allowed for smooth training, no overfitting, and higher accuracy/lower loss.

You should apply data augmentation in all of your experiments unless you have a very good reason not to.

To learn more about data augmentation, including my best practices, tips, and suggestions, be sure to take a look at my book, Deep Learning for Computer Vision with Python.

I hope you enjoyed today’s tutorial!

To download the source code to this post (and receive email updates when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Keras ImageDataGenerator and Data Augmentation appeared first on PyImageSearch.

Video classification with Keras and Deep Learning

In this tutorial, you will learn how to perform video classification using Keras, Python, and Deep Learning.

Specifically, you will learn:

  • The difference between video classification and standard image classification
  • How to train a Convolutional Neural Network using Keras for image classification
  • How to take that CNN and then use it for video classification
  • How to use rolling prediction averaging to reduce “flickering” in results

This tutorial will serve as an introduction to the concept of working with deep learning in a temporal nature, paving the way for when we discuss Long Short-term Memory networks (LSTMs) and eventually human activity recognition.

To learn how to perform video classification with Keras and Deep learning, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Video Classification with Keras and Deep Learning

Videos can be understood as a series of individual images; and therefore, many deep learning practitioners would be quick to treat video classification as performing image classification a total of N times, where N is the total number of frames in a video.

There’s a problem with that approach though.

Video classification is more than just simple image classification — with video we can typically make the assumption that subsequent frames in a video are correlated with respect to their semantic contents.

If we are able to take advantage of the temporal nature of videos, we can improve our actual video classification results.

Neural network architectures such as Long short-term memory (LSTMs) and Recurrent Neural Networks (RNNs) are suited for time series data — two topics that we’ll be covering in later tutorials — but in some cases, they may be overkill. They are also resource-hungry and time-consuming when it comes to training over thousands of video files as you can imagine.

Instead, for some applications, all you may need is rolling averaging over predictions.

In the remainder of this tutorial, you’ll learn how to train a CNN for image classification (specifically sports classification) and then turn it into a more accurate video classifier by employing rolling averaging.

How is video classification different than image classification?


When performing image classification, we:

  1. Input an image to our CNN
  2. Obtain the predictions from the CNN
  3. Choose the label with the largest corresponding probability

Since a video is just a series of frames, a naive video classification method would be to:

  1. Loop over all frames in the video file
  2. For each frame, pass the frame through the CNN
  3. Classify each frame individually and independently of each other
  4. Choose the label with the largest corresponding probability
  5. Label the frame and write the output frame to disk

There’s a problem with this approach though — if you’ve ever tried to apply simple image classification to video classification you likely encountered a sort of “prediction flickering” as seen in the video at the top of this section. Notice how in this visualization we see our CNN shifting between two predictions: “football” and the correct label, “weight_lifting”.

The video is clearly of weightlifting and we would like our entire video to be labeled as such — but how can we prevent the CNN from “flickering” between these two labels?

A simple yet elegant solution is to utilize a rolling prediction average.

Our algorithm now becomes:

  1. Loop over all frames in the video file
  2. For each frame, pass the frame through the CNN
  3. Obtain the predictions from the CNN
  4. Maintain a list of the last K predictions
  5. Compute the average of the last K predictions and choose the label with the largest corresponding probability
  6. Label the frame and write the output frame to disk

The results of this algorithm can be seen in the video at the very top of this post — notice how the prediction flickering is gone and the entire video clip is correctly labeled!
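
As a preview of what we’ll implement, the heart of rolling prediction averaging is just a fixed-length deque of prediction vectors. Here’s a minimal sketch of my own (the queue size K is an assumption for illustration, not necessarily the value used later):

# sketch: rolling prediction averaging over the last K frames
from collections import deque
import numpy as np

K = 128
Q = deque(maxlen=K)  # automatically drops the oldest predictions

def smooth_label(preds, classLabels):
	# preds: the CNN's probability vector for the current frame
	Q.append(preds)
	# average the last K prediction vectors, then pick the top class
	results = np.array(Q).mean(axis=0)
	return classLabels[int(np.argmax(results))]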

In the remainder of this tutorial, you will learn how to implement this algorithm for video classification with Keras.

The Sports Classification Dataset

Figure 1: A sports dataset curated by GitHub user “anubhavmaity” using Google Image Search. We will use this image dataset for video classification with Keras. (image source)

The dataset we’ll be using here today is for sport/activity classification. The dataset was curated by Anubhav Maity by downloading photos from Google Images (you could also use Bing) for the following categories:

  1. Swimming
  2. Badminton
  3. Wrestling
  4. Olympic Shooting
  5. Cricket
  6. Football
  7. Tennis
  8. Hockey
  9. Ice Hockey
  10. Kabaddi
  11. WWE
  12. Gymnastics
  13. Weight lifting
  14. Volleyball
  15. Table tennis
  16. Baseball
  17. Formula 1
  18. Moto GP
  19. Chess
  20. Boxing
  21. Fencing
  22. Basketball

To save time, computational resources, and to demonstrate the actual video classification algorithm (the actual point of this tutorial), we’ll be training on a subset of the sports type dataset:

  • Football (i.e., soccer): 799 images
  • Tennis: 718 images
  • Weightlifting: 577 images

Let’s go ahead and download our dataset!

Downloading the Sports Classification Dataset

Go ahead and download the source code for today’s blog post from the “Downloads” link.

Extract the .zip and navigate into the project folder from your terminal:

$ unzip keras-video-classification.zip
$ cd keras-video-classification

From there, clone Anubhav Maity’s repo:

$ git clone https://github.com/anubhavmaity/Sports-Type-Classifier

The data we’ll be using today is now in the following path:

$ ls Sports-Type-Classifier/data | grep -Ev "urls|models|csv|pkl"
badminton
baseball
basketball
boxing
chess
cricket
fencing
football
formula1
gymnastics
hockey
ice_hockey
kabaddi
motogp
shooting
swimming
table_tennis
tennis
volleyball
weight_lifting
wrestling
wwe

Project Structure

Now that we have our project folder and Anubhav Maity’s repo sitting inside, let’s review our project structure:

$ tree --dirsfirst --filelimit 50
.
├── Sports-Type-Classifier
│   ├── data
│   │   ├── badminton [938 entries]
│   │   ├── baseball [746 entries]
│   │   ├── basketball [495 entries]
│   │   ├── boxing [705 entries]
│   │   ├── chess [481 entries]
│   │   ├── cricket [715 entries]
│   │   ├── fencing [635 entries]
│   │   ├── football [799 entries]
│   │   ├── formula1 [687 entries]
│   │   ├── gymnastics [719 entries]
│   │   ├── hockey [572 entries]
│   │   ├── ice_hockey [715 entries]
│   │   ├── kabaddi [454 entries]
│   │   ├── motogp [679 entries]
│   │   ├── shooting [536 entries]
│   │   ├── swimming [689 entries]
│   │   ├── table_tennis [713 entries]
│   │   ├── tennis [718 entries]
│   │   ├── volleyball [713 entries]
│   │   ├── weight_lifting [577 entries]
│   │   ├── wrestling [611 entries]
│   │   ├── wwe [671 entries]
|   ...
├── example_clips
│   ├── lifting.mp4
│   ├── soccer.mp4
│   └── tennis.mp4
├── model
│   ├── activity.model
│   └── lb.pickle
├── output
├── plot.png
├── predict_video.py
└── train.py

29 directories, 41 files

Our training image data is in the Sports-Type-Classifier/data/ directory, organized by class. There is additional clutter included with the GitHub repo that we won’t be using. I’ve omitted it from the project structure output above since we only care about the data. Furthermore, our training script will only train with football, tennis, and weightlifting data (however a simple list item change could allow you to train with other classes as well).

I’ve extracted three example_clips/ for us from YouTube to test our model upon. Credits for the three clips are at the bottom of the “Keras video classification results” section.

Our classifier files are in the model/ directory. Included are activity.model (the trained Keras model) and lb.pickle (our label binarizer).

An empty output/ folder is the location where we’ll store video classification results.

We’ll be covering two Python scripts in today’s tutorial:

  • train.py: A Keras training script that grabs the dataset class images that we care about, loads the ResNet50 CNN, and applies transfer learning/fine-tuning of ImageNet weights to train our model. The training script generates/outputs three files:
    • model/activity.model: A fine-tuned classifier based on ResNet50 for recognizing sports.
    • model/lb.pickle: A serialized label binarizer containing our unique class labels.
    • plot.png: The accuracy/loss training history plot.
  • predict_video.py: Loads an input video from the example_clips/ directory and proceeds to classify the video, ideally using today’s rolling average method.

Implementing our Keras training script

Let’s go ahead and implement our training script used to train a Keras CNN to recognize each of the sports activities.

Open up the train.py file and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.pooling import AveragePooling2D
from keras.applications import ResNet50
from keras.layers.core import Dropout
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.layers import Input
from keras.models import Model
from keras.optimizers import SGD
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import pickle
import cv2
import os

On Lines 2-24, we import necessary packages for training our classifier:

  • matplotlib: For plotting. Line 3 sets the backend so we can output our training plot to a .png image file.
  • keras: For deep learning. Namely, we’ll use the ResNet50 CNN. We’ll also work with the ImageDataGenerator which you can read about in last week’s tutorial.
  • sklearn: From scikit-learn we’ll use their implementation of a LabelBinarizer for one-hot encoding our class labels. The train_test_split function will segment our dataset into training and testing splits. We’ll also print a classification_report in a traditional format.
  • paths: Contains convenience functions for listing all image files in a given path. From there we’ll be able to load our images into memory.
  • numpy: Python’s de facto numerical processing library.
  • argparse: For parsing command line arguments.
  • pickle: For serializing our label binarizer to disk.
  • cv2: OpenCV.
  • os: The operating system module will be used to ensure we grab the correct file/path separator, which is OS-dependent.

Let’s go ahead and parse our command line arguments now:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset")
ap.add_argument("-m", "--model", required=True,
	help="path to output serialized model")
ap.add_argument("-l", "--label-bin", required=True,
	help="path to output label binarizer")
ap.add_argument("-e", "--epochs", type=int, default=25,
	help="# of epochs to train our network for")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

Our script accepts five command line arguments, the first three of which are required:

  • --dataset: The path to the input dataset.
  • --model: The path to our output Keras model file.
  • --label-bin: The path to our output label binarizer pickle file.
  • --epochs: How many epochs to train our network for — by default, we’ll train for 25 epochs, but as I’ll show later in the tutorial, 50 epochs can lead to better results.
  • --plot: The path to our output plot image file — by default it will be named plot.png and be placed in the same directory as this training script.

With our command line arguments parsed and in-hand, let’s proceed to initialize our LABELS and load our data:
# initialize the set of labels from the sports activity dataset we are
# going to train our network on
LABELS = set(["weight_lifting", "tennis", "football"])

# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []

# loop over the image paths
for imagePath in imagePaths:
	# extract the class label from the filename
	label = imagePath.split(os.path.sep)[-2]

	# if the label of the current image is not part of the labels
	# we are interested in, then ignore the image
	if label not in LABELS:
		continue

	# load the image, convert it to RGB channel ordering, and resize
	# it to be a fixed 224x224 pixels, ignoring aspect ratio
	image = cv2.imread(imagePath)
	image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
	image = cv2.resize(image, (224, 224))

	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)

Line 42 contains the set of class LABELS that our dataset will consist of. All labels not present in this set will be excluded from being part of our dataset. To save on training time, our dataset will only consist of weight lifting, tennis, and football/soccer. Feel free to work with other classes by making changes to the LABELS set.

All dataset imagePaths are gathered via Line 47 and the value contained in args["dataset"] (which comes from our command line arguments).

Lines 48 and 49 initialize our data and labels lists.

From there, we’ll begin looping over all imagePaths on Line 52.

In the loop, first we extract the class label from the imagePath (Line 54). Lines 58 and 59 then ignore any label not in the LABELS set.

Lines 63-65 load and preprocess an image. Preprocessing includes swapping color channels for OpenCV to Keras compatibility and resizing to 224×224px. Read more about resizing images for CNNs here. To learn more about the importance of preprocessing be sure to refer to Deep Learning for Computer Vision with Python.

The image and label are then added to the data and labels lists, respectively, on Lines 68 and 69.

Continuing on, we will one-hot encode our labels and partition our data:
# convert the data and labels to NumPy arrays
data = np.array(data)
labels = np.array(labels)

# perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, stratify=labels, random_state=42)

Lines 72 and 73 convert our data and labels lists into NumPy arrays.

One-hot encoding of labels takes place on Lines 76 and 77. One-hot encoding is a way of marking an active class label via binary array elements. For example, “football” may be array([1, 0, 0]) whereas “weightlifting” may be array([0, 0, 1]). Notice how only one class is “hot” at any given time.

Lines 81 and 82 then segment our data into training and testing splits using 75% of the data for training and the remaining 25% for testing.

Let’s initialize our data augmentation object:

# initialize the training data augmentation object
trainAug = ImageDataGenerator(
	rotation_range=30,
	zoom_range=0.15,
	width_shift_range=0.2,
	height_shift_range=0.2,
	shear_range=0.15,
	horizontal_flip=True,
	fill_mode="nearest")

# initialize the validation/testing data augmentation object (which
# we'll be adding mean subtraction to)
valAug = ImageDataGenerator()

# define the ImageNet mean subtraction (in RGB order) and set the
# mean subtraction value for each of the data augmentation
# objects
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
trainAug.mean = mean
valAug.mean = mean

Lines 85-96 initialize two data augmentation objects — one for training and one for validation. Data augmentation is nearly always recommended in deep learning for computer vision to increase model generalization.

The trainAug object performs random rotations, zooms, shifts, shears, and flips on our data. You can read more about the ImageDataGenerator and fit_generator here. As we reinforced last week, keep in mind that with Keras, images will be generated on-the-fly (it is not an additive operation).

No augmentation will be conducted for validation data (valAug), but we will perform mean subtraction.

The mean pixel value is set on Line 101. From there, Lines 102 and 103 set the mean attribute for trainAug and valAug so that mean subtraction will be conducted as images are generated during training/evaluation.

Now we’re going to perform what I like to call “network surgery” as part of fine-tuning:

# load the ResNet-50 network, ensuring the head FC layer sets are left
# off
baseModel = ResNet50(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))

# construct the head of the model that will be placed on top of
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(512, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(len(lb.classes_), activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the training process
for layer in baseModel.layers:
	layer.trainable = False

Lines 107 and 108 load ResNet50 pre-trained with ImageNet weights while chopping the head of the network off.

From there, Lines 112-121 assemble a new headModel and suture it onto the baseModel.

We’ll now freeze the baseModel so that it will not be trained via backpropagation (Lines 125 and 126).

Let’s go ahead and compile + train our model:
# compile our model (this needs to be done after setting our
# layers to be non-trainable)
print("[INFO] compiling model...")
opt = SGD(lr=1e-4, momentum=0.9, decay=1e-4 / args["epochs"])
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the head of the network for a few epochs (all other layers
# are frozen) -- this will allow the new FC layers to start to become
# initialized with actual "learned" values versus pure random
print("[INFO] training head...")
H = model.fit_generator(
	trainAug.flow(trainX, trainY, batch_size=32),
	steps_per_epoch=len(trainX) // 32,
	validation_data=valAug.flow(testX, testY),
	validation_steps=len(testX) // 32,
	epochs=args["epochs"])

Lines 131-133 compile our model with the Stochastic Gradient Descent (SGD) optimizer, using an initial learning rate of 1e-4 and learning rate decay. We use "categorical_crossentropy" loss for training with multiple classes. If you are working with only two classes, be sure to use "binary_crossentropy" loss.

A call to the fit_generator function on our model (Lines 139-144) trains our network with data augmentation and mean subtraction.

Keep in mind that our baseModel is frozen and we’re only training the head. This is known as “fine-tuning”. For a quick overview of fine-tuning, be sure to read my previous article. And for a more in-depth dive into fine-tuning, pick up a copy of the Practitioner Bundle of Deep Learning for Computer Vision with Python.

We’ll begin to wrap up by evaluating our network and plotting the training history:

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=lb.classes_))

# plot the training loss and accuracy
N = args["epochs"]
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

After we evaluate our network on the testing set and print a classification_report (Lines 148-150), we go ahead and plot our accuracy/loss curves with matplotlib (Lines 153-163). The plot is saved to disk via Line 164.

To wrap up, we’ll serialize our model and label binarizer (lb) to disk:
# serialize the model to disk
print("[INFO] serializing network...")
model.save(args["model"])

# serialize the label binarizer to disk
f = open(args["label_bin"], "wb")
f.write(pickle.dumps(lb))
f.close()

Line 168 saves our fine-tuned Keras model.

Finally, Lines 171-173 serialize and store our label binarizer in Python’s pickle format.

Training results

Before we can (1) classify frames in a video with our CNN and then (2) utilize our CNN for video classification, we first need to train the model.

Make sure you have used the “Downloads” section of this tutorial to download the source code (as well as the sports type dataset).

From there, open up a terminal and execute the following command:

$ python train.py --dataset Sports-Type-Classifier/data --model model/activity.model \
	--label-bin model/lb.pickle --epochs 50
[INFO] loading images...
[INFO] compiling model...
[INFO] training head...
Epoch 1/50
48/48 [==============================] - 21s 445ms/step - loss: 1.1552 - acc: 0.4329 - val_loss: 0.7308 - val_acc: 0.6699
Epoch 2/50
48/48 [==============================] - 18s 368ms/step - loss: 0.9412 - acc: 0.5801 - val_loss: 0.5987 - val_acc: 0.7346
Epoch 3/50
48/48 [==============================] - 17s 351ms/step - loss: 0.8054 - acc: 0.6504 - val_loss: 0.5181 - val_acc: 0.7613
Epoch 4/50
48/48 [==============================] - 17s 353ms/step - loss: 0.7215 - acc: 0.6966 - val_loss: 0.4497 - val_acc: 0.7922
Epoch 5/50
48/48 [==============================] - 17s 353ms/step - loss: 0.6253 - acc: 0.7572 - val_loss: 0.4530 - val_acc: 0.7984
...
Epoch 46/50
48/48 [==============================] - 17s 352ms/step - loss: 0.2325 - acc: 0.9167 - val_loss: 0.2024 - val_acc: 0.9198
Epoch 47/50
48/48 [==============================] - 17s 349ms/step - loss: 0.2284 - acc: 0.9212 - val_loss: 0.2058 - val_acc: 0.9280
Epoch 48/50
48/48 [==============================] - 17s 348ms/step - loss: 0.2261 - acc: 0.9212 - val_loss: 0.2448 - val_acc: 0.9095
Epoch 49/50
48/48 [==============================] - 17s 348ms/step - loss: 0.2170 - acc: 0.9153 - val_loss: 0.2259 - val_acc: 0.9280
Epoch 50/50
48/48 [==============================] - 17s 352ms/step - loss: 0.2109 - acc: 0.9225 - val_loss: 0.2267 - val_acc: 0.9218
[INFO] evaluating network...
                precision    recall  f1-score   support

      football       0.86      0.98      0.92       196
        tennis       0.95      0.88      0.91       179
weight_lifting       0.98      0.87      0.92       143

     micro avg       0.92      0.92      0.92       518
     macro avg       0.93      0.91      0.92       518
  weighted avg       0.92      0.92      0.92       518

[INFO] serializing network...

Figure 2: Sports video classification with Keras accuracy/loss training history plot.

As you can see, we’re obtaining ~92-93% accuracy after fine-tuning ResNet50 on the sports dataset.

Checking our model directory we can see that the fine-tuned model along with the label binarizer have been serialized to disk:

$ ls model/
activity.model	lb.pickle

We’ll then take these files and use them to implement rolling prediction averaging in the next section.

Video classification with Keras and rolling prediction averaging

We are now ready to implement video classification with Keras via rolling prediction averaging!

To create this script we’ll take advantage of the temporal nature of videos, specifically the assumption that subsequent frames in a video will have similar semantic contents.

By performing rolling prediction averaging we’ll be able to “smooth out” the predictions and avoid “prediction flickering”.

Let’s get started — open up the predict_video.py file and insert the following code:
# import the necessary packages
from keras.models import load_model
from collections import deque
import numpy as np
import argparse
import pickle
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
	help="path to trained serialized model")
ap.add_argument("-l", "--label-bin", required=True,
	help="path to  label binarizer")
ap.add_argument("-i", "--input", required=True,
	help="path to our input video")
ap.add_argument("-o", "--output", required=True,
	help="path to our output video")
ap.add_argument("-s", "--size", type=int, default=128,
	help="size of queue for averaging")
args = vars(ap.parse_args())

Lines 2-7 load necessary packages and modules. In particular, we’ll be using deque from Python’s collections module to assist with our rolling average algorithm.

Then, Lines 10-21 parse five command line arguments, four of which are required:

  • --model : The path to the input model generated from our previous training step.
  • --label-bin : The path to the serialized pickle-format label binarizer generated by the previous script.
  • --input : A path to an input video for video classification.
  • --output : The path to our output video which will be saved to disk.
  • --size : The max size of the queue for rolling averaging (128 by default). For some of our example results later on, we’ll set the size to 1 so that no averaging is performed.

Armed with our imports and command line args, we’re now ready to perform initializations:
# load the trained model and label binarizer from disk
print("[INFO] loading model and label binarizer...")
model = load_model(args["model"])
lb = pickle.loads(open(args["label_bin"], "rb").read())

# initialize the image mean for mean subtraction (in RGB order, to
# match the channel-swapped frames) along with the predictions queue
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
Q = deque(maxlen=args["size"])

Lines 25 and 26 load our model and label binarizer.

Line 30 then sets our mean subtraction value.

We’ll use a deque to implement our rolling prediction averaging. Our deque, Q, is initialized with a maxlen equal to the args["size"] value (Line 31).

Let’s initialize our cv2.VideoCapture object and begin looping over video frames:
# initialize the video stream, pointer to output video file, and
# frame dimensions
vs = cv2.VideoCapture(args["input"])
writer = None
(W, H) = (None, None)

# loop over frames from the video file stream
while True:
	# read the next frame from the file
	(grabbed, frame) = vs.read()

	# if the frame was not grabbed, then we have reached the end
	# of the stream
	if not grabbed:
		break

	# if the frame dimensions are empty, grab them
	if W is None or H is None:
		(H, W) = frame.shape[:2]

Line 35 grabs a pointer to our input video file stream. We use the VideoCapture class from OpenCV to read frames from our video stream.

Our video writer and frame dimensions are then initialized to None via Lines 36 and 37.

Line 40 begins our video classification while loop.

First, we grab a frame (Lines 42-47). If the frame was not grabbed, then we’ve reached the end of the video, at which point we’ll break from the loop.

Lines 50-51 then set our frame dimensions if required.

Let’s preprocess our frame:
# clone the output frame, then convert it from BGR to RGB
	# ordering, resize the frame to a fixed 224x224, and then
	# perform mean subtraction
	output = frame.copy()
	frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
	frame = cv2.resize(frame, (224, 224)).astype("float32")
	frame -= mean

A copy of our frame is made for output purposes (Line 56).

We then preprocess the frame using the same steps as our training script, including:
  • Swapping color channels (Line 57).
  • Resizing to 224×224px (Line 58).
  • Mean subtraction (Line 59).

Frame classification inference and rolling prediction averaging come next:

# make predictions on the frame and then update the predictions
	# queue
	preds = model.predict(np.expand_dims(frame, axis=0))[0]
	Q.append(preds)

	# perform prediction averaging over the current history of
	# previous predictions
	results = np.array(Q).mean(axis=0)
	i = np.argmax(results)
	label = lb.classes_[i]

Line 63 makes predictions on the current frame. The prediction results are added to the Q via Line 64.

From there, Lines 68-70 perform prediction averaging over the Q history, resulting in a class label for the rolling average. Broken down, these lines find the label with the largest corresponding probability across the average predictions.

Now that we have our resulting label, let’s annotate our output frame and write it to disk:
# draw the activity on the output frame
	text = "activity: {}".format(label)
	cv2.putText(output, text, (35, 50), cv2.FONT_HERSHEY_SIMPLEX,
		1.25, (0, 255, 0), 5)

	# check if the video writer is None
	if writer is None:
		# initialize our video writer
		fourcc = cv2.VideoWriter_fourcc(*"MJPG")
		writer = cv2.VideoWriter(args["output"], fourcc, 30,
			(W, H), True)

	# write the output frame to disk
	writer.write(output)

	# show the output image
	cv2.imshow("Output", output)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# release the file pointers
print("[INFO] cleaning up...")
writer.release()
vs.release()

Lines 73-75 draw the prediction on the output frame.

Lines 78-82 initialize the video writer if necessary. The output frame is written to the file (Line 85). Read more about writing to video files with OpenCV here.

The output is also displayed on the screen until the “q” key is pressed (or until the end of the video file is reached, as mentioned above) via Lines 88-93.

Finally, we’ll perform cleanup (Lines 97 and 98).

Keras video classification results

Now that we’ve implemented our video classifier with Keras, let’s put it to work.

Make sure you’ve used the “Downloads” section of this tutorial to download the source code.

From there, let’s apply video classification to a “tennis” clip — but let’s set the --size of the queue to 1, trivially turning video classification into standard image classification:
$ python predict_video.py --model model/activity.model \
	--label-bin model/lb.pickle \
	--input example_clips/tennis.mp4 \
	--output output/tennis_1frame.avi \
	--size 1
Using TensorFlow backend.
[INFO] loading model and label binarizer...
[INFO] cleaning up...


As you can see, there is quite a bit of label flickering — our CNN thinks certain frames are “tennis” (correct) while others are “football” (incorrect).

Let’s now use the default queue --size of 128, thus utilizing our prediction averaging algorithm to smooth the results:
$ python predict_video.py --model model/activity.model \
	--label-bin model/lb.pickle \
	--input example_clips/tennis.mp4 \
	--output output/tennis_128frames_smoothened.avi \
	--size 128
Using TensorFlow backend.
[INFO] loading model and label binarizer...
[INFO] cleaning up...

Notice how we’ve correctly labeled this video as “tennis”!

Let’s try a different example, this one of “weightlifting”. Again, we’ll start off by using a queue --size of 1:
$ python predict_video.py --model model/activity.model \
	--label-bin model/lb.pickle \
	--input example_clips/lifting.mp4 \
	--output output/lifting_1frame.avi \
	--size 1
Using TensorFlow backend.
[INFO] loading model and label binarizer...
[INFO] cleaning up...

We once again encounter prediction flickering.

However, if we use a queue --size of 128, our prediction averaging will obtain the desired result:
$ python predict_video.py --model model/activity.model \
	--label-bin model/lb.pickle \
	--input example_clips/lifting.mp4 \
	--output output/lifting_128frames_smoothened.avi \
	--size 128
Using TensorFlow backend.
[INFO] loading model and label binarizer...
[INFO] cleaning up...

Let’s try one final example:

$ python predict_video.py --model model/activity.model \
	--label-bin model/lb.pickle \
	--input example_clips/soccer.mp4 \
	--output output/soccer_128frames_smoothened.avi \
	--size 128
Using TensorFlow backend.
[INFO] loading model and label binarizer...
[INFO] cleaning up...

Here you can see the input video is correctly classified as “football” (i.e., soccer).

Notice that there is no frame flickering — our rolling prediction averaging smoothes out the predictions.

While simple, this algorithm can enable you to perform video classification with Keras!

In future tutorials, we’ll cover more advanced methods of activity and video classification, including LSTMs and RNNs.

Summary

In this tutorial, you learned how to perform video classification with Keras and deep learning.

A naïve algorithm for video classification would be to treat each individual frame of a video as independent from the others. This type of implementation will cause “label flickering”, where the CNN returns different labels for subsequent frames even though the frames should have the same labels!

More advanced neural networks, including LSTMs and the more general RNNs, can help combat this problem and lead to much higher accuracy. However, LSTMs and RNNs can be dramatic overkill dependent on what you are doing — in some situations, simple rolling prediction averaging will give you the results you need.

Using rolling prediction averaging, you maintain a list of the last K predictions from the CNN. You then average these last K predictions, select the label with the largest probability, and use that label to classify the current frame. The assumption here is that subsequent frames in a video will have similar semantic contents.
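Distilled to its essence, the algorithm needs only a deque and an argmax. Here is a minimal sketch (the RollingAverager wrapper is my own packaging of the same logic used in predict_video.py):

# rolling prediction averaging, distilled
from collections import deque
import numpy as np

class RollingAverager:
	def __init__(self, size=128):
		# the deque automatically discards predictions older than `size` frames
		self.Q = deque(maxlen=size)

	def update(self, preds):
		# append the newest probability vector, then return the index of
		# the label with the largest *average* probability over the queue
		self.Q.append(preds)
		return np.argmax(np.array(self.Q).mean(axis=0))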

If that assumption holds then we can take advantage of the temporal nature of videos, assuming that the previous frames are similar to the current frame.

The averaging, therefore, enables us to smooth out the predictions and make for a better video classifier.

In a future tutorial, we’ll discuss the more advanced LSTMs and RNNs as well. But in the meantime, take a look at this guide to deep learning action recognition.

To download the source code to this post, and to be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!


The post Video classification with Keras and Deep Learning appeared first on PyImageSearch.

Keras learning rate schedules and decay


In this tutorial, you will learn about learning rate schedules and decay using Keras. You’ll learn how to use Keras’ standard learning rate decay along with step-based, linear, and polynomial learning rate schedules.

When training a neural network, the learning rate is often the most important hyperparameter for you to tune:

  • Too small a learning rate and your neural network may not learn at all
  • Too large a learning rate and you may overshoot areas of low loss (or even overfit from the start of training)

When it comes to training a neural network, the most bang for your buck (in terms of accuracy) is going to come from selecting the correct learning rate and appropriate learning rate schedule.

But that’s easier said than done.

To help deep learning practitioners such as yourself learn how to assess a problem and choose an appropriate learning rate, we’ll be starting a series of tutorials on learning rate schedules, decay, and hyperparameter tuning with Keras.

By the end of this series, you’ll have a good understanding of how to appropriately and effectively apply learning rate schedules with Keras to your own deep learning projects.

To learn how to use Keras for learning rate schedules and decay, just keep reading

Looking for the source code to this post?
Jump right to the downloads section.

Keras learning rate schedules and decay

In the first part of this guide, we’ll discuss why the learning rate is the most important hyperparameter when it comes to training your own deep neural networks.

We’ll then dive into why we may want to adjust our learning rate during training.

From there I’ll show you how to implement and utilize a number of learning rate schedules with Keras, including:

  • The decay schedule built into most Keras optimizers
  • Step-based learning rate schedules
  • Linear learning rate decay
  • Polynomial learning rate schedules

We’ll then perform a number of experiments on the CIFAR-10 dataset using these learning rate schedules and evaluate which one performed the best.

These sets of experiments will serve as a template you can use when exploring your own deep learning projects and selecting an appropriate learning rate and learning rate schedule.

Why adjust our learning rate and use learning rate schedules?

To see why learning rate schedules are a worthwhile method to apply to help increase model accuracy and descend into areas of lower loss, consider the standard weight update formula used by nearly all neural networks:

W += -\alpha * gradient

Recall that the learning rate, \alpha, controls the “step” we make along the gradient. Larger values of \alpha imply that we are taking bigger steps, while smaller values of \alpha make tiny steps. If \alpha is zero, the network cannot make any steps at all (since the gradient multiplied by zero is zero).

Most initial learning rates (but not all) you encounter are typically in the set \alpha = \{1e^{-1}, 1e^{-2}, 1e^{-3}\}.

A network is then trained for a fixed number of epochs without changing the learning rate.

This method may work well in some situations, but it’s often beneficial to decrease our learning rate over time. When training our network, we are trying to find some location along our loss landscape where the network obtains reasonable accuracy. It doesn’t have to be a global minima or even a local minima, but in practice, simply finding an area of the loss landscape with reasonably low loss is “good enough”.

If we constantly keep a learning rate high, we could overshoot these areas of low loss as we’ll be taking too large of steps to descend into them.

Instead, what we can do is decrease our learning rate, thereby allowing our network to take smaller steps — this decreased learning rate enables our network to descend into areas of the loss landscape that are “more optimal” and would have otherwise been missed entirely by a larger learning rate.

We can, therefore, view the process of learning rate scheduling as:

  1. Finding a set of reasonably “good” weights early in the training process with a larger learning rate.
  2. Tuning these weights later in the process to find more optimal weights using a smaller learning rate.

We’ll be covering some of the most popular learning rate schedules in this tutorial.

Project structure

Once you’ve grabbed and extracted the “Downloads”, go ahead and use the tree command to inspect the project folder:
$ tree
.
├── output
│   ├── lr_linear_schedule.png
│   ├── lr_poly_schedule.png
│   ├── lr_step_schedule.png
│   ├── train_linear_schedule.png
│   ├── train_no_schedule.png
│   ├── train_poly_schedule.png
│   ├── train_standard_schedule.png
│   └── train_step_schedule.png
├── pyimagesearch
│   ├── __init__.py
│   ├── learning_rate_schedulers.py
│   └── resnet.py
└── train.py

2 directories, 12 files

Our output/ directory will contain learning rate and training history plots. The five experiments included in the results section correspond to the five plots with the train_*.png filenames, respectively.

The pyimagesearch module contains our ResNet CNN and our learning_rate_schedulers.py. The LearningRateDecay parent class simply includes a method called plot for plotting each of our types of learning rate decay. Also included are two subclasses, StepDecay and PolynomialDecay, which calculate the learning rate upon the completion of each epoch. Both of these classes contain the plot method via inheritance (an object-oriented concept).

Our training script, train.py, will train ResNet on the CIFAR-10 dataset. We’ll run the script with no learning rate decay as well as with standard, linear, step-based, and polynomial learning rate decay.

The standard “decay” schedule in Keras

The Keras library ships with a time-based learning rate scheduler — it is controlled via the decay parameter of the optimizer class (such as SGD, Adam, etc.).

To discover how we can utilize this type of learning rate decay, let’s take a look at an example of how we may initialize the ResNet architecture and the SGD optimizer:

# initialize our optimizer and model, then compile it
opt = SGD(lr=1e-2, momentum=0.9, decay=1e-2/epochs)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
	(64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

Here we initialize our SGD optimizer with an initial learning rate of 1e-2. We then set our decay to be the learning rate divided by the total number of epochs we are training the network for (a common rule of thumb).

Internally, Keras applies the following learning rate schedule to adjust the learning rate after every batch update — it is a misconception that Keras updates the standard decay after every epoch. Keep this in mind when using the default learning rate scheduler supplied with Keras.

The update formula follows: lr = init\_lr * \frac{1.0}{1.0 + decay * iterations}

Using the CIFAR-10 dataset as an example, we have a total of 50,000 training images.

If we use a batch size of 64, that implies there are a total of \lceil50000 / 64\rceil = 782 steps per epoch. Therefore, a total of 782 weight updates need to be applied before an epoch completes.

To see an example of the learning rate schedule calculation, let’s assume our initial learning rate is \alpha = 0.01 and our decay = \frac{0.01}{40} (with the assumption that we are training for forty epochs).

The learning rate at step zero, before any learning rate schedule has been applied, is:

lr = 0.01 * \frac{1.0}{1.0 + 0.00025 * (0 * 782)} = 0.01

At the beginning of epoch one we can see the following learning rate:

lr = 0.01 * \frac{1.0}{1.0 + 0.00025 * (1 * 782)} = 0.00836

Figure 1 below continues the calculation of Keras’ standard learning rate decay with \alpha = 0.01 and a decay of \frac{0.01}{40}:

Figure 1: Keras’ standard learning rate decay table.
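If you’d like to reproduce these values yourself, a quick sketch (not part of the code download) implements the update formula directly, assuming the 782 batch updates per epoch computed above:

# a quick sanity check of Keras' standard time-based decay
init_lr = 0.01
decay = 0.01 / 40
steps_per_epoch = 782

for epoch in range(5):
	# the decay is applied per batch update, so count total iterations
	iterations = epoch * steps_per_epoch
	lr = init_lr * (1.0 / (1.0 + decay * iterations))
	print("epoch {}: lr = {:.5f}".format(epoch, lr))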

You’ll learn how to utilize this type of learning rate decay inside the “Implementing our training script” and “Keras learning rate schedule results” sections of this post, respectively.

Our LearningRateDecay class

In the remainder of this tutorial, we’ll be implementing our own custom learning rate schedules and then incorporating them with Keras when training our neural networks.

To keep our code neat and tidy, not to mention follow object-oriented programming best practices, let’s first define a base LearningRateDecay class that we’ll subclass for each respective learning rate schedule.

Open up learning_rate_schedulers.py in your directory structure and insert the following code:
# import the necessary packages
import matplotlib.pyplot as plt
import numpy as np

class LearningRateDecay:
	def plot(self, epochs, title="Learning Rate Schedule"):
		# compute the set of learning rates for each corresponding
		# epoch
		lrs = [self(i) for i in epochs]

		# plot the learning rate schedule
		plt.style.use("ggplot")
		plt.figure()
		plt.plot(epochs, lrs)
		plt.title(title)
		plt.xlabel("Epoch #")
		plt.ylabel("Learning Rate")

Each and every learning rate schedule we implement will have a plot function, enabling us to visualize our learning rate over time.

With our base LearningRateDecay class implemented, let’s move on to creating a step-based learning rate schedule.

Step-based learning rate schedules with Keras

Figure 2: Keras learning rate step-based decay. The schedule in red is a decay factor of 0.5 and blue is a factor of 0.25.

One popular learning rate scheduler is step-based decay where we systematically drop the learning rate after specific epochs during training.

The step decay learning rate scheduler can be seen as a piecewise function, as visualized in Figure 2 — here the learning rate is constant for a number of epochs, then drops, is constant once more, then drops again, etc.

When applying step decay to our learning rate, we have two options:

  1. Define an equation that models the piecewise drop in learning rate that we wish to achieve.
  2. Use what I call the ctrl + c method to train a deep neural network. Here we train for some number of epochs at a given learning rate, eventually notice validation performance stagnating/stalling, then ctrl + c to stop the script, adjust our learning rate, and continue training.

We’ll primarily be focusing on equation-based, piecewise learning rate scheduling in this post.

The ctrl + c method is a bit more advanced and normally applied to larger datasets using deeper neural networks where the exact number of epochs required to obtain a reasonable model is unknown.

If you’d like to learn more about the ctrl + c method to training, please refer to Deep Learning for Computer Vision with Python.

When applying step decay, we often drop our learning rate by either (1) half or (2) an order of magnitude after every fixed number of epochs. For example, let’s suppose our initial learning rate is \alpha = 0.01.

After 10 epochs we drop the learning rate to \alpha = 0.005.

After another 10 epochs (i.e., the 20th total epoch), \alpha is dropped by a factor of 0.5 again, such that \alpha = 0.0025, etc.

In fact, this is the exact same learning rate schedule that is depicted in Figure 2 (red line).

The blue line displays a more aggressive drop factor of 0.25. Modeled mathematically, we can define our step-based decay equation as: \alpha_{E + 1} = \alpha_{I} \times F^{\lfloor(1 + E) / D\rfloor}

Where \alpha_{I} is the initial learning rate, F is the factor value controlling the rate at which the learning rate drops, D is the “drop every” epochs value, and E is the current epoch.

The larger our factor F is, the slower the learning rate will decay.

Conversely, the smaller the factor F, the faster the learning rate will decay.

All that said, let’s go ahead and implement our StepDecay class now.

Go back to your learning_rate_schedulers.py file and insert the following code:
class StepDecay(LearningRateDecay):
	def __init__(self, initAlpha=0.01, factor=0.25, dropEvery=10):
		# store the base initial learning rate, drop factor, and
		# epochs to drop every
		self.initAlpha = initAlpha
		self.factor = factor
		self.dropEvery = dropEvery

	def __call__(self, epoch):
		# compute the learning rate for the current epoch
		exp = np.floor((1 + epoch) / self.dropEvery)
		alpha = self.initAlpha * (self.factor ** exp)

		# return the learning rate
		return float(alpha)

Line 20 defines the constructor to our StepDecay class. We then store the initial learning rate (initAlpha), drop factor, and dropEvery epochs values (Lines 23-25).

The __call__ function:
  • Accepts the current epoch number.
  • Computes the learning rate based on the step-based decay formula detailed above (Lines 29 and 30).
  • Returns the computed learning rate for the current epoch (Line 33).

You’ll see how to use this learning rate schedule later in this post.
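In the meantime, as a quick sanity check (not part of the code download), you can instantiate the class and call it directly, here using the halving schedule from the example above:

# drop the learning rate by half every 10 epochs
sd = StepDecay(initAlpha=0.01, factor=0.5, dropEvery=10)
print(sd(0))   # 0.01   -- epochs 0-8 use the initial learning rate
print(sd(9))   # 0.005  -- the first drop
print(sd(19))  # 0.0025 -- the second drop

# visualize the "staircase" via the inherited plot method
sd.plot(range(0, 100))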

Linear and polynomial learning rate schedules in Keras

Two of my favorite learning rate schedules are linear learning rate decay and polynomial learning rate decay.

Using these methods our learning rate is decayed to zero over a fixed number of epochs.

The rate at which the learning rate is decayed is based on the parameters of the polynomial function. A smaller exponent/power to the polynomial will cause the learning rate to decay “more slowly”, whereas larger exponents decay the learning rate “more quickly”.

Conveniently, both of these methods can be implemented in a single class:

class PolynomialDecay(LearningRateDecay):
	def __init__(self, maxEpochs=100, initAlpha=0.01, power=1.0):
		# store the maximum number of epochs, base learning rate,
		# and power of the polynomial
		self.maxEpochs = maxEpochs
		self.initAlpha = initAlpha
		self.power = power

	def __call__(self, epoch):
		# compute the new learning rate based on polynomial decay
		decay = (1 - (epoch / float(self.maxEpochs))) ** self.power
		alpha = self.initAlpha * decay

		# return the new learning rate
		return float(alpha)

Line 36 defines the constructor to our PolynomialDecay class, which requires three values:
  • maxEpochs : The total number of epochs we’ll be training for.
  • initAlpha : The initial learning rate.
  • power : The power/exponent of the polynomial.

Note that if you set power=1.0 then you have a linear learning rate decay.

Lines 45 and 46 compute the adjusted learning rate for the current epoch while Line 49 returns the new learning rate.
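To see the effect of the power parameter (again, a quick check rather than part of the download), compare a linear schedule against a fifth-degree polynomial at the halfway point:

# linear decay (power=1.0) vs. a more aggressive polynomial (power=5.0)
linear = PolynomialDecay(maxEpochs=100, initAlpha=0.01, power=1.0)
poly = PolynomialDecay(maxEpochs=100, initAlpha=0.01, power=5.0)
print(linear(50))  # 0.005     -- half the epochs, half the learning rate
print(poly(50))    # 0.0003125 -- same epoch, decayed by a factor of 0.5^5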

Implementing our training script

Now that we’ve implemented a few different Keras learning rate schedules, let’s see how we can use them inside an actual training script.

Create a file named train.py in your editor and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.learning_rate_schedulers import StepDecay
from pyimagesearch.learning_rate_schedulers import PolynomialDecay
from pyimagesearch.resnet import ResNet
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.callbacks import LearningRateScheduler
from keras.optimizers import SGD
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
import argparse

Lines 2-16 import required packages. Line 3 sets the matplotlib backend so that we can create plots as image files. Our most notable imports include:
  • StepDecay : Our class which calculates and plots step-based learning rate decay.
  • PolynomialDecay : The class we wrote to calculate polynomial-based learning rate decay.
  • ResNet : Our Convolutional Neural Network implemented in Keras.
  • LearningRateScheduler : A Keras callback. We’ll pass our learning rate schedule to this class, and it will be called at the completion of each epoch to calculate our learning rate.

Let’s move on and parse our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-s", "--schedule", type=str, default="",
	help="learning rate schedule method")
ap.add_argument("-e", "--epochs", type=int, default=100,
	help="# of epochs to train for")
ap.add_argument("-l", "--lr-plot", type=str, default="lr.png",
	help="path to output learning rate plot")
ap.add_argument("-t", "--train-plot", type=str, default="training.png",
	help="path to output training plot")
args = vars(ap.parse_args())

Our script accepts four optional command line arguments when the script is called via the terminal:

  • --schedule : The learning rate schedule method. Valid options are “standard”, “step”, “linear”, and “poly”. By default, no learning rate schedule will be used.
  • --epochs : The number of epochs to train for (default=100).
  • --lr-plot : The path to the output learning rate plot. I suggest overriding the default of lr.png with a more descriptive path + filename.
  • --train-plot : The path to the output accuracy/loss training history plot. Again, I suggest a descriptive path + filename, otherwise training.png will be used by default.

With our imports and command line arguments in hand, now it’s time to initialize our learning rate schedule:

# store the number of epochs to train for in a convenience variable,
# then initialize the list of callbacks and learning rate scheduler
# to be used
epochs = args["epochs"]
callbacks = []
schedule = None

# check to see if step-based learning rate decay should be used
if args["schedule"] == "step":
	print("[INFO] using 'step-based' learning rate decay...")
	schedule = StepDecay(initAlpha=1e-1, factor=0.25, dropEvery=15)

# check to see if linear learning rate decay should be used
elif args["schedule"] == "linear":
	print("[INFO] using 'linear' learning rate decay...")
	schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=1)

# check to see if a polynomial learning rate decay should be used
elif args["schedule"] == "poly":
	print("[INFO] using 'polynomial' learning rate decay...")
	schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=5)

# if the learning rate schedule is not empty, add it to the list of
# callbacks
if schedule is not None:
	callbacks = [LearningRateScheduler(schedule)]

Line 33 sets the number of epochs we will train for directly from the command line args variable. From there we’ll initialize our callbacks list and learning rate schedule (Lines 34 and 35).

Lines 38-50 then select the learning rate schedule if args["schedule"] contains a valid value:
  • "step" : Initializes StepDecay.
  • "linear" : Initializes PolynomialDecay with power=1, indicating that a linear learning rate decay will be utilized.
  • "poly" : PolynomialDecay with power=5 will be used.

After you’ve reproduced the results of the experiments in this tutorial, be sure to revisit Lines 38-50 and insert additional elif statements of your own so you can run some of your own experiments!

Lines 54 and 55 initialize the LearningRateScheduler with the schedule as a single callback in the callbacks list. There is also a case where no learning rate decay will be used (i.e., if the --schedule command line argument is not overridden when the script is executed).

Let’s go ahead and load our data:

# load the training and testing data, then scale it into the
# range [0, 1]
print("[INFO] loading CIFAR-10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0

# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# initialize the label names for the CIFAR-10 dataset
labelNames = ["airplane", "automobile", "bird", "cat", "deer",
	"dog", "frog", "horse", "ship", "truck"]

Line 60 loads our CIFAR-10 data. The dataset is conveniently already split into training and testing sets.

The only preprocessing we must perform is to scale the data into the range [0, 1] (Lines 61 and 62).

Lines 65-67 binarize the labels, and then Lines 70 and 71 initialize our labelNames (i.e., classes). Do not add to or alter the labelNames list, as the order and length of the list matter.

Let’s initialize our decay parameter:
# initialize the decay for the optimizer
decay = 0.0

# if we are using Keras' "standard" decay, then we need to set the
# decay parameter
if args["schedule"] == "standard":
	print("[INFO] using 'keras standard' learning rate decay...")
	decay = 1e-1 / epochs

# otherwise, no learning rate schedule is being used
elif schedule is None:
	print("[INFO] no learning rate schedule being used")

Line 74 initializes our learning rate decay.

If we’re using the "standard" learning rate decay schedule, then the decay is initialized as 1e-1 / epochs (Lines 78-80).

With all of our initializations taken care of, let’s go ahead and compile + train our ResNet model:
# initialize our optimizer and model, then compile it
opt = SGD(lr=1e-1, momentum=0.9, decay=decay)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
	(64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
H = model.fit(trainX, trainY, validation_data=(testX, testY),
	batch_size=128, epochs=epochs, callbacks=callbacks, verbose=1)

Our Stochastic Gradient Descent (SGD) optimizer is initialized on Line 87 using our decay.

From there, Lines 88 and 89 build our ResNet CNN with an input shape of 32x32x3 and 10 classes. For an in-depth review of ResNet, be sure to refer to Chapter 10: ResNet of Deep Learning for Computer Vision with Python.

Our model is compiled with a loss function of "categorical_crossentropy" since our dataset has > 2 classes. If you use a different dataset with only 2 classes, be sure to use loss="binary_crossentropy".

Lines 94 and 95 kick off our training process. Notice that we’ve provided the callbacks as a parameter. The callbacks will be called when each epoch is completed. Our LearningRateScheduler contained therein will handle our learning rate decay (so long as callbacks isn’t an empty list).

Finally, let’s evaluate our network and generate plots:

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=128)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=labelNames))

# plot the training loss and accuracy
N = np.arange(0, args["epochs"])
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["acc"], label="train_acc")
plt.plot(N, H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on CIFAR-10")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["train_plot"])

# if the learning rate schedule is not empty, then save the learning
# rate plot
if schedule is not None:
	schedule.plot(N)
	plt.savefig(args["lr_plot"])

Lines 99-101 evaluate our network and print a classification report to our terminal.

Lines 104-115 generate and save our training history plot (accuracy/loss curves). Lines 119-121 generate a learning rate schedule plot, if applicable. We will inspect these plot visualizations in the next section.

Keras learning rate schedule results

With both our (1) learning rate schedules and (2) training scripts implemented, let’s run some experiments to see which learning rate schedule will perform best given:

  1. An initial learning rate of 1e-1
  2. Training for a total of 100 epochs

Experiment #1: No learning rate decay/schedule

As a baseline, let’s first train our ResNet model on CIFAR-10 with no learning rate decay or schedule:

$ python train.py --train-plot output/train_no_schedule.png
[INFO] loading CIFAR-10 data...
[INFO] no learning rate schedule being used
Train on 50000 samples, validate on 10000 samples
Epoch 1/100
50000/50000 [==============================] - 186s 4ms/step - loss: 2.1204 - acc: 0.4372 - val_loss: 1.9361 - val_acc: 0.5118
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.5150 - acc: 0.6440 - val_loss: 1.5013 - val_acc: 0.6413
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.2186 - acc: 0.7369 - val_loss: 1.2288 - val_acc: 0.7315
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.5220 - acc: 0.9568 - val_loss: 1.0223 - val_acc: 0.8372
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.5349 - acc: 0.9532 - val_loss: 1.0423 - val_acc: 0.8230
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.5209 - acc: 0.9579 - val_loss: 0.9883 - val_acc: 0.8421
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.84      0.86      0.85      1000
  automobile       0.90      0.93      0.92      1000
        bird       0.83      0.74      0.78      1000
         cat       0.67      0.79      0.73      1000
        deer       0.78      0.88      0.83      1000
         dog       0.85      0.69      0.76      1000
        frog       0.85      0.89      0.87      1000
       horse       0.94      0.82      0.88      1000
        ship       0.91      0.90      0.90      1000
       truck       0.90      0.90      0.90      1000

   micro avg       0.84      0.84      0.84     10000
   macro avg       0.85      0.84      0.84     10000
weighted avg       0.85      0.84      0.84     10000

Figure 3: Our first experiment for training ResNet on CIFAR-10 does not have learning rate decay.

Here we obtain ~85% accuracy, but as we can see, validation loss and accuracy stagnate past epoch ~15 and do not improve over the rest of the 100 epochs.

Our goal is now to utilize learning rate scheduling to beat our 85% accuracy (without overfitting).

Experiment #2: Keras standard optimizer learning rate decay

In our second experiment we are going to use Keras’ standard decay-based learning rate schedule:

$ python train.py --schedule standard --train-plot output/train_standard_schedule.png
[INFO] loading CIFAR-10 data...
[INFO] using 'keras standard' learning rate decay...
Train on 50000 samples, validate on 10000 samples
Epoch 1/100
50000/50000 [==============================] - 184s 4ms/step - loss: 2.1074 - acc: 0.4460 - val_loss: 1.8397 - val_acc: 0.5334
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.5068 - acc: 0.6516 - val_loss: 1.5099 - val_acc: 0.6663
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.2097 - acc: 0.7512 - val_loss: 1.2928 - val_acc: 0.7176
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1752 - acc: 1.0000 - val_loss: 0.8892 - val_acc: 0.8209
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1746 - acc: 1.0000 - val_loss: 0.8923 - val_acc: 0.8204
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1740 - acc: 1.0000 - val_loss: 0.8924 - val_acc: 0.8208
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.81      0.86      0.84      1000
  automobile       0.91      0.91      0.91      1000
        bird       0.75      0.71      0.73      1000
         cat       0.68      0.65      0.66      1000
        deer       0.78      0.81      0.79      1000
         dog       0.77      0.74      0.75      1000
        frog       0.83      0.88      0.85      1000
       horse       0.86      0.87      0.86      1000
        ship       0.90      0.90      0.90      1000
       truck       0.90      0.88      0.89      1000

   micro avg       0.82      0.82      0.82     10000
   macro avg       0.82      0.82      0.82     10000
weighted avg       0.82      0.82      0.82     10000

Figure 4: Our second learning rate decay schedule experiment uses Keras’ standard learning rate decay schedule.

This time we only obtain 82% accuracy, which goes to show that learning rate decay/scheduling will not always improve your results! You need to be careful about which learning rate schedule you utilize.

Experiment #3: Step-based learning rate schedule results

Let’s go ahead and perform step-based learning rate scheduling which will drop our learning rate by a factor of 0.25 every 15 epochs:

$ python train.py --schedule step --lr-plot output/lr_step_schedule.png --train-plot output/train_step_schedule.png
[INFO] using 'step-based' learning rate decay...
[INFO] loading CIFAR-10 data...
Train on 50000 samples, validate on 10000 samples
Epoch 1/100
50000/50000 [==============================] - 186s 4ms/step - loss: 2.2839 - acc: 0.4328 - val_loss: 1.8936 - val_acc: 0.5530
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.6425 - acc: 0.6213 - val_loss: 1.4599 - val_acc: 0.6749
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.2971 - acc: 0.7177 - val_loss: 1.3298 - val_acc: 0.6953
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1817 - acc: 1.0000 - val_loss: 0.7221 - val_acc: 0.8653
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1817 - acc: 1.0000 - val_loss: 0.7228 - val_acc: 0.8661
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1817 - acc: 1.0000 - val_loss: 0.7267 - val_acc: 0.8652
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.86      0.89      0.87      1000
  automobile       0.94      0.93      0.94      1000
        bird       0.83      0.80      0.81      1000
         cat       0.75      0.73      0.74      1000
        deer       0.82      0.87      0.84      1000
         dog       0.82      0.77      0.79      1000
        frog       0.89      0.90      0.90      1000
       horse       0.91      0.90      0.90      1000
        ship       0.93      0.93      0.93      1000
       truck       0.90      0.93      0.92      1000

   micro avg       0.87      0.87      0.87     10000
   macro avg       0.86      0.87      0.86     10000
weighted avg       0.86      0.87      0.86     10000

Figure 5: Experiment #3 demonstrates a step-based learning rate schedule (left). The training history accuracy/loss curves are shown on the right.

Figure 5 (left) visualizes our learning rate schedule. Notice how after every 15 epochs our learning rate drops, creating the “stair-step”-like effect.

Figure 5 (right) demonstrates the classic signs of step-based learning rate scheduling — you can clearly see our:

  1. Training/validation loss decrease
  2. Training/validation accuracy increase

…when our learning rate is dropped.

This is especially pronounced in the first two drops (epochs 15 and 30), after which the drops become less substantial.

This type of steep drop is a classic sign of a step-based learning rate schedule being utilized — if you see that type of training behavior in a paper, publication, or another tutorial, you can be almost sure that they used step-based decay!

Getting back to our accuracy, we’re now at 86-87% accuracy, an improvement from our first experiment.

Experiment #4: Linear learning rate schedule results

Let’s try using a linear learning rate schedule with Keras by setting power=1.0:
$ python train.py --schedule linear --lr-plot output/lr_linear_schedule.png --train-plot output/train_linear_schedule.png
[INFO] using 'linear' learning rate decay...
[INFO] loading CIFAR-10 data...
Epoch 1/100
50000/50000 [==============================] - 187s 4ms/step - loss: 2.0399 - acc: 0.4541 - val_loss: 1.6900 - val_acc: 0.5789
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.4623 - acc: 0.6588 - val_loss: 1.4535 - val_acc: 0.6557
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.1790 - acc: 0.7480 - val_loss: 1.2633 - val_acc: 0.7230
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1025 - acc: 1.0000 - val_loss: 0.5623 - val_acc: 0.8804
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1021 - acc: 1.0000 - val_loss: 0.5636 - val_acc: 0.8800
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1019 - acc: 1.0000 - val_loss: 0.5622 - val_acc: 0.8808
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.88      0.91      0.89      1000
  automobile       0.94      0.94      0.94      1000
        bird       0.84      0.81      0.82      1000
         cat       0.78      0.76      0.77      1000
        deer       0.86      0.90      0.88      1000
         dog       0.84      0.80      0.82      1000
        frog       0.90      0.92      0.91      1000
       horse       0.91      0.91      0.91      1000
        ship       0.93      0.94      0.93      1000
       truck       0.93      0.93      0.93      1000

   micro avg       0.88      0.88      0.88     10000
   macro avg       0.88      0.88      0.88     10000
weighted avg       0.88      0.88      0.88     10000

Figure 6: Linear learning rate decay (left) applied to ResNet on CIFAR-10 over 100 epochs with Keras. The training accuracy/loss curve is displayed on the right.

Figure 6 (left) shows that our learning rate is decreasing linearly over time while Figure 6 (right) visualizes our training history.

We’re now seeing a sharper drop in both training and validation loss, especially past approximately epoch 75; however, note that our training loss is dropping significantly faster than our validation loss — we may be at risk of overfitting.

Regardless, we are now obtaining 88% accuracy on our data, our best result thus far.

Experiment #5: Polynomial learning rate schedule results

As a final experiment, let’s apply polynomial learning rate scheduling with Keras by setting power=5:
$ python train.py --schedule poly --lr-plot output/lr_poly_schedule.png --train-plot output/train_poly_schedule.png
[INFO] using 'polynomial' learning rate decay...
[INFO] loading CIFAR-10 data...
Epoch 1/100
50000/50000 [==============================] - 186s 4ms/step - loss: 2.0470 - acc: 0.4445 - val_loss: 1.7379 - val_acc: 0.5576
Epoch 2/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.4793 - acc: 0.6448 - val_loss: 1.4536 - val_acc: 0.6513
Epoch 3/100
50000/50000 [==============================] - 171s 3ms/step - loss: 1.2080 - acc: 0.7332 - val_loss: 1.2363 - val_acc: 0.7183
...
Epoch 98/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1547 - acc: 1.0000 - val_loss: 0.6960 - val_acc: 0.8581
Epoch 99/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1547 - acc: 1.0000 - val_loss: 0.6883 - val_acc: 0.8596
Epoch 100/100
50000/50000 [==============================] - 171s 3ms/step - loss: 0.1548 - acc: 1.0000 - val_loss: 0.6942 - val_acc: 0.8601
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.86      0.89      0.87      1000
  automobile       0.94      0.94      0.94      1000
        bird       0.78      0.80      0.79      1000
         cat       0.75      0.70      0.73      1000
        deer       0.83      0.86      0.84      1000
         dog       0.81      0.78      0.79      1000
        frog       0.86      0.91      0.89      1000
       horse       0.92      0.88      0.90      1000
        ship       0.94      0.92      0.93      1000
       truck       0.91      0.92      0.91      1000

   micro avg       0.86      0.86      0.86     10000
   macro avg       0.86      0.86      0.86     10000
weighted avg       0.86      0.86      0.86     10000

Figure 7: Polynomial-based learning decay results using Keras.

Figure 7 (left) visualizes the fact that our learning rate is now decaying according to our polynomial function while Figure 7 (right) plots our training history.

This time we obtain ~86% accuracy.

Commentary on learning rate schedule experiments

Our best experiment was from our fourth experiment where we utilized a linear learning rate schedule.

But does that mean we should always use a linear learning rate schedule?

No, far from it, actually.

The key takeaway here is that for this:

  • Particular dataset (CIFAR-10)
  • Particular neural network architecture (ResNet)
  • Initial learning rate of 1e-1
  • Number of training epochs (100)

…linear learning rate scheduling worked the best.

No two deep learning projects are alike so you will need to run your own set of experiments, including varying the initial learning rate and the total number of epochs, to determine the appropriate learning rate schedule (additional commentary is included in the “Summary” section of this tutorial as well).

Do other learning rate schedules exist?

Other learning rate schedules exist, and in fact, any mathematical function that can accept an epoch or batch number as an input and returns a learning rate can be considered a “learning rate schedule”. Two other learning rate schedules you may encounter include (1) exponential learning rate decay, as well as (2) cyclical learning rates.

I don’t often use exponential decay, as I find that linear and polynomial decay are more than sufficient, but you are more than welcome to subclass the LearningRateDecay class and implement exponential decay if you so wish.
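As a starting point, a hypothetical ExponentialDecay subclass might look like the following; the class name and the decay constant k are my own assumptions, not part of the code download (np is already imported at the top of learning_rate_schedulers.py):

class ExponentialDecay(LearningRateDecay):
	def __init__(self, initAlpha=0.01, k=0.1):
		# store the base initial learning rate and the decay constant
		self.initAlpha = initAlpha
		self.k = k

	def __call__(self, epoch):
		# compute exponential decay: alpha = initAlpha * e^(-k * epoch)
		alpha = self.initAlpha * np.exp(-self.k * epoch)

		# return the learning rate
		return float(alpha)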

Cyclical learning rates, on the other hand, are very powerful — we’ll be covering cyclical learning rates in a tutorial later in this series.

How do I choose my initial learning rate?

You’ll notice that in this tutorial we did not vary our initial learning rate; we kept it constant at 1e-1.

When performing your own experiments you’ll want to combine:

  1. Learning rate schedules…
  2. …with different learning rates

Don’t be afraid to mix and match!

The four most important hyperparameters you’ll want to explore, include:

  1. Initial learning rate
  2. Number of training epochs
  3. Learning rate schedule
  4. Regularization strength/amount (L2, dropout, etc.)

Finding an appropriate balance of each can be challenging, but through many experiments, you’ll be able to find a recipe that leads to a highly accurate neural network.

If you’d like to learn more about my tips, suggestions, and best practices for learning rates, learning rate schedules, and training your own neural networks, refer to my book, Deep Learning for Computer Vision with Python.

Where can I learn more?

Figure 8: Deep Learning for Computer Vision with Python is a deep learning book for beginners, practitioners, and experts alike.

Today’s tutorial introduced you to learning rate decay and schedulers using Keras. To learn more about learning rates, schedulers, and how to write custom callback functions, refer to my book, Deep Learning for Computer Vision with Python.

Inside the book I cover:

  1. More details on learning rates (and how a solid understanding of the concept impacts your deep learning success)
  2. How to spot under/overfitting on-the-fly with a custom training monitor callback
  3. How to checkpoint your models with a custom callback
  4. My tips/tricks, suggestions, and best practices for training CNNs

Besides content on learning rates, you’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

Summary

In this tutorial, you learned how to utilize Keras for learning rate decay and learning rate scheduling.

Specifically, you discovered how to implement and utilize a number of learning rate schedules with Keras, including:

  • The decay schedule built into most Keras optimizers
  • Step-based learning rate schedules
  • Linear learning rate decay
  • Polynomial learning rate schedules

After implementing our learning rate schedules we evaluated each on a set of experiments on the CIFAR-10 dataset.

Our results demonstrated that for an initial learning rate of 1e-1, the linear learning rate schedule, decaying over 100 epochs, performed the best.

However, this does not mean that a linear learning rate schedule will always outperform other types of schedules. Instead, all this means is that for this:

  • Particular dataset (CIFAR-10)
  • Particular neural network architecture (ResNet)
  • Initial learning rate of 1e-1
  • Number of training epochs (100)

…that linear learning rate scheduling worked the best.

No two deep learning projects are alike so you will need to run your own set of experiments, including varying the initial learning rate, to determine the appropriate learning rate schedule.

I suggest you keep an experiment log that details any hyperparameter choices and associated results, that way you can refer back to it and double-down on experiments that look promising.

Do not expect that you’ll be able to train a neural network and be “one and done” — that rarely, if ever, happens. Instead, set the expectation with yourself that you’ll be running many experiments and tuning hyperparameters as you go along. Machine learning, deep learning, and artificial intelligence as a whole are iterative — you build on your previous results.

Later in this series of tutorials I’ll also be showing you how to select your initial learning rate.

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!


The post Keras learning rate schedules and decay appeared first on PyImageSearch.

Cyclical Learning Rates with Keras and Deep Learning


In this tutorial, you will learn how to use Cyclical Learning Rates (CLR) and Keras to train your own neural networks. Using Cyclical Learning Rates you can dramatically reduce the number of experiments required to tune and find an optimal learning rate for your model.

Today is part two in our three-part series on tuning learning rates for deep neural networks:

  1. Part #1: Keras learning rate schedules and decay (last week’s post)
  2. Part #2: Cyclical Learning Rates with Keras and Deep Learning (today’s post)
  3. Part #3: Automatically finding optimal learning rates (next week’s post)

Last week we discussed the concept of learning rate schedules and how we can decay and decrease our learning rate over time according to a set function (i.e., linear, polynomial, or step decrease).

However, there are two problems with basic learning rate schedules:

  1. We don’t know what the optimal initial learning rate is.
  2. Monotonically decreasing our learning rate may lead to our network getting “stuck” in plateaus of the loss landscape.

Cyclical Learning Rates take a different approach. Using CLRs, we now:

  1. Define a minimum learning rate
  2. Define a maximum learning rate
  3. Allow the learning rate to cyclically oscillate between the two bounds

In practice, using Cyclical Learning Rates leads to faster convergence with fewer experiments/hyperparameter updates.

And when we combine CLRs with next week’s technique on automatically finding optimal learning rates, you may never need to tune your learning rates again! (or at least run far fewer experiments to tune them).

To learn how to use Cyclical Learning Rates with Keras, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Cyclical Learning Rates with Keras and Deep Learning

In the first part of this tutorial, we’ll discuss Cyclical Learning Rates, including:

  • What are Cyclical Learning Rates?
  • Why should we use Cyclical Learning Rates?
  • How do we use Cyclical Learning Rates with Keras?

From there, we’ll implement CLRs and train a variation of GoogLeNet on the CIFAR-10 dataset — I’ll even point out how to use Cyclical Learning Rates with your own custom datasets.

Finally, we’ll review the results of our experiments and you’ll see firsthand how CLRs can reduce the number of learning rate trials you need to perform to find an optimal learning rate range.

What are cyclical learning rates?

Figure 1: Cyclical learning rates oscillate back and forth between two bounds when training, slowly increasing the learning rate after every batch update. To implement cyclical learning rates with Keras, you simply need a callback.

As we discussed in last week’s post, we can define learning rate schedules that monotonically decrease our learning rate after each epoch.

By decreasing our learning rate over time we can allow our model to (ideally) descend into lower areas of the loss landscape.

In practice, however, there are a few problems with a monotonically decreasing learning rate:

  • First, our model and optimizer are still sensitive to our initial choice in learning rate.
  • Second, we don’t know what the initial learning rate should be — we may need to perform 10s to 100s of experiments just to find our initial learning rate.
  • Finally, there is no guarantee that our model will descend into areas of low loss when lowering the learning rate.

To address these issues, Leslie Smith of the NRL introduced Cyclical Learning Rates in his 2015 paper, Cyclical Learning Rates for Training Neural Networks.

Now, instead of monotonically decreasing our learning rate, we instead:

  1. Define the lower bound on our learning rate (called “base_lr”).
  2. Define the upper bound on the learning rate (called the “max_lr”).
  3. Allow the learning rate to oscillate back and forth between these two bounds when training, slowly increasing and decreasing the learning rate after every batch update.

An example of a Cyclical Learning Rate can be seen in Figure 1.

Notice how our learning rate follows a triangular pattern. First, the learning rate is very small. Then, over time, the learning rate continues to grow until it hits the maximum value. The learning rate then descends back down to the base value. This cyclical pattern continues throughout training.

Why should we use Cyclical Learning Rates?

Figure 2: Monotonically decreasing learning rates could lead to a model that is stuck in saddle points or a local minima. By oscillating learning rates cyclically, we have more freedom in our initial learning rate, can break out of saddle points and local minima, and reduce learning rate tuning experimentation. (image source)

As mentioned above, Cyclical Learning Rates enables our learning rate to oscillate back and forth between a lower and upper bound.

So, why bother going through all the trouble?

Why not just monotonically decrease our learning rate, just as we’ve always done?

The first reason is that our network may become stuck in either saddle points or local minima, and the low learning rate may not be sufficient to break out of the area and descend into areas of the loss landscape with lower loss.

Secondly, our model and optimizer may be very sensitive to our initial learning rate choice. If we make a poor initial choice in learning rate, our model may be stuck from the very start.

Instead, we can use Cyclical Learning Rates to oscillate our learning rate between upper and lower bounds, enabling us to:

  1. Have more freedom in our initial learning rate choices.
  2. Break out of saddle points and local minima.

In practice, using CLRs leads to far fewer learning rate tuning experiments along with near identical accuracy to exhaustive hyperparameter tuning.

How do we use Cyclical Learning Rates?

Figure 3: Brad Kenstler’s implementation of deep learning Cyclical Learning Rates for Keras includes three modes — “triangular”, “triangular2”, and “exp_range”. Cyclical learning rates seek to handle the training issues that arise when your learning rate is too high or too low, as shown in this figure. (image source)

We’ll be using Brad Kenstler’s implementation of Cyclical Learning Rates for Keras.

In order to use this implementation we need to define a few values first:

  • Batch size: Number of training examples to use in a single forward and backward pass of the network during training.
  • Batch/Iteration: Number of weight updates per epoch (i.e., # of total training examples divided by the batch size).
  • Cycle: Number of iterations it takes for our learning rate to go from the lower bound, ascend to the upper bound, and then descend back to the lower bound again.
  • Step size: Number of iterations in a half cycle. Leslie Smith, the creator of CLRs, recommends that the step_size be (2-8) * training_iterations_in_epoch. In practice, I have found that step size multipliers of either 4 or 8 work well in most situations.

With these terms defined, let’s see how they work together to define a Cyclical Learning Rate policy.

The “triangular” policy

Figure 4: The “triangular” policy mode for deep learning cyclical learning rates with Keras.

The “triangular” Cyclical Learning Rate policy is a simple triangular cycle.

Our learning rate starts off at the base value and then starts to increase.

We reach the maximum learning rate value halfway through the cycle (i.e., the step size, or number of iterations in a half cycle). Once the maximum learning rate is hit, we then decrease the learning rate back to the base value. Again, it takes a half cycle to return to the base learning rate.

This entire process repeats (i.e., cyclical) until training is terminated.
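
To make the policy concrete, here is a minimal sketch of the triangular calculation, following the formula from Smith's paper; the standalone function and NumPy usage are my own illustration, not part of the callback we'll actually use:

import numpy as np

def triangular_lr(iteration, base_lr, max_lr, step_size):
	# which cycle we are in (each cycle spans 2 * step_size iterations)
	cycle = np.floor(1 + iteration / (2.0 * step_size))
	# distance from the peak of the current cycle, scaled to [0, 1]
	x = np.abs(iteration / float(step_size) - 2 * cycle + 1)
	# interpolate linearly between the base and maximum learning rates
	return base_lr + (max_lr - base_lr) * np.maximum(0, 1 - x)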

The “triangular2” policy

Figure 5: The deep learning cyclical learning rate “triangular2” policy mode is similar to “triangular” but cuts the max learning rate bound in half after every cycle.

The “triangular2” CLR policy is similar to the standard “triangular” policy, but instead cuts our max learning rate bound in half after every cycle.

The argument here is that we get the best of both worlds:

We can oscillate our learning rate to break out of saddle points/local minima…

…and at the same time decrease our learning rate, enabling us to descend into lower loss areas of the loss landscape.

Furthermore, reducing our maximum learning rate over time helps stabilize our training. Later epochs with the “triangular” policy may exhibit large jumps in both loss and accuracy — the “triangular2” policy will help stabilize these jumps.

The “exp_range” policy

Figure 6: The “exp_range” cyclical learning rate policy undergoes exponential decay for the max learning rate bound while still exhibiting the “triangular” policy characteristics.

The “exp_range” Cyclical Learning Rate policy is similar to the “triangular2” policy, but, as the name suggests, instead follows an exponential decay, giving you more fine-tuned control in the rate of decline in max learning rate.

Note: In practice, I don’t use the “exp_range” policy — the “triangular” and “triangular2” policies are more than sufficient in the vast majority of projects.
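
If you're curious how the three modes differ numerically, the sketch below scales the triangular amplitude the way Kenstler's callback does for each mode; the gamma value here is an assumed example (in the actual callback it is a user-supplied parameter):

def clr_scale(mode, cycle, iteration, gamma=0.99994):
	# sketch: per-mode amplitude scaling applied to the triangular wave
	if mode == "triangular":
		return 1.0                         # constant amplitude
	elif mode == "triangular2":
		return 1.0 / (2.0 ** (cycle - 1))  # halve the amplitude every cycle
	elif mode == "exp_range":
		return gamma ** iteration          # exponential decay per iteration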

How do I install Cyclical Learning Rates on my system?

The Cyclical Learning Rate implementation we are using is not pip-installable.

Instead, you can either:

  1. Use the “Downloads” section to grab the file and associated code/data for this tutorial.
  2. Download the clr_callback.py file from the GitHub repo (linked to above) and insert it into your project.

From there, let’s move on to training our first CNN using a Cyclical Learning Rate.

Project structure

Go ahead and run the tree command from within the keras-cyclical-learning-rates/ directory to print our project structure:
$ tree --dirsfirst
.
├── output
│   ├── triangular2_clr_plot.png
│   ├── triangular2_training_plot.png
│   ├── triangular_clr_plot.png
│   └── triangular_training_plot.png
├── pyimagesearch
│   ├── __init__.py
│   ├── clr_callback.py
│   ├── config.py
│   └── minigooglenet.py
└── train_cifar10.py

2 directories, 9 files

The output/ directory will contain our CLR and accuracy/loss plots.

The pyimagesearch module contains our cyclical learning rate callback class, MiniGoogLeNet CNN, and configuration file:

  • The clr_callback.py file contains the Cyclical Learning Rate callback which will update our learning rate automatically at the end of each batch update.
  • The minigooglenet.py file holds the MiniGoogLeNet CNN which we will train using CIFAR-10 data. We will not review MiniGoogLeNet today — please refer to Deep Learning for Computer Vision with Python to learn more about this CNN architecture.
  • Our config.py is simply a Python file containing configuration variables — we’ll review it in the next section.

Our training script, train_cifar10.py, trains MiniGoogLeNet using the CIFAR-10 dataset. The training script takes advantage of our CLR callback and configuration.

Our configuration file

Before we implement our training script, let’s first review our configuration file:

# import the necessary packages
import os

# initialize the list of class label names
CLASSES = ["airplane", "automobile", "bird", "cat", "deer", "dog",
	"frog", "horse", "ship", "truck"]

We will use the os module in our config so that we can construct operating system-agnostic paths directly (Line 2).

From there, our CIFAR-10 CLASSES are defined (Lines 5 and 6).

Let’s define our cyclical learning rate parameters:

# define the minimum learning rate, maximum learning rate, batch size,
# step size, CLR method, and number of epochs
MIN_LR = 1e-7
MAX_LR = 1e-2
BATCH_SIZE = 64
STEP_SIZE = 8
CLR_METHOD = "triangular"
NUM_EPOCHS = 96

The MIN_LR and MAX_LR define our base learning rate and maximum learning rate, respectively (Lines 10 and 11). I know these learning rates will work well when training MiniGoogLeNet per the experiments I have already run for Deep Learning for Computer Vision with Python — next week I will show you how to automatically find these values.

The BATCH_SIZE (Line 12) is the number of training examples per batch update.

We then have the STEP_SIZE, which is the number of batch updates in a half cycle (Line 13).

The CLR_METHOD controls our Cyclical Learning Rate policy (Line 14). Here we are using the “triangular” policy, as discussed in the previous section.

We can calculate the number of full CLR cycles in a given number of epochs via:

NUM_CLR_CYCLES = NUM_EPOCHS / STEP_SIZE / 2

For example, with NUM_EPOCHS = 96 and STEP_SIZE = 8, there will be a total of 96 / 8 / 2 = 6 full cycles.

Finally, we define our output plot paths/filenames:

# define the path to the output training history plot and cyclical
# learning rate plot
TRAINING_PLOT_PATH = os.path.sep.join(["output", "training_plot.png"])
CLR_PLOT_PATH = os.path.sep.join(["output", "clr_plot.png"])

We’ll plot a training history accuracy/loss plot as well as a cyclical learning rate plot. You may specify the paths + filenames of the plots on Lines 19 and 20.

Implementing our Cyclical Learning Rate training script

With our configuration defined, we can move on to implementing our training script.

Open up train_cifar10.py and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.minigooglenet import MiniGoogLeNet
from pyimagesearch.clr_callback import CyclicLR
from pyimagesearch import config
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np

Lines 2-15 import our necessary packages. Most notably, our CyclicLR (from the clr_callback file) is imported via Line 7. The matplotlib backend is set on Line 3 so that our plots can be written to disk at the end of the training process.

Next, let’s load our CIFAR-10 data:

# load the training and testing data, converting the images from
# integers to floats
print("[INFO] loading CIFAR-10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float")
testX = testX.astype("float")

# apply mean subtraction to the data
mean = np.mean(trainX, axis=0)
trainX -= mean
testX -= mean

# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# construct the image generator for data augmentation
aug = ImageDataGenerator(width_shift_range=0.1,
	height_shift_range=0.1, horizontal_flip=True,
	fill_mode="nearest")

Lines 20-22 load the CIFAR-10 image dataset. The data is pre-split into training and testing sets.

From there, we calculate the mean and apply mean subtraction (Lines 25-27). Mean subtraction is a normalization/scaling technique that results in improved model accuracy. For more details, please refer to the Practitioner Bundle of Deep Learning for Computer Vision with Python.

Labels are then binarized (Lines 30-32).

Next, we initialize our data augmentation object (Lines 35-37). Data augmentation increases model generalization by producing randomly mutated images from your dataset during training. I’ve written about data augmentation in-depth in Deep Learning for Computer Vision with Python as well as two blog posts (How to use Keras fit and fit_generator (a hands-on tutorial) and Keras ImageDataGenerator and Data Augmentation).

Let’s initialize (1) our model, and (2) our cyclical learning rate callback:

# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=config.MIN_LR, momentum=0.9)
model = MiniGoogLeNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# initialize the cyclical learning rate callback
print("[INFO] using '{}' method".format(config.CLR_METHOD))
clr = CyclicLR(
	mode=config.CLR_METHOD,
	base_lr=config.MIN_LR,
	max_lr=config.MAX_LR,
	step_size=config.STEP_SIZE * (trainX.shape[0] // config.BATCH_SIZE))

Our model is initialized with stochastic gradient descent (SGD) optimization and "categorical_crossentropy" loss (Lines 41-44). If you have only two classes in your dataset, be sure to set loss="binary_crossentropy".

Next, we initialize the cyclical learning rate callback via Lines 48-52. The CLR parameters are provided to the constructor. Now is a great time to review them at the top of the “How do we use Cyclical Learning Rates?” section above. The step_size follows Leslie Smith’s recommendation of setting it to be a multiple of the number of batch updates per epoch.
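
As a quick sanity check on those numbers (CIFAR-10 has 50,000 training images, and we are using the config values from above), the step size works out as follows; the per-epoch figure matches the 781 steps you'll see in the training logs below:

NUM_TRAIN = 50000                    # CIFAR-10 training set size
iter_per_epoch = NUM_TRAIN // 64     # 781 batch updates per epoch
step_size = 8 * iter_per_epoch       # 6,248 batch updates per half cycle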

Let’s train and evaluate our model using CLR now:

# train the network
print("[INFO] training network...")
H = model.fit_generator(
	aug.flow(trainX, trainY, batch_size=config.BATCH_SIZE),
	validation_data=(testX, testY),
	steps_per_epoch=trainX.shape[0] // config.BATCH_SIZE,
	epochs=config.NUM_EPOCHS,
	callbacks=[clr],
	verbose=1)

# evaluate the network and show a classification report
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=config.BATCH_SIZE)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=config.CLASSES))

Lines 56-62 launch training using the clr callback and data augmentation.

Then Lines 66-68 evaluate the network on the testing set and print a classification_report.

Finally, we’ll generate two plots:

# construct a plot that plots and saves the training history
N = np.arange(0, config.NUM_EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["acc"], label="train_acc")
plt.plot(N, H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(config.TRAINING_PLOT_PATH)

# plot the learning rate history
N = np.arange(0, len(clr.history["lr"]))
plt.figure()
plt.plot(N, clr.history["lr"])
plt.title("Cyclical Learning Rate (CLR)")
plt.xlabel("Training Iterations")
plt.ylabel("Learning Rate")
plt.savefig(config.CLR_PLOT_PATH)

Two plots are generated:

  • Training accuracy/loss history (Lines 71-82). The standard plot format included in most of my tutorials and every experiment of my deep learning book.
  • Learning rate history (Lines 86-91). This plot will help us to visually verify that our learning rate is oscillating according to our intentions.

Training with cyclical learning rates

We are now ready to train our CNN using Cyclical Learning Rates with Keras!

Make sure you’ve used the “Downloads” section of this post to download the source code — from there, open up a terminal and execute the following command:

$ python train_cifar10.py
[INFO] loading CIFAR-10 data...
[INFO] compiling model...
[INFO] using 'triangular' method
[INFO] training network...
Epoch 1/96
781/781 [==============================] - 100s 128ms/step - loss: 1.9371 - acc: 0.2864 - val_loss: 1.5345 - val_acc: 0.4460
Epoch 2/96
781/781 [==============================] - 97s 124ms/step - loss: 1.3926 - acc: 0.4956 - val_loss: 1.3106 - val_acc: 0.5454
Epoch 3/96
781/781 [==============================] - 96s 123ms/step - loss: 1.1397 - acc: 0.5906 - val_loss: 1.1766 - val_acc: 0.5875
Epoch 4/96
781/781 [==============================] - 96s 123ms/step - loss: 0.9629 - acc: 0.6600 - val_loss: 0.8561 - val_acc: 0.7054
Epoch 5/96
781/781 [==============================] - 96s 123ms/step - loss: 0.8405 - acc: 0.7067 - val_loss: 0.9309 - val_acc: 0.6837
...
Epoch 92/96
781/781 [==============================] - 96s 123ms/step - loss: 0.0604 - acc: 0.9798 - val_loss: 0.3729 - val_acc: 0.9047
Epoch 93/96
781/781 [==============================] - 96s 123ms/step - loss: 0.0475 - acc: 0.9842 - val_loss: 0.3786 - val_acc: 0.9087
Epoch 94/96
781/781 [==============================] - 96s 122ms/step - loss: 0.0383 - acc: 0.9878 - val_loss: 0.3567 - val_acc: 0.9114
Epoch 95/96
781/781 [==============================] - 96s 123ms/step - loss: 0.0303 - acc: 0.9904 - val_loss: 0.3551 - val_acc: 0.9144
Epoch 96/96
781/781 [==============================] - 96s 123ms/step - loss: 0.0275 - acc: 0.9921 - val_loss: 0.3347 - val_acc: 0.9168
[INFO] evaluating network...
10000/10000 [==============================] - 5s 517us/step
              precision    recall  f1-score   support

    airplane       0.92      0.94      0.93      1000
  automobile       0.94      0.97      0.95      1000
        bird       0.88      0.89      0.89      1000
         cat       0.84      0.83      0.83      1000
        deer       0.92      0.92      0.92      1000
         dog       0.89      0.85      0.87      1000
        frog       0.92      0.95      0.93      1000
       horse       0.95      0.94      0.95      1000
        ship       0.95      0.95      0.95      1000
       truck       0.95      0.94      0.94      1000

   micro avg       0.92      0.92      0.92     10000
   macro avg       0.92      0.92      0.92     10000
weighted avg       0.92      0.92      0.92     10000

As you can see, by using the “triangular” CLR policy we are obtaining 92% accuracy on our CIFAR-10 testing set.

The following figure shows the learning rate plot, demonstrating how it cyclically starts at our lower learning rate bound, increases to the maximum value at half a cycle, and then decreases again to the lower bound, completing the cycle:

Figure 7: Our first experiment of deep learning cyclical learning rates with Keras uses the “triangular” policy.

Examining our training history you can see the cyclical behavior of the learning rate:

Figure 8: Our first experiment training history plot shows the effects of the “triangular” policy on the accuracy/loss curves.

Notice the “wave” in the training accuracy and validation accuracy — the crest of each wave corresponds to the upper bound on the learning rate, while the trough, just before the next wave starts, corresponds to the base (lower) learning rate.

Just for fun, go back to Line 14 of the config.py file and update the CLR_METHOD to be “triangular2” instead of “triangular”:
# define the minimum learning rate, maximum learning rate, batch size,
# step size, CLR method, and number of epochs
MIN_LR = 1e-7
MAX_LR = 1e-2
BATCH_SIZE = 64
STEP_SIZE = 8
CLR_METHOD = "triangular2"
NUM_EPOCHS = 96

From there, train the network:

$ python train_cifar10.py
[INFO] loading CIFAR-10 data...
[INFO] compiling model...
[INFO] using 'triangular2' method
[INFO] training network...
Epoch 1/96
781/781 [==============================] - 100s 128ms/step - loss: 1.9570 - acc: 0.2773 - val_loss: 1.6161 - val_acc: 0.4176
Epoch 2/96
781/781 [==============================] - 96s 123ms/step - loss: 1.4089 - acc: 0.4848 - val_loss: 1.3741 - val_acc: 0.5182
Epoch 3/96
781/781 [==============================] - 96s 123ms/step - loss: 1.1591 - acc: 0.5876 - val_loss: 1.3593 - val_acc: 0.5570
Epoch 4/96
781/781 [==============================] - 96s 123ms/step - loss: 0.9897 - acc: 0.6520 - val_loss: 0.9174 - val_acc: 0.6727
Epoch 5/96
781/781 [==============================] - 96s 123ms/step - loss: 0.8656 - acc: 0.6985 - val_loss: 0.9247 - val_acc: 0.6855
...
Epoch 92/96
781/781 [==============================] - 96s 123ms/step - loss: 0.0715 - acc: 0.9766 - val_loss: 0.3728 - val_acc: 0.8963
Epoch 93/96
781/781 [==============================] - 96s 123ms/step - loss: 0.0692 - acc: 0.9772 - val_loss: 0.3744 - val_acc: 0.8957
Epoch 94/96
781/781 [==============================] - 96s 123ms/step - loss: 0.0707 - acc: 0.9770 - val_loss: 0.3689 - val_acc: 0.8959
Epoch 95/96
781/781 [==============================] - 96s 123ms/step - loss: 0.0694 - acc: 0.9780 - val_loss: 0.3685 - val_acc: 0.8972
Epoch 96/96
781/781 [==============================] - 96s 123ms/step - loss: 0.0687 - acc: 0.9773 - val_loss: 0.3708 - val_acc: 0.8978
[INFO] evaluating network...
10000/10000 [==============================] - 5s 520us/step
              precision    recall  f1-score   support

    airplane       0.90      0.92      0.91      1000
  automobile       0.94      0.96      0.95      1000
        bird       0.85      0.86      0.85      1000
         cat       0.82      0.79      0.80      1000
        deer       0.88      0.87      0.88      1000
         dog       0.85      0.84      0.85      1000
        frog       0.90      0.94      0.92      1000
       horse       0.94      0.92      0.93      1000
        ship       0.96      0.94      0.95      1000
       truck       0.93      0.94      0.94      1000

   micro avg       0.90      0.90      0.90     10000
   macro avg       0.90      0.90      0.90     10000
weighted avg       0.90      0.90      0.90     10000

This time we are obtaining 90% accuracy, slightly lower than using the “triangular” policy.

Our learning rate plot visualizes how our learning rate is cyclically updated:

Figure 9: Our second experiment uses the “triangular2” cyclical learning rate policy mode. The actual learning rates throughout training are shown in the plot.

Notice that after each complete cycle the maximum learning rate is halved. Since our maximum learning rate is decreasing after every cycle, our “waves” in the training and validation accuracy will be much less pronounced:

Figure 10: Our second experiment training history plot shows the effects of the “triangular2” policy on the accuracy/loss curves.

While the “triangular” Cyclical Learning Rate policy obtained slightly better accuracy, it also exhibited far more fluctuation and had more risk of overfitting.

In contrast, the “triangular2” policy, while being less accurate, is more stable in its training.

When performing your own experiments with Cyclical Learning Rates I suggest you test both policies and choose the one that balances both accuracy and stability (i.e., stable training with less risk of overfitting).

In next week’s tutorial, I’ll show you how you can automatically define your minimum and maximum learning rate bounds with Cyclical Learning Rates.

Where can I learn more?

Figure 11: My deep learning book, Deep Learning for Computer Vision with Python, is trusted by employees and students of top institutions.

If you’re interested in diving head-first into the world of computer vision/deep learning and discovering how to:

  • Train Convolutional Neural Networks on your own custom datasets
  • Replicate the results of state-of-the-art papers, including ResNet, SqueezeNet, VGGNet, and others
  • Train your own custom Faster R-CNN, Single Shot Detectors (SSDs), and RetinaNet object detectors
  • Use Mask R-CNN to train your own instance segmentation networks
  • Train Generative Adversarial Networks (GANs)

…then be sure to take a look at my book, Deep Learning for Computer Vision with Python!

My complete, self-study deep learning book is trusted by members of top machine learning schools, companies, and organizations, including Microsoft, Google, Stanford, MIT, CMU, and more!

Readers of my book have gone on to win Kaggle competitions, secure academic grants, and start careers in CV and DL using the knowledge they gained through study and practice.

My book not only teaches the fundamentals, but also teaches advanced techniques, best practices, and tools to ensure that you are armed with practical knowledge and proven coding recipes to tackle nearly any computer vision and deep learning problem presented to you in school, in your research, or in the modern workforce.

Be sure to take a look  — and while you’re at it, don’t forget to grab your (free) table of contents + sample chapters.

Summary

In this tutorial, you learned how to use Cyclical Learning Rates (CLRs) with Keras.

Unlike standard learning rate decay schedules, which monotonically decrease our learning rate, CLRs instead:

  • Define a minimum learning rate
  • Define a maximum learning rate
  • Allow the learning rate to cyclically oscillate between the two bounds

Cyclical Learning rates often lead to faster convergence with fewer experiments and hyperparameter tuning.

But there’s still a problem…

How do we know what the optimal lower and upper bounds of the learning rate are?

That’s a great question — and I’ll be answering it in next week’s post where I’ll show you how to automatically find optimal learning rate values.

I hope you enjoyed today’s post!

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Cyclical Learning Rates with Keras and Deep Learning appeared first on PyImageSearch.

Keras Learning Rate Finder


In this tutorial, you will learn how to automatically find learning rates using Keras. This guide provides a Keras implementation of fast.ai’s popular “lr_find” method.

Today is part three in our three-part series of learning rate schedules, policies, and decay using Keras:

  1. Part #1: Keras learning rate schedules and decay
  2. Part #2: Cyclical Learning Rates with Keras and Deep Learning (last week’s post)
  3. Part #3: Keras Learning Rate Finder (today’s post)

Last week we discussed Cyclical Learning Rates (CLRs) and how they can be used to obtain high accuracy models with fewer experiments and limited hyperparameter tuning.

The CLR method allows our learning rate to cyclically oscillate between a lower and upper bound; however, the question still remains: how do we know what good choices for our learning rates are?

Today I’ll be answering that question.

And by the time you have completed this tutorial, you will understand how to automatically find optimal learning rates for your neural network, saving you 10s, 100s or even 1000s of hours in compute time running experiments to tune your hyperparameters.

To learn how to find optimal learning rates using Keras, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras Learning Rate Finder

In the first part of this tutorial we’ll briefly discuss a simple, yet elegant, algorithm that can be used to automatically find optimal learning rates for your deep neural network.

From there, I’ll show you how to implement this method using the Keras deep learning framework.

We’ll apply the learning rate finder implementation to an example dataset, enabling us to obtain our optimal learning rates.

We’ll then take the found learning rates and fully train our network using a Cyclical Learning Rate policy.

How to automatically find optimal learning rates for your neural network architecture

Figure 1: Deep learning requires tuning of hyperparameters such as the learning rate. Using a learning rate finder in Keras, we can automatically find a suitable min/max learning rate for cyclical learning rate scheduling and apply it with success to our experiments. This figure contains the basic algorithm for a learning rate finder.

In last week’s tutorial on Cyclical Learning Rates (CLRs), we discussed Leslie Smith’s 2017 paper, Cyclical Learning Rates for Training Neural Networks.

Based on the title of the paper alone, the obvious contribution to the deep learning community by Dr. Smith is the Cyclical Learning Rate algorithm.

However, there is a second contribution that is arguably more important than CLRs — automatically finding learning rates. Inside the paper, Dr. Smith proposes a simple, yet elegant solution on how we can automatically find optimal learning rates for training.

Note: While the algorithm was introduced by Dr. Smith, it wasn’t popularized until Jeremy Howard of fast.ai suggested that his students use it. If you’re a fast.ai user, you may remember the learn.lr_find function — we’ll be implementing a similar method using Keras instead.

The automatic learning rate finder algorithm works like this:

  • Step #1: We start by defining an upper and lower bound on our learning rate. The lower bound should be very small (1e-10) and the upper bound should be very large (1e+1).
    • At 1e-10 the learning rate will be too small for our network to learn, while at 1e+1 the learning rate will be too large and our loss will explode.
    • Both of these are okay, and in fact, that’s what we hope to see!
  • Step #2: We then start training our network, starting at the lower bound.
    • After each batch update, we exponentially increase our learning rate.
    • We log the loss after each batch update as well.
  • Step #3: Training continues, and therefore the learning rate continues to increase until we hit our maximum learning rate value.
    • Typically, this entire training process/learning rate increase only takes 1-5 epochs.
  • Step #4: After training is complete we plot a smoothed loss over time, enabling us to see when the learning rate is both:
    • Just large enough for loss to decrease
    • And too large, to the point where loss starts to increase.

The following figure (which is the same as the header at the very top of the post, but included here again so you can easily follow along) visualizes the output of the learning rate finder algorithm on the CIFAR-10 dataset using a variation of GoogLeNet (which we’ll utilize later in this tutorial):

Figure 2: When inspecting a deep learning experiment, be sure to look at the loss landscape. This plot helps you identify when your learning rate is too low or too high.

While examining this plot, keep in mind that our learning rate is exponentially increasing after each batch update. After a given batch completes, we increase the learning rate for the next batch.

Notice that from 1e-10 to 1e-6 our loss is essentially flat — the learning rate is too small for the network to actually learn anything. Starting at approximately 1e-5 our loss starts to decline — this is the smallest learning rate where our network can actually learn.

By the time we hit 1e-4 our network is learning very quickly. At a little past 1e-2 there is a small increase in loss, but the big increase doesn’t begin until 1e-1.

Finally, by 1e+1 our loss has exploded — the learning rate is far too high for our model to learn.

Given this plot, I can visually examine it and pick out my lower and upper bounds on my learning rate for CLR:

  • Lower bound: 1e-5
  • Upper bound: 1e-2

If you are using a learning rate schedule/decay policy then you would select 1e-2 as your initial learning rate and then decrease it as you train.

However, in this tutorial we’ll be using a Cyclical Learning Rate policy so we need both the lower and upper bound.

But, the question still remains:

How did I generate that learning rate plot?

I’ll be answering that question later in this tutorial.

Project structure

Go ahead and grab the “Downloads” for today’s post which contain last week’s CLR implementation and this week’s Learning Rate Finder in addition to our training script.

Then, inspect the project layout with the tree command:
$ tree --dirsfirst
.
├── output
│   ├── clr_plot.png
│   ├── lrfind_plot.png
│   └── training_plot.png
├── pyimagesearch
│   ├── __init__.py
│   ├── clr_callback.py
│   ├── config.py
│   ├── learningratefinder.py
│   └── minigooglenet.py
└── train.py

2 directories, 9 files

Our output/ directory contains three plots:

  • The Cyclical Learning Rate graph.
  • The Learning Rate Finder loss vs. learning rate plot.
  • Our training accuracy/loss history plot.

The pyimagesearch module contains three classes and a configuration:

  • clr_callback.py: Contains the CyclicLR class which is implemented as a Keras callback.
  • learningratefinder.py: Holds the LearningRateFinder class — the focus of today’s tutorial.
  • minigooglenet.py: Has our MiniGoogLeNet CNN class. We won’t go into details today. Please refer to Deep Learning for Computer Vision with Python for a deep dive into GoogLeNet as well as MiniGoogLeNet.
  • config.py: A Python file which holds configuration parameters/variables.

After we review the config.py and learningratefinder.py files, we’ll walk through train.py, our Keras CNN training script.

Implementing our automatic Keras learning rate finder

Let’s now define a class named LearningRateFinder that encapsulates the algorithm logic from the “How to automatically find optimal learning rates for your neural network architecture” section above.

Our implementation is inspired by the LRFinder class from the ktrain library — my primary contribution is a few adjustments to the algorithm, including detailed documentation and comments for the source code, making it easier for readers to understand how the algorithm works.

Let’s go ahead and get started — open up the learningratefinder.py file in your project directory and insert the following code:
# import the necessary packages
from keras.callbacks import LambdaCallback
from keras import backend as K
import matplotlib.pyplot as plt
import numpy as np
import tempfile

class LearningRateFinder:
	def __init__(self, model, stopFactor=4, beta=0.98):
		# store the model, stop factor, and beta value (for computing
		# a smoothed, average loss)
		self.model = model
		self.stopFactor = stopFactor
		self.beta = beta

		# initialize our list of learning rates and losses,
		# respectively
		self.lrs = []
		self.losses = []

		# initialize our learning rate multiplier, average loss, best
		# loss found thus far, current batch number, and weights file
		self.lrMult = 1
		self.avgLoss = 0
		self.bestLoss = 1e9
		self.batchNum = 0
		self.weightsFile = None

Packages are imported on Lines 2-6. We’ll use a LambdaCallback to instruct our callback to run at the end of each batch update. We will use matplotlib to implement a method called plot_loss which plots the loss vs. learning rate.

The constructor to the LearningRateFinder class begins on Line 9.

First, we store the initialized model, the stopFactor which indicates when to bail out of training (if our loss becomes too large), and the beta value (used for averaging/smoothing our loss for visualization purposes).

We then initialize our lists of learning rates along with the loss values for the respective learning rates (Lines 17 and 18).

We proceed to perform a few more initializations, including:

  • lrMult: The learning rate multiplication factor.
  • avgLoss: Average loss over time.
  • bestLoss: The best loss we have found when training.
  • batchNum: The current batch number update.
  • weightsFile: Our path to the initial model weights (so we can reload the weights and set them to their initial values prior to training after our learning rate has been found).

The reset method is a convenience/helper function used to reset all variables from our constructor:
def reset(self):
		# re-initialize all variables from our constructor
		self.lrs = []
		self.losses = []
		self.lrMult = 1
		self.avgLoss = 0
		self.bestLoss = 1e9
		self.batchNum = 0
		self.weightsFile = None

When using our LearningRateFinder class, it must be able to work with both:

  • Data that can fit into memory.
  • Data that either (1) is loaded via a data generator and is therefore too large to fit into memory or (2) needs data augmentation/additional processing applied to it and thus uses a generator.

If the entire dataset can fit into memory and no data augmentation is applied, we can use Keras’ .fit method to train our model.

However, if we are using a data generator, either out of memory necessity or because we are applying data augmentation, we instead need to use Keras’ .fit_generator function for training.

Note: You can read about the differences between .fit and .fit_generator in this tutorial.

The is_data_iter method can determine if our input data is a raw NumPy array or if we are using a data generator:
def is_data_iter(self, data):
		# define the set of class types we will check for
		iterClasses = ["NumpyArrayIterator", "DirectoryIterator",
			 "DataFrameIterator", "Iterator", "Sequence"]

		# return whether our data is an iterator
		return data.__class__.__name__ in iterClasses

Here we are checking to see if the name of the input data class belongs to the set of data iterator classes defined in Lines 41 and 42.

If you are using your own custom data generator, simply encapsulate it in a class and then add your class name to the list of iterClasses.
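
For example, a minimal wrapper might look like the following sketch; the class name and structure here are purely illustrative:

# sketch: a hypothetical class-based wrapper around a custom generator
# so that is_data_iter recognizes it (the class name is made up)
class MyCustomIterator:
	def __init__(self, generator):
		self.generator = generator

	def __iter__(self):
		return iter(self.generator)

# then add "MyCustomIterator" to the iterClasses list above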

The on_batch_end function is responsible for updating our learning rate after every batch is complete (i.e., both the forward and backward pass):
def on_batch_end(self, batch, logs):
		# grab the current learning rate and log it to the list of
		# learning rates that we've tried
		lr = K.get_value(self.model.optimizer.lr)
		self.lrs.append(lr)

		# grab the loss at the end of this batch, increment the total
		# number of batches processed, compute the average loss,
		# smooth it, and update the losses list with the
		# smoothed value
		l = logs["loss"]
		self.batchNum += 1
		self.avgLoss = (self.beta * self.avgLoss) + ((1 - self.beta) * l)
		smooth = self.avgLoss / (1 - (self.beta ** self.batchNum))
		self.losses.append(smooth)

		# compute the maximum loss stopping factor value
		stopLoss = self.stopFactor * self.bestLoss

		# check to see whether the loss has grown too large
		if self.batchNum > 1 and smooth > stopLoss:
			# stop training and return from the method
			self.model.stop_training = True
			return

		# check to see if the best loss should be updated
		if self.batchNum == 1 or smooth < self.bestLoss:
			self.bestLoss = smooth

		# increase the learning rate
		lr *= self.lrMult
		K.set_value(self.model.optimizer.lr, lr)

We’ll be using this method as a Keras callback, and therefore we need to ensure our function accepts two variables Keras expects to see — batch and logs.

Lines 50 and 51 grab the current learning rate from our optimizer and then add it to our learning rates list (lrs).

Lines 57 and 58 grab the loss at the end of the batch and then increment our batch number.

Lines 58-61 compute the average loss, smooth it, and then update the losses list with the smoothed value.
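
This smoothing is a bias-corrected exponential moving average (the same correction used in the Adam optimizer). A quick numeric check with made-up loss values shows why the correction matters for early batches:

# sketch: bias-corrected loss smoothing with beta=0.98 and made-up losses
beta, avg = 0.98, 0.0
for batchNum, loss in enumerate([2.5, 2.4, 2.3], start=1):
	avg = (beta * avg) + ((1 - beta) * loss)
	smooth = avg / (1 - (beta ** batchNum))
	print(batchNum, round(smooth, 4))

# without the division by (1 - beta ** batchNum), the first smoothed
# value would be 0.02 * 2.5 = 0.05 instead of 2.5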

Line 64 computes our maximum loss value, which is a function of our stopFactor and the bestLoss found thus far.

If our smoothed loss (smooth) has grown to be larger than our stopLoss, then we stop training (Lines 67-70).

Lines 73 and 74 check to see if a new bestLoss has been found, and if so, we update the variable.

Finally, Line 77 increases our learning rate while Line 78 sets the learning rate value for the next batch.

Our next method, find, is responsible for automatically finding our optimal learning rate for training. We’ll call this method from our driver script when we are ready to find our learning rate:
def find(self, trainData, startLR, endLR, epochs=None,
		stepsPerEpoch=None, batchSize=32, sampleSize=2048,
		verbose=1):
		# reset our class-specific variables
		self.reset()

		# determine if we are using a data generator or not
		useGen = self.is_data_iter(trainData)

		# if we're using a generator and the steps per epoch is not
		# supplied, raise an error
		if useGen and stepsPerEpoch is None:
			msg = "Using generator without supplying stepsPerEpoch"
			raise Exception(msg)

		# if we're not using a generator then our entire dataset must
		# already be in memory
		elif not useGen:
			# grab the number of samples in the training data and
			# then derive the number of steps per epoch
			numSamples = len(trainData[0])
			stepsPerEpoch = np.ceil(numSamples / float(batchSize))

		# if no number of training epochs are supplied, compute the
		# training epochs based on a default sample size
		if epochs is None:
			epochs = int(np.ceil(sampleSize / float(stepsPerEpoch)))

Our find method accepts a number of parameters, including:

  • trainData: Our training data (either a NumPy array of data or a data generator).
  • startLR: The initial, starting learning rate.
  • endLR: The ending (maximum) learning rate at which to stop increasing.
  • epochs: The number of epochs to train for (if no value is supplied we’ll compute the number of epochs).
  • stepsPerEpoch: The total number of batch update steps per each epoch.
  • batchSize: The batch size of our optimizer.
  • sampleSize: The number of samples from trainData to use when finding the optimal learning rate.
  • verbose: Verbosity setting for Keras’ .fit and .fit_generator functions.

Line 84 resets our class-specific variables while Line 87 checks to see if we are using a data iterator or a raw NumPy array.

In the case that we are using a data generator and there is no stepsPerEpoch variable supplied, we raise an exception since we cannot possibly determine the stepsPerEpoch from a generator (see this tutorial for more information on why this is true).

Otherwise, if we are not using a data generator, we grab the numSamples from the trainData and then compute the number of stepsPerEpoch by dividing the number of data points by our batchSize (Lines 97-101).

Finally, if no number of epochs were supplied, we compute the number of training epochs by dividing the sampleSize by the number of stepsPerEpoch. My personal preference is to have the number of epochs ultimately be in the 3-5 range — long enough to obtain reliable results but not so long that I waste hours and hours of training time.
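
For example, with the default sampleSize of 2048 and Fashion MNIST's 60,000 training images at a batch size of 64, the calculation yields 3 epochs, exactly what you'll see in the training logs at the end of this post:

import numpy as np

# sketch: deriving the number of LR-finder epochs for Fashion MNIST
stepsPerEpoch = np.ceil(60000 / 64.0)              # 938 batch updates per epoch
epochs = int(np.ceil(2048 / float(stepsPerEpoch)))
print(epochs)                                      # 3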

Let’s move on to the heart of the automatic learning rate finder algorithm:

# compute the total number of batch updates that will take
		# place while we are attempting to find a good starting
		# learning rate
		numBatchUpdates = epochs * stepsPerEpoch

		# derive the learning rate multiplier based on the ending
		# learning rate, starting learning rate, and total number of
		# batch updates
		self.lrMult = (endLR / startLR) ** (1.0 / numBatchUpdates)

		# create a temporary file path for the model weights and
		# then save the weights (so we can reset the weights when we
		# are done)
		self.weightsFile = tempfile.mkstemp()[1]
		self.model.save_weights(self.weightsFile)

		# grab the *original* learning rate (so we can reset it
		# later), and then set the *starting* learning rate
		origLR = K.get_value(self.model.optimizer.lr)
		K.set_value(self.model.optimizer.lr, startLR)

Line 111 computes the total number of batch updates that will take place when finding our learning rate.

Using the numBatchUpdates, we derive the learning rate multiplication factor (lrMult) which is used to exponentially increase our learning rate (Line 116).
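
Plugging in the values used later in this tutorial (startLR=1e-10, endLR=1e+1, and 3 epochs of 938 batch updates on Fashion MNIST) gives a feel for the numbers; this is a quick standalone sketch, not part of the class:

# sketch: the per-batch multiplier for startLR=1e-10 and endLR=1e+1
startLR, endLR, numBatchUpdates = 1e-10, 1e+1, 3 * 938
lrMult = (endLR / startLR) ** (1.0 / numBatchUpdates)
print(lrMult)                                  # ~1.0090 per batch update
print(startLR * (lrMult ** numBatchUpdates))   # compounds back to ~1e+1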

Lines 121 and 122 create a temporary file to store our model’s initial weights. We’ll then restore these weights when our learning rate finder has completed running.

Next, we grab the original learning rate for the optimizer (Line 126), store it in origLR, and then instruct Keras to set the initial learning rate (startLR) for the optimizer.

Let’s create our LambdaCallback which will call our on_batch_end method each time a batch completes:
# construct a callback that will be called at the end of each
		# batch, enabling us to increase our learning rate as training
		# progresses
		callback = LambdaCallback(on_batch_end=lambda batch, logs:
			self.on_batch_end(batch, logs))

		# check to see if we are using a data iterator
		if useGen:
			self.model.fit_generator(
				trainData,
				steps_per_epoch=stepsPerEpoch,
				epochs=epochs,
				verbose=verbose,
				callbacks=[callback])

		# otherwise, our entire training data is already in memory
		else:
			# train our model using Keras' fit method
			self.model.fit(
				trainData[0], trainData[1],
				batch_size=batchSize,
				epochs=epochs,
				callbacks=[callback],
				verbose=verbose)

		# restore the original model weights and learning rate
		self.model.load_weights(self.weightsFile)
		K.set_value(self.model.optimizer.lr, origLR)

Lines 132 and 133 construct our callback — each time a batch completes, the on_batch_end method will be called to automatically update our learning rate.

In the event we’re using a data generator, we’ll train our model using the .fit_generator method (Lines 136-142). Otherwise, our entire training data exists in memory as a NumPy array, so we can use the .fit method (Lines 145-152).

After training is complete, we reset the initial model weights and learning rate value (Lines 155 and 156).

Our final method, plot_loss, is used to plot both our learning rates and losses over time:
def plot_loss(self, skipBegin=10, skipEnd=1, title=""):
		# grab the learning rate and losses values to plot
		lrs = self.lrs[skipBegin:-skipEnd]
		losses = self.losses[skipBegin:-skipEnd]

		# plot the learning rate vs. loss
		plt.plot(lrs, losses)
		plt.xscale("log")
		plt.xlabel("Learning Rate (Log Scale)")
		plt.ylabel("Loss")

		# if the title is not empty, add it to the plot
		if title != "":
			plt.title(title)

This exact method generated the plot you saw in Figure 2.

Later, in the “Finding our optimal learning rate with Keras” section of this tutorial, you’ll discover how to use the LearningRateFinder class we just implemented to automatically find optimal learning rates with Keras.

Our configuration file

Before we implement our actual training script, let’s create our configuration.

Open up the config.py file and insert the following code:
# import the necessary packages
import os

# initialize the list of class label names
CLASSES = ["top", "trouser", "pullover", "dress", "coat",
	"sandal", "shirt", "sneaker", "bag", "ankle boot"]

# define the minimum learning rate, maximum learning rate, batch size,
# step size, CLR method, and number of epochs
MIN_LR = 1e-5
MAX_LR = 1e-2
BATCH_SIZE = 64
STEP_SIZE = 8
CLR_METHOD = "triangular"
NUM_EPOCHS = 48

# define the path to the output learning rate finder plot, training
# history plot and cyclical learning rate plot
LRFIND_PLOT_PATH = os.path.sep.join(["output", "lrfind_plot.png"])
TRAINING_PLOT_PATH = os.path.sep.join(["output", "training_plot.png"])
CLR_PLOT_PATH = os.path.sep.join(["output", "clr_plot.png"])

We’ll be using the Fashion MNIST as the dataset for this project. Lines 5 and 6 set the class labels for the Fashion MNIST dataset.

Our Cyclical Learning Rate parameters are specified on Lines 10-15. The MIN_LR and MAX_LR will be found in the “Finding our optimal learning rate with Keras” section below, but we’re including them here as a matter of completeness. If you need to review these parameters, please refer to last week’s post.

We will output three types of plots and the paths are specified via Lines 19-21. Two of the plots will be the same format as last week’s (training and CLR). The new type is the “learning rate finder” plot.

Implementing our learning rate finder training script

Our learning rate finder script will be responsible for both:

  1. Automatically finding our initial learning rate using the LearningRateFinder class we implemented earlier in this guide.
  2. Taking the learning rate values we find and then training the network on the entire dataset.

Let’s go ahead and get started!

Open up the train.py file and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.learningratefinder import LearningRateFinder
from pyimagesearch.minigooglenet import MiniGoogLeNet
from pyimagesearch.clr_callback import CyclicLR
from pyimagesearch import config
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
from keras.datasets import fashion_mnist
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
import sys

Lines 2-19 import our required packages. Notice how we import our LearningRateFinder as well as our CyclicLR callback. We’ll be training MiniGoogLeNet on fashion_mnist. To learn more about the dataset, be sure to read Fashion MNIST with Keras and Deep Learning.

Let’s go ahead and parse a command line argument:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-f", "--lr-find", type=int, default=0,
	help="whether or not to find optimal learning rate")
args = vars(ap.parse_args())

We have a single command line argument, --lr-find, a flag which indicates whether or not to find the optimal learning rate.

Next, let’s prepare our data:

# load the training and testing data
print("[INFO] loading Fashion MNIST data...")
((trainX, trainY), (testX, testY)) = fashion_mnist.load_data()

# Fashion MNIST images are 28x28 but the network we will be training
# is expecting 32x32 images
trainX = np.array([cv2.resize(x, (32, 32)) for x in trainX])
testX = np.array([cv2.resize(x, (32, 32)) for x in testX])

# scale the pixel intensities to the range [0, 1]
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0

# reshape the data matrices to include a channel dimension (required
# for training)
trainX = trainX.reshape((trainX.shape[0], 32, 32, 1))
testX = testX.reshape((testX.shape[0], 32, 32, 1))

# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# construct the image generator for data augmentation
aug = ImageDataGenerator(width_shift_range=0.1,
	height_shift_range=0.1, horizontal_flip=True,
	fill_mode="nearest")

Here we:

  • Load the Fashion MNIST dataset (Line 29).
  • Resize each image from 28×28 images to 32×32 images (what our network expects the inputs to be) on Lines 33 and 34.
  • Scale pixel intensities to the range [0, 1] (Lines 37 and 38).
  • Binarize labels (Lines 46-48).
  • Construct our data augmentation object (Lines 51-53). Read more about data augmentation in my previous posts as well as in the Practitioner Bundle of Deep Learning for Computer Vision with Python.

From here, we can compile our model:
# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=config.MIN_LR, momentum=0.9)
model = MiniGoogLeNet.build(width=32, height=32, depth=1, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

Our model is compiled with SGD (Stochastic Gradient Descent) optimization. We use "categorical_crossentropy" loss since we have > 2 classes. Be sure to use "binary_crossentropy" if your dataset only has 2 classes.

The following if-then block handles the case when we’re finding the optimal learning rate:

# check to see if we are attempting to find an optimal learning rate
# before training for the full number of epochs
if args["lr_find"] > 0:
	# initialize the learning rate finder and then train with learning
	# rates ranging from 1e-10 to 1e+1
	print("[INFO] finding learning rate...")
	lrf = LearningRateFinder(model)
	lrf.find(
		aug.flow(trainX, trainY, batch_size=config.BATCH_SIZE),
		1e-10, 1e+1,
		stepsPerEpoch=np.ceil((len(trainX) / float(config.BATCH_SIZE))),
		batchSize=config.BATCH_SIZE)

	# plot the loss for the various learning rates and save the
	# resulting plot to disk
	lrf.plot_loss()
	plt.savefig(config.LRFIND_PLOT_PATH)

	# gracefully exit the script so we can adjust our learning rates
	# in the config and then train the network for our full set of
	# epochs
	print("[INFO] learning rate finder complete")
	print("[INFO] examine plot and adjust learning rates before training")
	sys.exit(0)

Line 64 checks to see if we should attempt to find optimal learning rates. Assuming so, we:

  • Initialize LearningRateFinder (Line 68).
  • Start training with a 1e-10 learning rate and exponentially increase it until we hit 1e+1 (Lines 69-73).
  • Plot the loss vs. learning rate and save the resulting figure (Lines 77 and 78).
  • Gracefully exit the script after printing a couple of messages to the user (Lines 83-85).

After this code executes, we now need to:

  1. Review the generated plot.
  2. Update config.py with our MIN_LR and MAX_LR, respectively.
  3. Train the network on our full dataset.

Assuming we have completed steps 1 and 2, now let’s handle the 3rd step where our minimum and maximum learning rate have already been found and updated in the config. In this case, it is time to initialize our Cyclical Learning Rate class and commence training:

# otherwise, we have already defined a learning rate space to train
# over, so compute the step size and initialize the cyclic learning
# rate method
stepSize = config.STEP_SIZE * (trainX.shape[0] // config.BATCH_SIZE)
clr = CyclicLR(
	mode=config.CLR_METHOD,
	base_lr=config.MIN_LR,
	max_lr=config.MAX_LR,
	step_size=stepSize)

# train the network
print("[INFO] training network...")
H = model.fit_generator(
	aug.flow(trainX, trainY, batch_size=config.BATCH_SIZE),
	validation_data=(testX, testY),
	steps_per_epoch=trainX.shape[0] // config.BATCH_SIZE,
	epochs=config.NUM_EPOCHS,
	callbacks=[clr],
	verbose=1)

# evaluate the network and show a classification report
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=config.BATCH_SIZE)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=config.CLASSES))

Our CyclicLR is initialized with the freshly set parameters in our config file (Lines 90-95).

Then our model is trained using .fit_generator with our aug data augmentation object and our clr callback (Lines 99-105).

Upon training completion, we proceed to evaluate our network on the testing set (Line 109). A classification_report is printed in the terminal for us to inspect.

Finally, let’s plot both our training history and CLR history:

# construct a plot that plots and saves the training history
N = np.arange(0, config.NUM_EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["acc"], label="train_acc")
plt.plot(N, H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(config.TRAINING_PLOT_PATH)

# plot the learning rate history
N = np.arange(0, len(clr.history["lr"]))
plt.figure()
plt.plot(N, clr.history["lr"])
plt.title("Cyclical Learning Rate (CLR)")
plt.xlabel("Training Iterations")
plt.ylabel("Learning Rate")
plt.savefig(config.CLR_PLOT_PATH)

Two plots are generated for the training procedure to accompany the learning rate finder plot that we already should have:

  • Training accuracy/loss history (Lines 114-125). The standard plot format included in most of my tutorials and every experiment of my deep learning book.
  • Learning rate history (Lines 128-134). This plot will help us to visually verify that our learning rate is oscillating according to our CLR intentions.

Finding our optimal learning rate with Keras

We are now ready to find our optimal learning rates!

Make sure you’ve used the “Downloads” section of the tutorial to download the source code — from there, open up a terminal and execute the following command:

$ python train.py --lr-find 1
[INFO] loading Fashion MNIST data...
[INFO] compiling model...
[INFO] finding learning rate...
Epoch 1/3
938/938 [==============================] - 110s 118ms/step - loss: 2.5725 - acc: 0.1049
Epoch 2/3
938/938 [==============================] - 107s 114ms/step - loss: 2.1090 - acc: 0.2554
Epoch 3/3
894/938 [==============================] - 107s 114ms/step - loss: 0.9854 - acc: 0.6744
[INFO] learning rate finder complete
[INFO] examine plot and adjust learning rates before training

Figure 3: Analyzing a deep learning loss vs. learning rate plot to find an optimal learning rate for Keras.

The --lr-find flag instructs our script to utilize the LearningRateFinder class to exponentially increase our learning rate from 1e-10 to 1e+1.

The learning rate is increased after each batch update until our max learning rate is achieved.
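
To make that concrete, here is a minimal sketch of an exponential learning rate sweep (the variable names below are illustrative, not the exact LearningRateFinder internals):

startLR, endLR = 1e-10, 1e+1
numBatchUpdates = 3 * 938  # e.g., 3 epochs of 938 batch updates each

# choose a per-batch multiplier such that
# startLR * (mult ** numBatchUpdates) == endLR
mult = (endLR / startLR) ** (1.0 / numBatchUpdates)

lr = startLR
for _ in range(numBatchUpdates):
	# the optimizer's learning rate would be set to `lr` before this
	# batch update, then increased for the next one
	lr *= mult

print("final learning rate: {:.2e}".format(lr))  # ~1.00e+01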

Figure 3 visualizes our loss:

  • Loss is stagnant and does not decrease from 1e-10 to approximately 1e-6, implying that the learning rate is too small and our network is not learning.
  • At approximately 1e-5 our loss starts to decrease, meaning that our learning rate is just large enough that the model can start to learn.
  • By 1e-4 and 1e-3 loss is dropping rapidly, indicating that this is a “sweet spot” where the network can learn quickly.
  • Just after 1e-2 there is a tiny increase in loss, implying that our learning rate may soon be too large.
  • At 1e-1 we see a larger jump, again, indicating that the learning rate is too large.
  • And by the time we reach 1e+1, our loss has exploded (the learning rate is far too large).

Based on this plot, we should choose 1e-5 as our base learning rate and 1e-2 as our max learning rate — these values indicate a learning rate just small enough for our network to start to learn, along with a learning rate large enough for our network to learn rapidly, but not so large that our loss explodes.

Be sure to refer back to Figure 2 if you need help analyzing Figure 3.

Training the entire network architecture

If you haven’t yet, go back to our config.py file and set MIN_LR = 1e-5 and MAX_LR = 1e-2, respectively:
# define the minimum learning rate, maximum learning rate, batch size,
# step size, CLR method, and number of epochs
MIN_LR = 1e-5
MAX_LR = 1e-2
BATCH_SIZE = 64
STEP_SIZE = 8
CLR_METHOD = "triangular"
NUM_EPOCHS = 48

From there, execute the following command:

$ python train.py
[INFO] loading Fashion MNIST data...
[INFO] compiling model...
[INFO] training network...
Epoch 1/48
937/937 [==============================] - 115s 122ms/step - loss: 1.2828 - acc: 0.5510 - val_loss: 0.6948 - val_acc: 0.7364
Epoch 2/48
937/937 [==============================] - 112s 120ms/step - loss: 0.5656 - acc: 0.7954 - val_loss: 0.4651 - val_acc: 0.8203
Epoch 3/48
937/937 [==============================] - 112s 119ms/step - loss: 0.4317 - acc: 0.8440 - val_loss: 0.4496 - val_acc: 0.8387

...
Epoch 46/48
937/937 [==============================] - 112s 119ms/step - loss: 0.0925 - acc: 0.9666 - val_loss: 0.1781 - val_acc: 0.9409
Epoch 47/48
937/937 [==============================] - 112s 119ms/step - loss: 0.0860 - acc: 0.9689 - val_loss: 0.1688 - val_acc: 0.9443
Epoch 48/48
937/937 [==============================] - 112s 119ms/step - loss: 0.0764 - acc: 0.9723 - val_loss: 0.1679 - val_acc: 0.9452
[INFO] evaluating network...
              precision    recall  f1-score   support

         top       0.91      0.90      0.90      1000
     trouser       1.00      0.99      0.99      1000
    pullover       0.93      0.92      0.93      1000
       dress       0.95      0.94      0.95      1000
        coat       0.92      0.93      0.92      1000
      sandal       0.99      0.99      0.99      1000
       shirt       0.83      0.84      0.83      1000
     sneaker       0.96      0.98      0.97      1000
         bag       0.99      0.99      0.99      1000
  ankle boot       0.98      0.96      0.97      1000

   micro avg       0.95      0.95      0.95     10000
   macro avg       0.95      0.95      0.95     10000
weighted avg       0.95      0.95      0.95     10000

Figure 4: Our one and only experiment’s training accuracy/loss curves are plotted while using the min/max learning rate determined via our Keras learning rate finder method.

After training completes, we obtain 95% accuracy on the Fashion MNIST dataset.

Again, keep in mind that we only ran one experiment to tune the learning rates to our model and we’re over 95% accurate in our predictions!

Our training history is plotted in Figure 4. Notice the characteristic “waves” from a triangular Cyclical Learning Rate policy — our training and validation loss gently rides the waves up and down as our learning rate increases and decreases.

Speaking of the Cyclical Learning Rate policy, let’s look at how the learning rate changes during training:

Figure 5: A plot of our cyclical learning rate schedule using the “triangular” mode. The min/max learning rates were determined with a Keras learning rate finder.

Here we complete three full cycles of the triangular policy, starting at an initial learning rate of 1e-5, increasing to 1e-2, and then decreasing back to 1e-5 again.
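
For reference, here is a minimal sketch of how the “triangular” policy computes the learning rate at a given batch update, following Smith’s formulation (the step size below mirrors our config: 8 epochs per half-cycle at 937 batch updates per epoch):

import numpy as np

def triangular_lr(iteration, base_lr=1e-5, max_lr=1e-2, step_size=8 * 937):
	# determine which cycle we are in and our position within it
	cycle = np.floor(1 + iteration / (2.0 * step_size))
	x = np.abs(iteration / float(step_size) - 2 * cycle + 1)

	# linearly interpolate between the base and max learning rates
	return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

print(triangular_lr(0))         # 1e-05 (start of a cycle)
print(triangular_lr(8 * 937))   # 0.01  (peak of the cycle)
print(triangular_lr(16 * 937))  # 1e-05 (end of the cycle)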

Using a combination of Cyclical Learning Rates along with our automatic learning rate finder implementation, we are able to obtain a highly accurate model in only a single experiment!

You can use this combination of CLRs and automatic learning rate finder in your own projects to help you quickly and effectively tune your learning rates.

Where can I learn more?

Figure 6: My deep learning book is the go-to resource for deep learning students, developers, researchers, and hobbyists, alike. Use the book to build your skillset from the bottom up, or read it to gain a deeper understanding. Don’t be left in the dust as the fast-paced AI revolution continues to accelerate.

Today’s tutorial introduced you to an automatic method to find learning rate parameters for Cyclical Learning Rates using Keras.

If you’re looking for more of my tips, suggestions, and best practices when training deep neural networks, be sure to refer to my book, Deep Learning for Computer Vision with Python.

Inside the book I cover:

  1. Deep learning fundamentals and theory without the unnecessary mathematical fluff. I present the basic equations and back them up with code walkthroughs that you can implement and easily understand.
  2. More details on learning rates (and how a solid understanding of the concept impacts your deep learning success).
  3. How to spot underfitting and overfitting on-the-fly and how to checkpoint your models with custom callbacks.
  4. My tips/tricks, suggestions, and best practices for training CNNs.

Besides content on learning rates, you’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

Summary

In this tutorial you learned how to create an automatic learning rate finder using the Keras deep learning library.

This automatic learning rate finder algorithm follows the suggestions of Dr. Leslie Smith in his 2017 paper, Cyclical Learning Rates for Training Neural Networks (but the method itself wasn’t popularized until Jeremy Howard suggested that fast.ai users utilize it).

The primary contribution of this guide is to provide a well-documented Keras learning rate finder, including an example of how to use it.

You can use this learning rate finder implementation when training your own neural networks using Keras, enabling you to:

  1. Bypass 10s to 100s of experiments tuning your learning rate.
  2. Obtain a high accuracy model with less effort.

Make sure you use this implementation when exploring learning rates with your own datasets and architectures.

I hope you enjoyed my final post in a series of tutorials on finding, tuning, and scheduling learning rates with Keras!

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Keras Learning Rate Finder appeared first on PyImageSearch.

Simple Scene Boundary/Shot Transition Detection with OpenCV


In this tutorial, you will learn how to implement a simple scene boundary/shot transition detector with OpenCV.

Two weeks ago I flew out to San Diego, CA for a vacation with my Dad.

We were on the first flight out of Philadelphia and landed in San Diego at 10:30 AM, but unfortunately, our hotel rooms weren’t ready yet so we couldn’t check-in.

Both of us were a bit tired from waking up early, and not to mention, the six-hour flight, so we decided to hang out in the hotel lounge and relax until our rooms were ready.

I settled into a cozy hotel lounge chair, opened my iPhone, and started scrolling through notifications I missed while flying. A text message from my buddy Justin caught my eye:

Dude, I picked up issue #7 of The Batman Who Laughs last night. It’s SO GOOD. You’re going to love it. Let me know when you’ve read it so we can talk about it.

I’m a bit of a comic book nerd and the DC’s latest series, The Batman Who Laughs, is hands down my favorite series of the year — and according to Justin, the final issue in the story arc had just been released!

I opened Google Maps to see if there was a local comic book shop where I could pick up a copy.

No dice.

The closest store was two miles away — I wasn’t going to trek that far and leave my Dad at the hotel.

I’m not the biggest fan of reading comics on a screen, but in this case, I decided to make an exception.

I opened up the comiXology app on my iPhone (an app that lets you purchase and download digital comics), found the latest issue of The Batman Who Laughs, paid my $5, and downloaded it to my iPhone.

Now, you might be thinking that it would be a terribly painful experience to read a comic on a
digital screen, especially a screen as small as an iPhone.

How in the world would you handle pinching, zooming, and scrolling on such a small screen? Wouldn’t that be a dreadful user experience, one that would potentially ruin reading a comic?

Trust me, it used to be.

But comic book publishers have wised up.

Instead of forcing you to use the equivalent of a mobile PDF viewer to read digital comics, publishers such as DC, Marvel, comiXology, etc. have locked up some poor intern in a dark dingy basement (hopefully kidding), and forced them to annotate the location of each panel in a comic.

Now, instead of having to manually scroll to the next panel in a comic, all you need to do is tap either the left or right side of your phone screen and then the app automatically scrolls/zooms for you!

It’s a pretty neat feature, and while I will always prefer having the physical comic in my hands, the automatic scroll and zoom is a real game-changer for reading digital comics.

After I finished reading The Batman Who Laughs #7 (which was absolutely AWESOME, by the way), I got to thinking…

…what if I could use computer vision to automatically extract each panel from a digital comic?

The general algorithm would work like this:

  1. Record my iPhone screen as I’m reading the comic in the comiXology app.
  2. Post-process the video by using OpenCV to detect when the comic app is finished zooming, scrolling, etc.
  3. Save the current comic book panel to disk.
  4. Repeat for the entire length of the video.

The end result would be a directory containing each individual panel of the comic book!

You might think that such an algorithm would be challenging and tedious to implement — but it’s actually quite easy once you realize that it’s just an application of scene boundary detection!

Today I’ll be showing you how to implement the exact algorithm detailed above (and in only 100 lines of code).

To learn how to perform scene boundary detection with OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Simple Scene Boundary/Shot Transition Detection with OpenCV

In the first part of this tutorial, we’ll discuss scene boundary and shot transition detection, including how computer vision algorithms can be used to automatically segment clips from video files.

From there, we’ll look at how scene boundary detection can be applied to digital comic books, essentially creating an algorithm that can automatically extract comic book panels from a video.

Finally, we’ll implement the actual algorithm and review the results.

What are “scene boundaries” and “shot transitions”?

Figure 1: A boundary scene transition from a TV series trailer, HBO’s Six Feet Under (video credit). We will learn to extract boundary scene transitions with OpenCV.

A “scene boundary” or a “shot transition” in a movie, TV show, or video is a natural way for the producers and editors to indicate that the current scene is complete and the next scene is starting. Shot transitions, when done correctly, are nonintrusive to the person watching the video — we intuitively process that the current “chapter” of the story is over and the next chapter is starting.

The most common type of scene boundary is a “fade to black”, similar to Figure 1 above. Notice how, as the current scene ends, the video fades to black and then fades back in, indicating that the next scene is starting.

Using computer vision, we seek to automatically find these scene boundaries, enabling us to create a “smart video segmentation” system.

Such a video segmentation system could be used to automatically:

  • Extract scenes from a movie/TV show, saving each scene/clip in a separate video file.
  • Segment commercials from a given TV station for advertising research.
  • Summarize slower moving sports games, such as baseball, golf, and American football.

Scene boundary detection is an active area of research and one that has existed for years.

I encourage you to use Google Scholar to search for the phrase “scene boundary detection” if you are interested in reading some of the publications.

Applying the scene boundary detection algorithm to digital comic books

Figure 2: Using a motion detection based OpenCV method, we can extract boundary scenes from videos in less than 100 lines of Python code.

In the context of this tutorial, we’ll be applying scene boundary detection through a real-world application — automatically extracting frames/panels from a digital comic book.

You might be thinking:

But Adrian, digital comic books are images, not video! How are you going to apply scene boundary detection to an image?

You’re right, comics are images — but part of being a computer vision practitioner is learning how to look at problems differently.

Using my iPhone, I can:

  • Start recording my screen
  • Open up the comiXology app
  • Open a specific comic in the app
  • Start reading the comic
  • Tap my screen when I want to advance to the next panel
  • Stop the video recording when I’m done reading the comic

Figure 2 at the top of this section demonstrates how I’ve turned a digital comic book into a video file. Notice how the app animates the pinching, zooming, and scrolling. After the app has finished “moving” the comic, the frame settles out, and I’m left with the current panel.

The trick to extracting comic book panels from this video is to detect when the moving stops, like in the following figure:

Figure 3: Detecting when motion stops is the basis of our system to extract scene boundaries from a comic book using OpenCV and Python.

To accomplish this task, all we need is a basic scene boundary detection algorithm.

Project structure

Let’s review our project structure:

$ tree --dirsfirst
.
├── output
│   ├── 0.png
│   ├── 1.png
│   ├── ...
│   ├── 15.png
├── batman_who_laughs_7.mp4
└── detect_scene.py

1 directory, 18 files

Our project is quite simple.

We have a single Python script, detect_scene.py, which reads an input video (such as batman_who_laughs_7.mp4 or one of your own videos). The script then runs our boundary scene detection method to extract frames from the video. Each of the frames is exported to the output/ directory.

Implementing our scene boundary detector with OpenCV

Let’s go ahead and implement our basic scene boundary detector which we’ll later use to extract panels from comic books.

This algorithm is based on background subtraction/motion detection — if our “scene” in the video does not have any motion for a given amount of time, then we know the comic book app has finished scrolling/zooming us to the panel, in which case we can capture the current panel and save it to disk.

Are you ready to implement our scene boundary detector?

Open up the detect_scene.py file and insert the following code:
# import the necessary packages
import argparse
import imutils
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video", required=True, type=str,
	help="path to input video file")
ap.add_argument("-o", "--output", required=True, type=str,
	help="path to output directory to store frames")
ap.add_argument("-p", "--min-percent", type=float, default=1.0,
	help="lower boundary of percentage of motion")
ap.add_argument("-m", "--max-percent", type=float, default=10.0,
	help="upper boundary of percentage of motion")
ap.add_argument("-w", "--warmup", type=int, default=200,
	help="# of frames to use to build a reasonable background model")
args = vars(ap.parse_args())

Lines 2-5 import necessary packages. You need OpenCV and imutils installed for this project. I recommend that you install OpenCV in a virtual environment using pip.

From there, Lines 8-19 parse our command line arguments:

  • --video : The path to the input video file.
  • --output : The path to the output directory to store comic book panel images.
  • --min-percent : Default lower boundary of percentage of frame motion.
  • --max-percent : Default upper boundary of percentage of frame motion.
  • --warmup : Default number of frames used to build our background model.

Let’s go ahead and initialize our background subtractor along with other important variables:

# initialize the background subtractor
fgbg = cv2.bgsegm.createBackgroundSubtractorGMG()

# initialize a boolean used to represent whether or not a given frame
# has been captured along with two integer counters -- one to count
# the total number of frames that have been captured and another to
# count the total number of frames processed
captured = False
total = 0
frames = 0

# open a pointer to the video file initialize the width and height of
# the frame
vs = cv2.VideoCapture(args["video"])
(W, H) = (None, None)

Line 22 initializes our background subtractor model. We will apply it to every frame in our while loop in the next code block.

Lines 28-30 then initialize three housekeeping variables. The captured boolean indicates whether a frame has been captured. Two counters are initialized to 0:

  • total indicates how many frames we have captured
  • frames indicates how many frames from our video we have processed

Line 34 initializes our video stream using the input video file specified via command line argument in your terminal. The frame dimensions are set to None for now.

Let’s begin looping over video frames:

# loop over the frames of the video
while True:
	# grab a frame from the video
	(grabbed, frame) = vs.read()

	# if the frame is None, then we have reached the end of the
	# video file
	if frame is None:
		break

	# clone the original frame (so we can save it later), resize the
	# frame, and then apply the background subtractor
	orig = frame.copy()
	frame = imutils.resize(frame, width=600)
	mask = fgbg.apply(frame)

	# apply a series of erosions and dilations to eliminate noise
	mask = cv2.erode(mask, None, iterations=2)
	mask = cv2.dilate(mask, None, iterations=2)

	# if the width and height are empty, grab the spatial dimensions
	if W is None or H is None:
		(H, W) = mask.shape[:2]

	# compute the percentage of the mask that is "foreground"
	p = (cv2.countNonZero(mask) / float(W * H)) * 100

Line 40 grabs the next frame from the video file.

Subsequently, Line 49 makes a copy (so we can save the original frame to disk later) and Line 50 resizes it. The smaller the frame is, the faster our algorithm will run.

Line 51 applies background subtraction, yielding our mask. White pixels in the mask are our foreground while the black pixels represent the background.

Lines 54 and 55 apply a series of morphological operations to eliminate noise.

Line 62 computes the percentage of the mask that is “foreground” versus “background”. Next, we’ll analyze the percentage, p, to determine if motion has stopped:
	# if there is less than N% of the frame as "foreground" then we
	# know that the motion has stopped and thus we should grab the
	# frame
	if p < args["min_percent"] and not captured and frames > args["warmup"]:
		# show the captured frame and update the captured bookkeeping
		# variable
		cv2.imshow("Captured", frame)
		captured = True

		# construct the path to the output frame and increment the
		# total frame counter
		filename = "{}.png".format(total)
		path = os.path.sep.join([args["output"], filename])
		total += 1

		# save the  *original, high resolution* frame to disk
		print("[INFO] saving {}".format(path))
		cv2.imwrite(path, orig)

	# otherwise, either the scene is changing or we're still in warmup
	# mode so let's wait until the scene has settled or we're finished
	# building the background model
	elif captured and p >= args["max_percent"]:
		captured = False

Line 67 compares the foreground pixel percentage, p, to the "min_percent" constant. If (1) p indicates that less than N% of the frame has motion, (2) we have not captured this frame, and (3) we are done warming up, then we’ll save this comic scene to disk!

Assuming we are saving this frame, we:

  • Display the frame in the "Captured" window (Line 70) and mark it as captured (Line 71).
  • Build our filename and path (Lines 75 and 76).
  • Increment the total number of panels written to disk (Line 77).
  • Write the orig frame to disk (Line 81).

Otherwise, we mark captured as False (Lines 86 and 87), indicating that the above if statement did not pass and the frame was not written to disk.

To wrap up, we’ll display the frame and mask until we are done processing all frames:
	# display the frame and detect if there is a key press
	cv2.imshow("Frame", frame)
	cv2.imshow("Mask", mask)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

	# increment the frames counter
	frames += 1

# do a bit of cleanup
vs.release()

The frame and mask are displayed until either the q key is pressed or there are no more frames left in the video to process.

In the next section, we’ll analyze our results.

Scene boundary detection results

Now that we’ve implemented our scene boundary detector, let’s give it a try.

Make sure you’ve used the “Downloads” section of this tutorial to download the source code and example video for this guide.

From there, open up a terminal and execute the following command:

$ python detect_scene.py --video batman_who_laughs_7.mp4 --output output
[INFO] saving 0.png
[INFO] saving 1.png
[INFO] saving 2.png
[INFO] saving 3.png
[INFO] saving 4.png
[INFO] saving 5.png
[INFO] saving 6.png
[INFO] saving 7.png
[INFO] saving 8.png
[INFO] saving 9.png
[INFO] saving 10.png
[INFO] saving 11.png
[INFO] saving 12.png
[INFO] saving 13.png
[INFO] saving 14.png
[INFO] saving 15.png

Figure 4: Our Python + OpenCV scene boundary/shot transition detection algorithm is based on a background detection method to determine when motion has stopped. When the motion stops, the panel is captured and saved to disk.

Figure 4 shows our comic book panel extractor in action.

Our algorithm is able to detect when the app is automatically “moving” the page of the comic by zooming, scrolling, etc. — when this movement stops, we consider it a scene boundary. In the context of our end goal, this scene boundary marks when we have arrived at the next panel of the comic.

We then save this panel to disk and then continue to monitor the video file for when the next movement occurs, indicating that we’re moving to the next panel in the comic.

If you check the contents of the output/ directory after processing the video, you’ll see that we’ve successfully extracted each panel from the comic:

Figure 5: Each comic panel frame is exported to disk as an image file in the output/ directory as shown. This scene boundary detection system was built with OpenCV and Python.

I’ve included a full video of the demo, including my commentary, below:

As I mentioned earlier in this post, being a successful computer vision practitioner often involves looking at problems differently — sometimes you can repurpose video processing algorithms and apply them to images, simply by figuring out how to take images and capture them as a video instead.

In this post, we were able to apply scene boundary detection to extract panels from a comic book, simply by recording ourselves reading a comic via the comiXology app!

Sometimes all you need is a slightly different viewpoint to solve a potentially challenging problem.

Credits

  • Music: “Sci-Fi” — Benjamin Tissot
  • Comic: The Batman Who Laughs #7 — DC Comics (Written by: Scott Snyder, Art by: Jock)
    • Note: I have only used the first few frames of the comic in the example video. I have not included the entire comic as that would be quite the severe copyright violation! Again, this demo is for educational purposes only.

What’s next?

Computer vision, machine learning, and deep learning are all the rage right now.

But to become a successful, well-rounded computer vision practitioner, you must bring the right tools to the job.

You wouldn’t try to bang in a screw with a hammer, you would simply use a screwdriver instead. Similarly, you wouldn’t use a crowbar to cut a piece of wire — you would use pliers.

The same concept is true with computer vision — you must bring the right tools to the job.

In order to help build your toolbox of computer vision algorithms and methodologies, I have put together the PyImageSearch Gurus course.

Inside the course you’ll learn:

  • Machine learning and image classification
  • Automatic License/Number Plate Recognition (ANPR)
  • Face recognition
  • How to train your own custom object detectors
  • Content-based Image Retrieval (i.e., image search engines)
  • Processing image datasets with Hadoop and MapReduce
  • Hand gesture recognition
  • Deep learning fundamentals
  • …and much more!

PyImageSearch Gurus is the most comprehensive computer vision education online today, covering 13 modules broken out into 168 lessons, with over 2,161 pages of content. You won’t find a more detailed computer vision course anywhere else online, I guarantee it.

The PyImageSearch Gurus course also includes private community forums. I participate in the Gurus forum virtually every day. The community is a great way to get expert advice, both from me and from the other advanced students, on a daily basis.

To learn more about the PyImageSearch Gurus course + community (and grab 10 FREE sample lessons), just click the button below:

Click here to learn more about PyImageSearch Gurus!

Summary

In this tutorial, you learned how to implement a simple scene boundary detection algorithm using OpenCV.

We specifically applied this algorithm to digital comic books, enabling us to automatically extract each individual panel of a comic book.

You can take this algorithm and apply it to your own video files as well.

If you are interested in learning more about scene boundary detection algorithms, use the comment form at the bottom of this post to let me know — I may decide to cover these algorithms in more detail in the future!

I hope you enjoyed the tutorial!

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Simple Scene Boundary/Shot Transition Detection with OpenCV appeared first on PyImageSearch.


Building an Image Hashing Search Engine with VP-Trees and OpenCV


In this tutorial, you will learn how to build a scalable image hashing search engine using OpenCV, Python, and VP-Trees.

Image hashing algorithms are used to:

  1. Uniquely quantify the contents of an image using only a single integer.
  2. Find duplicate or near-duplicate images in a dataset of images based on their computed hashes.

Back in 2017, I wrote a tutorial on image hashing with OpenCV and Python (which is required reading for this tutorial). That guide showed you how to find identical/duplicate images in a given dataset.

However, there was a scalability problem with that original tutorial — namely that it did not scale!

To find near-duplicate images, our original image hashing method would require us to perform a linear search, comparing the query hash to each individual image hash in our dataset.

In a practical, real-world application that’s far too slow — we need to find a way to reduce that search to sub-linear time complexity.

But how can we reduce search time so dramatically?

The answer is a specialized data structure called a VP-Tree.

Using a VP-Tree we can reduce our search complexity from O(n) to O(log n), enabling us to obtain our sub-linear goal!

In the remainder of this tutorial you will learn how to:

  1. Build an image hashing search engine to find both identical and near-identical images in a dataset.
  2. Utilize a specialized data structure, called a VP-Tree, that can be used to scale image hashing search engines to millions of images.

To learn how to build your first image hashing search engine with OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Building an Image Hashing Search Engine with VP-Trees and OpenCV

In the first part of this tutorial, I’ll review what exactly an image search engine is for newcomers to PyImageSearch.

Then, we’ll discuss the concept of image hashing and perceptual hashing, including how they can be used to build an image search engine.

We’ll also take a look at problems associated with image hashing search engines, including algorithmic complexity.

Note: If you haven’t read my tutorial on Image Hashing with OpenCV and Python, make sure you do so now. That guide is required reading before you continue here.

From there, we’ll briefly review Vantage-point Trees (VP-Trees) which can be used to dramatically improve the efficiency and performance of image hashing search engines.

Armed with our knowledge we’ll implement our own custom image hashing search engine using VP-Trees and then examine the results.

What is an image search engine?

Figure 1: An example of an image search engine. A query image is presented and the search engine finds similar images in a dataset.

In this section, we’ll review the concept of an image search engine and direct you to some additional resources.

PyImageSearch has roots in image search engines — that was my main interest when I started the blog back in 2014. This tutorial is a fun one for me to share as I have a soft spot for image search engines as a computer vision topic.

Image search engines are a lot like textual search engines, only instead of using text as a query, we instead use an image.

When you use a text search engine, such as Google, Bing, or DuckDuckGo, you enter your search query — a word or phrase. Indexed websites of interest are returned to you as results, and ideally, you’ll find what you are looking for.

Similarly, for an image search engine, you present a query image (not a textual word/phrase). The image search engine then returns similar image results based solely on the contents of the image.

Of course, there is a lot that goes on under the hood in any type of search engine — just keep this key concept of query/results in mind going forward as we build an image search engine today.

To learn more about image search engines, I suggest you refer to the following resources:

Read those guides to obtain a basic understanding of what an image search engine is, then come back to this post to learn about image hash search engines.

What is image hashing/perceptual hashing?

Figure 2: An example of an image hashing function. Top-left: An input image. Top-right: An image hashing function. Bottom: The resulting hash value. We will build a basic image hashing search engine with VP-Trees and OpenCV in this tutorial.

Image hashing, also called perceptual hashing, is the process of:

  1. Examining the contents of an image.
  2. Constructing a hash value (i.e., an integer) that uniquely quantifies an input image based on the contents of the image alone.

One of the benefits of using image hashing is that the resulting storage used to quantify the image is super small.

For example, let’s suppose we have an 800x600px image with 3 channels. If we were to store that entire image in memory using an 8-bit unsigned integer data type, the image would require 1.44MB of RAM.

Of course, we would rarely, if ever, store the raw image pixels when quantifying an image.

Instead, we would use algorithms such as keypoint detectors and local invariant descriptors (i.e., SIFT, SURF, etc.).

Applying these methods can typically lead to 100s to 1000s of features per image.

If we assume a modest 500 keypoints detected, each resulting in a feature vector of 128-d with a 32-bit floating point data type, we would require a total of 0.256MB to store the quantification of each individual image in our dataset.

Image hashing, on the other hand, allows us to quantify an image using only a 32-bit integer, requiring only 4 bytes of memory!

Figure 3: An image hash requires far less disk space in comparison to the original image bitmap size or image features (SIFT, etc.). We will use image hashes as a basis for an image search engine with VP-Trees and OpenCV.

Furthermore, image hashes should also be comparable.

Let’s suppose we compute image hashes for three input images, two of which are near-identical:

Figure 4: Three images with different hashes. The Hamming Distance between the top two hashes is closer than the Hamming distance to the third image. We will use a VP-Tree data structure to make an image hashing search engine.

To compare our image hashes we will use the Hamming distance. The Hamming distance, in this context, is used to compare the number of different bits between two integers.

In practice, this means that we count the number of 1s when taking the XOR between two integers.
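
For example, here is a quick worked example in Python:

# XOR two integers, then count the 1 bits in the result
a = 0b10110010
b = 0b10010110
print(bin(a ^ b))             # 0b100100 (two bits differ)
print(bin(a ^ b).count("1"))  # Hamming distance = 2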

Therefore, going back to our three input images above, the Hamming distance between our two similar images should be smaller (indicating more similarity) than the Hamming distance between the third less similar image:

Figure 5: The Hamming Distance between image hashes is shown. Take note that the Hamming Distance between the first two images is smaller than that of the first and third (or 2nd and 3rd). The Hamming Distance between image hashes will play a role in our image search engine using VP-Trees and OpenCV.

Again, note how the Hamming distance between the two similar images is smaller than the distance to the third image:

  • The smaller the Hamming distance is between two hashes, the more similar the images are.
  • And conversely, the larger the Hamming distance is between two hashes, the less similar the images are.

Also note how the distances between identical images (i.e., along the diagonal of Figure 5) are all zero — the Hamming distance between two hashes will be zero if the two input images are identical, otherwise the distance will be > 0, with larger values indicating less similarity.

There are a number of image hashing algorithms, but one of the most popular ones is called the difference hash, which includes four steps:

  1. Step #1: Convert the input image to grayscale.
  2. Step #2: Resize the image to fixed dimensions, N + 1 x N, ignoring aspect ratio. Typically we set N=8 or N=16. We use N + 1 for the number of columns so that we can compute the difference (hence “difference hash”) between adjacent column pixels in each row.
  3. Step #3: Compute the difference. If we set N=8 then we have 9 pixels per row and 8 pixels per column. We can then compute the difference between adjacent column pixels, yielding 8 differences. 8 rows of 8 differences (i.e., 8×8) results in 64 values.
  4. Step #4: Finally, we can build the hash. In practice all we actually need to perform is a “greater than” operation comparing the columns, yielding binary values. These 64 binary values are compacted into an integer, forming our final hash.
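
To make Steps #3 and #4 concrete, here is a toy walkthrough on a tiny 2×3 “resized” image (a real difference hash uses an 8×9 image; this just illustrates the mechanics):

import numpy as np

# a tiny stand-in for the resized grayscale image
resized = np.array([
	[10, 20, 15],
	[30, 25, 40]], dtype="uint8")

# Step #3: "greater than" comparison between adjacent column pixels
diff = resized[:, 1:] > resized[:, :-1]

# Step #4: compact the binary values into a single integer hash
h = sum([2 ** i for (i, v) in enumerate(diff.flatten()) if v])
print(diff.flatten())  # [ True False False  True]
print(h)               # 2**0 + 2**3 = 9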

Typically, image hashing algorithms are used to find near-duplicate images in a large dataset.

I’ve covered image hashing in detail inside this tutorial so if the concept is new to you, I would suggest reading that guide before continuing here.

What is an image hashing search engine?

Figure 6: Image search engines consist of images, an indexer, and a searcher. We’ll index all of our images by computing and storing their hashes. We’ll build a VP-Tree of the hashes. The searcher will compute the hash of the query image and search the VP tree for similar images and return the closest matches. Using Python, OpenCV, and vptree, we can implement our image hashing search engine.

An image hashing search engine consists of two components:

  • Indexing: Taking an input dataset of images, computing the hashes, and storing them in a data structure to facilitate fast, efficient search.
  • Searching/Querying: Accepting an input query image from the user, computing the hash, and finding all near-identical images in our indexed dataset.

A great example of an image hashing search engine is TinEye, which is actually a reverse image search engine.

A reverse image search engine:

  1. Accepts an input image.
  2. Finds all near-duplicates of that image on the web, telling you the website/URL of where the near duplicate can be found.

Using this tutorial you will learn how to build your own TinEye!

What makes scaling image hashing search engines problematic?

One of the biggest issues with building an image hashing search engine is scalability — the more images you have, the longer it can take to perform the search.

For example, let’s suppose we have the following scenario:

  • We have a dataset of 1,000,000 images.
  • We have already computed image hashes for each of these 1,000,000 images.
  • A user comes along, presents us with an image, and then asks us to find all near-identical images in that dataset.

How might you go about performing that search?

Would you loop over all 1,000,000 image hashes, one by one, and compare them to the hash of the query image?

Unfortunately, that’s not going to work. Even if you assume that each Hamming distance comparison takes 0.00001 seconds, with a total 1,000,000 images, it would take you 10 seconds to complete the search — far too slow for any type of search engine.
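
For concreteness, here is a minimal sketch of that brute-force linear scan, assuming a hashes dictionary that maps each integer hash to its list of image paths (the same structure we build later in this post):

def linear_search(queryHash, hashes, maxDistance=10):
	# the naive O(n) approach: compare the query hash against *every*
	# indexed hash, one by one
	results = []

	for (h, imagePaths) in hashes.items():
		# compute the Hamming distance via XOR + popcount
		d = bin(int(queryHash) ^ int(h)).count("1")

		# keep any hash within the maximum Hamming distance
		if d <= maxDistance:
			results.append((d, imagePaths))

	# sort the matches by similarity (smallest distance first)
	return sorted(results, key=lambda r: r[0])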

Instead, to build an image hashing search engine that scales, you need to utilize specialized data structures.

What are VP-Trees and how can they help scale image hashing search engines?

Figure 7: We’ll use VP-Trees for our image hash search engine using Python and OpenCV. VP-Trees are based on a recursive algorithm that computes vantage points and medians until we reach child nodes containing an individual image hash. Child nodes that are closer together (i.e. smaller Hamming Distances in our case) are assumed to be more similar to each other. (image source)

In order to scale our image hashing search engine, we need to use a specialized data structure that:

  • Reduces our search from linear complexity, O(n)…
  • …down to sub-linear complexity, ideally O(log n).

To accomplish that task we can use Vantage-point Trees (VP-Trees). A VP-Tree is a metric tree that operates in a metric space by selecting a given position in space (i.e., the “vantage point”) and then partitioning the data points into two sets:

  1. Points that are near the vantage point
  2. Points that are far from the vantage point

We then recursively apply this process, partitioning the points into smaller and smaller sets, thus creating a tree where neighbors in the tree have smaller distances.

To visualize the process of constructing a VP-Tree, consider the following figure:

Figure 8: A visual depiction of the process of building a VP-Tree (vantage point tree). We will use the vptree Python implementation by Richard Sjogren. (image source)

First, we select a point in space (denoted as the v in the center of the circle) — we call this point the vantage point. The vantage point is the point furthest from the parent vantage point in the tree.

We then compute the median, μ, for all points, X.

Once we have μ, we then divide X into two sets, S1 and S2:

  • All points with distance <= μ belong to S1.
  • All points with distance > μ belong to S2.

We then recursively apply this process, building a tree as we go, until we are left with a child node.

A child node contains only a single data point (in this case, one individual hash). Child nodes that are closer together in the tree thus have:

  1. Smaller distances between them.
  2. And therefore assumed to be more similar to each other than the rest of the data points in the tree.

After recursively applying the VP-Tree construction method, we end up with a data structure that, as the name suggests, is a tree:

Figure 9: An example VP-Tree is depicted. We will use Python to build VP-Trees for use in an image hash search engine.

Notice how we recursively split subsets of our dataset into smaller and smaller subsets, until we eventually reach the child nodes.

VP-Trees take O(n log n) to build, but once we’ve constructed one, a search takes only O(log n), reducing our search time to sub-linear complexity!

Later in this tutorial, you’ll learn to utilize VP-Trees with Python to build and scale our image hashing search engine.
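
As a quick preview, here is a minimal sketch using the pip-installable vptree package (the same library we use later in this post) on a few toy integer “hashes”; its get_all_in_range method returns (distance, point) pairs within a maximum distance:

import vptree

def hamming(a, b):
	# Hamming distance between two integers
	return bin(int(a) ^ int(b)).count("1")

# four toy "hashes"
points = [0b0000, 0b0001, 0b0111, 0b1111]
tree = vptree.VPTree(points, hamming)

# find all points within a Hamming distance of 1 from the query;
# e.g. [(1, 1), (1, 7)] (the library may return the points as floats)
print(tree.get_all_in_range(0b0011, 1))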

Note: This section is meant to be a gentle introduction to VP-Trees. If you are interested in learning more about them, I would recommend (1) consulting a data structures textbook, (2) following this guide from Steve Hanov’s blog, or (3) reading this writeup from Ivan Chen.

The CALTECH-101 dataset

Figure 10: The CALTECH-101 dataset consists of 101 object categories. Our image hash search engine using VP-Trees, Python, and OpenCV will use the CALTECH-101 dataset for our practical example.

The dataset we’ll be working with today is the CALTECH-101 dataset, which consists of 9,144 total images across 101 categories (with 40 to 800 images per category).

The dataset is large enough to be interesting to explore from an introductory to image hashing perspective but still small enough that you can run the example Python scripts in this guide without having to wait hours and hours for your system to finish chewing on the images.

You can download the CALTECH-101 dataset from their official webpage or you can use the following wget command:
$ wget http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz
$ tar xvzf 101_ObjectCategories.tar.gz

Project structure

Let’s inspect our project structure:

$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   └── hashing.py
├── 101_ObjectCategories [9,144 images] 
├── queries
│   ├── accordion.jpg
│   ├── accordion_modified1.jpg
│   ├── accordion_modified2.jpg
│   ├── buddha.jpg
│   └── dalmation.jpg
├── index_images.py
└── search.py

The pyimagesearch module contains hashing.py, which includes three hashing functions. We will review these functions in the “Implementing our image hashing utilities” section below.

Our dataset resides in the 101_ObjectCategories/ folder (CALTECH-101), which contains 101 sub-directories of images. Be sure to read the previous section to learn how to download the dataset.

There are five query images in the queries/ directory. We will search for images with similar hashes to these images. The accordion_modified1.jpg and accordion_modified2.jpg images will present unique challenges to our VP-Trees image hashing search engine.

The core of today’s project lies in two Python scripts: index_images.py and search.py:

  • Our indexer will calculate hashes for all 9,144 images and organize the hashes in a VP-Tree. This index will reside in two .pickle files: (1) a dictionary of all computed hashes, and (2) the VP-Tree.
  • The searcher will calculate the hash for a query image and search the VP-Tree for the closest images via Hamming distance. The results will be returned to the user.
If that sounds like a lot, don’t worry! This tutorial will break everything down step-by-step.

Configuring your development environment

For this blog post, your development environment needs the following packages installed: OpenCV, NumPy, imutils, and vptree.

Luckily for us, everything is pip-installable. My recommendation for you is to follow the first OpenCV link to pip-install OpenCV in a virtual environment on your system. From there you’ll just pip-install everything else in the same environment.

It will look something like this:

# setup pip, virtualenv, and virtualenvwrapper (using the "pip install OpenCV" instructions)
$ workon <env_name>
$ pip install numpy
$ pip install opencv-contrib-python
$ pip install imutils
$ pip install vptree

Replace <env_name> with the name of your virtual environment. The workon command will only be available once you set up virtualenv and virtualenvwrapper following these instructions.

Implementing our image hashing utilities

Before we can build our image hashing search engine, we first need to implement a few helper utilities.

Open up the hashing.py file in the project structure and insert the following code:
# import the necessary packages
import numpy as np
import cv2

def dhash(image, hashSize=8):
	# convert the image to grayscale
	gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

	# resize the grayscale image, adding a single column (width) so we
	# can compute the horizontal gradient
	resized = cv2.resize(gray, (hashSize + 1, hashSize))

	# compute the (relative) horizontal gradient between adjacent
	# column pixels
	diff = resized[:, 1:] > resized[:, :-1]

	# convert the difference image to a hash
	return sum([2 ** i for (i, v) in enumerate(diff.flatten()) if v])

We begin by importing OpenCV and NumPy (Lines 2 and 3).

The first function we’ll look at, dhash, is used to compute the difference hash for a given input image. Recall from above that our dhash requires four steps: (1) convert to grayscale, (2) resize, (3) compute the difference, and (4) build the hash. Let’s break it down a little further:

  1. Line 7 converts the image to grayscale.
  2. Line 11 resizes the image to N rows by N + 1 columns, ignoring the aspect ratio. This ensures that the resulting image hash will match similar photos regardless of their initial spatial dimensions.
  3. Line 15 computes the horizontal gradient difference between adjacent column pixels. Assuming hashSize=8, there will be 8 rows of 8 differences each (the 9 columns allow for 8 comparisons per row). We thus have a 64-bit hash, as 8×8=64.
  4. Line 18 converts the difference image to a hash.

For more details, refer to this blog post.

Next, let’s look at the convert_hash function:
def convert_hash(h):
	# convert the hash to NumPy's 64-bit float and then back to
	# Python's built in int
	return int(np.array(h, dtype="float64"))

When I first wrote the code for this tutorial, I found that the VP-Tree implementation we’re using internally converts points to a NumPy 64-bit float. That would be okay; however, hashes need to be integers and if we convert them to 64-bit floats, they become an unhashable data type. To overcome the limitation of the VP-Tree implementation, I came up with the convert_hash hack:

  • We accept an input hash, h.
  • That hash is then converted to a NumPy 64-bit float.
  • And that NumPy float is then converted back to Python’s built-in integer data type.

This hack ensures that hashes are represented consistently throughout the hashing, indexing, and searching process.

We then have one final helper method, hamming, which is used to compute the Hamming distance between two integers:
def hamming(a, b):
	# compute and return the Hamming distance between the integers
	return bin(int(a) ^ int(b)).count("1")

The Hamming distance is simply a count of the number of 1s when taking the XOR (^) between two integers (Line 27).

Implementing our image hash indexer

Before we can perform a search, we first need to:

  1. Loop over our input dataset of images.
  2. Compute difference hash for each image.
  3. Build a VP-Tree using the hashes.

Let’s start that process now.

Open up the index_images.py file and insert the following code:
# import the necessary packages
from pyimagesearch.hashing import convert_hash
from pyimagesearch.hashing import hamming
from pyimagesearch.hashing import dhash
from imutils import paths
import argparse
import pickle
import vptree
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, type=str,
	help="path to input directory of images")
ap.add_argument("-t", "--tree", required=True, type=str,
	help="path to output VP-Tree")
ap.add_argument("-a", "--hashes", required=True, type=str,
	help="path to output hashes dictionary")
args = vars(ap.parse_args())

Lines 2-9 import the packages, functions, and modules necessary for this script. In particular, Lines 2-4 import our three hashing-related functions: convert_hash, hamming, and dhash. Line 8 imports the vptree implementation that we will be using.

Next, Lines 12-19 parse our command line arguments:

  • --images : The path to our images which we will be indexing.
  • --tree : The path to the output VP-Tree .pickle file which will be serialized to disk.
  • --hashes : The path to the output hashes dictionary which will be stored in .pickle format.

Now let’s compute hashes for all images:

# grab the paths to the input images and initialize the dictionary
# of hashes
imagePaths = list(paths.list_images(args["images"]))
hashes = {}

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
	# load the input image
	print("[INFO] processing image {}/{}".format(i + 1,
		len(imagePaths)))
	image = cv2.imread(imagePath)

	# compute the hash for the image and convert it
	h = dhash(image)
	h = convert_hash(h)

	# update the hashes dictionary
	l = hashes.get(h, [])
	l.append(imagePath)
	hashes[h] = l

Lines 23 and 24 grab image paths and initialize our hashes dictionary.

Line 27 then begins a loop over all the imagePaths. Inside the loop, we:

  • Load the image (Line 31).
  • Compute and convert the hash, h (Lines 34 and 35).
  • Grab the list of all image paths, l, with the same hash (Line 38).
  • Add this imagePath to the list, l (Line 39).
  • Update our dictionary with the hash as the key and our list of image paths with the same corresponding hash as the value (Line 40).

From here, we build our VP-Tree:

# build the VP-Tree
print("[INFO] building VP-Tree...")
points = list(hashes.keys())
tree = vptree.VPTree(points, hamming)

To construct the VP-Tree, Lines 44 and 45 pass in (1) a list of data points (i.e., the hash integer values themselves), and (2) our distance function (the Hamming distance method) to the VPTree constructor.

Internally, the VP-Tree computes the Hamming distances between all input points and then constructs the VP-Tree such that data points with smaller distances (i.e., more similar images) lie closer together in the tree space. Be sure to refer to the “What are VP-Trees and how can they help scale image hashing search engines?” section and Figures 7, 8, and 9.

With our hashes dictionary populated and VP-Tree constructed, we’ll now serialize them both to disk as .pickle files:
# serialize the VP-Tree to disk
print("[INFO] serializing VP-Tree...")
f = open(args["tree"], "wb")
f.write(pickle.dumps(tree))
f.close()

# serialize the hashes dictionary to disk
print("[INFO] serializing hashes...")
f = open(args["hashes"], "wb")
f.write(pickle.dumps(hashes))
f.close()

Extracting image hashes and building the VP-Tree

Now that we’ve implemented our indexing script, let’s put it to work. Make sure you’ve:

  1. Downloaded the CALTECH-101 dataset using the instructions above.
  2. Used the “Downloads” section of this tutorial to download the source code and example query images.
  3. Extracted the .zip of the source code and changed directory to the project.

From there, open up a terminal and issue the following command:

$ time python index_images.py --images 101_ObjectCategories \
	--tree vptree.pickle --hashes hashes.pickle
[INFO] processing image 1/9144
[INFO] processing image 2/9144
[INFO] processing image 3/9144
[INFO] processing image 4/9144
[INFO] processing image 5/9144
...
[INFO] processing image 9140/9144
[INFO] processing image 9141/9144
[INFO] processing image 9142/9144
[INFO] processing image 9143/9144
[INFO] processing image 9144/9144
[INFO] building VP-Tree...
[INFO] serializing VP-Tree...
[INFO] serializing hashes...

real	0m10.947s
user	0m9.096s
sys		0m1.386s

As our output indicates, we were able to hash all 9,144 images in just over 10 seconds.

Checking the project directory after running the script, we’ll find two .pickle files:
$ ls -l *.pickle
-rw-r--r--  1 adrianrosebrock  796620 Aug 22 07:53 hashes.pickle
-rw-r--r--  1 adrianrosebrock  707926 Aug 22 07:53 vptree.pickle

The hashes.pickle (796.62KB) file contains our computed hashes, mapping the hash integer value to file paths with the same hash. The vptree.pickle (707.93KB) file is our constructed VP-Tree.

We’ll be using this VP-Tree to perform queries and searches in the following section.

Implementing our image hash searching script

The second component of an image hashing search engine is the search script. The search script will:

  1. Accept an input query image.
  2. Compute the hash for the query image.
  3. Search the VP-Tree using the query hash to find all duplicate/near-duplicate images.

Let’s implement our image hash searcher now — open up the search.py file and insert the following code:
# import the necessary packages
from pyimagesearch.hashing import convert_hash
from pyimagesearch.hashing import dhash
import argparse
import pickle
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-t", "--tree", required=True, type=str,
	help="path to pre-constructed VP-Tree")
ap.add_argument("-a", "--hashes", required=True, type=str,
	help="path to hashes dictionary")
ap.add_argument("-q", "--query", required=True, type=str,
	help="path to input query image")
ap.add_argument("-d", "--distance", type=int, default=10,
	help="maximum hamming distance")
args = vars(ap.parse_args())

Lines 2-7 import the necessary components for our searching script. Notice that we need the dhash and convert_hash functions once again as we’ll have to compute the hash for our --query image.

Lines 10-19 parse our command line arguments (the first three are required):

  • --tree: The path to our pre-constructed VP-Tree on disk.
  • --hashes: The path to our pre-computed hashes dictionary on disk.
  • --query: Our query image’s path.
  • --distance: The maximum hamming distance between hashes is set with a default of 10. You may override it if you so choose.

It’s important to note that the larger the --distance is, the more hashes the VP-Tree will compare, and thus the searcher will be slower. Try to keep your --distance as small as possible without compromising the quality of your results.
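
For example, once you have built the index from the previous section, you could widen the search radius like so (expect the query to take a bit longer):

$ python search.py --tree vptree.pickle --hashes hashes.pickle \
	--query queries/accordion.jpg --distance 20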

Next, we’ll (1) load our VP-Tree + hashes dictionary, and (2) compute the hash for our --query image:
# load the VP-Tree and hashes dictionary
print("[INFO] loading VP-Tree and hashes...")
tree = pickle.loads(open(args["tree"], "rb").read())
hashes = pickle.loads(open(args["hashes"], "rb").read())

# load the input query image
image = cv2.imread(args["query"])
cv2.imshow("Query", image)

# compute the hash for the query image, then convert it
queryHash = dhash(image)
queryHash = convert_hash(queryHash)

Lines 23 and 24 load the pre-computed index including the VP-Tree and hashes dictionary.

From there, we load and display the --query image (Lines 27 and 28).

We then take the query image and compute the queryHash (Lines 31 and 32).

At this point, it is time to perform a search using our VP-Tree:

# perform the search
print("[INFO] performing search...")
start = time.time()
results = tree.get_all_in_range(queryHash, args["distance"])
results = sorted(results)
end = time.time()
print("[INFO] search took {} seconds".format(end - start))

Lines 37 and 38 perform a search by querying the VP-Tree for hashes with the smallest Hamming distance relative to the queryHash. The results are sorted so that “more similar” hashes are at the front of the results list.

Both of these lines are sandwiched between timestamps for benchmarking purposes, the results of which are printed via Line 40.

Finally, we will loop over the results and display each of them:
# loop over the results
for (d, h) in results:
	# grab all image paths in our dataset with the same hash
	resultPaths = hashes.get(h, [])
	print("[INFO] {} total image(s) with d: {}, h: {}".format(
		len(resultPaths), d, h))

	# loop over the result paths
	for resultPath in resultPaths:
		# load the result image and display it to our screen
		result = cv2.imread(resultPath)
		cv2.imshow("Result", result)
		cv2.waitKey(0)

Line 43 begins a loop over the results:
  • The resultPaths for the current hash, h, are grabbed from the hashes dictionary (Line 45).
  • Each result image is displayed as a key is pressed on the keyboard (Lines 50-54).

Image hashing search engine results

We are now ready to test our image search engine!

But before we do that, make sure you have:

  1. Downloaded the CALTECH-101 dataset using the instructions above.
  2. Used the “Downloads” section of this tutorial to download the source code and example query images.
  3. Extracted the .zip of the source code and changed directory to the project.
  4. Ran the index_images.py file to generate the hashes.pickle and vptree.pickle files.

After all the above steps are complete, open up a terminal and execute the following command:

$ python search.py --tree vptree.pickle --hashes hashes.pickle \
	--query queries/buddha.jpg
[INFO] loading VP-Tree and hashes...
[INFO] performing search...
[INFO] search took 0.015203237533569336 seconds
[INFO] 1 total image(s) with d: 0, h: 8.162938100012111e+18

Figure 11: Our Python + OpenCV image hashing search engine found a match in the VP-Tree in just 0.015 seconds!

On the left, you can see our input query image of our Buddha. On the right, you can see that we have found the duplicate image in our indexed dataset.

The search itself took only 0.015 seconds.

Additionally, note that the distance between the input query image and the hashed image in the dataset is zero, indicating that the two images are identical.

Let’s try again, this time with an image of a Dalmatian:

$ python search.py --tree vptree.pickle --hashes hashes.pickle \
	--query queries/dalmation.jpg 
[INFO] loading VP-Tree and hashes...
[INFO] performing search...
[INFO] search took 0.014827728271484375 seconds
[INFO] 1 total image(s) with d: 0, h: 6.445556196029652e+18

Figure 12: With a Hamming distance of 0, the Dalmatian query image yielded an identical image in our dataset. We built an OpenCV + Python image hash search engine with VP-Trees successfully.

Again, we see that our image hashing search engine has found the identical Dalmatian in our indexed dataset (we know the images are identical due to the Hamming distance of zero).

The next example is of an accordion:

$ python search.py --tree vptree.pickle --hashes hashes.pickle \
	--query queries/accordion.jpg 
[INFO] loading VP-Tree and hashes...
[INFO] performing search...
[INFO] search took 0.014187097549438477 seconds
[INFO] 1 total image(s) with d: 0, h: 3.380309217342405e+18

Figure 13: An example of providing a query image and finding the best resulting image with an image hash search engine created with Python and OpenCV.

We once again find our identical matched image in the indexed dataset.

We know our image hashing search engine is working great for identical images…

…but what about images that are slightly modified?

Will our hashing search engine still perform well?

Let’s give it a try:

$ python search.py --tree vptree.pickle --hashes hashes.pickle \
	--query queries/accordion_modified1.jpg 
[INFO] loading VP-Tree and hashes...
[INFO] performing search...
[INFO] search took 0.014217138290405273 seconds
[INFO] 1 total image(s) with d: 4, h: 3.380309217342405e+18

Figure 14: Our image hash search engine was able to find the matching image despite a modification (red square) to the query image.

Here I’ve added a small red square in the bottom left corner of the accordion query image. This addition will change the difference hash value!

However, if you take a look at the output result, you’ll see that we were still able to detect the near-duplicate image.

We were able to find the near-duplicate image by comparing the Hamming distance between the hashes. The difference in hash values is 4, indicating that 4 bits differ between the two hashes.

Next, let’s try a second query, this one much more modified than the first:

$ python search.py --tree vptree.pickle --hashes hashes.pickle \
	--query queries/accordion_modified2.jpg 
[INFO] loading VP-Tree and hashes...
[INFO] performing search...
[INFO] search took 0.013727903366088867 seconds
[INFO] 1 total image(s) with d: 9, h: 3.380309217342405e+18

Figure 15: On the left is the query image for our image hash search engine with VP-Trees. It has been modified with yellow and purple shapes as well as red text. The image hash search engine returns the correct resulting image (right) from an index of 9,144 in just 0.0137 seconds, proving the robustness of our search engine system.

Despite dramatically altering the query by adding in a large blue rectangle, a yellow circle, and text, we’re still able to find the near-duplicate image in our dataset in under 0.014 seconds!

Whenever you need to find duplicate or near-duplicate images in a dataset, definitely consider using image hashing and image searching algorithms — when used correctly, they can be extremely powerful!

Where can I learn more about image search engines?

Are you yearning to learn more about image search engines?

Perhaps you have a project where you need to implement an image search engine that scales to millions of images. Inside the PyImageSearch Gurus course you’ll learn how to build such a scalable image search engine from the ground up.

The PyImageSearch Gurus course and community is the most comprehensive computer vision education online today, covering 13 modules broken out into 168 lessons, with over 2,161 pages of content.

Inside the course, you’ll find lessons on:

  • Automatic License/Number Plate Recognition (ANPR)
  • Face recognition
  • Training your own custom object detector
  • Deep learning and Convolutional Neural Networks
  • Content-based Image Retrieval (CBIR)
  • …and much more!

You won’t find a more detailed computer vision course anywhere else online, I guarantee it.

Just take a look at what these course graduates have accomplished:

You can join these Gurus and other students with a passion to learn and become experts in their fields. Whether you are just getting started or you have a foundation to build upon, the Gurus course is for you.

To learn more about the PyImageSearch Gurus course (and grab the course syllabus PDF and 10 FREE sample lessons), just click the button below:

Send me the course syllabus and 10 free lessons!

Summary

In this tutorial, you learned how to build a basic image hashing search engine using OpenCV and Python.

To build an image hashing search engine that scaled we needed to utilize VP-Trees, a specialized metric tree data structure that recursively partitions a dataset of points such that nodes of the tree that are closer together are more similar than nodes that are farther away.

By using VP-Trees we were able to build an image hashing search engine capable of finding duplicate and near-duplicate images in a dataset in about 0.015 seconds.

Furthermore, we demonstrated that our combination of hashing algorithm and VP-Tree search was capable of finding matches in our dataset, even if our query image was modified, damaged, or altered!

If you are ever building a computer vision application that requires quickly finding duplicate or near-duplicate images in a large dataset, definitely give this method a try.

To download the source code to this post, and be notified when future posts are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Building an Image Hashing Search Engine with VP-Trees and OpenCV appeared first on PyImageSearch.

OpenCV – Stream video to web browser/HTML page

In this tutorial you will learn how to use OpenCV to stream video from a webcam to a web browser/HTML page using Flask and Python.

Ever have your car stolen?

Mine was stolen over the weekend. And let me tell you, I’m pissed.

I can’t share too many details as it’s an active criminal investigation, but here’s what I can tell you:

My wife and I moved to Philadelphia, PA from Norwalk, CT about six months ago. I have a car, which I don’t drive often, but still keep just in case of emergencies.

Parking is hard to find in our neighborhood, so I was in need of a parking garage.

I heard about a garage, signed up, and started parking my car there.

Fast forward to this past Sunday.

My wife and I arrive at the parking garage to grab my car. We were about to head down to Maryland to visit my parents and have some blue crab (Maryland is famous for its crabs).

I walked to my car and took off the cover.

I was immediately confused — this isn’t my car.

Where the #$&@ is my car?

After a few short minutes I realized the reality — my car was stolen.

Over the past week, my work on my upcoming Raspberry Pi for Computer Vision book was interrupted — I’ve been working with the owner of the parking garage, the Philadelphia Police Department, and the GPS tracking service on my car to figure out what happened.

I can’t publicly go into any details until it’s resolved, but let me tell you, there’s a whole mess of paperwork, police reports, attorney letters, and insurance claims that I’m wading neck-deep through.

I’m hoping that this issue gets resolved in the next month — I hate distractions, especially distractions that take me away from what I love doing the most — teaching computer vision and deep learning.

I’ve managed to use my frustrations to inspire a new security-related computer vision blog post.

In this post, we’ll learn how to stream video to a web browser using Flask and OpenCV.

You will be able to deploy the system on a Raspberry Pi in less than 5 minutes:

  • Simply install the required packages/software and start the script.
  • Then open your computer/smartphone browser to navigate to the URL/IP address to watch the video feed (and ensure nothing of yours is stolen).

There’s nothing like a little video evidence to catch thieves.

While I continue to do paperwork with the police, insurance, etc, you can begin to arm yourself with Raspberry Pi cameras to catch bad guys wherever you live and work.

To learn how to use OpenCV and Flask to stream video to a web browser HTML page, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

OpenCV – Stream video to web browser/HTML page

In this tutorial we will begin by discussing Flask, a micro web framework for the Python programming language.

We’ll learn the fundamentals of motion detection so that we can apply it to our project. We’ll proceed to implement motion detection by means of a background subtractor.

From there, we will combine Flask with OpenCV, enabling us to:

  1. Access frames from RPi camera module or USB webcam.
  2. Process the frames and apply an arbitrary algorithm (here we’ll be using background subtraction/motion detection, but you could apply image classification, object detection, etc.).
  3. Stream the results to a web page/web browser.

Additionally, the code we’ll be covering will be able to support multiple clients (i.e., more than one person/web browser/tab accessing the stream at once), something the vast majority of examples you will find online cannot handle.

Putting all these pieces together results in a home surveillance system capable of performing motion detection and then streaming the video result to your web browser.

Let’s get started!

The Flask web framework

Figure 1: Flask is a micro web framework for Python (image source).

In this section we’ll briefly discuss the Flask web framework and how to install it on your system.

Flask is a popular micro web framework written in the Python programming language.

Along with Django, Flask is one of the most common web frameworks you’ll see when building web applications using Python.

However, unlike Django, Flask is very lightweight, making it super easy to build basic web applications.

As we’ll see in this section, we’ll only need a small amount of code to facilitate live video streaming with Flask — the rest of the code either involves (1) OpenCV and accessing our video stream or (2) ensuring our code is thread safe and can handle multiple clients.

If you ever need to install Flask on a machine, it’s as simple as the following command:

$ pip install flask

While you’re at it, go ahead and install NumPy, OpenCV, and imutils:

$ pip install numpy
$ pip install opencv-contrib-python
$ pip install imutils

Note: If you’d like the full-install of OpenCV including “non-free” (patented) algorithms, be sure to compile OpenCV from source.

Project structure

Before we move on, let’s take a look at our directory structure for the project:

$ tree --dirsfirst
.
├── pyimagesearch
│   ├── motion_detection
│   │   ├── __init__.py
│   │   └── singlemotiondetector.py
│   └── __init__.py
├── templates
│   └── index.html
└── webstreaming.py

3 directories, 5 files

To perform background subtraction and motion detection we’ll be implementing a class named SingleMotionDetector — this class will live inside the singlemotiondetector.py file found in the motion_detection submodule of pyimagesearch.

The webstreaming.py file will use OpenCV to access our web camera, perform motion detection via SingleMotionDetector, and then serve the output frames to our web browser via the Flask web framework.

In order for our web browser to have something to display, we need to populate the contents of index.html with HTML used to serve the video feed. We’ll only need to insert some basic HTML markup — Flask will handle actually sending the video stream to our browser for us.

Implementing a basic motion detector

Figure 2: Video surveillance with Raspberry Pi, OpenCV, Flask and web streaming. By use of background subtraction for motion detection, we have detected motion where I am moving in my chair.

Our motion detector algorithm will detect motion by means of background subtraction.

Most background subtraction algorithms work by:

  1. Accumulating the weighted average of the previous N frames
  2. Taking the current frame and subtracting it from the weighted average of frames
  3. Thresholding the output of the subtraction to highlight the regions with substantial differences in pixel values (“white” for foreground and “black” for background)
  4. Applying basic image processing techniques such as erosions and dilations to remove noise
  5. Utilizing contour detection to extract the regions containing motion

Our motion detection implementation will live inside the SingleMotionDetector class which can be found in singlemotiondetector.py.

We call this a “single motion detector” as the algorithm itself is only interested in finding the single, largest region of motion.

We can easily extend this method to handle multiple regions of motion as well.

Let’s go ahead and implement the motion detector.

Open up the singlemotiondetector.py file and insert the following code:
# import the necessary packages
import numpy as np
import imutils
import cv2

class SingleMotionDetector:
	def __init__(self, accumWeight=0.5):
		# store the accumulated weight factor
		self.accumWeight = accumWeight

		# initialize the background model
		self.bg = None

Lines 2-4 handle our required imports.

All of these are fairly standard, including NumPy for numerical processing, imutils for our convenience functions, and cv2 for our OpenCV bindings.

We then define our SingleMotionDetector class on Line 6. The class accepts an optional argument, accumWeight, which is the factor used to compute our accumulated weighted average.

The larger accumWeight is, the less the background (bg) will be factored in when accumulating the weighted average.

Conversely, the smaller accumWeight is, the more the background bg will be considered when computing the average.

Setting accumWeight=0.5 weights both the background and foreground evenly — I often recommend this as a starting point value (you can then adjust it based on your own experiments).
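
Under the hood this is a simple running average. Here is a minimal NumPy sketch of the same math on toy single-pixel “frames” (the values are illustrative):

import numpy as np

accumWeight = 0.5
bg = np.array([[100.0]])     # current background model
frame = np.array([[200.0]])  # incoming frame

# cv2.accumulateWeighted(frame, bg, accumWeight) updates the model as:
# bg = (1 - accumWeight) * bg + accumWeight * frame
bg = (1 - accumWeight) * bg + accumWeight * frame
print(bg)  # [[150.]] -- background and new frame weighted evenly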

Next, let’s define the update method which will accept an input frame and compute the weighted average:
def update(self, image):
		# if the background model is None, initialize it
		if self.bg is None:
			self.bg = image.copy().astype("float")
			return

		# update the background model by accumulating the weighted
		# average
		cv2.accumulateWeighted(image, self.bg, self.accumWeight)

In the case that our bg frame is None (implying that update has never been called), we simply store the bg frame (Lines 15-18).

Otherwise, we compute the weighted average between the input frame, the existing background bg, and our corresponding accumWeight factor.

Given our background bg we can now apply motion detection via the detect method:
def detect(self, image, tVal=25):
		# compute the absolute difference between the background model
		# and the image passed in, then threshold the delta image
		delta = cv2.absdiff(self.bg.astype("uint8"), image)
		thresh = cv2.threshold(delta, tVal, 255, cv2.THRESH_BINARY)[1]

		# perform a series of erosions and dilations to remove small
		# blobs
		thresh = cv2.erode(thresh, None, iterations=2)
		thresh = cv2.dilate(thresh, None, iterations=2)

The detect method requires a single parameter along with an optional one:
  • image: The input frame/image that motion detection will be applied to.
  • tVal: The threshold value used to mark a particular pixel as “motion” or not.

Given our input image we compute the absolute difference between the image and the bg (Line 27).

Any pixel locations that have a difference > tVal are set to 255 (white; foreground), otherwise they are set to 0 (black; background) (Line 28).

A series of erosions and dilations are performed to remove noise and small, localized areas of motion that would otherwise be considered false-positives (likely due to reflections or rapid changes in light).

The next step is to apply contour detection to extract any motion regions:

# find contours in the thresholded image and initialize the
		# minimum and maximum bounding box regions for motion
		cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
			cv2.CHAIN_APPROX_SIMPLE)
		cnts = imutils.grab_contours(cnts)
		(minX, minY) = (np.inf, np.inf)
		(maxX, maxY) = (-np.inf, -np.inf)

Lines 37-39 perform contour detection on our thresh image.

We then initialize two sets of bookkeeping variables to keep track of the location where any motion is contained (Lines 40 and 41). These variables will form the “bounding box” which will tell us the location of where the motion is taking place.

The final step is to populate these variables (provided motion exists in the frame, of course):

# if no contours were found, return None
		if len(cnts) == 0:
			return None

		# otherwise, loop over the contours
		for c in cnts:
			# compute the bounding box of the contour and use it to
			# update the minimum and maximum bounding box regions
			(x, y, w, h) = cv2.boundingRect(c)
			(minX, minY) = (min(minX, x), min(minY, y))
			(maxX, maxY) = (max(maxX, x + w), max(maxY, y + h))

		# otherwise, return a tuple of the thresholded image along
		# with bounding box
		return (thresh, (minX, minY, maxX, maxY))

On Lines 43-45 we make a check to see if our contours list is empty.

If that’s the case, then there was no motion found in the frame and we can safely ignore it.

Otherwise, motion does exist in the frame so we need to start looping over the contours (Line 48).

For each contour we compute the bounding box and then update our bookkeeping variables (Lines 47-53), finding the minimum and maximum (x, y)-coordinates where all motion has taken place.

Finally, we return the bounding box location to the calling function.
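
If you would like to sanity check the class before wiring it into Flask, here is a minimal (hypothetical) test loop; the camera index of 0 and the warmup count of 32 frames are arbitrary choices:

# standalone sketch for testing SingleMotionDetector (not part of the
# project code); assumes the class above is importable
import cv2
from pyimagesearch.motion_detection import SingleMotionDetector

md = SingleMotionDetector(accumWeight=0.1)
cap = cv2.VideoCapture(0)
total = 0

while True:
	(grabbed, frame) = cap.read()
	if not grabbed:
		break

	# same preprocessing we'll use in the streaming script
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
	gray = cv2.GaussianBlur(gray, (7, 7), 0)

	# only run detection once a rough background model exists
	if total > 32:
		motion = md.detect(gray)
		if motion is not None:
			print("motion bounding box:", motion[1])

	md.update(gray)
	total += 1

cap.release()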

Combining OpenCV with Flask

Figure 3: OpenCV and Flask (a Python micro web framework) make the perfect pair for web streaming and video surveillance projects involving the Raspberry Pi and similar hardware.

Let’s go ahead and combine OpenCV with Flask to serve up frames from a video stream (running on a Raspberry Pi) to a web browser.

Open up the webstreaming.py file in your project structure and insert the following code:
# import the necessary packages
from pyimagesearch.motion_detection import SingleMotionDetector
from imutils.video import VideoStream
from flask import Response
from flask import Flask
from flask import render_template
import threading
import argparse
import datetime
import imutils
import time
import cv2

Lines 2-12 handle our required imports:

  • Line 2 imports our SingleMotionDetector class which we implemented above.
  • The VideoStream class (Line 3) will enable us to access our Raspberry Pi camera module or USB webcam.
  • Lines 4-6 handle importing our required Flask packages — we’ll be using these packages to render our index.html template and serve it up to clients.
  • Line 7 imports the threading library to ensure we can support concurrency (i.e., multiple clients, web browsers, and tabs at the same time).

Let’s move on to performing a few initializations:

# initialize the output frame and a lock used to ensure thread-safe
# exchanges of the output frames (useful when multiple browsers/tabs
# are viewing the stream)
outputFrame = None
lock = threading.Lock()

# initialize a flask object
app = Flask(__name__)

# initialize the video stream and allow the camera sensor to
# warmup
#vs = VideoStream(usePiCamera=1).start()
vs = VideoStream(src=0).start()
time.sleep(2.0)

First, we initialize our outputFrame on Line 17 — this will be the frame (post-motion detection) that will be served to the clients.

We then create a lock on Line 18 which will be used to ensure thread-safe behavior when updating the outputFrame (i.e., ensuring that one thread isn’t trying to read the frame as it is being updated).

Line 21 initializes our Flask app itself while Lines 25-27 access our video stream:
  • If you are using a USB webcam, you can leave the code as is.
  • However, if you are using a RPi camera module you should uncomment Line 25 and comment out Line 26.

The next function, index, will render our index.html template and serve up the output video stream:
@app.route("/")
def index():
	# return the rendered template
	return render_template("index.html")

This function is quite simplistic — all it’s doing is calling the Flask render_template on our HTML file.

We’ll be reviewing the index.html file in the next section so we’ll hold off on a further discussion on the file contents until then.

Our next function is responsible for:

  1. Looping over frames from our video stream
  2. Applying motion detection
  3. Drawing any results on the outputFrame

And furthermore, this function must perform all of these operations in a thread safe manner to ensure concurrency is supported.

Let’s take a look at this function now:

def detect_motion(frameCount):
	# grab global references to the video stream, output frame, and
	# lock variables
	global vs, outputFrame, lock

	# initialize the motion detector and the total number of frames
	# read thus far
	md = SingleMotionDetector(accumWeight=0.1)
	total = 0

Our detect_motion function accepts a single argument, frameCount, which is the minimum number of required frames to build our background bg in the SingleMotionDetector class:
  • If we don’t have at least frameCount frames, we’ll continue to compute the accumulated weighted average.
  • Once frameCount is reached, we’ll start performing background subtraction.

Line 37 grabs global references to three variables:

  • vs: Our instantiated VideoStream object
  • outputFrame: The output frame that will be served to clients
  • lock: The thread lock that we must obtain before updating outputFrame

Line 41 initializes our SingleMotionDetector class with a value of accumWeight=0.1, implying that the bg value will be weighted higher when computing the weighted average.

Line 42 then initializes the total number of frames read thus far — we’ll need to ensure a sufficient number of frames have been read to build our background model.

From there, we’ll be able to perform background subtraction.

With these initializations complete, we can now start looping over frames from the camera:

# loop over frames from the video stream
	while True:
		# read the next frame from the video stream, resize it,
		# convert the frame to grayscale, and blur it
		frame = vs.read()
		frame = imutils.resize(frame, width=400)
		gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
		gray = cv2.GaussianBlur(gray, (7, 7), 0)

		# grab the current timestamp and draw it on the frame
		timestamp = datetime.datetime.now()
		cv2.putText(frame, timestamp.strftime(
			"%A %d %B %Y %I:%M:%S%p"), (10, frame.shape[0] - 10),
			cv2.FONT_HERSHEY_SIMPLEX, 0.35, (0, 0, 255), 1)

Line 48 reads the next frame from our camera while Lines 49-51 perform preprocessing, including:
  • Resizing to have a width of 400px (the smaller our input frame is, the less data there is, and thus the faster our algorithms will run).
  • Converting to grayscale.
  • Gaussian blurring (to reduce noise).

We then grab the current timestamp and draw it on the frame (Lines 54-57).

With one final check, we can perform motion detection:

# if the total number of frames has reached a sufficient
		# number to construct a reasonable background model, then
		# continue to process the frame
		if total > frameCount:
			# detect motion in the image
			motion = md.detect(gray)

			# check to see if motion was found in the frame
			if motion is not None:
				# unpack the tuple and draw the box surrounding the
				# "motion area" on the output frame
				(thresh, (minX, minY, maxX, maxY)) = motion
				cv2.rectangle(frame, (minX, minY), (maxX, maxY),
					(0, 0, 255), 2)
		
		# update the background model and increment the total number
		# of frames read thus far
		md.update(gray)
		total += 1

		# acquire the lock, set the output frame, and release the
		# lock
		with lock:
			outputFrame = frame.copy()

On Line 62 we ensure that we have read at least frameCount frames to build our background subtraction model.

If so, we apply the .detect method of our motion detector, which returns a single variable, motion.

If motion is None, then we know no motion has taken place in the current frame. Otherwise, if motion is not None (Line 67), then we need to draw the bounding box coordinates of the motion region on the frame.

Line 76 updates our motion detection background model while Line 77 increments the total number of frames read from the camera thus far.

Finally, Line 81 acquires the lock required to support thread concurrency while Line 82 sets the outputFrame.

We need to acquire the lock to ensure the outputFrame variable is not accidentally being read by a client while we are trying to update it.

Our next function, generate, is a Python generator used to encode our outputFrame as JPEG data — let’s take a look at it now:
def generate():
	# grab global references to the output frame and lock variables
	global outputFrame, lock

	# loop over frames from the output stream
	while True:
		# wait until the lock is acquired
		with lock:
			# check if the output frame is available, otherwise skip
			# the iteration of the loop
			if outputFrame is None:
				continue

			# encode the frame in JPEG format
			(flag, encodedImage) = cv2.imencode(".jpg", outputFrame)

			# ensure the frame was successfully encoded
			if not flag:
				continue

		# yield the output frame in the byte format
		yield(b'--frame\r\n' b'Content-Type: image/jpeg\r\n\r\n' + 
			bytearray(encodedImage) + b'\r\n')

Line 86 grabs global references to our outputFrame and lock, similar to the detect_motion function.

Then generate starts an infinite loop on Line 89 that will continue until we kill the script.

Inside the loop, we:

  • First acquire the lock (Line 91).
  • Ensure the outputFrame is not empty (Line 94), which may happen if a frame is dropped from the camera sensor.
  • Encode the frame as a JPEG image on Line 98 — JPEG compression is performed here to reduce load on the network and ensure faster transmission of frames.
  • Check to see if the success flag has failed (Lines 101 and 102), implying that the JPEG compression failed and we should ignore the frame.
  • Finally, serve the encoded JPEG frame as a byte array that can be consumed by a web browser.

That was quite a lot of work in a short amount of code, so definitely make sure you review this function a few times to ensure you understand how it works.

The next function, video_feed, calls our generate function:
@app.route("/video_feed")
def video_feed():
	# return the response generated along with the specific media
	# type (mime type)
	return Response(generate(),
		mimetype = "multipart/x-mixed-replace; boundary=frame")

Notice how this function has the app.route signature, just like the index function above.

The app.route signature tells Flask that this function is a URL endpoint and that data is being served from http://your_ip_address/video_feed.

The output of video_feed is the live motion detection output, encoded as a byte array via the generate function. Your web browser is smart enough to take this byte array and display it in your browser as a live feed.
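
As an aside, once the server is running (see “Putting the pieces together” below) you can peek at the raw multipart stream with a tool like curl; you’ll see the boundary markers and JPEG headers interleaved with image bytes (the IP/port are just the example values used later in this post):

$ curl -N http://0.0.0.0:8000/video_feed --output - | head -c 300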

Our final code block handles parsing command line arguments and launching the Flask app:

# check to see if this is the main thread of execution
if __name__ == '__main__':
	# construct the argument parser and parse command line arguments
	ap = argparse.ArgumentParser()
	ap.add_argument("-i", "--ip", type=str, required=True,
		help="ip address of the device")
	ap.add_argument("-o", "--port", type=int, required=True,
		help="ephemeral port number of the server (1024 to 65535)")
	ap.add_argument("-f", "--frame-count", type=int, default=32,
		help="# of frames used to construct the background model")
	args = vars(ap.parse_args())

	# start a thread that will perform motion detection
	t = threading.Thread(target=detect_motion, args=(
		args["frame_count"],))
	t.daemon = True
	t.start()

	# start the flask app
	app.run(host=args["ip"], port=args["port"], debug=True,
		threaded=True, use_reloader=False)

# release the video stream pointer
vs.stop()

Lines 118-125 handle parsing our command line arguments.

We need three arguments here, including:

  • --ip: The IP address of the system you are launching the webstreaming.py file from.
  • --port: The port number that the Flask app will run on (you’ll typically supply a value of 8000 for this parameter).
  • --frame-count: The number of frames used to accumulate and build the background model before motion detection is performed. By default, we use 32 frames to build the background model.

Lines 128-131 launch a thread that will be used to perform motion detection.

Using a thread ensures the detect_motion function can safely run in the background — it will be constantly running and updating our outputFrame so we can serve any motion detection results to our clients.

Finally, Lines 134 and 135 launch the Flask app itself.

The HTML page structure

As we saw in webstreaming.py, we are rendering an HTML template named index.html.

The template itself is populated by the Flask web framework and then served to the web browser.

Your web browser then takes the generated HTML and renders it to your screen.

Let’s inspect the contents of our index.html file:
<html>
  <head>
    <title>Pi Video Surveillance</title>
  </head>
  <body>
    <h1>Pi Video Surveillance</h1>
    <img src="{{ url_for('video_feed') }}">
  </body>
</html>

As we can see, this is a super basic web page; however, pay close attention to Line 7 — notice how we are instructing Flask to dynamically render the URL of our video_feed route.

Since the video_feed function is responsible for serving up frames from our webcam, the src of the image will be automatically populated with our output frames.

Our web browser is then smart enough to properly render the webpage and serve up the live video stream.
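
For example, once Flask renders the template, the Jinja2 expression {{ url_for('video_feed') }} expands to the route’s URL, so the browser effectively receives <img src="/video_feed"> and immediately begins requesting the stream from that endpoint.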

Putting the pieces together

Now that we’ve coded up our project, let’s put it to the test.

Open up a terminal and execute the following command:

$ python webstreaming.py --ip 0.0.0.0 --port 8000
 * Serving Flask app "webstreaming" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://0.0.0.0:8000/ (Press CTRL+C to quit)
127.0.0.1 - - [26/Aug/2019 14:43:23] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [26/Aug/2019 14:43:23] "GET /video_feed HTTP/1.1" 200 -
127.0.0.1 - - [26/Aug/2019 14:43:24] "GET /favicon.ico HTTP/1.1" 404 -

As you can see in the video, I opened connections to the Flask/OpenCV server from multiple browsers, each with multiple tabs. I even pulled out my iPhone and opened a few connections from there. The server didn’t skip a beat and continued to serve up frames reliably with Flask and OpenCV.

Join the embedded computer vision and deep learning revolution!

I first started playing guitar twenty years ago when I was in middle school. I wasn’t very good at it and I gave it up only a couple years after. Looking back, I strongly believe the reason I didn’t stick with it was because I wasn’t learning in a practical, hands-on manner.

Instead, my music teacher kept trying to drill theory into my head — but as an eleven year old kid, I was just trying to figure out whether I even liked playing guitar, let alone if I wanted to study the theory behind music in general.

About a year and a half ago I decided to start taking guitar lessons again. This time, I took care to find a teacher who could blend theory and practice together, showing me how to play songs or riffs while at the same time learning a theoretical technique.

The result? My finger speed is now faster than ever, my rhythm is on point, and I can annoy my wife to no end rocking Sweet Child of Mine on my Les Paul.

My point is this — whenever you are learning a new skill, whether it’s computer vision, hacking with the Raspberry Pi, or even playing guitar, one of the fastest, fool-proof methods to pick up the technique is to design (small) real-world projects around the skill and try to solve it.

For guitar, that meant learning short riffs that not only taught me parts of actual songs but also gave me a valuable technique (such as mastering a particular pentatonic scale, for instance).

In computer vision and image processing, your goal should be to brainstorm mini-projects and then try to solve them. Don’t get too complicated too quickly, that’s a recipe for failure.

Instead, grab a copy of my Raspberry Pi for Computer Vision book, read it, and use it as a launchpad for your personal projects.

When you’re done reading, go back to the chapters that inspired you the most and see how you can extend them in some manner (even if it’s just applying the same technique to a different scenario).

Solving the mini-projects you brainstorm will not only keep you interested in the subject (since you personally thought of them), but they’ll teach you hands-on skills at the same time.

Today’s tutorial — motion detection and streaming to a web browser — is a great starting point for such a mini-project. I hope that now that you’ve gone through this tutorial, you have brainstormed ideas on how you may extend this project to your own applications.

But, if you’re interested in learning more…

My new book, Raspberry Pi for Computer Vision, has over 40 projects related to embedded computer vision + Internet of Things (IoT). You can build upon the projects in the book to solve problems around your home, business, and even for your clients. Each of these projects has an emphasis on:

  • Learning by doing.
  • Rolling up your sleeves.
  • Getting your hands dirty in code and implementation.
  • Building actual, real-world projects using the Raspberry Pi.

A handful of the highlighted projects include:

  • Daytime and nighttime wildlife monitoring
  • Traffic counting and vehicle speed detection
  • Deep Learning classification, object detection, and instance segmentation on resource constrained devices
  • Hand gesture recognition
  • Basic robot navigation
  • Security applications
  • Classroom attendance
  • …and many more!

The book also covers deep learning using the Google Coral and Intel Movidius NCS coprocessors (Hacker + Complete Bundles). We’ll also bring in the NVIDIA Jetson Nano to the rescue when more deep learning horsepower is needed (Complete Bundle).

In case you missed the Kickstarter, you may wish to watch my announcement video:

Are you ready to join me to learn about computer vision and how to apply embedded devices such as the Raspberry Pi, Google Coral, and NVIDIA Jetson Nano?

If so, take a look at the book using the link below!

Pre-order my Raspberry Pi for Computer Vision book!

Summary

In this tutorial you learned how to stream frames from a server machine to a client web browser. Using this web streaming we were able to build a basic security application to monitor a room of our house for motion.

Background subtraction is an extremely common method utilized in computer vision. Typically, these algorithms are computationally efficient, making them suitable for resource-constrained devices, such as the Raspberry Pi.

After implementing our background subtractor, we combined it with the Flask web framework, enabling us to:

  1. Access frames from RPi camera module/USB webcam.
  2. Apply background subtraction/motion detection to each frame.
  3. Stream the results to a web page/web browser.

Furthermore, our implementation supports multiple clients, browsers, or tabs — something that you will not find in most other implementations.

Whenever you need to stream frames from a device to a web browser, definitely use this code as a template/starting point.

To download the source code to this post, and be notified when future posts are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post OpenCV – Stream video to web browser/HTML page appeared first on PyImageSearch.

Multiprocessing with OpenCV and Python

In this tutorial, you will learn how to use multiprocessing with OpenCV and Python to perform feature extraction. You’ll learn how to use multiprocessing with OpenCV to parallelize feature extraction across the system bus, including all processors and cores on your computer.

Today’s tutorial is inspired by PyImageSearch reader, Abigail.

Abigail writes:

Hey Adrian, I just read your tutorial on image hashing with OpenCV and really enjoyed it.

I’m trying to apply image hashing to my research project at the university.

They have provided me with a dataset of ~7.5 million images. I used your code to perform image hashing but it’s taking a long time to process the entire dataset.

Is there anything I can do to speedup the process?

Abigail asks a great question.

The image hashing post she is referring to is single-threaded, meaning that only one core of the processor is being utilized — if we switch to using multiple threads/processes we can dramatically speed up the hashing process.

But how do we actually utilize multiprocessing with OpenCV and Python?

I’ll show you in the rest of this tutorial.

To learn how to use multiprocessing with OpenCV and Python, just keep reading.

Looking for the source code to this post?
Jump right to the downloads section.

Multiprocessing with OpenCV and Python

In the first part of this tutorial, we’ll discuss single-threaded vs. multi-threaded applications, including why we may choose to use multiprocessing with OpenCV to speed up the processing of a given dataset.

I’ll also discuss why immediately jumping to Big Data algorithms, tools, and paradigms (such as Hadoop and MapReduce) is the wrong decision — instead, you should parallelize across the system bus first.

From there we’ll implement our Python and OpenCV multiprocessing functions to facilitate processing a large dataset quickly and easily.

Finally, we’ll put all the pieces together and compare how long it takes to process our dataset:

  1. With only a single core of a processor
  2. And distributing the load across all cores of the processor

Let’s get started!

Why use multiprocessing for processing a dataset of images?

The vast majority of projects and applications you have implemented are (very likely) single-threaded.

When you launch your Python project, the python binary launches a Python interpreter (i.e., the “Python process”).

How the actual Python process itself is assigned to a CPU core is dependent on how the operating system handles (1) process scheduling and (2) assigning system vs. user threads.

There are entire books dedicated to multiprocessing, operating systems, and how processes are scheduled, assigned, removed, deleted, etc. via the OS; however, for the sake of simplicity, let’s assume:

  1. We launch our Python script.
  2. The operating system assigns the Python program to a single core of the processor.
  3. The OS then allows the Python script to run on the processor core until completion.

That’s all fine and good — but we are only utilizing a small amount of our true processing power.

To see how we’re underutilizing our processor, consider the following image:

Figure 1: Multiprocessing with OpenCV and Python. By default, Python scripts use a single process. This 3GHz Intel Xeon W processor is being underutilized.

This figure is meant to visualize the 3 GHz Intel Xeon W on my iMac Pro — note how the processor has a total of 20 cores.

Now, let’s assume we launch our Python script. The operating system will assign the process to a single one of those cores:

Figure 2: Without multiprocessing, your OpenCV program may not be efficiently using all cores or processors available on your machine.

The Python script will then run to completion.

But do you see the problem here?

We are only using 5% of our true processing power!

Thus, to speed up our Python script we can utilize multiprocessing. Under the hood, Python’s multiprocessing package spins up a new python process for each core of the processor. Each python process is independent and separate from the others (i.e., there are no shared variables, memory, etc.).

We then assign a portion of the dataset processing workload to each individual python process:

Figure 3: By multiprocessing with OpenCV, we can harness the full capability of our processor. In this case, an Intel Xeon 3GHz processor has 20 cores available and each core can be running an independent Python process.

Notice how each process is assigned a small chunk of the dataset.

Each process independently chews on the subset of the dataset assigned to it until the entire dataset has been processed.

Now, instead of using just a single core of our processor, we are using all cores!

Note: Keep in mind that this example is a bit of a simplification. The OS will manage process assignment as there are more processes than just your Python script running on your system. Some cores may be responsible for more than one Python process, other cores no Python processes, and remaining cores OS/system routines.
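
To make this pattern concrete before we apply it to images, here is a minimal, self-contained sketch; the square function is just a stand-in for real per-item work such as hashing an image:

from multiprocessing import Pool, cpu_count

def square(n):
	# stand-in for real per-item work (e.g., hashing one image)
	return n * n

if __name__ == "__main__":
	# launch one worker process per core and split the work among them
	with Pool(processes=cpu_count()) as pool:
		print(pool.map(square, range(10)))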

Why not use Hadoop, MapReduce, and other Big Data tools?

Your first thought when trying to parallelize processing of a large dataset would be to apply Big Data tools, algorithms, and paradigms such as Hadoop and MapReduce — but this would be a BIG mistake.

The golden rule when working with large datasets is to:

  1. Parallelize across your system bus first.
  2. And if performance/throughput is not sufficient, then, and only then, start parallelizing across multiple machines (including Hadoop, MapReduce, etc.).

The single biggest multiprocessing mistake I see computer scientists make is to immediately jump into Big Data tools.

Don’t do that.

Instead, spread the dataset processing across your system bus first.

If you’re not getting the throughput speed you want on your system bus only then should you consider parallelizing across multiple machines and bringing in Big Data tools.

If you find yourself in need of Hadoop/MapReduce, enroll in the PyImageSearch Gurus course to learn about high-throughput Python + OpenCV image processing using Hadoop’s Streaming API!

Our example dataset

Figure 4: The CALTECH-101 dataset consists of 101 object categories. We’ll generate image hashes using OpenCV, Python, and multiprocessing for all images in the dataset.

The dataset we’ll be using for our multiprocessing and OpenCV example is CALTECH-101, the same dataset we use when building an image hashing search engine.

The dataset consists of 9,144 images.

We’ll be using multiprocessing to spread out the image hashing extraction across all cores of our processor.

You may download the CALTECH-101 dataset from their official webpage or you can use the following wget command:
$ wget http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz
$ tar xvzf 101_ObjectCategories.tar.gz

Project structure

Let’s inspect our project structure:

$ tree --dirsfirst --filelimit 10
.
├── pyimagesearch
│   ├── __init__.py
│   └── parallel_hashing.py
├── 101_ObjectCategories [9,144 images] 
├── temp_output
└── extract.py

Inside the pyimagesearch module is our parallel_hashing.py helper script. This script contains our hashing function, chunking function, and our process_images workhorse.

The 101_ObjectCategories/ directory contains 101 subdirectories of images from CALTECH-101 (downloaded via the previous section).

A number of intermediate files will be temporarily stored in the temp_output/ folder.

The heart of our multiprocessing lies in extract.py. This script includes our pre-multiprocessing overhead, parallelization across the system bus, and post-multiprocessing overhead.

Our multiprocessing helper functions

Before we can utilize multiprocessing with OpenCV to speed up our dataset processing, let’s first implement our set of helper utilities used to facilitate multiprocessing.

Open up the parallel_hashing.py file in your directory structure and insert the following code:
# import the necessary packages
import numpy as np
import pickle
import cv2

def dhash(image, hashSize=8):
	# convert the image to grayscale
	gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

	# resize the input image, adding a single column (width) so we
	# can compute the horizontal gradient
	resized = cv2.resize(gray, (hashSize + 1, hashSize))

	# compute the (relative) horizontal gradient between adjacent
	# column pixels
	diff = resized[:, 1:] > resized[:, :-1]

	# convert the difference image to a hash
	return sum([2 ** i for (i, v) in enumerate(diff.flatten()) if v])

We begin by importing NumPy, OpenCV, and pickle (Lines 2-4).

From there, we define our difference hashing function, dhash. There are a number of image hashing algorithms, but one of the most popular ones is called the difference hash, which includes four steps:
  1. Step #1: Convert the input image to grayscale (Line 8).
  2. Step #2: Resize the image to fixed dimensions, N + 1 x N, ignoring aspect ratio. Typically we set N=8 or N=16. We use N + 1 columns so that we can compute the difference (hence “difference hash”) between adjacent pixels in each row (Line 12).
  3. Step #3: Compute the difference. If we set N=8 then we have 9 pixels per row and 8 pixels per column. We can then compute the difference between adjacent column pixels, yielding 8 differences. 8 rows of 8 differences (i.e., 8×8) results in 64 values (Line 16).
  4. Step #4: Finally, we can build the hash. In practice all we actually need to perform is a “greater than” operation comparing the columns, yielding binary values. These 64 binary values are compacted into an integer, forming our final hash (Line 19).

Typically, image hashing algorithms are used to find near-duplicate images in a large dataset.

For a full review of difference hashing be sure to review the following two blog posts:

Next, let’s look at the convert_hash function:
def convert_hash(h):
	# convert the hash to NumPy's 64-bit float and then back to
	# Python's built in int
	return int(np.array(h, dtype="float64"))

When I first wrote the code for the image hashing search engine tutorial, I found that the VP-Tree implementation internally converts points to a NumPy 64-bit float. That would be okay; however, hashes need to be integers and if we convert them to 64-bit floats, they become an unhashable data type. To overcome the limitation of the VP-Tree implementation, I came up with the convert_hash hack:
  • We accept an input hash, h.
  • That hash is then converted to a NumPy 64-bit float.
  • And that NumPy float is then converted back to Python’s built-in integer data type.

This hack ensures that hashes are represented consistently throughout the hashing, indexing, and searching process.
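
For example, in a Python shell you can see the round trip in action; note that very large hashes can lose a few low-order bits in the float conversion, which is exactly why applying the same conversion everywhere keeps index and query hashes consistent:

>>> import numpy as np
>>> int(np.array(2 ** 63 + 1, dtype="float64"))
9223372036854775808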

In order to leverage multiprocessing, we first need to chunk our dataset into N equally sized chunks (one chunk per core of the processor).

Let’s define our chunk generator now:
def chunk(l, n):
	# loop over the list in n-sized chunks
	for i in range(0, len(l), n):
		# yield the current n-sized chunk to the calling function
		yield l[i: i + n]

The chunk generator accepts two parameters:
  • l: List of elements (in this case, file paths).
  • n: The size of each chunk that will be generated.

Inside the function, we loop over list l and yield N-sized chunks to the calling function.
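
For example, chunking a list of ten items into chunks of size four behaves like this:

>>> list(chunk(list(range(10)), 4))
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]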

We’ve finally reached the workhorse of our multiprocessing implementation — the process_images function:

def process_images(payload):
	# display the process ID for debugging and initialize the hashes
	# dictionary
	print("[INFO] starting process {}".format(payload["id"]))
	hashes = {}

	# loop over the image paths
	for imagePath in payload["input_paths"]:
		# load the input image, compute the hash, and convert it
		image = cv2.imread(imagePath)
		h = dhash(image)
		h = convert_hash(h)

		# update the hashes dictionary
		l = hashes.get(h, [])
		l.append(imagePath)
		hashes[h] = l

	# serialize the hashes dictionary to disk using the supplied
	# output path
	print("[INFO] process {} serializing hashes".format(payload["id"]))
	f = open(payload["output_path"], "wb")
	f.write(pickle.dumps(hashes))
	f.close()

Inside the separate extract.py script, we’ll use Python’s multiprocessing library to launch a dedicated Python process, assign it to a specific core of the processor, and then run the process_images function on that specific core.

The process_images function works like this:

  • It accepts a payload as an input (Line 32). The payload is assumed to be a Python dictionary but can actually be any datatype provided that we can pickle and unpickle it.
  • Initializes the hashes dictionary (Line 36).
  • Loops over the input image paths in the payload (Line 39). In the loop, we load each image, extract the hash, and update the hashes dictionary (Lines 41-48).
  • Finally, we write the hashes to disk as a .pickle file (Lines 53-55).

For the purposes of this blog post we are utilizing multiprocessing to facilitate faster image hashing of an input dataset; however, you should use this function as a template for your own dataset processing.

You could easily swap in keypoint detection/local invariant feature extraction, color channel statistics, Local Binary Patterns, etc. From there, you may take this function and modify it for your own needs.
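As an illustration (the process_images_color_stats function below is hypothetical and not part of today’s downloads), here is a minimal sketch of what swapping the hashing step for simple color channel statistics might look like, keeping the same payload structure:

# import the necessary packages
import pickle
import cv2

def process_images_color_stats(payload):
	# initialize a dictionary to map image paths to channel means
	stats = {}

	# loop over the image paths assigned to this process
	for imagePath in payload["input_paths"]:
		# load the image and compute the mean of each channel
		image = cv2.imread(imagePath)
		(b, g, r) = cv2.mean(image)[:3]
		stats[imagePath] = (b, g, r)

	# serialize the statistics for this chunk to disk
	f = open(payload["output_path"], "wb")
	f.write(pickle.dumps(stats))
	f.close()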

Implementing the OpenCV and multiprocessing script

Now that our utility methods are implemented, let’s create the multiprocessing driver script.

This script will be responsible for:

  1. Grabbing all image paths in our input dataset.
  2. Splitting the image paths into N equally sized chunks (where N is the total number of processes we wish to utilize).
  3. Using multiprocessing, Pool, and map to call the process_images function on each core of the processor.
  4. Grabbing the results from each independent process and combining them.

If you need to review Python’s multiprocessing module, be sure to refer to the docs.
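As a quick refresher, here is a minimal, self-contained Pool/map example (the square worker function is purely illustrative):

# import the necessary packages
from multiprocessing import Pool

def square(x):
	# toy worker function executed in each child process
	return x * x

# multiprocessing setup must live in the main thread of execution
if __name__ == "__main__":
	pool = Pool(processes=4)
	print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
	pool.close()
	pool.join()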

Let’s see how we can implement our OpenCV and multiprocessing script. Open up the extract.py file and insert the following code:
# import the necessary packages
from pyimagesearch.parallel_hashing import process_images
from pyimagesearch.parallel_hashing import chunk
from multiprocessing import Pool
from multiprocessing import cpu_count
from imutils import paths
import numpy as np
import argparse
import pickle
import os

Lines 2-10 import our packages, modules, and functions:

  • From our custom parallel_hashing file, we import both our process_images and chunk functions.
  • To accommodate parallel processing we’ll use Python’s multiprocessing module. Specifically, we import Pool (to construct a processing pool) and cpu_count (to get a count of the number of available CPUs/cores if the --procs command line argument is not supplied).

All of our multiprocessing setup code must be in the main thread of execution:

# check to see if this is the main thread of execution
if __name__ == "__main__":
	# construct the argument parser and parse the arguments
	ap = argparse.ArgumentParser()
	ap.add_argument("-i", "--images", required=True, type=str,
		help="path to input directory of images")
	ap.add_argument("-o", "--output", required=True, type=str,
		help="path to output directory to store intermediate files")
	ap.add_argument("-a", "--hashes", required=True, type=str,
		help="path to output hashes dictionary")
	ap.add_argument("-p", "--procs", type=int, default=-1,
		help="# of processes to spin up")
	args = vars(ap.parse_args())

Line 13 ensures we are inside the main thread of execution. This helps prevent multiprocessing bugs, especially on Windows operating systems.

Lines 15-24 parse four command line arguments:

  • --images: The path to the input images directory.
  • --output: The path to the output directory to store intermediate files.
  • --hashes: The path to the output hashes dictionary in .pickle format.
  • --procs: The number of processes to launch for multiprocessing.

With our command line arguments parsed and ready to go, now we’ll (1) determine the number of concurrent processes to launch, and (2) prepare our image paths (a bit of pre-multiprocessing overhead):

# determine the number of concurrent processes to launch when
	# distributing the load across the system, then create the list
	# of process IDs
	procs = args["procs"] if args["procs"] > 0 else cpu_count()
	procIDs = list(range(0, procs))

	# grab the paths to the input images, then determine the number
	# of images each process will handle
	print("[INFO] grabbing image paths...")
	allImagePaths = sorted(list(paths.list_images(args["images"])))
	numImagesPerProc = len(allImagePaths) / float(procs)
	numImagesPerProc = int(np.ceil(numImagesPerProc))

	# chunk the image paths into N (approximately) equal sets, one
	# set of image paths for each individual process
	chunkedPaths = list(chunk(allImagePaths, numImagesPerProc))

Line 29 determines the total number of concurrent processes we’ll be launching, while Line 30 assigns each process an ID number. By default, we’ll utilize all CPUs/cores on our system.

Line 35 grabs paths to the input images in our dataset.

Lines 36 and 37 determine the total number of images per process by dividing the number of image paths by the number of processes and taking the ceiling to ensure we use an integer value from here forward. For example, with 9,144 images and 20 processes, each process handles ceil(9144 / 20) = 458 image paths (the final chunk will simply be a bit smaller).

Line 41 utilizes our chunk function to create a list of N (approximately) equally-sized lists of image paths. We will be mapping each of these chunks to an independent process.

Let’s prepare our payloads to assign to each process (our final pre-multiprocessing overhead):
# initialize the list of payloads
	payloads = []

	# loop over the set chunked image paths
	for (i, imagePaths) in enumerate(chunkedPaths):
		# construct the path to the output intermediary file for the
		# current process
		outputPath = os.path.sep.join([args["output"],
			"proc_{}.pickle".format(i)])

		# construct a dictionary of data for the payload, then add it
		# to the payloads list
		data = {
			"id": i,
			"input_paths": imagePaths,
			"output_path": outputPath
		}
		payloads.append(data)

Line 44 initializes the payloads list. Each payload will consist of data containing:
  1. An ID
  2. A list of input paths
  3. An output path to an intermediate file

Line 47 begins a loop over our chunked image paths. Inside the loop, we specify the intermediary output file path (which will store the respective image hashes for that specific chunk of image paths) while naming it carefully with the process ID in the filename (Lines 50 and 51).

To finish the loop, we append our data — a dictionary consisting of the (1) ID, i, (2) input imagePaths, and (3) outputPath (Lines 55-60).

This next block is where we distribute processing of the dataset across all cores of our system:

# construct and launch the processing pool
	print("[INFO] launching pool using {} processes...".format(procs))
	pool = Pool(processes=procs)
	pool.map(process_images, payloads)

	# close the pool and wait for all processes to finish
	print("[INFO] waiting for processes to finish...")
	pool.close()
	pool.join()
	print("[INFO] multiprocessing complete")

The Pool class creates the Python processes/interpreters on each respective core of the processor (Line 64).

Calling map takes the payloads list and then calls process_images on each core, distributing the payloads to each core (Line 65).

We then close the pool to any new jobs and wait for the multiprocessing to complete (Lines 69 and 70).

The final step (post-multiprocessing overhead) is to take our intermediate hashes and construct the final combined hashes.

# initialize our *combined* hashes dictionary (i.e., will combine
	# the results of each pickled/serialized dictionary into a
	# *single* dictionary)
	print("[INFO] combining hashes...")
	hashes = {}

	# loop over all pickle files in the output directory
	for p in paths.list_files(args["output"], validExts=(".pickle",)):
		# load the contents of the dictionary
		data = pickle.loads(open(p, "rb").read())

		# loop over the hashes and image paths in the dictionary
		for (tempH, tempPaths) in data.items():
			# grab all image paths with the current hash, add in the
			# image paths for the current pickle file, and then
			# update our hashes dictionary
			imagePaths = hashes.get(tempH, [])
			imagePaths.extend(tempPaths)
			hashes[tempH] = imagePaths

	# serialize the hashes dictionary to disk
	print("[INFO] serializing hashes...")
	f = open(args["hashes"], "wb")
	f.write(pickle.dumps(hashes))
	f.close()

Line 77 initializes the hashes dictionary to hold our combined hashes which we will populate from each of the intermediary files.

Lines 80-91 populate the combined hashes dictionary. To do so, we loop over all intermediate .pickle files (i.e., one .pickle file for each individual process). Inside the loop, we (1) read the hashes and associated imagePaths from the data, and (2) update the hashes dictionary.

Finally, Lines 94-97 serialize the hashes to disk. We could use the serialized hashes to construct a VP-Tree and search for near-duplicate images in a separate script at this point.

Note: You could update the code to delete the temporary .pickle files from your system; however, I left that as an implementation decision to you, the reader.

OpenCV and multiprocessing results

Let’s put our OpenCV and multiprocessing methods to the test. Make sure you’ve:

  1. Used the “Downloads” section of this tutorial to download the source code.
  2. Downloaded the CALTECH-101 dataset using the instructions in the “Our example dataset” section above.

To start, let’s test how long it takes to process our dataset of 9,144 images using only a single core:

$ time python extract.py --images 101_ObjectCategories --output temp_output \
	--hashes hashes.pickle --procs 1
[INFO] grabbing image paths...
[INFO] launching pool using 1 processes...
[INFO] starting process 0
[INFO] process 0 serializing hashes
[INFO] waiting for processes to finish...
[INFO] multiprocessing complete
[INFO] combining hashes...
[INFO] serializing hashes...

real	0m9.576s
user	0m7.857s
sys		0m1.489s

Utilizing only a single process (single core of our processor) required 9.576 seconds to process the entire image dataset.

Now, let’s try using all 20 processes (which could be mapped to all 20 cores of my processor):

$ time python extract.py --images ~/Desktop/101_ObjectCategories \
	--output temp_output --hashes hashes.pickle 
[INFO] grabbing image paths...
[INFO] launching pool using 20 processes...
[INFO] starting process 0
[INFO] starting process 1
[INFO] starting process 2
[INFO] starting process 3
[INFO] starting process 4
[INFO] starting process 5
[INFO] starting process 6
[INFO] starting process 7
[INFO] starting process 8
[INFO] starting process 9
[INFO] starting process 10
[INFO] starting process 11
[INFO] starting process 12
[INFO] starting process 13
[INFO] starting process 14
[INFO] starting process 15
[INFO] starting process 16
[INFO] starting process 17
[INFO] starting process 18
[INFO] starting process 19
[INFO] process 3 serializing hashes
[INFO] process 4 serializing hashes
[INFO] process 6 serializing hashes
[INFO] process 8 serializing hashes
[INFO] process 5 serializing hashes
[INFO] process 19 serializing hashes
[INFO] process 11 serializing hashes
[INFO] process 10 serializing hashes
[INFO] process 16 serializing hashes
[INFO] process 14 serializing hashes
[INFO] process 15 serializing hashes
[INFO] process 18 serializing hashes
[INFO] process 7 serializing hashes
[INFO] process 17 serializing hashes
[INFO] process 12 serializing hashes
[INFO] process 9 serializing hashes
[INFO] process 13 serializing hashes
[INFO] process 2 serializing hashes
[INFO] process 1 serializing hashes
[INFO] process 0 serializing hashes
[INFO] waiting for processes to finish...
[INFO] multiprocessing complete
[INFO] combining hashes...
[INFO] serializing hashes...

real	0m1.508s
user	0m12.785s
sys		0m1.361s

By distributing the image hashing load across all 20 cores of my processor I was able to reduce the time it took to process the dataset from 9.576 seconds down to 1.508 seconds — that’s a speedup of over 535%!

But wait, if we used 20 cores, shouldn’t the total processing time be approximately 9.576 / 20 = 0.4788 seconds?

Well, not quite, for a few reasons:

  1. First, we’re performing a lot of I/O operations. Each cv2.imread call results in I/O overhead. The hashing algorithm itself is also very simple. If our algorithm were truly CPU bound, versus I/O bound, the speedup factor would be even better.
  2. Secondly, multiprocessing is not a “free” operation. There are overhead function calls, both at the Python level and operating system level, that prevent us from seeing a true 20x speedup.

Can all computer vision and OpenCV algorithms be made parallel with multiprocessing?

The short answer is no, not all algorithms can be made parallel and distributed to all cores of a processor — some algorithms are simply single threaded in nature.

Furthermore, you cannot use the multiprocessing library to speed up compiled OpenCV routines like cv2.GaussianBlur, cv2.Canny, or any of the deep neural network routines in the cv2.dnn package.

Those routines, as well as all other cv2.* functions, are pre-compiled C/C++ functions — Python’s multiprocessing library will have no impact on them whatsoever.

Instead, if you are interested in how to speed up those functions, be sure to look into OpenCL, Threading Building Blocks (TBB), NEON, and VFPv3.

Additionally, if you are working with the Raspberry Pi you should read this tutorial on how to optimize your OpenCV install.

I’m also including additional OpenCV optimizations inside my book, Raspberry Pi for Computer Vision.

Summary

In this tutorial you learned how to utilize multiprocessing with OpenCV and Python.

Specifically, we learned how to use Python’s built-in multiprocessing library along with the Pool and map methods to parallelize and distribute processing across all processors and all cores of the processors.

The end result is a massive 535% speedup in the time it took to process our dataset of images.

We examined multiprocessing with OpenCV through indexing a dataset of images for building an image hashing search engine; however, you can modify/implement your own process_images function to include your own functionality.

My personal suggestion would be to use the process_images function as a template when building your own multiprocessing and OpenCV applications.

I hope you enjoyed this tutorial!

If you would like to see more multiprocessing and OpenCV optimization tutorials in the future please leave a comment below and let me know.

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!


The post Multiprocessing with OpenCV and Python appeared first on PyImageSearch.

Install OpenCV 4 on Raspberry Pi 4 and Raspbian Buster


In this tutorial you will learn how to install OpenCV 4 on the Raspberry Pi 4 and Raspbian Buster.

You will learn how to install OpenCV 4 on Raspbian Buster via both:

  1. A simple pip-install method (which can be completed in a matter of minutes)
  2. Compiling from source (which will take longer but will give you access to the full, optimized install of OpenCV)

To learn more about installing OpenCV 4 on the Raspberry Pi 4 and Raspbian Buster, just keep reading.

Install OpenCV 4 on Raspberry Pi 4 and Raspbian Buster

In this tutorial we will install and test OpenCV 4 on Raspbian Buster in five simple, easy-to-follow steps.

If you’ve ever compiled OpenCV from scratch before, you know that the process is especially time-consuming and even painstakingly frustrating if you miss a key step or if you are new to Linux and Bash.

In Q4 2018, a new, faster method for installing OpenCV on the Raspberry Pi (i.e., a pip install) was made possible thanks to the hard work of the following people:

Installing OpenCV via pip is easier than ever. In fact, you can be up and running (Step #1 – Step #4a) in less than 10 minutes.

But what’s the catch?

Using pip to install OpenCV is great, but for some projects (including many educational projects on PyImageSearch.com and in my books/courses) you might want the complete install of OpenCV (which the pip install won’t give you).

Don’t worry, I’ve got you covered in Step #4b below — you’ll learn to use CMake and Make to compile OpenCV 4 on BusterOS from scratch.

Let’s dive in!

Before we begin: Grab your Raspberry Pi and flash BusterOS to your microSD

Let’s review the hardware requirements for this tutorial:

  • Raspberry Pi: Despite the title of this tutorial, you may use Raspberry Pi hardware including the 3B, 3B+, or 4B to install OpenCV 4. These instructions only apply to Raspbian Buster, however.
  • 32GB microSD: I recommend the high-quality SanDisk 32GB 98MB/s cards. Here’s an example on Amazon (however you can purchase them from your favorite online distributor).
  • microSD adapter: You’ll need to purchase a microSD to USB adapter so you can flash the memory card from your laptop.

If you don’t already have a Raspberry Pi 4, I highly recommend CanaKits, which are available on Amazon and directly through CanaKit’s website. Most of their kits come with a Raspberry Pi, power adapter, microSD, microSD adapter, heatsinks, and more!

Figure 1: Hardware for installing OpenCV 4 on your Raspberry Pi 4 running Raspbian Buster.

Once you have the hardware ready, you’ll need to flash a fresh copy of the Raspbian Buster operating system to the microSD card.

  1. Head on over to the official BusterOS download page (Figure 2), and start your download. I recommend the “Raspbian Buster with Desktop and recommended software”.
  2. Download Balena Etcher — software for flashing memory cards. It works on every major OS.
  3. Use Etcher to flash BusterOS to your memory card (Figure 3).

Figure 2: Download Raspbian Buster for your Raspberry Pi and OpenCV 4.

After downloading the Raspbian Buster .img file, you can flash it to your micro-SD card using Etcher:

Figure 3: Flash Raspbian Buster with Etcher. We will use BusterOS to install OpenCV 4 on our Raspberry Pi 4.

After a few minutes the flashing process should be complete — slot the micro-SD card into your Raspberry Pi 4 and then boot.

From there you can move on to the rest of the OpenCV install steps in this guide.

Step #1: Expand filesystem and reclaim space

For the remainder of this tutorial I’ll be making the following assumptions:

  1. You are working with a brand new, fresh install of Raspbian Buster (see the previous section to learn how to flash Buster to your microSD).
  2. You are comfortable with the command line and Unix environments.
  3. You have an SSH or VNC connection established with your Pi. Alternatively, you could use a keyboard + mouse + screen.

Go ahead and insert your microSD into your Raspberry Pi and boot it up with a screen attached.

Once booted, configure your WiFi/ethernet settings to connect to the internet (you’ll need an internet connection to download and install required packages for OpenCV).

From there you can use SSH as I have done, or go ahead and open a terminal.

The first step is to run raspi-config and expand your filesystem:
$ sudo raspi-config

And then select the “Advanced Options” menu item:

Figure 4: The raspi-config configuration screen for Raspbian Buster. Select 7 Advanced Options so that we can expand our filesystem.

Followed by selecting “A1 Expand Filesystem”:

Figure 5: The A1 Expand Filesystem menu item allows you to expand the filesystem on your microSD card containing the Raspberry Pi Buster operating system. Then we can proceed to install OpenCV 4.

Once prompted, you should select the first option, “A1 Expand Filesystem”, hit enter on your keyboard, arrow down to the “<Finish>” button, and then reboot your Pi — you may be prompted to reboot, but if you aren’t you can execute:
$ sudo reboot

After rebooting, your file system should have been expanded to include all available space on your micro-SD card. You can verify that the disk has been expanded by executing df -h and examining the output:
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        29G  5.3G   23G  20% /
devtmpfs        1.8G     0  1.8G   0% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           1.9G  8.6M  1.9G   1% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mmcblk0p1  253M   40M  213M  16% /boot
tmpfs           386M     0  386M   0% /run/user/1000

As you can see, my Raspbian filesystem has been expanded to include all 32GB of the micro-SD card.

However, even with my filesystem expanded, I have already used nearly 20% of my 32GB card.

While it’s not required, I would suggest deleting both Wolfram Engine and LibreOffice to reclaim ~1GB of space on your Raspberry Pi:

$ sudo apt-get purge wolfram-engine
$ sudo apt-get purge libreoffice*
$ sudo apt-get clean
$ sudo apt-get autoremove

Step #2: Install dependencies

The following commands will update and upgrade any existing packages, followed by installing dependencies, I/O libraries, and optimization packages for OpenCV:

The first step is to update and upgrade any existing packages:

$ sudo apt-get update && sudo apt-get upgrade

We then need to install some developer tools, including CMake, which helps us configure the OpenCV build process:

$ sudo apt-get install build-essential cmake pkg-config

Next, we need to install some image I/O packages that allow us to load various image file formats from disk. Examples of such file formats include JPEG, PNG, TIFF, etc.:

$ sudo apt-get install libjpeg-dev libtiff5-dev libjasper-dev libpng-dev

Just as we need image I/O packages, we also need video I/O packages. These libraries allow us to read various video file formats from disk as well as work directly with video streams:

$ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
$ sudo apt-get install libxvidcore-dev libx264-dev

The OpenCV library comes with a sub-module named highgui which is used to display images to our screen and build basic GUIs. In order to compile the highgui module, we need to install the GTK development library and prerequisites:
$ sudo apt-get install libfontconfig1-dev libcairo2-dev
$ sudo apt-get install libgdk-pixbuf2.0-dev libpango1.0-dev
$ sudo apt-get install libgtk2.0-dev libgtk-3-dev

Many operations inside of OpenCV (namely matrix operations) can be optimized further by installing a few extra dependencies:

$ sudo apt-get install libatlas-base-dev gfortran

These optimization libraries are especially important for resource-constrained devices such as the Raspberry Pi.

The following prerequisites are for Step #4a, and they certainly won’t hurt for Step #4b either. They are for HDF5 datasets and Qt GUIs:

$ sudo apt-get install libhdf5-dev libhdf5-serial-dev libhdf5-103
$ sudo apt-get install libqtgui4 libqtwebkit4 libqt4-test python3-pyqt5

Lastly, let’s install Python 3 header files so we can compile OpenCV with Python bindings:

$ sudo apt-get install python3-dev

If you’re working with a fresh install of the OS, it is possible that these packages are already at their newest versions (you’ll see a terminal message stating this).

Step #3: Create your Python virtual environment and install NumPy

We’ll be using Python virtual environments, a best practice when working with Python.

A Python virtual environment is an isolated development/testing/production environment on your system — it is fully sequestered from other environments. Best of all, you can manage the Python packages inside your virtual environment with pip (Python’s package manager).

Of course, there are alternatives for managing virtual environments and packages (namely Anaconda/conda). I’ve used/tried them all, but have settled on pip, virtualenv, and virtualenvwrapper as the preferred tools that I install on all of my systems. If you use the same tools as me, you’ll receive the best support from me.

You can install pip using the following commands:

$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python get-pip.py
$ sudo python3 get-pip.py
$ sudo rm -rf ~/.cache/pip

Let’s install virtualenv and virtualenvwrapper now:
$ sudo pip install virtualenv virtualenvwrapper

Once both virtualenv and virtualenvwrapper have been installed, open up your ~/.bashrc file:
$ nano ~/.bashrc

…and append the following lines to the bottom of the file:

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

Figure 6: Using the nano editor to update ~/.bashrc with virtualenvwrapper settings.

Save and exit via ctrl + x, y, enter.

From there, reload your ~/.bashrc file to apply the changes to your current bash session:
$ source ~/.bashrc

Next, create your Python 3 virtual environment:

$ mkvirtualenv cv -p python3

Here we are creating a Python virtual environment named cv using Python 3. Going forward, I recommend Python 3 with OpenCV 4+.

Note: Python 2.7 will reach the end of its life on January 1st, 2020, so I do not recommend using Python 2.7.

You can name the virtual environment whatever you want, but I use cv as the standard naming convention here on PyImageSearch.

If you have a Raspberry Pi Camera Module attached to your RPi, you should install the PiCamera API now as well:

$ pip install "picamera[array]"

Step #4(a or b): Decide if you want the 1-minute quick install or the 2-hour complete install

From here you need to make a decision about the rest of your install. There are two options.

  1. Step #4a: pip install OpenCV 4: If you decide to pip install OpenCV, you will be done in a matter of seconds. It is by far the fastest, easiest method to install OpenCV. It is the method I recommend for 90% of people — especially beginners. After this step, you will skip to Step #5 to test your install.
  2. Step #4b: Compile OpenCV 4 from source: This method gives you the full install of OpenCV 4. It will take 2-4 hours depending on the processor in your Raspberry Pi.

As stated, I highly encourage you to use the pip instructions. They are faster and will work for 90% of your projects. Additionally, the patented algorithms can only be used for educational purposes (there are plenty of great alternatives to the patented algorithms too).

Step #4a: pip install OpenCV 4

In a matter of seconds, you can pip install OpenCV into the cv virtual environment:
$ pip install opencv-contrib-python

Figure 7: To quickly install OpenCV 4 on your Raspberry Pi 4 running Raspbian Buster, I recommend using pip as shown.

That’s really all there is to it. You may skip to Step #5 now to test your install.

Step #4b: Compile OpenCV 4 from source

This option installs the full install of OpenCV including patented (“Non-free”) algorithms.

Note: Do not follow Step #4b if you followed Step #4a.

Let’s go ahead and download the OpenCV source code for both the opencv and opencv_contrib repositories, followed by unarchiving them:

$ cd ~
$ wget -O opencv.zip https://github.com/opencv/opencv/archive/4.1.1.zip
$ wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.1.1.zip
$ unzip opencv.zip
$ unzip opencv_contrib.zip
$ mv opencv-4.1.1 opencv
$ mv opencv_contrib-4.1.1 opencv_contrib


For this blog post, we’ll be using OpenCV 4.1.1; however, as newer versions of OpenCV are released you can update the corresponding version numbers.

Increasing your SWAP space

Before you start the compile you must increase your SWAP space. Increasing the SWAP will enable you to compile OpenCV with all four cores of the Raspberry Pi (and without the compile hanging due to memory being exhausted).

Go ahead and open up your /etc/dphys-swapfile file:
$ sudo nano /etc/dphys-swapfile

…and then edit the CONF_SWAPSIZE variable:
# set size to absolute value, leaving empty (default) then uses computed value
#   you most likely don't want this, unless you have an special disk situation
# CONF_SWAPSIZE=100
CONF_SWAPSIZE=1024

Notice that I’m increasing the swap from 100MB to 1024MB. This is critical to compiling OpenCV with multiple cores on Raspbian Buster.

Save and exit via ctrl + x, y, enter.

If you do not increase SWAP it’s very likely that your Pi will hang during the compile.

From there, restart the swap service:

$ sudo /etc/init.d/dphys-swapfile stop
$ sudo /etc/init.d/dphys-swapfile start

Note: Increasing swap size is a great way to burn out your Raspberry Pi microSD card. Flash-based storage has a limited number of writes you can perform until the card is essentially unable to hold the 1’s and 0’s anymore. We’ll only be enabling large swap for a short period of time, so it’s not a big deal. Regardless, be sure to backup your .img file after installing OpenCV + Python just in case your card dies unexpectedly early. You can read more about large swap sizes corrupting memory cards on this page.

Compile and install OpenCV 4 on Raspbian Buster

We’re now ready to compile and install the full, optimized OpenCV library on the Raspberry Pi 4.

Ensure you are in the cv virtual environment using the workon command:
$ workon cv

Then, go ahead and install NumPy (an OpenCV dependency) into the Python virtual environment:

$ pip install numpy

And from there configure your build:

$ cd ~/opencv
$ mkdir build
$ cd build
$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
    -D ENABLE_NEON=ON \
    -D ENABLE_VFPV3=ON \
    -D BUILD_TESTS=OFF \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D CMAKE_SHARED_LINKER_FLAGS=-latomic \
    -D BUILD_EXAMPLES=OFF ..

There are four CMake flags I’d like to bring to your attention:

  • (1) NEON and (2) VFPv3 optimization flags have been enabled. These lines ensure that you compile the fastest and most optimized OpenCV for the ARM processor on the Raspberry Pi (Lines 7 and 8).
    • Note: The Raspberry Pi Zero W hardware is not compatible with NEON or VFPv3. Be sure to remove Lines 7 and 8 if you are compiling for a Raspberry Pi Zero W.
  • (3) Patented “NonFree” algorithms give you the full install of OpenCV (Line 11).
  • And by drilling into OpenCV’s source, it was determined that we need the (4) -latomic shared linker flag (Line 12).

I’d like to take a second now to bring awareness to a common pitfall for beginners:

  • In the terminal block above, you change directories into ~/opencv/.
  • You then create a build/ directory therein and change directories into it.
  • If you try to execute CMake without being in the ~/opencv/build directory, CMake will fail. Try running pwd to see which working directory you are in before running cmake.

The cmake command will take about 3-5 minutes to run as it prepares and configures OpenCV for the compile.

When CMake finishes, be sure to inspect the output of CMake under the Python 3 section:

Figure 8: CMake configures your OpenCV 4 compilation from source on your Raspberry Pi 4 running Buster.

Notice how the Interpreter, Libraries, numpy, and packages path variables have been properly set. Each of these refers to our cv virtual environment.

Now go ahead and scroll up to ensure that the “Non-Free algorithms” are set to be installed:

Figure 9: Installing OpenCV 4 with “Non-free algorithms” on Raspbian Buster.

As you can see, “Non-free algorithms” for OpenCV 4 will be compiled + installed.

Now that we’ve prepared for our OpenCV 4 compilation, it is time to launch the compile process using all four cores:

$ make -j4

Figure 10: We used Make to compile OpenCV 4 on a Raspberry Pi 4 running Raspbian Buster.

Running make could take anywhere from 1-4 hours depending on your Raspberry Pi hardware (this tutorial is compatible with the Raspberry Pi 3B, 3B+, and 4). The Raspberry Pi 4 is the fastest at the time of this writing.

Assuming OpenCV compiled without error (as in my screenshot above), you can install your optimized version of OpenCV on your Raspberry Pi:

$ sudo make install
$ sudo ldconfig

Reset your SWAP

Don’t forget to go back to your /etc/dphys-swapfile file and:
  1. Reset CONF_SWAPSIZE to 100MB.
  2. Restart the swap service (see the commands below).
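Concretely, that means re-using the same commands from before:

$ sudo nano /etc/dphys-swapfile # set CONF_SWAPSIZE=100
$ sudo /etc/init.d/dphys-swapfile stop
$ sudo /etc/init.d/dphys-swapfile start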

Sym-link your OpenCV 4 on the Raspberry Pi

Symbolic links are a way of pointing from one directory to a file or folder elsewhere on your system. For this sub-step, we will sym-link the cv2.so bindings into your cv virtual environment.

Let’s proceed to create our sym-link. Be sure to use “tab-completion” for all paths below (rather than copying these commands blindly):

$ cd /usr/local/lib/python3.7/site-packages/cv2/python-3.7
$ sudo mv cv2.cpython-37m-arm-linux-gnueabihf.so cv2.so
$ cd ~/.virtualenvs/cv/lib/python3.7/site-packages/
$ ln -s /usr/local/lib/python3.7/site-packages/cv2/python-3.7/cv2.so cv2.so

Keep in mind that the exact paths may change and you should use “tab-completion”.

Step #5: Testing your OpenCV 4 Raspberry Pi BusterOS install

As a quick sanity check, access the cv virtual environment, fire up a Python shell, and try to import the OpenCV library:
$ cd ~
$ workon cv
$ python
>>> import cv2
>>> cv2.__version__
'4.1.1'
>>>

Congratulations! You’ve just installed OpenCV 4 on your Raspberry Pi.

If you are looking for some fun projects to work on with OpenCV 4, be sure to check out my Raspberry Pi archives.

What’s next?

Ready to put your Raspberry Pi and OpenCV install to work?

My brand new book, Raspberry Pi for Computer Vision, has over 40 Computer Vision and Deep Learning projects for embedded computer vision and Internet of Things (IoT) applications. You can build upon the projects in the book to solve problems around your home, business, and even for your clients. Each of these projects places an emphasis on:

  • Learning by doing
  • Rolling up your sleeves
  • Getting your hands dirty in code and implementation
  • Building actual, real-world projects using the Raspberry Pi

A handful of the highlighted projects include:

  • Daytime and nighttime wildlife monitoring
  • Traffic counting and vehicle speed detection
  • Deep Learning classification, object detection, and instance segmentation on resource constrained devices
  • Hand gesture recognition
  • Basic robot navigation
  • Security applications
  • Classroom attendance
  • …and many more!

The book also covers deep learning using the Google Coral and Intel Movidius NCS coprocessors, along with the NVIDIA Jetson Nano board.

If you’re interested in studying Computer Vision and Deep Learning on embedded devices, you won’t find a better book than this one!

Pick up your copy of Raspberry Pi for Computer Vision today!

Summary

In today’s tutorial, you learned how to install OpenCV 4 on your Raspberry Pi 4 running the Raspbian Buster operating system via two methods:

  • A simple pip install (fast and easy)
  • Compiling from source (takes longer, but gives you the full OpenCV install/optimizations)

The pip method to install OpenCV 4 is by far the easiest way to install OpenCV (and the method I recommend for 90% of projects). It is especially great for beginners too.

If you need the full install of OpenCV, you must compile from source. Compiling from source ensures that you have the full install including the “contrib” module with patented (“NonFree”) algorithms.

While compiling from source is both (1) more complicated, and (2) more time-consuming, it is currently the only way to access all features of OpenCV.

I hope you enjoyed today’s tutorial!

And if you’re ready to put your RPi and OpenCV install to work, be sure to check out my book, Raspberry Pi for Computer Vision — inside the book you’ll learn how to build practical, real-world Computer Vision and Deep Learning applications on the Raspberry Pi, Google Coral, Movidius NCS, and NVIDIA Jetson Nano.

To be notified when future tutorials are published on the PyImageSearch blog (and download my free 17-page CV and DL Resource Guide PDF), just enter your email address in the form below!

The post Install OpenCV 4 on Raspberry Pi 4 and Raspbian Buster appeared first on PyImageSearch.

Keras: Starting, stopping, and resuming training


In this tutorial, you will learn how to use Keras to train a neural network, stop training, update your learning rate, and then resume training from where you left off using the new learning rate. Using this method you can increase your accuracy while decreasing model loss.

Today’s tutorial is inspired by a question I received from PyImageSearch reader, Zhang Min.

Zhang Min writes:

Hi Adrian, thanks for the PyImageSearch blog. I have two questions:

First, I am working on my graduation project and my university is allowing me to share time on their GPU machines. The problem is that I can only access a GPU machine in two hour increments — after my two hours is up I’m automatically booted off the GPU. How can I save my training progress, safely stop training, and then resume training from where I left off?

Secondly, my initial experiments aren’t going very well. My model quickly jumps to 80%+ accuracy but then stays there for another 50 epochs. What else can I be doing to improve my model accuracy? My advisor said I should look into adjusting the learning rate but I’m not really sure how to do that.

Thanks Adrian!

Learning how to start, stop, and resume training a deep learning model is a super important skill to master — at some point in your deep learning practitioner career you’ll run into a situation similar to Zhang Min’s where:

  • You have limited time on a GPU instance (which can happen on Google Colab or when using Amazon EC2’s cheaper spot instances).
  • Your SSH connection is broken and you forgot to use a terminal multiplexer to save your session (such as screen or tmux).
  • Your deep learning rig locks up and forcibly shuts down.

Just imagine spending an entire week to train a state-of-the-art deep neural network…only to have your model lost due to a power failure!

Luckily, there’s a solution — but when those situations happen you need to know how to:

  1. Take a snapshotted model that was saved/serialized to disk during training.
  2. Load the model into memory.
  3. Resume training from where you left off.

Secondly, starting, stopping, and resume training is standard practice when manually adjusting the learning rate:

  1. Start training your model until loss/accuracy plateau.
  2. Snapshot your model every N epochs (typically N={1, 5, 10}).
  3. Stop training, normally by force exiting via ctrl + c.
  4. Open your code editor and adjust your learning rate (typically lowering it by an order of magnitude).
  5. Go back to your terminal and restart the training script, picking up from the last snapshot of model weights.

Using this ctrl + c method of training you can boost your model accuracy while simultaneously driving down loss, leading to a more accurate model.

The ability to adjust the learning rate is a critical skill for any deep learning practitioner to master, so take the time now to study and practice it!

To learn how to start, stop, and resume training with Keras, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras: Starting, stopping, and resuming training

In the first part of this blog post, we’ll discuss why we would want to start, stop, and resume training of a deep learning model.

We’ll also discuss how stopping training to lower your learning rate can improve your model accuracy (and why a learning rate schedule/decay may not be sufficient).

From there we’ll implement a Python script to handle starting, stopping, and resuming training with Keras.

I’ll then walk you through the entire training process, including:

  1. Starting the initial training script
  2. Monitoring loss/accuracy
  3. Noticing when loss/accuracy is plateauing
  4. Stopping training
  5. Lowering your learning rate
  6. Resuming training from where you left off with the new, lowered learning rate

Using this method of training you’ll often be able to improve your model accuracy.

Let’s go ahead and get started!

Why do we need to start, stop, and resume training?

There are a number of reasons you may need to start, stop, and resume training of your deep learning model, but the two primary grounds include:

  1. Your training session being terminated and training stopping (due to a power outage, GPU session timing out, etc.).
  2. Needing to adjust your learning rate to improve model accuracy (typically by lowering the learning rate by an order of magnitude).

The second point is especially important — if you go back and read the seminal AlexNet, SqueezeNet, ResNet, etc. papers you’ll find that the authors all say something along the lines of:

We started training our model with the SGD optimizer and an initial learning rate of 1e-1. We reduced our learning rate by an order of magnitude on epochs 30 and 50, respectively.

Why is the drop in learning rate so important? And how can it lead to a more accurate model?

To explore that question, take a look at the following plot of ResNet-18 trained on the CIFAR-10 dataset:

Figure 1: Training ResNet-18 on the CIFAR-10 dataset. The characteristic drops in loss and increases in accuracy are evidence of learning rate changes. Here, (1) training was stopped on epochs 30 and 50, (2) the learning rate was lowered, and (3) training was resumed. (image source)

Notice for epochs 1-29 there is a fairly “standard” curve that you come across when training a network:

  1. Loss starts off very high but then quickly drops
  2. Accuracy starts off very low but then quickly rises
  3. Eventually loss and accuracy plateau out

But what is going on around epoch 30?

Why does the loss drop so dramatically? And why does the accuracy rise so considerably?

The reason for this behavior is because:

  1. Training was stopped
  2. The learning rate was lowered by an order of magnitude
  3. And then training was resumed

The same goes for epoch 50 as well — again, training was stopped, the learning rate lowered, and then training resumed.

Each time we encounter a characteristic drop in loss and then a small increase in accuracy.

As the learning rate becomes smaller, the impact of the learning rate reduction has less and less impact.

Eventually, we run into two issues:

  1. The learning rate becomes very small which in turn makes the weight updates very small and thus the model cannot make any meaningful progress.
  2. We start to overfit due to the small learning rate. The model descends into areas of lower loss in the loss landscape, overfitting to the training data and not generalizing to the validation data.

The overfitting behavior is evident past epoch 50 in Figure 1 above.

Notice how validation loss has plateaued and has even started to rise a bit. At the same time, training loss continues to drop, a clear sign of overfitting.

Dropping your learning rate is a great way to boost the accuracy of your model during training, just realize there is (1) a point of diminishing returns, and (2) a chance of overfitting if training is not properly monitored.

Why not use learning rate schedulers or decay?

Figure 2: Learning rate schedulers are great for some training applications; however, starting/stopping Keras training typically leads to more control over your deep learning model.

You might be wondering “Why not use a learning rate scheduler?”

There are a number of learning rate schedulers available to us, including learning rate schedules/decay and Cyclical Learning Rates (CLRs).

If the goal is to improve model accuracy by dropping the learning rate, then why not just rely on those respective schedules and classes?

Great question.

The problem is that you may not have a good idea of:

  • The approximate number of epochs to train for
  • What a proper initial learning rate is
  • What learning rate range to use for CLRs

Additionally, one of the benefits of using what I call ctrl + c training is that it gives you more fine-grained control over your model.

Being able to manually stop your training at a specific epoch, adjust your learning rate, and then resume training from where you left off (and with the new learning rate) is something most learning rate schedulers will not allow you to do.

Once you’ve run a few experiments with ctrl + c training you’ll have a good idea of what your hyperparameters should be — when that happens, you can then start incorporating hardcoded learning rate schedules to boost your accuracy even further.

Finally, keep in mind that nearly all seminal CNN papers that were trained on ImageNet used a method to start/stop/resume training.

Just because other methods exist doesn’t make them inherently better — as a deep learning practitioner, you need to learn how to use ctrl + c training along with learning rate scheduling (don’t rely strictly on the latter).

If you’re interested in learning more about ctrl + c training, along with my tips, suggestions, and best practices when training your own models, be sure to refer to my book, Deep Learning for Computer Vision with Python.

Project structure

Let’s review our project structure:

$ tree --dirsfirst
.
├── output
│   ├── checkpoints
│   ├── resnet_fashion_mnist.json
│   └── resnet_fashion_mnist.png
├── pyimagesearch
│   ├── callbacks
│   │   ├── __init__.py
│   │   ├── epochcheckpoint.py
│   │   └── trainingmonitor.py
│   ├── nn
│   │   ├── __init__.py
│   │   └── resnet.py
│   └── __init__.py
└── train.py

5 directories, 9 files

Today we will review train.py, our training script. This script trains Fashion MNIST on ResNet.

The key to this training script is that it uses two “callbacks”, epochcheckpoint.py and trainingmonitor.py. I review these callbacks in detail inside Deep Learning for Computer Vision with Python — they aren’t covered today, but I encourage you to review the code.

These two callbacks allow us to (1) save our model at the end of every N-th epoch so we can resume training on demand, and (2) output our training plot at the conclusion of each epoch, ensuring we can easily monitor our model for signs of overfitting.

The models are checkpointed (i.e., saved) in the output/checkpoints/ directory and the accompanying JSON file. The training plot is overwritten upon each epoch end as resnet_fashion_mnist.png. We’ll be paying close attention to the training plot to determine when to stop training.

Implementing the training script

Let’s get started implementing our Python script that will be used for starting, stopping, and resuming training with Keras.

This guide is written for intermediate practitioners, even though it teaches an essential skill. If you are new to Keras or deep learning, or maybe you just need to brush up on the basics, definitely check out my Keras Tutorial first.

Open up a new file, name it train.py, and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.callbacks.epochcheckpoint import EpochCheckpoint
from pyimagesearch.callbacks.trainingmonitor import TrainingMonitor
from pyimagesearch.nn.resnet import ResNet
from sklearn.preprocessing import LabelBinarizer
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
from keras.datasets import fashion_mnist
from keras.models import load_model
import keras.backend as K
import numpy as np
import argparse
import cv2
import sys
import os

Lines 2-19 import our required packages, namely our EpochCheckpoint and TrainingMonitor callbacks. We also import our fashion_mnist dataset and ResNet CNN. The keras.backend as K import will allow us to retrieve and set our learning rate.

Now let’s go ahead and parse command line arguments:

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--checkpoints", required=True,
	help="path to output checkpoint directory")
ap.add_argument("-m", "--model", type=str,
	help="path to *specific* model checkpoint to load")
ap.add_argument("-s", "--start-epoch", type=int, default=0,
	help="epoch to restart training at")
args = vars(ap.parse_args())

Our command line arguments include:

  • --checkpoints: The path to our output checkpoints directory.
  • --model: The optional path to a specific model checkpoint to load when resuming training.
  • --start-epoch: The optional start epoch can be provided if you are resuming training. By default, training starts at epoch 0.

Let’s go ahead and load our dataset:

# grab the Fashion MNIST dataset (if this is your first time running
# this the dataset will be automatically downloaded)
print("[INFO] loading Fashion MNIST...")
((trainX, trainY), (testX, testY)) = fashion_mnist.load_data()

# Fashion MNIST images are 28x28 but the network we will be training
# is expecting 32x32 images
trainX = np.array([cv2.resize(x, (32, 32)) for x in trainX])
testX = np.array([cv2.resize(x, (32, 32)) for x in testX])

# scale data to the range of [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

# reshape the data matrices to include a channel dimension (required
# for training)
trainX = trainX.reshape((trainX.shape[0], 32, 32, 1))
testX = testX.reshape((testX.shape[0], 32, 32, 1))

Line 34 loads Fashion MNIST.

Lines 38-48 then preprocess the data including (1) resizing to 32×32, (2) scaling pixel intensities to the range [0, 1], and (3) adding a channel dimension.

From here we’ll (1) binarize our labels, and (2) initialize our data augmentation object:

# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# construct the image generator for data augmentation
aug = ImageDataGenerator(width_shift_range=0.1,
	height_shift_range=0.1, horizontal_flip=True,
	fill_mode="nearest")

And now to the code for loading model checkpoints:

# if there is no specific model checkpoint supplied, then initialize
# the network (ResNet-56) and compile the model
if args["model"] is None:
	print("[INFO] compiling model...")
	opt = SGD(lr=1e-1)
	model = ResNet.build(32, 32, 1, 10, (9, 9, 9),
		(64, 64, 128, 256), reg=0.0001)
	model.compile(loss="categorical_crossentropy", optimizer=opt,
		metrics=["accuracy"])

# otherwise, we're using a checkpoint model
else:
	# load the checkpoint from disk
	print("[INFO] loading {}...".format(args["model"]))
	model = load_model(args["model"])

	# update the learning rate
	print("[INFO] old learning rate: {}".format(
		K.get_value(model.optimizer.lr)))
	K.set_value(model.optimizer.lr, 1e-2)
	print("[INFO] new learning rate: {}".format(
		K.get_value(model.optimizer.lr)))

If no model checkpoint is supplied then we need to initialize the model (Lines 62-68). Notice that we specify our initial learning rate as 1e-1 on Line 64.

Otherwise, Lines 71-81 load the model checkpoint (i.e., a model that was previously stopped via ctrl + c) and update the learning rate. Line 79 will be the line you edit whenever you want to update the learning rate.

Next, we’ll construct our callbacks:

# build the path to the training plot and training history
plotPath = os.path.sep.join(["output", "resnet_fashion_mnist.png"])
jsonPath = os.path.sep.join(["output", "resnet_fashion_mnist.json"])

# construct the set of callbacks
callbacks = [
	EpochCheckpoint(args["checkpoints"], every=5,
		startAt=args["start_epoch"]),
	TrainingMonitor(plotPath,
		jsonPath=jsonPath,
		startAt=args["start_epoch"])]

Lines 84 and 85 specify our plot and JSON paths.

Lines 88-93 construct two callbacks, putting them directly into a list:

  • EpochCheckpoint: This callback is responsible for saving our model as it currently stands at the conclusion of every N-th epoch. That way, if we stop training via ctrl + c (or an unforeseeable power failure), we don’t lose our machine’s work — for training complex models on huge datasets, this could quite literally save you days of time.
  • TrainingMonitor: A callback that saves our training accuracy/loss information as a PNG image plot and JSON dictionary. We’ll be able to open our training plot at any time to see our training progress — valuable information to you as the practitioner, especially for multi-day training processes.

Again, please review epochcheckpoint.py and trainingmonitor.py on your own time for the details and/or if you need to add functionality. I cover these callbacks in detail inside Deep Learning for Computer Vision with Python.

Finally, we have everything we need to start, stop, and resume training. This last block actually starts or resumes training:

# train the network
print("[INFO] training network...")
model.fit_generator(
	aug.flow(trainX, trainY, batch_size=128),
	validation_data=(testX, testY),
	steps_per_epoch=len(trainX) // 128,
	epochs=80,
	callbacks=callbacks,
	verbose=1)

Our call to .fit_generator fits/trains our model using our data augmentation object and our callbacks (Lines 97-103). Be sure to review my tutorial on Keras’ fit_generator for more details on how the .fit_generator function is used to train our model.

I’d like to call your attention to the epochs parameter (Line 101) — when you adjust your learning rate you’ll typically want to update the epochs as well. Typically you should over-estimate the number of epochs, as you’ll see in the next three sections.

For a more detailed explanation of starting, stopping, and resuming training (along with the implementations of my EpochCheckpoint and TrainingMonitor classes), be sure to refer to Deep Learning for Computer Vision with Python.

Phase #1: 40 epochs at 1e-1

Make sure you’ve used the “Downloads” section of this blog post to download the source code to this tutorial.

From there, open up a terminal and execute the following command:

$ python train.py --checkpoints output/checkpoints
[INFO] loading Fashion MNIST...
[INFO] compiling model...
[INFO] training network...
Epoch 1/40
468/468 [==============================] - 219s 468ms/step - loss: 1.2396 - acc: 0.7171 - val_loss: 0.9564 - val_acc: 0.8130
Epoch 2/40
468/468 [==============================] - 204s 435ms/step - loss: 0.8993 - acc: 0.8307 - val_loss: 0.8464 - val_acc: 0.8553
Epoch 3/40
468/468 [==============================] - 204s 435ms/step - loss: 0.8092 - acc: 0.8625 - val_loss: 0.8407 - val_acc: 0.8513
Epoch 4/40
468/468 [==============================] - 204s 435ms/step - loss: 0.7623 - acc: 0.8780 - val_loss: 0.7457 - val_acc: 0.8869
Epoch 5/40
468/468 [==============================] - 203s 435ms/step - loss: 0.7266 - acc: 0.8895 - val_loss: 0.7809 - val_acc: 0.8682
...
Epoch 36/40
468/468 [==============================] - 204s 435ms/step - loss: 0.4075 - acc: 0.9482 - val_loss: 0.4584 - val_acc: 0.9334
Epoch 37/40
468/468 [==============================] - 204s 435ms/step - loss: 0.4041 - acc: 0.9475 - val_loss: 0.4693 - val_acc: 0.9297
Epoch 38/40
468/468 [==============================] - 204s 435ms/step - loss: 0.3937 - acc: 0.9511 - val_loss: 0.4774 - val_acc: 0.9246
Epoch 39/40
468/468 [==============================] - 203s 434ms/step - loss: 0.3936 - acc: 0.9492 - val_loss: 0.4918 - val_acc: 0.9191
Epoch 40/40
468/468 [==============================] - 204s 435ms/step - loss: 0.3856 - acc: 0.9508 - val_loss: 0.4690 - val_acc: 0.9266

Figure 3: Phase 1 of training ResNet on the Fashion MNIST dataset with a learning rate of 1e-1 for 40 epochs before we stop via ctrl + c, adjust the learning rate, and resume Keras training.

Here I’ve started training ResNet on the Fashion MNIST dataset using the SGD optimizer and an initial learning rate of 1e-1.

After every epoch my loss/accuracy plot in Figure 3 updates, enabling me to monitor training in real-time.

Past epoch 20 we can see training and validation loss starting to diverge, and by epoch 40 I decided to ctrl + c out of the train.py script.

Phase #2: 10 epochs at 1e-2

The next step is to update both:

  1. My learning rate
  2. The number of epochs to train for

For the learning rate, the standard practice is to lower it by an order of magnitude.

Going back to Line 64 of train.py, we can see that my initial learning rate is 1e-1:
# if there is no specific model checkpoint supplied, then initialize
# the network (ResNet-56) and compile the model
if args["model"] is None:
	print("[INFO] compiling model...")
	opt = SGD(lr=1e-1)
	model = ResNet.build(32, 32, 1, 10, (9, 9, 9),
		(64, 64, 128, 256), reg=0.0001)
	model.compile(loss="categorical_crossentropy", optimizer=opt,
		metrics=["accuracy"])

I’m now going to update my learning rate to be 1e-2 on Line 79:
# otherwise, we're using a checkpoint model
else:
	# load the checkpoint from disk
	print("[INFO] loading {}...".format(args["model"]))
	model = load_model(args["model"])

	# update the learning rate
	print("[INFO] old learning rate: {}".format(
		K.get_value(model.optimizer.lr)))
	K.set_value(model.optimizer.lr, 1e-2)
	print("[INFO] new learning rate: {}".format(
		K.get_value(model.optimizer.lr)))

So, why am I updating Line 79 and not Line 64?

The reason is due to the if/else statement.

The else statement handles when we need to load a specific checkpoint from disk — once we have the checkpoint we’ll resume training, thus the learning rate needs to be updated in the else block.

Secondly, I also update my epochs value on Line 101. Initially, the epochs value was 80:
# train the network
print("[INFO] training network...")
model.fit_generator(
	aug.flow(trainX, trainY, batch_size=128),
	validation_data=(testX, testY),
	steps_per_epoch=len(trainX) // 128,
	epochs=80,
	callbacks=callbacks,
	verbose=1)

I’ve decided to lower the number of epochs to 40:
# train the network
print("[INFO] training network...")
model.fit_generator(
	aug.flow(trainX, trainY, batch_size=128),
	validation_data=(testX, testY),
	steps_per_epoch=len(trainX) // 128,
	epochs=40,
	callbacks=callbacks,
	verbose=1)

Typically you’ll set the epochs value to be much larger than what you think it should actually be.

The reason is that we’re using the EpochCheckpoint class to save model snapshots every 5 epochs — if at any point we decide we’re unhappy with the training progress, we can just ctrl + c out of the script and go back to a previous snapshot.

Thus, there is no harm in training for longer since we can always resume training from a previous model weight file.

After both my learning rate and the number of epochs to train for were updated, I then executed the following command:

$ python train.py --checkpoints output/checkpoints \
	--model output/checkpoints/epoch_40.hdf5 --start-epoch 40
[INFO] loading Fashion MNIST...
[INFO] loading output/checkpoints/epoch_40.hdf5...
[INFO] old learning rate: 0.10000000149011612
[INFO] new learning rate: 0.009999999776482582
[INFO] training network...
Epoch 1/10
468/468 [==============================] - 215s 460ms/step - loss: 0.3569 - acc: 0.9617 - val_loss: 0.4245 - val_acc: 0.9413
Epoch 2/10
468/468 [==============================] - 204s 435ms/step - loss: 0.3452 - acc: 0.9662 - val_loss: 0.4272 - val_acc: 0.9393
Epoch 3/10
468/468 [==============================] - 203s 434ms/step - loss: 0.3423 - acc: 0.9666 - val_loss: 0.4276 - val_acc: 0.9404
Epoch 4/10
468/468 [==============================] - 203s 434ms/step - loss: 0.3400 - acc: 0.9677 - val_loss: 0.4246 - val_acc: 0.9411
Epoch 5/10
468/468 [==============================] - 203s 434ms/step - loss: 0.3377 - acc: 0.9677 - val_loss: 0.4256 - val_acc: 0.9412
Epoch 6/10
468/468 [==============================] - 203s 435ms/step - loss: 0.3346 - acc: 0.9698 - val_loss: 0.4268 - val_acc: 0.9415
Epoch 7/10
468/468 [==============================] - 203s 435ms/step - loss: 0.3353 - acc: 0.9681 - val_loss: 0.4245 - val_acc: 0.9411
Epoch 8/10
468/468 [==============================] - 203s 434ms/step - loss: 0.3301 - acc: 0.9701 - val_loss: 0.4266 - val_acc: 0.9418
Epoch 9/10
468/468 [==============================] - 203s 435ms/step - loss: 0.3280 - acc: 0.9712 - val_loss: 0.4313 - val_acc: 0.9411
Epoch 10/10
468/468 [==============================] - 203s 434ms/step - loss: 0.3291 - acc: 0.9712 - val_loss: 0.4302 - val_acc: 0.9393

Figure 4: Phase 2 of Keras start/stop/resume training. The learning rate is dropped from 1e-1 to 1e-2 as is evident in the plot at epoch 40. I continued training for 10 more epochs until I noticed validation metrics plateauing at which point I stopped training via ctrl + c again.

Notice how we’ve updated our learning rate from 1e-1 to 1e-2 and then resumed training.

We immediately see a drop in both training/validation loss as well as an increase in training/validation accuracy.

The problem here is that our validation metrics have plateaued — there may not be many more gains left without risking overfitting. Because of this, I only allowed training to continue for another 10 epochs before once again using ctrl + c to exit the script.

Phase #3: 5 epochs at 1e-3

For the final phase of training I decided to:

  1. Lower my learning rate from 1e-2 to 1e-3.
  2. Allow training to continue (but knowing I would likely only be training for a few epochs given the risk of overfitting).

After updating my learning rate, I executed the following command:

$ python train.py --checkpoints output/checkpoints \
	--model output/checkpoints/epoch_50.hdf5 --start-epoch 50
[INFO] loading Fashion MNIST...
[INFO] loading output/checkpoints/epoch_50.hdf5...
[INFO] old learning rate: 0.009999999776482582
[INFO] new learning rate: 0.0010000000474974513
[INFO] training network...
Epoch 1/5
468/468 [==============================] - 214s 457ms/step - loss: 0.3257 - acc: 0.9721 - val_loss: 0.4278 - val_acc: 0.9411
Epoch 2/5
468/468 [==============================] - 203s 433ms/step - loss: 0.3232 - acc: 0.9728 - val_loss: 0.4292 - val_acc: 0.9406
Epoch 3/5
468/468 [==============================] - 202s 433ms/step - loss: 0.3232 - acc: 0.9730 - val_loss: 0.4291 - val_acc: 0.9399
Epoch 4/5
468/468 [==============================] - 203s 433ms/step - loss: 0.3228 - acc: 0.9729 - val_loss: 0.4287 - val_acc: 0.9407
Epoch 5/5
468/468 [==============================] - 203s 433ms/step - loss: 0.3226 - acc: 0.9725 - val_loss: 0.4276 - val_acc: 0.9415

Figure 5: Upon resuming Keras training for phase 3, I only let the network train for 5 epochs because there is not significant learning progress being made. Using a start/stop/resume training approach with Keras, we have achieved 94.15% validation accuracy.

At this point the learning rate has become so small that the corresponding weight updates are also very small, implying that the model cannot learn much more.

I only allowed training to continue for 5 epochs before killing the script. However, looking at my final metrics, you can see that we are obtaining 97.25% training accuracy along with 94.15% validation accuracy.

We were able to achieve this result by using our start/stop/resume training method.

At this point, we could either continue to tune our learning rate, utilize a learning rate scheduler, apply Cyclical Learning Rates, or try a new model architecture altogether.
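As an aside, if you ever want to automate the exact schedule we just performed by hand, Keras’ built-in LearningRateScheduler callback can drop the learning rate at fixed epochs for you. Here is a minimal sketch mirroring the three phases above (I’m assuming the callbacks list from our training script):

# a minimal sketch of automating the manual 1e-1 -> 1e-2 -> 1e-3
# schedule with Keras' built-in LearningRateScheduler callback
from keras.callbacks import LearningRateScheduler

def step_decay(epoch):
	# mirror the manual phases: 1e-1 for epochs 0-39, 1e-2 for
	# epochs 40-49, and 1e-3 afterwards
	if epoch < 40:
		return 1e-1
	elif epoch < 50:
		return 1e-2
	return 1e-3

# append the scheduler to the existing callbacks list before the
# call to .fit_generator
callbacks.append(LearningRateScheduler(step_decay))

The trade-off is that a fixed schedule removes the human judgment of watching the plots and deciding when to drop — which is exactly the skill this tutorial is meant to build.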

Where can I learn more deep learning tips, suggestions, and best practices?

Figure 6: My deep learning book is the go-to resource for deep learning students, developers, researchers, and hobbyists, alike. Use the book to build your skillset from the bottom up, or read it to gain a deeper understanding. Don’t be left in the dust as the fast-paced AI revolution continues to accelerate.

Today’s tutorial introduced you to starting, stopping, and resuming training with Keras.

If you’re looking for more of my tips, suggestions, and best practices when training deep neural networks, be sure to refer to my book, Deep Learning for Computer Vision with Python.

Inside the book I cover:

  1. Deep learning fundamentals and theory without unnecessary mathematical fluff. I present the basic equations and back them up with code walkthroughs that you can implement and easily understand.
  2. How to spot underfitting and overfitting while you’re using the TrainingMonitor callback.
  3. Recommendations and best practices for selecting learning rates.
  4. My tips/tricks, suggestions, and best practices for training CNNs.

Besides content on learning rates, you’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

Summary

In this tutorial you learned how to start, stop, and resume training using Keras and Deep Learning.

Learning how to resume from where your training left off is a super valuable skill for two reasons:

  1. It ensures that if your training script crashes, you can pick up again from the most recent model checkpoint.
  2. It enables you to adjust your learning rate and improve your model accuracy.

When training your own custom neural networks you’ll want to monitor your loss and accuracy — once you start to see validation loss/accuracy plateau, try killing the training script, lowering your learning rate by an order of magnitude, and then resume training.

You’ll often find that this method of training can lead to higher accuracy models.

However, you should be wary of overfitting!

Lowering your learning rate enables your model to descend into lower areas of the loss landscape; however, there is no guarantee that these lower loss areas will still generalize!

You likely will only be able to drop the learning rate 1-3 times before either:

  1. The learning rate becomes too small, making the corresponding weight updates too small, and preventing the model from learning further.
  2. Validation loss stagnates or explodes while training loss continues to drop (implying that the model is overfitting).

If those cases occur and your model is still not satisfactory you should consider adjusting other hyperparameters to your model, including regularization strength, dropout, etc. You may want to explore other model architectures as well.

For more of my tips, suggestions, and best practices when training your own neural networks on your custom datasets, be sure to refer to Deep Learning for Computer Vision with Python, where I cover my best practices in-depth.

To download the source code to this tutorial (and be notified when future tutorials are published on the PyImageSearch blog), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Keras: Starting, stopping, and resuming training appeared first on PyImageSearch.

Rectified Adam (RAdam) optimizer with Keras


In this tutorial, you will learn how to use Keras and the Rectified Adam optimizer as a drop-in replacement for the standard Adam optimizer, potentially leading to a higher accuracy model (and in fewer epochs).

Today we’re kicking off a two-part series on the Rectified Adam optimizer:

  1. Rectified Adam (RAdam) optimizer with Keras (today’s post)
  2. Is Rectified Adam actually *better* than Adam? (next week’s tutorial)

Rectified Adam is a brand new deep learning model optimizer introduced by a collaboration between members of the University of Illinois, Georgia Tech, and Microsoft Research.

The goal of the Rectified Adam optimizer is two-fold:

  1. Obtain a more accurate/more generalizable deep neural network
  2. Complete training in fewer epochs

Sound too good to be true?

Well, it might just be.

You’ll need to read the rest of this tutorial to find out.

To learn how to use the Rectified Adam optimizer with Keras, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Rectified Adam (RAdam) optimizer with Keras

In the first part of this tutorial, we’ll discuss the Rectified Adam optimizer, including how it’s different than the standard Adam optimizer (and why we should care).

From there I’ll show you how to use the Rectified Adam optimizer with the Keras deep learning library.

We’ll then run some experiments and compare Adam to Rectified Adam.

What is the Rectified Adam optimizer?

Figure 1: Using the Rectified Adam (RAdam) deep learning optimizer with Keras. (image source: Figure 6 from Liu et al.)

A few weeks ago the deep learning community was all abuzz after Liu et al. published a brand new paper entitled On the Variance of the Adaptive Learning Rate and Beyond.

This paper introduced a new deep learning optimizer called Rectified Adam (or RAdam for short).

Rectified Adam is meant to be a drop-in replacement for the standard Adam optimizer.

So, why is Liu et al.’s contribution so important? And why is the deep learning community so excited about it?

Here’s a quick rundown on why you should care about it:

  • Learning rate warmup heuristics work well to stabilize training.
  • These heuristics also work well to improve generalization.
  • Liu et al. decided to study the theory behind learning rate warmup…
  • …but they found a problem with adaptive learning rates — during the first few batches the model did not generalize well and had very high variance.
  • The authors studied the problem in detail and concluded that the issue can be resolved/mitigated by:
    • 1. Applying warm up with a low initial learning rate.
    • 2. Or, simply turning off the momentum term for the first few sets of input batches.
  • As training continues, the variance will stabilize, and from there, the learning rate can be increased and the momentum term can be added back in.

The authors call this optimizer Rectified Adam (RAdam), a variant of the Adam optimizer, as it “rectifies” (i.e., corrects) the variance/generalization issues apparent in other adaptive learning rate optimizers.

But the question remains — is Rectified Adam actually better than standard Adam?

To answer that, you’ll need to finish reading this tutorial and read next week’s post which includes a full comparison.

For more information about Rectified Adam, including details on both the theoretical and empirical results, be sure to refer to Liu et al.’s paper.

Project structure

Let’s inspect our project layout:

$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   └── resnet.py
├── cifar10_adam.png
├── cifar10_rectified_adam.png
└── train.py

1 directory, 5 files

Our ResNet CNN is contained within the pyimagesearch module. The resnet.py file contains the exact ResNet model class included with Deep Learning for Computer Vision with Python.

We will train ResNet on the CIFAR-10 dataset with both the Adam and RAdam optimizers inside of train.py, which we’ll review later in this tutorial. The training script will generate an accuracy/loss plot each time it is run — the two .png files for the Adam and Rectified Adam experiments are included in the “Downloads”.

Installing Rectified Adam for Keras

This tutorial requires the following software to be installed in your environment:

  • TensorFlow
  • Keras
  • Rectified Adam for Keras
  • scikit-learn
  • matplotlib

Luckily, all of the software is pip installable. If you’ve ever followed one of my installation tutorials, then you know I’m a fan of virtualenv and virtualenvwrapper for managing Python virtual environments. The first command below, workon, assumes that you have these packages installed, but it is optional.

Let’s install the software now:

$ workon <env_name> # replace "<env_name>" with your environment
$ pip install tensorflow # or tensorflow-gpu
$ pip install keras
$ pip install scikit-learn
$ pip install matplotlib

The original implementation of RAdam by Liu et al. was in PyTorch; however, a Keras implementation was created by Zhao HG.

You can install the Keras implementation of Rectified Adam via the following command:

$ pip install keras-rectified-adam

To verify that the Keras + RAdam package has been successfully installed, open up a Python shell and attempt to import keras_radam:
$ python
>>> import keras_radam
>>>

Provided there are no errors during the import, you can assume Rectified Adam is successfully installed on your deep learning box!

Implementing Rectified Adam with Keras

Let’s now learn how we can use Rectified Adam with Keras.

If you are unfamiliar with Keras and/or deep learning, please refer to my Keras Tutorial. For a full review of deep learning optimizers, refer to the following chapters of Deep Learning for Computer Vision with Python:

  • Starter Bundle – Chapter 9: “Optimization Methods and Regularization Techniques”
  • Practitioner Bundle –  Chapter 7: “Advanced Optimization Methods”

Otherwise, if you’re ready to go, let’s dive in.

Open up a new file, name it train.py, and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.resnet import ResNet
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras_radam import RAdam
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
import argparse

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--plot", type=str, required=True,
	help="path to output training plot")
ap.add_argument("-o", "--optimizer", type=str, default="adam",
	choices=["adam", "radam"],
	help="type of optmizer")
args = vars(ap.parse_args())

Lines 2-15 import our packages and modules. Most notably, Lines 10 and 11 import the Adam and RAdam optimizers. We will use the "Agg" backend of matplotlib so that we can save our training plots to disk (Line 3).

Lines 18-24 then parse two command line arguments:

  • --plot: The path to our output training plot.
  • --optimizer: The type of optimizer that we’ll use for training (either adam or radam).

From here, let’s go ahead and perform a handful of initializations:

# initialize the number of epochs to train for and batch size
EPOCHS = 75
BS = 128

# load the training and testing data, then scale it into the
# range [0, 1]
print("[INFO] loading CIFAR-10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0

# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# construct the image generator for data augmentation
aug = ImageDataGenerator(width_shift_range=0.1,
	height_shift_range=0.1, horizontal_flip=True,
	fill_mode="nearest")

# initialize the label names for the CIFAR-10 dataset
labelNames = ["airplane", "automobile", "bird", "cat", "deer",
	"dog", "frog", "horse", "ship", "truck"]

Lines 27 and 28 initialize the number of epochs to train for as well as our batch size. Feel free to tune these hyperparameters, just keep in mind that they will affect results.

Lines 33-35 load and preprocess our CIFAR-10 data including scaling data to the range [0, 1].

Lines 38-40 then binarize our class labels from integers to vectors.

Lines 43-45 construct our data augmentation object. Be sure to refer to my data augmentation tutorial if you are new to data augmentation, how it works, or why we use it.

Our CIFAR-10 class labelNames are listed on Lines 48 and 49.

Now we’re at the meat of this tutorial — initializing either the Adam or RAdam optimizer:

# check if we are using Adam
if args["optimizer"] == "adam":
	# initialize the Adam optimizer
	print("[INFO] using Adam optimizer")
	opt = Adam(lr=1e-3)

# otherwise, we are using Rectified Adam
else:
	# initialize the Rectified Adam optimizer
	print("[INFO] using Rectified Adam optimizer")
	opt = RAdam(total_steps=5000, warmup_proportion=0.1, min_lr=1e-5)

Depending on the --optimizer command line argument, we initialize either the standard Adam optimizer (with a 1e-3 learning rate) or the Rectified Adam optimizer (with warmup and a 1e-5 minimum learning rate).
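As a quick sanity check on those RAdam parameters (my own back-of-the-envelope math — assuming warmup_proportion is the fraction of total_steps spent warming up, as in Zhao HG’s implementation), we can work out how long the warmup period actually lasts:

# back-of-the-envelope math for the RAdam parameters above: with
# total_steps=5000 and warmup_proportion=0.1, the learning rate is
# warmed up over the first 500 batch updates
total_steps = 5000
warmup_proportion = 0.1
warmup_steps = int(total_steps * warmup_proportion)

# CIFAR-10 has 50,000 training images; with a batch size of 128 we
# perform 390 updates per epoch, so warmup covers roughly the first
# 1.3 epochs of training
steps_per_epoch = 50000 // 128
print(warmup_steps)                            # 500
print(warmup_steps / float(steps_per_epoch))   # ~1.28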

With our optimizer ready to go, now we’ll compile and train our model:

# initialize our optimizer and model, then compile it
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
	(64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
H = model.fit_generator(
	aug.flow(trainX, trainY, batch_size=BS),
	validation_data=(testX, testY),
	steps_per_epoch=trainX.shape[0] // BS,
	epochs=EPOCHS,
	verbose=1)

We compile ResNet with our specified optimizer (either Adam or RAdam) via Lines 64-67.

Lines 70-75 launch the training process. Be sure to refer to my tutorial on Keras’ fit_generator method if you are new to using this function to train a deep neural network with Keras.

To wrap up, we print our classification report and plot our loss/accuracy curves over the duration of the training epochs:

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BS)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=labelNames))

# determine the number of epochs and then construct the plot title
N = np.arange(0, EPOCHS)
title = "Training Loss and Accuracy on CIFAR-10 ({})".format(
	args["optimizer"])

# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["acc"], label="train_acc")
plt.plot(N, H.history["val_acc"], label="val_acc")
plt.title(title)
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["plot"])

Standard Adam Optimizer Results

To train ResNet on the CIFAR-10 dataset using the Adam optimizer, make sure you use the “Downloads” section of this blog post to download the source code to this guide.

From there, open up a terminal and execute the following command:

$ python train.py --plot cifar10_adam.png --optimizer adam
[INFO] loading CIFAR-10 data...
[INFO] using Adam optimizer
Epoch 1/75
390/390 [==============================] - 205s 526ms/step - loss: 1.9642 - acc: 0.4437 - val_loss: 1.7449 - val_acc: 0.5248
Epoch 2/75
390/390 [==============================] - 185s 475ms/step - loss: 1.5199 - acc: 0.6050 - val_loss: 1.4735 - val_acc: 0.6218
Epoch 3/75
390/390 [==============================] - 185s 474ms/step - loss: 1.2973 - acc: 0.6822 - val_loss: 1.2712 - val_acc: 0.6965
Epoch 4/75
390/390 [==============================] - 185s 474ms/step - loss: 1.1451 - acc: 0.7307 - val_loss: 1.2450 - val_acc: 0.7109
Epoch 5/75
390/390 [==============================] - 185s 474ms/step - loss: 1.0409 - acc: 0.7643 - val_loss: 1.0918 - val_acc: 0.7542
...
Epoch 71/75
390/390 [==============================] - 185s 474ms/step - loss: 0.4215 - acc: 0.9358 - val_loss: 0.6372 - val_acc: 0.8775
Epoch 72/75
390/390 [==============================] - 185s 474ms/step - loss: 0.4241 - acc: 0.9347 - val_loss: 0.6024 - val_acc: 0.8819
Epoch 73/75
390/390 [==============================] - 185s 474ms/step - loss: 0.4226 - acc: 0.9350 - val_loss: 0.5906 - val_acc: 0.8835
Epoch 74/75
390/390 [==============================] - 185s 474ms/step - loss: 0.4198 - acc: 0.9369 - val_loss: 0.6321 - val_acc: 0.8759
Epoch 75/75
390/390 [==============================] - 185s 474ms/step - loss: 0.4127 - acc: 0.9391 - val_loss: 0.5669 - val_acc: 0.8953
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.81      0.94      0.87      1000
  automobile       0.96      0.96      0.96      1000
        bird       0.86      0.87      0.86      1000
         cat       0.84      0.75      0.79      1000
        deer       0.91      0.91      0.91      1000
         dog       0.86      0.84      0.85      1000
        frog       0.89      0.95      0.92      1000
       horse       0.93      0.92      0.93      1000
        ship       0.97      0.88      0.92      1000
       truck       0.96      0.92      0.94      1000

   micro avg       0.90      0.90      0.90     10000
   macro avg       0.90      0.90      0.90     10000
weighted avg       0.90      0.90      0.90     10000

Figure 2: To achieve a baseline, we first train ResNet using the Adam optimizer on the CIFAR-10 dataset. We will compare the results to the Rectified Adam (RAdam) optimizer using Keras.

Looking at our output you can see that we obtained 90% accuracy on our testing set.

Examining Figure 2 shows that there is little overfitting going on as well — our training progress is quite stable.

Rectified Adam Optimizer Results

Now, let’s train ResNet on CIFAR-10 using the Rectified Adam optimizer:

$ python train.py --plot cifar10_rectified_adam.png --optimizer radam
[INFO] loading CIFAR-10 data...
[INFO] using Rectified Adam optimizer
Epoch 1/75
390/390 [==============================] - 212s 543ms/step - loss: 2.4813 - acc: 0.2489 - val_loss: 2.0976 - val_acc: 0.3921
Epoch 2/75
390/390 [==============================] - 188s 483ms/step - loss: 1.8771 - acc: 0.4797 - val_loss: 1.8231 - val_acc: 0.5041
Epoch 3/75
390/390 [==============================] - 188s 483ms/step - loss: 1.5900 - acc: 0.5857 - val_loss: 1.4483 - val_acc: 0.6379
Epoch 4/75
390/390 [==============================] - 188s 483ms/step - loss: 1.3919 - acc: 0.6564 - val_loss: 1.4264 - val_acc: 0.6466
Epoch 5/75
390/390 [==============================] - 188s 483ms/step - loss: 1.2457 - acc: 0.7046 - val_loss: 1.2151 - val_acc: 0.7138
...
Epoch 71/75
390/390 [==============================] - 188s 483ms/step - loss: 0.6256 - acc: 0.9054 - val_loss: 0.7919 - val_acc: 0.8551
Epoch 72/75
390/390 [==============================] - 188s 482ms/step - loss: 0.6184 - acc: 0.9071 - val_loss: 0.7894 - val_acc: 0.8537
Epoch 73/75
390/390 [==============================] - 188s 483ms/step - loss: 0.6242 - acc: 0.9051 - val_loss: 0.7981 - val_acc: 0.8519
Epoch 74/75
390/390 [==============================] - 188s 483ms/step - loss: 0.6191 - acc: 0.9062 - val_loss: 0.7969 - val_acc: 0.8519
Epoch 75/75
390/390 [==============================] - 188s 483ms/step - loss: 0.6143 - acc: 0.9098 - val_loss: 0.7935 - val_acc: 0.8525
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.86      0.88      0.87      1000
  automobile       0.91      0.95      0.93      1000
        bird       0.83      0.76      0.79      1000
         cat       0.76      0.69      0.72      1000
        deer       0.85      0.81      0.83      1000
         dog       0.79      0.79      0.79      1000
        frog       0.81      0.94      0.87      1000
       horse       0.89      0.89      0.89      1000
        ship       0.94      0.91      0.92      1000
       truck       0.88      0.91      0.89      1000

   micro avg       0.85      0.85      0.85     10000
   macro avg       0.85      0.85      0.85     10000
weighted avg       0.85      0.85      0.85     10000

Figure 3: The Rectified Adam (RAdam) optimizer is used in conjunction with ResNet using Keras on the CIFAR-10 dataset. But how do the results compare to the standard Adam optimizer?

Notice how the --optimizer switch is set to radam for this second run of our training script.

But wait a second — why are we only obtaining 85% accuracy here?

Isn’t the Rectified Adam optimizer supposed to outperform standard Adam?

Why is our accuracy somehow worse?

Let’s discuss that in the next section.

Is Rectified Adam actually better than Adam?

If you look at our results you’ll see that the standard Adam optimizer outperformed the new Rectified Adam optimizer.

What’s going on here?

Isn’t Rectified Adam supposed to obtain higher accuracy and in fewer epochs?

Why is Rectified Adam performing worse than standard Adam?

Well, to start, keep in mind that we’re looking at the results from only a single dataset here — a true evaluation would look at the results across multiple datasets.

…and that’s exactly what I’ll be doing next week!

To see a full-blown comparison between Adam and Rectified Adam, and determine which optimizer is better, you’ll need to tune in for next week’s blog post!

What’s next?

Figure 4: My deep learning book, Deep Learning for Computer Vision with Python, is trusted by employees and students of top institutions.

If you’re interested in diving head-first into the world of computer vision/deep learning and discovering how to:

  • Select the best optimizer for the job
  • Train Convolutional Neural Networks on your own custom datasets
  • Replicate the results of state-of-the-art papers, including ResNet, SqueezeNet, VGGNet, and others
  • Train your own custom Faster R-CNN, Single Shot Detectors (SSDs), and RetinaNet object detectors
  • Use Mask R-CNN to train your own instance segmentation networks
  • Train Generative Adversarial Networks (GANs)

…then be sure to take a look at my book, Deep Learning for Computer Vision with Python!

My complete, self-study deep learning book is trusted by members of top machine learning schools, companies, and organizations, including Microsoft, Google, Stanford, MIT, CMU, and more!

Readers of my book have gone on to win Kaggle competitions, secure academic grants, and start careers in CV and DL using the knowledge they gained through study and practice.

My book not only teaches the fundamentals, but also teaches advanced techniques, best practices, and tools to ensure that you are armed with practical knowledge and proven coding recipes to tackle nearly any computer vision and deep learning problem presented to you in school, in your research, or in the modern workforce.

Be sure to take a look  — and while you’re at it, don’t forget to grab your (free) table of contents + sample chapters.

Summary

In this tutorial, you learned how to use the Rectified Adam optimizer as a drop-in replacement for the standard Adam optimizer using the Keras deep learning library.

We then ran a set of experiments comparing Adam performance to Rectified Adam performance. Our results show that standard Adam actually outperformed the RAdam optimizer.

So what gives?

Liu et al. reported higher accuracy with fewer epochs in their paper — are we doing anything wrong?

Is something broken with our Rectified Adam optimizer?

To answer those questions you’ll need to tune in next week where I’ll be providing a full set of benchmark experiments comparing Adam to Rectified Adam. You won’t want to miss next week’s post, it’s going to be a good one!

To download the source code to this post (and be notified when next week’s tutorial goes live), be sure to enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Rectified Adam (RAdam) optimizer with Keras appeared first on PyImageSearch.

Is Rectified Adam actually *better* than Adam?


Is the Rectified Adam (RAdam) optimizer actually better than the standard Adam optimizer? According to my 24 experiments, the answer is no, typically not (but there are cases where you do want to use it instead of Adam).

In Liu et al.’s 2019 paper, On the Variance of the Adaptive Learning Rate and Beyond, the authors claim that Rectified Adam can obtain:

  • Better accuracy (or at least identical accuracy when compared to Adam)
  • And in fewer epochs than standard Adam

The authors tested their hypothesis on three different datasets, including one NLP dataset and two computer vision datasets (ImageNet and CIFAR-10).

In each case Rectified Adam outperformed standard Adam…but failed to outperform standard Stochastic Gradient Descent (SGD)!

The Rectified Adam optimizer has some strong theoretical justifications — but as a deep learning practitioner, you need more than just theory — you need to see empirical results applied to a variety of datasets.

And perhaps more importantly, you need to obtain a mastery level experience operating/driving the optimizer (or a small subset of optimizers) as well.

Today is part two in our two-part series on the Rectified Adam optimizer:

  1. Rectified Adam (RAdam) optimizer with Keras (last week’s post)
  2. Is Rectified Adam actually *better* than Adam (today’s tutorial)

If you haven’t yet, go ahead and read part one to ensure you have a good understanding of how the Rectified Adam optimizer works.

From there, read today’s post to help you understand how to design, code, and run experiments used to compare deep learning optimizers.

To learn how to compare Rectified Adam to standard Adam, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Is Rectified Adam actually *better* than Adam?

In the first part of this tutorial, we’ll briefly discuss the Rectified Adam optimizer, including how it works and why it’s interesting to us as deep learning practitioners.

From there, I’ll guide you in designing and planning our set of experiments to compare Rectified Adam to Adam — you can use this section to learn how you design your own deep learning experiments as well.

We’ll then review the project structure for this post, including implementing our training and evaluation scripts by hand.

Finally, we’ll run our experiments, collect results, and ultimately decide: is Rectified Adam actually better than Adam?

What is the Rectified Adam optimizer?

Figure 1: The Rectified Adam (RAdam) deep learning optimizer. Is it better than the standard Adam optimizer? (image source: Figure 6 from Liu et al.)

The Rectified Adam optimizer was proposed by Liu et al. in their 2019 paper, On the Variance of the Adaptive Learning Rate and Beyond. In their paper they discussed how their update to the Adam optimizer, called Rectified Adam, can:

  1. Obtain a higher accuracy/more generalizable deep neural network.
  2. Complete training in fewer epochs.

Their work had some strong theoretical justifications as well. They found that adaptive learning rate optimizers (such as Adam) both:

  • Struggle to generalize during the first few batch updates
  • Have very high variance

Liu et al. studied the problem in detail and found that the issue could be rectified (hence the name, Rectified Adam) by:

  1. Applying warm up with a low initial learning rate.
  2. Simply turning off the momentum term for the first few sets of input training batches.

The authors evaluated their experiments on one NLP dataset and two image classification datasets and found that their Rectified Adam implementation outperformed standard Adam (but neither optimizer outperformed standard SGD).

We’ll be continuing Liu et al.’s experiments today and comparing Rectified Adam to standard Adam in 24 separate experiments.

For more details on how the Rectified Adam optimizer works, be sure to review my previous blog post.

Planning our experiments

Figure 2: We will plan our set of experiments to evaluate the performance of the Rectified Adam (RAdam) optimizer using Keras.

To compare Adam to Rectified Adam, we’ll be training three Convolutional Neural Networks (CNNs), including:

  1. ResNet
  2. GoogLeNet
  3. MiniVGGNet

The implementations of these CNNs came directly from my book, Deep Learning for Computer Vision with Python.

These networks will be trained on four datasets:

  1. MNIST
  2. Fashion MNIST
  3. CIFAR-10
  4. CIFAR-100

For each combination of dataset and CNN architecture, we’ll apply two optimizers:

  1. Adam
  2. Rectified Adam

Taking all possible combinations, we end up with 3 x 4 x 2 = 24 separate training experiments.

We’ll run each of these experiments individually, collect the results, and then interpret them to determine which optimizer is indeed better.

Whenever you plan your own experiments make sure you take the time to write out the list of model architectures, optimizers, and datasets you intend on applying them to. Additionally, you may want to list the hyperparameters you believe are important and are worth tuning (i.e., learning rate, L2 weight decay strength, etc.).

Considering the 24 experiments we plan to conduct, it makes the most sense to automate the data collection phase. From there, we will be able to work on other tasks while the computation is underway (often requiring days of compute time). Upon completion of the data collection for our 24 experiments, we will then be able to sit down and analyze the plots and classification reports in order to evaluate RAdam on our CNNs, datasets, and optimizers.

How to design your own deep learning experiments

Figure 3: Designing your own deep learning experiments requires thought and planning. Consider your typical deep learning workflow and design your initial set of experiments such that a thorough preliminary investigation can be conducted using automation. Planning for automated evaluation now will save you time (and money) down the line.

Typically, my experiment design workflow goes something like this:

  1. Select 2-3 model architectures that I believe would work well on a particular dataset (i.e., ResNet, VGGNet, etc.).
  2. Decide if I want to train from scratch or perform transfer learning.
  3. Use my learning rate finder to find an acceptable initial learning rate for the SGD optimizer.
  4. Train the model on my dataset using SGD and Keras’ standard decay schedule.
  5. Look at my results from training, select the architecture that performed best, and start tuning my hyperparameters, including model capacity, regularization strength, revisiting the initial learning rate, applying Cyclical Learning Rates, and potentially exploring other optimizers.

You’ll notice that I tend to use SGD in my initial experiments instead of Adam, RMSprop, etc.

Why is that?

To answer that question you’ll need to read the “You need to obtain mastery level experience operating these three optimizers” section below.

Note: For more of my suggestions, tips, and best practices when designing and running your own experiments, be sure to refer to my book, Deep Learning for Computer Vision with Python.

However, in the context of this tutorial, we’re attempting to compare our results to the work of Liu et al.

We, therefore, need to fix the model architectures, training from scratch, learning rate, and optimizers — our experiment design now becomes:

  1. Train ResNet, GoogLeNet, and MiniVGGNet on each of MNIST, Fashion MNIST, CIFAR-10, and CIFAR-100.
  2. Train all networks from scratch.
  3. Use the initial, default learning rates for Adam/Rectified Adam (1e-3).
  4. Utilize the Adam and Rectified Adam optimizers for training.
  5. Since these are one-off experiments, we’ll not be performing an exhaustive dive into tuning hyperparameters (you can refer to Deep Learning for Computer Vision with Python if you would like details on how to tune your hyperparameters).

At this point we’ve motivated and planned our set of experiments — now let’s learn how to implement our training and evaluation scripts.

Project structure

Go ahead and grab the “Downloads” and then inspect the project directory with the tree command:
$ tree --dirsfirst --filelimit 10
.
├── output [48 entries]
├── plots [12 entries]
├── pyimagesearch
│   ├── __init__.py
│   ├── minigooglenet.py
│   ├── minivggnet.py
│   └── resnet.py
├── combinations.py
├── experiments.sh
├── plot.py
└── train.py

3 directories, 68 files

Our project consists of two output directories:

  • output/: Holds our classification report .txt files organized by experiment. Additionally, there is one .pickle file per experiment containing the serialized training history data (for plotting purposes).
  • plots/: For each CNN/dataset combination, a stacked accuracy/loss curve plot is output so that we can conveniently compare the Adam and RAdam optimizers.

The pyimagesearch module contains three Convolutional Neural Network (CNN) architectures constructed with Keras. These CNN implementations come directly from Deep Learning for Computer Vision with Python.

We will review three Python scripts in today’s tutorial:

  • train.py: Our training script accepts a CNN architecture, dataset, and optimizer via command line argument and begins fitting a model accordingly. This script will be invoked automatically for each of our 24 experiments via the experiments.sh bash script. Our training script produces two types of output files:
    • .txt: A classification report printout in scikit-learn’s standard format.
    • .pickle: Serialized training history so that it can later be recalled for plotting purposes.
  • combinations.py: This script computes all the experiment combinations for which we will train models and collect data. The result of executing this script is a bash/shell script named experiments.sh.
  • plot.py: Plots accuracy/loss curves for Adam/RAdam using matplotlib directly from the output/*.pickle files.

Implementing the training script

Our training script will be responsible for accepting:

  1. A given model architecture
  2. A dataset
  3. An optimizer

And from there, the script will handle training the specified model, on the supplied dataset, using the specified optimizer.

We’ll use this script to run each of our 24 experiments.

Let’s go ahead and implement the train.py script now:
# import the necessary packages
from pyimagesearch.minigooglenet import MiniGoogLeNet
from pyimagesearch.minivggnet import MiniVGGNet
from pyimagesearch.resnet import ResNet
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras_radam import RAdam
from keras.datasets import fashion_mnist
from keras.datasets import cifar100
from keras.datasets import cifar10
from keras.datasets import mnist
import numpy as np
import argparse
import pickle
import cv2

Imports include our three CNN architectures, four datasets, and two optimizers (Adam and RAdam).

Let’s parse command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--history", required=True,
	help="path to output training history file")
ap.add_argument("-r", "--report", required=True,
	help="path to output classification report file")
ap.add_argument("-d", "--dataset", type=str, default="mnist",
	choices=["mnist", "fashion_mnist", "cifar10", "cifar100"],
	help="dataset name")
ap.add_argument("-m", "--model", type=str, default="resnet",
	choices=["resnet", "googlenet", "minivggnet"],
	help="type of model architecture")
ap.add_argument("-o", "--optimizer", type=str, default="adam",
	choices=["adam", "radam"],
	help="type of optmizer")
args = vars(ap.parse_args())

Our command line arguments include:

  • --history: The path to the output training history .pickle file.
  • --report: The path to the output classification report .txt file.
  • --dataset: The dataset to train our model on; it can be any of the choices listed on Line 26.
  • --model: The deep learning model architecture; it must be one of the choices on Line 29.
  • --optimizer: Our adam or radam deep learning optimization method.

Upon providing the command line arguments via the terminal, our training script dynamically sets up and launches the experiment. Output files are named according to the parameters of the experiment.

From here we’ll set two constants and initialize the default number of channels for the dataset:

# initialize the batch size and number of epochs to train
BATCH_SIZE = 128
NUM_EPOCHS = 60

# initialize the number of channels in the dataset
numChans = 1

If our --dataset is MNIST or Fashion MNIST, we’ll load the dataset in the following manner:
# check if we are using either the MNIST or Fashion MNIST dataset
if args["dataset"] in ("mnist", "fashion_mnist"):
	# check if we are using MNIST
	if args["dataset"] == "mnist":
		# initialize the label names for the MNIST dataset
		labelNames = [str(i) for i in range(0, 10)]

		# load the MNIST dataset
		print("[INFO] loading MNIST dataset...")
		((trainX, trainY), (testX, testY)) = mnist.load_data()

	# otherwise, we are using Fashion MNIST
	else:
		# initialize the label names for the Fashion MNIST dataset
		labelNames = ["top", "trouser", "pullover", "dress", "coat",
			"sandal", "shirt", "sneaker", "bag", "ankle boot"]

		# load the Fashion MNIST dataset
		print("[INFO] loading Fashion MNIST dataset...")
		((trainX, trainY), (testX, testY)) = fashion_mnist.load_data()

	# MNIST dataset images are 28x28 but the networks we will be
	# training expect 32x32 images
	trainX = np.array([cv2.resize(x, (32, 32)) for x in trainX])
	testX = np.array([cv2.resize(x, (32, 32)) for x in testX])

	# reshape the data matrices to include a channel dimension which
	# is required for training
	trainX = trainX.reshape((trainX.shape[0], 32, 32, 1))
	testX = testX.reshape((testX.shape[0], 32, 32, 1))

Keep in mind that MNIST images are 28×28 but we need 32×32 images for our architectures. Thus, Lines 66 and 67 resize all images in the dataset. Lines 71 and 72 then add the channel dimension.

Otherwise, we have a CIFAR variant of the --dataset to load:
# otherwise, we must be using a variant of CIFAR
else:
	# update the number of channels in the images
	numChans = 3

	# check if we are using CIFAR-10
	if args["dataset"] == "cifar10":
		# initialize the label names for the CIFAR-10 dataset
		labelNames = ["airplane", "automobile", "bird", "cat",
			"deer", "dog", "frog", "horse", "ship", "truck"]

		# load the CIFAR-10 dataset
		print("[INFO] loading CIFAR-10 dataset...")
		((trainX, trainY), (testX, testY)) = cifar10.load_data()

	# otherwise, we are using CIFAR-100
	else:
		# initialize the label names for the CIFAR-100 dataset
		labelNames = ["apple", "aquarium_fish", "baby", "bear",
			"beaver", "bed", "bee", "beetle",  "bicycle", "bottle",
			"bowl", "boy", "bridge", "bus", "butterfly", "camel",
			"can", "castle", "caterpillar", "cattle", "chair",
			"chimpanzee", "clock",  "cloud", "cockroach", "couch",
			"crab", "crocodile", "cup", "dinosaur",  "dolphin",
			"elephant", "flatfish", "forest", "fox", "girl",
			"hamster",  "house", "kangaroo", "keyboard", "lamp",
			"lawn_mower", "leopard", "lion", "lizard", "lobster",
			"man", "maple_tree", "motorcycle", "mountain", "mouse",
			"mushroom", "oak_tree", "orange", "orchid", "otter",
			"palm_tree", "pear", "pickup_truck", "pine_tree", "plain",
			"plate", "poppy", "porcupine", "possum", "rabbit",
			"raccoon", "ray", "road", "rocket", "rose", "sea", "seal",
			"shark", "shrew", "skunk", "skyscraper", "snail", "snake",
			"spider", "squirrel", "streetcar", "sunflower",
			"sweet_pepper", "table", "tank", "telephone", "television",
			"tiger", "tractor", "train", "trout", "tulip", "turtle",
			"wardrobe", "whale", "willow_tree", "wolf", "woman", "worm"]

		# load the CIFAR-100 dataset
		print("[INFO] loading CIFAR-100 dataset...")
		((trainX, trainY), (testX, testY)) = cifar100.load_data()

CIFAR datasets contain 3-channel color images (Line 77). These datasets are already comprised of 32×32 images (no resizing is necessary).

From here, we’ll scale our data and determine the total number of classes:

# scale the data to the range [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

# determine the total number of unique classes in the dataset
numClasses = len(np.unique(trainY))
print("[INFO] {} classes in dataset".format(numClasses))

Followed by initializing this experiment’s deep learning optimizer:

# check if we are using Adam
if args["optimizer"] == "adam":
	# initialize the Adam optimizer
	print("[INFO] using Adam optimizer")
	opt = Adam(lr=1e-3)

# otherwise, we are using Rectified Adam
else:
	# initialize the Rectified Adam optimizer
	print("[INFO] using Rectified Adam optimizer")
	opt = RAdam(total_steps=5000, warmup_proportion=0.1, min_lr=1e-5)

Either Adam or RAdam is initialized according to the --optimizer command line argument switch.

Our model is then built depending upon the --model command line argument:
# check if we are using the ResNet architecture
if args["model"] == "resnet":
	# utilize the ResNet architecture
	print("[INFO] initializing ResNet...")
	model = ResNet.build(32, 32, numChans, numClasses, (9, 9, 9),
		(64, 64, 128, 256), reg=0.0005)

# check if we are using Tiny GoogLeNet
elif args["model"] == "googlenet":
	# utilize the MiniGoogLeNet architecture
	print("[INFO] initializing MiniGoogLeNet...")
	model = MiniGoogLeNet.build(width=32, height=32, depth=numChans,
		classes=numClasses)

# otherwise, we must be using MiniVGGNet
else:
	# utilize the MiniVGGNet architecture
	print("[INFO] initializing MiniVGGNet...")
	model = MiniVGGNet.build(width=32, height=32, depth=numChans,
		classes=numClasses)

Once either ResNet, GoogLeNet, or MiniVGGNet is built, we’ll binarize our labels and construct our data augmentation object:

# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=18, zoom_range=0.15,
	width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
	horizontal_flip=True, fill_mode="nearest")

Followed by compiling our model and training the network:

# compile the model and train the network
print("[INFO] training network...")
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
H = model.fit_generator(
	aug.flow(trainX, trainY, batch_size=BATCH_SIZE),
	validation_data=(testX, testY),
	steps_per_epoch=trainX.shape[0] // BATCH_SIZE,
	epochs=NUM_EPOCHS,
	verbose=1)

We then evaluate the trained model and dump training history to disk:

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BATCH_SIZE)
report = classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=labelNames)

# serialize the training history to disk
print("[INFO] serializing training history...")
f = open(args["history"], "wb")
f.write(pickle.dumps(H.history))
f.close()

# save the classification report to disk
print("[INFO] saving classification report...")
f = open(args["report"], "w")
f.write(report)
f.close()

Each experiment will produce a classification report .txt file along with a serialized training history .pickle file.

The classification reports will be inspected manually, whereas the training history files will later be opened by operations inside plot.py, the training history parsed, and finally plotted.
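Since each history file is just a pickled Keras history dictionary, recalling one for plotting is straightforward. Here is a minimal sketch of the idea — not the actual plot.py, which stacks the Adam and RAdam curves together; the file path is just one example from the output/ directory, and I’m assuming the older Keras acc/val_acc key names used throughout this post:

# a minimal sketch of re-loading a serialized training history and
# plotting it (the actual plot.py stacks Adam/RAdam curves together)
import pickle
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# load the training history dictionary that train.py pickled
H = pickle.loads(open("output/resnet_adam_mnist.pickle", "rb").read())

# plot the loss/accuracy curves over all epochs
N = np.arange(0, len(H["loss"]))
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H["loss"], label="train_loss")
plt.plot(N, H["val_loss"], label="val_loss")
plt.plot(N, H["acc"], label="train_acc")
plt.plot(N, H["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig("resnet_adam_mnist.png")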

As you’ve learned, creating a training script that dynamically sets up an experiment is quite straightforward.

Creating our experiment combinations

At this point, we have our training script which can accept a (1) model architecture, (2) dataset, and (3) optimizer, followed by fitting a model using the respective combination.

That being said, are we going to manually run each and every individual command?

No, not only is that a tedious task, it’s also prone to human error.

Instead, let’s create a Python script to generate a shell script containing the train.py command for each experiment we want to run.

Open up the combinations.py file and insert the following code:
# import the necessary packages
import argparse
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-o", "--output", required=True,
	help="path to output output directory")
ap.add_argument("-s", "--script", required=True,
	help="path to output shell script")
args = vars(ap.parse_args())

Our script requires two command line arguments:

  • --output: The path to the output directory where the training files will be stored.
  • --script: The path to the output shell script which will contain all of our training script commands with command line argument combinations.

Let’s go ahead and open a new file for writing:

# open the output shell script for writing, then write the header
f = open(args["script"], "w")
f.write("#!/bin/sh\n\n")

# initialize the list of datasets, models, and optimizers
datasets = ["mnist", "fashion_mnist", "cifar10", "cifar100"]
models = ["resnet", "googlenet", "minivggnet"]
optimizers = ["adam", "radam"]

Line 14 opens a shell script file for writing. Subsequently, Line 15 writes the “shebang” to indicate that this shell script is executable.

Lines 18-20 then list our datasets, models, and optimizers.

We will form all possible combinations of experiments from these lists in a nested loop:

# loop over all combinations of datasets, models, and optimizers
for dataset in datasets:
	for model in models:
		for opt in optimizers:
			# build the path to the output training log file
			histFilename = "{}_{}_{}.pickle".format(model, opt, dataset)
			historyPath = os.path.sep.join([args["output"],
				histFilename])

			# build the path to the output report log file
			reportFilename = "{}_{}_{}.txt".format(model, opt, dataset)
			reportPath = os.path.sep.join([args["output"],
				reportFilename])

			# construct the command that will be executed to launch
			# the experiment
			cmd = ("python train.py --history {} --report {} "
				   "--dataset {}  --model {} --optimizer {}").format(
						historyPath, reportPath, dataset, model, opt)

			# write the command to disk
			f.write("{}\n".format(cmd))

# close the shell script file
f.close()

Inside the loop, we:

  • Construct our history file path (Lines 27-29).
  • Assemble our report file path (Lines 32-34).
  • Concatenate each command per the current loop iteration’s combination and write it to the shell file (Lines 38-43).

Finally, we close the shell script file.

Note: I am making the assumption that you are using a Unix machine to run these experiments. If you’re using Windows you should either (1) update this script to generate a batch file instead, or (2) manually execute the train.py command for each respective experiment. Note that I do not support Windows on the PyImageSearch blog, so you will be on your own to implement it based on this script.

Generating the experiment shell script

Go ahead and use the “Downloads” section of this tutorial to download the source code to the guide.

From there, open up a terminal and execute the combinations.py script:
$ python combinations.py --output output --script experiments.sh

After the script has executed, you should have a file named experiments.sh in your working directory — this file contains the 24 separate experiments we’ll be running to compare Adam to Rectified Adam.

Go ahead and investigate experiments.sh now:
#!/bin/sh

python train.py --history output/resnet_adam_mnist.pickle --report output/resnet_adam_mnist.txt --dataset mnist  --model resnet --optimizer adam
python train.py --history output/resnet_radam_mnist.pickle --report output/resnet_radam_mnist.txt --dataset mnist  --model resnet --optimizer radam
python train.py --history output/googlenet_adam_mnist.pickle --report output/googlenet_adam_mnist.txt --dataset mnist  --model googlenet --optimizer adam
python train.py --history output/googlenet_radam_mnist.pickle --report output/googlenet_radam_mnist.txt --dataset mnist  --model googlenet --optimizer radam
python train.py --history output/minivggnet_adam_mnist.pickle --report output/minivggnet_adam_mnist.txt --dataset mnist  --model minivggnet --optimizer adam
python train.py --history output/minivggnet_radam_mnist.pickle --report output/minivggnet_radam_mnist.txt --dataset mnist  --model minivggnet --optimizer radam
python train.py --history output/resnet_adam_fashion_mnist.pickle --report output/resnet_adam_fashion_mnist.txt --dataset fashion_mnist  --model resnet --optimizer adam
python train.py --history output/resnet_radam_fashion_mnist.pickle --report output/resnet_radam_fashion_mnist.txt --dataset fashion_mnist  --model resnet --optimizer radam
python train.py --history output/googlenet_adam_fashion_mnist.pickle --report output/googlenet_adam_fashion_mnist.txt --dataset fashion_mnist  --model googlenet --optimizer adam
python train.py --history output/googlenet_radam_fashion_mnist.pickle --report output/googlenet_radam_fashion_mnist.txt --dataset fashion_mnist  --model googlenet --optimizer radam
python train.py --history output/minivggnet_adam_fashion_mnist.pickle --report output/minivggnet_adam_fashion_mnist.txt --dataset fashion_mnist  --model minivggnet --optimizer adam
python train.py --history output/minivggnet_radam_fashion_mnist.pickle --report output/minivggnet_radam_fashion_mnist.txt --dataset fashion_mnist  --model minivggnet --optimizer radam
python train.py --history output/resnet_adam_cifar10.pickle --report output/resnet_adam_cifar10.txt --dataset cifar10  --model resnet --optimizer adam
python train.py --history output/resnet_radam_cifar10.pickle --report output/resnet_radam_cifar10.txt --dataset cifar10  --model resnet --optimizer radam
python train.py --history output/googlenet_adam_cifar10.pickle --report output/googlenet_adam_cifar10.txt --dataset cifar10  --model googlenet --optimizer adam
python train.py --history output/googlenet_radam_cifar10.pickle --report output/googlenet_radam_cifar10.txt --dataset cifar10  --model googlenet --optimizer radam
python train.py --history output/minivggnet_adam_cifar10.pickle --report output/minivggnet_adam_cifar10.txt --dataset cifar10  --model minivggnet --optimizer adam
python train.py --history output/minivggnet_radam_cifar10.pickle --report output/minivggnet_radam_cifar10.txt --dataset cifar10  --model minivggnet --optimizer radam
python train.py --history output/resnet_adam_cifar100.pickle --report output/resnet_adam_cifar100.txt --dataset cifar100  --model resnet --optimizer adam
python train.py --history output/resnet_radam_cifar100.pickle --report output/resnet_radam_cifar100.txt --dataset cifar100  --model resnet --optimizer radam
python train.py --history output/googlenet_adam_cifar100.pickle --report output/googlenet_adam_cifar100.txt --dataset cifar100  --model googlenet --optimizer adam
python train.py --history output/googlenet_radam_cifar100.pickle --report output/googlenet_radam_cifar100.txt --dataset cifar100  --model googlenet --optimizer radam
python train.py --history output/minivggnet_adam_cifar100.pickle --report output/minivggnet_adam_cifar100.txt --dataset cifar100  --model minivggnet --optimizer adam
python train.py --history output/minivggnet_radam_cifar100.pickle --report output/minivggnet_radam_cifar100.txt --dataset cifar100  --model minivggnet --optimizer radam

Note: Be sure to use the horizontal scroll bar to inspect the entire contents of the experiments.sh script. I intentionally did not break up lines or automatically wrap them for better display. You can also refer to Figure 4 below — I suggest clicking the image to enlarge + inspect it.

Figure 4: The output of our combinations.py file is a shell script listing the training script commands to run in succession. Click image to enlarge.

Notice how there is a train.py call for each of the 24 possible combinations of model architecture, dataset, and optimizer. Furthermore, the “shebang” on Line 1 indicates that this shell script is executable.
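If you’d like to verify that count programmatically, a quick sanity check (a minimal sketch, assuming experiments.sh sits in your working directory) might look like this:

# count the experiment commands in the generated shell script
with open("experiments.sh") as f:
	cmds = [line for line in f if line.startswith("python train.py")]

# 4 datasets x 3 models x 2 optimizers = 24 experiments
print(len(cmds))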

Running our experiments

The next step is to actually perform each of these experiments.

I executed the shell script on an Amazon EC2 instance with an NVIDIA K80 GPU. It took approximately 48 hours to run all the experiments.

To launch the experiments for yourself, make sure the script is executable (e.g., chmod +x experiments.sh), then run the following command:

$ ./experiments.sh

After the script has finished running, your output/ directory should be filled with .pickle and .txt files:
$ ls -l output/
googlenet_adam_cifar10.pickle
googlenet_adam_cifar10.txt
googlenet_adam_cifar100.pickle
googlenet_adam_cifar100.txt
...
resnet_radam_fashion_mnist.pickle
resnet_radam_fashion_mnist.txt
resnet_radam_mnist.pickle
resnet_radam_mnist.txt

The .txt files contain the output of scikit-learn’s classification_report, a human-readable report that tells us how well our model performed.

The .pickle files contain the training history for the model. We’ll use these .pickle files to plot both Adam and Rectified Adam’s performance in the next section.
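If you’d like to poke at one of these serialized histories yourself, a minimal sketch follows (the particular filename is just one of the 24 generated files):

# load one serialized Keras training history and inspect it
import pickle

hist = pickle.loads(open("output/resnet_adam_mnist.pickle", "rb").read())

# a Keras History.history dictionary typically contains these keys
# (or "accuracy"/"val_accuracy" on TensorFlow 2.0 -- see the note below)
print(sorted(hist.keys()))	# e.g., ['acc', 'loss', 'val_acc', 'val_loss']
print(len(hist["loss"]))	# the number of epochs the model was trained for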

Implementing our Adam vs. Rectified Adam plotting script

Our final Python script, plot.py, will be used to plot the performance of Adam vs. Rectified Adam, giving us a nice, clear visualization of a given model architecture trained on a specific dataset.

The plot file opens each Adam/RAdam .pickle file pair and generates a corresponding plot.

Open up plot.py and insert the following code:
# import the necessary packages
import matplotlib.pyplot as plt
import numpy as np
import argparse
import pickle
import os

def plot_history(adamHist, rAdamHist, accTitle, lossTitle):
	# determine the total number of epochs used for training, then
	# initialize the figure
	N = np.arange(0, len(adamHist["loss"]))
	plt.style.use("ggplot")
	(fig, axs) = plt.subplots(2, 1, figsize=(7, 9))

	# plot the accuracy for Adam vs. Rectified Adam
	axs[0].plot(N, adamHist["acc"], label="adam_train_acc")
	axs[0].plot(N, adamHist["val_acc"], label="adam_val_acc")
	axs[0].plot(N, rAdamHist["acc"], label="radam_train_acc")
	axs[0].plot(N, rAdamHist["val_acc"], label="radam_val_acc")
	axs[0].set_title(accTitle)
	axs[0].set_xlabel("Epoch #")
	axs[0].set_ylabel("Accuracy")
	axs[0].legend(loc="lower right")

	# plot the loss for Adam vs. Rectified Adam
	axs[1].plot(N, adamHist["loss"], label="adam_train_loss")
	axs[1].plot(N, adamHist["val_loss"], label="adam_val_loss")
	axs[1].plot(N, rAdamHist["loss"], label="radam_train_loss")
	axs[1].plot(N, rAdamHist["val_loss"], label="radam_val_loss")
	axs[1].set_title(lossTitle)
	axs[1].set_xlabel("Epoch #")
	axs[1].set_ylabel("Loss")
	axs[1].legend(loc="upper right")

	# update the layout of the plot
	plt.tight_layout()

Lines 2-6 handle imports, namely the matplotlib.pyplot module.

The plot_history function is responsible for generating two stacked plots via the subplots feature:
  • Training/validation accuracy curves (Lines 16-23).
  • Training/validation loss curves (Lines 26-33).

Both Adam and Rectified Adam training history curves are generated from the adamHist and rAdamHist data passed as parameters to the function.

Note: If you are using TensorFlow 2.0 (i.e., tf.keras) to run this code, you’ll need to change all occurrences of acc and val_acc to accuracy and val_accuracy, respectively, as TensorFlow 2.0 made a breaking change to the accuracy metric name.
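Alternatively, rather than editing every occurrence by hand, you could normalize the history keys once after loading them. The following remap_history helper is a hypothetical addition, not part of the original code:

def remap_history(hist):
	# map the TensorFlow 2.0 metric names back to the legacy Keras
	# names so the plotting code above works unmodified
	remap = {"accuracy": "acc", "val_accuracy": "val_acc"}
	return {remap.get(k, k): v for (k, v) in hist.items()}

You would then wrap each loaded history with remap_history before passing it to plot_history.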

Let’s handle parsing command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True,
	help="path to input directory of Keras training history files")
ap.add_argument("-p", "--plots", required=True,
	help="path to output directory of training plots")
args = vars(ap.parse_args())

# initialize the list of datasets and models
datasets = ["mnist", "fashion_mnist", "cifar10", "cifar100"]
models = ["resnet", "googlenet", "minivggnet"]

Our command line arguments consist of:

  • --input : The path to the input directory of training history files to be parsed for plot generation.
  • --plots : Our output path where the plots will be stored.

Lines 47 and 48 list our datasets and models. We’ll loop over the combinations of datasets and models to generate our plots:
# loop over all combinations of datasets and models
for dataset in datasets:
	for model in models:
		# construct the path to the Adam output training history files
		adamFilename = "{}_{}_{}.pickle".format(model, "adam",
			dataset)
		adamPath = os.path.sep.join([args["input"], adamFilename])

		# construct the path to the Rectified Adam output training
		# history files
		rAdamFilename = "{}_{}_{}.pickle".format(model, "radam",
			dataset)
		rAdamPath = os.path.sep.join([args["input"], rAdamFilename])

		# load the training history files for Adam and Rectified Adam,
		# respectively
		adamHist = pickle.loads(open(adamPath, "rb").read())
		rAdamHist = pickle.loads(open(rAdamPath, "rb").read())

		# plot the accuracy/loss for the current dataset, comparing
		# Adam vs. Rectified Adam
		accTitle = "Adam vs. RAdam for '{}' on '{}' (Accuracy)".format(
			model, dataset)
		lossTitle = "Adam vs. RAdam for '{}' on '{}' (Loss)".format(
			model, dataset)
		plot_history(adamHist, rAdamHist, accTitle, lossTitle)

		# construct the path to the output plot
		plotFilename = "{}_{}.png".format(model, dataset)
		plotPath = os.path.sep.join([args["plots"], plotFilename])

		# save the plot and clear it
		plt.savefig(plotPath)
		plt.clf()

Inside our nested datasets/models loop, we:
  • Construct Adam and Rectified Adam’s file paths (Lines 54-62).
  • Load serialized training history (Lines 66 and 67).
  • Generate the plots using our plot_history function (Lines 71-75).
  • Export the figures to disk (Lines 78-83).

Plotting Adam vs. Rectified Adam

We are now ready to run the plot.py script.

Again, make sure you have used the “Downloads” section of this tutorial to download the source code.

From there, execute the following command:

$ python plot.py --input output --plots plots

You can then check the plots/ directory and ensure it has been populated with the training history figures:
$ ls -l plots/
googlenet_cifar10.png
googlenet_cifar100.png
googlenet_fashion_mnist.png
googlenet_mnist.png
minivggnet_cifar10.png
minivggnet_cifar100.png
minivggnet_fashion_mnist.png
minivggnet_mnist.png
resnet_cifar10.png
resnet_cifar100.png
resnet_fashion_mnist.png
resnet_mnist.png

In the next section, we’ll review the results of our experiments.

Adam vs. Rectified Adam Experiments with MNIST

Figure 5: Montage of samples from the MNIST digit dataset.

Our first set of experiments will compare Adam vs. Rectified Adam on the MNIST dataset, a standard benchmark image classification dataset for handwritten digit recognition.

MNIST – MiniVGGNet

Figure 6: Which is better — Adam or RAdam optimizer using MiniVGGNet on the MNIST dataset?

Our first experiment compares Adam to Rectified Adam when training MiniVGGNet on the MNIST dataset.

Below is the output classification report for the Adam optimizer:

precision    recall  f1-score   support

           0       0.99      1.00      1.00       980
           1       0.99      1.00      0.99      1135
           2       0.98      0.96      0.97      1032
           3       1.00      1.00      1.00      1010
           4       0.99      1.00      0.99       982
           5       0.97      0.98      0.98       892
           6       0.98      0.98      0.98       958
           7       0.99      0.99      0.99      1028
           8       0.99      0.99      0.99       974
           9       1.00      0.99      0.99      1009

   micro avg       0.99      0.99      0.99     10000
   macro avg       0.99      0.99      0.99     10000
weighted avg       0.99      0.99      0.99     10000

As well as the classification report for the Rectified Adam optimizer:

precision    recall  f1-score   support

           0       0.99      1.00      0.99       980
           1       1.00      0.99      1.00      1135
           2       0.97      0.97      0.97      1032
           3       0.99      0.99      0.99      1010
           4       0.99      0.99      0.99       982
           5       0.98      0.97      0.97       892
           6       0.98      0.98      0.98       958
           7       0.99      0.99      0.99      1028
           8       0.99      0.99      0.99       974
           9       0.99      0.99      0.99      1009

   micro avg       0.99      0.99      0.99     10000
   macro avg       0.99      0.99      0.99     10000
weighted avg       0.99      0.99      0.99     10000

As you can see, we’re obtaining 99% accuracy for both experiments.

Looking at Figure 6 you can observe the warmup period associated with Rectified Adam:

  • Loss starts off very high and accuracy very low.
  • After warmup is complete, the Rectified Adam optimizer catches up with Adam.

What’s interesting to note, though, is that Adam obtains a lower loss than Rectified Adam — we’ll actually see that trend continue in the rest of the experiments we run (and I’ll explain why this happens as well).

MNIST – GoogLeNet

Figure 7: Which deep learning optimizer is actually better — Rectified Adam or Adam? This plot is from my experiment notebook while testing RAdam and Adam using GoogLeNet on the MNIST dataset.

This next experiment compares Adam to Rectified Adam for GoogLeNet trained on the MNIST dataset.

Below follows the output of the Adam optimizer:

precision    recall  f1-score   support

           0       1.00      1.00      1.00       980
           1       1.00      0.99      1.00      1135
           2       0.96      0.99      0.97      1032
           3       0.99      1.00      0.99      1010
           4       0.99      0.99      0.99       982
           5       0.99      0.96      0.98       892
           6       0.98      0.99      0.98       958
           7       0.99      0.99      0.99      1028
           8       1.00      1.00      1.00       974
           9       1.00      0.98      0.99      1009

   micro avg       0.99      0.99      0.99     10000
   macro avg       0.99      0.99      0.99     10000
weighted avg       0.99      0.99      0.99     10000

As well as the output for the Rectified Adam optimizer:

precision    recall  f1-score   support

           0       1.00      1.00      1.00       980
           1       1.00      0.99      1.00      1135
           2       0.98      0.98      0.98      1032
           3       1.00      0.99      1.00      1010
           4       1.00      0.99      1.00       982
           5       0.97      0.99      0.98       892
           6       0.99      0.98      0.99       958
           7       0.99      1.00      0.99      1028
           8       0.99      1.00      1.00       974
           9       1.00      0.99      1.00      1009

   micro avg       0.99      0.99      0.99     10000
   macro avg       0.99      0.99      0.99     10000
weighted avg       0.99      0.99      0.99     10000

Again, 99% accuracy is obtained for both optimizers.

This time both the training/validation plots are near identical for both accuracy and loss.

MNIST – ResNet

Figure 8: Training accuracy/loss plot for ResNet on the MNIST dataset using both the RAdam (Rectified Adam) and Adam deep learning optimizers with Keras.

Our final MNIST experiment compares training ResNet using both Adam and Rectified Adam.

Given that MNIST is not a very challenging dataset we obtain 99% accuracy for the Adam optimizer:

precision    recall  f1-score   support

           0       1.00      1.00      1.00       980
           1       1.00      0.99      1.00      1135
           2       0.98      0.98      0.98      1032
           3       0.99      1.00      1.00      1010
           4       0.99      1.00      0.99       982
           5       0.99      0.98      0.98       892
           6       0.98      0.99      0.99       958
           7       0.99      1.00      0.99      1028
           8       0.99      1.00      1.00       974
           9       1.00      0.98      0.99      1009

   micro avg       0.99      0.99      0.99     10000
   macro avg       0.99      0.99      0.99     10000
weighted avg       0.99      0.99      0.99     10000

As well as the Rectified Adam optimizer:

precision    recall  f1-score   support

           0       1.00      1.00      1.00       980
           1       1.00      1.00      1.00      1135
           2       0.97      0.98      0.98      1032
           3       1.00      1.00      1.00      1010
           4       0.99      1.00      1.00       982
           5       0.99      0.97      0.98       892
           6       0.99      0.98      0.99       958
           7       0.99      1.00      0.99      1028
           8       1.00      1.00      1.00       974
           9       1.00      0.99      1.00      1009

   micro avg       0.99      0.99      0.99     10000
   macro avg       0.99      0.99      0.99     10000
weighted avg       0.99      0.99      0.99     10000

But take a look at Figure 8 — note how Adam obtains much lower loss than Rectified Adam.

That’s not necessarily a bad thing, as it may imply that Rectified Adam is obtaining a more generalizable model; however, performance on the testing set is identical, so we would need to evaluate on images outside MNIST (which is beyond the scope of this blog post).

Adam vs. Rectified Adam Experiments with Fashion MNIST

Figure 9: The Fashion MNIST dataset was created by e-commerce company, Zalando, as a drop-in replacement for MNIST Digits. It is a great dataset to practice/experiment with when using Keras for deep learning. (image source)

Our next set of experiments evaluate Adam vs. Rectified Adam on the Fashion MNIST dataset, a drop-in replacement for the standard MNIST dataset.

You can read more about Fashion MNIST here.

Fashion MNIST – MiniVGGNet

Figure 10: Testing optimizers with deep learning, including new ones such as RAdam, requires multiple experiments. Shown in this figure is the MiniVGGNet CNN trained on the Fashion MNIST dataset with both Adam and RAdam optimizers.

Our first experiment evaluates the MiniVGGNet architecture trained on the Fashion MNIST dataset.

Below you can find the output of training with the Adam optimizer:

precision    recall  f1-score   support

         top       0.95      0.71      0.81      1000
     trouser       0.99      0.99      0.99      1000
    pullover       0.94      0.76      0.84      1000
       dress       0.96      0.80      0.87      1000
        coat       0.84      0.90      0.87      1000
      sandal       0.98      0.98      0.98      1000
       shirt       0.59      0.91      0.71      1000
     sneaker       0.96      0.97      0.96      1000
         bag       0.98      0.99      0.99      1000
  ankle boot       0.97      0.97      0.97      1000

   micro avg       0.90      0.90      0.90     10000
   macro avg       0.92      0.90      0.90     10000
weighted avg       0.92      0.90      0.90     10000

As well as the Rectified Adam optimizer:

precision    recall  f1-score   support

         top       0.85      0.85      0.85      1000
     trouser       1.00      0.97      0.99      1000
    pullover       0.89      0.84      0.87      1000
       dress       0.93      0.81      0.87      1000
        coat       0.85      0.80      0.82      1000
      sandal       0.99      0.95      0.97      1000
       shirt       0.62      0.77      0.69      1000
     sneaker       0.92      0.96      0.94      1000
         bag       0.96      0.99      0.97      1000
  ankle boot       0.97      0.95      0.96      1000

   micro avg       0.89      0.89      0.89     10000
   macro avg       0.90      0.89      0.89     10000
weighted avg       0.90      0.89      0.89     10000

Note that the Adam optimizer outperforms Rectified Adam here, obtaining 92% accuracy compared to Rectified Adam’s 90%.

Furthermore, take a look at the training plot in Figure 10 — training is very stable with validation loss falling below training loss.

By training more aggressively with Adam, we could likely improve our accuracy even further.

Fashion MNIST – GoogLeNet

Figure 11: Is either RAdam or Adam a better deep learning optimizer using GoogLeNet? Using the Fashion MNIST dataset with Adam shows signs of overfitting past epoch 30. RAdam appears more stable in this experiment.

We now evaluate GoogLeNet trained on Fashion MNIST using Adam and Rectified Adam.

Below is the classification report from the Adam optimizer:

precision    recall  f1-score   support

         top       0.84      0.89      0.86      1000
     trouser       1.00      0.99      0.99      1000
    pullover       0.87      0.94      0.90      1000
       dress       0.95      0.88      0.91      1000
        coat       0.95      0.85      0.89      1000
      sandal       0.98      0.99      0.98      1000
       shirt       0.79      0.82      0.81      1000
     sneaker       0.99      0.90      0.94      1000
         bag       0.98      0.99      0.99      1000
  ankle boot       0.91      0.99      0.95      1000

   micro avg       0.92      0.92      0.92     10000
   macro avg       0.93      0.92      0.92     10000
weighted avg       0.93      0.92      0.92     10000

As well as the output from the Rectified Adam optimizer:

precision    recall  f1-score   support

         top       0.91      0.83      0.87      1000
     trouser       0.99      0.99      0.99      1000
    pullover       0.94      0.85      0.89      1000
       dress       0.96      0.86      0.90      1000
        coat       0.90      0.91      0.90      1000
      sandal       0.98      0.98      0.98      1000
       shirt       0.70      0.88      0.78      1000
     sneaker       0.97      0.96      0.96      1000
         bag       0.98      0.99      0.99      1000
  ankle boot       0.97      0.97      0.97      1000

   micro avg       0.92      0.92      0.92     10000
   macro avg       0.93      0.92      0.92     10000
weighted avg       0.93      0.92      0.92     10000

This time both optimizers obtain 93% accuracy, but what’s more interesting is to take a look at the training history plot in Figure 11.

Here we can see that Adam’s training and validation loss start to diverge past epoch 30 — the divergence grows wider and wider as we continue training. At this point, we should start to be concerned about overfitting when using Adam.

On the other hand, Rectified Adam’s performance is stable with no signs of overfitting.

In this particular experiment, it’s clear that Rectified Adam is generalizing better, and had we wished to deploy this model to production, the Rectified Adam optimizer version would be the one to go with.

Fashion MNIST – ResNet

Figure 12: Which deep learning optimizer is better — Adam or Rectified Adam (RAdam) — using the ResNet CNN on the Fashion MNIST dataset?

Our final experiment compares Adam vs. Rectified Adam optimizer trained on the Fashion MNIST dataset using ResNet.

Below is the output of the Adam optimizer:

precision    recall  f1-score   support

         top       0.89      0.83      0.86      1000
     trouser       0.99      0.99      0.99      1000
    pullover       0.84      0.93      0.88      1000
       dress       0.94      0.83      0.88      1000
        coat       0.93      0.85      0.89      1000
      sandal       0.99      0.92      0.95      1000
       shirt       0.71      0.85      0.78      1000
     sneaker       0.88      0.99      0.93      1000
         bag       1.00      0.98      0.99      1000
  ankle boot       0.98      0.93      0.95      1000

   micro avg       0.91      0.91      0.91     10000
   macro avg       0.92      0.91      0.91     10000
weighted avg       0.92      0.91      0.91     10000

Here is the output of the Rectified Adam optimizer:

precision    recall  f1-score   support

         top       0.88      0.86      0.87      1000
     trouser       0.99      0.99      0.99      1000
    pullover       0.91      0.87      0.89      1000
       dress       0.96      0.83      0.89      1000
        coat       0.86      0.92      0.89      1000
      sandal       0.98      0.98      0.98      1000
       shirt       0.72      0.80      0.75      1000
     sneaker       0.95      0.96      0.96      1000
         bag       0.98      0.99      0.99      1000
  ankle boot       0.97      0.96      0.96      1000

   micro avg       0.92      0.92      0.92     10000
   macro avg       0.92      0.92      0.92     10000
weighted avg       0.92      0.92      0.92     10000

Both models obtain 92% accuracy, but take a look at the training history plot in Figure 12.

You can observe that the Adam optimizer results in lower loss and that its validation loss follows the training curve.

The Rectified Adam loss is arguably more stable with fewer fluctuations (as compared to standard Adam).

Exactly which one is “better” in this experiment would be dependent on how well the model generalizes to images outside the training, validation, and testing set.

Further experiments would be required to mark the winner here, but my gut tells me it’s Rectified Adam: (1) accuracy on the testing set is identical, and (2) lower loss doesn’t necessarily mean better generalization (in some cases it can mean the model fails to generalize well). At the same time, training/validation loss are near identical for Adam, so without further experiments it’s hard to make the call.

Adam vs. Rectified Adam Experiments with CIFAR-10

Figure 13: The CIFAR-10 benchmarking dataset has 10 classes. We will use it for Rectified Adam experimentation to evaluate if RAdam or Adam is the better choice (image source).

In these experiments, we’ll be comparing Adam vs. Rectified Adam performance using MiniVGGNet, GoogLeNet, and ResNet, all trained on the CIFAR-10 dataset.

CIFAR-10 – MiniVGGNet

Figure 14: Is the RAdam or Adam deep learning optimizer better using the MiniVGGNet CNN on the CIFAR-10 dataset?

Our next experiment compares Adam to Rectified Adam by training MiniVGGNet on the CIFAR-10 dataset.

Below is the output of training using the Adam optimizer:

precision    recall  f1-score   support

    airplane       0.90      0.79      0.84      1000
  automobile       0.90      0.93      0.91      1000
        bird       0.90      0.63      0.74      1000
         cat       0.78      0.68      0.73      1000
        deer       0.83      0.79      0.81      1000
         dog       0.81      0.76      0.79      1000
        frog       0.70      0.95      0.81      1000
       horse       0.85      0.91      0.88      1000
        ship       0.93      0.89      0.91      1000
       truck       0.77      0.95      0.85      1000

   micro avg       0.83      0.83      0.83     10000
   macro avg       0.84      0.83      0.83     10000
weighted avg       0.84      0.83      0.83     10000

And here is the output from Rectified Adam:

precision    recall  f1-score   support

    airplane       0.84      0.72      0.78      1000
  automobile       0.89      0.84      0.86      1000
        bird       0.80      0.41      0.54      1000
         cat       0.66      0.43      0.52      1000
        deer       0.66      0.65      0.66      1000
         dog       0.72      0.55      0.62      1000
        frog       0.48      0.96      0.64      1000
       horse       0.84      0.75      0.79      1000
        ship       0.87      0.88      0.88      1000
       truck       0.68      0.95      0.79      1000

   micro avg       0.71      0.71      0.71     10000
   macro avg       0.74      0.71      0.71     10000
weighted avg       0.74      0.71      0.71     10000

Here the Adam optimizer (84% accuracy) beats out Rectified Adam (74% accuracy).

Furthermore, validation loss is lower than training loss for the majority of training, implying that we can “train harder” by reducing our regularization strength and potentially increasing model capacity.

CIFAR-10 – GoogLeNet

Figure 15: Which is a better deep learning optimizer with the GoogLeNet CNN? The training accuracy/loss plot shows results from using Adam and RAdam as part of automated deep learning experiment data collection.

Next, let’s check out GoogLeNet trained on CIFAR-10 using Adam and Rectified Adam.

Here is the output of Adam:

precision    recall  f1-score   support

    airplane       0.89      0.92      0.91      1000
  automobile       0.92      0.97      0.94      1000
        bird       0.90      0.87      0.88      1000
         cat       0.79      0.86      0.82      1000
        deer       0.92      0.85      0.89      1000
         dog       0.92      0.81      0.86      1000
        frog       0.87      0.96      0.91      1000
       horse       0.95      0.91      0.93      1000
        ship       0.96      0.92      0.94      1000
       truck       0.90      0.94      0.92      1000

   micro avg       0.90      0.90      0.90     10000
   macro avg       0.90      0.90      0.90     10000
weighted avg       0.90      0.90      0.90     10000

And here is the output of Rectified Adam:

precision    recall  f1-score   support

    airplane       0.88      0.88      0.88      1000
  automobile       0.93      0.95      0.94      1000
        bird       0.84      0.82      0.83      1000
         cat       0.79      0.75      0.77      1000
        deer       0.89      0.82      0.85      1000
         dog       0.89      0.77      0.82      1000
        frog       0.80      0.96      0.87      1000
       horse       0.89      0.92      0.91      1000
        ship       0.95      0.92      0.93      1000
       truck       0.88      0.95      0.91      1000

   micro avg       0.87      0.87      0.87     10000
   macro avg       0.87      0.87      0.87     10000
weighted avg       0.87      0.87      0.87     10000

The Adam optimizer obtains 90% accuracy, slightly beating out the 87% accuracy of Rectified Adam.

However, Figure 15 tells an interesting story — past epoch 20 there is quite the divergence between Adam’s training and validation loss.

While the Adam-optimized model obtained higher accuracy, there are signs of overfitting, as validation loss is essentially stagnant past epoch 30.

Additional experiments would be required to mark a true winner but I imagine it would be Rectified Adam after some additional hyperparameter tuning.

CIFAR-10 – ResNet

Figure 16: This Keras deep learning tutorial helps to answer the question: Is Rectified Adam or Adam the better deep learning optimizer? One of the 24 experiments uses the ResNet CNN and CIFAR-10 dataset.

Next, let’s check out ResNet trained using Adam and Rectified Adam on CIFAR-10.

Below you can find the output of the standard Adam optimizer:

precision    recall  f1-score   support

    airplane       0.80      0.92      0.86      1000
  automobile       0.92      0.96      0.94      1000
        bird       0.93      0.74      0.82      1000
         cat       0.93      0.63      0.75      1000
        deer       0.95      0.80      0.87      1000
         dog       0.77      0.88      0.82      1000
        frog       0.75      0.97      0.84      1000
       horse       0.90      0.92      0.91      1000
        ship       0.93      0.93      0.93      1000
       truck       0.91      0.93      0.92      1000

   micro avg       0.87      0.87      0.87     10000
   macro avg       0.88      0.87      0.87     10000
weighted avg       0.88      0.87      0.87     10000

As well as the output from Rectified Adam:

precision    recall  f1-score   support

    airplane       0.86      0.86      0.86      1000
  automobile       0.89      0.95      0.92      1000
        bird       0.85      0.72      0.78      1000
         cat       0.78      0.66      0.71      1000
        deer       0.83      0.81      0.82      1000
         dog       0.82      0.70      0.76      1000
        frog       0.72      0.95      0.82      1000
       horse       0.86      0.90      0.87      1000
        ship       0.94      0.90      0.92      1000
       truck       0.84      0.93      0.88      1000

   micro avg       0.84      0.84      0.84     10000
   macro avg       0.84      0.84      0.83     10000
weighted avg       0.84      0.84      0.83     10000

Adam is the winner here, obtaining 88% accuracy versus Rectified Adam’s 84%.

Adam vs. Rectified Adam Experiments with CIFAR-100

Figure 17: The CIFAR-100 classification dataset is the brother of CIFAR-10 and includes more classes of images. (image source)

The CIFAR-100 dataset is the bigger brother of the CIFAR-10 dataset. As the name suggests, CIFAR-100 includes 100 class labels versus the 10 class labels of CIFAR-10.

While there are more class labels in CIFAR-100, there are actually fewer images per class (CIFAR-10 has 6,000 images per class while CIFAR-100 only has 600 images per class).

CIFAR-100 is, therefore, a more challenging dataset than CIFAR-10.

In this section, we’ll investigate Adam vs. Rectified Adam’s performance on the CIFAR-100 dataset.

CIFAR-100 – MiniVGGNet

Figure 18: Will RAdam stand up to Adam as a preferable deep learning optimizer? How does Rectified Adam stack up to SGD? In this experiment (one of 24), we train MiniVGGNet on the CIFAR-100 dataset and analyze the results.

Let’s apply Adam and Rectified Adam to the MiniVGGNet architecture trained on CIFAR-100.

Below is the output from the Adam optimizer:

precision    recall  f1-score   support

        apple       0.94      0.76      0.84       100
aquarium_fish       0.69      0.66      0.67       100
         baby       0.56      0.45      0.50       100
         bear       0.45      0.22      0.30       100
       beaver       0.31      0.14      0.19       100
          bed       0.48      0.59      0.53       100
          bee       0.60      0.69      0.64       100
       beetle       0.51      0.49      0.50       100
      bicycle       0.50      0.65      0.57       100
       bottle       0.74      0.63      0.68       100
         bowl       0.51      0.38      0.44       100
          boy       0.45      0.37      0.41       100
       bridge       0.64      0.68      0.66       100
          bus       0.42      0.57      0.49       100
    butterfly       0.52      0.50      0.51       100
        camel       0.61      0.33      0.43       100
          can       0.44      0.68      0.54       100
       castle       0.74      0.71      0.72       100
  caterpillar       0.78      0.40      0.53       100
       cattle       0.58      0.48      0.52       100
        chair       0.72      0.80      0.76       100
   chimpanzee       0.74      0.64      0.68       100
        clock       0.39      0.62      0.48       100
        cloud       0.88      0.46      0.61       100
    cockroach       0.80      0.66      0.73       100
        couch       0.56      0.27      0.36       100
         crab       0.43      0.52      0.47       100
    crocodile       0.34      0.32      0.33       100
          cup       0.74      0.73      0.73       100
..."d" - "t" classes omitted for brevity
     wardrobe       0.67      0.87      0.76       100
        whale       0.67      0.58      0.62       100
  willow_tree       0.52      0.44      0.48       100
         wolf       0.40      0.48      0.44       100
        woman       0.39      0.19      0.26       100
         worm       0.66      0.56      0.61       100

    micro avg       0.53      0.53      0.53     10000
    macro avg       0.58      0.53      0.53     10000
 weighted avg       0.58      0.53      0.53     10000

And here is the output from Rectified Adam:

precision    recall  f1-score   support

        apple       0.82      0.70      0.76       100
aquarium_fish       0.57      0.46      0.51       100
         baby       0.55      0.26      0.35       100
         bear       0.22      0.11      0.15       100
       beaver       0.17      0.18      0.17       100
          bed       0.47      0.37      0.42       100
          bee       0.49      0.47      0.48       100
       beetle       0.32      0.52      0.39       100
      bicycle       0.36      0.64      0.46       100
       bottle       0.74      0.40      0.52       100
         bowl       0.47      0.29      0.36       100
          boy       0.54      0.26      0.35       100
       bridge       0.38      0.43      0.40       100
          bus       0.34      0.35      0.34       100
    butterfly       0.40      0.34      0.37       100
        camel       0.37      0.19      0.25       100
          can       0.57      0.45      0.50       100
       castle       0.50      0.57      0.53       100
  caterpillar       0.50      0.21      0.30       100
       cattle       0.47      0.35      0.40       100
        chair       0.54      0.72      0.62       100
   chimpanzee       0.59      0.47      0.53       100
        clock       0.29      0.37      0.33       100
        cloud       0.77      0.60      0.67       100
    cockroach       0.57      0.64      0.60       100
        couch       0.42      0.18      0.25       100
         crab       0.25      0.50      0.33       100
    crocodile       0.30      0.28      0.29       100
          cup       0.71      0.60      0.65       100
..."d" - "t" classes omitted for brevity
     wardrobe       0.61      0.82      0.70       100
        whale       0.57      0.39      0.46       100
  willow_tree       0.36      0.27      0.31       100
         wolf       0.32      0.39      0.35       100
        woman       0.35      0.09      0.14       100
         worm       0.62      0.32      0.42       100

    micro avg       0.41      0.41      0.41     10000
    macro avg       0.46      0.41      0.41     10000
 weighted avg       0.46      0.41      0.41     10000

The Adam optimizer is the clear winner (58% accuracy) over Rectified Adam (46% accuracy).

And just like in our CIFAR-10 experiments, we can likely improve our model performance further by relaxing regularization and increasing model capacity.

CIFAR-100 – GoogLeNet

Figure 19: Adam vs. RAdam optimizer on the CIFAR-100 dataset using GoogLeNet.

Let’s now perform the same experiment, only this time use GoogLeNet.

Here’s the output from the Adam optimizer:

precision    recall  f1-score   support

        apple       0.95      0.80      0.87       100
aquarium_fish       0.88      0.66      0.75       100
         baby       0.59      0.39      0.47       100
         bear       0.47      0.28      0.35       100
       beaver       0.20      0.53      0.29       100
          bed       0.79      0.56      0.65       100
          bee       0.78      0.69      0.73       100
       beetle       0.56      0.58      0.57       100
      bicycle       0.91      0.63      0.75       100
       bottle       0.80      0.71      0.75       100
         bowl       0.46      0.37      0.41       100
          boy       0.49      0.47      0.48       100
       bridge       0.80      0.61      0.69       100
          bus       0.62      0.60      0.61       100
    butterfly       0.34      0.64      0.44       100
        camel       0.93      0.37      0.53       100
          can       0.42      0.69      0.52       100
       castle       0.94      0.50      0.65       100
  caterpillar       0.28      0.77      0.41       100
       cattle       0.56      0.55      0.55       100
        chair       0.85      0.77      0.81       100
   chimpanzee       0.95      0.58      0.72       100
        clock       0.56      0.62      0.59       100
        cloud       0.88      0.68      0.77       100
    cockroach       0.82      0.74      0.78       100
        couch       0.66      0.40      0.50       100
         crab       0.40      0.72      0.52       100
    crocodile       0.36      0.47      0.41       100
          cup       0.65      0.68      0.66       100
..."d" - "t" classes omitted for brevity
     wardrobe       0.86      0.82      0.84       100
        whale       0.40      0.80      0.53       100
  willow_tree       0.46      0.62      0.53       100
         wolf       0.86      0.37      0.52       100
        woman       0.56      0.31      0.40       100
         worm       0.79      0.57      0.66       100

    micro avg       0.56      0.56      0.56     10000
    macro avg       0.66      0.56      0.57     10000
 weighted avg       0.66      0.56      0.57     10000

And here is the output from Rectified Adam:

precision    recall  f1-score   support

        apple       0.93      0.76      0.84       100
aquarium_fish       0.72      0.77      0.74       100
         baby       0.53      0.54      0.53       100
         bear       0.47      0.26      0.34       100
       beaver       0.26      0.22      0.24       100
          bed       0.53      0.49      0.51       100
          bee       0.52      0.62      0.56       100
       beetle       0.50      0.55      0.52       100
      bicycle       0.67      0.79      0.72       100
       bottle       0.78      0.62      0.69       100
         bowl       0.41      0.42      0.41       100
          boy       0.45      0.41      0.43       100
       bridge       0.59      0.72      0.65       100
          bus       0.45      0.53      0.49       100
    butterfly       0.27      0.58      0.37       100
        camel       0.56      0.50      0.53       100
          can       0.58      0.68      0.63       100
       castle       0.81      0.73      0.77       100
  caterpillar       0.51      0.52      0.51       100
       cattle       0.56      0.59      0.58       100
        chair       0.68      0.76      0.72       100
   chimpanzee       0.83      0.73      0.78       100
        clock       0.46      0.56      0.50       100
        cloud       0.88      0.69      0.78       100
    cockroach       0.79      0.68      0.73       100
        couch       0.44      0.39      0.41       100
         crab       0.46      0.47      0.46       100
    crocodile       0.40      0.40      0.40       100
          cup       0.76      0.62      0.68       100
..."d" - "t" classes omitted for brevity
     wardrobe       0.76      0.87      0.81       100
        whale       0.56      0.61      0.59       100
  willow_tree       0.65      0.30      0.41       100
         wolf       0.61      0.55      0.58       100
        woman       0.39      0.30      0.34       100
         worm       0.62      0.61      0.62       100

    micro avg       0.57      0.57      0.57     10000
    macro avg       0.59      0.57      0.57     10000
 weighted avg       0.59      0.57      0.57     10000

The Adam optimizer obtains 66% accuracy, better than Rectified Adam’s 59%.

However, looking at Figure 19 we can see that the validation loss from Adam is quite unstable — towards the end of training validation loss even starts to increase, a sign of overfitting.

CIFAR-100 – ResNet

Figure 20: Training a ResNet model on the CIFAR-100 dataset using both RAdam and Adam for comparison. Which deep learning optimizer is actually better for this experiment?

Below we can find the output of training ResNet using Adam on the CIFAR-100 dataset:

precision    recall  f1-score   support

        apple       0.80      0.89      0.84       100
aquarium_fish       0.86      0.75      0.80       100
         baby       0.75      0.40      0.52       100
         bear       0.71      0.29      0.41       100
       beaver       0.40      0.40      0.40       100
          bed       0.91      0.59      0.72       100
          bee       0.71      0.76      0.73       100
       beetle       0.82      0.42      0.56       100
      bicycle       0.54      0.89      0.67       100
       bottle       0.93      0.62      0.74       100
         bowl       0.75      0.36      0.49       100
          boy       0.43      0.49      0.46       100
       bridge       0.54      0.78      0.64       100
          bus       0.68      0.48      0.56       100
    butterfly       0.34      0.71      0.46       100
        camel       0.72      0.68      0.70       100
          can       0.69      0.60      0.64       100
       castle       0.96      0.69      0.80       100
  caterpillar       0.57      0.62      0.60       100
       cattle       0.91      0.51      0.65       100
        chair       0.79      0.82      0.80       100
   chimpanzee       0.80      0.79      0.79       100
        clock       0.41      0.86      0.55       100
        cloud       0.89      0.74      0.81       100
    cockroach       0.85      0.78      0.81       100
        couch       0.73      0.44      0.55       100
         crab       0.42      0.70      0.53       100
    crocodile       0.47      0.55      0.51       100
          cup       0.88      0.75      0.81       100
..."d" - "t" classes omitted for brevity
     wardrobe       0.79      0.85      0.82       100
        whale       0.58      0.75      0.65       100
  willow_tree       0.71      0.37      0.49       100
         wolf       0.79      0.64      0.71       100
        woman       0.42      0.49      0.45       100
         worm       0.48      0.80      0.60       100

    micro avg       0.63      0.63      0.63     10000
    macro avg       0.68      0.63      0.63     10000
 weighted avg       0.68      0.63      0.63     10000

And here is the output of Rectified Adam:

precision    recall  f1-score   support

        apple       0.86      0.72      0.78       100
aquarium_fish       0.56      0.62      0.59       100
         baby       0.49      0.43      0.46       100
         bear       0.36      0.20      0.26       100
       beaver       0.27      0.17      0.21       100
          bed       0.45      0.42      0.43       100
          bee       0.54      0.61      0.57       100
       beetle       0.47      0.55      0.51       100
      bicycle       0.45      0.69      0.54       100
       bottle       0.64      0.54      0.59       100
         bowl       0.39      0.31      0.35       100
          boy       0.43      0.35      0.38       100
       bridge       0.52      0.67      0.59       100
          bus       0.34      0.47      0.40       100
    butterfly       0.33      0.39      0.36       100
        camel       0.47      0.37      0.41       100
          can       0.49      0.55      0.52       100
       castle       0.76      0.67      0.71       100
  caterpillar       0.43      0.43      0.43       100
       cattle       0.56      0.45      0.50       100
        chair       0.63      0.78      0.70       100
   chimpanzee       0.70      0.71      0.71       100
        clock       0.38      0.49      0.43       100
        cloud       0.80      0.61      0.69       100
    cockroach       0.73      0.72      0.73       100
        couch       0.49      0.36      0.42       100
         crab       0.27      0.45      0.34       100
    crocodile       0.32      0.26      0.29       100
          cup       0.63      0.49      0.55       100
..."d" - "t" classes omitted for brevity
     wardrobe       0.68      0.84      0.75       100
        whale       0.53      0.54      0.54       100
  willow_tree       0.60      0.29      0.39       100
         wolf       0.38      0.35      0.36       100
        woman       0.33      0.29      0.31       100
         worm       0.59      0.63      0.61       100

    micro avg       0.50      0.50      0.50     10000
    macro avg       0.51      0.50      0.49     10000
 weighted avg       0.51      0.50      0.49     10000

The Adam optimizer (68% accuracy) crushes Rectified Adam (51% accuracy) here, but we need to be careful of overfitting. As Figure 20 shows, there is quite the divergence between training and validation loss when using the Adam optimizer.

But on the other hand, Rectified Adam really stagnates past epoch 20.

I would be inclined to go with the Adam optimized model here as it obtains significantly higher accuracy; however, I would suggest running some generalization tests using both the Adam and Rectified Adam versions of the model.

What can we take away from these experiments?

One of the first takeaways comes from looking at the training plots of the experiments — using the Rectified Adam optimizer can lead to more stable training.

When training with Rectified Adam we see there are significantly fewer fluctuations, spikes, and drops in validation loss (as compared to standard Adam).

Furthermore, the Rectified Adam validation loss is much more likely to follow training loss, in some cases near exactly.

Keep in mind that raw accuracy isn’t everything when it comes to training your own custom neural networks — stability matters as well as it goes hand-in-hand with generalization.

Whenever I’m training a custom CNN I’m not only looking for high accuracy models, I’m also looking for stability. Stability typically implies that a model is converging nicely and will ideally generalize well.

In this regard, Rectified Adam delivers on its promises from the Liu et al. paper.

Secondly, you should note that Adam obtains lower loss than Rectified Adam in every single experiment.

This behavior is not necessarily a bad thing — it could imply that Rectified Adam is generalizing better, but it’s hard to say without running further experiments using images outside the respective training and testing sets.

Again, keep in mind that lower loss is not necessarily a better model! When you encounter very low loss (especially loss near zero) your model may be overfitting to your training set.

You need to obtain mastery level experience operating these three optimizers

Figure 21: Mastering deep learning optimizers is like driving a car. You know your car and you drive it well no matter the road condition. On the other hand, if you get in an unfamiliar car, something doesn’t feel right until you have a few hours cumulatively behind the wheel. Optimizers are no different. I suggest that SGD be your daily driver until you are comfortable trying alternatives. Then you can mix in RMSprop and Adam. Learn how to use them before jumping into the latest deep learning optimizer.

Becoming familiar with a given optimization algorithm is similar to mastering how to drive a car — you drive your own car better than other people’s cars because you’ve spent so much time driving it; you understand your car and its intricacies.

Oftentimes, a given optimizer is chosen to train a network on a dataset not because the optimizer itself is better, but because the driver (i.e., you, the deep learning practitioner) is more familiar with the optimizer and understands the “art” behind tuning its respective parameters.

As a deep learning practitioner you should gain experience operating a wide variety of optimizers, but in my opinion, you should focus your efforts on learning how to train networks using the three following optimizers:

  1. SGD
  2. RMSprop
  3. Adam
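For reference, here is how the three are typically instantiated in Keras. This is a minimal sketch; the learning rates shown are common starting points, not tuned values:

# common starting configurations for the "big three" optimizers
from keras.optimizers import SGD, RMSprop, Adam

sgd = SGD(lr=1e-2, momentum=0.9, nesterov=True)	# classic SGD + momentum
rmsprop = RMSprop(lr=1e-3)	# adaptive per-parameter scaling
adam = Adam(lr=1e-3)	# adaptive scaling + momentum-like terms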

You might be surprised to see SGD is included in this list — isn’t SGD an older, less efficient optimizer than the newer adaptive methods, including Adam, Adagrad, Adadelta, etc.?

Yes, it absolutely is.

But here’s the thing — nearly every state-of-the-art computer vision model is trained using SGD.

Consider the ImageNet classification challenge for example:

  • AlexNet (there’s no mention in the paper but both the official implementation and CaffeNet used SGD)
  • VGGNet (Section 3.1, Training)
  • ResNet (Section 3.4, Implementation)
  • SqueezeNet (it’s not mentioned in the paper, but SGD was used in their solver.prototxt)

Every single one of those classification networks was trained using SGD.

Now let’s consider the object detection networks trained on the COCO dataset. You guessed it — SGD was used to train all of them.

Yes, SGD may be the “old, unsexy” optimizer compared to its younger counterparts, but here’s the thing: standard SGD just works.

That’s not to say that you shouldn’t learn how to use other optimizers — you absolutely should!

But before you go down that rabbit hole, obtain a mastery level of SGD first. From there, start exploring other optimizers — I typically recommend RMSprop and Adam.

And if you find Adam is working well, consider replacing Adam with Rectified Adam to see if you can get an additional boost in accuracy (sort of like how replacing ReLUs with ELUs can usually give you a small boost).
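As a concrete illustration, that swap is typically a one-line change when compiling your model. The sketch below assumes the third-party keras-radam package (pip install keras-radam); the toy model is purely illustrative:

# minimal sketch: substitute Rectified Adam for Adam at compile time
from keras.models import Sequential
from keras.layers import Dense
from keras_radam import RAdam

# toy model for illustration only
model = Sequential([Dense(10, activation="softmax", input_shape=(784,))])

# previously: from keras.optimizers import Adam; opt = Adam(lr=1e-3)
opt = RAdam()	# constructor arguments vary across package versions

model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])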

Once you understand how to use those optimizers on a variety of datasets, continue your studies and explore other optimizers as well.

All that said, if you’re new to deep learning, don’t immediately try jumping into the more “advanced” optimizers — you’ll only run into trouble later in your deep learning career.

What’s next?

Figure 4: My deep learning book, Deep Learning for Computer Vision with Python, is trusted by employees and students of top institutions.

If you’re interested in diving head-first into the world of computer vision/deep learning and discovering how to:

  • Understand, practice, and proficiently operate each of the “big three” optimizers
  • Select the best optimizer for the job to achieve state-of-the-art results
  • Train custom Convolutional Neural Networks on your own custom datasets
  • Learn my best practices, tips, and suggestions (leading you to becoming a deep learning expert)

…then be sure to take a look at my book, Deep Learning for Computer Vision with Python!

My complete, self-study deep learning book is trusted by members of top machine learning schools, companies, and organizations, including Microsoft, Google, Stanford, MIT, CMU, and more!

Readers of my book have gone on to win Kaggle competitions, secure academic grants, and start careers in CV and DL using the knowledge they gained through study and practice.

My book not only teaches the fundamentals, but also teaches advanced techniques, best practices, and tools to ensure that you are armed with practical knowledge and proven coding recipes to tackle nearly any computer vision and deep learning problem presented to you in school, research, or the modern workforce.

Be sure to take a look — and while you’re at it, don’t forget to grab your (free) table of contents + sample chapters.


Summary

In this tutorial, we investigated the claims from Liu et al. that the Rectified Adam optimizer outperforms the standard Adam optimizer in terms of:

  1. Better accuracy (or at least identical accuracy when compared to Adam)
  2. And in fewer epochs than standard Adam

To evaluate those claims we trained three CNN models:

  1. ResNet
  2. GoogLeNet
  3. MiniVGGNet

These models were trained on four datasets:

  1. MNIST
  2. Fashion MNIST
  3. CIFAR-10
  4. CIFAR-100

Each combination of the model architecture and dataset were trained using two optimizers:

  • Adam
  • Rectified Adam

In total, we ran 3 x 4 x 2 = 24 different experiments used to compare standard Adam to Rectified Adam.

The result?

In each and every experiment Rectified Adam either performed worse or obtained identical accuracy compared to standard Adam.

That said, training with Rectified Adam was more stable than standard Adam, likely implying that Rectified Adam could generalize better (but additional experiments would be required to validate that claim).

Liu et al.’s study of warmup in adaptive learning rate optimizers will likely help future researchers build on their work and create even better optimizers.

For the time being, my personal opinion is that you’re better off sticking with standard Adam for your initial experiments. If you find that Adam is working well for your experiments, substitute in Rectified Adam to see if you can improve your accuracy.

You should especially try to use the Rectified Adam optimizer if you notice that Adam is working well, but you need better generalization.

The second takeaway from this guide is that you should obtain mastery level experience operating these three optimizers:

  1. SGD
  2. RMSprop
  3. Adam

You should especially learn how to operate SGD.

Yes, SGD is “less sexy” compared to the newer adaptive learning rate methods, but nearly every computer vision state-of-the-art architecture has been trained using it.

Learn how to operate these three optimizers first.

Once you have a good understanding of how they work and how to tune their respective hyperparameters, then move on to other optimizers.

If you need help learning how to use these optimizers and tune their hyperparameters, be sure to refer to Deep Learning for Computer Vision with Python where I cover my tips, suggestions, and best practices in detail.

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Is Rectified Adam actually *better* than Adam? appeared first on PyImageSearch.


Why is my validation loss lower than my training loss?


In this tutorial, you will learn the three primary reasons your validation loss may be lower than your training loss when training your own custom deep neural networks.

I first became interested in studying machine learning and neural networks in late high school. Back then there weren’t many accessible machine learning libraries — and there certainly was no scikit-learn.

Every school day at 2:35 PM I would leave high school, hop on the bus home, and within 15 minutes I would be in front of my laptop, studying machine learning, and attempting to implement various algorithms by hand.

I rarely stopped for a break, more than occasionally skipping dinner just so I could keep working and studying late into the night.

During these late-night sessions I would hand-implement models and optimization algorithms (and in Java of all languages; I was learning Java at the time as well).

And since they were hand-implemented ML algorithms by a budding high school programmer with only a single calculus course under his belt, my implementations were undoubtedly prone to bugs.

I remember one night in particular.

The time was 1:30 AM. I was tired. I was hungry (since I skipped dinner). And I was anxious about my Spanish test the next day which I most certainly did not study for.

I was attempting to train a simple feedforward neural network to classify image contents based on basic color channel statistics (i.e., mean and standard deviation).

My network was training…but I was running into a very strange phenomenon:

My validation loss was lower than training loss!

How could that possibly be?

  • Did I accidentally switch the plot labels for training and validation loss? Potentially. I didn’t have a plotting library like matplotlib so my loss logs were being piped to a CSV file and then plotted in Excel. Definitely prone to human error.
  • Was there a bug in my code? Almost certainly. I was teaching myself Java and machine learning at the same time — there were definitely bugs of some sort in that code.
  • Was I just so tired that my brain couldn’t comprehend it? Also very likely. I wasn’t sleeping much during that time of my life and could have very easily missed something obvious.

But, as it turns out it was none of the above cases — my validation loss was legitimately lower than my training loss.

It took me until my junior year of college when I took my first formal machine learning course to finally understand why validation loss can be lower than training loss.

And a few months ago, brilliant author, Aurélien Geron, posted a tweet thread that concisely explains why you may encounter validation loss being lower than training loss.

I was inspired by Aurélien’s excellent explanation and wanted to share it here with my own commentary and code, ensuring that no students (like me many years ago) have to scratch their heads and wonder “Why is my validation loss lower than my training loss?!”.

To learn the three primary reasons your validation loss may be lower than your training loss, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Why is my validation loss lower than my training loss?

In the first part of this tutorial, we’ll discuss the concept of “loss” in a neural network, including what loss represents and why we measure it.

From there we’ll implement a basic CNN and training script, followed by running a few experiments using our freshly implemented CNN (which will result in our validation loss being lower than our training loss).

Given our results, I’ll then explain the three primary reasons your validation loss may be lower than your training loss.

What is “loss” when training a neural network?

Figure 1: What is the “loss” in the context of machine/deep learning? And why is my validation loss lower than my training loss? (image source)

At the most basic level, a loss function quantifies how “good” or “bad” a given predictor is at classifying the input data points in a dataset.

The smaller the loss, the better a job the classifier is at modeling the relationship between the input data and the output targets.

That said, there is a point where we can overfit our model — by modeling the training data too closely, our model loses the ability to generalize.

We, therefore, seek to:

  1. Drive our loss down, thereby improving our model accuracy.
  2. Do so as quickly as possible and with as few hyperparameter updates/experiments as possible.
  3. All without overfitting our network and modeling the training data too closely.

It’s a balancing act and our choice of loss function and model optimizer can dramatically impact the quality, accuracy, and generalizability of our final model.

Typical loss functions (also called “objective functions” or “scoring functions”) include:

  • Binary cross-entropy
  • Categorical cross-entropy
  • Sparse categorical cross-entropy
  • Mean Squared Error (MSE)
  • Mean Absolute Error (MAE)
  • Standard Hinge
  • Squared Hinge

A full review of loss functions is outside the scope of this post, but for the time being, just understand that for most tasks:

  • Loss measures the “goodness” of your model
  • The smaller the loss, the better
  • But you need to be careful not to overfit
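
As a quick, concrete example, here is how you might compute the categorical cross-entropy for a single prediction using tf.keras (a minimal sketch with made-up values, purely for illustration):

import tensorflow as tf

# one-hot encoded ground-truth label and a mediocre softmax prediction
y_true = [[0.0, 1.0, 0.0]]
y_pred = [[0.3, 0.4, 0.3]]

# smaller loss values indicate a better prediction
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())  # ~0.916, i.e., -log(0.4)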

To learn more about the role of loss functions when training your own custom neural networks, be sure to refer to the related tutorials here on PyImageSearch.

Additionally, if you would like a complete, step-by-step guide on the role of loss functions in machine learning/neural networks, make sure you read Deep Learning for Computer Vision with Python where I explain parameterized learning and loss methods in detail (including code and experiments).

Project structure

Go ahead and use the “Downloads” section of this post to download the source code. From there, inspect the project/directory structure via the tree command:
$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   └── minivggnet.py
├── fashion_mnist.py
├── plot_shift.py
└── training.pickle

1 directory, 5 files

Today we’ll be using a smaller version of VGGNet called MiniVGGNet. The pyimagesearch module includes this CNN.

Our fashion_mnist.py script trains MiniVGGNet on the Fashion MNIST dataset. I’ve written about training MiniVGGNet on Fashion MNIST in a previous blog post, so today we won’t go into too much detail.

Today’s training script generates a training.pickle file of the training accuracy/loss history. Inside the Reason #2 section below, we’ll use plot_shift.py to shift the training loss plot half an epoch to demonstrate that the time at which loss is measured plays a role when validation loss is lower than training loss.

Let’s dive into the three reasons now to answer the question, “Why is my validation loss lower than my training loss?”.

Reason #1: Regularization applied during training, but not during validation/testing

Figure 2: Aurélien answers the question: “Ever wonder why validation loss > training loss?” on his twitter feed (image source). The first reason is that regularization is applied during training but not during validation/testing.

When training a deep neural network we often apply regularization to help our model:

  1. Obtain higher validation/testing accuracy
  2. And ideally, to generalize better to the data outside the validation and testing sets

Regularization methods often sacrifice training accuracy to improve validation/testing accuracy — in some cases that can lead to your validation loss being lower than your training loss.

Secondly, keep in mind that regularization methods such as dropout are not applied at validation/testing time.
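
You can verify this behavior for yourself in tf.keras: a Dropout layer only zeroes activations when its training flag is set (a minimal sketch):

import tensorflow as tf

layer = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))

# training mode: roughly half the activations are zeroed (and the
# survivors are scaled up by 1 / (1 - 0.5) to compensate)
print(layer(x, training=True))

# inference mode: dropout is disabled and the input passes through
print(layer(x, training=False))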

As Aurélien shows in Figure 2, factoring in regularization to validation loss (ex., applying dropout during validation/testing time) can make your training/validation loss curves look more similar.

Reason #2: Training loss is measured during each epoch while validation loss is measured after each epoch

Figure 3: Reason #2 for validation loss sometimes being less than training loss has to do with when the measurement is taken (image source).

The second reason you may see validation loss lower than training loss is due to how the loss values are measured and reported:

  1. Training loss is measured during each epoch
  2. While validation loss is measured after each epoch

Your training loss is continually reported over the course of an entire epoch; however, validation metrics are computed over the validation set only once the current training epoch is completed.

This implies that, on average, training losses are measured half an epoch earlier.

If you shift the training losses half an epoch to the left you’ll see that the gaps between the training and validation loss values are much smaller.
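
If you would rather measure than shift, one option is to re-evaluate the training loss at the end of each epoch using a Keras callback. The class below is a hypothetical helper (not part of this post’s downloads):

import tensorflow as tf

class EndOfEpochTrainLoss(tf.keras.callbacks.Callback):
	# hypothetical callback: measures training loss *after* the epoch,
	# i.e., at the same point in time the validation loss is measured
	def __init__(self, trainX, trainY):
		super(EndOfEpochTrainLoss, self).__init__()
		self.trainX = trainX
		self.trainY = trainY

	def on_epoch_end(self, epoch, logs=None):
		results = self.model.evaluate(self.trainX, self.trainY, verbose=0)
		loss = results[0] if isinstance(results, list) else results
		print(" - end-of-epoch train loss: {:.4f}".format(loss))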

For an example of this behavior in action, read the following section.

Implementing our training script

We’ll be implementing a simple Python script to train a small VGG-like network (called MiniVGGNet) on the Fashion MNIST dataset. During training, we’ll save our training and validation losses to disk. We’ll then create a separate Python script to compare both the unshifted and shifted loss plots.

Let’s get started by implementing the training script:

# import the necessary packages
from pyimagesearch.minivggnet import MiniVGGNet
from sklearn.metrics import classification_report
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.utils import to_categorical
import argparse
import pickle

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--history", required=True,
	help="path to output training history file")
args = vars(ap.parse_args())

Lines 2-8 import our required packages, modules, classes, and functions. Namely, we import MiniVGGNet (our CNN), fashion_mnist (our dataset), and pickle (ensuring that we can serialize our training history for a separate script to handle plotting).

The command line argument, --history, points to the separate .pickle file which will soon contain our training history (Lines 11-14).

We then initialize a few hyperparameters, namely our number of epochs to train for, initial learning rate, and batch size:

# initialize the number of epochs to train for, base learning rate,
# and batch size
NUM_EPOCHS = 25
INIT_LR = 1e-2
BS = 32

We then proceed to load and preprocess our Fashion MNIST data:

# grab the Fashion MNIST dataset (if this is your first time running
# this the dataset will be automatically downloaded)
print("[INFO] loading Fashion MNIST...")
((trainX, trainY), (testX, testY)) = fashion_mnist.load_data()

# we are using "channels last" ordering, so the design matrix shape
# should be: num_samples x rows x columns x depth
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))

# scale data to the range of [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

# one-hot encode the training and testing labels
trainY = to_categorical(trainY, 10)
testY = to_categorical(testY, 10)

# initialize the label names
labelNames = ["top", "trouser", "pullover", "dress", "coat",
	"sandal", "shirt", "sneaker", "bag", "ankle boot"]

Lines 25-34 load and preprocess the training/validation data.

Lines 37 and 38 binarize our class labels, while Lines 41 and 42 list out the human-readable class label names for classification report purposes later.

From here we have everything we need to compile and train our MiniVGGNet model on the Fashion MNIST data:

# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=INIT_LR, momentum=0.9, decay=INIT_LR / NUM_EPOCHS)
model = MiniVGGNet.build(width=28, height=28, depth=1, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
print("[INFO] training model...")
H = model.fit(trainX, trainY,
	validation_data=(testX, testY),
	 batch_size=BS, epochs=NUM_EPOCHS)

Lines 46-49 initialize and compile the MiniVGGNet model.

Lines 53-55 then fit/train the model.

From here we will evaluate our model and serialize our training history:
# make predictions on the test set and show a nicely formatted
# classification report
preds = model.predict(testX)
print("[INFO] evaluating network...")
print(classification_report(testY.argmax(axis=1), preds.argmax(axis=1),
	target_names=labelNames))

# serialize the training history to disk
print("[INFO] serializing training history...")
f = open(args["history"], "wb")
f.write(pickle.dumps(H.history))
f.close()

Lines 59-62 make predictions on the test set and print a classification report to the terminal.

Lines 66-68 serialize our training accuracy/loss history to a .pickle file. We’ll use the training history in a separate Python script to plot the loss curves, including one plot showing a one-half epoch shift.

Go ahead and use the “Downloads” section of this tutorial to download the source code.

From there, open up a terminal and execute the following command:

$ python fashion_mnist.py --history training.pickle
[INFO] loading Fashion MNIST...
[INFO] compiling model...
[INFO] training model...
Train on 60000 samples, validate on 10000 samples   
Epoch 1/25
60000/60000 [==============================] - 200s 3ms/sample - loss: 0.5433 - accuracy: 0.8181 - val_loss: 0.3281 - val_accuracy: 0.8815
Epoch 2/25
60000/60000 [==============================] - 194s 3ms/sample - loss: 0.3396 - accuracy: 0.8780 - val_loss: 0.2726 - val_accuracy: 0.9006
Epoch 3/25
60000/60000 [==============================] - 193s 3ms/sample - loss: 0.2941 - accuracy: 0.8943 - val_loss: 0.2722 - val_accuracy: 0.8970
Epoch 4/25
60000/60000 [==============================] - 193s 3ms/sample - loss: 0.2717 - accuracy: 0.9017 - val_loss: 0.2334 - val_accuracy: 0.9144
Epoch 5/25
60000/60000 [==============================] - 194s 3ms/sample - loss: 0.2534 - accuracy: 0.9086 - val_loss: 0.2245 - val_accuracy: 0.9194
...
Epoch 21/25
60000/60000 [==============================] - 195s 3ms/sample - loss: 0.1797 - accuracy: 0.9340 - val_loss: 0.1879 - val_accuracy: 0.9324
Epoch 22/25
60000/60000 [==============================] - 194s 3ms/sample - loss: 0.1814 - accuracy: 0.9342 - val_loss: 0.1901 - val_accuracy: 0.9313
Epoch 23/25
60000/60000 [==============================] - 193s 3ms/sample - loss: 0.1766 - accuracy: 0.9351 - val_loss: 0.1866 - val_accuracy: 0.9320
Epoch 24/25
60000/60000 [==============================] - 193s 3ms/sample - loss: 0.1770 - accuracy: 0.9347 - val_loss: 0.1845 - val_accuracy: 0.9337
Epoch 25/25
60000/60000 [==============================] - 194s 3ms/sample - loss: 0.1734 - accuracy: 0.9372 - val_loss: 0.1871 - val_accuracy: 0.9312
[INFO] evaluating network...
              precision    recall  f1-score   support

         top       0.87      0.91      0.89      1000
     trouser       1.00      0.99      0.99      1000
    pullover       0.91      0.91      0.91      1000
       dress       0.93      0.93      0.93      1000
        coat       0.87      0.93      0.90      1000
      sandal       0.98      0.98      0.98      1000
       shirt       0.83      0.74      0.78      1000
     sneaker       0.95      0.98      0.97      1000
         bag       0.99      0.99      0.99      1000
  ankle boot       0.99      0.95      0.97      1000

    accuracy                           0.93     10000
   macro avg       0.93      0.93      0.93     10000
weighted avg       0.93      0.93      0.93     10000

[INFO] serializing training history...

Checking the contents of your working directory you should have a file named training.pickle — this file contains our training history logs.
$ ls *.pickle
training.pickle

In the next section we’ll learn how to plot these values and shift our training information a half epoch to the left, thereby making our training/validation loss curves look more similar.

Shifting our training loss values

Our plot_shift.py script is used to plot the training history output from fashion_mnist.py. Using this script we can investigate how shifting our training loss a half epoch to the left can make our training/validation plots look more similar.

Open up the plot_shift.py file and insert the following code:
# import the necessary packages
import matplotlib.pyplot as plt
import numpy as np
import argparse
import pickle

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True,
	help="path to input training history file")
args = vars(ap.parse_args())

Lines 2-5 import matplotlib (for plotting), NumPy (for a simple array creation operation), argparse (command line arguments), and pickle (to load our serialized training history).

Lines 8-11 parse the --input command line argument which points to our .pickle training history file on disk.

Let’s go ahead and load our data and initialize our plot figure:

# load the training history
H = pickle.loads(open(args["input"], "rb").read())

# determine the total number of epochs used for training, then
# initialize the figure
epochs = np.arange(0, len(H["loss"]))
plt.style.use("ggplot")
(fig, axs) = plt.subplots(2, 1)

Line 14 loads our serialized training history .pickle file using the --input command line argument.

Line 18 makes space for our x-axis which spans from zero to the number of epochs in the training history.

Lines 19 and 20 set up our plot figure to be two stacked plots in the same image:

  • The top plot will contain loss curves as-is.
  • The bottom plot, on the other hand, will include a shift for the training loss (but not for the validation loss). The training loss will be shifted half an epoch to the left just as in Aurélien’s tweet. We’ll then be able to observe if the plots line up more closely.

Let’s generate our top plot:

# plot the *unshifted* training and validation loss
plt.style.use("ggplot")
axs[0].plot(epochs, H["loss"], label="train_loss")
axs[0].plot(epochs, H["val_loss"], label="val_loss")
axs[0].set_title("Unshifted Loss Plot")
axs[0].set_xlabel("Epoch #")
axs[0].set_ylabel("Loss")
axs[0].legend()

And then draw our bottom plot:

# plot the *shifted* training and validation loss
axs[1].plot(epochs - 0.5, H["loss"], label="train_loss")
axs[1].plot(epochs, H["val_loss"], label="val_loss")
axs[1].set_title("Shifted Loss Plot")
axs[1].set_xlabel("Epoch #")
axs[1].set_ylabel("Loss")
axs[1].legend()

# show the plots
plt.tight_layout()
plt.show()

Notice on Line 32 that the training loss is shifted 0.5 epochs to the left — the heart of this example.

Let’s now analyze our training/validation plots.

Open up a terminal and execute the following command:

$ python plot_shift.py --input training.pickle

Figure 4: Shifting the training loss plot 1/2 epoch to the left yields more similar plots. Clearly the time of measurement answers the question, “Why is my validation loss lower than training loss?”.

As you can observe, shifting the training loss values a half epoch to the left (bottom) makes the training/validation curves much more similar versus the unshifted (top) plot.

Reason #3: The validation set may be easier than the training set (or there may be leaks)

Figure 5: Consider how your validation set was acquired/generated. Common mistakes could lead to validation loss being less than training loss. (image source)

The final most common reason for validation loss being lower than your training loss is due to the data distribution itself.

Consider how your validation set was acquired:

  • Can you guarantee that the validation set was sampled from the same distribution as the training set?
  • Are you certain that the validation examples are just as challenging as your training images?
  • Can you assure there was no “data leakage” (i.e., training samples getting accidentally mixed in with validation/testing samples)?
  • Are you confident your code created the training, validation, and testing splits properly?

Every single deep learning practitioner has made the above mistakes at least once in their career.

Yes, it is embarrassing when it happens — but that’s the point — it does happen, so take the time now to investigate your code.
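
A quick sanity check is to build your splits with a fixed random seed and then verify that no sample appears in more than one split. Here is a minimal sketch using scikit-learn (the data and labels variables are placeholders for your own dataset):

from sklearn.model_selection import train_test_split

# split *before* any preprocessing that could leak statistics from
# the test set into the training set (e.g., fitting a scaler)
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, random_state=42)

print("train samples: {}".format(len(trainX)))
print("test samples: {}".format(len(testX)))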

BONUS: Are you training hard enough?

Figure 6: If you are wondering why your validation loss is lower than your training loss, perhaps you aren’t “training hard enough”.

One aspect that Aurélien didn’t touch on in his tweets is the concept of “training hard enough”.

When training a deep neural network, our biggest concern is nearly always overfitting — and in order to combat overfitting, we introduce regularization techniques (discussed in Reason #1 above). We apply regularization in the form of:

  • Dropout
  • L2 weight decay
  • Reducing model capacity (i.e., a more shallow model)

We also tend to be a bit more conservative with our learning rate to ensure our model doesn’t overshoot areas of lower loss in the loss landscape.

That’s all fine and good, but sometimes we end up over-regularizing our models.

If you go through all three reasons for validation loss being lower than training loss detailed above, you may have over-regularized your model. Start to relax your regularization constraints by:

  • Lowering your L2 weight decay strength.
  • Reducing the amount of dropout you’re applying.
  • Increasing your model capacity (i.e., make it deeper).

You should also try training with a larger learning rate as you may have become too conservative with it.
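
In Keras, these knobs typically live right on the layers themselves. Below is a minimal sketch of relaxing regularization (the specific values are illustrative, not recommendations):

from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

# before: heavily regularized
#dense = Dense(64, activation="relu", kernel_regularizer=l2(1e-3))
#drop = Dropout(0.5)

# after: weaker L2 penalty and less aggressive dropout
dense = Dense(64, activation="relu", kernel_regularizer=l2(1e-4))
drop = Dropout(0.25)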

Do you have unanswered deep learning questions?

You can train your first neural network in minutes…with just a few lines of Python.

But if you are just getting started, you may have questions such as today’s: “Why is my validation loss lower than training loss?”

Similar questions can stump you for weeks — maybe months. You might find yourself searching online for answers only to be disappointed in the explanations. Or maybe you posted your burning question to Stack Overflow or Quora and you are still hearing crickets.

It doesn’t have to be like that.

What you need is a comprehensive book to jumpstart your education. Discover and study deep learning the right way in my book: Deep Learning for Computer Vision with Python.

Inside the book, you’ll find self-study tutorials and end-to-end projects on topics like:

  • Convolutional Neural Networks
  • Object Detection via Faster R-CNNs and SSDs
  • Generative Adversarial Networks (GANs)
  • Emotion/Facial Expression Recognition
  • Best practices, tips, and rules of thumb
  • …and much more!

Using the knowledge gained by reading this book you’ll finally be able to bring deep learning to your own projects.

What’s more, you’ll learn the “art” of training neural networks, answering questions such as:

  1. Which deep learning CNN architecture is right for my task at hand?
  2. How can I spot underfitting and overfitting either after or during training?
  3. What is the most effective way to set my initial learning rate and to use a learning rate decay scheduler to improve accuracy?
  4. Which deep learning optimizer is the best one for the job and how do I evaluate new, state-of-the-art optimizers as they are published?
  5. How do I apply regularization techniques effectively ensuring that I am not over-regularizing my model?

You’ll find the answers to all of these questions inside my deep learning book.

Customers of mine attest that this is the best deep learning education you’ll find online — inside the book you’ll find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.

So why wait?

Fumbling around the internet looking for answers to your questions only to find sub-par resources is costing you time that you can’t get back. The value you receive by reading my book is far, far greater than the price you pay and will ensure you receive a positive return on your investment of time and finances. I guarantee it.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

Summary

Today’s tutorial was heavily inspired by the following tweet thread from author, Aurélien Geron.

Inside the thread, Aurélien expertly and concisely explained the three reasons your validation loss may be lower than your training loss when training a deep neural network:

  1. Reason #1: Regularization is applied during training, but not during validation/testing. If you add in the regularization loss during validation/testing, your loss values and curves will look more similar.
  2. Reason #2: Training loss is measured during each epoch while validation loss is measured after each epoch. On average, the training loss is measured 1/2 an epoch earlier. If you shift your training loss curve a half epoch to the left, your losses will align a bit better.
  3. Reason #3: Your validation set may be easier than your training set or there is a leak in your data/bug in your code. Make sure your validation set is reasonably large and is sampled from the same distribution (and difficulty) as your training set.
  4. BONUS: You may be over-regularizing your model. Try reducing your regularization constraints, including increasing your model capacity (i.e., making it deeper with more parameters), reducing dropout, reducing L2 weight decay strength, etc.

Hopefully, this helps clear up any confusion on why your validation loss may be lower than your training loss!

It was certainly a head-scratcher for me when I first started studying machine learning and neural networks and it took me until mid-college to understand exactly why that happens — and none of the explanations back then were as clear and concise as Aurélien’s.

I hope you enjoyed today’s tutorial!

To download the source code (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Why is my validation loss lower than my training loss? appeared first on PyImageSearch.

Keras vs. tf.keras: What’s the difference in TensorFlow 2.0?


In this tutorial you’ll discover the difference between Keras and tf.keras, including what’s new in TensorFlow 2.0.

Today’s tutorial is inspired from an email I received last Tuesday from PyImageSearch reader, Jeremiah.

Jeremiah asks:

Hi Adrian, I saw that TensorFlow 2.0 was released a few days ago.

TensorFlow developers seem to be promoting Keras, or rather, something called tf.keras, as the recommended high-level API for TensorFlow 2.0.

But I thought Keras was its own separate package?

I’m so confused on “which Keras package” I should be using when training my own networks.

Secondly, is TensorFlow 2.0 worth upgrading to?

I’ve seen a few tutorials in the deep learning blogosphere discussing TensorFlow 2.0 but with all the confusion regarding Keras, tf.keras, and TensorFlow 2.0, I’m at a loss for where to start.

Could you shed some light on this area?

Great questions, Jeremiah.

Just in case you didn’t hear, the long-awaited TensorFlow 2.0 was officially released on September 30th.

And while it’s certainly a time for celebration, many deep learning practitioners such as Jeremiah are scratching their heads:

  • What does the TensorFlow 2.0 release mean for me as a Keras user?
  • Am I supposed to use the keras package for training my own neural networks?
  • Or should I be using the tf.keras submodule inside TensorFlow 2.0 instead?
  • Are there TensorFlow 2.0 features that I should care about as a Keras user?

The transition from TensorFlow 1.x to TensorFlow 2.0 is going to be a bit of a rocky one, at least to start, but with the right understanding, you’ll be able to navigate the migration with ease.

Inside the rest of this tutorial, I’ll be discussing the similarities between Keras, tf.keras, and the TensorFlow 2.0 release, including the features you should care about.

To learn the difference between Keras, tf.keras, and TensorFlow 2.0, just keep reading!

Keras vs. tf.keras: What’s the difference in TensorFlow 2.0?

In the first part of this tutorial, we’ll discuss the intertwined history between Keras and TensorFlow, including how their joint popularities fed each other, growing and nurturing each other, leading us to where we are today.

I’ll then discuss why you should be using tf.keras for all your future deep learning projects and experiments.

Next, I’ll discuss the concept of a “computational backend” and how TensorFlow’s popularity enabled it to become Keras’ most prevalent backend, paving the way for Keras to be integrated into the tf.keras submodule of TensorFlow.

Finally, we’ll discuss some of the most popular TensorFlow 2.0 features you should care about as a Keras user, including:

  • Sessions and eager execution
  • Automatic differentiation
  • Model and layer subclassing
  • Better multi-GPU/distributed training support

Included in TensorFlow 2.0 is a complete ecosystem comprised of TensorFlow Lite (for mobile and embedded devices) and TensorFlow Extended (TFX) for developing production machine learning pipelines (i.e., deploying models to production).

Let’s get started!

The intertwined relationship between Keras and TensorFlow

Figure 1: Keras and TensorFlow have a complicated history together. Read this section for the Cliff’s Notes of their love affair. With TensorFlow 2.0, you should be using tf.keras rather than the separate Keras package.

Understanding the complicated, intertwined relationship between Keras and TensorFlow is like listening to the love story of two high school sweethearts who start dating, break up, and eventually find their way together — it’s long, detailed, and at some points even contradictory.

Instead of recalling the full love story for you, we’ll review the CliffsNotes:

  • Keras was originally created and developed by Google AI Developer/Researcher, Francois Chollet.
  • Francois committed and released the first version of Keras to his GitHub on March 27th, 2015.
  • Initially, Francois developed Keras to facilitate his own research and experiments.
  • However, with the explosion of deep learning popularity, many developers, programmers, and machine learning practitioners flocked to Keras due to its easy-to-use API.
  • Back then, there weren’t too many deep learning libraries available — the popular ones included Torch, Theano, and Caffe.
    • The problem with these libraries was that it was like trying to write assembly/C++ to perform your experiments — tedious, time-consuming, and inefficient.
    • Keras, on the other hand, was extremely easy to use, making it possible for researchers and developers to iterate on their experiments faster.
  • In order to train your own custom neural networks, Keras required a backend.
    • A backend is a computational engine — it builds the network graph/topology, runs the optimizers, and performs the actual number crunching.
    • To understand the concept of a backend, consider building a website from scratch. Here you may use the PHP programming language and a SQL database. Your SQL database is your backend. You could use MySQL, PostgreSQL, or SQL Server as your database; however, your PHP code used to interact with the database will not change (provided you’re using some sort of MVC paradigm that abstracts the database layer, of course). Essentially, PHP doesn’t care what database is being used, as long as it plays with PHP’s rules.
    • The same is true with Keras. You can think of the backend as your database and Keras as your programming language used to access the database. You can swap in whatever backend you like, and as long as it abides by certain rules, your code doesn’t have to change.
    • Therefore, you can think of Keras as a set of abstractions that makes it easier to perform deep learning (side note: While Keras always enabled rapid prototyping, it was not flexible enough for researchers. That’s changing in TensorFlow 2.0 — more on that later in this article).
  • Originally, Keras’ default backend was Theano, and it remained the default until v1.1.0.
  • At the same time, Google had released TensorFlow, a symbolic math library used for machine learning and training neural networks.
    • Keras started supporting TensorFlow as a backend, and slowly but surely, TensorFlow became the most popular backend, resulting in TensorFlow being the default backend starting from the release of Keras v1.1.0.
  • Once TensorFlow became the default backend for Keras, by definition, both TensorFlow and Keras usage grew together — you could not have Keras without TensorFlow, and if you installed Keras on your system, you were also installing TensorFlow.
    • Similarly, TensorFlow users were becoming increasingly more drawn to the simplicity of the high-level Keras API.
  • The tf.keras submodule was introduced in TensorFlow v1.10.0, the first step in integrating Keras directly within the TensorFlow package itself.
    • The tf.keras package is/was separate from the keras package you would install via pip (i.e., pip install keras).
    • The original keras package was not subsumed into tensorflow to ensure compatibility and so that they could both organically develop.
  • However, that’s now changing — when Google announced TensorFlow 2.0 in June 2019, they declared that Keras is now the official high-level API of TensorFlow for quick and easy model design and training.
  • With the release of Keras 2.3.0, Francois has stated that:
    • This is the first release of Keras that brings the keras package in sync with tf.keras.
    • It is the final release of Keras that will support multiple backends (i.e., Theano, CNTK, etc.).
    • And most importantly, going forward all deep learning practitioners should switch their code to TensorFlow 2.0 and the tf.keras package.
    • The original keras package will still receive bug fixes, but moving forward, you should be using tf.keras.

As you can tell, the history between Keras and TensorFlow is long, complicated, and intertwined.

But the most important takeaway for you, as a Keras user, is that you should be using TensorFlow 2.0 and tf.keras for future projects.

Start using tf.keras in all future projects

Figure 2: What’s the difference between Keras and tf.keras in TensorFlow 2.0?

On September 17th, 2019 Keras v2.3.0 was officially released — in the release Francois Chollet (the creator and chief maintainer of Keras) stated that:

Keras v2.3.0 is the first release of Keras that brings keras in sync with tf.keras.

It will be the last major release to support backends other than TensorFlow (i.e., Theano, CNTK, etc.)

And most importantly, deep learning practitioners should start moving to TensorFlow 2.0 and the tf.keras package.

For the majority of your projects, that’s as simple as changing your import lines from:

from keras... import ...

to prefacing the import with tensorflow:

from tensorflow.keras... import ...

If you are using custom training loops or Sessions then you’ll have to update your code to use the new GradientTape feature, but overall, it’s fairly easy to update your code.

To help you in (automatically) updating your code from keras to tf.keras, Google has released a script named tf_upgrade_v2 which, as the name suggests, analyzes your code and reports which lines need to be updated — the script can even perform the upgrade process for you.
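
For example, to upgrade a single file you would run something along the lines of the following (the flags below are from the TensorFlow documentation; double-check them against your installed version):

$ tf_upgrade_v2 --infile train_v1.py --outfile train_v2.py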

You can refer here to learn more about automatically updating your code to TensorFlow 2.0.

Computational “backends” for Keras

Figure 3: What computational backends does Keras support? What does it mean to use Keras directly in TensorFlow via tf.keras?

As I mentioned earlier in this post, Keras relies on the concept of a computational backend.

The computational backend performs all the “heavy lifting” in terms of constructing a graph of the model, numeric computation, etc.

Keras then sits on top of this computational engine as an abstraction, making it easier for deep learning developers/practitioners to implement and train their models.

Originally, Keras supported Theano as its preferred computational backend — it then later supported other backends, including CNTK and mxnet, to name a few.

However, the most popular backend, by far, was TensorFlow which eventually became the default computation backend for Keras.

As more and more TensorFlow users started using Keras for its easy-to-use, high-level API, TensorFlow developers seriously considered subsuming the Keras project into a separate module in TensorFlow called tf.keras.

TensorFlow v1.10 was the first release of TensorFlow to include a branch of keras inside tf.keras.

Now that TensorFlow 2.0 is released, both keras and tf.keras are in sync, implying that keras and tf.keras are still separate projects; however, developers should start using tf.keras moving forward as the keras package will only receive bug fixes.

To quote Francois Chollet, the creator and maintainer of Keras:

This is also the last major release of multi-backend Keras. Going forward, we recommend that users consider switching their Keras code to tf.keras in TensorFlow 2.0.

It implements the same Keras 2.3.0 API (so switching should be as easy as changing the Keras import statements), but it has many advantages for TensorFlow users, such as support for eager execution, distribution, TPU training, and generally far better integration between low-level TensorFlow and high-level concepts like Layer and Model.

It is also better maintained.

If you’re both a Keras and TensorFlow user, you should consider switching your code over to TensorFlow 2.0 and tf.keras.

Sessions and Eager Execution in TensorFlow 2.0

Figure 4: Eager execution is a more Pythonic way of working dynamic computational graphs. TensorFlow 2.0 supports eager execution (as does PyTorch). You can take advantage of eager execution and sessions with TensorFlow 2.0 and tf.keras. (image source)

TensorFlow 1.10+ users that utilize the Keras API within tf.keras will be familiar with creating a Session to train their model:
with tf.Session() as session:
	session.run(tf.global_variables_initializer())
	session.run(tf.tables_initializer())
	model.fit(X_train, y_train, validation_data=(X_valid, y_valid),
		epochs=10, batch_size=64)

Creating the Session object and requiring the entire model graph to be built ahead of time was a bit of a pain, so TensorFlow 2.0 introduced the concept of Eager Execution, thereby simplifying the code to:
model.fit(X_train, y_train, validation_data=(X_valid, y_valid),
	epochs=10, batch_size=64)

The benefit of Eager Execution is that the entire model graph does not have to be built.

Instead, operations are evaluated immediately, making it easier to get started building your models (as well as debugging them).

For more details on Eager Execution, including how to use it with TensorFlow 2.0, refer to this article.

And if you want a comparison on Eager Execution vs. Sessions and the impact it has on the speed of training a model, refer to this page.

Automatic differentiation and GradientTape with TensorFlow 2.0

Figure 5: How is TensorFlow 2.0 better at handling custom layers or loss functions? The answer lies in automatic differentiation and GradientTape. (image source)

If you’re a researcher who needed to implement custom layers or loss functions, you likely didn’t like TensorFlow 1.x (and rightfully so).

TensorFlow 1.x’s custom implementations were clunky to say the least — a lot was left to be desired.

With the release of TensorFlow 2.0 that is starting to change — it’s now far easier to implement your own custom losses.

One way it’s becoming easier is through automatic differentiation and the GradientTape implementation.

To utilize GradientTape, all we need to do is implement our model architecture:
# Define our model architecture (note: `X` is assumed to be your
# training feature matrix, e.g., a NumPy array)
model = tf.keras.Sequential([
    tf.keras.layers.Dropout(rate=0.2, input_shape=X.shape[1:]),
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(units=1, activation='sigmoid')
])

Define our loss function and optimizer:

# Define loss and optimizer
loss_func = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

Create the function responsible for performing a single batch update:

def train_loop(features, labels):
    # Define the GradientTape context
    with tf.GradientTape() as tape:
        # Get the probabilities
        predictions = model(features)
        # Calculate the loss
        loss = loss_func(labels, predictions)
    # Get the gradients
    gradients = tape.gradient(loss, model.trainable_variables)
    # Update the weights
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

And then train the model:

# Train the model (this snippet assumes `dataset` is a tf.data.Dataset
# yielding batches of (features, labels))
import time

def train_model():
    start = time.time()
    for epoch in range(10):
        for step, (x, y) in enumerate(dataset):
            loss = train_loop(x, y)
            print('Epoch %d: last batch loss = %.4f' % (epoch, float(loss)))
    print("It took {} seconds".format(time.time() - start))

# Initiate training
train_model()

The GradientTape magic handles differentiation for us behind the scenes, making it far easier to work with custom losses and layers.

And speaking of custom layer and model implementations, be sure to refer to the next section.

Model and layer subclassing in TensorFlow 2.0

TensorFlow 2.0 and tf.keras provide us with three separate methods to implement our own custom models:

  1. Sequential
  2. Functional
  3. Subclassing

Both the sequential and functional paradigms have been inside Keras for quite a while, but the subclassing feature is still unknown to many deep learning practitioners.

I’ll be doing a dedicated tutorial on the three methods next week, but for the time being, let’s take a look at how to implement a simple CNN based on the seminal LeNet architecture using (1) TensorFlow 2.0, (2) tf.keras, and (3) the model subclassing feature:
class LeNet(tf.keras.Model):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv2d_1 = tf.keras.layers.Conv2D(filters=6, 
                           kernel_size=(3, 3), activation='relu', 
                           input_shape=(32,32,1))
        self.average_pool = tf.keras.layers.AveragePooling2D()
        self.conv2d_2 = tf.keras.layers.Conv2D(filters=16, 
                           kernel_size=(3, 3), activation='relu')
        self.flatten = tf.keras.layers.Flatten()
        self.fc_1 = tf.keras.layers.Dense(120, activation='relu')
        self.fc_2 = tf.keras.layers.Dense(84, activation='relu')
        self.out = tf.keras.layers.Dense(10, activation='softmax')
        
    def call(self, input):
        x = self.conv2d_1(input)
        x = self.average_pool(x)
        x = self.conv2d_2(x)
        x = self.average_pool(x)
        x = self.flatten(x)
        x = self.fc_2(self.fc_1(x))
        return self.out(x)
    
lenet = LeNet()

Notice how the LeNet class is a subclass of Model.

The constructor (i.e., the __init__ method) of LeNet defines each of the individual layers inside the model.

The call method then performs the forward pass, enabling you to customize it as you see fit.

The benefit of using model subclassing is that your model:

  • Becomes fully-customizable.
  • Enables you to implement and utilize your own custom loss implementations.

And since your architecture inherits the Model class, you can still call methods like .fit(), .compile(), and .evaluate(), thereby maintaining the easy-to-use (and familiar) Keras API.
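
For instance, you can compile and fit the subclassed model exactly as you would any other Keras model. Here is a minimal sketch using random placeholder data, purely to demonstrate the API:

import numpy as np

lenet.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
	metrics=["accuracy"])

# random placeholder "images" and integer labels, only to exercise the API
X = np.random.rand(8, 32, 32, 1).astype("float32")
y = np.random.randint(0, 10, size=(8,))
lenet.fit(X, y, epochs=1, batch_size=8)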

If you’re interested in learning more about LeNet, you can refer to this previous article.

TensorFlow 2.0 introduces better multi-GPU and distributed training support

Figure 6: Is TensorFlow 2.0 better with multiple GPU training? Yes, with the single worker MirroredStrategy. (image source)

TensorFlow 2.0 and tf.keras provide better multi-GPU and distributed training through their MirroredStrategy.

To quote the TensorFlow 2.0 documentation, “The MirroredStrategy supports synchronous distributed training on multiple GPUs on one machine”.

If you want to use multiple machines (each having potentially multiple GPUs), you should take a look at the MultiWorkerMirroredStrategy.
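
Creating that strategy looks nearly identical to the single-machine case (note: in early TensorFlow 2.0 releases this class lives under the experimental namespace, so verify the exact path against your version):

strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()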

Or, if you are using Google’s cloud for training, check out the TPUStrategy.

For now though, let’s assume you are on a single machine that has multiple GPUs and you want to ensure all of your GPUs are used for training.

You can accomplish this by first creating your MirroredStrategy:
strategy = tf.distribute.MirroredStrategy()
print ('Number of devices: {}'.format(strategy.num_replicas_in_sync))

You then need to declare your model architecture and compile it within the scope of the strategy:
# Call the distribution scope context manager
with strategy.scope():
    # Define a model to fit the above data
    model = tf.keras.Sequential([
        tf.keras.layers.Dropout(rate=0.2, input_shape=X.shape[1:]),
        tf.keras.layers.Dense(units=64, activation='relu'),
        tf.keras.layers.Dropout(rate=0.2),
        tf.keras.layers.Dense(units=1, activation='sigmoid')
    ])
    
    # Compile the model
    model.compile(loss='binary_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])

And from there you can call .fit to train the model:
# Train the model
model.fit(X, y, epochs=5)

Provided your machine has multiple GPUs, TensorFlow will take care of the multi-GPU training for you.

TensorFlow 2.0 is an ecosystem, including TF 2.0, TF Lite, TFX, quantization, and deployment

Figure 7: What is new in the TensorFlow 2.0 ecosystem? Should I use Keras separately or should I use tf.keras?

TensorFlow 2.0 is more than a computational engine and a deep learning library for training neural networks — it’s so much more.

With TensorFlow Lite (TF Lite) we can train, optimize, and quantize models that are designed to run on resource-constrained devices such as smartphones and other embedded devices (i.e., Raspberry Pi, Google Coral, etc.).

Or, if you need to deploy your model to production, you can use TensorFlow Extended (TFX), an end-to-end platform for model deployment.

Once your research and experiments are complete, you can leverage TFX to prepare the model for production and scale your model using Google’s ecosystem.

With TensorFlow 2.0 we are truly starting to see a better, more efficient bridge between research, experimentation, model preparation/quantization, and deployment to production.

I’m truly excited about the release of TensorFlow 2.0 and the impact it will have on the deep learning community.

Credits

All code examples from this post came from TensorFlow 2.0’s official examples. Be sure to refer to the complete code examples provided by Francois Chollet for more details.

Additionally, definitely check out Sayak Paul’s Ten Important Updates from TensorFlow 2.0 article which helped inspire today’s blog post.

Summary

In this tutorial, you learned about Keras, tf.keras, and TensorFlow 2.0.

The first important takeaway is that deep learning practitioners using the keras package should start using tf.keras inside TensorFlow 2.0.

Not only will you enjoy the added speed and optimization of TensorFlow 2.0, but you’ll also receive new feature updates — the latest release of the keras package (v2.3.0) will be the last release to support multiple backends and feature updates. Moving forward, the keras package will receive only bug fixes.

You should seriously consider moving to tf.keras and TensorFlow 2.0 in your future projects.

The second takeaway is that TensorFlow 2.0 is more than a GPU-accelerated deep learning library.

Not only do you have the ability to train your own models using TensorFlow 2.0 and tf.keras, but you can now:
  • Take those models and prepare them for mobile/embedded deployment using TensorFlow Lite (TF Lite).
  • Deploy the models to production using TensorFlow Extended (TF Extended).

From my perspective, I’ve already started porting my original keras code to tf.keras. I would suggest you start doing the same.

I hope you enjoyed today’s tutorial — I’ll be back with new TensorFlow 2.0 and tf.keras tutorials soon.

To be notified when future tutorials are published here on PyImageSearch (and receive my free 17-page Resource Guide PDF on Computer Vision, Deep Learning, and OpenCV), just enter your email address in the form below!

The post Keras vs. tf.keras: What’s the difference in TensorFlow 2.0? appeared first on PyImageSearch.

3 ways to create a Keras model with TensorFlow 2.0 (Sequential, Functional, and Model Subclassing)


Keras and TensorFlow 2.0 provide you with three methods to implement your own neural network architectures:

  1. Sequential API
  2. Functional API
  3. Model subclassing

Inside of this tutorial you’ll learn how to utilize each of these methods, including how to choose the right API for the job.

To learn more about Sequential, Functional, and Model subclassing with Keras and TensorFlow 2.0, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

3 ways to create a Keras model with TensorFlow 2.0 (Sequential, Functional, and Model subclassing)

In the first half of this tutorial, you will learn how to implement sequential, functional, and model subclassing architectures using Keras and TensorFlow 2.0. I’ll then show you how to train each of these model architectures.

Once our training script is implemented we’ll then train each of the sequential, functional, and subclassing models, and review the results.

Furthermore, all code examples covered here will be compatible with Keras and TensorFlow 2.0.

Project structure

Go ahead and grab the source code to this post by using the “Downloads” section of this tutorial. Then extract the files and inspect the directory contents with the tree command:
$ tree --dirsfirst
.
├── output
│   ├── class.png
│   ├── functional.png
│   └── sequential.png
├── pyimagesearch
│   ├── __init__.py
│   └── models.py
└── train.py

2 directories, 6 files

Our models.py contains three functions to build Keras/TensorFlow 2.0 models using the Sequential, Functional, and Model subclassing APIs, respectively.

The training script, train.py, will load a model depending on the provided command line arguments. The model will be trained on the CIFAR-10 dataset. An accuracy/loss curve plot will be output to a .png file in the output directory.

Implementing a Sequential model with Keras and TensorFlow 2.0

Figure 1: The “Sequential API” is one of the 3 ways to create a Keras model with TensorFlow 2.0.

A sequential model, as the name suggests, allows you to create models layer-by-layer in a step-by-step fashion.

Keras Sequential API is by far the easiest way to get up and running with Keras, but it’s also the most limited — you cannot create models that:

  • Share layers
  • Have branches (at least not easily)
  • Have multiple inputs
  • Have multiple outputs

Examples of seminal sequential architectures that you may have already used or implemented include:

  • LeNet
  • AlexNet
  • VGGNet

Let’s go ahead and implement a basic Convolutional Neural Network using TensorFlow 2.0 and Keras’ Sequential API.

Open up the models.py file in your project structure and insert the following code:
# import the necessary packages
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import concatenate

Notice how all of our Keras imports on Lines 2-13 come from tensorflow.keras (also known as tf.keras).

To implement our sequential model, we need the Sequential import on Line 3. Let’s go ahead and create the sequential model now:
def shallownet_sequential(width, height, depth, classes):
	# initialize the model along with the input shape to be
	# "channels last" ordering
	model = Sequential()
	inputShape = (height, width, depth)

	# define the first (and only) CONV => RELU layer
	model.add(Conv2D(32, (3, 3), padding="same",
		input_shape=inputShape))
	model.add(Activation("relu"))

	# softmax classifier
	model.add(Flatten())
	model.add(Dense(classes))
	model.add(Activation("softmax"))

	# return the constructed network architecture
	return model

Line 15 defines the shallownet_sequential model builder method.

Notice how on Line 18, we initialize the model as an instance of the Sequential class. We’ll then add each layer to the Sequential class, one at a time.

ShallowNet contains one CONV => RELU layer followed by a softmax classifier (Lines 22-29). Notice on each of these lines of code that we call model.add to assemble our CNN with the appropriate building blocks. Order matters — you must call model.add in the order in which you want to insert layers, normalization methods, softmax classifiers, etc.

Once your model has all of the components you desire, you can return the object so that it can be compiled later.

Line 32 returns our sequential model (we will use it in our training script).

Creating a Functional model with Keras and TensorFlow 2.0

Figure 2: The “Functional API” is one of the 3 ways to create a Keras model with TensorFlow 2.0.

Once you’ve had some practice implementing a few basic neural network architectures using Keras’ Sequential API, you’ll then want to gain experience working with the Functional API.

Keras’ Functional API is easy to use and is typically favored by most deep learning practitioners who use the Keras deep learning library.

Using the Functional API you can:

  • Create more complex models.
  • Have multiple inputs and multiple outputs.
  • Easily define branches in your architectures (ex., an Inception block, ResNet block, etc.).
  • Design directed acyclic graphs (DAGs).
  • Easily share layers inside the architecture.

Furthermore, any Sequential model can be implemented using Keras’ Functional API.
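
For example, the ShallowNet model we built earlier with the Sequential API could be expressed with the Functional API as follows (a minimal sketch reusing the same imports and parameters as our models.py):

def shallownet_functional(width, height, depth, classes):
	# define the model input and the same CONV => RELU layer
	inputs = Input(shape=(height, width, depth))
	x = Conv2D(32, (3, 3), padding="same")(inputs)
	x = Activation("relu")(x)

	# softmax classifier
	x = Flatten()(x)
	x = Dense(classes)(x)
	outputs = Activation("softmax")(x)

	# return the constructed network architecture
	return Model(inputs, outputs, name="shallownet_functional")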

Examples of models that have Functional characteristics (such as layer branching) include:

  • ResNet
  • GoogLeNet/Inception
  • Xception
  • SqueezeNet

To gain experience using TensorFlow 2.0 and Keras’ Functional API, let’s implement a MiniGoogLeNet which includes a simplified version of the Inception module from Szegedy et al.’s seminal Going Deeper with Convolutions paper:

Figure 3: The “Functional API” is the best way to implement GoogLeNet to create a Keras model with TensorFlow 2.0. (image source)

As you can see, there are three modules inside the MiniGoogLeNet architecture:

  1. conv_module: Performs convolution on an input volume, utilizes batch normalization, and then applies a ReLU activation. We define this module out of simplicity and to make it reusable, ensuring we can easily apply a “convolutional block” inside our architecture using as few lines of code as possible, keeping our implementation tidy, organized, and easier to debug.
  2. inception_module: Instantiates two conv_module objects. The first CONV block applies 1×1 convolution while the second block performs 3×3 convolution with “same” padding, ensuring the output volume sizes for the 1×1 and 3×3 convolutions are identical. The output volumes are then concatenated together along the channel dimension.
  3. downsample_module: This module is responsible for reducing the size of an input volume. Similar to the inception_module, two branches are utilized here. The first branch performs 3×3 convolution but with (1) a 2×2 stride and (2) “valid” padding, thereby reducing the volume size. The second branch applies 3×3 max-pooling with a 2×2 stride. The output volume dimensions for both branches are identical so they can be concatenated together along the channel axis.

Think of each of these modules as Legos — we implement each type of Lego and then stack them in a particular manner to define our model architecture.

Legos can be organized and fit together in a near-infinite number of possibilities; however, since form defines function, we need to take care and consider how these Legos should fit together.

Note: If you would like a detailed review of each of the modules inside the MiniGoogLeNet architecture, be sure to refer to Deep Learning for Computer Vision with Python where I cover them in detail.

As an example of piecing our “Lego modules” together, let’s go ahead and implement MiniGoogLeNet now:

def minigooglenet_functional(width, height, depth, classes):
	def conv_module(x, K, kX, kY, stride, chanDim, padding="same"):
		# define a CONV => BN => RELU pattern
		x = Conv2D(K, (kX, kY), strides=stride, padding=padding)(x)
		x = BatchNormalization(axis=chanDim)(x)
		x = Activation("relu")(x)

		# return the block
		return x

Line 34 defines the minigooglenet_functional model builder method.

We’re going to define three reusable modules which are part of the GoogLeNet architecture:

  • conv_module
  • inception_module
  • downsample_module

Be sure to refer to the detailed descriptions of each above.

Defining the modules as sub-functions like this allows us to reuse the structure and save on lines of code, not to mention making it easier to read and make modifications.

Line 35 defines the conv_module and its parameters. The most important parameter is x — the input to this module. The other parameters pass through to Conv2D and BatchNormalization.

Lines 37-39 build a set of CONV => BN => RELU layers.

Notice that each of these lines begins with x = and ends with (x). This style is representative of the Keras Functional API: layers are chained together, with x acting as the input to each subsequent layer. This functional style will be present throughout the minigooglenet_functional method.

Line 42 returns the built conv_module to the caller.

Let’s create our inception_module which consists of two convolution modules:
def inception_module(x, numK1x1, numK3x3, chanDim):
		# define two CONV modules, then concatenate across the
		# channel dimension
		conv_1x1 = conv_module(x, numK1x1, 1, 1, (1, 1), chanDim)
		conv_3x3 = conv_module(x, numK3x3, 3, 3, (1, 1), chanDim)
		x = concatenate([conv_1x1, conv_3x3], axis=chanDim)

		# return the block
		return x

Line 44 defines our inception_module and its parameters.

The inception module contains two branches of the conv_module that are concatenated together:
  1. In the first branch, we perform 1×1 convolutions (Line 47).
  2. In the second branch, we perform 3×3 convolutions (Line 48).

The call to concatenate on Line 49 brings the module branches together across the channel dimension. Since the padding is “same” for both branches, the output spatial dimensions are equal, and thus the branches can be concatenated along the channel dimension.
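As a concrete example, consider the first inception block in the architecture we build later in this section: a 32×32×96 input passed to inception_module(x, 32, 32, chanDim) produces two 32×32×32 branches, which concatenate into a single 32×32×64 volume. The spatial dimensions are untouched; only the channel count grows (32 + 32 = 64).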

Line 51 returns the inception_module block to the caller.

Finally, we’ll implement our downsample_module:
def downsample_module(x, K, chanDim):
		# define the CONV module and POOL, then concatenate
		# across the channel dimensions
		conv_3x3 = conv_module(x, K, 3, 3, (2, 2), chanDim,
			padding="valid")
		pool = MaxPooling2D((3, 3), strides=(2, 2))(x)
		x = concatenate([conv_3x3, pool], axis=chanDim)

		# return the block
		return x

Line 54 defines our downsample_module and its parameters. The downsample module is responsible for reducing the input volume size, and it also utilizes two branches:
  1. The first branch performs 3×3 convolution with 2×2 stride (Lines 57 and 58).
  2. The second branch performs 3×3 max-pooling with 2×2 stride (Line 59).

The outputs of the branches are then stacked along the channel dimension via a call to concatenate (Line 60).

Line 63 returns the downsample block to the caller.
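Tracing concrete shapes again: the first downsample_module(x, 80, chanDim) in the architecture below receives a 32×32×80 volume. The convolutional branch produces 15×15×80 (a 3×3 kernel with a 2×2 stride and “valid” padding yields floor((32 - 3) / 2) + 1 = 15), the pooling branch also produces 15×15×80, and concatenating them yields 15×15×160: half the spatial resolution with double the channels.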

With each of our modules defined, we can now use them to build the entire MiniGoogLeNet architecture using the Functional API:

# initialize the input shape to be "channels last" and the
	# channels dimension itself
	inputShape = (height, width, depth)
	chanDim = -1

	# define the model input and first CONV module
	inputs = Input(shape=inputShape)
	x = conv_module(inputs, 96, 3, 3, (1, 1), chanDim)

	# two Inception modules followed by a downsample module
	x = inception_module(x, 32, 32, chanDim)
	x = inception_module(x, 32, 48, chanDim)
	x = downsample_module(x, 80, chanDim)

	# four Inception modules followed by a downsample module
	x = inception_module(x, 112, 48, chanDim)
	x = inception_module(x, 96, 64, chanDim)
	x = inception_module(x, 80, 80, chanDim)
	x = inception_module(x, 48, 96, chanDim)
	x = downsample_module(x, 96, chanDim)

	# two Inception modules followed by global POOL and dropout
	x = inception_module(x, 176, 160, chanDim)
	x = inception_module(x, 176, 160, chanDim)
	x = AveragePooling2D((7, 7))(x)
	x = Dropout(0.5)(x)

	# softmax classifier
	x = Flatten()(x)
	x = Dense(classes)(x)
	x = Activation("softmax")(x)

	# create the model
	model = Model(inputs, x, name="minigooglenet")

	# return the constructed network architecture
	return model

Lines 67-71 set up our inputs to the CNN.

From there, we use the Functional API to assemble our model:

  1. First, we apply a single conv_module (Line 72).
  2. Two inception_module blocks are then stacked on top of each other before using the downsample_module to reduce volume size (Lines 75-77).
  3. We then deepen the network by applying four inception_module blocks before reducing volume size via the downsample_module (Lines 80-84).
  4. We then stack two more inception_module blocks before applying average pooling and constructing the fully-connected layer head (Lines 87-90).
  5. A softmax classifier is then applied (Lines 93-95).
  6. Finally, the fully constructed Model is returned to the calling function (Lines 98-101).

Again, notice how we are using the Functional API in comparison to the Sequential API discussed in the previous section.

For a more detailed discussion of how to utilize Keras’ Functional API to implement your own custom model architectures, be sure to refer to my book, Deep Learning for Computer Vision with Python, where I discuss the Functional API in more detail.

Additionally, I want to give credit to Zhang et al. who originally proposed the MiniGoogLeNet architecture in a beautiful visualization in their paper, Understanding deep learning requires rethinking generalization.

Model subclassing with Keras and TensorFlow 2.0

Figure 4: “Model Subclassing” is one of the 3 ways to create a Keras model with TensorFlow 2.0.

The third and final method to implement a model architecture using Keras and TensorFlow 2.0 is called model subclassing.

Inside of Keras the Model class is the root class used to define a model architecture. Since Keras utilizes object-oriented programming, we can actually subclass the Model class and then insert our architecture definition.

Model subclassing is fully-customizable and enables you to implement your own custom forward-pass of the model.

However, this flexibility and customization comes at a cost — model subclassing is way harder to utilize than the Sequential API or Functional API.

So, if the model subclassing method is so hard to use, why bother utilizing it at all?

Exotic architectures or custom layer/model implementations, especially those utilized by researchers, can be extremely challenging, if not impossible, to implement using the standard Sequential or Functional APIs.

Instead, researchers wish to have control over every nuance of the network and training process — and that’s exactly what model subclassing provides them.

Let’s look at a simple example implementing MiniVGGNet, an otherwise sequential model, but converted to a model subclass:

class MiniVGGNetModel(Model):
	def __init__(self, classes, chanDim=-1):
		# call the parent constructor
		super(MiniVGGNetModel, self).__init__()

		# initialize the layers in the first (CONV => RELU) * 2 => POOL
		# layer set
		self.conv1A = Conv2D(32, (3, 3), padding="same")
		self.act1A = Activation("relu")
		self.bn1A = BatchNormalization(axis=chanDim)
		self.conv1B = Conv2D(32, (3, 3), padding="same")
		self.act1B = Activation("relu")
		self.bn1B = BatchNormalization(axis=chanDim)
		self.pool1 = MaxPooling2D(pool_size=(2, 2))

		# initialize the layers in the second (CONV => RELU) * 2 => POOL
		# layer set
		self.conv2A = Conv2D(32, (3, 3), padding="same")
		self.act2A = Activation("relu")
		self.bn2A = BatchNormalization(axis=chanDim)
		self.conv2B = Conv2D(32, (3, 3), padding="same")
		self.act2B = Activation("relu")
		self.bn2B = BatchNormalization(axis=chanDim)
		self.pool2 = MaxPooling2D(pool_size=(2, 2))

		# initialize the layers in our fully-connected layer set
		self.flatten = Flatten()
		self.dense3 = Dense(512)
		self.act3 = Activation("relu")
		self.bn3 = BatchNormalization()
		self.do3 = Dropout(0.5)

		# initialize the layers in the softmax classifier layer set
		self.dense4 = Dense(classes)
		self.softmax = Activation("softmax")

Line 103 defines our MiniVGGNetModel class, followed by Line 104 which defines our constructor.

Line 106 calls our parent constructor using Python’s built-in super() function.

From there, our layers are defined as instance attributes, each with its own name (Lines 110-137). Attributes in Python use the self keyword and are typically (but not always) defined in a constructor. Let’s review them now:
  • The first (CONV => RELU) * 2 => POOL layer set (Lines 110-116).
  • The second (CONV => RELU) * 2 => POOL layer set (Lines 120-126).
  • Our fully-connected network head (Dense) with "softmax" classifier (Lines 129-138).

Notice how each layer is defined inside the constructor — this is on purpose!

Let’s say we had our own custom layer implementation that performed an exotic type of convolution or pooling. That layer could be defined elsewhere in the MiniVGGNetModel and then instantiated inside the constructor.

Once our Keras layers and custom implemented layers are defined, we can then define the network topology/graph inside the call function, which is used to perform a forward pass:
def call(self, inputs):
		# build the first (CONV => RELU) * 2 => POOL layer set
		x = self.conv1A(inputs)
		x = self.act1A(x)
		x = self.bn1A(x)
		x = self.conv1B(x)
		x = self.act1B(x)
		x = self.bn1B(x)
		x = self.pool1(x)

		# build the second (CONV => RELU) * 2 => POOL layer set
		x = self.conv2A(x)
		x = self.act2A(x)
		x = self.bn2A(x)
		x = self.conv2B(x)
		x = self.act2B(x)
		x = self.bn2B(x)
		x = self.pool2(x)

		# build our FC layer set
		x = self.flatten(x)
		x = self.dense3(x)
		x = self.act3(x)
		x = self.bn3(x)
		x = self.do3(x)

		# build the softmax classifier
		x = self.dense4(x)
		x = self.softmax(x)

		# return the constructed model
		return x

Notice how this model is essentially a Sequential model; however, we could just as easily define a model with multiple inputs/outputs, branches, etc.

The majority of deep learning practitioners won’t have to use the model subclassing method, but just know that it’s available to you if you need it!
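To give you a taste of that control, here is a minimal, hypothetical sketch of a single custom training step using tf.GradientTape. Because MiniVGGNetModel subclasses Model, its forward pass is just a call to the model itself (the random batch and hyperparameters below are purely illustrative):

import numpy as np
import tensorflow as tf

# instantiate the subclassed model, an optimizer, and a loss function
model = MiniVGGNetModel(classes=10)
opt = tf.keras.optimizers.SGD(learning_rate=1e-2)
lossFn = tf.keras.losses.CategoricalCrossentropy()

# a hypothetical batch of 8 random 32x32 RGB images with one-hot labels
x = np.random.rand(8, 32, 32, 3).astype("float32")
y = tf.one_hot(np.random.randint(0, 10, size=(8,)), depth=10)

# record the forward pass, compute the loss, and apply the gradients
with tf.GradientTape() as tape:
	preds = model(x)
	loss = lossFn(y, preds)
grads = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grads, model.trainable_variables))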

Implementing the training script

Our three model architectures are implemented, but how are we going to train them?

The answer lies inside train.py — let’s take a look:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# there seems to be an issue with TensorFlow 2.0 throwing non-critical
# warnings regarding gradients when using the model sub-classing
# feature -- I found that by setting the logging level I can suppress
# the warnings from showing up (likely won't be required in future
# releases of TensorFlow)
import logging
logging.getLogger("tensorflow").setLevel(logging.CRITICAL)

# import the necessary packages
from pyimagesearch.models import MiniVGGNetModel
from pyimagesearch.models import minigooglenet_functional
from pyimagesearch.models import shallownet_sequential
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
import argparse

Lines 2-24 import our packages:

  • For matplotlib, we set the backend to "Agg" so that we can export our plots to disk as .png files (Lines 2 and 3).
  • We import logging and set the log level to ignore anything but critical errors (Lines 10 and 11). TensorFlow was reporting (irrelevant) warning messages when training a model using Keras’ model subclassing feature, so I updated the logging to only report critical messages. I believe that the warnings themselves are a bug inside TensorFlow 2.0 that will likely be removed in the next release.
  • Our three CNN models are imported: (1) MiniVGGNetModel, (2) minigooglenet_functional, and (3) shallownet_sequential (Lines 14-16).
  • We import our CIFAR-10 dataset (Line 21).

From here we’ll go ahead and parse command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, default="sequential",
	choices=["sequential", "functional", "class"],
	help="type of model architecture")
ap.add_argument("-p", "--plot", type=str, required=True,
	help="path to output plot file")
args = vars(ap.parse_args())

Our two command line arguments include:

  • --model: One of the choices=["sequential", "functional", "class"] must be made to build our model using the corresponding Keras API.
  • --plot: The path to the output plot image file. You may store your plots in the output/ directory as I have done.

From here we’ll (1) initialize a number of hyperparameters, (2) prepare our data, and (3) construct our data augmentation object:

# initialize the initial learning rate, batch size, and number of
# epochs to train for
INIT_LR = 1e-2
BATCH_SIZE = 128
NUM_EPOCHS = 60

# initialize the label names for the CIFAR-10 dataset
labelNames = ["airplane", "automobile", "bird", "cat", "deer", "dog",
	"frog", "horse", "ship", "truck"]

# load the CIFAR-10 dataset
print("[INFO] loading CIFAR-10 dataset...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()

# scale the data to the range [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=18, zoom_range=0.15,
	width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
	horizontal_flip=True, fill_mode="nearest")

In this code block, we:

  • Initialize the (1) learning rate, (2) batch size, and (3) number of training epochs (Lines 37-39).
  • Set the CIFAR-10 dataset labelNames, load the dataset, and preprocess the images (Lines 42-51).
  • Binarize our labels (Lines 54-56).
  • Instantiate our data augmentation object with settings for random rotations, zooms, shifts, shears, and flips (Lines 59-61).

The heart of the script lies in this next code block where we instantiate our model:

# check to see if we are using a Keras Sequential model
if args["model"] == "sequential":
	# instantiate a Keras Sequential model
	print("[INFO] using sequential model...")
	model = shallownet_sequential(32, 32, 3, len(labelNames))

# check to see if we are using a Keras Functional model
elif args["model"] == "functional":
	# instantiate a Keras Functional model
	print("[INFO] using functional model...")
	model = minigooglenet_functional(32, 32, 3, len(labelNames))

# check to see if we are using a Keras Model class
elif args["model"] == "class":
	# instantiate a Keras Model sub-class model
	print("[INFO] using model sub-classing...")
	model = MiniVGGNetModel(len(labelNames))

Here we check whether our Sequential, Functional, or Model Subclassing architecture should be instantiated. Based on the command line argument, the if/elif statements initialize the appropriate model.

From there, we are ready to compile the model and fit to our data:

# initialize the optimizer and compile the model
opt = SGD(lr=INIT_LR, momentum=0.9, decay=INIT_LR / NUM_EPOCHS)
print("[INFO] training network...")
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
H = model.fit_generator(
	aug.flow(trainX, trainY, batch_size=BATCH_SIZE),
	validation_data=(testX, testY),
	steps_per_epoch=trainX.shape[0] // BATCH_SIZE,
	epochs=NUM_EPOCHS,
	verbose=1)

All of our models are compiled with Stochastic Gradient Descent (SGD) and learning rate decay (Lines 82-85).

Lines 88-93 kick off training using Keras’ .fit_generator method to handle data augmentation. You may read about the .fit_generator method in depth in this article.

We’ll wrap up by evaluating our model and plotting the training history:

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BATCH_SIZE)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=labelNames))

# determine the number of epochs and then construct the plot title
N = np.arange(0, NUM_EPOCHS)
title = "Training Loss and Accuracy on CIFAR-10 ({})".format(
	args["model"])

# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")
plt.title(title)
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["plot"])

Lines 97-99 make predictions on the test set and evaluate the network. A classification report is printed to the terminal.

Lines 102-117 plot the training accuracy/loss curves and output the plot to disk.

Keras Sequential model results

We are now ready to use Keras and TensorFlow 2.0 to train our Sequential model!

Take a second now to use the “Downloads” section of this tutorial to download the source code to this guide.

From there, open up a terminal and execute the following command to train and evaluate a Sequential model:

$ python train.py --model sequential --plot output/sequential.png
[INFO] loading CIFAR-10 dataset...
[INFO] using sequential model...
[INFO] training network...
Epoch 1/60
390/390 [==============================] - 25s 63ms/step - loss: 1.9162 - accuracy: 0.3165 - val_loss: 1.6599 - val_accuracy: 0.4163
Epoch 2/60
390/390 [==============================] - 24s 61ms/step - loss: 1.7170 - accuracy: 0.3849 - val_loss: 1.5639 - val_accuracy: 0.4471
Epoch 3/60
390/390 [==============================] - 23s 59ms/step - loss: 1.6499 - accuracy: 0.4093 - val_loss: 1.5228 - val_accuracy: 0.4668
...
Epoch 58/60
390/390 [==============================] - 24s 61ms/step - loss: 1.3343 - accuracy: 0.5299 - val_loss: 1.2767 - val_accuracy: 0.5655
Epoch 59/60
390/390 [==============================] - 24s 61ms/step - loss: 1.3276 - accuracy: 0.5334 - val_loss: 1.2461 - val_accuracy: 0.5755
Epoch 60/60
390/390 [==============================] - 24s 61ms/step - loss: 1.3280 - accuracy: 0.5342 - val_loss: 1.2555 - val_accuracy: 0.5715
[INFO] evaluating network...
              precision    recall  f1-score   support   

    airplane       0.73      0.52      0.60      1000   
  automobile       0.62      0.80      0.70      1000   
        bird       0.58      0.30      0.40      1000   
         cat       0.51      0.24      0.32      1000   
        deer       0.69      0.32      0.43      1000   
         dog       0.53      0.51      0.52      1000   
        frog       0.47      0.84      0.60      1000   
       horse       0.55      0.73      0.62      1000   
        ship       0.69      0.69      0.69      1000   
       truck       0.52      0.77      0.62      1000   

    accuracy                           0.57     10000   
   macro avg       0.59      0.57      0.55     10000   
weighted avg       0.59      0.57      0.55     10000

Figure 5: Using TensorFlow 2.0’s Keras Sequential API (one of the 3 ways to create a Keras model with TensorFlow 2.0), we have trained ShallowNet on CIFAR-10.

Here we are obtaining 57% accuracy on the CIFAR-10 dataset (the accuracy row of the classification report above).

Looking at our training history plot in Figure 5, we notice that our validation loss is less than our training loss for nearly the entire training process — a telltale sign that the model is underfitting. We can improve our accuracy by increasing the model complexity, which is exactly what we’ll do in the next section.

Keras Functional model results

Our Functional model implementation is far deeper and more complex than our Sequential example.

Again, make sure you’ve used the “Downloads” section of this guide to download the source code.

Once you have the source code, execute the following command to train our Functional model:

$ python train.py --model functional --plot output/functional.png
[INFO] loading CIFAR-10 dataset...
[INFO] using functional model...
[INFO] training network...
Epoch 1/60
390/390 [==============================] - 69s 178ms/step - loss: 1.6112 - accuracy: 0.4091 - val_loss: 2.2448 - val_accuracy: 0.2866
Epoch 2/60
390/390 [==============================] - 60s 153ms/step - loss: 1.2376 - accuracy: 0.5550 - val_loss: 1.3850 - val_accuracy: 0.5259
Epoch 3/60
390/390 [==============================] - 59s 151ms/step - loss: 1.0665 - accuracy: 0.6203 - val_loss: 1.4964 - val_accuracy: 0.5370
...
Epoch 58/60
390/390 [==============================] - 59s 151ms/step - loss: 0.2498 - accuracy: 0.9141 - val_loss: 0.4282 - val_accuracy: 0.8756
Epoch 59/60
390/390 [==============================] - 58s 149ms/step - loss: 0.2398 - accuracy: 0.9184 - val_loss: 0.4874 - val_accuracy: 0.8643
Epoch 60/60
390/390 [==============================] - 61s 156ms/step - loss: 0.2442 - accuracy: 0.9155 - val_loss: 0.4981 - val_accuracy: 0.8649
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.94      0.84      0.89      1000
  automobile       0.95      0.94      0.94      1000
        bird       0.70      0.92      0.80      1000
         cat       0.85      0.64      0.73      1000
        deer       0.77      0.92      0.84      1000
         dog       0.91      0.70      0.79      1000
        frog       0.88      0.94      0.91      1000
       horse       0.95      0.85      0.90      1000
        ship       0.89      0.96      0.92      1000
       truck       0.89      0.95      0.92      1000

    accuracy                           0.86     10000
   macro avg       0.87      0.86      0.86     10000
weighted avg       0.87      0.86      0.86     10000

Figure 6: Using TensorFlow 2.0’s Keras Functional API (one of the 3 ways to create a Keras model with TensorFlow 2.0), we have trained MiniGoogLeNet on CIFAR-10.

This time we’ve been able to boost our accuracy all the way up to 86%!

Keras Model subclassing results

Our final experiment evaluates our implementation of model subclassing using Keras.

The model we’re using here is a variation of VGGNet — an essentially sequential model consisting of 3×3 CONVs and 2×2 max-pooling for volume dimension reduction.

We used Keras model subclassing here (rather than the Sequential API) as a simple example of how you may take an existing model and convert it to subclassed architecture.

Note: Implementing your own custom layer types and training procedures for the model subclassing API is outside the scope of this post but I will cover it in a future guide.

To see Keras model subclassing in action make sure you’ve used the “Downloads” section of this guide to grab the code — from there you can execute the following command:

$ python train.py --model class --plot output/class.png
[INFO] loading CIFAR-10 dataset...
[INFO] using model sub-classing...
[INFO] training network...
Epoch 1/60
...
Epoch 58/60
390/390 [==============================] - 30s 77ms/step - loss: 0.9100 - accuracy: 0.6799 - val_loss: 0.8620 - val_accuracy: 0.7057
Epoch 59/60
390/390 [==============================] - 30s 77ms/step - loss: 0.9100 - accuracy: 0.6792 - val_loss: 0.8783 - val_accuracy: 0.6995
Epoch 60/60
390/390 [==============================] - 30s 77ms/step - loss: 0.9036 - accuracy: 0.6785 - val_loss: 0.8960 - val_accuracy: 0.6955
[INFO] evaluating network...
              precision    recall  f1-score   support

    airplane       0.76      0.77      0.77      1000
  automobile       0.80      0.90      0.85      1000
        bird       0.81      0.46      0.59      1000
         cat       0.63      0.36      0.46      1000
        deer       0.68      0.57      0.62      1000
         dog       0.78      0.45      0.57      1000
        frog       0.45      0.96      0.62      1000
       horse       0.74      0.81      0.77      1000
        ship       0.90      0.79      0.84      1000
       truck       0.73      0.89      0.80      1000

    accuracy                           0.70     10000
   macro avg       0.73      0.70      0.69     10000
weighted avg       0.73      0.70      0.69     10000

Figure 7: Using TensorFlow 2.0’s Keras Subclassing (one of the 3 ways to create a Keras model with TensorFlow 2.0), we have trained MiniVGGNet on CIFAR-10.

Here we obtain 70% accuracy, not quite as good as our MiniGoogLeNet implementation, but it still serves as an example of how to implement an architecture using Keras’ model subclassing feature.

In general, I do not recommend using Keras’ model subclassing:

  • It’s harder to use.
  • It adds more code complexity.
  • It’s harder to debug.

…but it does give you full control over the model.

Typically I would only recommend you use Keras’ model subclassing if you are a:

  • Deep learning researcher implementing custom layers, models, and training procedures.
  • Deep learning practitioner trying to replicate the results of a researcher/paper.

The majority of deep learning practitioners are not going to need Keras’ model subclassing feature.

How can I learn Deep Learning?

Figure 8: My deep learning book, Deep Learning for Computer Vision with Python, is trusted by employees and students of top institutions. It is regularly updated to keep pace with the fast-moving AI industry.

If you’re interested in diving head-first into the world of computer vision/deep learning and discovering how to:

  • Design and train Convolutional Neural Networks for your project on your own custom datasets.
  • Learn deep learning fundamentals, rules of thumb, and best practices.
  • Replicate the results of state-of-the-art papers, including ResNet, SqueezeNet, VGGNet, and others.
  • Train your own custom Faster R-CNN, Single Shot Detectors (SSDs), and RetinaNet object detectors.
  • Use Mask R-CNN to train your own instance segmentation networks.

…then be sure to take a look at my book, Deep Learning for Computer Vision with Python!

My complete, self-study deep learning book is trusted by members of top machine learning schools, companies, and organizations, including Microsoft, Google, Stanford, MIT, CMU, and more!

Readers of my book have gone on to win Kaggle competitions, secure academic grants, and start careers in CV and DL using the knowledge they gained through study and practice.

My book not only teaches the fundamentals, but also teaches advanced techniques, best practices, and tools to ensure that you are armed with practical knowledge and proven coding recipes to tackle nearly any computer vision and deep learning problem presented to you in school, research, or the modern workforce.

Be sure to take a look  — and while you’re at it, don’t forget to grab your (free) table of contents + sample chapters.

Summary

In this tutorial you learned the three ways to implement a neural network architecture using Keras and TensorFlow 2.0:

  • Sequential: Used for implementing simple layer-by-layer architectures without multiple inputs, multiple outputs, or layer branches. Typically the first model API you use when getting started with Keras.
  • Functional: The most popular Keras model implementation API. Allows everything inside the Sequential API, but also facilitates substantially more complex architectures which include multiple inputs and outputs, branching, etc. Best of all, the syntax for Keras’ Functional API is clean and easy to use.
  • Model subclassing: Utilized when a deep learning researcher/practitioner needs full control over model, layer, and training procedure implementation. Code is verbose, harder to write, and even harder to debug. Most deep learning practitioners won’t need to subclass models using Keras, but if you’re doing research or custom implementation, model subclassing is there if you need it!

If you’re interested in learning more about the Sequential, Functional, and Model Subclassing APIs, be sure to refer to my book, Deep Learning for Computer Vision with Python, where I cover them in more detail.

I hope you enjoyed today’s tutorial!

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


Traffic Sign Classification with Keras and Deep Learning


In this tutorial, you will learn how to train your own traffic sign classifier/recognizer capable of obtaining over 95% accuracy using Keras and Deep Learning.

Last weekend I drove down to Maryland to visit my parents. As I pulled into their driveway I noticed something strange — there was a car I didn’t recognize sitting in my dad’s parking spot.

I parked my car, grabbed my bags out of the trunk, and before I could even get through the front door, my dad came out, excited and enlivened, exclaiming that he had just gotten back from the car dealership and traded in his old car for a brand new 2020 Honda Accord.

Most everyone enjoys getting a new car, but for my dad, who puts a lot of miles on his car each year for work, getting a new car is an especially big deal.

My dad wanted the family to go for a drive and check out the car, so my dad, my mother, and I climbed into the vehicle, the “new car scent” hitting you like bad cologne that you’re ashamed to admit you like.

As we drove down the road my mother noticed that the speed limit was automatically showing up on the car’s dashboard — how was that happening?

The answer?

Traffic sign recognition.

In the 2020 Honda Accord models, a front camera sensor is mounted to the interior of the windshield behind the rearview mirror.

That camera polls frames, looks for signs along the road, and then classifies them.

The recognized traffic sign is then shown on the LCD dashboard as a reminder to the driver.

It’s admittedly a pretty neat feature and the rest of the drive quickly turned from a vehicle test drive into a lecture on how computer vision and deep learning algorithms are used to recognize traffic signs (I’m not sure my parents wanted that lecture but they got it anyway).

When I returned from visiting my parents I decided it would be fun (and educational) to write a tutorial on traffic sign recognition — you can use this code as a starting point for your own traffic sign recognition projects.

To learn more about traffic sign classification with Keras and Deep Learning, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Traffic Sign Classification with Keras and Deep Learning

In the first part of this tutorial, we’ll discuss the concept of traffic sign classification and recognition, including the dataset we’ll be using to train our own custom traffic sign classifier.

From there we’ll review our directory structure for the project.

We’ll then implement TrafficSignNet, a Convolutional Neural Network which we’ll train on our dataset.

Given our trained model we’ll evaluate its accuracy on the test data and even learn how to make predictions on new input data as well.

What is traffic sign classification?

Figure 1: Traffic sign recognition is a two-stage process: (1) detection/localization and (2) classification. In this blog post we will only focus on classification of traffic signs with Keras and deep learning.

Traffic sign classification is the process of automatically recognizing traffic signs along the road, including speed limit signs, yield signs, merge signs, etc. Being able to automatically recognize traffic signs enables us to build “smarter cars”.

Self-driving cars need traffic sign recognition in order to properly parse and understand the roadway. Similarly, “driver alert” systems inside cars need to understand the roadway around them to help aid and protect drivers.

Traffic sign recognition is just one of the problems that computer vision and deep learning can solve.

Our traffic sign dataset

Figure 2: The German Traffic Sign Recognition Benchmark (GTSRB) dataset will be used for traffic sign classification with Keras and deep learning. (image source)

The dataset we’ll be using to train our own custom traffic sign classifier is the German Traffic Sign Recognition Benchmark (GTSRB).

The GTSRB dataset consists of 43 traffic sign classes and nearly 50,000 images.

A sample of the dataset can be seen in Figure 2 above — notice how the traffic signs have been pre-cropped for us, implying that the dataset annotators/creators have manually labeled the signs in the images and extracted the traffic sign Region of Interest (ROI) for us, thereby simplifying the project.

In the real-world, traffic sign recognition is a two-stage process:

  1. Localization: Detect and localize where in an input image/frame a traffic sign is.
  2. Recognition: Take the localized ROI and actually recognize and classify the traffic sign.

Deep learning object detectors can perform localization and recognition in a single forward-pass of the network — if you’re interested in learning more about object detection and traffic sign localization using Faster R-CNNs, Single Shot Detectors (SSDs), and RetinaNet, be sure to refer to my book, Deep Learning for Computer Vision with Python, where I cover the topic in detail.

Challenges with the GTSRB dataset

There are a number of challenges in the GTSRB dataset, the first being that images are low resolution, and worse, have poor contrast (as seen in Figure 2 above). These images are pixelated, and in some cases, it’s extremely challenging, if not impossible, for the human eye and brain to recognize the sign.

The second challenge with the dataset is handling class skew:

Figure 3: The German Traffic Sign Recognition Benchmark (GTSRB) dataset is an example of an unbalanced dataset. We will account for this when training our traffic sign classifier with Keras and deep learning. (image source)

The top class (Speed limit 50km/h) has over 2,000 examples while the least represented class (Speed limit 20km/h) has under 200 examples — that’s an order of magnitude difference!

In order to successfully train an accurate traffic sign classifier we’ll need to devise an experiment that can:

  • Preprocess our input images to improve contrast.
  • Account for class label skew.

Project structure

Go ahead and use the “Downloads” section of this article to download the source code. Once downloaded, unzip the files on your machine.

From here we’ll download the GTSRB dataset from Kaggle. Simply click the “Download (300MB)” button in the Kaggle menubar and follow the prompts to sign into Kaggle using one of the third party authentication partners or with your email address. You may then click the “Download (300MB)” button once more and your download will commence as shown:

Figure 4: How to download the GTSRB dataset from Kaggle for traffic sign recognition with Keras and deep learning.

I extracted the dataset into my project directory as you can see here:

$ tree --dirsfirst --filelimit 10
.
├── examples [25 entries]
├── gtsrb-german-traffic-sign
│   ├── Meta [43 entries]
│   ├── Test [12631 entries]
│   ├── Train [43 entries]
│   ├── meta-1 [43 entries]
│   ├── test-1 [12631 entries]
│   ├── train-1 [43 entries]
│   ├── Meta.csv
│   ├── Test.csv
│   └── Train.csv
├── output
│   ├── trafficsignnet.model
│   │   ├── assets
│   │   ├── variables
│   │   │   ├── variables.data-00000-of-00002
│   │   │   ├── variables.data-00001-of-00002
│   │   │   └── variables.index
│   │   └── saved_model.pb
│   └── plot.png
├── pyimagesearch
│   ├── __init__.py
│   └── trafficsignnet.py
├── train.py
├── signnames.csv
└── predict.py

13 directories, 13 files

Our project contains three main directories and one Python module:

  • gtsrb-german-traffic-sign/: Our GTSRB dataset.
  • output/: Contains our output model and training history plot generated by train.py.
  • examples/: Contains a random sample of 25 annotated images generated by predict.py.
  • pyimagesearch: A module that comprises our TrafficSignNet CNN.

We will also walk through train.py and predict.py. Our training script loads the data, compiles the model, trains, and outputs the serialized model and plot image to disk. From there, our prediction script generates annotated images for visual validation purposes.

Configuring your development environment

For this article, you’ll need to have the following packages installed:

  • OpenCV
  • NumPy
  • scikit-learn
  • scikit-image
  • imutils
  • matplotlib
  • TensorFlow 2.0 (CPU or GPU)

Luckily each of these is easily installed with pip, a Python package manager.

Let’s install the packages now, ideally into a virtual environment as shown (you’ll need to create the environment):

$ workon traffic_signs
$ pip install opencv-contrib-python
$ pip install numpy
$ pip install scikit-learn
$ pip install scikit-image
$ pip install imutils
$ pip install matplotlib
$ pip install tensorflow==2.0.0 # or tensorflow-gpu

Using pip to install OpenCV is hands-down the fastest and easiest way to get started with OpenCV. Instructions on how to create your virtual environment are included in the tutorial at this link. This method (as opposed to compiling from source) simply checks prerequisites and places a precompiled binary that will work on most systems into your virtual environment site-packages. Optimizations may or may not be active. Just keep in mind that the maintainer has elected not to include patented algorithms for fear of lawsuits. Sometimes on PyImageSearch, we use patented algorithms for educational and research purposes (there are free alternatives that you can use commercially). Nevertheless, the pip method is a great option for beginners — just remember that you don’t have the full install. If you need the full install, refer to my install tutorials page.

If you are curious about (1) why we are using TensorFlow 2.0, and (2) wondering why I didn’t instruct you to install Keras, you may be surprised to know that Keras is actually included as part of TensorFlow now. Admittedly, the marriage of TensorFlow and Keras is built upon an interesting past. Be sure to read Keras vs. tf.keras: What’s the difference in TensorFlow 2.0? if you are curious about why TensorFlow now includes Keras.

Once your environment is ready to go, it is time to work on recognizing traffic signs with Keras!

Implementing TrafficSignNet, our CNN traffic sign classifier

Figure 5: The Keras deep learning framework is used to build a Convolutional Neural Network (CNN) for traffic sign classification.

Let’s go ahead and implement a Convolutional Neural Network to classify and recognize traffic signs.

Note: Be sure to review my Keras Tutorial if this is your first time building a CNN with Keras.

I have decided to name this classifier TrafficSignNet — open up the trafficsignnet.py file in your project directory and then insert the following code:
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense

class TrafficSignNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

Our tf.keras imports are listed on Lines 2-9. We will be taking advantage of Keras’ Sequential API to build our TrafficSignNet CNN (Line 2).

Line 11 defines our TrafficSignNet class, followed by Line 13 which defines our build method. The build method accepts four parameters: the image width and height, the depth, and the number of classes in the dataset.

Lines 16-19 initialize our Sequential model and specify the CNN’s inputShape.

Let’s define our CONV => RELU => BN => POOL layer set:
# CONV => RELU => BN => POOL
		model.add(Conv2D(8, (5, 5), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))

This set of layers uses a 5×5 kernel to learn larger features — it will help to distinguish between different traffic sign shapes and color blobs on the traffic signs themselves.

From there we define two sets of (CONV => RELU => CONV => RELU) * 2 => POOL layers:
# first set of (CONV => RELU => CONV => RELU) * 2 => POOL
		model.add(Conv2D(16, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(16, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))

		# second set of (CONV => RELU => CONV => RELU) * 2 => POOL
		model.add(Conv2D(32, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(32, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))

These sets of layers deepen the network by stacking two sets of CONV => RELU => BN layers before applying max-pooling to reduce volume dimensionality.

The head of our network consists of two sets of fully connected layers and a softmax classifier:

# first set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(128))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# second set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(128))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# softmax classifier
		model.add(Dense(classes))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model

Dropout is applied as a form of regularization which aims to prevent overfitting. The result is often a more generalizable model.

Line 54 returns our model; we will compile and train the model in our train.py script next.

If you struggled to understand the terms in this class, be sure to refer to Deep Learning for Computer Vision with Python for conceptual knowledge on the layer types. My Keras Tutorial also provides a brief overview.

Implementing our training script

Now that our TrafficSignNet architecture has been implemented, let’s create our Python training script that will be responsible for:
  • Loading our training and testing split from the GTSRB dataset
  • Preprocessing the images
  • Training our model
  • Evaluating our model’s accuracy
  • Serializing the model to disk so we can later use it to make predictions on new traffic sign data

Let’s get started — open up the train.py file in your project directory and add the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.trafficsignnet import TrafficSignNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.metrics import classification_report
from skimage import transform
from skimage import exposure
from skimage import io
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import os

Lines 2-18 import our necessary packages:

  • matplotlib: The de facto plotting package for Python. We use the "Agg" backend, ensuring that we are able to export our plots as image files to disk (Lines 2 and 3).
  • TrafficSignNet: Our traffic sign Convolutional Neural Network that we coded with Keras in the previous section (Line 6).
  • tensorflow.keras: Ensures that we can handle data augmentation, Adam optimization, and one-hot encoding (Lines 7-9).
  • classification_report: A scikit-learn method for printing a convenient evaluation for training (Line 10).
  • skimage: We will use scikit-image for preprocessing our dataset in lieu of OpenCV, as scikit-image provides some additional preprocessing algorithms that OpenCV does not (Lines 11-13).
  • numpy: For array and numerical operations (Line 15).
  • argparse: Handles parsing command line arguments (Line 16).
  • random: For shuffling our dataset randomly (Line 17).
  • os: We’ll use this module for grabbing our operating system’s path separator (Line 18).

Let’s go ahead and define a function to load our data from disk:

def load_split(basePath, csvPath):
	# initialize the list of data and labels
	data = []
	labels = []

	# load the contents of the CSV file, remove the first line (since
	# it contains the CSV header), and shuffle the rows (otherwise
	# all examples of a particular class will be in sequential order)
	rows = open(csvPath).read().strip().split("\n")[1:]
	random.shuffle(rows)

The GTSRB dataset is pre-split into training/testing splits for us. Line 20 defines load_split to load each data split, respectively. It accepts a path to the base of the dataset as well as a .csv file path which contains the class label for each image.

Lines 22 and 23 initialize our data and labels lists, which this function will soon populate and return.

Line 28 loads our .csv file, strips whitespace, and grabs each row via the newline delimiter, skipping the first header row. The result is a list of rows which Line 29 then shuffles randomly.

The result of Lines 28 and 29 can be seen here (i.e., if you were to print the first three rows in the list via print(rows[:3])):
['33,35,5,5,28,29,13,Train/13/00013_00001_00009.png',
 '36,36,5,5,31,31,38,Train/38/00038_00049_00021.png',
 '75,77,6,7,69,71,35,Train/35/00035_00039_00024.png']

The format of the data is: Width, Height, X1, Y1, X2, Y2, ClassID, Image Path.
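Note that our training script only needs the last two fields of each row. Given the first example row above, row.strip().split(",")[-2:] evaluates to ['13', 'Train/13/00013_00001_00009.png'], i.e., the ClassID and the image path.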

Let’s go ahead and loop over the rows now and extract + preprocess the data that we need:
# loop over the rows of the CSV file
	for (i, row) in enumerate(rows):
		# check to see if we should show a status update
		if i > 0 and i % 1000 == 0:
			print("[INFO] processed {} total images".format(i))

		# split the row into components and then grab the class ID
		# and image path
		(label, imagePath) = row.strip().split(",")[-2:]

		# derive the full path to the image file and load it
		imagePath = os.path.sep.join([basePath, imagePath])
		image = io.imread(imagePath)

Line 32 loops over the rows. Inside the loop, we proceed to:
  • Display a status update to the terminal for every 1000th image processed (Lines 34 and 35).
  • Extract the ClassID (label) and imagePath from the row (Line 39).
  • Derive the full path to the image file + load the image with scikit-image (Lines 42 and 43).

As mentioned in the “Challenges with the GTSRB dataset” section above, one of the biggest issues with the dataset is that many images have low contrast, making it challenging for the human eye to recognize a given sign (let alone a computer vision/deep learning model).

We can automatically improve image contrast by applying an algorithm called Contrast Limited Adaptive Histogram Equalization (CLAHE), the implementation of which can be found in the scikit-image library.

Using CLAHE we can improve the contrast of our traffic sign images:

Figure 6: As part of preprocessing our GTSRB dataset for deep learning classification of traffic signs, we apply a method known as Contrast Limited Adaptive Histogram Equalization (CLAHE) to improve image contrast. Original input images can be seen on the left — notice how contrast is very low and some signs cannot be recognized. By applying CLAHE (right) we can improve image contrast.

While our images may seem a bit “unnatural” to the human eye, the improvement in contrast will better aid our computer vision algorithms in automatically recognizing our traffic signs.

Note: A big thanks to Thomas Tracey who proposed using CLAHE to improve traffic sign recognition in his 2017 article.

Let’s preprocess our images by applying CLAHE now:

# resize the image to be 32x32 pixels, ignoring aspect ratio,
		# and then perform Contrast Limited Adaptive Histogram
		# Equalization (CLAHE)
		image = transform.resize(image, (32, 32))
		image = exposure.equalize_adapthist(image, clip_limit=0.1)

		# update the list of data and labels, respectively
		data.append(image)
		labels.append(int(label))

	# convert the data and labels to NumPy arrays
	data = np.array(data)
	labels = np.array(labels)

	# return a tuple of the data and labels
	return (data, labels)

To complete our loop over the rows, we:
  • Resize the image to 32×32 pixels (Line 48).
  • Apply CLAHE image contrast correction (Line 49).
  • Update the data and labels lists with the image itself and the class label (Lines 52 and 53).

Then, Lines 56-60 convert the data and labels into NumPy arrays and return them to the calling function.

With our load_split function defined, now we can move on to parsing command line arguments:
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input GTSRB")
ap.add_argument("-m", "--model", required=True,
	help="path to output model")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to training history plot")
args = vars(ap.parse_args())

Our three command line arguments consist of:

  • --dataset: The path to our GTSRB dataset.
  • --model: The desired path/filename of our output model.
  • --plot: The path to our training history plot.

Let’s initialize a few hyperparameters and load our class label names:

# initialize the number of epochs to train for, base learning rate,
# and batch size
NUM_EPOCHS = 30
INIT_LR = 1e-3
BS = 64

# load the label names
labelNames = open("signnames.csv").read().strip().split("\n")[1:]
labelNames = [l.split(",")[1] for l in labelNames]

Lines 74-76 initialize the number of epochs to train for, our initial learning rate, and batch size.

Lines 79 and 80 load the class labelNames from a .csv file. Unnecessary markup in the file is automatically discarded.

Now let’s go ahead and load + preprocess our data:

# derive the path to the training and testing CSV files
trainPath = os.path.sep.join([args["dataset"], "Train.csv"])
testPath = os.path.sep.join([args["dataset"], "Test.csv"])

# load the training and testing data
print("[INFO] loading training and testing data...")
(trainX, trainY) = load_split(args["dataset"], trainPath)
(testX, testY) = load_split(args["dataset"], testPath)

# scale data to the range of [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

# one-hot encode the training and testing labels
numLabels = len(np.unique(trainY))
trainY = to_categorical(trainY, numLabels)
testY = to_categorical(testY, numLabels)

# account for skew in the labeled data
classTotals = trainY.sum(axis=0)
classWeight = classTotals.max() / classTotals

In this block we:

  • Derive paths to the training and testing splits (Lines 83 and 84).
  • Use our load_split function to load each of the training/testing splits, respectively (Lines 88 and 89).
  • Preprocess the images by scaling them to the range [0, 1] (Lines 92 and 93).
  • One-hot encode the training/testing class labels (Lines 96-98).
  • Account for skew in our dataset (i.e. the fact that we have significantly more images for some classes than others). Lines 101 and 102 assign a weight to each class for use during training.
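To make the weighting scheme concrete, here is a tiny sketch using hypothetical counts that mirror the skew described earlier (roughly 2,000 examples of the most common sign versus 200 of the rarest):

import numpy as np

# hypothetical per-class example counts (most common vs. rarest class)
classTotals = np.array([2000.0, 200.0])

# each class is weighted relative to the most common class
classWeight = classTotals.max() / classTotals
print(classWeight)	# [ 1. 10.] -- the rare class counts 10x in the loss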

From here, we’ll prepare + train our model:
# construct the image generator for data augmentation
aug = ImageDataGenerator(
	rotation_range=10,
	zoom_range=0.15,
	width_shift_range=0.1,
	height_shift_range=0.1,
	shear_range=0.15,
	horizontal_flip=False,
	vertical_flip=False,
	fill_mode="nearest")

# initialize the optimizer and compile the model
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / (NUM_EPOCHS * 0.5))
model = TrafficSignNet.build(width=32, height=32, depth=3,
	classes=numLabels)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
print("[INFO] training network...")
H = model.fit_generator(
	aug.flow(trainX, trainY, batch_size=BS),
	validation_data=(testX, testY),
	steps_per_epoch=trainX.shape[0] // BS,
	epochs=NUM_EPOCHS,
	class_weight=classWeight,
	verbose=1)

Lines 105-113 initialize our data augmentation object with random rotation, zoom, shift, shear, and flip settings. Notice how we’re not applying horizontal or vertical flips here as traffic signs in the wild will not be flipped.

Lines 117-121 compile our TrafficSignNet model with the Adam optimizer and learning rate decay.

Lines 125-131 train the model using Keras’ fit_generator method. Notice that the class_weight parameter is passed to accommodate the skew in our dataset.

Next, we will evaluate the model and serialize it to disk:
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BS)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=labelNames))

# save the network to disk
print("[INFO] serializing network to '{}'...".format(args["model"]))
model.save(args["model"])

Line 135 evaluates the model on the testing set. From there, Lines 136 and 137 print a classification report in the terminal.

Line 141 serializes the Keras model to disk so that we can later use it for inference in our prediction script.

Finally, the following code block plots the training accuracy/loss curves and exports the plot to an image file on disk:

# plot the training loss and accuracy
N = np.arange(0, NUM_EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

Take special note here that TensorFlow 2.0 has renamed the training history keys:

  • H.history["acc"] is now H.history["accuracy"].
  • H.history["val_acc"] is now H.history["val_accuracy"].

At this point, you should be using TensorFlow 2.0 (with Keras built-in), but if you aren’t, you can adjust the key names (Lines 149 and 150).

Personally, I still haven’t figured out why the TensorFlow developers made the change to spell out “accuracy” but did not spell out “validation”. It seems counterintuitive to me. That said, all frameworks and codebases have certain nuances that we need to learn to deal with.
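
If your plotting code needs to tolerate both naming conventions, a defensive lookup sidesteps the rename entirely (a small compatibility sketch of my own, not from the original script):

# fall back to the old key names when running on pre-2.0 versions
accKey = "accuracy" if "accuracy" in H.history else "acc"
valAccKey = "val_accuracy" if "val_accuracy" in H.history else "val_acc"
plt.plot(N, H.history[accKey], label="train_acc")
plt.plot(N, H.history[valAccKey], label="val_acc")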

Training TrafficSignNet on the traffic sign dataset

To train our traffic sign classification model make sure you have:

  1. Used the “Downloads” section of this tutorial to download the source code.
  2. Followed the “Project structure” section above to download our traffic sign dataset.

From there, open up a terminal and execute the following command:

$ python train.py --dataset gtsrb-german-traffic-sign \
	--model output/trafficsignnet.model --plot output/plot.png
[INFO] loading training and testing data...
[INFO] compiling model...
[INFO] training network...
Epoch 1/30
612/612 [==============================] - 49s 81ms/step - loss: 2.6584 - accuracy: 0.2951 - val_loss: 2.1152 - val_accuracy: 0.3513
Epoch 2/30
612/612 [==============================] - 47s 77ms/step - loss: 1.3989 - accuracy: 0.5558 - val_loss: 0.7909 - val_accuracy: 0.7417
Epoch 3/30
612/612 [==============================] - 48s 78ms/step - loss: 0.9402 - accuracy: 0.6989 - val_loss: 0.5147 - val_accuracy: 0.8302
Epoch 4/30
612/612 [==============================] - 47s 76ms/step - loss: 0.6940 - accuracy: 0.7759 - val_loss: 0.4559 - val_accuracy: 0.8515
Epoch 5/30
612/612 [==============================] - 47s 76ms/step - loss: 0.5521 - accuracy: 0.8219 - val_loss: 0.3004 - val_accuracy: 0.9055
...
Epoch 26/30
612/612 [==============================] - 46s 75ms/step - loss: 0.1213 - accuracy: 0.9627 - val_loss: 0.7386 - val_accuracy: 0.8274
Epoch 27/30
612/612 [==============================] - 46s 75ms/step - loss: 0.1175 - accuracy: 0.9633 - val_loss: 0.1931 - val_accuracy: 0.9505
Epoch 28/30
612/612 [==============================] - 46s 75ms/step - loss: 0.1101 - accuracy: 0.9664 - val_loss: 0.1553 - val_accuracy: 0.9575
Epoch 29/30
612/612 [==============================] - 46s 76ms/step - loss: 0.1098 - accuracy: 0.9662 - val_loss: 0.1642 - val_accuracy: 0.9581
Epoch 30/30
612/612 [==============================] - 47s 76ms/step - loss: 0.1063 - accuracy: 0.9684 - val_loss: 0.1778 - val_accuracy: 0.9495
[INFO] evaluating network...
                               precision    recall  f1-score   support
         Speed limit (20km/h)       0.94      0.98      0.96        60
         Speed limit (30km/h)       0.96      0.97      0.97       720
         Speed limit (50km/h)       0.95      0.98      0.96       750
         Speed limit (60km/h)       0.98      0.92      0.95       450
         Speed limit (70km/h)       0.98      0.96      0.97       660
         Speed limit (80km/h)       0.92      0.93      0.93       630
  End of speed limit (80km/h)       0.96      0.87      0.91       150
        Speed limit (100km/h)       0.93      0.94      0.93       450
        Speed limit (120km/h)       0.90      0.99      0.94       450
                   No passing       1.00      0.97      0.98       480
 No passing veh over 3.5 tons       1.00      0.96      0.98       660
 Right-of-way at intersection       0.95      0.93      0.94       420
                Priority road       0.99      0.99      0.99       690
                        Yield       0.98      0.99      0.99       720
                         Stop       1.00      1.00      1.00       270
                  No vehicles       0.99      0.90      0.95       210
    Veh > 3.5 tons prohibited       0.97      0.99      0.98       150
                     No entry       1.00      0.94      0.97       360
              General caution       0.98      0.77      0.86       390
         Dangerous curve left       0.75      0.60      0.67        60
        Dangerous curve right       0.69      1.00      0.81        90
                 Double curve       0.76      0.80      0.78        90
                   Bumpy road       0.99      0.78      0.87       120
                Slippery road       0.66      0.99      0.79       150
    Road narrows on the right       0.80      0.97      0.87        90
                    Road work       0.94      0.98      0.96       480
              Traffic signals       0.87      0.95      0.91       180
                  Pedestrians       0.46      0.55      0.50        60
            Children crossing       0.93      0.94      0.94       150
            Bicycles crossing       0.92      0.86      0.89        90
           Beware of ice/snow       0.88      0.75      0.81       150
        Wild animals crossing       0.98      0.95      0.96       270
   End speed + passing limits       0.98      0.98      0.98        60
             Turn right ahead       0.97      1.00      0.98       210
              Turn left ahead       0.98      1.00      0.99       120
                   Ahead only       0.99      0.97      0.98       390
         Go straight or right       1.00      1.00      1.00       120
          Go straight or left       0.92      1.00      0.96        60
                   Keep right       0.99      1.00      0.99       690
                    Keep left       0.97      0.96      0.96        90
         Roundabout mandatory       0.90      0.99      0.94        90
            End of no passing       0.90      1.00      0.94        60
End no passing veh > 3.5 tons       0.91      0.89      0.90        90

                     accuracy                           0.95     12630
                    macro avg       0.92      0.93      0.92     12630
                 weighted avg       0.95      0.95      0.95     12630

[INFO] serializing network to 'output/trafficsignnet.model'...

Note: Some class names have been shortened for readability in the terminal output block.

Figure 7: Keras and deep learning is used to train a traffic sign classifier.

Here you can see we are obtaining 95% accuracy on our testing set!

Implementing our prediction script

Now that our traffic sign recognition model is trained, let’s learn how to:

  1. Load the model from disk
  2. Load sample images from disk
  3. Preprocess the sample images in the same manner as we did for training
  4. Pass our images through our traffic sign classifier
  5. Obtain our final output predictions

To accomplish these goals we’ll need to inspect the contents of predict.py:
# import the necessary packages
from tensorflow.keras.models import load_model
from skimage import transform
from skimage import exposure
from skimage import io
from imutils import paths
import numpy as np
import argparse
import imutils
import random
import cv2
import os

Lines 2-12 import our necessary packages, modules, and functions. Most notably we import load_model from tensorflow.keras.models, ensuring that we can load our serialized model from disk. You can learn more about saving and loading Keras models here.

We’ll use scikit-image to preprocess our images, just like we did in our training script.

But unlike in our training script, we’ll utilize OpenCV to annotate and write our output image to disk.

Let’s parse our command line arguments:

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
	help="path to pre-trained traffic sign recognizer")
ap.add_argument("-i", "--images", required=True,
	help="path to testing directory containing images")
ap.add_argument("-e", "--examples", required=True,
	help="path to output examples directory")
args = vars(ap.parse_args())

Lines 15-22 parse three command line arguments:

  • --model: The path to the serialized traffic sign recognizer Keras model on disk (we trained the model in the previous section).
  • --images: The path to a directory of testing images.
  • --examples: Our path to the directory where our annotated output images will be stored.

With each of these paths in the args dictionary, we’re ready to proceed:
# load the traffic sign recognizer model
print("[INFO] loading model...")
model = load_model(args["model"])

# load the label names
labelNames = open("signnames.csv").read().strip().split("\n")[1:]
labelNames = [l.split(",")[1] for l in labelNames]

# grab the paths to the input images, shuffle them, and grab a sample
print("[INFO] predicting...")
imagePaths = list(paths.list_images(args["images"]))
random.shuffle(imagePaths)
imagePaths = imagePaths[:25]

Line 26 loads our trained traffic sign model from disk into memory.

Lines 29 and 30 load and parse the class labelNames.
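
For reference, the signnames.csv file that ships with the Kaggle GTSRB distribution maps an integer class ID to a human-readable name, one row per class (first few rows shown; the layout is assumed from the standard dataset download):

ClassId,SignName
0,Speed limit (20km/h)
1,Speed limit (30km/h)
2,Speed limit (50km/h)

The [1:] slice on Line 29 skips this header row, and split(",")[1] on Line 30 keeps only the name column.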

Lines 34-36 grab the paths to the input images, shuffle them, and grab 25 sample images.

We’ll now loop over the samples:

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
	# load the image, resize it to 32x32 pixels, and then apply
	# Contrast Limited Adaptive Histogram Equalization (CLAHE),
	# just like we did during training
	image = io.imread(imagePath)
	image = transform.resize(image, (32, 32))
	image = exposure.equalize_adapthist(image, clip_limit=0.1)

	# preprocess the image by scaling it to the range [0, 1]
	image = image.astype("float32") / 255.0
	image = np.expand_dims(image, axis=0)

	# make predictions using the traffic sign recognizer CNN
	preds = model.predict(image)
	j = preds.argmax(axis=1)[0]
	label = labelNames[j]

	# load the image using OpenCV, resize it, and draw the label
	# on it
	image = cv2.imread(imagePath)
	image = imutils.resize(image, width=128)
	cv2.putText(image, label, (5, 15), cv2.FONT_HERSHEY_SIMPLEX,
		0.45, (0, 0, 255), 2)

	# save the image to disk
	p = os.path.sep.join([args["examples"], "{}.png".format(i)])
	cv2.imwrite(p, image)

Inside our loop over the imagePaths (beginning on Line 39), we:
  • Load the input image with scikit-image (Line 43).
  • Preprocess the image in the same manner as we did for training data (Lines 44-48). It is absolutely crucial to preprocess our images in the same way we did for training, including: (1) resizing, (2) CLAHE contrast adjustment, and (3) scaling to the range [0, 1]. If we don’t preprocess our testing data in the same manner as our training data, then our model predictions won’t make sense (see the helper sketch after this list).
  • Add a dimension to the image — we will perform inference on a batch size of 1 (Line 49).
  • Make a prediction and grab the class label with the highest probability (Lines 52-54).
  • Using OpenCV we load, resize, annotate the image with the label, and write the output image to disk (Lines 58-65).

This process is repeated for all 25 image samples.
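
Since training-time and inference-time preprocessing must match exactly, one way to guarantee that is to factor the steps into a single helper that both scripts import (a hypothetical refactor of my own; the module and function names are not part of the downloaded code):

# shared_preprocessing.py -- hypothetical helper shared by train.py/predict.py
from skimage import exposure
from skimage import io
from skimage import transform

def preprocess_image(imagePath):
	# load the image, resize to 32x32, apply CLAHE, and scale to [0, 1],
	# exactly mirroring the training-time preprocessing
	image = io.imread(imagePath)
	image = transform.resize(image, (32, 32))
	image = exposure.equalize_adapthist(image, clip_limit=0.1)
	return image.astype("float32") / 255.0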

Make predictions on traffic sign data

To make predictions on traffic sign data using our trained TrafficSignNet model, make sure you have used the “Downloads” section of this tutorial to download the source code and pre-trained model.

From there, open up a terminal and execute the following command:

$ python predict.py --model output/trafficsignnet.model \
	--images gtsrb-german-traffic-sign/Test \
	--examples examples
[INFO] loading model...
[INFO] predicting...

Figure 8: Keras deep learning traffic sign classification results.

As you can see, our traffic sign classifier is correctly recognizing our input traffic signs!

Where can I learn more about traffic sign recognition?

Figure 9: In my deep learning book, I cover multiple object detection methods. I actually cover how to build the CNN, train the CNN, and make inferences. Not to mention deep learning fundamentals, best practices, and my personal set of rules of thumb. Grab your copy now so you can start learning new skills.

This tutorial frames traffic sign recognition as a classification problem, meaning that the traffic signs have been pre-cropped from the input image — this process was done when the dataset curators manually annotated and created the dataset.

However, in the real-world, traffic sign recognition is actually an object detection problem.

Object detection enables you to not only recognize the traffic sign but also localize where in the input frame the traffic sign is.

The process of object detection is not as simple and straightforward as image classification. It is actually far, far more complicated — the details and intricacies are outside the scope of this blog post. They are, however, within the scope of my deep learning book.

If you’re interested in learning how to:

  1. Prepare and annotate your own image datasets for object detection
  2. Fine-tune and train your own custom object detectors, including Faster R-CNNs, SSDs, and RetinaNet on your own datasets
  3. Uncover my best practices, techniques, and procedures to utilize when training your own deep learning object detectors

…then you’ll want to be sure to take a look at my new deep learning book. Inside Deep Learning for Computer Vision with Python, I will guide you, step-by-step, on building your own deep learning object detectors.

You will learn to replicate my very own experiments by:

  • Training a Faster R-CNN from scratch to localize and recognize 47 types of traffic signs.
  • Training a Single Shot Detector (SSD) on a dataset of front and rear views of vehicles.
  • Recognizing familiar product logos in images using a custom RetinaNet model.
  • Building a weapon detection system using RetinaNet that is capable of real-time video object detection.

Be sure to take a look — and don’t forget to grab your free sample chapters + table of contents PDF while you’re there!

Summary

In this tutorial, you learned how to perform traffic sign classification and recognition with Keras and Deep Learning.

To create our traffic sign classifier, we:

  • Utilized the popular German Traffic Sign Recognition Benchmark (GTSRB) as our dataset.
  • Implemented a Convolutional Neural Network called TrafficSignNet using the Keras deep learning library.
  • Trained TrafficSignNet on the GTSRB dataset, obtaining 95% accuracy.
  • Created a Python script that loads our trained TrafficSignNet model and then classifies new input images.

I hope you enjoyed today’s post on traffic sign classification with Keras!

If you’re interested in learning more about training your own custom deep learning models for traffic sign recognition and detection, be sure to refer to Deep Learning for Computer Vision with Python where I cover the topic in more detail.

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Traffic Sign Classification with Keras and Deep Learning appeared first on PyImageSearch.

Detecting Natural Disasters with Keras and Deep Learning


In this tutorial, you will learn how to automatically detect natural disasters (earthquakes, floods, wildfires, cyclones/hurricanes) with up to 95% accuracy using Keras, Computer Vision, and Deep Learning.

I remember the first time I ever experienced a natural disaster — I was just a kid in kindergarten, no more than 6-7 years old.

We were outside for recess, playing on the jungle gym, running around like the wild animals that young children are.

Rain was in the forecast. It was cloudy. And very humid.

My mother had given me a coat to wear outside, but I was hot and uncomfortable — the humidity made the cotton/polyester blend stick to my skin. The coat, just like the air around me, was suffocating.

All of a sudden the sky changed from “normal rain clouds” to an ominous green.

The recess monitor reached into her pocket, grabbed her whistle, and blew it, indicating it was time for us to settle our wild animal antics and come inside for schooling.

After recess we would typically sit in a circle around the teacher’s desk for show-and-tell.

But not this time.

We were immediately rushed into the hallway and were told to cover our heads with our hands — a tornado had just touched down near our school.

Just the thought of a tornado is enough to scare a kid.

But to actually experience one?

That’s something else entirely.

The wind picked up dramatically, an angry tempest howling and berating our school with tree branches, rocks, and whatever loose debris was not tied down.

The entire ordeal couldn’t have lasted more than 5-10 minutes — but it felt like a terrifying eternity.

It turned out that we were safe the entire time. After the tornado had touched down it started carving a path through the cornfields away from our school, not towards it.

We were lucky.

It’s interesting how experiences as a young kid, especially the ones that scare you, shape and mold you as you grow up.

A few days after the event my mom took me to the local library. I picked out every book on tornados and hurricanes that I could find. Even though I only had a basic reading level at the time, I devoured them, studying the pictures intently until I could recreate them in my mind — imagining what it would be like to be inside one of those storms.

Later, in graduate school, I experienced the historic June 29th, 2012 derecho that delivered 60+ MPH sustained winds and gusts of over 100 MPH, knocking down power lines and toppling large trees.

That storm killed 29 people, injured hundreds of others, and knocked out power across parts of the United States East Coast for over 6 days, an unprecedented amount of time in the modern-day United States.

Natural disasters cannot be prevented — but they can be detected, giving people precious time to get to safety.

In this tutorial, you’ll learn how we can use Computer Vision and Deep Learning to help detect natural disasters.

To learn how to detect natural disasters with Keras, Computer Vision, and Deep Learning, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Detecting Natural Disasters with Keras and Deep Learning

In the first part of this tutorial, we’ll discuss how computer vision and deep learning algorithms can be used to automatically detect natural disasters in images and video streams.

From there we’ll review our natural disaster dataset which consists of four classes:

  • Cyclone/hurricane
  • Earthquake
  • Flood
  • Wildfire

We’ll then design a set of experiments that will:

  • Help us fine-tune VGG16 (pre-trained on ImageNet) on our dataset.
  • Find optimal learning rates.
  • Train our model and obtain > 95% accuracy!

Let’s get started!

How can computer vision and deep learning detect natural disasters?

Figure 1: We can detect natural disasters with Keras and Deep Learning using a dataset of natural disaster images. (image source)

Natural disasters cannot be prevented — but they can be detected.

All around the world we use sensors to monitor for natural disasters:

  • Seismic sensors (seismometers) and vibration sensors (seismoscopes) are used to monitor for earthquakes (and downstream tsunamis).
  • Radar maps are used to detect the signature “hook echo” of a tornado (i.e., a hook that extends from the radar echo).
  • Flood sensors are used to measure moisture levels while water level sensors monitor the height of water along a river, stream, etc.
  • Wildfire sensors are still in their infancy but aim to detect trace amounts of smoke and fire.

Each of these sensors is highly specialized to the task at hand — detect a natural disaster early, alert people, and allow them to get to safety.

Using computer vision we can augment existing sensors, thereby increasing the accuracy of natural disaster detectors, and most importantly, allow people to take precautions, stay safe, and prevent/reduce the number of deaths and injuries that happen due to these disasters.

Our natural disasters image dataset

Figure 2: A dataset of natural disaster images. We’ll use this dataset to train a natural disaster detector with Keras and Deep Learning.

The dataset we are using here today was curated by PyImageSearch reader, Gautam Kumar.

Gautam used Google Images to gather a total of 4,428 images belonging to four separate classes:

  • Cyclone/Hurricane: 928 images
  • Earthquake: 1,350
  • Flood: 1,073
  • Wildfire: 1,077

He then trained a Convolutional Neural Network to recognize each of the natural disaster cases.

Gautam shared his work on his LinkedIn profile, garnering the attention of many deep learning practitioners (myself included). I asked him if he would be willing to (1) share his dataset with the PyImageSearch community and (2) allow me to write a tutorial using the dataset. Gautam agreed, and here we are today!

I again want to give a big, heartfelt thank you to Gautam for his hard work and contribution — be sure to thank him if you have the chance!

Downloading the natural disasters dataset

Figure 3: Gautam Kumar’s dataset for detecting natural disasters with Keras and deep learning.

You can use this link to download the original natural disasters dataset via Google Drive.

After you download the archive you should unzip it and inspect the contents:

$ tree --dirsfirst --filelimit 10 Cyclone_Wildfire_Flood_Earthquake_Database
Cyclone_Wildfire_Flood_Earthquake_Database
├── Cyclone [928 entries]
├── Earthquake [1350 entries]
├── Flood [1073 entries]
├── Wildfire [1077 entries]
└── readme.txt

4 directories, 1 file

Here you can see that each of the natural disasters has its own directory with examples of each class residing inside its respective parent directory.

Project structure

Using the tree command, let’s review today’s project available via the “Downloads” section of this tutorial:
$ tree --dirsfirst --filelimit 10
.
├── Cyclone_Wildfire_Flood_Earthquake_Database
│   ├── Cyclone [928 entries]
│   ├── Earthquake [1350 entries]
│   ├── Flood [1073 entries]
│   ├── Wildfire [1077 entries]
│   └── readme.txt
├── output
│   ├── natural_disaster.model
│   │   ├── assets
│   │   ├── variables
│   │   │   ├── variables.data-00000-of-00002
│   │   │   ├── variables.data-00001-of-00002
│   │   │   └── variables.index
│   │   └── saved_model.pb
│   ├── clr_plot.png
│   ├── lrfind_plot.png
│   └── training_plot.png
├── pyimagesearch
│   ├── __init__.py
│   ├── clr_callback.py
│   ├── config.py
│   └── learningratefinder.py
├── videos
│   ├── floods_101_nat_geo.mp4
│   ├── fort_mcmurray_wildfire.mp4
│   ├── hurricane_lorenzo.mp4
│   ├── san_andreas.mp4
│   └── terrific_natural_disasters_compilation.mp4
├── Cyclone_Wildfire_Flood_Earthquake_Database.zip
├── train.py
└── predict.py

11 directories, 20 files

Our project contains:

  • The natural disaster dataset. Refer to the previous two sections.
  • An output/ directory where our model and plots will be stored. The results from my experiment are included.
  • Our pyimagesearch module containing our Cyclical Learning Rate Keras callback, a configuration file, and Keras Learning Rate Finder.
  • A selection of videos/ for testing the video classification prediction script.
  • Our training script, train.py. This script will perform fine-tuning on a VGG16 model pre-trained on the ImageNet dataset.
  • Our video classification prediction script, predict.py, which performs a rolling average prediction to classify the video in real-time.

Our configuration file

Our project is going to span multiple Python files, so to keep our code tidy and organized (and ensure that we don’t have a multitude of command line arguments), let’s instead create a configuration file to store all important paths and variables.

Open up the config.py file inside the pyimagesearch module and insert the following code:
# import the necessary packages
import os

# initialize the path to the input directory containing our dataset
# of images
DATASET_PATH = "Cyclone_Wildfire_Flood_Earthquake_Database"

# initialize the class labels in the dataset
CLASSES = ["Cyclone", "Earthquake", "Flood", "Wildfire"]

The os module import allows us to build OS-agnostic paths directly in this config file (Line 2).

Line 6 specifies the root path to our natural disaster dataset.

Line 7 provides the names of class labels (i.e. the names of the subdirectories in the dataset).

Let’s define our dataset splits:

# define the size of the training, validation (which comes from the
# train split), and testing splits, respectively
TRAIN_SPLIT = 0.75
VAL_SPLIT = 0.1
TEST_SPLIT = 0.25

Lines 13-15 house our training, testing, and validation split sizes. Take note that the validation split is 10% of the training split (not 10% of all the data).
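
As a rough sanity check on what those fractions produce for the 4,428-image dataset (back-of-the-envelope arithmetic of mine; scikit-learn's exact rounding may shift each count by one):

# approximate split sizes for the 4,428-image natural disaster dataset
total = 4428
numTest = int(total * 0.25)       # ~1,107 testing images
numTrainVal = total - numTest     # ~3,321 images remain
numVal = int(numTrainVal * 0.10)  # ~332 validation images
numTrain = numTrainVal - numVal   # ~2,989 training images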

Next, we’ll define our training parameters:

# define the minimum learning rate, maximum learning rate, batch size,
# step size, CLR method, and number of epochs
MIN_LR = 1e-6
MAX_LR = 1e-4
BATCH_SIZE = 32
STEP_SIZE = 8
CLR_METHOD = "triangular"
NUM_EPOCHS = 48

Lines 19 and 20 contain the minimum and maximum learning rate for Cyclical Learning Rates (CLR). We’ll learn how to set these learning rate values in the “Finding our initial learning rate” section below.

Lines 21-24 define the batch size, step size, CLR method, and the number of training epochs.
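
The step size is expressed in epochs: each half of a CLR cycle lasts eight epochs’ worth of batches, so one full triangular cycle spans 16 epochs and our 48 epochs of training yields exactly three complete cycles. In raw iterations (using the approximate training set size from above; the numbers are rough):

# iterations per half cycle, mirroring the computation train.py performs later
BATCH_SIZE = 32
STEP_SIZE = 8
numTrain = 2989                                  # approximate training count
stepSize = STEP_SIZE * (numTrain // BATCH_SIZE)  # 8 * 93 = 744 iterations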

From there we’ll define the output paths:

# set the path to the serialized model after training
MODEL_PATH = os.path.sep.join(["output", "natural_disaster.model"])

# define the path to the output learning rate finder plot, training
# history plot and cyclical learning rate plot
LRFIND_PLOT_PATH = os.path.sep.join(["output", "lrfind_plot.png"])
TRAINING_PLOT_PATH = os.path.sep.join(["output", "training_plot.png"])
CLR_PLOT_PATH = os.path.sep.join(["output", "clr_plot.png"])

Lines 27-33 define the following output paths:

  • Serialized model after training
  • Learning rate finder plot
  • Training history plot
  • CLR plot

Implementing our training script with Keras

Our training procedure will consist of two steps:

  1. Step #1: Use our learning rate finder to find optimal learning rates to fine-tune our VGG16 CNN on our dataset.
  2. Step #2: Use our optimal learning rates in conjunction with Cyclical Learning Rates (CLR) to obtain a high accuracy model.

Our train.py file will handle both of these steps.

Go ahead and open up train.py in your favorite code editor and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from pyimagesearch.learningratefinder import LearningRateFinder
from pyimagesearch.clr_callback import CyclicLR
from pyimagesearch import config
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import pickle
import cv2
import sys
import os

Lines 2-27 import necessary packages including:

  • matplotlib: For plotting (using the "Agg" backend so plot images can be saved to disk).
  • tensorflow: Imports including our VGG16 CNN, data augmentation, layer types, and SGD optimizer.
  • scikit-learn: Imports including a label binarizer, dataset splitting function, and an evaluation reporting tool.
  • LearningRateFinder: Our Keras Learning Rate Finder class.
  • CyclicLR: A Keras callback that oscillates learning rates, known as Cyclical Learning Rates. CLRs lead to faster convergence and typically require fewer experiments for hyperparameter updates.
  • config: The custom configuration settings we reviewed in the previous section.
  • paths: Includes a function for listing the image paths in a directory tree.
  • cv2: OpenCV for preprocessing and display.

Let’s parse command line arguments and grab our image paths:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-f", "--lr-find", type=int, default=0,
	help="whether or not to find optimal learning rate")
args = vars(ap.parse_args())

# grab the paths to all images in our dataset directory and initialize
# our lists of images and class labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(config.DATASET_PATH))
data = []
labels = []

Recall that most of our settings are in config.py. There is one exception. The --lr-find command line argument tells our script whether or not to find the optimal learning rate (Lines 30-33).

Line 38 grabs paths to all images in our dataset.

We then initialize two synchronized lists to hold our image data and labels (Lines 39 and 40).

Let’s populate the data and labels lists now:
# loop over the image paths
for imagePath in imagePaths:
	# extract the class label
	label = imagePath.split(os.path.sep)[-2]

	# load the image, convert it to RGB channel ordering, and resize
	# it to be a fixed 224x224 pixels, ignoring aspect ratio
	image = cv2.imread(imagePath)
	image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
	image = cv2.resize(image, (224, 224))

	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)

# convert the data and labels to NumPy arrays
print("[INFO] processing data...")
data = np.array(data, dtype="float32")
labels = np.array(labels)
 
# perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)

Lines 43-55 loop over imagePaths, while:
  • Extracting the class label from the path (Line 45).
  • Loading and preprocessing the image (Lines 49-51). Images are converted to RGB channel ordering and resized to 224×224 for VGG16.
  • Adding the preprocessed image to the data list (Line 54).
  • Adding the label to the labels list (Line 55).

Line 59 performs a final preprocessing step by converting the data to a "float32" datatype NumPy array.

Similarly, Line 60 converts labels to an array so that Lines 63 and 64 can perform one-hot encoding.
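
As a quick illustration of what LabelBinarizer produces here (toy labels of mine; scikit-learn orders the columns by sorted class name):

from sklearn.preprocessing import LabelBinarizer
import numpy as np

lb = LabelBinarizer()
encoded = lb.fit_transform(np.array(["Flood", "Cyclone", "Wildfire", "Earthquake"]))
print(lb.classes_)  # ['Cyclone' 'Earthquake' 'Flood' 'Wildfire']
print(encoded[0])   # [0 0 1 0] -- "Flood" is the third class alphabetically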

From here, we’ll partition our data and set up data augmentation:

# partition the data into training and testing splits
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=config.TEST_SPLIT, random_state=42)

# take the validation split from the training split
(trainX, valX, trainY, valY) = train_test_split(trainX, trainY,
	test_size=config.VAL_SPLIT, random_state=84)

# initialize the training data augmentation object
aug = ImageDataGenerator(
	rotation_range=30,
	zoom_range=0.15,
	width_shift_range=0.2,
	height_shift_range=0.2,
	shear_range=0.15,
	horizontal_flip=True,
	fill_mode="nearest")

Lines 67-72 construct training, testing, and validation splits.

Lines 75-82 instantiate our data augmentation object. Read more about data augmentation in my previous posts as well as in the Practitioner Bundle of Deep Learning for Computer Vision with Python.

At this point we’ll set up our VGG16 model for fine-tuning:

# load the VGG16 network, ensuring the head FC layer sets are left
# off
baseModel = VGG16(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))

# construct the head of the model that will be placed on top of the
# the base model
headModel = baseModel.output
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(512, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(len(config.CLASSES), activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False

# compile our model (this needs to be done after setting our
# layers to non-trainable)
print("[INFO] compiling model...")
opt = SGD(lr=config.MIN_LR, momentum=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

Lines 86 and 87 load VGG16 using pre-trained ImageNet weights (but without the fully-connected layer head).

Lines 91-95 create a new fully-connected layer head followed by Line 99 which adds the new FC layer to the body of VGG16.

Lines 103 and 104 mark the body of VGG16 as not trainable — we will be training (i.e. fine-tuning) only the FC layer head.

Lines 109-111 then compile our model with the Stochastic Gradient Descent (SGD) optimizer and our specified minimum learning rate.

The first time you run the script, you should set the --lr-find command line argument to use the Keras Learning Rate Finder to determine the optimal learning rate. Let’s see how that works:
# check to see if we are attempting to find an optimal learning rate
# before training for the full number of epochs
if args["lr_find"] > 0:
	# initialize the learning rate finder and then train with learning
	# rates ranging from 1e-10 to 1e+1
	print("[INFO] finding learning rate...")
	lrf = LearningRateFinder(model)
	lrf.find(
		aug.flow(trainX, trainY, batch_size=config.BATCH_SIZE),
		1e-10, 1e+1,
		stepsPerEpoch=np.ceil((trainX.shape[0] / float(config.BATCH_SIZE))),
		epochs=20,
		batchSize=config.BATCH_SIZE)
 
	# plot the loss for the various learning rates and save the
	# resulting plot to disk
	lrf.plot_loss()
	plt.savefig(config.LRFIND_PLOT_PATH)
 
	# gracefully exit the script so we can adjust our learning rates
	# in the config and then train the network for our full set of
	# epochs
	print("[INFO] learning rate finder complete")
	print("[INFO] examine plot and adjust learning rates before training")
	sys.exit(0)

Line 115 checks to see if we should attempt to find optimal learning rates. Assuming so, we:

  • Initialize LearningRateFinder (Line 119).
  • Start training with a 1e-10 learning rate and exponentially increase it until we hit 1e+1 (Lines 120-125).
  • Plot the loss vs. learning rate and save the resulting figure (Lines 129 and 130).
  • Gracefully exit the script after printing a message instructing the user to inspect the learning rate finder plot (Lines 135-137).

After this code executes we now need to:

  1. Step #1: Review the generated plot.
  2. Step #2: Update config.py with our MIN_LR and MAX_LR, respectively.
  3. Step #3: Train the network on our full dataset.

Assuming we have completed Steps #1 and #2, let’s now handle Step #3 where our minimum and maximum learning rate have already been found and updated in the config.

In this case, it is time to initialize our Cyclical Learning Rate class and commence training:

# otherwise, we have already defined a learning rate space to train
# over, so compute the step size and initialize the cyclic learning
# rate method
stepSize = config.STEP_SIZE * (trainX.shape[0] // config.BATCH_SIZE)
clr = CyclicLR(
	mode=config.CLR_METHOD,
	base_lr=config.MIN_LR,
	max_lr=config.MAX_LR,
	step_size=stepSize)

# train the network
print("[INFO] training network...")
H = model.fit_generator(
	aug.flow(trainX, trainY, batch_size=config.BATCH_SIZE),
	validation_data=(valX, valY),
	steps_per_epoch=trainX.shape[0] // config.BATCH_SIZE,
	epochs=config.NUM_EPOCHS,
	callbacks=[clr],
	verbose=1)

Lines 142-147 initialize our CyclicLR.
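
For intuition, the "triangular" policy linearly ramps the learning rate from base_lr up to max_lr and back down over each cycle. The standard formulation from Leslie Smith's CLR paper boils down to the following (a standalone sketch of the policy, not the CyclicLR callback's actual source):

import numpy as np

def triangular_lr(iteration, base_lr, max_lr, step_size):
	# which cycle we are in and where we sit within it
	cycle = np.floor(1 + iteration / (2.0 * step_size))
	x = np.abs(iteration / float(step_size) - 2 * cycle + 1)
	# scale linearly between base_lr and max_lr
	return base_lr + (max_lr - base_lr) * np.maximum(0, 1 - x)

At iteration 0 this returns base_lr, at iteration step_size it returns max_lr, and it ramps back down to base_lr by iteration 2 * step_size.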

Lines 151-157 then train our model using .fit_generator with our aug data augmentation object and our clr callback.

Upon training completion, we proceed to evaluate and save our model:
# evaluate the network and show a classification report
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=config.BATCH_SIZE)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=config.CLASSES))

# serialize the model to disk
print("[INFO] serializing network to '{}'...".format(config.MODEL_PATH))
model.save(config.MODEL_PATH)

Line 161 makes predictions on our test set. Those predictions are passed into Lines 162 and 163 which print a classification report summary.

Line 167 serializes and saves the fine-tuned model to disk.

Finally, let’s plot both our training history and CLR history:

# construct a plot that plots and saves the training history
N = np.arange(0, config.NUM_EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(config.TRAINING_PLOT_PATH)

# plot the learning rate history
N = np.arange(0, len(clr.history["lr"]))
plt.figure()
plt.plot(N, clr.history["lr"])
plt.title("Cyclical Learning Rate (CLR)")
plt.xlabel("Training Iterations")
plt.ylabel("Learning Rate")
plt.savefig(config.CLR_PLOT_PATH)

Lines 170-181 generate a plot of our training history and save the plot to disk.

Note: In TensorFlow 2.0, the history dictionary keys have changed from acc to accuracy and val_acc to val_accuracy. It is especially confusing since "accuracy" is spelled out now, but "validation" is not. Take special care with this nuance depending on your TensorFlow version.

Lines 184-190 plot our Cyclical Learning Rate history and save the figure to disk.

Finding our initial learning rate

Before we attempt to fine-tune our model to recognize natural disasters, let’s first use our learning rate finder to find an optimal set of learning rate ranges. Using this optimal learning rate range we’ll then be able to apply Cyclical Learning Rates to improve our model accuracy.

Make sure you have both:

  1. Used the “Downloads” section of this tutorial to download the source code.
  2. Downloaded the dataset using the “Downloading the natural disasters dataset” section above.

From there, open up a terminal and execute the following command:

$ python train.py --lr-find 1
[INFO] loading images...
[INFO] processing data...
[INFO] compiling model...
[INFO] finding learning rate...
Epoch 1/20
94/94 [==============================] - 29s 314ms/step - loss: 9.7411 - accuracy: 0.2664
Epoch 2/20
94/94 [==============================] - 28s 295ms/step - loss: 9.5912 - accuracy: 0.2701
Epoch 3/20
94/94 [==============================] - 27s 291ms/step - loss: 9.4601 - accuracy: 0.2731
...
Epoch 12/20
94/94 [==============================] - 27s 290ms/step - loss: 2.7111 - accuracy: 0.7764
Epoch 13/20
94/94 [==============================] - 27s 286ms/step - loss: 5.9785 - accuracy: 0.6084
Epoch 14/20
47/94 [==============>...............] - ETA: 13s - loss: 10.8441 - accuracy: 0.3261
[INFO] learning rate finder complete
[INFO] examine plot and adjust learning rates before training

Provided the train.py script exited without error, you should now have a file named lrfind_plot.png in your output directory.

Take a second now to inspect this image:

Figure 4: Using a Keras Learning Rate Finder to find the optimal learning rates to fine tune our CNN on our natural disaster dataset. We will use the dataset to train a model for detecting natural disasters with the Keras deep learning framework.

Examining the plot you can see that our model initially starts to learn and gain traction around 1e-6.

Our loss continues to drop until approximately 1e-4, where it starts to rise again, a sure sign that the learning rate has become too large and training is starting to diverge.

Our optimal learning rate range is, therefore, 1e-6 to 1e-4.

Update our learning rates

Now that we know our optimal learning rates, let’s go back to our config.py file and update them accordingly:
# define the minimum learning rate, maximum learning rate, batch size,
# step size, CLR method, and number of epochs
MIN_LR = 1e-6
MAX_LR = 1e-4
BATCH_SIZE = 32
STEP_SIZE = 8
CLR_METHOD = "triangular"
NUM_EPOCHS = 48

Notice on Lines 19 and 20 of our configuration file that the MIN_LR and MAX_LR learning rate values are freshly updated. These values were found by inspecting our Keras Learning Rate Finder plot in the section above.

Training the natural disaster detection model with Keras

We can now fine-tune our model to recognize natural disasters!

Execute the following command which will train our network over the full set of epochs:

$ python train.py
[INFO] loading images...
[INFO] processing data...
[INFO] compiling model...
[INFO] training network...
Epoch 1/48
93/93 [==============================] - 32s 343ms/step - loss: 8.5819 - accuracy: 0.3254 - val_loss: 2.5915 - val_accuracy: 0.6829
Epoch 2/48
93/93 [==============================] - 30s 320ms/step - loss: 4.2144 - accuracy: 0.6194 - val_loss: 1.2390 - val_accuracy: 0.8573
Epoch 3/48
93/93 [==============================] - 29s 316ms/step - loss: 2.5044 - accuracy: 0.7605 - val_loss: 1.0052 - val_accuracy: 0.8862
Epoch 4/48
93/93 [==============================] - 30s 322ms/step - loss: 2.0702 - accuracy: 0.8011 - val_loss: 0.9150 - val_accuracy: 0.9070
Epoch 5/48
93/93 [==============================] - 29s 313ms/step - loss: 1.5996 - accuracy: 0.8366 - val_loss: 0.7397 - val_accuracy: 0.9268
...
Epoch 44/48
93/93 [==============================] - 28s 304ms/step - loss: 0.2180 - accuracy: 0.9275 - val_loss: 0.2608 - val_accuracy: 0.9476
Epoch 45/48
93/93 [==============================] - 29s 315ms/step - loss: 0.2521 - accuracy: 0.9178 - val_loss: 0.2693 - val_accuracy: 0.9449
Epoch 46/48
93/93 [==============================] - 29s 312ms/step - loss: 0.2330 - accuracy: 0.9284 - val_loss: 0.2687 - val_accuracy: 0.9467
Epoch 47/48
93/93 [==============================] - 29s 310ms/step - loss: 0.2120 - accuracy: 0.9322 - val_loss: 0.2646 - val_accuracy: 0.9476
Epoch 48/48
93/93 [==============================] - 29s 311ms/step - loss: 0.2237 - accuracy: 0.9318 - val_loss: 0.2664 - val_accuracy: 0.9485
[INFO] evaluating network...
              precision    recall  f1-score   support

     Cyclone       0.99      0.97      0.98       205
  Earthquake       0.96      0.93      0.95       362
       Flood       0.90      0.94      0.92       267
    Wildfire       0.96      0.97      0.96       273

    accuracy                           0.95      1107
   macro avg       0.95      0.95      0.95      1107
weighted avg       0.95      0.95      0.95      1107

[INFO] serializing network to 'output/natural_disaster.model'...

Here you can see that we are obtaining 95% accuracy when recognizing natural disasters in the testing set!

Examining our training plot we can see that our validation loss follows our training loss, implying there is little overfitting within our dataset itself:

Figure 5: Training history accuracy/loss curves for creating a natural disaster classifier using Keras and deep learning.

Finally, we have our learning rate plot, which shows how our CLR callback oscillates the learning rate between our MIN_LR and MAX_LR, respectively:

Figure 6: Cyclical learning rates are used with Keras and deep learning for detecting natural disasters.

Implementing our natural disaster prediction script

Now that our model has been trained, let’s see how we can use it to make predictions on images/video it has never seen before — and thereby pave the way for an automatic natural disaster detection system.

To create this script we’ll take advantage of the temporal nature of videos, specifically the assumption that subsequent frames in a video will have similar semantic contents.

By performing rolling prediction averaging we’ll be able to “smooth out” the predictions and avoid “prediction flickering”.

I have already covered this near-identical script in-depth in my Video Classification with Keras and Deep Learning article. Be sure to refer to that article for the full background and more-detailed code explanations.
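
The core idea is small enough to sketch on its own: keep the last K per-frame prediction vectors in a fixed-length deque and average them before taking the argmax (a toy illustration with made-up probabilities, not the predict.py code itself):

from collections import deque
import numpy as np

Q = deque(maxlen=3)

# three hypothetical softmax outputs over [Cyclone, Earthquake, Flood,
# Wildfire]; the middle frame "flickers" toward Earthquake
for preds in ([0.1, 0.2, 0.6, 0.1], [0.1, 0.5, 0.3, 0.1], [0.1, 0.2, 0.6, 0.1]):
	Q.append(preds)

results = np.array(Q).mean(axis=0)  # [0.1, 0.3, 0.5, 0.1]
print(np.argmax(results))           # 2 -> "Flood", despite the flicker

Because the deque has a fixed maxlen, the oldest prediction is evicted automatically as each new frame arrives.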

To accomplish natural disaster video classification let’s inspect predict.py:
# import the necessary packages
from tensorflow.keras.models import load_model
from pyimagesearch import config
from collections import deque
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True,
	help="path to our input video")
ap.add_argument("-o", "--output", required=True,
	help="path to our output video")
ap.add_argument("-s", "--size", type=int, default=128,
	help="size of queue for averaging")
ap.add_argument("-d", "--display", type=int, default=-1,
	help="whether or not output frame should be displayed to screen")
args = vars(ap.parse_args())

Lines 2-7 load necessary packages and modules. In particular, we’ll be using deque from Python’s collections module to assist with our rolling average algorithm.

Lines 10-19 parse command line arguments including the path to our input/output videos, size of our rolling averaging queue, and whether we will display the output frame to our screen while the video is being generated.

Let’s go ahead and load our natural disaster classification model and initialize our queue + video stream:

# load the trained model from disk
print("[INFO] loading model and label binarizer...")
model = load_model(config.MODEL_PATH)

# initialize the predictions queue
Q = deque(maxlen=args["size"])

# initialize the video stream, pointer to output video file, and
# frame dimensions
print("[INFO] processing video...")
vs = cv2.VideoCapture(args["input"])
writer = None
(W, H) = (None, None)

With our model, Q, and vs ready to go, we’ll begin looping over frames:
# loop over frames from the video file stream
while True:
	# read the next frame from the file
	(grabbed, frame) = vs.read()
 
	# if the frame was not grabbed, then we have reached the end
	# of the stream
	if not grabbed:
		break
 
	# if the frame dimensions are empty, grab them
	if W is None or H is None:
		(H, W) = frame.shape[:2]

	# clone the output frame, then convert it from BGR to RGB
	# ordering and resize the frame to a fixed 224x224
	output = frame.copy()
	frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
	frame = cv2.resize(frame, (224, 224))
	frame = frame.astype("float32")

Lines 38-47 grab a frame and store its dimensions.

Lines 51-54 duplicate our frame for output purposes and then preprocess it for classification. The preprocessing steps are, and must be, the same as those that we performed for training.

Now let’s make a natural disaster prediction on the frame:

# make predictions on the frame and then update the predictions
	# queue
	preds = model.predict(np.expand_dims(frame, axis=0))[0]
	Q.append(preds)

	# perform prediction averaging over the current history of
	# previous predictions
	results = np.array(Q).mean(axis=0)
	i = np.argmax(results)
	label = config.CLASSES[i]

Lines 58 and 59 perform inference and add the predictions to our queue.

Line 63 performs a rolling average prediction of the predictions available in the Q.

Lines 64 and 65 then extract the highest probability class label so that we can annotate our frame:

# draw the activity on the output frame
	text = "activity: {}".format(label)
	cv2.putText(output, text, (35, 50), cv2.FONT_HERSHEY_SIMPLEX,
		1.25, (0, 255, 0), 5)
 
	# check if the video writer is None
	if writer is None:
		# initialize our video writer
		fourcc = cv2.VideoWriter_fourcc(*"MJPG")
		writer = cv2.VideoWriter(args["output"], fourcc, 30,
			(W, H), True)
 
	# write the output frame to disk
	writer.write(output)
 
	# check to see if we should display the output frame to our
	# screen
	if args["display"] > 0:
		# show the output image
		cv2.imshow("Output", output)
		key = cv2.waitKey(1) & 0xFF
	 
		# if the `q` key was pressed, break from the loop
		if key == ord("q"):
			break
 
# release the file pointers
print("[INFO] cleaning up...")
writer.release()
vs.release()

Lines 68-70 annotate the natural disaster activity in the corner of the output frame.

Lines 73-80 handle writing the output frame to a video file.

If the --display flag is set, Lines 84-91 display the frame to the screen and capture keypresses.

Otherwise, processing continues until completion at which point the loop is finished and we perform cleanup (Lines 95 and 96).

Predicting natural disasters with Keras

For the purposes of this tutorial, I downloaded example natural disaster videos via YouTube — the exact videos are listed in the “Credits” section below. You can either use your own example videos or download the videos via the credits list.

Either way, make sure you have used the “Downloads” section of this tutorial to download the source code and pre-trained natural disaster prediction model.

Once downloaded you can use the following command to launch the predict.py script:
$ python predict.py --input videos/terrific_natural_disasters_compilation.mp4 \
	--output output/natural_disasters_output.avi
[INFO] processing video...
[INFO] cleaning up...

Here you can see a sample result of our model correctly classifying this video clip as “flood”:

Figure 7: Natural disaster “flood” classification with Keras and Deep Learning.

The following example comes from the 2016 Fort McMurray wildfire:

Figure 8: Detecting “wildfires” and other natural disasters with Keras, deep learning, and computer vision.

For fun, I then tried applying the natural disaster detector to the movie San Andreas (2015):

Figure 9: Detecting “earthquake” damage with Keras, deep learning, and Python.

Notice how our model was able to correctly label the video clip as an (overly dramatized) earthquake.

You can find a full demo video below:

Where to next?

Figure 10: My deep learning book is the go-to resource for deep learning students, developers, researchers, and hobbyists, alike. Use the book to build your skillset from the bottom up, or read it to gain a deeper understanding. Don’t be left in the dust as the fast paced AI revolution continues to accelerate.

Today’s tutorial helped us solve a real-world classification problem for classifying natural disaster videos.

Such an application could be:

  • Deployed along riverbeds and streams to monitor water levels and detect floods early.
  • Utilized by park rangers to monitor for wildfires.
  • Employed by meteorologists to automatically detect hurricanes/cyclones.
  • Used by television news companies to sort their archives of video footage.

We created our natural disaster detector by utilizing a number of important deep learning techniques:

  • Fine-tuning a Convolutional Neural Network that was trained on ImageNet
  • Using a Keras Learning Rate Finder
  • Implementing the Cyclical Learning Rate callback into our training process to improve model accuracy
  • Performing video classification with a rolling frame classification average approach

Admittedly, these are advanced concepts in the realm of deep learning and computer vision. If you have your own real-world project you’re trying to solve, you need a strong deep learning foundation in addition to familiarity with advanced concepts.

To jumpstart your education, including discovering my tips, suggestions, and best practices when training deep neural networks, be sure to refer to my book, Deep Learning for Computer Vision with Python.

Inside the book I cover:

  1. Deep learning fundamentals and theory without unnecessary mathematical fluff. I present the basic equations and back them up with code walkthroughs that you can implement and easily understand. You don’t need a degree in advanced mathematics to understand this book.
  2. More details on learning rates, tuning them, and how a solid understanding of the concept dramatically impacts the accuracy of your model.
  3. How to spot underfitting and overfitting on-the-fly, saving you days of training time.
  4. How to perform fine-tuning on pre-trained models, which is often the place I start to obtain a baseline result to beat.
  5. My tips/tricks, suggestions, and best practices for training CNNs.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

Credits

Dataset curator: Gautam Kumar

Video sources for the demo:

Audio for the demo video:

Summary

In this tutorial, you learned how to use computer vision and the Keras deep learning library to automatically detect natural disasters from images.

To create our natural disaster detector we fine-tuned VGG16 (pre-trained on ImageNet) on a dataset of 4,428 images belonging to four classes:

  • Cyclone/hurricane
  • Earthquake
  • Flood
  • Wildfire

After our model was trained we evaluated it on the testing set, finding that it obtained 95% classification accuracy.

Using this model you can continue to perform research in natural disaster detection, ultimately helping save lives and reduce injury.

I hope you enjoyed this post!

To download the source code to the post (and be notified when future tutorials are published on PyImageSearch), just enter your email address in the form below.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

The post Detecting Natural Disasters with Keras and Deep Learning appeared first on PyImageSearch.
