Build a Traffic Sign Recognition Project
The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
- Extra: visualize some layers of the neural network
Here I will consider the rubric points individually and describe how I addressed each point in my implementation.
I used Python to calculate summary statistics of the traffic signs data set (a sketch of this calculation follows the list below):
- The size of training set is 34,799
- The size of the validation set is 4,410
- The size of test set is 12,630
- The shape of a traffic sign image is (32, 32, 3) -> 32x32 pixels with 3 color channels
- The number of unique classes/labels in the data set is 43
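These statistics can be computed with a few lines of numpy, roughly as follows (a sketch; the variable names `X_train`, `y_train`, etc. are illustrative and assume the pickled data has already been loaded):

```python
import numpy as np

# Assumed variable names for the loaded data sets:
# (X_train, y_train), (X_valid, y_valid), (X_test, y_test)
n_train = X_train.shape[0]           # 34,799
n_valid = X_valid.shape[0]           # 4,410
n_test = X_test.shape[0]             # 12,630
image_shape = X_train.shape[1:]      # (32, 32, 3)
n_classes = np.unique(y_train).size  # 43

print("Training examples:  ", n_train)
print("Validation examples:", n_valid)
print("Test examples:      ", n_test)
print("Image shape:        ", image_shape)
print("Number of classes:  ", n_classes)
```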
Here is an exploratory visualization of the data set. I used numpy and matplotlib to calculate the statistics of the data and visualize it. The following image is a bar chart showing the fraction of images of each traffic-sign class out of the total images in the relevant set. For example, for traffic-sign class 0, the training set contains ~0.005 = 0.5% images of this traffic sign. The green, red and blue bars represent the training, validation and test sets respectively.
It is clear that the fraction of each class is similar across all sets. However, some classes appear far more often than others. This imbalance will probably cause the trained neural network to identify the larger classes with greater success. A sketch of the plotting code is shown below.
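A minimal sketch of how the bar chart can be produced, assuming `y_train`, `y_valid` and `y_test` are the label arrays:

```python
import numpy as np
import matplotlib.pyplot as plt

def class_fractions(labels, n_classes=43):
    """Fraction of images belonging to each class within one data set."""
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()

x = np.arange(43)
width = 0.3
plt.bar(x - width, class_fractions(y_train), width, color='g', label='training')
plt.bar(x,         class_fractions(y_valid), width, color='r', label='validation')
plt.bar(x + width, class_fractions(y_test),  width, color='b', label='testing')
plt.xlabel('traffic-sign class')
plt.ylabel('fraction of set')
plt.legend()
plt.show()
```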
In this section I'll describe the image processing method I used and show an example of 5 processed images. Here are the original images:
Grayscale and Normalization First, I converted all images to grayscale. After experimenting with architectures for both 3 color channels and a single grayscale channel, I concluded that there is no benefit to using the 3 color channels as input: the predictions on the validation set did not improve. Furthermore, as reported in this paper, grayscale images yielded better results. To convert to grayscale, I converted the images to YUV format and kept the Y channel. Second, I normalized the images to values between -1 and 1. This was done to prevent scaling factors between different data samples; it also helps prevent numerical errors and makes the training process faster. Below is a sketch of this preprocessing, followed by the 5 grayscaled and normalized images:
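A sketch of this preprocessing step (the helper name is illustrative; the exact code in the notebook may differ):

```python
import numpy as np

def preprocess(images):
    """Convert RGB images to the Y (luma) channel and scale to [-1, 1].

    `images` is assumed to be a uint8 array of shape (N, 32, 32, 3).
    """
    # Y channel of the YUV colour space (ITU-R BT.601 weights)
    y = (0.299 * images[..., 0] +
         0.587 * images[..., 1] +
         0.114 * images[..., 2])
    # Scale from [0, 255] to [-1, 1]
    y = (y - 128.0) / 128.0
    # Keep a single channel dimension: (N, 32, 32, 1)
    return y[..., np.newaxis]
```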
Generating Additional Data (not implemented) This paper suggests adding more training data by taking the existing data and applying processing such as translating, rotating, scaling and adding noise to the grayscaled images. This is done so the neural network becomes robust to these variations. I tried this approach but it yielded no better results. Instead, I processed the existing training data similarly to how the added data would have been processed.
Translating, Rotating, Scaling and Adding Noise The training images were translated, rotated, scaled and had noise added to them. The amount of translation, rotation, scaling and noise was randomized with a truncated Gaussian distribution, and the maximum values were chosen similarly to the values used in the paper (a sketch of this jittering follows the list below):
- The translation max is 3 px (width and height each), with std of 1
- The rotation max is 15 deg, with std of 5
- The scaling max is 1±0.1, with std of 0.04
- The noise was not truncated, with std of 0.01
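A sketch of how this jittering can be implemented with scipy; the helper names and the edge-padding choice are illustrative assumptions, not necessarily the exact code used in the notebook:

```python
import numpy as np
from scipy import ndimage

def truncated_normal(std, max_abs):
    """Draw from N(0, std) truncated to [-max_abs, max_abs] by rejection."""
    while True:
        v = np.random.normal(0.0, std)
        if abs(v) <= max_abs:
            return v

def _crop_or_pad(img, size):
    """Centre-crop or edge-pad a square image back to `size` x `size`."""
    h = img.shape[0]
    if h > size:
        start = (h - size) // 2
        return img[start:start + size, start:start + size]
    pad = (size - h) // 2
    return np.pad(img, ((pad, size - h - pad), (pad, size - h - pad)), mode='edge')

def jitter(image):
    """Randomly translate, rotate, scale and add noise to one grayscale image.

    `image` is assumed to be a (32, 32, 1) array already scaled to [-1, 1].
    """
    img = image[..., 0]
    # Translation: up to 3 px in each direction, std 1
    shift = (truncated_normal(1.0, 3.0), truncated_normal(1.0, 3.0))
    img = ndimage.shift(img, shift, mode='nearest')
    # Rotation: up to 15 degrees, std 5
    angle = truncated_normal(5.0, 15.0)
    img = ndimage.rotate(img, angle, reshape=False, mode='nearest')
    # Scaling: 1 +/- 0.1, std 0.04 (zoom about the centre, then crop/pad back to 32x32)
    scale = 1.0 + truncated_normal(0.04, 0.1)
    img = _crop_or_pad(ndimage.zoom(img, scale, mode='nearest'), 32)
    # Additive Gaussian noise, std 0.01 (not truncated)
    img = img + np.random.normal(0.0, 0.01, img.shape)
    return img[..., np.newaxis]
```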
Here are the processed images:
My final model consisted of the following layers:
Layer | Description |
---|---|
Input | 32x32x1 grayscale image |
Convolution 5x5 | 1x1 stride, valid padding, outputs 28x28x16 |
Pooling 2x2 | 2x2 stride, valid padding, outputs 14x14x16 |
RELU | outputs 14x14x16 |
Convolution 5x5 | 1x1 stride, valid padding, outputs 10x10x32 |
RELU | outputs 10x10x32 |
Inception | Combination of: 1x1 conv, pooling with 1x1 conv, 1x1 conv followed by 6x6 conv, 1x1 conv followed by 2x2 conv. Outputs 5x5x64 |
Flattening | width 1600 |
Fully connected | width 533 |
Fully connected with dropout = 0.5 | width 266 |
Fully connected with dropout = 0.5 | width 43 |
SoftMax | width 43 |
I chose this model (including the Inception-style module) after drawing inspiration from Google's paper. I tried several architectures with different layer depths, and this one performed best for this task.
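For reference, here is one way an Inception-style module with the four branches listed in the table can be written in TensorFlow 1.x. The branch depths and the stride-2 down-sampling are illustrative assumptions chosen so that a 10x10x32 input produces the 5x5x64 output in the table; they are not necessarily identical to the notebook code.

```python
import tensorflow as tf

def conv(x, ksize, depth, stride=1, padding='SAME'):
    """Convolution + ReLU helper (TF 1.x style)."""
    in_depth = int(x.get_shape()[-1])
    w = tf.Variable(tf.truncated_normal([ksize, ksize, in_depth, depth], stddev=0.05))
    b = tf.Variable(tf.zeros([depth]))
    return tf.nn.relu(tf.nn.conv2d(x, w, [1, stride, stride, 1], padding) + b)

def inception_module(x):
    """Concatenate the four branches described in the table above."""
    branch1 = conv(x, 1, 16, stride=2)                                # 1x1 conv
    pooled = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')
    branch2 = conv(pooled, 1, 16)                                     # pool -> 1x1 conv
    branch3 = conv(conv(x, 1, 8), 6, 16, stride=2)                    # 1x1 conv -> 6x6 conv
    branch4 = conv(conv(x, 1, 8), 2, 16, stride=2)                    # 1x1 conv -> 2x2 conv
    return tf.concat([branch1, branch2, branch3, branch4], axis=3)    # 5x5x64
```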
To train the model, I one-hot encoded the labels and computed the cross entropy as the loss. The network's weights and biases were then optimized with the Adam optimizer. The parameters for the training process were chosen by trial and error as follows (a sketch of the training setup follows the list):
- batch size: 512
- Number of epochs: 10
- Learning rate: 0.001
- Dropout (where mentioned): 0.5
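A minimal sketch of this training setup in TensorFlow 1.x; `model` stands in for the architecture described above, and the placeholder names are illustrative:

```python
import tensorflow as tf

# Placeholders for a batch of preprocessed images and their integer labels.
x = tf.placeholder(tf.float32, (None, 32, 32, 1))
y = tf.placeholder(tf.int32, (None,))
keep_prob = tf.placeholder(tf.float32)   # 0.5 while training, 1.0 for evaluation

logits = model(x, keep_prob)             # hypothetical function: the architecture above
one_hot_y = tf.one_hot(y, 43)

# Cross-entropy loss on the one-hot encoded labels, minimized with Adam.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss = tf.reduce_mean(cross_entropy)
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
```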
First, I looked into this paper and followed its stated guidelines. However, the neural network architecture was chosen as a combination of LeNet with an Inception layer (the Inception layer was inspired by Google's paper). I tried several other architectures before settling on the current one, evaluating them by their training and validation accuracy. I started with LeNet; then a similar architecture to the first paper; then I tried the current architecture; then I tried using the RGB data as input (3 feature maps instead of one); and lastly I went back to the current architecture. At first, the architecture's training-set loss was not good enough, so I added more depth (more feature maps). The network then slightly overfit the training data, so I added the dropout step, which solved the problem. After choosing the architecture, I further tuned the training parameters so that good results were obtained without overfitting the training data. After achieving satisfactory results in the training process, the test data was evaluated.
My final model results were:
- training set loss of 0.017
- validation set accuracy of 0.967
- test set accuracy of 0.949
Here are five German traffic signs that I found on the web:
Each image has some difficulties to be classified as follows:
- stop sign: the angle of the sign
- roundabout: noisy image
- road work: the angle of the sign
- no entry: the text in the middle of the sign
- double curve: not an original German traffic sign
Here are the results of the prediction:
Image | Prediction |
---|---|
Stop sign | Stop sign |
Roundabout | Priority road |
Road work | Road work |
No entry | No entry |
Double curve | Double curve |
The model was able to correctly guess 4 of the 5 traffic signs, which gives an accuracy of 80%. This is roughly in line with the accuracy on the test set, although with only five images the sample is far too small and unrepresentative for the comparison to mean much.
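One way to obtain the top-five softmax probabilities listed below is with `tf.nn.softmax` and `tf.nn.top_k` (a sketch; the variable names and checkpoint path are illustrative and reuse the placeholders from the training sketch above):

```python
import tensorflow as tf

softmax = tf.nn.softmax(logits)      # `logits` from the model sketched above
top5 = tf.nn.top_k(softmax, k=5)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, './trained_model')   # checkpoint path is illustrative
    values, indices = sess.run(top5, feed_dict={x: new_images, keep_prob: 1.0})
    # values[i] holds the five highest probabilities for image i,
    # indices[i] the corresponding class labels.
```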
For the first image, the model is not quite sure that this is a stop sign (probability of 0.616), and the image does contain a stop sign. The top five soft max probabilities were
Probability | Prediction |
---|---|
.616 | Stop sign |
.384 | Speed limit (30km/h) |
.0 | Turn left ahead |
.0 | Turn right ahead |
.0 | Roundabout |
For the second image, the model is not quite sure that this is a priority road (probability of 0.597), but the image contains a roundabout sign. The second probability (0.388) was of a roundabout. The top five soft max probabilities were
Probability | Prediction |
---|---|
.597 | Priority Road |
.388 | Roundabout |
.011 | Speed limit (100km/h) |
.002 | Speed limit (50km/h) |
.001 | Yield |
For the third image, the model is very sure that this is a road work sign (probability of 0.989), and the image does contain a road work sign. The top five soft max probabilities were
Probability | Prediction |
---|---|
.989 | Road work |
.007 | Bicycles crossing |
.004 | Bumpy road |
.0 | Road narrows on the right |
.0 | Wild animals crossing |
For the fourth image, the model is very sure that this is a no entry sign (probability of 1.0), and the image does contain a no entry sign. The top five soft max probabilities were
Probability | Prediction |
---|---|
1. | No entry |
.0 | Stop sign |
.0 | Turn left ahead |
.0 | Turn right ahead |
.0 | Roundabout |
For the fifth image, the model is very sure that this is a double curve sign (probability of 0.913), and the image does contain a double curve sign. The top five soft max probabilities were
Probability | Prediction |
---|---|
.913 | Double curve |
.083 | Road narrows on the right |
.003 | Dangerous curve to the left |
.0 | Wild animals crossing |
.0 | General caution |
No Entry Sign The first convolution layer's results for the no entry sign from the 5 images:
Road Work Sign The first convolution layer's results for the road work sign from the 5 images:
It seems that the first layer looks for outlines. It recognizes straight and curved lines in the image. One can clearly see the circle and middle line of the no entry sign, and the triangle edges of the road work sign.
No Entry Sign The second convolution layer's results for the no entry sign from the 5 images:
Road Work Sign The second convolution layer's results for the road work sign from the 5 images:
It is less clear what the second convolution layer recognizes, but it seems to be looking for features inside the sign. For the no entry sign, there are some clearly contrasted maps of the middle line and its surroundings. For the road work sign, some feature maps seem to emphasize the working man in the middle of the sign.
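The feature-map images were produced by running a single sign through the trained network and plotting each channel of a convolution layer's activations. A sketch of one way to do this (the function and variable names are illustrative and reuse the placeholders from the sketches above):

```python
import matplotlib.pyplot as plt

def show_feature_maps(sess, layer, image):
    """Plot every feature map produced by `layer` for a single input image.

    `layer` is the activation tensor of one convolution layer and `image`
    is a single preprocessed (32, 32, 1) image.
    """
    activations = sess.run(layer, feed_dict={x: image[None, ...], keep_prob: 1.0})
    n_maps = activations.shape[-1]
    for i in range(n_maps):
        plt.subplot(4, (n_maps + 3) // 4, i + 1)
        plt.imshow(activations[0, :, :, i], cmap='gray')
        plt.axis('off')
    plt.show()
```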