Note: this was first published in early 2019. Between then and early 2020 Keras and/or TensorFlow changed their default configurations, and the neural net built in this post now offers radically better accuracy even before the manual tuning described in Step 3. I’ve left the post unaltered because tuning is an important concept to understand even if it’s no longer strictly needed for this example. The pace with which neural net libraries are improving is mind-blowing!
Introduction
Remember the Fisher-Price “My First…” toys from the 70s and 80s? They were super-simple versions of common toys or household objects.
Let’s build a Fisher-Price-style My First Neural Net: the simplest possible piece of software that qualifies as a full-fledged neural net. Even though it will be as stripped down as possible, it will be capable of actual classification work just like an industrial-grade neural net.
Although no experience building neural nets is required to get the code up and running, this project will make more sense if you have some understanding of the basic concepts of neural nets:
- nodes
- weighted connections between nodes
- hidden layers
- output layers
- prediction through forward-propagation
- training through back-propagation
If you need an introduction or refresher on any of these, see the first section of Wikipedia’s artificial neural network entry.
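As a tiny taste of how those pieces fit together, here’s a toy forward pass sketched in plain NumPy. Every number and layer size here is made up purely for illustration; the real net we build below uses Keras instead.

```python
import numpy as np

# a toy forward pass: 4 input nodes -> 3 hidden nodes -> 2 output nodes
rng = np.random.default_rng(0)
x = rng.random(4)                  # input values
w_hidden = rng.random((4, 3))      # weighted connections: input -> hidden layer
w_output = rng.random((3, 2))      # weighted connections: hidden -> output layer

hidden = np.tanh(x @ w_hidden)     # hidden-layer activations
output = hidden @ w_output         # output-layer activations: the "prediction"
print(output.shape)                # one value per output node
```

Training (back-propagation) is just the reverse trip: nudging `w_hidden` and `w_output` so that `output` gets closer to the right answer.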
It will also help to be familiar with the basics of Python, and with writing and running Python scripts in an editor and the command line, in IPython, or in a Jupyter notebook.
The rest of this post will lead you through three tasks:
- Setting up our development environment
- Building the simplest neural net possible
- Making the neural net more accurate
The code for making our Fisher-Price-style My First Neural Net is spread throughout this post, but it’s also presented at the end of Steps 2 and 3 to make it easy to copy onto your own computer.
Step 1: Setting up our development environment
I’ll assume that you’re starting from a reasonably clean Linux or macOS machine. I haven’t tested these steps on Windows, but they’ll probably work with very few modifications.
This step collects the Lego blocks that can be snapped together to make a neural net. We won’t actually assemble those Lego blocks until Step 2.
a. Install Miniconda
Thanks to the magic of the Miniconda package manager for Python, setting up our development environment is trivial.
Use the official instructions to install Miniconda.
Note that we could use Miniconda’s more full-featured cousin Anaconda instead, but Miniconda does everything we need and its stripped-down feature set makes it easier to use.
b. Make a new Python environment
In a terminal, use Miniconda to make a new Python environment to play around in so we don’t corrupt the rest of our system.
$ conda create -n fisher-price
$ conda activate fisher-price
Accept the default values for any prompts.
c. Install the Keras neural net library
We’ll build our neural net using the Python Keras library, which is a user-friendly wrapper on top of Google’s TensorFlow library. As of early 2019, Keras is probably the most accessible neural net package for newbies. It’s mature, robust, and has reasonable documentation. Install it with a single command in the terminal.
$ conda install keras
Accept the default values for any prompts, watch Miniconda install twenty or so dependencies, and we’re done!
Step 2: Building the simplest neural net possible
First, an important note on the accuracy rates discussed in this post. When we create a new neural net, all of its weights are set to random values. As we train it, those weights change and ultimately converge on values that give the neural net its predictive power. But because different neural net instances are initialized with different random weights, even if we train both of them with the same data, they’ll end up with slightly different final weights and will have slightly different predictive accuracy. That’s just the nature of the beast when it comes to neural nets.
This means that if you run this code in your own environment, you can expect similar, but not identical accuracy to what I see in mine.
Now, let’s see how quickly we can make a real neural net with a Python script. Thanks to Keras, this takes remarkably little code. Let’s walk through each line.
At the very top of a new Python script, load Keras.
import keras
Next, load some data to train our neural net on. The most common “hello world” dataset for learning about neural nets is MNIST, which comprises grayscale images of handwritten digits. A neural net can be trained to categorize each picture as a handwritten 0, 1, 2, or whatever. The MNIST dataset provides 60,000 images to train the neural net and 10,000 images to test the neural net’s accuracy.
But you don’t want to use the same dataset every other new data scientist uses! Instead, let’s use the Fashion MNIST dataset. This is exactly the same as MNIST in format (same number of pictures, same size of pictures, same grayscale), but consists of articles of clothing instead of digits. A neural net trained on Fashion MNIST learns to identify what category of clothing (t-shirt, dress, handbag, etc.) each picture belongs to.
Keras provides a helper method for importing the Fashion MNIST training data. We’re actually importing four separate sets of data with one command:
- images to train the neural net on
- category labels for those training images (“t-shirt”, “dress”, etc.)
- images to test the neural net with once it’s been trained
- category labels for those test images
Add this code to our script:
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
Python represents the training data as a three-dimensional array of size 60,000 x 28 x 28. To feed this data to our neural net, we have to first convert it into a two-dimensional array of size 60,000 x 784 (note that 784 = 28 x 28). We need to “reshape” the data this way because the layers we’ll build expect each image as a flat list of numbers rather than a two-dimensional grid.
We need to do exactly the same thing for the 10,000 images that we’ll use for testing the trained neural net.
Here’s more code to add. (For the rest of this post, any code presented is meant to be added to our script.)
train_images = train_images.reshape(60000, 784)
test_images = test_images.reshape(10000, 784)
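If you want to convince yourself that the reshape does what we claim, here’s a quick NumPy sketch using a random stand-in array of the same shape and type as the real training images (so you can try it without downloading the dataset):

```python
import numpy as np

# a stand-in with the same shape and dtype as the Fashion MNIST training images
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(60000, 28, 28), dtype=np.uint8)

flat = images.reshape(60000, 784)   # each 28x28 image becomes one row of 784 pixels
print(flat.shape)                   # (60000, 784)
# no pixel values change -- only the layout does
print((flat[0] == images[0].ravel()).all())
```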
Now let’s create what’s called the “model.” This is the neural net itself, with nodes arranged in layers and weighted interconnections between various nodes.
We’ll use Keras to make a sequential model, which is the simplest kind of neural net. In a sequential model, signals flow from input nodes to one or more layers of hidden nodes and finally to output nodes. This model doesn’t include any fancy bells or whistles: it’s a plain vanilla neural net architecture.
neural_net = keras.models.Sequential()
It’s time to add some layers of nodes to our network.
First, let’s add a hidden layer of 100 nodes.
neural_net.add(keras.layers.Dense(100, input_dim=784))
There’s a lot going on in this line, so let’s step through it.
Dense is a type of layer that connects every node that it contains to every node in the previous layer. This is the simplest and most common kind of layer. It’s the basic building block of most neural nets.
100 specifies the number of nodes in this layer. I picked this number more or less at random; we’ll experiment with tweaking it later.
input_dim specifies how many pieces of data will feed into the neural net. Typically only the first layer in a neural net uses this parameter. The images that serve as input to our neural net are 28 pixels by 28 pixels. The reshape we performed earlier flattened this two-dimensional image data into a one-dimensional list of 784 numbers, and that list feeds into each of the 100 nodes in this layer. Each of these 784 numbers ranges from 0 to 255, representing grayscale values.
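Under the hood, each node in a Dense layer computes a weighted sum of all 784 inputs. Here’s a rough NumPy sketch of what one forward pass through this layer does; the pixel and weight values are made up, and real Keras layers also apply smarter weight initialization:

```python
import numpy as np

rng = np.random.default_rng(1)
pixels = rng.integers(0, 256, size=784)     # one flattened 28x28 image
weights = rng.standard_normal((784, 100))   # one weight per input, per node
biases = np.zeros(100)                      # one bias per node

layer_output = pixels @ weights + biases    # 100 weighted sums, one per node
print(layer_output.shape)                   # (100,)
```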
neural_net.add(keras.layers.Dense(10, activation='softmax'))
This second layer is the last layer in our network, so it will serve as the neural net’s output layer. Dense works here too: each of this layer’s ten nodes connects to every node in the layer before it, so all 100 hidden nodes feed into each output node.
This layer has ten nodes, each representing one clothing category label (t-shirt, dress, etc.).
The activation parameter applies the softmax function to this layer’s outputs, scaling them so they all add up to 1.0. This means that if the neural net thinks a given image is a dress, the output node that represents “dress” might have an activation value of 0.9, whereas the output nodes that represent “handbag” and “t-shirt” might have activation values of 0.05.
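Softmax itself is simple enough to sketch in a few lines of NumPy: it exponentiates the raw node outputs and then normalizes them so they sum to 1.0. The raw scores below are made up for illustration:

```python
import numpy as np

raw_scores = np.array([2.0, 1.0, 0.1, -1.0])   # raw outputs of four hypothetical nodes

def softmax(x):
    e = np.exp(x - x.max())    # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(raw_scores)
print(probs.sum())             # sums to 1.0 (up to floating-point error)
print(probs.argmax())          # the node with the highest raw score wins
```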
Next we “compile” the network, which converts it from a blueprint of a network into a runnable network. We also pass a few “hyperparameters” to the network, which tune its operation.
neural_net.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
The optimizer parameter tells the network what algorithm to use when training. “Adam” is a general-purpose algorithm that’s usually a good place to start.
The loss parameter tells the network how to measure how far off its predictions are, which not only makes it possible for us to understand how successful the training has been, but also helps with the training process itself. The exact details are unimportant, but “sparse categorical cross-entropy” works well for this type of classification task.
The final parameter, metrics, is optional. The way we’re using it here gives us ongoing reports of how the network’s accuracy improves as it is trained.
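For the curious, sparse categorical cross-entropy boils down to taking the negative log of the probability the network assigned to the correct category. Here’s a sketch with made-up values (not Keras’s actual implementation, which also averages over batches):

```python
import numpy as np

# the network's softmax output for one image, plus the true label (made-up values)
predicted_probs = np.array([0.05, 0.05, 0.8, 0.1])   # network is 80% sure it's class 2
true_label = 2

# low loss when the network is confident and right; high loss when it's confidently wrong
loss = -np.log(predicted_probs[true_label])
print(round(loss, 4))
```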
Now that we’ve specified and compiled the network, we need to train it using the Fashion MNIST training data. Once again, Keras makes this simple.
Although a modern CPU should be able to run this in less than a minute, this step could take much longer (hours or even days) if we had a more complicated neural net or more training data.
neural_net.fit(train_images, train_labels)
Now it’s time for the big payoff! Let’s feed test data to the trained neural net and see how accurately it classifies fashion images it’s never seen before.
print(neural_net.evaluate(test_images, test_labels))
Unless you’re following along in Jupyter or IPython, you’ll need to run the whole script at this point.
$ python neural-net.py # replace "neural-net.py" with your Python script name
Your output should look similar to this:
Epoch 1/1
60000/60000 [==============================] - 5s 78us/step - loss: 1.0321 - acc: 0.6291
10000/10000 [==============================] - 0s 25us/step
[14.50628568725586, 0.1]
This is cryptic, but the important part is the second number in the last line of output: 0.1. That means that our trained neural net correctly identified… 10% of the fashion images in the test set.
The bad news: that’s pathetic.
The good news: there are several simple ways to modify our basic neural net to improve its accuracy. We’ll see how high we can get the accuracy with some simple tweaks.
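One handy tool for the tuning ahead: besides the aggregate score from evaluate, a trained Keras model’s predict method returns the raw output-node activations for each image, and the predicted category is simply the node with the highest activation (e.g. `neural_net.predict(test_images[:3]).argmax(axis=1)`). Here’s that decoding step sketched with made-up activation values:

```python
import numpy as np

# made-up softmax outputs for three test images
# (in practice these rows would come from neural_net.predict)
predictions = np.array([
    [0.01, 0.02, 0.05, 0.02, 0.01, 0.03, 0.01, 0.80, 0.02, 0.03],
    [0.70, 0.05, 0.05, 0.05, 0.02, 0.03, 0.05, 0.02, 0.02, 0.01],
    [0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.82],
])

# the predicted category for each image is the output node with the highest activation
print(predictions.argmax(axis=1))   # [7 0 9]
```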
But first, let’s catch our breath and take a look at all the code we have so far, all in one place.
# load the neural net library
import keras

# load the images and category labels we'll use to train the neural net, and then to test its accuracy
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()

# flatten the images from two-dimensional arrays into one-dimensional arrays
train_images = train_images.reshape(60000, 784)
test_images = test_images.reshape(10000, 784)

# specify the neural net's architecture: one hidden layer and one output layer
neural_net = keras.models.Sequential()
neural_net.add(keras.layers.Dense(100, input_dim=784))
neural_net.add(keras.layers.Dense(10, activation='softmax'))

# convert the neural net blueprint into a runnable neural net
neural_net.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# using our dataset of training images, train the neural net by adjusting the weights between its connections
neural_net.fit(train_images, train_labels)

# using our dataset of test images, check our neural net's accuracy
print(neural_net.evaluate(test_images, test_labels))
Step 3: Making the neural net more accurate
For complicated technical reasons, neural nets often improve dramatically when each of their layers has something called an activation function, which modifies the activation level of each node. Remember that our neural net has two layers. We’ve already specified the softmax activation function for the second (output) layer, but we didn’t specify an activation function for the first (hidden) layer. Let’s add one to the first layer and see if that gives us better results. We’ll use tanh, a common activation function and a good default option.
Replace this line:
neural_net.add(keras.layers.Dense(100, input_dim=784))
With this:
neural_net.add(keras.layers.Dense(100, input_dim=784, activation='tanh'))
Let’s re-run and see how it does.
Epoch 1/1
60000/60000 [==============================] - 5s 78us/step - loss: 1.0321 - acc: 0.6291
10000/10000 [==============================] - 0s 28us/step
[0.9260841958999634, 0.6585]
Wow. Just adding the tanh activation function to the hidden layer catapulted accuracy to 66%!
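Why does tanh help? Roughly speaking, it squashes each node’s weighted sum into the range -1 to 1, which keeps the signals passed between layers well-behaved no matter how large the raw sums get. A quick NumPy look at its shape:

```python
import numpy as np

# tanh squashes any input into the range -1 to 1
inputs = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
outputs = np.tanh(inputs)
print(outputs)   # extreme inputs saturate near -1 or 1; 0 stays 0
```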
Here’s something else to try: neural nets often work best when each piece of input data (in this case, the grayscale value of each pixel in an image) ranges from 0 to 1.0. Currently our image data ranges from 0 to 255. Let’s scale our data (for both the training and test images) so it fits in the 0 – 1.0 range, and see how that affects our accuracy.
Right after this line:
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
Add these lines:
train_images = train_images / 255.0 test_images = test_images / 255.0
And re-run.
Epoch 1/1
60000/60000 [==============================] - 5s 82us/step - loss: 0.4803 - acc: 0.8281
10000/10000 [==============================] - 0s 31us/step
[0.4426993363380432, 0.8401]
Better still: we’re up to 84%!
Next, let’s see what happens if we train the system on the training images not just once, but multiple times. Since training adjusts the weights of the connections only a little bit with each batch of training data, maybe it will continue to improve its accuracy if we let it take several passes at the same set of training data.
We do this by using the epochs parameter during training. One epoch is a single pass through all the training data, so specifying five epochs means we’ll train the neural net on the same training data five times.
Replace this line:
neural_net.fit(train_images, train_labels)
With this:
neural_net.fit(train_images, train_labels, epochs=5)
And re-run.
Epoch 1/5
60000/60000 [==============================] - 5s 83us/step - loss: 0.4766 - acc: 0.8290
Epoch 2/5
60000/60000 [==============================] - 5s 79us/step - loss: 0.3699 - acc: 0.8661
Epoch 3/5
60000/60000 [==============================] - 5s 80us/step - loss: 0.3374 - acc: 0.8765
Epoch 4/5
60000/60000 [==============================] - 5s 81us/step - loss: 0.3138 - acc: 0.8854
Epoch 5/5
60000/60000 [==============================] - 5s 80us/step - loss: 0.2966 - acc: 0.8910
10000/10000 [==============================] - 0s 33us/step
[0.36925499482154844, 0.8619]
More improvement: we’re at 86%, meaning that our neural net correctly identifies nearly nine out of every ten fashion images from the test set of 10,000 images.
It’s important to understand that the higher the accuracy gets, the harder it becomes to eke out further gains. That’s why a 2% improvement from 84% to 86% is nothing to sneeze at.
Let’s declare victory and stop here.
In the interest of science, I did try a few other tweaks, but none of them improved the accuracy of our Fisher-Price neural net above what we’ve already achieved:
- more nodes in the hidden layer
- more hidden layers
- different activation functions
- more training epochs
- different training batch sizes (where batch size is the number of images we feed through the neural net during training before adjusting the weights of the neural net’s connections)
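To make “batch size” concrete: with 60,000 training images, the batch size determines how many weight updates the network makes during each pass through the data. (In Keras you’d set it with the batch_size keyword argument to fit; 32 is the default.) A quick arithmetic sketch:

```python
import math

train_set_size = 60000   # the Fashion MNIST training set

def updates_per_epoch(batch_size):
    # each batch triggers one adjustment of the network's weights
    return math.ceil(train_set_size / batch_size)

for batch_size in (32, 128, 1024):
    print(batch_size, updates_per_epoch(batch_size))
# smaller batches mean more (but noisier) weight updates per epoch
```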
I suspect we could get even better accuracy with a more complicated neural net architecture, but that’s a topic for another blog post.
Here’s all the code we ended up with after tuning.
# load the neural net library
import keras

# load the images and category labels we'll use to train the neural net, and then to test its accuracy
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()

# scale the grayscale pixel values from the 0-255 range down to 0-1.0
train_images = train_images / 255.0
test_images = test_images / 255.0

# flatten the images from two-dimensional arrays into one-dimensional arrays
train_images = train_images.reshape(60000, 784)
test_images = test_images.reshape(10000, 784)

# specify the neural net's architecture: one hidden layer and one output layer
neural_net = keras.models.Sequential()
neural_net.add(keras.layers.Dense(100, input_dim=784, activation='tanh'))
neural_net.add(keras.layers.Dense(10, activation='softmax'))

# convert the neural net blueprint into a runnable neural net
neural_net.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# using our dataset of training images, train the neural net by adjusting the weights between its connections
neural_net.fit(train_images, train_labels, epochs=5)

# using our dataset of test images, check our neural net's accuracy
print(neural_net.evaluate(test_images, test_labels))
Conclusion
Think about what we just did: with a dozen lines of Python code we created a neural net that categorizes pictures of clothing with 86% accuracy. This would have been unthinkable just ten years ago.
Although none of the concepts involved with neural nets are terribly difficult to understand, there are a bewildering number of ways you can build and tune a neural net to perform optimally for your particular categorization or prediction task. One unusual aspect of neural net engineering is that it’s as much art as it is science. In many cases, we don’t fully understand why certain neural net architectures or tunings perform better than others. The standard neural net development workflow consists of starting with a good general-purpose architecture and a best-guess set of hyperparameters, and then experimenting with variations as you watch the system’s accuracy move up and down. Once you hit on a combination that achieves the accuracy you need, you’re done. It’s hard to think of another branch of computer science that works in exactly this way (although maybe performance optimization comes close).
If you want to explore further, I recommend the official Keras documentation and tutorials, or the excellent book Deep Learning with Python by Francois Chollet, the lead designer of Keras.