Instructor Note: The code in this notebook is in the file MNIST_CNN.ipynb in the student downloads.
MNIST database of handwritten digits
Use the tf_env Anaconda environment and run this notebook from the ch15 examples folder.
from tensorflow.keras.datasets import mnist
The load_data function loads the training and testing sets:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
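Check the shapes of the training set images (X_train), training set labels (y_train), testing set images (X_test) and testing set labels (y_test):
X_train.shape  # (60000, 28, 28)
y_train.shape  # (60000,)
X_test.shape   # (10000, 28, 28)
y_test.shape   # (10000,)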
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
# sns.set(font_scale=2)
import numpy as np
index = np.random.choice(np.arange(len(X_train)), 24, replace=False)  # 24 random indices
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(16, 9))
for ax, image, target in zip(axes.ravel(), X_train[index], y_train[index]):
    ax.imshow(image, cmap=plt.cm.gray_r)
    ax.set_xticks([])  # remove x-axis tick marks
    ax.set_yticks([])  # remove y-axis tick marks
    ax.set_title(target)
plt.tight_layout()
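The network requires each sample to have the shape (width, height, channels). MNIST images are grayscale (one channel), so each image must have the shape (28, 28, 1).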
The reshape method receives a tuple representing the new shape:
X_train = X_train.reshape((60000, 28, 28, 1))
X_train.shape  # (60000, 28, 28, 1)
X_test = X_test.reshape((10000, 28, 28, 1))
X_test.shape   # (10000, 28, 28, 1)
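To see what that reshape does without downloading MNIST, here is a small NumPy sketch using a made-up batch of three 28-by-28 images (a hypothetical stand-in for X_train, not the real data). Reshaping to (3, 28, 28, 1) only adds the trailing channel axis; the pixel values are untouched, and np.expand_dims produces the same array.

```python
import numpy as np

# Hypothetical stand-in for X_train: 3 grayscale 28x28 images with uint8 pixels
fake_images = (np.arange(3 * 28 * 28) % 256).astype(np.uint8).reshape((3, 28, 28))

# Add a trailing channels axis, as the notebook does for X_train and X_test
with_channel = fake_images.reshape((3, 28, 28, 1))

# np.expand_dims(..., axis=-1) is an equivalent way to add the channel axis
alt = np.expand_dims(fake_images, axis=-1)

print(fake_images.shape, with_channel.shape)  # (3, 28, 28) (3, 28, 28, 1)
print(np.array_equal(with_channel, alt))      # True
# The pixel data itself is unchanged; only the shape differs
print(np.array_equal(with_channel[..., 0], fake_images))  # True
```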
X_train = X_train.astype('float32') / 255  # scale pixel values from 0-255 to 0.0-1.0
X_test = X_test.astype('float32') / 255
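The division rescales every pixel from the integer range 0-255 to floating-point values between 0.0 and 1.0, a common practice that tends to help training. A tiny sketch with a few made-up pixel values shows the effect of the same two steps:

```python
import numpy as np

# Hypothetical pixel intensities spanning the uint8 range used by MNIST (0-255)
pixels = np.array([0, 51, 128, 255], dtype=np.uint8)

# Same two steps as the notebook: convert to float32, then divide by 255
scaled = pixels.astype('float32') / 255

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```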
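One-hot encoding converts each label into an array of 10 floats containing 1.0 at the label's index and 0.0 everywhere else. For example, the digit 7 becomes:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
The to_categorical function from module tensorflow.keras.utils performs one-hot encoding, transforming y_train and y_test into two-dimensional arrays of categorical data:
from tensorflow.keras.utils import to_categorical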
y_train = to_categorical(y_train)
y_train.shape
y_train[0] # one sample’s categorical data
y_test = to_categorical(y_test)
y_test.shape
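The result of to_categorical can be reproduced with plain NumPy by indexing rows of an identity matrix. This is a sketch of one-hot encoding in general, not Keras's own implementation; the helper name one_hot and the sample labels are hypothetical. It uses float32, matching to_categorical's default dtype.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a float32 one-hot matrix: row i has 1.0 at column labels[i]."""
    labels = np.asarray(labels)
    # Row k of the identity matrix is exactly the one-hot vector for class k
    return np.eye(num_classes, dtype='float32')[labels]

# Hypothetical labels standing in for y_train
labels = np.array([7, 2, 1, 0])
encoded = one_hot(labels)
print(encoded.shape)  # (4, 10)
print(encoded[0])     # 1.0 at index 7, 0.0 elsewhere

# Decoding: argmax recovers the original label from each row
print(np.argmax(encoded, axis=1))  # [7 2 1 0]
```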
A Sequential model stacks layers to execute sequentially:
from tensorflow.keras.models import Sequential
cnn = Sequential()
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
Conv2D implements the convolution layer:
cnn.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',
               input_shape=(28, 28, 1)))
filters=64: The number of filters, which is also the number of feature maps this layer outputs.
kernel_size=(3, 3): The size of the kernel used in each filter.
activation='relu': The Rectified Linear Unit activation function is used to produce this layer's output.
input_shape=(28, 28, 1): The shape of each sample. It is required for this first Conv2D layer, which is actually the first hidden layer; subsequent layers infer their input_shape from the previous layer's output shape.
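What one Conv2D filter computes can be sketched in NumPy: slide a 3-by-3 kernel over the image with no padding and stride 1, multiply and sum at each position, add a bias, then apply ReLU. Like Keras, this computes cross-correlation (the kernel is not flipped), which is why a 28-by-28 input yields a 26-by-26 feature map. The helper conv2d_single, the stripe image and the edge-detecting kernel are all hypothetical; the real layer learns 64 kernels of its own and runs far faster.

```python
import numpy as np

def conv2d_single(image, kernel, bias=0.0):
    """Valid (no-padding), stride-1 2D cross-correlation of one channel, then ReLU."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype='float32')
    for r in range(out_h):
        for c in range(out_w):
            # Multiply the kernel element-wise with the patch under it, then sum
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU: negative responses become 0

# Hypothetical 28x28 input containing a vertical bright stripe in column 14
test_image = np.zeros((28, 28), dtype='float32')
test_image[:, 14] = 1.0

# Hypothetical vertical-edge kernel (a trained layer learns its own values)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype='float32')

feature_map = conv2d_single(test_image, kernel)
print(feature_map.shape)  # (26, 26)
```

Only the output column whose right edge overlaps the stripe (column 12) responds strongly; the matching negative response at column 14 is zeroed by ReLU.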
cnn.add(MaxPooling2D(pool_size=(2, 2)))
cnn.add(Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2)))
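MaxPooling2D with pool_size=(2, 2) keeps the largest value in each non-overlapping 2-by-2 block, halving each spatial dimension and discarding a leftover odd row or column. The NumPy sketch below uses a reshape trick that is one common way to express this (not how Keras implements it; max_pool_2x2 is a hypothetical helper) and shows why 26-by-26 becomes 13-by-13 and 11-by-11 becomes 5-by-5.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; a trailing odd row/column is dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Split each axis into (blocks, 2) and take the max inside each 2x2 block
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

small = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 5],
                  [0, 1, 7, 6],
                  [2, 8, 3, 1]])
print(max_pool_2x2(small))
# [[4 5]
#  [8 7]]

print(max_pool_2x2(np.zeros((26, 26))).shape)  # (13, 13)
print(max_pool_2x2(np.zeros((11, 11))).shape)  # (5, 5)
```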
The Flatten layer's output will be 1-by-3200 (5 × 5 × 128):
cnn.add(Flatten())
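The 3200 can be checked by tracing the spatial size through each layer. This sketch assumes the Keras defaults these layers rely on, since the notebook does not override them: stride-1 convolutions with 'valid' (no) padding, and pooling whose stride equals the pool size. The helper names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Spatial size after a stride-1 convolution with no padding ('valid')."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Spatial size after non-overlapping pooling; leftover rows/columns are dropped."""
    return size // pool

size, channels = 28, 1
size, channels = conv_output(size), 64   # Conv2D(64):  28 -> 26
size = pool_output(size)                 # MaxPooling2D: 26 -> 13
size, channels = conv_output(size), 128  # Conv2D(128): 13 -> 11
size = pool_output(size)                 # MaxPooling2D: 11 -> 5
flattened = size * size * channels       # Flatten: 5 * 5 * 128

print(size, channels, flattened)  # 5 128 3200
```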
The features learned by the layers before Flatten are passed to Dense layers, which learn the relationships among those features. This Dense layer creates 128 neurons (units) that learn from the 3200 outputs of the previous layer:
cnn.add(Dense(units=128, activation='relu'))
Convnets for more complex image datasets often use several Dense layers, commonly with 4096 neurons. The final Dense layer classifies inputs into neurons representing the classes 0-9, and the softmax activation function converts the values of these 10 neurons into classification probabilities:
cnn.add(Dense(units=10, activation='softmax'))
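Softmax exponentiates each neuron's raw value and divides by the sum, so the 10 outputs are positive and add up to 1.0. A minimal NumPy sketch with made-up raw outputs (not values from the trained model) shows the largest raw value becoming the highest probability:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that are positive and sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw outputs of the 10 final neurons, strongest for class 7
logits = np.array([1.0, 0.5, 2.0, 0.1, -1.0, 0.0, 0.3, 6.0, 1.5, 0.2])
probs = softmax(logits)

print(np.argmax(probs))              # 7
print(round(float(probs.sum()), 6))  # 1.0
```

Subtracting the maximum does not change the result, because it scales the numerator and denominator by the same factor.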
The model's summary method displays its layers. In the Output Shape column, None means the model does not know in advance how many training samples you're going to provide:
cnn.summary()
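The Param # column of the summary can be checked by hand: each convolution filter has one weight per kernel element per input channel plus one bias, and each Dense unit has one weight per input plus one bias. MaxPooling2D and Flatten have no trainable parameters. This sketch uses hypothetical helper names and descriptive keys rather than Keras's auto-generated layer names.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    # kernel weights for every input channel, plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # one weight per input-to-unit connection, plus one bias per unit
    return (inputs + 1) * units

layers = {
    'Conv2D, 64 filters': conv_params(3, 3, 1, 64),
    'Conv2D, 128 filters': conv_params(3, 3, 64, 128),
    'Dense, 128 units': dense_params(3200, 128),
    'Dense, 10 units': dense_params(128, 10),
}
for name, count in layers.items():
    print(f'{name}: {count:,}')
print(f'Total: {sum(layers.values()):,}')  # Total: 485,514
```

Most of the parameters sit in the first Dense layer, because it connects all 3200 flattened features to each of its 128 units.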
The plot_model function from module tensorflow.keras.utils draws the model as a diagram (it requires the pydot and graphviz packages):
from tensorflow.keras.utils import plot_model
from IPython.display import Image
plot_model(cnn, to_file='convnet.png', show_shapes=True,
           show_layer_names=True)
Image(filename='convnet.png')  # display resulting image in notebook
The compile method configures the model for training:
cnn.compile(optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy'])
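As a sketch of what the 'categorical_crossentropy' loss in this compile call measures: for a one-hot label, it is the negative log of the probability the model assigned to the correct class, averaged over samples. A confident correct prediction gives a small loss; a uniform guess over 10 classes gives ln(10). The helper below, its clipping epsilon and the sample probabilities are assumptions of this illustration, not Keras's code.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)); eps avoids log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# One hypothetical sample whose true class is 7, one-hot encoded
y_true = np.zeros((1, 10))
y_true[0, 7] = 1.0

confident = np.full((1, 10), 0.01)  # correct and confident
confident[0, 7] = 0.91
unsure = np.full((1, 10), 0.1)      # uniform guess over 10 classes

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(round(loss_confident, 4))  # 0.0943
print(round(loss_unsure, 4))     # 2.3026
```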
optimizer='adam': The optimizer this model uses to adjust the weights throughout the neural network as it learns. 'adam' performs well across a wide variety of models [1],[2].
loss='categorical_crossentropy': The loss function the optimizer uses in multi-class classification networks (ours predicts 10 classes). For binary classification, use 'binary_crossentropy', and for regression, 'mean_squared_error'.
metrics=['accuracy']: The list of metrics the network will produce to help you evaluate the model.
The fit method trains the model:
cnn.fit(X_train, y_train, epochs=5, batch_size=64, validation_split=0.1)
epochs=5: Neural networks are trained iteratively over time, and each epoch processes every sample in the training dataset once.
batch_size=64: The number of samples to process at a time.
validation_split=0.1: The model should reserve the last 10% of the training samples for validation. After each epoch, the model reports its loss and accuracy on these validation samples; if they reveal problems such as overfitting, you may need to adjust the fit method's hyperparameters, or possibly change the layer composition of your model. To supply separate validation data instead, use the validation_data argument.
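The arithmetic behind these arguments can be sketched in plain Python. With 60,000 training images and validation_split=0.1, Keras holds out the last 6,000 samples (it selects them before any shuffling), leaving 54,000 for training; with batch_size=64, each epoch then performs 844 weight updates, the last one on a partial batch of 48 samples. The variable names are illustrative.

```python
import math

num_samples = 60000   # MNIST training images
validation_split = 0.1
batch_size = 64

# The validation samples come from the END of the training arrays
num_val = int(num_samples * validation_split)
num_train = num_samples - num_val
val_indices = range(num_train, num_samples)

# The last batch of each epoch may be smaller than batch_size
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch = num_train - (steps_per_epoch - 1) * batch_size

print(num_train, num_val)           # 54000 6000
print(val_indices[0])               # 54000
print(steps_per_epoch, last_batch)  # 844 48
```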
During training, fit shows the progress of each epoch, how long the epoch took to execute, and the evaluation metrics for that epoch. The training accuracy (acc) and validation accuracy (val_acc) are already high, given that we have not yet tried to tune the hyperparameters or tweak the number and types of the layers. (TensorFlow 2.x labels these metrics accuracy and val_accuracy.)
evaluate Method
loss, accuracy = cnn.evaluate(X_test, y_test)
loss
accuracy
predict Method
predictions = cnn.predict(X_test)
The first test sample should be a 7 (1. at index 7):
y_test[0]
The probabilities returned by predict for the first test sample:
for index, probability in enumerate(predictions[0]):
    print(f'{index}: {probability:.10%}')
The prediction is correct if the index of the highest probability in predictions[0] matches the index of the element containing 1.0 in y_test[0].
To display the images, reshape them from the shape (28, 28, 1) that Keras required for learning back to (28, 28), which Matplotlib requires:
images = X_test.reshape((10000, 28, 28))
incorrect_predictions = []
In the following loop, p is the predicted value array and e is the expected value array; NumPy's argmax function determines the index of an array's highest-valued element:
for i, (p, e) in enumerate(zip(predictions, y_test)):
    predicted, expected = np.argmax(p), np.argmax(e)
    if predicted != expected:  # prediction was incorrect
        incorrect_predictions.append(
            (i, images[i], predicted, expected))
len(incorrect_predictions) # number of incorrect predictions
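The loop above can also be expressed with vectorized NumPy: take argmax along each row of the predictions and of the one-hot labels, and keep the indices where they differ. The fraction of matches is what the 'accuracy' metric reports for softmax outputs and one-hot labels. This sketch uses a hypothetical helper find_incorrect and made-up 3-class data; the variable names deliberately avoid overwriting the notebook's predictions.

```python
import numpy as np

def find_incorrect(predictions, one_hot_labels):
    """Return indices where the most probable class differs from the true class."""
    predicted = np.argmax(predictions, axis=1)
    expected = np.argmax(one_hot_labels, axis=1)
    return np.flatnonzero(predicted != expected)

# Hypothetical predictions for 4 samples over 3 classes, plus one-hot labels
sample_predictions = np.array([[0.1, 0.8, 0.1],
                               [0.7, 0.2, 0.1],
                               [0.2, 0.3, 0.5],
                               [0.6, 0.3, 0.1]])
sample_labels = np.eye(3)[[1, 0, 1, 0]]

wrong = find_incorrect(sample_predictions, sample_labels)
print(wrong)  # [2]
sample_accuracy = 1 - len(wrong) / len(sample_labels)
print(sample_accuracy)  # 0.75
```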
Display 24 of the incorrectly predicted images, each labeled with its index, predicted value (p) and expected value (e):
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(16, 12))
for ax, item in zip(axes.ravel(), incorrect_predictions):
    index, image, predicted, expected = item
    ax.imshow(image, cmap=plt.cm.gray_r)
    ax.set_xticks([])  # remove x-axis tick marks
    ax.set_yticks([])  # remove y-axis tick marks
    ax.set_title(f'index: {index}\np: {predicted}; e: {expected}')
plt.tight_layout()
def display_probabilities(prediction):
    """Print each digit's predicted probability as a percentage."""
    for index, probability in enumerate(prediction):
        print(f'{index}: {probability:.10%}')
display_probabilities(predictions[340])
display_probabilities(predictions[740])
display_probabilities(predictions[1260])
https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751), [2]
cnn.save('mnist_cnn.h5')
from tensorflow.keras.models import load_model
cnn = load_model('mnist_cnn.h5')
You can then call the loaded model's predict method to make additional predictions on new data, or call fit to train with additional data.
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.