TensorFlow is arguably the package to use for Deep Learning if you are doing any sort of business. Keras is the standard API in TensorFlow and the easiest way to implement neural networks. Deployment is also much easier compared to PyTorch – so unless you are doing research, TensorFlow is most likely the way to go.

And even then, you should go with TensorFlow because your models will be easier for the industry to adopt in production.

The most important parts of this article are at the end, so stick around! I will show you how to use TensorFlow functions and also how to make a custom training and testing class.

TensorFlow also seems to be much more popular than PyTorch.

You can find all the documentation for TensorFlow at this link.

You can follow this article solely from the Colab notebook provided here. Please leave a comment on this post if anything is confusing. With that said, let's jump right into TensorFlow version 2.0.

Table of Contents

  1. New Features in TensorFlow 2.0
  2. Verify Eager Execution and Find GPU Devices
  3. Common Use Operations
  4. Linear Algebra Operations
  5. Calculating Gradients with Gradient Tape
  6. Functions in TensorFlow with tf.function
  7. Custom Train and Test Functions for Neural Network

New Features in TensorFlow 2.0

TensorFlow 2.0 is mostly a marketing move and some cleanup in the TensorFlow API. Nevertheless, whenever you consider doing deep learning and want to deploy a model, you will find yourself using TensorFlow.

Let's start off with a simple way to install / upgrade both the CPU and GPU version of TensorFlow in one line of code. This is not the default in the popular Google Colab app yet, but it's rumored to arrive soon.

!pip install --upgrade tensorflow-gpu

All of the upcoming code in this article presumes that you have imported the tensorflow package in your Python program.

import tensorflow as tf

You should verify that you are running the correct version, TensorFlow 2.0, with the following line of code. All it does is print the __version__ attribute from TensorFlow.

print(('Your TensorFlow version: {0}').format(tf.__version__))
> Your TensorFlow version: 2.0.0

Default Eager Execution

Eager execution means that the interpreter executes operations line by line, which makes debugging much easier and faster. There is also some cleanup in how graphs are made, which simplifies things – in previous TensorFlow versions, you needed to build a graph manually.

This is actually huge, because it reduces the training code from this:

with tf.Session() as session:
  session.run(tf.global_variables_initializer())
  session.run(tf.tables_initializer())
  model.fit(X_train, Y_train, 
            validation_data=(X_val, Y_val),
            epochs=50, batch_size=32)

To this:

model.fit(X_train, Y_train, 
          validation_data=(X_val, Y_val),
          epochs=50, batch_size=32)

There is no need for sessions or any of those TensorFlow variables – this is just regular Python code executing. It's nice.

Here is the official word on the new version of TensorFlow with regards to Eager Execution:

TensorFlow 1.X requires users to manually stitch together an abstract syntax tree (the graph) by making tf.* API calls. It then requires users to manually compile the abstract syntax tree by passing a set of output tensors and input tensors to a session.run() call. TensorFlow 2.0 executes eagerly (like Python normally does) and in 2.0, graphs and sessions should feel like implementation details.
One notable byproduct of eager execution is that tf.control_dependencies() is no longer required, as all lines of code execute in order (within a tf.function, code with side effects execute in the order written).

The new eager execution feature is actually a great move for TensorFlow, since it was confusing that you couldn't immediately evaluate your code the way you can with the rest of your Python code.
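
To see what "immediately evaluate your code" means in practice, here is a minimal sketch – operations run right away and return concrete values, with no session in sight:

# With eager execution, operations run and return concrete values immediately
a = tf.constant([[1, 2], [3, 4]])
b = tf.add(a, 1)    # no session.run() needed

print(b.numpy())    # [[2 3]
                    #  [4 5]]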

Verify Eager Execution and GPU Devices

Eager execution is the big new feature that allows for many things, as explained earlier – but let's make sure that we are actually running in eager execution mode.

And while we are at it, we should check for which devices we want to run our code on – after all, GPUs are way faster than CPUs when it comes to Deep Learning tasks.

Eager Execution Check

To verify whether you are running eager execution, I have made a small if-else statement that tells you:

  1. If you are running eager execution – and how you can turn it off, if you wish to.
  2. If you are not running eager execution – and how to enable it manually, or how to upgrade your TensorFlow version.

if(tf.executing_eagerly()):
    print('Eager execution is enabled (running operations immediately)\n')
    print(('Turn eager execution off by running: \n{0}\n{1}').format('' \
        'from tensorflow.python.framework.ops import disable_eager_execution', \
        'disable_eager_execution()'))
else:
    print('You are not running eager execution. TensorFlow version >= 2.0.0' \
          'has eager execution enabled by default.')
    print(('Turn on eager execution by running: \n\n{0}\n\nOr upgrade '\
           'your tensorflow version by running:\n\n{1}').format(
           'tf.compat.v1.enable_eager_execution()',
           '!pip install --upgrade tensorflow\n' \
           '!pip install --upgrade tensorflow-gpu'))

This should print the following if you are running eager execution and have followed along. If you have TensorFlow 2.0, eager execution is enabled by default.

Eager execution is enabled (running operations immediately)

Turn eager execution off by running: 
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

Using Specific Devices (GPUs/CPUs)

Let's say we are interested in knowing if we have a GPU device available – or, if we know there is a GPU in our machine, we can test whether TensorFlow recognizes that it exists. If not, perhaps you should try reinstalling CUDA and cuDNN.

print(('Is your GPU available for use?\n{0}').format(
    'Yes, your GPU is available: True' if tf.test.is_gpu_available() else 'No, your GPU is NOT available: False'
))

print(('\nYour devices that are available:\n{0}').format(
    [device.name for device in tf.config.experimental.list_physical_devices()]
))

# A second method for getting devices:
#from tensorflow.python.client import device_lib
#print([device.name for device in device_lib.list_local_devices() if device.name != None])

My expected output is at least one available CPU, plus a GPU if you are running in Google Colab – if no GPU shows up in Google Colab, go to Edit > Notebook Settings > Hardware Accelerator and pick GPU.

As expected, we indeed have a CPU and GPU available in Google Colab:

Is your GPU available for use?
Yes, your GPU is available: True

Your devices that are available:
['/physical_device:CPU:0', '/physical_device:XLA_CPU:0', '/physical_device:XLA_GPU:0', '/physical_device:GPU:0']

Great, we know we have a GPU available called GPU:0.

But how do we explicitly use it? First, you should know that TensorFlow by default uses your GPU where it can (not every operation can use the GPU).

But if you want to be absolutely certain that your code is executed on the GPU, here is a code piece comparing the time spent using the CPU versus the GPU.

The simple operation here is creating a constant with tf.constant and an identity matrix with tf.eye, which we will discuss later in the Linear Algebra section.

import time

cpu_slot = 0
gpu_slot = 0

# Using CPU at slot 0
with tf.device('/CPU:' + str(cpu_slot)):
    # Starting a timer
    start = time.time()

    # Doing operations on CPU
    A = tf.constant([[3, 2], [5, 2]])
    print(tf.eye(2,2))

    # Printing how long it took with CPU
    end = time.time() - start
    print(end)

# Using the GPU at slot 0
with tf.device('/GPU:' + str(gpu_slot)):
    # Starting a timer
    start = time.time()

    # Doing operations on GPU
    A = tf.constant([[3, 2], [5, 2]])
    print(tf.eye(2,2))

    # Printing how long it took with GPU
    end = time.time() - start
    print(end)

For a small operation like this, we get that the CPU version ran for $0.00235$ seconds, while the GPU version ran for $0.0018$ seconds.

tf.Tensor(
[[1. 0.]
 [0. 1.]], shape=(2, 2), dtype=float32)
0.0023527145385742188
tf.Tensor(
[[1. 0.]
 [0. 1.]], shape=(2, 2), dtype=float32)
0.0018095970153808594

Note that how long it takes will vary each run, but the GPU should consistently outperform the CPU in these types of tasks. We can easily imagine how much this helps with larger computations – when millions or billions of operations are executed on a GPU, we see a significant speedup of neural networks. Always use a GPU, if available.
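
To actually see the GPU pull ahead, the workload needs to be bigger than a 2x2 constant. Here is a small sketch, assuming a GPU is available, that times a large matrix multiplication on each device:

import time

# A larger workload, where the GPU advantage becomes visible
x = tf.random.normal([2000, 2000])

with tf.device('/CPU:0'):
    start = time.time()
    tf.matmul(x, x)
    print('CPU matmul took:', time.time() - start)

with tf.device('/GPU:0'):
    start = time.time()
    tf.matmul(x, x)
    print('GPU matmul took:', time.time() - start)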

Common Use Operations

Let me introduce the bread and butter of TensorFlow: the most commonly used operations. We are going to take a look at the following:

  • Making tensors with tf.constant and tf.Variable
  • Concatenation of two tensors with tf.concat
  • Making tensors with tf.zeros or tf.ones
  • Reshaping data with tf.reshape
  • Casting tensors to other data types with tf.cast

How to make tensors with tf.constant and tf.Variable

Perhaps one of the simplest operations in TensorFlow is making a constant or variable. You simply call the tf.constant or tf.Variable function and specify an array of arrays.

# Making a constant tensor A, that does not change
A = tf.constant([[3, 2],
                 [5, 2]])

# Making a Variable tensor VA, which can change. Notice it's .Variable
VA = tf.Variable([[3, 2],
                 [5, 2]])

# Making another tensor B
B = tf.constant([[9, 5],
                 [1, 3]])

This code piece gives us three tensors: the constant A, the variable VA and the constant B.
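
The practical difference between the two is that a tf.Variable can be updated in place after creation, while a tf.constant cannot. A quick sketch using assign and assign_add:

# A Variable can be updated in place; a constant cannot
VA.assign([[1, 1],
           [1, 1]])
print(VA.numpy())          # [[1 1]
                           #  [1 1]]

VA.assign_add([[1, 0],
               [0, 1]])    # element-wise in-place addition
print(VA.numpy())          # [[2 1]
                           #  [1 2]]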

How to concatenate two tensors with tf.concat

Let's say that we have two tensors – perhaps they could be two observations. We want to concatenate the two tensors A and B into a single variable in Python – how do we do it?

We simply use tf.concat and specify the values and the axis.

# Making a constant tensor A, that does not change
A = tf.constant([[3, 2],
                 [5, 2]])

# Making a Variable tensor VA, which can change. Notice it's .Variable
VA = tf.Variable([[3, 2],
                 [5, 2]])

# Making another tensor B
B = tf.constant([[9, 5],
                 [1, 3]])

# Concatenate columns
AB_concatenated = tf.concat(values=[A, B], axis=1)
print(('Adding B\'s columns to A:\n{0}').format(
    AB_concatenated.numpy()
))

# Concatenate rows
AB_concatenated = tf.concat(values=[A, B], axis=0)
print(('\nAdding B\'s rows to A:\n{0}').format(
    AB_concatenated.numpy()
))

The first output concatenates column-wise with axis=1 and the second concatenates row-wise with axis=0 – meaning we add the data either rightwards (columns) or downwards (rows).

Adding B's columns to A:
[[3 2 9 5]
 [5 2 1 3]]

Adding B's rows to A:
[[3 2]
 [5 2]
 [9 5]
 [1 3]]

How to make tensors with tf.zeros and tf.ones

Creating tensors with just tf.constant and tf.Variable can be tedious if you want to create big tensors. With tf.zeros or tf.ones, you can instead make a large tensor filled with a single value in one call (for actual random noise, see the sketch after the output below).

All we need to specify is the shape in the format shape=[rows, columns] and a dtype, if it matters. The number of rows and columns is arbitrary, and you could in principle create blank 4K images this way.

# Making a tensor filled with zeros. shape=[rows, columns]
tensor = tf.zeros(shape=[3, 4], dtype=tf.int32)
print(('Tensor full of zeros as int32, 3 rows and 4 columns:\n{0}').format(
    tensor.numpy()
))

# Making a tensor filled with ones with data type of float32
tensor = tf.ones(shape=[5, 3], dtype=tf.float32)
print(('\nTensor full of ones as float32, 5 rows and 3 columns:\n{0}').format(
    tensor.numpy()
))

The output of this code piece will be the following.

Tensor full of zeros as int32, 3 rows and 4 columns:
[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]

Tensor full of ones as float32, 5 rows and 3 columns:
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
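
If what you actually want is random noise rather than constant values, the tf.random module covers that. A minimal sketch:

# For actual random noise, use tf.random instead of tf.zeros/tf.ones
noise = tf.random.normal(shape=[3, 4], mean=0.0, stddev=1.0)
uniform_noise = tf.random.uniform(shape=[3, 4], minval=0, maxval=1)

print(noise.numpy())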

How to reshape data with tf.reshape

We might have generated some random noise or have a dataset of images in different sizes, which we need to flatten into one dimension, for example to fit into a dense layer.

We can use tf.reshape to reshape tensors in whichever way we want. All we do here is define a tensor and then reshape it into 1 row with 8 columns, instead of 4 rows with 2 columns.

# Making a tensor for reshaping
tensor = tf.constant([[3, 2],
                      [5, 2],
                      [9, 5],
                      [1, 3]])

# Reshaping the tensor into a shape of: shape = [rows, columns]
reshaped_tensor = tf.reshape(tensor = tensor,
                             shape = [1, 8])

print(('Tensor BEFORE reshape:\n{0}').format(
    tensor.numpy()
))
print(('\nTensor AFTER reshape:\n{0}').format(
    reshaped_tensor.numpy()
))

This produces the following result.

Tensor BEFORE reshape:
[[3 2]
 [5 2]
 [9 5]
 [1 3]]

Tensor AFTER reshape:
[[3 2 5 2 9 5 1 3]]
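
One convenience worth knowing: you can pass -1 for at most one dimension, and tf.reshape will infer it from the total number of elements. A small sketch reusing the tensor from above:

# -1 lets tf.reshape infer that dimension from the element count
flattened = tf.reshape(tensor, shape=[1, -1])    # same as shape=[1, 8] here
column = tf.reshape(tensor, shape=[-1, 1])       # inferred as 8 rows, 1 column

print(flattened.numpy())       # [[3 2 5 2 9 5 1 3]]
print(column.numpy().shape)    # (8, 1)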

How to cast tensors to other data types with tf.cast

Some functions in TensorFlow and Keras require specific data types as input, and tf.cast handles the conversion. If you mostly have integers, you will probably find yourself casting from integer values to float values.

We can simply make a tensor with the datatype float32. We can then cast this tensor to int, which drops everything after the decimal point – truncation, not rounding. (If you do want rounding, see the sketch after the output below.)

# Making a tensor
tensor = tf.constant([[3.1, 2.8],
                      [5.2, 2.3],
                      [9.7, 5.5],
                      [1.1, 3.4]], 
                      dtype=tf.float32)

tensor_as_int = tf.cast(tensor, tf.int32)

print(('Tensor with floats:\n{0}').format(
    tensor.numpy()
))
print(('\nTensor cast from float to int (just remove the decimal, no rounding):\n{0}').format(
    tensor_as_int.numpy()
))

The output of this code piece simply strips the decimals from the original tensor, giving a new integer tensor – a successful conversion from float to int.

Tensor with floats:
[[3.1 2.8]
 [5.2 2.3]
 [9.7 5.5]
 [1.1 3.4]]

Tensor cast from float to int (just remove the decimal, no rounding):
[[3 2]
 [5 2]
 [9 5]
 [1 3]]
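
If you do want conventional rounding instead of truncation, one option is to round first and then cast. A small sketch – note that tf.round rounds halfway values to the nearest even number, so 5.5 becomes 6:

# Round first, then cast, to get rounding instead of truncation
tensor_rounded = tf.cast(tf.round(tensor), tf.int32)

print(tensor_rounded.numpy())    # [[ 3  3]
                                 #  [ 5  2]
                                 #  [10  6]
                                 #  [ 1  3]]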

Linear Algebra Operations

Many algorithms and research projects need these operations in order to implement ideas and try new things, e.g. making smaller changes in activation functions or optimizers. You will encounter some of these operations in my linear algebra series.

  • Transpose tensor with tf.transpose
  • Matrix Multiplication with tf.matmul
  • Element-wise multiplication with tf.multiply
  • Identity Matrix with tf.eye
  • Determinant with tf.linalg.det
  • Dot Product with tf.tensordot

How to transpose a tensor with tf.transpose

Suppose we want to do linear algebra operations; then the tf.transpose function comes in handy.

# Some Matrix A
A = tf.constant([[3, 7],
                 [1, 9]])

A = tf.transpose(A)

print(('The transposed matrix A:\n{0}').format(
    A
))

This produces $A^T$, the transposed matrix of A.

The transposed matrix A:
[[3 1]
 [7 9]]

How to do matrix multiplication with tf.matmul

Many algorithms require matrix multiplication, and this is easy in TensorFlow with the tf.matmul function.

All we do here is define two matrices (one is a vector) and use the tf.matmul function to do matrix multiplication.

# Some Matrix A
A = tf.constant([[3, 7],
                 [1, 9]])

# Some vector v
v = tf.constant([[5],
                 [2]])

# Matrix multiplication of A and v
Av = tf.matmul(A, v)

print(('Matrix Multiplication of A and v results in a new Tensor:\n{0}').format(
    Av
))

Using tf.matmul on A and v, we get the following.

Matrix Multiplication of A and v results in a new Tensor:
[[29]
 [23]]

How to do element-wise multiplication with tf.multiply

Element-wise multiplication comes up in many instances, especially in optimizers. Reusing the tensors A and v from before, so that we can compare the two operations, we simply use tf.multiply instead. Note that since v has a single column, it is broadcast across the columns of A.

# Element-wise multiplication
Av = tf.multiply(A, v)

print(('Element-wise multiplication of A and v results in a new Tensor:\n{0}').format(
    Av
))

And the outcome will be the following.

Element-wise multiplication of A and v results in a new Tensor:
[[15 35]
 [ 2 18]]

How to make an identity matrix with tf.eye

In Linear Algebra, the identity matrix is simply a matrix with ones along the diagonal and zeros everywhere else – and if you multiply some matrix A by the identity matrix, the result is A itself.

We simply define a tensor A, get the rows and columns and make an identity matrix.

# Some Matrix A
A = tf.constant([[3, 7],
                 [1, 9],
                 [2, 5]])

# Get number of rows and columns
rows, columns = A.shape
print(('Get rows and columns in tensor A:\n{0} rows\n{1} columns').format(
    rows, columns
))

# Making identity matrix
A_identity = tf.eye(num_rows = rows,
                    num_columns = columns,
                    dtype = tf.int32)
print(('\nThe identity matrix of A:\n{0}').format(
    A_identity.numpy()
))

The output of the above code is the following.

Get rows and columns in tensor A:
3 rows
2 columns

The identity matrix of A:
[[1 0]
 [0 1]
 [0 0]]

How to find the determinant with tf.linalg.det

The determinant can be used to solve systems of linear equations, or to capture how the area of a shape changes when the matrix is applied to it as a transformation.

We make a matrix A, then cast it to float32, because tf.linalg.det does not take integers as input. Then we simply find the determinant of A.

# Reusing Matrix A
A = tf.constant([[3, 7],
                 [1, 9]])

# Determinant must be: half, float32, float64, complex64, complex128
# Thus, we cast A to the data type float32
A = tf.dtypes.cast(A, tf.float32)

# Finding the determinant of A
det_A = tf.linalg.det(A)

print(('The determinant of A:\n{0}').format(
    det_A
))

It turns out the output is around 20:

The determinant of A:
20.000001907348633
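
Since the determinant is non-zero, A is invertible, which means a system Ax = b has a unique solution. A small sketch using tf.linalg.solve, with a made-up right-hand side b:

# det(A) != 0, so A is invertible and Ax = b can be solved directly
b = tf.constant([[1.0],
                 [2.0]])

x = tf.linalg.solve(A, b)    # A was already cast to float32 above

print(x.numpy())             # [[-0.25]
                             #  [ 0.25]]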

How to find the dot product with tf.tensordot

Dotting one tensor onto another is perhaps one of the most common linear algebra operations. Hence, we should at least know how to find the dot product of two tensors in TensorFlow.

We just need to instantiate two constants, and then we can dot them together – note that in this instance, with axes=1, tf.tensordot gives the same result as tf.matmul, though there are differences between the two outside the scope of this article.

# Defining a 3x3 matrix
A = tf.constant([[32, 83, 5],
                 [17, 23, 10],
                 [75, 39, 52]])

# Defining another 3x3 matrix
B = tf.constant([[28, 57, 20],
                 [91, 10, 95],
                 [37, 13, 45]])

# Finding the dot product
dot_AB = tf.tensordot(a=A, b=B, axes=1).numpy()

print(('Dot product of A and B results in a new Tensor:\n{0}').format(
    dot_AB
))

# Which is the same as matrix multiplication in this instance (axes=1)
# Matrix multiplication of A and B
AB = tf.matmul(A, B)

print(('\nMatrix Multiplication of A and B results in a new Tensor:\n{0}').format(
    AB
))

The result is as follows – quite big numbers, as expected.

Dot product of A and B results in a new Tensor:
[[8634 2719 8750]
 [2939 1329 2975]
 [7573 5341 7545]]

Matrix Multiplication of A and B results in a new Tensor:
[[8634 2719 8750]
 [2939 1329 2975]
 [7573 5341 7545]]
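
To give a feel for how tf.tensordot differs from tf.matmul, here is a small sketch with two vectors: axes=1 contracts them into the familiar inner (dot) product, while axes=0 produces the outer product.

# tf.tensordot generalizes matrix multiplication via the axes argument
u = tf.constant([1, 2, 3])
w = tf.constant([4, 5, 6])

print(tf.tensordot(u, w, axes=1).numpy())    # 32, i.e. 1*4 + 2*5 + 3*6
print(tf.tensordot(u, w, axes=0).numpy())    # the 3x3 outer product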

Calculating Gradients

Let's make an example with the newer GELU activation function, used in OpenAI's GPT-2 and Google's BERT.

The GELU function:

$$ \text{GELU}(x) = 0.5x\left(1+\text{tanh}\left(\sqrt{2/\pi}(x+0.044715x^3)\right)\right) $$

GELU differentiated:

$$ \text{GELU}'(x) = 0.5\text{tanh}(0.0356774x^3 + 0.797885 x) + (0.0535161 x^3 + 0.398942 x) \text{sech}^2(0.0356774x^3+0.797885x)+0.5 $$

If we input $x=0.5$ into the differentiated GELU function, we get the following result:

$$ \text{GELU}'(0.5) = 0.5\,\text{tanh}(0.0356774 \cdot 0.5^3 + 0.797885 \cdot 0.5) + (0.0535161 \cdot 0.5^3 + 0.398942 \cdot 0.5)\,\text{sech}^2(0.0356774 \cdot 0.5^3 + 0.797885 \cdot 0.5) + 0.5 \approx 0.867370 $$

When we plot the differentiated GELU function, we get a smooth curve that runs from roughly 0 for negative inputs up toward 1 for positive inputs.

Let's just code this into an example in TensorFlow.

First, we define the activation function; here we chose the GELU activation function gelu(). Then we define a get_gradient() function which uses the Gradient Tape from TensorFlow.

The Gradient Tape is the important part: it records every operation executed inside the tf.GradientTape() context (here named gt), so that gradients can be computed automatically. After execution, we call gt.gradient() on the tape to retrieve the recorded gradient of the target y with respect to the source x.

import math

def gelu(x):
    return 0.5*x*(1+tf.tanh(tf.sqrt(2/math.pi)*(x+0.044715*tf.pow(x, 3))))

def get_gradient(x, activation_function):
    with tf.GradientTape() as gt:
        y = activation_function(x)

    gradient = gt.gradient(y, x).numpy()

    return gradient

x = tf.Variable(0.5)
gradient = get_gradient(x, gelu)

print('{0} is the gradient of GELU with x={1}'.format(
    gradient, x.numpy()
))

The output will be the following – notice that it is the same value we calculated at the start, just with more decimals.

0.8673698902130127 is the gradient of GELU with x=0.5
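
One detail worth knowing: the tape only watches tf.Variable objects automatically. If x is a plain tensor, such as a tf.constant, you have to tell the tape to watch it explicitly. A small sketch:

# Constants are not watched automatically; call gt.watch() explicitly
x = tf.constant(0.5)

with tf.GradientTape() as gt:
    gt.watch(x)
    y = gelu(x)

print(gt.gradient(y, x).numpy())    # ~0.8673699, the same result as before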

Functions In TensorFlow 2.0

TensorFlow functions made with @tf.function offer a significant speedup, because TensorFlow uses AutoGraph to convert the function to a graph, which in turn runs faster.

The annotation takes the normal Python syntax and converts it into a graph, with minimal side effects – which means we should practically always use it, especially when training and testing neural network models.

All that is done here is making an image and running it through conv_layer and conv_fn, then measuring the difference in execution time.

import timeit
conv_layer = tf.keras.layers.Conv2D(100, 3)

@tf.function
def conv_fn(image):
  return conv_layer(image)

image = tf.zeros([1, 200, 200, 100])
# warm up
conv_layer(image); conv_fn(image)

no_tf_fn = timeit.timeit(lambda: conv_layer(image), number=10)
with_tf_fn = timeit.timeit(lambda: conv_fn(image), number=10)
difference = no_tf_fn - with_tf_fn

print("Without tf.function: ", no_tf_fn)
print("With tf.function: ", with_tf_fn)
print("The difference: ", difference)

print("\nJust imagine when we have to do millions/billions of these calculations," \
      " then the difference will be HUGE!")
print("Difference times a billion: ", difference*1000000000)

As we can see, the difference is there. Maybe not for so few operations, but one can imagine how it scales – hint: it scales quite well.

Without tf.function:  0.005995910000024196
With tf.function:  0.005338444000017262
The difference:  0.0006574660000069343

Just imagine when we have to do millions/billions of these calculations, then the difference will be HUGE!
Difference times a billion:  657466.0000069344

Custom Train and Test Functions In TensorFlow 2.0

For this part, we are going to follow a heavily modified version of the tutorial from TensorFlow's documentation.

Remember that all of the code for this article is also available on GitHub, with a Colab link for you to run it immediately.

For the first part, we just have some imports that we need later. We also specify that the backend should run float64 in layers by default.

from __future__ import absolute_import, division, print_function, unicode_literals

from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model

tf.keras.backend.set_floatx('float64')

mnist = tf.keras.datasets.mnist

In this next snippet, all we do is load and preprocess the data.

# Load data & normalize pixel values to the range [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Add a channels dimension
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

train_ds = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)).shuffle(10000).batch(32)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
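
To sanity-check the pipeline, you can pull a single batch and inspect its shapes. A quick sketch:

# Peek at one batch: images should be (32, 28, 28, 1), labels (32,)
for images, labels in train_ds.take(1):
    print(images.shape, labels.shape)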

Now we make a class; it starts here, and each of its functions will be described in its own little code piece.

If you don't know what an __init__() function does, let me tell you: it's called a constructor, and the code in it runs every time you instantiate (explained later) a new object of that class. The first step is using the super() function to run the constructor of the superclass, here Model. All other code is a standard approach: we just define some variables and layers, like convolutions and dense layers. When we use self., we attach a variable to the instance of the class, so that we can use self.conv1 in other methods, and access conv1 on an instance of MyModel outside the class.

class MyModel(Model):
    def __init__(self,
                 loss_object,
                 optimizer,
                 train_loss,
                 train_metric,
                 test_loss,
                 test_metric):
        '''
            Setting all the variables for our model.
        '''
        super(MyModel, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

        self.loss_object = loss_object
        self.optimizer = optimizer
        self.train_loss = train_loss
        self.train_metric = train_metric
        self.test_loss = test_loss
        self.test_metric = test_metric

The next function defines the architecture for our neural network, hence the name nn_model(). We just run the input x through the model when it's called. One small exercise, if you are just getting started with Python/TensorFlow, would be to remove the nn_model function and provide the forward pass as an input when instantiating the class – a sketch of this follows the code below. Remember to replace references with the new name you give it.

    def nn_model(self, x):
        '''
            Defining the architecture of our model. This is where we run 
            through our whole dataset and return it, when training and 
            testing.
        '''
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)
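
Here is a hypothetical sketch of that exercise – the name forward_fn and its signature are purely illustrative:

# Hypothetical: define the forward pass outside the class...
def forward_fn(model, x):
    x = model.conv1(x)
    x = model.flatten(x)
    x = model.d1(x)
    return model.d2(x)

# ...then add a forward_fn parameter to __init__, store it as
# self.forward_fn, and call self.forward_fn(self, images) in
# train_step and test_step instead of self.nn_model(images).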

Watch closely here – a lot is happening in the next function. First of all, we annotate the function with @tf.function for as much of a speedup as possible.

As explained earlier, tf.GradientTape() records operations onto a variable tape, which we can access afterwards. The training goes like this:

  1. Make predictions and call the object holding the loss function with our data and predictions. While this is happening, the operations are automatically recorded on the tape.
  2. Get the gradients from the gradient tape and apply them using the update rule from the chosen optimizer (we will look at passing in these functions and variables later).

    @tf.function
    def train_step(self, images, labels):
        '''
            This is a TensorFlow function, run once for each batch of
            the input. We move forward first, then calculate gradients
            with Gradient Tape to move backwards.
        '''
        with tf.GradientTape() as tape:
            predictions = self.nn_model(images)
            loss = self.loss_object(labels, predictions)
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(
            gradients, self.trainable_variables))

        self.train_loss(loss)
        self.train_metric(labels, predictions)

This next function is just a test step, used to evaluate the model after each epoch of training. It is almost identical to the train_step() function, except that there are no gradients and no updates.

    @tf.function
    def test_step(self, images, labels):
        '''
            This is a TensorFlow function, run once for each batch of
            the test input.
        '''
        predictions = self.nn_model(images)
        t_loss = self.loss_object(labels, predictions)

        self.test_loss(t_loss)
        self.test_metric(labels, predictions)

The next function ties the whole class together with three for loops. Later on, we define how many epochs (full passes over the data) we want the neural network to train and test for – and then, within each epoch, we run through every batch of observations.

Afterwards, we can see how well we optimized our loss function and metric. We just keep running this from epoch $0$ to epoch $n$. This concludes the class MyModel. Have a close look at the three for loops, as that is where all the action happens.

    def fit(self, train, test, epochs):
        '''
            This fit function runs training and testing.
        '''
        for epoch in range(epochs):
            for images, labels in train:
                self.train_step(images, labels)

            for test_images, test_labels in test:
                self.test_step(test_images, test_labels)

            template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
            print(template.format(epoch+1,
                                  self.train_loss.result(),
                                  self.train_metric.result()*100,
                                  self.test_loss.result(),
                                  self.test_metric.result()*100))

            # Reset the metrics for the next epoch
            self.train_loss.reset_states()
            self.train_metric.reset_states()
            self.test_loss.reset_states()
            self.test_metric.reset_states()

For the next snippet of code, we simply define all the variables and functions we need for a neural network to run – a loss function, an optimizer and metrics.

# Make a loss object
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()

# Select the optimizer
optimizer = tf.keras.optimizers.Adam()

# Specify metrics for training
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

# Specify metrics for testing
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

We take the loss function, optimizer and metrics, and we input them into MyModel by instantiating the class with these variables. So when we call MyModel() with all these parameters, we actually run the __init__ function of the MyModel class.

As mentioned earlier, we can call functions and variables from the instance of a class, so here we quite simply call the fit function with our training and testing dataset.

# Create an instance of the model
model = MyModel(loss_object = loss_object,
                optimizer = optimizer,
                train_loss = train_loss,
                train_metric = train_metric,
                test_loss = test_loss,
                test_metric = test_metric)

EPOCHS = 5

model.fit(train = train_ds,
          test = test_ds,
          epochs = EPOCHS)

This produces the following output in the console (which will change each time you run the training).

Epoch 1, Loss: 0.13490843153949827, Accuracy: 95.94166666666666, Test Loss: 0.06402905891434076, Test Accuracy: 97.86
Epoch 2, Loss: 0.043823116325043765, Accuracy: 98.64666666666668, Test Loss: 0.06146741438847755, Test Accuracy: 98.05
Epoch 3, Loss: 0.022285125361487735, Accuracy: 99.29666666666667, Test Loss: 0.056894636656289105, Test Accuracy: 98.3
Epoch 4, Loss: 0.013788788002398602, Accuracy: 99.52666666666666, Test Loss: 0.0621878185059347, Test Accuracy: 98.3
Epoch 5, Loss: 0.010066991776032834, Accuracy: 99.66000000000001, Test Loss: 0.06649907561390188, Test Accuracy: 98.33
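
As a final sanity check, you can run a single test image through the trained network and compare the prediction to the true label. A small sketch:

# Run one test image through the network and read off the predicted digit
probs = model.nn_model(x_test[:1])

print('Predicted digit:', tf.argmax(probs, axis=1).numpy()[0])
print('Actual digit:', y_test[0])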