Introduction to the MXNet deep learning library
The deep learning libraries we will use in this book are MXNet, Keras, and TensorFlow. Keras is a frontend API, which means it is not a standalone library: it runs on top of a lower-level backend library, usually TensorFlow. The advantage of using Keras rather than TensorFlow directly is that it has a simpler interface. We will use Keras in later chapters of this book.
Both MXNet and TensorFlow are multipurpose numerical computation libraries that can use GPUs for massively parallel matrix operations. As such, multi-dimensional matrices are central to both libraries. In R, we are familiar with the vector, which is a one-dimensional array of values of the same type. The R data frame is a two-dimensional array of values, where each column can have a different type. The R matrix is a two-dimensional array of values of the same type. Some machine learning algorithms in R require a matrix as input. We saw an example of this in Chapter 2, Training a Prediction Model, with the RSNNS package.
In R, it is unusual to use data structures with more than two dimensions, but deep learning uses them extensively. For example, if you have a 32 x 32 color image, you could store the pixel values in a 32 x 32 x 3 matrix, where the first two dimensions are the width and height, and the last dimension is for the red, green, and blue color channels. This can be extended further by adding another dimension for a collection of images. This is called a batch and allows the processor (CPU/GPU) to process multiple images concurrently. The batch size is a hyper-parameter and the value selected depends on the size of the input data and memory capacity. If our batch size were 64, our matrix would be a 4-dimensional matrix of size 32 x 32 x 3 x 64, where the first two dimensions are the width and height, the third dimension is the colors, and the last dimension is the batch size, 64. The important thing to realize is that this is just another way of representing data. In R, we would store the same data as a 2-dimensional matrix (or data frame) with 64 rows and 32 x 32 x 3 = 3,072 columns. All we are doing is reshaping the data; we are not changing it.
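To make the reshaping idea concrete, here is a short base R sketch; the dimensions match the example above, and the pixel values are just random numbers for illustration:

```r
# A batch of 64 images, each 32 x 32 pixels with 3 color channels,
# stored as a 4-dimensional array
batch <- array(runif(32 * 32 * 3 * 64), dim = c(32, 32, 3, 64))
dim(batch)  # 32 32 3 64

# The same data reshaped into a 2-dimensional matrix:
# one row per image, 32 x 32 x 3 = 3,072 columns
flat <- matrix(aperm(batch, c(4, 1, 2, 3)), nrow = 64)
dim(flat)   # 64 3072

# Reshaping changes how the values are indexed, not the values:
# the first row of flat holds exactly the first image's pixels
identical(as.vector(batch[, , , 1]), flat[1, ])  # TRUE
```

The aperm call moves the batch dimension first so that each row of the resulting matrix corresponds to one image.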
These n-dimensional matrices, which contain elements of the same type, are the cornerstone of using MXNet and TensorFlow. In MXNet, they are referred to as NDArrays. In TensorFlow, they are known as tensors. These n-dimensional matrices are important because they allow us to feed data to GPUs efficiently; GPUs can process batches of data much faster than single rows of data. In the preceding example, we use 64 images in a batch, so the deep learning library will process input data in chunks of 32 x 32 x 3 x 64.
This chapter will use the MXNet deep learning library. MXNet originated at Carnegie Mellon University and is heavily supported by Amazon, which chose it as its default deep learning library in 2016. In 2017, MXNet was accepted as an Apache Incubator project, ensuring that it would remain open source software. Here is a very simple example of an NDArray (matrix) operation in MXNet in R. If you have not already installed the MXNet package for R, go back to Chapter 1, Getting Started with Deep Learning, for instructions, or use this link: https://mxnet.apache.org/install/index.html:
library(mxnet) # 1
ctx = mx.cpu() # 2
a <- mx.nd.ones(c(2,3),ctx=ctx) # 3
b <- a * 2 + 1 # 4
typeof(b) # 5
[1] "externalptr"
class(b) # 6
[1] "MXNDArray"
b # 7
[,1] [,2] [,3]
[1,] 3 3 3
[2,] 3 3 3
We can break down this code line by line:
- Line 1 loads the MXNet package.
- Line 2 sets the CPU context. This tells MXNet where to process your computations, either on the CPU or on a GPU, if one is available.
- Line 3 creates a 2-dimensional NDArray of size 2 x 3 where each value is 1.
- Line 4 creates another 2-dimensional NDArray of size 2 x 3. Each value will be 3 because we multiply each element by 2 and then add 1.
- Line 5 shows that b is an external pointer.
- Line 6 shows that the class of b is MXNDArray.
- Line 7 displays the results.
We can perform mathematical operations, such as multiplication and addition, on the b variable. However, it is important to realize that, while this behaves similarly to an R matrix, it is not a native R object. We can see this when we output the type and class of this variable.
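When you want to apply base R functions to the result, you can convert an MXNDArray back to a native R object with the MXNet package's as.array function. This short sketch repeats the setup above for completeness and assumes the mxnet package is installed:

```r
library(mxnet)

ctx <- mx.cpu()
a <- mx.nd.ones(c(2, 3), ctx = ctx)  # 2 x 3 NDArray of ones
b <- a * 2 + 1                       # element-wise: every value is 3

# Convert the MXNDArray into a native R matrix
b_r <- as.array(b)
class(b_r)  # a native R matrix/array, no longer an external pointer
mean(b_r)   # ordinary R functions now work: 3
```

Conversions like this copy the data out of MXNet's own memory, so they are best kept outside of performance-critical training loops.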
When developing deep learning models, there are usually two distinct steps. First you create the model architecture, and then you train the model. The main reason for this is that most deep learning libraries employ symbolic programming rather than the imperative programming you are used to. Most of the code you have previously written in R is an imperative program, which executes code sequentially. For mathematical optimization tasks, such as deep learning, this may not be the most efficient method of execution. Most deep learning libraries, including MXNet and TensorFlow, use symbolic programming. In symbolic programming, a computation graph for the program execution is designed first. This graph is then compiled and executed. When the computation graph is generated, the input, output, and graph operations are already defined, meaning that the code can be optimized. This means that for deep learning, symbolic programs are usually more efficient than imperative programs.
Here is a simple example of the type of optimization using symbolic programs:
M = (M1 * M2) + (M3 * M4)
An imperative program would calculate this as follows:
Mtemp1 = (M1 * M2)
Mtemp2 = (M3 * M4)
M = Mtemp1 + Mtemp2
A symbolic program would first create a computation graph, which might look like the following:
M1, M2, M3, and M4 are symbols that need to be operated on. The graph shows the dependencies for operations; the + operation requires the two preceding multiplication operations to be done before it can execute. But there is no dependency between the two multiplication steps, so these can be executed in parallel. This type of optimization means the code can execute much faster.
From a coding point of view, this means that you have two steps in creating a deep learning model: first you define the architecture of the model, and then you train the model. You create layers for your deep learning model, and each layer has symbols that are placeholders. So, for example, the first layer is usually:
data <- mx.symbol.Variable("data")
data is a placeholder for the input, which we will insert later. The output of each layer feeds into the next layer as input. This might be a convolutional layer, a dense layer, an activation layer, a dropout layer, and so on. The following code example shows how the layers continue to feed into each other; it is taken from a full example later in this chapter. Notice how the symbol for each layer is used as input in the next layer; this is how the model is built, layer after layer. The data symbol is passed into the first call to mx.symbol.FullyConnected, the fc1 symbol is passed into the first call to mx.symbol.Activation, and so on.
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=64)
act1 <- mx.symbol.Activation(fc1, name="activ1", act_type=activ)
drop1 <- mx.symbol.Dropout(data=act1, p=0.2)
fc2 <- mx.symbol.FullyConnected(drop1, name="fc2", num_hidden=32)
act2 <- mx.symbol.Activation(fc2, name="activ2", act_type=activ)
.....
softmax <- mx.symbol.SoftmaxOutput(fc4, name="sm")
When you execute this code, it returns instantly because nothing is actually computed at this stage. Eventually, you pass the last layer into a function to train the model. In MXNet, this is the mx.model.FeedForward.create function. At this stage, the computation graph is compiled and the model begins to train:
softmax <- mx.symbol.SoftmaxOutput(fc4, name="sm")
model <- mx.model.FeedForward.create(softmax, X = train_X, y = train_Y,
ctx = devices,num.round = num_epochs,
................
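Putting the two steps together, a minimal end-to-end sketch might look like the following. The toy data, layer sizes, and training settings here are illustrative placeholders rather than the chapter's full example, and the code assumes the mxnet package is installed:

```r
library(mxnet)

# Toy data: 200 samples, 10 features, binary labels
set.seed(42)
train_X <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)
train_Y <- sample(0:1, 200, replace = TRUE)

# Step 1: define the architecture symbolically (nothing runs yet)
data    <- mx.symbol.Variable("data")
fc1     <- mx.symbol.FullyConnected(data, name = "fc1", num_hidden = 16)
act1    <- mx.symbol.Activation(fc1, name = "activ1", act_type = "relu")
fc2     <- mx.symbol.FullyConnected(act1, name = "fc2", num_hidden = 2)
softmax <- mx.symbol.SoftmaxOutput(fc2, name = "sm")

# Step 2: compile the computation graph and train the model
model <- mx.model.FeedForward.create(softmax,
                                     X = train_X, y = train_Y,
                                     ctx = mx.cpu(),
                                     num.round = 5,
                                     array.batch.size = 32,
                                     array.layout = "rowmajor",
                                     learning.rate = 0.01,
                                     eval.metric = mx.metric.accuracy)
```

Only the call to mx.model.FeedForward.create triggers any real computation; everything before it just builds up the symbolic graph.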
This is when the deep learning model is created and trained. More information on the MXNet architecture is available online; the following links will get you started: