I get a lot of questions about how to learn TensorFlow and Deep Learning. I’ll often hear, “How do I start learning TensorFlow?” or “How do I start learning Deep Learning?”. My answer is, “Learn Deep Learning and TensorFlow at the same time!”. See, it’s not easy to learn one without the other. Of course, you can use other libraries like Keras or Theano, but TensorFlow is a clear favorite when it comes to libraries for deep learning. And now is the best time to start. If you haven’t noticed, there’s a huge wave of startups and big companies adopting deep learning. Deep Learning is the hottest skill to have right now.
So let’s start from the basics. What actually is “Deep Learning” and why is it so hot in data science right now? What’s the difference between Deep Learning and traditional machine learning? Why TensorFlow? And where can you start learning?
What is Deep Learning?
Inspired by the brain, deep learning is a type of machine learning that uses neural networks to model high-level abstractions in data. The major difference between deep learning and traditional Neural Networks is that deep learning models have multiple hidden layers, which allows them (deep neural networks) to extract complex patterns from data.
How is Deep Learning different from traditional machine learning algorithms, such as Neural Networks?
Under the umbrella of Artificial Intelligence (AI), machine learning is the sub-field of algorithms that can learn on their own, including Decision Trees, Linear Regression, K-means clustering, Neural Networks, and so on. Deep Neural Networks, in particular, are super-powered Neural Networks that contain several hidden layers. With the right configuration and hyper-parameters, deep learning can achieve impressively accurate results compared to shallow Neural Networks given the same computational power.
Why is Deep Learning such a hot topic in the Data Science community?
Simply put, across many domains, deep learning can attain much faster and more accurate results than ever before, such as image classification, object recognition, sequence modeling, speech recognition, and so on. It all started recently, too, around 2015. Three key catalysts came together, resulting in the popularity of deep learning:
- Big Data: the presence of extremely large and complex datasets;
- GPUs: the low cost and wide availability of GPUs made the parallel processing faster and cheaper than ever;
- Advances in deep learning algorithms, especially for complex pattern recognition.
These three factors resulted in the deep learning boom that we see today: self-driving cars and drones, chatbots, machine translation, AI playing games. You can now see a tremendous surge in the demand for data scientists and cognitive developers. Big companies are recognizing this evolution in data-driven insights, which is why you now see IBM, Google, Apple, Tesla, and Microsoft investing a lot of money in deep learning.
What are the applications of Deep Learning?
Historically, the goal of machine learning was to move humanity towards the singularity of “General Artificial Intelligence”. But not surprisingly, this goal has been tremendously difficult to attain. So instead of trying to develop generalized AI, scientists started to develop a series of models and algorithms that excelled in specific tasks.
So, to understand the main applications of Deep Learning, it is best to briefly take a look at each of the different types of Deep Neural Networks, their main applications, and how they work.
What are the different types of Deep Neural Networks?
Convolutional Neural Networks (CNNs)
Assume that you have a dataset of images of cats and dogs, and you want to build a model that can recognize and differentiate them. Traditionally, your first step would be “feature selection”: choosing the best features from your images, and then using those features in a classification algorithm (e.g., Logistic Regression or a Decision Tree), resulting in a model that could predict “cat” or “dog” given an image. These chosen features could simply be color, object edges, pixel location, or countless other features that could be extracted from the images.
Of course, the better and more effective the feature sets you find, the more accurate and efficient the image classification you can obtain. In fact, over the last two decades, a great deal of scientific research in image processing has been devoted to finding the best feature sets for classification. However, as you can imagine, selecting and using the best features is a tremendously time-consuming task and is often ineffective. Furthermore, extending the features to other types of images is an even greater problem: the features you used to discriminate cats from dogs cannot be generalized to, for example, recognizing hand-written digits. Therefore, the importance of feature selection can’t be overstated.
Enter convolutional neural networks (CNNs). Suddenly, without having to hand-pick features, a CNN finds the best features for you automatically and effectively. So instead of you choosing which image features to use to classify dogs vs. cats, the CNN can find those features and classify the images for you.
What are the applications of CNN?
CNNs have gained a lot of attention in the machine learning community over the last few years. This is due to the wide range of applications where CNNs excel, especially machine vision projects: image recognition/classification, object detection/recognition in images, digit recognition, colorizing black-and-white images, translating text in images, and creating artistic images.
Let’s look closer at a simple problem to see how CNNs work. Consider the digit recognition problem. We would like to classify images of handwritten numbers, where the target is the digit (0,1,2,3,4,5,6,7,8,9) and the observations are the intensity and relative position of pixels. After some training, it’s possible to generate a “function” that maps inputs (the digit image) to desired outputs (the type of digit). The only problem is how well this mapping works. While generating this “function”, the training process continues until the model achieves a desired level of accuracy on the training data. You can learn more about this problem and its solution through our hands-on convolutional network notebooks.
How does it work?
A convolutional neural network (CNN) is a type of feed-forward neural network consisting of multiple layers of neurons that have learnable weights and biases. Each neuron in a layer receives some input, processes it, and optionally follows it with a non-linearity. The network has multiple layer types, such as convolution, max-pooling, dropout, and fully connected layers. In each layer, small collections of neurons process portions of the input image. The outputs of these collections are then tiled so that their input regions overlap, producing a higher-resolution representation of the original image; this is repeated for every such layer. The important point here is that CNNs are able to break complex patterns down into a series of simpler patterns, through multiple layers.
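To make the convolution idea concrete, here is a minimal sketch in plain Python. The 5×5 “image” and the 3×3 vertical-edge kernel are toy values made up for illustration, not a real trained filter:

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    out_size = len(image) - k + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Element-wise multiply the kernel with the image patch, then sum.
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(k) for b in range(k))
            row.append(s)
        output.append(row)
    return output

# A toy image: dark left half (0s), bright right half (1s).
image = [[0, 0, 1, 1, 1]] * 5

# A vertical-edge detector: responds where intensity changes left-to-right.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)
```

The feature map lights up exactly where the dark-to-bright edge sits, which is the kind of simple pattern an early CNN layer learns to detect on its own.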
Recurrent Neural Network (RNN)
A Recurrent Neural Network tries to solve the problem of modeling temporal data. You feed the network sequential data; it maintains the context of the data and learns the patterns in it.
What are the applications of RNN?
Yes, you can use it to model time-series data such as weather data or stocks, or sequential data such as genes. But you can also do other projects, for example text-processing tasks like sentiment analysis and parsing, and more generally any language model that operates at the word or character level. Here are some interesting projects done with RNNs: speech recognition, adding sound to silent movies, text translation, chatbots, handwriting generation, language modeling (automatic text generation), and image captioning.
How does it work?
The Recurrent Neural Network is a specialized type of Neural Network that solves the issue of maintaining context for sequential data. RNNs are models with a simple structure and a feedback mechanism built-in. The output of a layer is added to the next input and fed back to the same layer. At each iterative step, the processing unit takes in an input and the current state of the network and produces an output and a new state that is re-fed into the network.
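The feedback loop described above can be sketched in a few lines of plain Python. The weights below are arbitrary illustration values, not trained parameters:

```python
import math

# One recurrent unit: at each step it combines the new input with the
# previous state, so earlier inputs influence later outputs.
W_input, W_state = 0.5, 0.8   # arbitrary illustration weights

def rnn_step(x, state):
    """Produce a new state (and output) from the input and previous state."""
    return math.tanh(W_input * x + W_state * state)

state = 0.0
outputs = []
for x in [1.0, 0.0, 0.0]:       # a short input sequence
    state = rnn_step(x, state)  # the new state is re-fed into the unit
    outputs.append(state)
```

Notice that even though the later inputs are zero, the state still carries (fading) information from the first input, which is exactly the “maintaining context” behavior described above.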
However, this model has some problems. It’s very computationally expensive to maintain the state for large numbers of units, even more so over a long span of time. Additionally, Recurrent Networks are very sensitive to changes in their parameters. To solve these problems, researchers found a way to keep information over long periods of time while also curing the oversensitivity to parameter changes, i.e., making backpropagation through Recurrent Networks more viable. What is it? Long Short-Term Memory (LSTM).
LSTM is an abstraction of how computer memory works: you have a linear unit, which is the information cell itself, surrounded by three logistic gates responsible for maintaining the data. One gate is for inputting data into the information cell, one is for outputting data from the information cell, and the last one is for keeping or forgetting data depending on the needs of the network.
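The three gates can be sketched in plain Python. For clarity the gate values are passed in directly here; in a real LSTM each gate is a logistic (sigmoid) function of the current input and previous state, and all weights are learned:

```python
import math

def lstm_cell(x, prev_cell, input_gate, forget_gate, output_gate):
    """One step of an LSTM's information cell, with gate values given."""
    # forget gate: how much of the old cell content to keep
    # input gate:  how much of the new candidate value to write
    cell = forget_gate * prev_cell + input_gate * math.tanh(x)
    # output gate: how much of the cell content to expose as output
    output = output_gate * math.tanh(cell)
    return cell, output

# With the forget gate open (1.0) and the input gate closed (0.0),
# the cell simply carries its old content forward, ignoring the input.
cell, out = lstm_cell(x=5.0, prev_cell=0.7,
                      input_gate=0.0, forget_gate=1.0, output_gate=1.0)
```

This gating is what lets the network keep information over long periods of time: as long as the forget gate stays open and the input gate stays closed, the cell content survives unchanged.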
If you want to practice the basics of RNNs/LSTMs with TensorFlow, or language modeling, you can practice here.
Restricted Boltzmann Machine (RBM)
RBMs are used to find patterns in data in an unsupervised fashion. They are shallow neural nets that learn to reconstruct data by themselves. They are very important models, because they can automatically extract meaningful features from a given input without the need for labels. RBMs might not be outstanding if you look at them as independent networks, but they are significant as building blocks of other networks, such as Deep Belief Networks.
What are the applications of RBM?
RBM is useful for unsupervised tasks such as feature extraction/learning, dimensionality reduction, pattern recognition, recommender systems (Collaborative Filtering), classification, regression, and topic modeling.
To understand the theory of RBMs and their application in Recommender Systems, you can run these notebooks.
How does it work?
An RBM possesses only two layers: a visible input layer and a hidden layer where the features are learned. Simply put, the RBM takes the inputs and translates them into a set of numbers that represents them. These numbers can then be translated back to reconstruct the inputs. Through several forward and backward passes, the RBM is trained. A trained RBM can reveal two things: first, the interrelationships among the input features; second, which features are the most important when detecting patterns.
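The forward and backward passes can be sketched in plain Python. The weights below are toy values, not trained; in real training they would be adjusted (e.g., by contrastive divergence) until the reconstruction closely matches the input, and biases are omitted for brevity:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# W[i][j] connects visible unit i to hidden unit j (toy values).
W = [[ 1.0, -0.5],
     [-0.5,  1.0],
     [ 0.5,  0.5]]   # 3 visible units, 2 hidden units

def forward(visible):
    """Translate the input into hidden-unit activation probabilities."""
    return [sigmoid(sum(v * W[i][j] for i, v in enumerate(visible)))
            for j in range(len(W[0]))]

def backward(hidden):
    """Translate hidden activations back into a reconstruction."""
    return [sigmoid(sum(h * W[i][j] for j, h in enumerate(hidden)))
            for i in range(len(W))]

hidden = forward([1.0, 0.0, 1.0])   # forward pass: encode the input
reconstruction = backward(hidden)   # backward pass: reconstruct it
```

The hidden activations are the “set of numbers” that represents the input, and the backward pass is the translation back to a reconstruction of it.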
Deep Belief Networks (DBN)
A Deep Belief Network is an advanced Multi-Layer Perceptron (MLP). It was invented to solve an old problem in traditional artificial neural networks. Which problem? Backpropagation in traditional Neural Networks can often lead to “local minima” or “vanishing gradients”: your “error surface” contains multiple grooves, and as you perform gradient descent you fall into a groove that is not the lowest possible one.
What are the applications of DBN?
DBN is generally used for classification (the same as traditional MLPs). One of the most important applications of DBN is image recognition. The important part here is that a DBN is a very accurate discriminative classifier, and we don’t need a big set of labeled data to train it; a small set works fine, because feature extraction is done in an unsupervised way by a stack of RBMs.
How does it work?
DBN is similar to MLP in terms of architecture, but different in training approach. DBNs can be divided into two major parts. The first is a stack of RBMs that pre-trains the network. The second is a feed-forward backpropagation network that further refines the results from the RBM stack. In the training process, each RBM learns the entire input; the stacked RBMs can then detect the inherent patterns in the inputs.
DBN solves the “vanishing gradient” problem by using this extra step, so-called pre-training. Pre-training is done before backpropagation and can lead to an error rate not far from optimal. This puts us in the “neighborhood” of the final solution; we then use backpropagation to slowly reduce the error rate from there.
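The greedy, layer-by-layer pre-training can be sketched as follows. This is purely illustrative: the `Rbm` class here is a stand-in whose “training” does nothing and whose transform is a fixed truncation, standing in for real RBM learning and hidden-layer activations:

```python
# Illustrative sketch of DBN-style greedy layer-wise pre-training.
class Rbm:
    """Stand-in for a real RBM (no actual learning happens here)."""
    def __init__(self, n_hidden):
        self.n_hidden = n_hidden

    def train(self, data):
        pass  # real code would run contrastive divergence here

    def transform(self, data):
        # stand-in for hidden activations: keep the first n_hidden values
        return [row[:self.n_hidden] for row in data]

layer_sizes = [8, 4, 2]                  # hidden sizes of the stacked RBMs
data = [[float(i) for i in range(16)]]   # one toy 16-dimensional input

stack = []
for size in layer_sizes:
    rbm = Rbm(size)
    rbm.train(data)             # each RBM learns from the layer below it
    data = rbm.transform(data)  # its hidden output feeds the next RBM
    stack.append(rbm)
# `data` is now the top-level representation; backpropagation then
# fine-tunes the whole stack from this starting point.
```

The key point is the ordering: each RBM is trained on the output of the one below it, and only afterwards does supervised backpropagation refine the whole network.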
Autoencoders
An autoencoder is an artificial neural network employed to recreate a given input. It takes a set of unlabeled inputs, encodes them, and then tries to extract the most valuable information from them. Autoencoders are used for feature extraction, learning generative models of data, and dimensionality reduction, and they can be used for compression. They are very similar to RBMs but can have more than two layers.
What are the applications of Autoencoders?
Autoencoders are employed in some of the largest deep learning applications, especially for unsupervised tasks such as feature extraction, pattern recognition, and dimensionality reduction. As another example, say you want to extract what emotion the person in a photograph is feeling; Nikhil Buduma explains the utility of this type of Neural Network for that task with excellence.
How does it work?
An RBM is itself an example of an autoencoder, but with fewer layers. An autoencoder can be divided into two parts: the encoder and the decoder.
Let’s say that we want to classify some facial images, and each image is very high-dimensional (e.g., 50×40 pixels, i.e., 2,000 dimensions). The encoder needs to compress the representation of the input. In this case we are going to compress the face image, which consists of 2,000-dimensional data, down to only 30 dimensions, taking some intermediate steps along the way. The decoder is a reflection of the encoder network. It works to recreate the input as closely as possible, and it plays an important role during training: forcing the autoencoder to select the most important features for the compressed representation. After training, you can apply your algorithms to the 30-dimensional representation.
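The 2,000 → 30 compression and its mirror image can be written out as a shape sketch. The intermediate layer sizes here are assumptions for illustration; a real model would learn the weight matrices between these layers:

```python
# Sketch of the encoder/decoder shapes described above (2000 -> 30 -> 2000).
encoder_layers = [2000, 1000, 250, 30]   # progressively compress the input
decoder_layers = encoder_layers[::-1]    # the decoder mirrors the encoder

def weight_shapes(layers):
    """Shapes of the weight matrices between consecutive layers."""
    return list(zip(layers[:-1], layers[1:]))

encoder_shapes = weight_shapes(encoder_layers)
decoder_shapes = weight_shapes(decoder_layers)
```

Because the decoder is a reflection of the encoder, its weight-matrix shapes are the encoder’s shapes reversed, expanding the 30-dimensional code back toward the original 2,000 dimensions.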
Why TensorFlow? How does it work?
TensorFlow is, in the end, just a library, but an excellent one. I believe that TensorFlow’s capability to execute code on different devices, such as CPUs and GPUs, is its superpower. This is a consequence of its specific structure: TensorFlow defines computations as graphs, and these are made of operations (also known as “ops”). So, when we work with TensorFlow, it is the same as defining a series of operations in a Graph.
To execute these operations as computations, we must launch the Graph into a Session. The session translates and passes the operations represented in the graphs to the device you want to execute them on, be it a GPU or CPU.
For example, the image below represents a graph in TensorFlow. W, x, and b are tensors over the edges of this graph. MatMul is an operation over the tensors W and x; after that, Add is called, adding the result of the previous operation to b. The resultant tensor of each operation flows to the next one until the end, where it’s possible to get the desired result.
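That graph can be written out in a few lines. This is TensorFlow 1.x-style code (in TensorFlow 2.x the same graph/session API lives under `tf.compat.v1`), and the tensor values are toy numbers chosen for illustration:

```python
import tensorflow as tf

# In TF 2.x, graph mode must be enabled explicitly; in TF 1.x it is the default.
tf.compat.v1.disable_eager_execution()

W = tf.constant([[2.0, 1.0]])     # a 1x2 tensor
x = tf.constant([[3.0], [1.0]])   # a 2x1 tensor
b = tf.constant([[1.0]])

# MatMul over W and x, then Add with b -- the graph described above.
y = tf.add(tf.matmul(W, x), b)

# Nothing has been computed yet; launching a Session runs the graph
# on whatever device (CPU/GPU) is available.
with tf.compat.v1.Session() as sess:
    result = sess.run(y)          # 2*3 + 1*1 + 1 = 8
```

The separation between defining the graph and running it in a Session is exactly what lets TensorFlow hand the same graph to a CPU or a GPU.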
TensorFlow is really an extremely versatile library that was originally created for tasks requiring heavy numerical computation. For this reason, TensorFlow is a great library for machine learning and deep neural networks.
Where should I start learning?
Again, as I mentioned at the start, it does not matter where you begin, but I strongly suggest that you learn TensorFlow and Deep Learning together. Deep Learning with TensorFlow is a course that we created to put them together. Check it out and please let us know what you think of it.
Good luck on your journey into one of the most exciting technologies to surface in our field over the past few years.