Deep Learning for the Layman

Prologue : Today Deep Learning is a buzzword, just as data science and machine learning were yesterday. And it is no surprise that you get overwhelmed by too much information and too many complicated terminologies and glossaries when you try to understand Deep Learning from the materials available online.
This blog is written to help the layman understand what Deep Learning is, without bringing in complicated math and terminology in the first place.
Some parts of this writing include fictional content, for the obvious reason of teaching Deep Learning in a much simpler way; however, the facts about Deep Learning remain the same.

Deep learning is not new.
Deep Learning is not a relatively new field; it has been around for a long time. Techniques used in Deep Learning were helping machines to learn and solve problems, which is why this field of study comes under machine learning. However, the techniques used in this field were not producing impressive results and faced obstacles that were considered unsolvable in those days. Hence Deep Learning was unpopular and remained in the dark for quite some time.
But recent breakthroughs in research have overcome most of those obstacles with high-quality results. I will cover the details of what the breakthroughs were and how things turned around for Deep Learning in another post; for now, let's keep things simple and understand the big picture.

Define Deep learning.
Deep Learning is the science and study of applying neural networks (sometimes referred to as deep neural networks or artificial neural networks) to solve complex problems in the machine learning field.

What is a Neural Network?
Back in history, computer scientists wanted machines to think, learn and solve problems. They observed the biological brain and nervous system to achieve this. They found that a biological component in the brain called the Neuron is responsible for human intelligence; hence, keeping the biological brain as the inspiration, they developed the concept of an Artificial Neuron for machines to solve problems.

Biological Neuron components.
Dendrites – pipelines that carry input signals.
Nucleus – consider this as a function that manipulates the input and decides what the output should be and where it needs to be sent.
Axon – a pipeline that carries output signals.
Axon ending – the connecting point to another Neuron's dendrites.

[Diagram: Biological Neuron vs Artificial Neuron]

Artificial Neuron components.
Input – a pipeline that carries the input.
Function – a math function that takes the input, applies user-defined logic and sends the output.
Output – a pipeline that carries the output.
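To make the mapping concrete, here is a minimal Python sketch of those three components. The `neuron` helper and its inputs are purely illustrative, not something from the original post:

```python
def neuron(inputs, function):
    # A bare-bones artificial neuron: input pipeline -> function -> output.
    return function(inputs)

# Example: a neuron whose user-defined logic simply sums its inputs.
output = neuron([1.0, 2.0, 3.0], function=sum)
```

The point is only the shape of the thing: data flows in, a function transforms it, a result flows out.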

The human brain has billions of interconnected Neurons (referred to as a biological neural network), and they talk to each other by sending and receiving electrical impulses, which is what makes humans think and have consciousness. This network of communication has been mimicked with Artificial Neurons, giving birth to something called an Artificial Neural Network (ANN).

[Diagram: Neural Network]

A novice reader may now think: the concept or theory of the ANN above is fine, but how is an ANN used to solve problems in the real world?
First, let's see what the real-world problems are. For a machine, real-world problems could be image classification, object detection, handwriting detection, text-to-speech processing, speech-to-text processing, learning to play chess, learning to diagnose health problems, driving a car, or even responding to your jokes.

To crack all these problems, researchers started to think from the biological brain's perspective. What they found was interesting: the biological brain does not work in the same way for every problem it encounters. i.e. when a human sees an image and recognises its details, only some set of Neurons in the neural network start functioning and communicate to solve the puzzle. Whereas when driving a car, a totally different set of Neurons start functioning and communicate with each other in a totally different way.

This helped researchers understand that there are many different neural networks in the biological system, each becoming functional based on the problem it encounters.
Following the same fact, researchers concluded that there is no single Artificial Neural Network for solving all problems. Instead, we could artificially create different types of Neural Networks for solving different problems, i.e. a specific neural network for image processing, a specific neural network for driving a car, and so on. In day-to-day practice, all these specifically created neural networks are collectively referred to as Artificial Neural Networks (ANNs) once again.

Understanding the internals of an Artificial Neuron
The simplest Neuron in a network may look something like what is shown in the diagram. It will have inputs, an output, a math function (aka activation function), weights (W) and an optional bias (B). Weights (W) are nothing but user-defined numbers which will be used in the activation function for computation. Let's say you want to solve a classification problem: given x1, x2, x3, x4 you want to find the class Y. To do this in a neural network you will define a hypothetical math function; let's say you define the function shown in the diagram.

[Diagram: Activation Function]
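A minimal Python sketch of such a neuron, assuming a hypothetical step function as the activation. The weights and inputs below are made-up illustrative values, not ones from the diagram:

```python
def activation(z):
    # Hypothetical activation: a step function that outputs class 1 or 0.
    return 1 if z > 0 else 0

def neuron(x, w, b=0.0):
    # Weighted sum of the inputs plus the optional bias B,
    # passed through the activation function.
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return activation(z)

# Four inputs x1..x4 and user-chosen weights, as in the classification example.
y = neuron([1.0, 0.5, -2.0, 0.3], w=[0.4, 0.6, 0.1, 0.9])
```

Changing the weights changes which side of the threshold the weighted sum lands on, and therefore which class Y comes out.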

Since it is a supervised problem, you already know what the output should be for a certain set of inputs. Now provide the inputs to the model and it will give the output Y. If the Y value is not the desired value you expected, then you are left with two options: either change the hypothetical math function, or change the weights, so that the output from the model comes close to the actual expected value. This process is called training the network.

In most cases, rather than changing the math function, you will adjust the weights in the network to match the actual expected value. One way of finding suitable weights is by manually providing some arbitrary values for the weights and then checking the output value. Another, more systematic method of finding suitable values for the weights is called Backpropagation. But backpropagation has its own disadvantages, and we will not discuss them in this post.
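The "manual" strategy can be sketched as a brute-force search over arbitrary weight values. Everything here (the candidate values, the single training example) is a hypothetical illustration, not how real training is done:

```python
import itertools

def predict(x, w):
    # A step-activated neuron, as in the earlier sketch.
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) > 0 else 0

# Try arbitrary weight values and keep the first set whose
# output matches the expected label.
x, expected = [1.0, -1.0], 1
candidates = itertools.product([-1.0, 0.0, 1.0], repeat=2)
weights = next(w for w in candidates if predict(x, list(w)) == expected)
```

Backpropagation replaces this blind search with a systematic rule for nudging each weight in the direction that reduces the error.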

Once the optimal weights are determined, we use those weights in the network and calculate the output. Now the output will be close to the expected value. We then use the same weights for finding an unknown Y.

Note: In the above example we defined an arbitrary activation function. But in the most popular ANNs you will see functions like Sigmoid, TanH and ReLU used as activation functions inside the networks. There are no strict criteria for which function should be used for which problem; the selection of the activation function is purely output driven.

[Diagram: activation functions]
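For reference, the three activation functions mentioned above can be written in a few lines of Python:

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Squashes any real number into the range (-1, 1).
    return math.tanh(z)

def relu(z):
    # Passes positive values through unchanged; negative values become 0.
    return max(0.0, z)
```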

Similarly, using the same approach as shown above, we can connect the output of one neuron to another neuron for solving much more complex problems. Next, let's see what ANNs are available today and what types of problems they try to solve. As of this writing there are many varieties of ANN available; we will cover only the basic and popular ANNs in this post.

Perceptron.
The Perceptron is one of the simplest forms of Artificial Neural Networks.
We call an Artificial Neural Network a Perceptron when only one neuron layer is used in the network, as shown in the diagram below.

[Diagram: ai-neuron]
A Perceptron can be used for trivial classification problems. However, Perceptrons are only considered a baby step for learning and building more complex networks.
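As a sketch of such a trivial classification problem, a perceptron with hand-picked weights can act as a logical AND gate (the weights and threshold below are illustrative choices, not the only ones that work):

```python
def perceptron(x, w, b):
    # A single-layer perceptron: weighted sum plus bias, thresholded at 0.
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b > 0 else 0

# These weights make the perceptron output 1 only when both inputs are 1.
results = [perceptron(x, w=[1.0, 1.0], b=-1.5)
           for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
```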

Multi Layer Perceptron (MLP).
When multiple layers are involved in the network, we call it a Multi Layer Perceptron. In other words, multiple Perceptrons are combined to form an MLP; refer to the diagram below.

[Diagram: ann]
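A toy forward pass through an MLP might look like this in Python; the layer sizes, weight values and the choice of ReLU are all illustrative assumptions:

```python
def relu(z):
    return max(0.0, z)

def layer(inputs, weights, biases):
    # One fully connected layer: every neuron sees all outputs
    # of the previous layer.
    return [relu(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

# A tiny MLP: 2 inputs -> a hidden layer of 2 neurons -> 1 output neuron.
hidden = layer([1.0, 2.0], weights=[[0.5, -0.5], [0.5, 1.0]], biases=[0.0, 0.0])
output = layer(hidden, weights=[[1.0, 1.0]], biases=[0.0])
```

Stacking layers this way is exactly what "multi layer" means: the output pipeline of one layer of neurons becomes the input pipeline of the next.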
The evolution of Multi Layer Perceptrons gave birth to two main categories of ANN. This categorisation is based on how the MLP internally communicates within the network.
1. Feed forward networks.
2. Recurrent or feedback networks.

Feed forward.
When you design an ANN and allow inputs to travel in only one direction, i.e. from the input to the hidden layers (if any) and then to the output, that type of ANN falls into the category called Feed Forward. The name comes from the fact that input signals are passed and processed in only one direction: forward.

Recurrent or feedback.
ANNs belonging to this category allow inputs to travel forward, backward, or loop back through the network.
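A minimal sketch of that feedback idea, assuming a toy one-dimensional recurrent step (the weights and the input sequence are made up for illustration):

```python
import math

def rnn_step(x, h_prev, w_x, w_h):
    # The new state depends on the current input AND the previous state --
    # the feedback loop that feed-forward networks do not have.
    return math.tanh(w_x * x + w_h * h_prev)

# Feed a short sequence through the loop, carrying the state forward.
h = 0.0
for x in [1.0, 0.5, -0.2]:
    h = rnn_step(x, h, w_x=0.7, w_h=0.3)
```

Because the state `h` is fed back in at every step, earlier inputs influence later outputs, which is what lets this category of network handle sequences.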

Epilogue : As we discussed, many neural networks have been developed since the inception of the MLP, and today some of the widely used networks are the Convolutional Neural Network, the Autoencoder, the Restricted Boltzmann Machine, the Recursive Neural Tensor Network, etc. In all these ANNs the basic concept of the Artificial Neuron remains the same; however, their internal network communication and architecture differ to achieve their desired goals.

Final thoughts on selecting the right network.
If you are interested in finding patterns in unlabelled data, use a Restricted Boltzmann Machine (RBM) or an Autoencoder network.
For text processing tasks like sentiment analysis, parsing, or named entity recognition, use a Recursive Neural Tensor Network (RNTN).
For image processing, use a Convolutional Neural Network (CNN) or a Deep Belief Network.
For object recognition, use a convolutional net or a Recursive Neural Tensor Network (RNTN).
For speech recognition, use a Recurrent Neural Network (RNN).
For general classification problems, use an MLP with ReLU as the activation function.




7 Comments

  1. Sunil Kappal wrote
    at 5:50 PM - 19th November 2016 Permalink

    Thanks for sharing this wonderfully crafted article!!!

  2. Sharma Pradeep wrote
    at 6:08 PM - 19th November 2016 Permalink

What are the advantages of tanh?

  3. shakthydoss wrote
    at 6:19 PM - 19th November 2016 Permalink

The range of the tanh function is [-1, 1], while that of the sigmoid function is [0, 1].
Tanh gives stronger gradients: since the data is centred around 0, the derivatives are higher. To see this, calculate the derivative of the tanh function and notice that its values lie in the range (0, 1].

  4. Sharma Pradeep wrote
    at 12:37 PM - 21st November 2016 Permalink

Do you feel that is an accurate activation function? Any example to explain this?

  5. shakthydoss wrote
    at 12:38 PM - 21st November 2016 Permalink

I would prefer using ReLU as the activation function over tanh,
as recent research shows ReLU is more promising…

  6. Zeinab Sedahmed wrote
    at 12:39 PM - 21st November 2016 Permalink

Hi, I am new. Please advise me: how can I start with data mining to become an expert?

  7. shakthydoss wrote
    at 12:42 PM - 21st November 2016 Permalink

I recommend reading the data mining book by Vipin Kumar as a first book. Then read all the articles in my blog 🙂
