A recent post on using DSSTNE (a deep learning library that I had a minor hand in) to train a simple movie recommender sparked some interesting conversations about the expectations we have of deep learning. They can basically be summed up as: is deep learning the path to Artificial Intelligence, or a one-hit wonder liable to fall out of fashion quickly?
Having developed actual production systems using machine learning and deep learning, I want to set expectations for deep learning and highlight opportunities that should not be ignored.
If you want the truth to stand clear before you, never be for or against. The struggle between “for” and “against” is the mind’s worst disease. – Seng-ts’an, c. 700 C.E.
In case you haven’t heard, deep learning (a.k.a. neural networks) is making a comeback after the great AI winter, thanks largely to the dropping cost of compute (i.e. GPUs) and easier development libraries (e.g. CUDA, Theano, Torch, Caffe, TensorFlow, and DSSTNE). The biggest reason, however, is easy access to large volumes of data, thanks to the internet and labeled-data collection platforms like Amazon’s Mechanical Turk.
One such dataset is ImageNet. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is one of the biggest challenges in computer vision, driving the state of the art in image recognition and understanding. The New York Times wrote about it back in 2014, and again when Baidu was banned from the competition for breaking its rules. The challenge is to classify 1.28 million images belonging to 1,000 classes.
Enter Deep Learning
Deep learning made a splash at ILSVRC in 2012, when Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton proposed an eight-layer neural network (five convolutional layers followed by three fully connected layers) that outperformed all of the non-neural-network approaches on ImageNet. Their SuperVision entry, based on this deep network (commonly referred to as AlexNet), won the competition with a 16.4% error rate, compared to the next best entry’s 26.2%. Since then Google, Facebook, Microsoft, Baidu, and others have invested aggressively in deep learning research. Last year Microsoft won the competition with an error rate of around 3.6% using a network with 152 layers.
In other applications of deep learning, Google saw a 49% drop in its speech recognition (i.e. transcription) errors using long short-term memory (LSTM) deep recurrent neural networks. PayPal uses deep learning for fraud detection and prevention (blog, video).
A Cautionary Tale
Clearly, deep learning has been very successful at solving some of the most challenging problems in AI. While we must approach it with a healthy dose of skepticism, we have to acknowledge the successes and explore the possibilities. Trouble usually starts when people throw deep learning at a problem without thinking the problem through.
It is no surprise that Amazon uses deep learning for recommendations, since they have open-sourced the engine and blogged about it. But it was not always like that. One of the challenges the personalization team faced when exploring deep learning was that the initial prototypes performed no better, and sometimes worse, than the traditional machine learning approaches used in recommender systems.
My biggest contribution to that team’s effort was modeling the problem the right way. In this particular case, the right way was not the way recommender systems have traditionally been framed.
Even though both algorithms (A and B) used deep learning with a similarly sized network, structure, and training parameters, the approach I proposed and demonstrated saw a 6x improvement in precision for the top recommended item. I cannot share the details behind the formulation, since Amazon didn’t allow external publication of that work, but if you have access, check out the video of the talk I gave at the Amazon Machine Learning Conference in 2015 😉
Simply throwing data (and compute) at deep learning is not a good idea. You have to model and solve the problem in a manner appropriate to that specific problem.
Promise of Deep Learning
Deep learning’s biggest promise is actually in learning latent features, i.e. representation learning, which makes the subsequent prediction task easier. Getting the right features can make the learning and prediction parts of the problem trivial. Scientists spend by far most of their time manually engineering the right features, using domain knowledge, experience, and intuition, supported by standard feature selection and projection algorithms.
With deep learning, neural networks jointly optimize the feature engineering, feature selection, and modeling steps, all at the same time. This opens up the opportunity to skip manual feature engineering and let the machine discover the relative importance of, and non-linear interactions between, the signals as they propagate through the network layers. In some setups, such as autoencoders, the network can learn useful features without any labels at all, i.e. in an unsupervised way. This means we can start applying machine learning to domains where we have large volumes of unlabelled data, or where acquiring labels is difficult or expensive.
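To make the autoencoder idea concrete, here is a minimal sketch in Keras (my own illustration, not code from any system mentioned in this post): the encoder compresses each input into a small latent vector, the decoder reconstructs the input from that vector, and the reconstruction error is the only training signal, so no labels are needed.

```python
import numpy as np
from tensorflow.keras import layers, Model

input_dim, latent_dim = 784, 32          # e.g. flattened 28x28 images

inputs = layers.Input(shape=(input_dim,))
encoded = layers.Dense(latent_dim, activation="relu")(inputs)      # learned latent features
decoded = layers.Dense(input_dim, activation="sigmoid")(encoded)   # reconstruction of the input

autoencoder = Model(inputs, decoded)     # trained end to end on reconstruction
encoder = Model(inputs, encoded)         # reused later to extract features

autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Stand-in data; in practice this would be your unlabelled dataset.
x = np.random.rand(1000, input_dim).astype("float32")
autoencoder.fit(x, x, epochs=5, batch_size=64, verbose=0)  # input is also the target

features = encoder.predict(x)            # latent features for a downstream model
```

The learned latent vectors can then feed a much simpler downstream model, which is exactly the “representation learning first, prediction second” split described above.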
There is also a lot of interest in transfer learning, where features are first learnt in a domain with a large amount of labeled data, or in an unsupervised way. Once learnt, the features are fine-tuned for another related domain using much smaller datasets. But the practical reality for the moment is the same: deep learning requires a lot of data and computation.
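As a hedged sketch of what that recipe typically looks like in practice (the model choice, class count, and data shapes below are placeholders of my own, not from any work mentioned here), one common approach is to take a network pre-trained on ImageNet, freeze its feature-extraction layers, and train only a small new head on the smaller target dataset:

```python
import numpy as np
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

# Base network with ImageNet weights; the classification head is dropped.
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
base.trainable = False                    # freeze the pre-trained features

num_classes = 10                          # placeholder for the smaller target task
outputs = layers.Dense(num_classes, activation="softmax")(base.output)
model = Model(base.input, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stand-in data; in practice this would be the small labelled target dataset.
x = np.random.rand(8, 224, 224, 3).astype("float32")
y = np.random.randint(0, num_classes, size=(8,))
model.fit(x, y, epochs=1, batch_size=4, verbose=0)
```

Only the new head is trained here; unfreezing some of the top layers of the base network for a second, slower fine-tuning pass is a common follow-up when the target dataset is large enough.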
No Free Lunch!
When I was doing my Ph.D. at UNSW, I often chatted with Achim Hoffmann, who wrote an interesting perspective on the limitations of machine learning [PostScript file], published at the European Conference on Artificial Intelligence back in 1990. The key passage for me was this:
The results indicate a rather general point. Namely, that for any amount of information which should get acquired, people have to do the complete work. One may choose between writing complex programs and providing a program with a huge amount of input data. In any case, the work cannot be reduced essentially. The machine can only do what it is told to do. And it cannot be told to generate information by itself. … The results do not mean, that machine learning is completely purposeless. But they clearly show that one cannot expect any magic from machine learning.
Even though 25 years have passed since this paper was written, the underlying idea remains very relevant for setting our expectations of machine learning and deep learning. We can throw data and compute at deep learning, but it cannot magically get us the answer. We still need human experts and scientists to figure out how to apply deep learning appropriately, not to mention push the research boundaries of what deep learning is capable of. I think deep learning is a very promising field for exploration, and worth the risk of investing experimentation resources in. We just need to be prepared to learn.