Recently I got a couple of EVGA GeForce GTX 1080s to keep my study nicely lit and warm when winter comes to Seattle. My interest in GPUs, though, is more about Deep Learning than lighting and heating. Deep Learning is being actively explored for all kinds of machine learning applications because it offers the promise of automatic feature learning. In fact, a large number of Kaggle competition winners rely on Deep Learning methods to avoid hand-crafted feature engineering. And considering how computationally expensive Deep Learning training tends to be, GPUs are essential for doing anything meaningful in a reasonable amount of time.
As part of my job with the Big Data & Analytics Platform team at Oracle, I come across customers who need help tackling some of these cutting-edge machine learning problems – from image understanding to speech recognition and even product recommendations. Part of the challenge is always managing the complexity: letting people focus on what they need to do, while hiding away what is important but not necessary for their immediate attention.
Siraj Raval ( @sirajology ) posted a really nice video earlier this year on how to build a movie recommender system using 10 lines of C++ code and DSSTNE (pronounced “destiny”), a deep learning library that my old team at Amazon built and open-sourced earlier this year.
Aside: DSSTNE does automagic model parallelism across multiple GPUs and is also very fast on sparse datasets. Scott Le Grand ( @scottlegrand ), the main creator of DSSTNE, has reported it to be almost 15x faster than TensorFlow in some cases.
- Disclosure: Scott and I used to work together at Amazon on the personalization team that built DSSTNE. Neither of us works for Amazon anymore, so we cannot speak to how it is being used inside Amazon today.
- Update: Check out Scott’s talk on DSSTNE at Data Science Summit 2016.
Back to Siraj’s movie recommender – although he does a great job, I think there are some very important points about the design of DSSTNE that are easily overlooked. DSSTNE has 3 important design elements:
- scale – to handle large datasets that won’t fit on a single GPU, and to do so automatically
- speed – for faster experimentation cycles, allowing scientists to be more efficient and to scale up the number of experiments they run
- simplicity – so that non-experts can experiment with, deploy, and manage deep learning solutions in production
In this post, I’ll show how to build a movie recommender writing NO lines of C++ code. DSSTNE is largely configured through a Neural Network Layer Definition Language and comes with 3 binaries – generateNetCDF, train & predict. It uses a JSON-based config file to describe the network, the functions, and the parameters to use when training the model. This approach makes it much easier to run a hyper-parameter search across different network structures without writing a single line of C++ code.
So let us get started by installing CUDA and cuDNN on Ubuntu 16.04.
CUDA & cuDNN
First, install the prerequisites for CUDA:
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update
$ sudo apt-get install nvidia-367
$ sudo apt-get install mesa-common-dev
$ sudo apt-get install freeglut3-dev
Download the local run binary from https://developer.nvidia.com/cuda-toolkit
Install the CUDA 8 library:
$ sudo ./cuda_8.0.27_linux.run --override
IMPORTANT: Make sure you DO NOT install the drivers included with the .run file. Keep the other options at their defaults and answer yes to everything else.
Set environment variables:
$ export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
At this point, you should be able to check that the graphics cards are recognized by the driver by running nvidia-smi.
$ nvidia-smi
Thu Aug 11 10:51:36 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:03:00.0     Off |                  N/A |
|  0%   31C    P8     7W / 180W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:06:00.0      On |                  N/A |
|  0%   35C    P8     8W / 180W |    156MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      4455    G   /usr/lib/xorg/Xorg                             106MiB |
|    1      5229    G   compiz                                          48MiB |
+-----------------------------------------------------------------------------+
To let CUDA compile with the latest version of GCC, edit the header file whose version check would otherwise abort the build.
$ sudo nano /usr/local/cuda/include/host_config.h
Comment out the line that complains about the GCC version:
//#error -- unsupported GNU version! gcc versions later than 5.3 are not supported!
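If you prefer a one-liner over editing by hand, something like the following sed command should do the same thing; it is a sketch that assumes the stock CUDA 8 host_config.h, and it keeps a backup copy:

$ # Comment out the GCC version check (writes a host_config.h.bak backup first)
$ sudo sed -i.bak 's|#error -- unsupported GNU version|//&|' /usr/local/cuda/include/host_config.h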
Compile the samples:
$ cd ~/NVIDIA_CUDA-8.0_Samples
$ make
Some of the samples still fail, but I’ll look into them later.
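As a quick sanity check, you can run one of the samples that did build, e.g. deviceQuery. The path below assumes the default samples output layout on x86_64 Linux:

$ # List the GPUs visible to CUDA and their compute capability
$ ~/NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/deviceQuery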
Get the cuDNN library from https://developer.nvidia.com/cudnn and follow the instructions to install it.
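For reference, here is a rough sketch of the manual install I used, which matches the /usr/local/cudnn-8.0 path referenced in the .bashrc below. The exact tarball name depends on the cuDNN version you download (v5.1 for CUDA 8.0 at the time of writing), so treat it as an example:

$ # Unpack the cuDNN tarball (it extracts into a cuda/ directory)
$ tar xzf cudnn-8.0-linux-x64-v5.1.tgz
$ # Copy the headers and libraries to a versioned location
$ sudo mkdir -p /usr/local/cudnn-8.0
$ sudo cp -r cuda/include cuda/lib64 /usr/local/cudnn-8.0/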
Now for DSSTNE (Destiny)
First, you need to install the prerequisites for DSSTNE. I’ve put together a shell script that runs the steps documented here.
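Roughly, such a script looks like the sketch below. The package names are my assumptions for Ubuntu 16.04; note that the official DSSTNE setup builds several of the dependencies (OpenMPI, NetCDF, NetCDF-C++, JsonCPP, CUB) from source instead of using distro packages, which is why the .bashrc below points at /usr/local/openmpi:

#!/bin/bash
# Rough sketch of a DSSTNE prerequisites script for Ubuntu 16.04 (assumed package names);
# the upstream docs build OpenMPI, NetCDF, NetCDF-C++, JsonCPP and CUB from source
sudo apt-get update
sudo apt-get install -y build-essential cmake pkg-config unzip wget \
    libopenmpi-dev openmpi-bin \
    libjsoncpp-dev libhdf5-dev zlib1g-dev \
    libnetcdf-dev libnetcdf-c++4-dev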
Then, make sure you have the paths set up correctly. I had something like this in my .bashrc:
# Add CUDA to the path
# Could use /usr/local/cuda/bin:${PATH} instead of explicit cuda8
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Add cuDNN library path
export LD_LIBRARY_PATH=/usr/local/cudnn-8.0/lib64:${LD_LIBRARY_PATH}

# Add OpenMPI to the path
export PATH=/usr/local/openmpi/bin:${PATH}

# Add the local libs to path as well
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib
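After editing .bashrc, reload it (or open a new terminal) so the new paths take effect:

$ source ~/.bashrc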
Now to get, build, and test DSSTNE:
$ git clone https://github.com/amznlabs/amazon-dsstne.git
$ cd amazon-dsstne/src/amazon/dsstne
$ make
This will build the binaries under amazon-dsstne/src/amazon/dsstne/bin for:
- generateNetCDF – converts CSV/text files into the NetCDF format used by DSSTNE
- train – trains a network using the input data, the output data, and a config file with the network definition
- predict – uses a pre-trained network to make predictions.
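The commands below call generateNetCDF, train, and predict by name, so it is convenient (though not required) to put the freshly built bin directory on your PATH. Assuming you are still in amazon-dsstne/src/amazon/dsstne:

$ # Make the DSSTNE binaries callable by name from anywhere in this shell
$ export PATH=$(pwd)/bin:${PATH}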
There is a nice example of training an auto-encoder based recommender for the MovieLens 20M dataset that comes with the code.
Download the data in the CSV/text file format. If you have your own dataset, make sure it conforms to this format.
$ wget https://s3-us-west-2.amazonaws.com/amazon-dsstne-samples/data/ml20m-all
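If you want to see what the expected text format looks like before preparing your own data, peek at the first record; each user’s history ends up on a single, potentially very long line, so truncate the output:

$ # Show the first record (truncated to 200 characters) to inspect the text format
$ head -n 1 ml20m-all | cut -c 1-200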
Convert the text data into NetCDF format for the network input and the expected network output. This also builds the feature and sample index files.
$ generateNetCDF -d gl_input -i ml20m-all -o gl_input.nc -f features_input -s samples_input -c
$ generateNetCDF -d gl_output -i ml20m-all -o gl_output.nc -f features_output -s samples_input -c
Train the network using the config for 30 epochs with a batch size of 256. It will checkpoint and save the network every 10 epochs, which is handy if you want to explore how the network converges over epochs.
$ train -c config.json -i gl_input.nc -o gl_output.nc -n gl.nc -b 256 -e 30
Once the training completes, you can use the network and GPU to make batch-mode offline predictions in the original text format. The following command generates 10 movie recommendations for each user in the ml20m-all file (i.e. -r ml20m-all) and writes them to the recs file (-s recs). It also lets you mask or filter out movies the user has already seen (-f ml20m-all).
$ predict -b 256 -d gl -i features_input -o features_output -k 10 -n gl.nc -f ml20m-all -s recs -r ml20m-all
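Once predict finishes, recs is a plain text file, so you can eyeball the first few users’ recommendations directly:

$ # Show the recommendations generated for the first three users
$ head -n 3 recs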
That’s it. While it’s training, you can use nvidia-smi to see which GPU it is running on and how much memory it uses.
$ nvidia-smi
Thu Aug 11 12:11:58 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:03:00.0     Off |                  N/A |
|  0%   42C    P2    77W / 180W |    524MiB /  8113MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:06:00.0      On |                  N/A |
|  0%   36C    P8     8W / 180W |    170MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      5523    C   train                                          521MiB |
|    1      4455    G   /usr/lib/xorg/Xorg                             106MiB |
|    1      5229    G   compiz                                          60MiB |
+-----------------------------------------------------------------------------+
Look ma, no C++ code!
The number of input nodes is automatically inferred from the input data file.
The number of output nodes is automatically inferred from the expected output data file.
Everything else is defined in config.json or via command-line flags (the batch size and the number of epochs for training). The Neural Network Layer Definition Language describes everything that DSSTNE supports.
{ "Version" : 0.7, "Name" : "AE", "Kind" : "FeedForward", "SparsenessPenalty" : { "p" : 0.5, "beta" : 2.0 }, "ShuffleIndices" : false, "Denoising" : { "p" : 0.2 }, "ScaledMarginalCrossEntropy" : { "oneTarget" : 1.0, "zeroTarget" : 0.0, "oneScale" : 1.0, "zeroScale" : 1.0 }, "Layers" : [ { "Name" : "Input", "Kind" : "Input", "N" : "auto", "DataSet" : "gl_input", "Sparse" : true }, { "Name" : "Hidden", "Kind" : "Hidden", "Type" : "FullyConnected", "N" : 128, "Activation" : "Sigmoid", "Sparse" : true }, { "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "DataSet" : "gl_output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true } ], "ErrorFunction" : "ScaledMarginalCrossEntropy" }
Applying deep learning techniques to a problem such as recommendation typically means lots of experimentation, exploring different mixes of:
- types of input data and output targets – purchase or browsing history, ratings, product attributes such as category, cost, and color, user attributes such as age or gender, etc.
- network structures – the number of layers, number of nodes per layer, connections between layers, etc.
- network and training parameters – learning rates, denoising, drop-outs, activation functions, etc.
How you pose the problem and prepare the dataset (#1) is VERY important when applying deep learning. If you pose the machine learning problem incorrectly, not even deep learning and a cloud full of GPUs can help you.
But once you have that, figuring out the right network structure (#2) and training parameters (#3) can mean the difference between success and failure. That means running a lot of experiments – essentially a hyper-parameter search problem.
The JSON-based config simplifies the hyper-parameter search problem. You can generate a large number of combinations of these config files and try them out in parallel, quickly narrowing down to the configurations most suitable for that particular application.
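As a rough illustration (the file names and the sed pattern here are my own assumptions, based on the config.json shown above), you could stamp out a few variants of the hidden-layer size and train them back to back:

# Hypothetical sketch: generate config variants with different hidden-layer sizes
# and train each one. Only the hidden layer uses "N" : 128 in the config above,
# so the sed substitution touches just that layer.
for n in 64 128 256 512; do
    sed "s/\"N\" : 128/\"N\" : $n/" config.json > config_h${n}.json
    train -c config_h${n}.json -i gl_input.nc -o gl_output.nc -n gl_h${n}.nc -b 256 -e 30
done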
Given these are still early days for Deep Learning, the speed and scale of experimentation has a huge bearing on what we learn about using it. Plus, I know my study will be warm this coming winter.
Credits and References:
- Scott Le Grand ( @scottlegrand ) for making such an awesome Deep Learning library.
- Thanks to Robin ( @robinchow_rc ) for putting the box together for me.
- http://yangcha.github.io/GTX-1080
- https://www.pugetsystems.com/labs/hpc/NVIDIA-CUDA-with-Ubuntu-16-04-beta-on-a-laptop-if-you-just-cannot-wait-775
- http://askubuntu.com/questions/767269/how-can-i-install-cudnn-on-ubuntu-16-04/767270