[
  {
    "path": "LICENSE",
    "content": "The MIT License (MIT)\n\nCopyright (c) 2015 Olivier Grisel and contributors\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n\n"
  },
  {
    "path": "README.md",
    "content": "# dl-machine\n\nScripts to setup a GPU / CUDA enable compute server with libraries to study deep learning development\n\n## Setting up an Amazon g2.2xlarge spot instance\n\n- log in to AWS management console and select EC2 instances\n- select US-WEST (N. California) region in top left menu\n- click on \"Spot Request\" on the leftmost menu and click \"Request Spot Instances\"\n- select community AMIs and search for `ubuntu-14.04-hvm-deeplearning-paris`\n- on the Choose instance Type tab, select GPU instances `g2.2xlarge`\n- bid a price larger than current price (e.g. $0.30, if it fails check the spot pricing history for that instance type)\n- in configure security group click Add Rule, and add a Custom TCP Rule with port Range `8888-8889` and from `Anywhere` \n- Review and launch, save the `mykey.pem` file\n \nOnce your machine is up (status : running in the online console), note the address to your instance :\n `INSTANCE_ID.compute.amazonaws.com`\n\nNote: other regions with access to the deeplearning-paris image: Singapore, Ireland, North Virginia\n\n## Start using your instance\n\n#### Using the notebooks\nBy default an IPython notebook server and an iTorch notebook server should be running on port 8888 and 8889 respectively. 
You need to open those ports in the `Security Group` of your instance if you have not done so yet.\n\nTo start using your instance, simply open the following URLs in your favorite browser:\n\n- http://INSTANCE_ID.compute.amazonaws.com:8888\n- http://INSTANCE_ID.compute.amazonaws.com:8889\n\n\n#### SSH Connection to your instance\nOnce the instance is up, you might need to access your instance directly via SSH:\n- restrict the permissions of the `mykey.pem` file:\n```\nchmod 400 mykey.pem\n```\n- ssh to your instance \n```\nssh -i mykey.pem ubuntu@INSTANCE_ID.compute.amazonaws.com\n```\n\n## Other instructions\n\n#### Setting up ssh connection keys\n\nThis optional part helps you set up SSH keys for better and easier access to your instance. If you already have a public key, skip the keygen part.\n\n- On your own machine, generate an SSH key pair:\n```\nssh-keygen\n```\n- Go through the prompts; you will end up with two files, id_rsa and id_rsa.pub (the first is your private key, the second is your public key - the one you copy to remote machines). Print the public key so that you can copy it:\n```\ncat ~/.ssh/id_rsa.pub\n```\n- On the remote instance, open the authorized_keys file and append the content of your id_rsa.pub key \n```\nvi ~/.ssh/authorized_keys\nchmod 600 ~/.ssh/authorized_keys\n```\n- On your local machine, you can then use ssh-agent to store the decrypted key in memory so that you don't have to type the passphrase each time. 
You can also add an alias in your `~/.ssh/config` file:\n```\nHost dlmachine\n     HostName INSTANCE_ID.compute.amazonaws.com\n     User ubuntu\n     ServerAliveInterval 300\n     ServerAliveCountMax 2\n```\n- You can now SSH to your machine with the following command:\n```\nssh dlmachine\n```\n\n#### Running ipython / iTorch server\n\nIf the notebooks do not work, you can log in to your instance via SSH:\n\n```\nssh -A ubuntu@INSTANCE_ID.compute.amazonaws.com\n```\n\n(optional) Start a screen or tmux terminal:\n\n```\nscreen\n```\nUse the `top` or `ps aux` command to check whether the ipython process is running. If it is not, launch the ipython and itorch notebook servers:\n```\nipython notebook --ip='*' --port=8888 --browser=none\nitorch notebook --ip='*' --port=8889 --browser=none\n```\n\n"
  },
  {
    "path": "caffe-Makefile.conf",
    "content": "## Refer to http://caffe.berkeleyvision.org/installation.html\n# Contributions simplifying and improving our build system are welcome!\n\n# cuDNN acceleration switch (uncomment to build with cuDNN).\n# USE_CUDNN := 1\n\n# CPU-only switch (uncomment to build without GPU support).\n# CPU_ONLY := 1\n\n# uncomment to disable IO dependencies and corresponding data layers\n# USE_OPENCV := 0\n# USE_LEVELDB := 0\n# USE_LMDB := 0\n\n# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)\n#\tYou should not set this flag if you will be reading LMDBs with any\n#\tpossibility of simultaneous read and write\n# ALLOW_LMDB_NOLOCK := 1\n\n# Uncomment if you're using OpenCV 3\n# OPENCV_VERSION := 3\n\n# To customize your choice of compiler, uncomment and set the following.\n# N.B. the default for Linux is g++ and the default for OSX is clang++\n# CUSTOM_CXX := g++\n\n# CUDA directory contains bin/ and lib/ directories that we need.\nCUDA_DIR := /usr/local/cuda\n# On Ubuntu 14.04, if cuda tools are installed via\n# \"sudo apt-get install nvidia-cuda-toolkit\" then use this instead:\n# CUDA_DIR := /usr\n\n# CUDA architecture setting: going with all of them.\n# For CUDA < 6.0, comment the *_50 lines for compatibility.\nCUDA_ARCH := -gencode arch=compute_20,code=sm_20 \\\n\t\t-gencode arch=compute_20,code=sm_21 \\\n\t\t-gencode arch=compute_30,code=sm_30 \\\n\t\t-gencode arch=compute_35,code=sm_35 \\\n\t\t-gencode arch=compute_50,code=sm_50 \\\n\t\t-gencode arch=compute_50,code=compute_50\n\n# BLAS choice:\n# atlas for ATLAS (default)\n# mkl for MKL\n# open for OpenBlas\nBLAS := open\n# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.\n# Leave commented to accept the defaults for your choice of BLAS\n# (which should work)!\nBLAS_INCLUDE := /opt/OpenBLAS-no-openmp/include\nBLAS_LIB := /opt/OpenBLAS-no-openmp/bin\n\n# Homebrew puts openblas in a directory that is not on the standard search path\n# BLAS_INCLUDE := $(shell brew --prefix 
openblas)/include\n# BLAS_LIB := $(shell brew --prefix openblas)/lib\n\n# This is required only if you will compile the matlab interface.\n# MATLAB directory should contain the mex binary in /bin.\n# MATLAB_DIR := /usr/local\n# MATLAB_DIR := /Applications/MATLAB_R2012b.app\n\n# NOTE: this is required only if you will compile the python interface.\n# We need to be able to find Python.h and numpy/arrayobject.h.\nPYTHON_INCLUDE := /usr/include/python2.7 \\\n\t       /home/ubuntu/venv/lib/python2.7/site-packages/numpy/core/include/numpy\n# Anaconda Python distribution is quite popular. Include path:\n# Verify anaconda location, sometimes it's in root.\n# ANACONDA_HOME := $(HOME)/anaconda\n# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \\\n\t\t# $(ANACONDA_HOME)/include/python2.7 \\\n\t\t# $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \\\n\n# We need to be able to find libpythonX.X.so or .dylib.\nPYTHON_LIB := /usr/lib/python2.7/config-x86_64-linux-gnu\n# PYTHON_LIB := $(ANACONDA_HOME)/lib\n\n# Homebrew installs numpy in a non standard path (keg only)\n# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include\n# PYTHON_LIB += $(shell brew --prefix numpy)/lib\n\n# Uncomment to support layers written in Python (will link against Python libs)\n# WITH_PYTHON_LAYER := 1\n\n# Whatever else you find you need goes here.\nINCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include\nLIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib\n\n# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies\n# INCLUDE_DIRS += $(shell brew --prefix)/include\n# LIBRARY_DIRS += $(shell brew --prefix)/lib\n\n# Uncomment to use `pkg-config` to specify OpenCV library paths.\n# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)\n# USE_PKG_CONFIG := 1\n\nBUILD_DIR := build\nDISTRIBUTE_DIR := distribute\n\n# Uncomment for 
debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171\n# DEBUG := 1\n\n# The ID of the GPU that 'make runtest' will use to run unit tests.\nTEST_GPUID := 0\n\n# enable pretty build (comment to see full commands)\nQ ?= @\n"
  },
  {
    "path": "circus.conf",
    "content": "start on filesystem and net-device-up IFACE=lo\nstop on runlevel [016]\n\nrespawn\nexec /home/ubuntu/venv/bin/circusd /home/ubuntu/dl-machine/circus.ini\n"
  },
  {
    "path": "circus.ini",
    "content": "# This file is mean to be launched with the following command:\n# circusd circus.ini\n#\n# It should start an ipython notebook server for the ubuntu user\n# on port 8888\n# You need to open that port in your EC2 Security Group\n#\n# Furthermore circusd offers a webconsole on port 8080 to monitor\n# the resource usage of the managed processes.\n[circus]\nstatsd = True\nhttpd = True\nhttpd_host = 0.0.0.0\nhttpd_port = 8080\n\n[watcher:ipython-notebook]\nuid = ubuntu\ngid = ubuntu\nworking_dir = /home/ubuntu\ncmd = /home/ubuntu/venv/bin/ipython\nargs = notebook --ip='*' --port=8888 --browser=none\n\n# Log and rotate output\nstdout_stream.class = FileStream\nstdout_stream.filename = /home/ubuntu/ipython-out.log\nstdout_stream.max_bytes = 1073741824\nstdout_stream.backup_count = 5\nstderr_stream.class = FileStream\nstderr_stream.filename = /home/ubuntu/ipython-err.log\nstderr_stream.max_bytes = 1073741824\nstderr_stream.backup_count = 5\n\n[env:ipython-notebook]\nPATH=/home/ubuntu/venv/bin:/usr/local/cuda/bin:$PATH\nLD_LIBRARY_PATH=/usr/local/cuda:/usr/local/cuda/lib64:/opt/OpenBLAS-no-openmp/lib\n\n\n[watcher:itorch-notebook]\nuid = ubuntu\ngid = ubuntu\nworking_dir = /home/ubuntu\ncmd = /home/ubuntu/torch/install/bin/itorch\nargs = notebook --ip='*' --port=8889 --browser=none\n\n# Log and rotate output\nstdout_stream.class = FileStream\nstdout_stream.filename = /home/ubuntu/itorch-out.log\nstdout_stream.max_bytes = 1073741824\nstdout_stream.backup_count = 5\nstderr_stream.class = FileStream\nstderr_stream.filename = /home/ubuntu/itorch-err.log\nstderr_stream.max_bytes = 1073741824\nstderr_stream.backup_count = 5\n\n[env:itorch-notebook]\nPATH=/home/ubuntu/torch/bin:/home/ubuntu/venv/bin:/usr/local/cuda/bin:$PATH\nLD_LIBRARY_PATH=/usr/local/cuda:/usr/local/cuda/lib64:/opt/OpenBLAS/lib:/home/ubuntu/torch/lib\n\n"
  },
  {
    "path": "numpy-site.cfg",
    "content": "[openblas]\nlibraries = openblas\nlibrary_dirs = /opt/OpenBLAS-no-openmp/lib\ninclude_dirs = /opt/OpenBLAS-no-openmp/include\n\n"
  },
  {
    "path": "scipy-site.cfg",
    "content": "[DEFAULT]\nlibrary_dirs = /opt/OpenBLAS-no-openmp/lib:/usr/local/lib\ninclude_dirs = /opt/OpenBLAS-no-openmp/include:/usr/local/include\n\n[blas_opt]\nlibraries = openblas\n\n[lapack_opt]\nlibraries = openblas\n\n"
  },
  {
    "path": "scripts/install-deeplearning-libraries.sh",
    "content": "#!/usr/bin/env bash\n# This script will try to install recent versions of common open source\n# tools for deep learning. This requires a Ubuntu 14.04 instance with\n# a recent version of CUDA. See \"ubuntu-14.04-cuda-6.5.sh\" to build setup\n# an AWS EC2 g2.2xlarge instance type for instance.\nset -xe\ncd $HOME\n\n\n# Check that the NVIDIA drivers are installed properly and the GPU is in a\n# good shape:\nnvidia-smi\n\n# Build latest stable release of OpenBLAS without OPENMP to make it possible\n# to use Python multiprocessing and forks without crash\n# The torch install script will install OpenBLAS with OPENMP enabled in\n# /opt/OpenBLAS so we need to install the OpenBLAS used by Python in a\n# distinct folder.\n# Note: the master branch only has the release tags in it\nsudo apt-get install -y gfortran\nexport OPENBLAS_ROOT=/opt/OpenBLAS-no-openmp\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OPENBLAS_ROOT/lib\nif [ ! -d \"OpenBLAS\" ]; then\n    git clone -q --branch=master git://github.com/xianyi/OpenBLAS.git\n    (cd OpenBLAS \\\n      && make FC=gfortran USE_OPENMP=0 NO_AFFINITY=1 NUM_THREADS=32 \\\n      && sudo make install PREFIX=$OPENBLAS_ROOT)\n    echo \"export LD_LIBRARY_PATH=$LD_LIBRARY_PATH\" >> ~/.bashrc\nfi\nsudo ldconfig\n\n# Python basics: update pip and setup a virtualenv to avoid mixing packages\n# installed from source with system packages\nsudo apt-get update\nsudo apt-get install -y python-dev python-pip htop\nsudo pip install -U pip virtualenv\nif [ ! -d \"venv\" ]; then\n    virtualenv venv\n    echo \"source ~/venv/bin/activate\" >> ~/.bashrc\nfi\nsource venv/bin/activate\npip install -U pip\npip install -U circus circus-web Cython Pillow\n\n# Checkout this project to access installation script and additional resources\nif [ ! 
-d \"dl-machine\" ]; then\n    git clone https://github.com/deeplearningparis/dl-machine.git\nelse\n    if  [ \"$1\" == \"reset\" ]; then\n        (cd dl-machine && git reset --hard && git checkout master && git pull --rebase origin master)\n    fi\nfi\n\n# Build numpy from source against OpenBLAS\n# You might need to install liblapack-dev package as well\n# sudo apt-get install -y liblapack-dev\nrm -f ~/.numpy-site.cfg\nln -s dl-machine/numpy-site.cfg ~/.numpy-site.cfg\npip install -U numpy\n\n# Build scipy from source against OpenBLAS\nrm -f ~/.scipy-site.cfg\nln -s dl-machine/scipy-site.cfg ~/.scipy-site.cfg\npip install -U scipy\n\n# Install common tools from the scipy stack\nsudo apt-get install -y libfreetype6-dev libpng12-dev\npip install -U matplotlib ipython[all] pandas scikit-image\n\n# Scikit-learn (generic machine learning utilities)\npip install -e git+git://github.com/scikit-learn/scikit-learn.git#egg=scikit-learn\n\n# Theano\npip install -e git+git://github.com/Theano/Theano.git#egg=Theano\nif [ ! -f \".theanorc\" ]; then\n    ln -s dl-machine/theanorc ~/.theanorc\nfi\n\n# Tutorial files\nif [ ! -d \"DL4H\" ]; then\n    git clone https://github.com/SnippyHolloW/DL4H.git\nelse\n    if  [ \"$1\" == \"reset\" ]; then\n        (cd DL4H && git reset --hard && git checkout master && git pull --rebase origin master)\n    fi\nfi\n\n# Keras (will be using theano by default)\nif [ ! -d \"keras\" ]; then\n    git clone https://github.com/fchollet/keras.git\n    (cd keras && python setup.py install)\nelse\n    if  [ \"$1\" == \"reset\" ]; then\n\t(cd keras && git reset --hard && git checkout master && git pull --rebase origin master && python setup.py install)\n    fi\nfi\n\n# Tensorflow (CPU mode only; GPU not officially supported on AWS g2 instances,\n# whose GPUs have CUDA compute capability 3.0)\npip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl\n\n# Torch\nif [ ! 
-d \"torch\" ]; then\n    curl -sk https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash\n    git clone https://github.com/torch/distro.git ~/torch --recursive\n    (cd ~/torch && yes | ./install.sh)\nfi\n. ~/torch/install/bin/torch-activate\n\nif [ ! -d \"iTorch\" ]; then\n    git clone https://github.com/facebook/iTorch.git\nelse\n    if  [ \"$1\" == \"reset\" ]; then\n        (cd iTorch && git reset --hard && git checkout master && git pull --rebase origin master)\n    fi\nfi\n(cd iTorch && luarocks make)\n\n\n# Install caffe\n\nsudo apt-get install -y protobuf-compiler libboost-all-dev libgflags-dev libgoogle-glog-dev libhdf5-serial-dev libleveldb-dev liblmdb-dev libsnappy-dev libopencv-dev libyaml-dev libprotobuf-dev\n\nif [ ! -d \"caffe\" ]; then\n    git clone https://github.com/BVLC/caffe.git\n    (cd caffe && cp $HOME/dl-machine/caffe-Makefile.conf Makefile.conf && cmake -DBLAS=open . && make all)\n    (cd caffe/python && pip install -r requirements.txt)\nelse\n    if [ \"$1\" == \"reset\" ]; then\n\t(cd caffe && git reset --hard && git checkout master && git pull --rebase origin master && cp $HOME/dl-machine/caffe-Makefile.conf Makefile.conf && cmake -DBLAS=open . && make all)\n    fi\nfi\n\n\n# Register the circus daemon with Upstart\nif [ ! -f \"/etc/init/circus.conf\" ]; then\n    sudo ln -s $HOME/dl-machine/circus.conf /etc/init/circus.conf\n    sudo initctl reload-configuration\nfi\nsudo service circus restart\n\n\n# Register a task job to get the main repo of the image automatically up to date\n# at boot time\nif [ ! -f \"/etc/init/update-instance.conf\" ]; then\n    sudo ln -s $HOME/dl-machine/update-instance.conf /etc/init/update-instance.conf\nfi\n"
  },
  {
    "path": "scripts/ubuntu-14.04-cuda-7.0.sh",
    "content": "#!/usr/bin/env bash\n# Script to build an Ubuntu-based g2.2xlarge with CUDA 7.0 enabled.\n\nexport DEBIAN_FRONTEND=noninteractive\nsudo apt-get update -y\nsudo apt-get -y dist-upgrade\nsudo apt-get install -y git wget linux-image-generic build-essential\n\ncd /tmp\nwget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.0-28_amd64.deb\nsudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb\nsudo apt-get update -y\nsudo apt-get install -y cuda\n\necho -e \"\\nexport CUDA_HOME=/usr/local/cuda\\nexport CUDA_ROOT=/usr/local/cuda\" >> ~/.bashrc\necho -e \"\\nexport PATH=/usr/local/cuda/bin:\\$PATH\\nexport LD_LIBRARY_PATH=/usr/local/cuda/lib64:\\$LD_LIBRARY_PATH\" >> ~/.bashrc\n\necho \"CUDA installation complete: rebooting the instance now!\"\nsudo reboot\n"
  },
  {
    "path": "theanorc",
    "content": "[global]\ndevice=gpu\nfloatX=float32\nallow_gc=False\noptimizer_including=\"local_ultra_fast_sigmoid\"\n\n[nvcc]\nuse_fast_math=True\nfastmath=True\n"
  },
  {
    "path": "update-instance.conf",
    "content": "start on runlevel [2345]\nstop on runlevel [!2345]\n\ntask\n\nscript\n  su - ubuntu -c 'bash /home/ubuntu/dl-machine/scripts/install-deeplearning-libraries.sh reset'\nend script\n"
  }
]