Jupyter Notebook is an app for data analysis. The idea is to combine documentation and code in one place! My wife uses it for her data science courses on Coursera. Once she complained that some tasks took the whole night to complete on her laptop. Her Sony Vaio is pretty powerful, but definitely not a mainframe. When I noticed that Notebook is actually a web application, I immediately suggested running it on Amazon AWS! Here is a short guide on how to set up Jupyter Notebook there.
First you have to provision an EC2 instance with Amazon Linux. I recommend the so-called “compute-optimized” instance types (cX) because they provide the most CPU power. Amazon Linux already comes with Python 2.7.12, which is enough for Jupyter. Installing Jupyter is pretty simple:
sudo pip install jupyter
Then you need to start it. Here is what I do:
ssh -i <rsa-key> ec2-user@<ec2-machine-public-dns>
screen
jupyter notebook --no-browser
First I log in to the EC2 instance. Then I start a screen session so I can easily log out or disconnect and leave Jupyter running in the background (press Ctrl-a and then d to detach; “screen -r” reattaches later). The third line launches Jupyter Notebook. Note the “--no-browser” option: by default Notebook tries to launch a browser, and we don’t want that on a remote server. Jupyter will print a login URL similar to http://localhost:8888/?token=a917d6207a4726774e2fd4d6053d12e24b0326628e2d7350. Copy it to your clipboard.
The next step is to create an SSH tunnel to access our Jupyter instance:
ssh -i <rsa-key> -fNL 8888:localhost:8888 ec2-user@<ec2-machine-public-dns>
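In that command, “-f” sends ssh to the background, “-N” tells it not to run a remote command, and “-L 8888:localhost:8888” forwards local port 8888 to port 8888 on the instance. That is why the saved URL works unchanged. If port 8888 is already busy on your own machine, you could forward another local port instead (for example “-fNL 8889:localhost:8888”), and then the port in the saved URL has to match. Here is a minimal sketch of that rewrite. The function name local_jupyter_url is my own, hypothetical, and it assumes the URL uses “localhost” exactly as Jupyter printed it:

```shell
# Hypothetical helper: rewrite the port in a Jupyter login URL so it matches
# the local end of an SSH tunnel. Assumes the host part is "localhost".
# Usage: local_jupyter_url <login-url> <local-port>
local_jupyter_url() {
    # Replace the first "localhost:<digits>" with "localhost:<local-port>";
    # the "?token=..." part of the URL is left untouched.
    printf '%s\n' "$1" | sed "s#localhost:[0-9]*#localhost:$2#"
}

# Example: the tunnel was opened with -fNL 8889:localhost:8888
local_jupyter_url 'http://localhost:8888/?token=abc123' 8889
# prints http://localhost:8889/?token=abc123
```
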
Now you can open your browser and paste the saved URL.
The last thing you may want to do (if you plan to try data science stuff) is install popular Python packages. Before that, you need to install GCC and its prerequisites, because some of these packages may have to be compiled. In Amazon Linux (and Red Hat) it’s super easy:
sudo yum groupinstall "Development Tools"
Then you can install actual packages using pip:
sudo pip install numpy
sudo pip install pandas
sudo pip install xgboost
sudo pip install scikit-learn
And so on…
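To check that the packages actually installed, you could ask Python to import each one and print its version. This is a minimal sketch. It assumes that “python” on the PATH is the same interpreter pip installed into and that Jupyter runs on. The helper name check_packages is hypothetical. Keep in mind that scikit-learn is imported as “sklearn”, not under its pip name:

```shell
# Hypothetical helper: try to import each module with "python" and report
# "<module>: <version>" on success or "<module>: NOT importable" on failure.
check_packages() {
    for pkg in "$@"; do
        # print(...) with one argument works in both Python 2.7 and Python 3
        if version=$(python -c "import $pkg; print($pkg.__version__)" 2>/dev/null); then
            echo "$pkg: $version"
        else
            echo "$pkg: NOT importable"
        fi
    done
}

# Import names can differ from pip names (scikit-learn -> sklearn).
check_packages numpy pandas xgboost sklearn
```
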