Jupyter Notebook on Amazon Linux

Jupyter Notebook is an app for data analysis. The idea is to combine documentation and code! My wife uses it for her data science courses on Coursera. Once she complained that some tasks took the whole night to complete on her laptop. Her Sony Vaio is pretty powerful, but definitely not a mainframe. When I noticed that Notebook is actually a web application, I immediately suggested running it on AWS! This is a short guide on how to set up Jupyter Notebook there.

First you have to provision an EC2 instance with Amazon Linux. I recommend the so-called “compute-optimized” instance types (cX) as they provide the most CPU power. Amazon Linux already comes with Python 2.7.12, which is enough for Jupyter. Installing Jupyter is pretty simple:

sudo pip install jupyter

Then you need to start it. Here is what I do:

ssh -i <rsa-key> ec2-user@<ec2-machine-public-dns>
screen
jupyter notebook --no-browser

First I log in to the EC2 instance. Then I start a screen session so I can easily log out or disconnect and let Jupyter keep running in the background. The third line launches Jupyter Notebook. Note the “--no-browser” flag: by default Notebook would try to launch a browser, and we don’t want that on a server. Jupyter will print a login URL similar to http://localhost:8888/?token=a917d6207a4726774e2fd4d6053d12e24b0326628e2d7350. Copy it to your clipboard.
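If the Jupyter output scrolls by too fast, the login URL can also be pulled out of saved output. Here is a minimal sketch; the function name extract_jupyter_url is made up for this post, and it assumes the URL has the same shape as the example above (localhost, a port, and a hex token):

```shell
# Hypothetical helper (not part of Jupyter): read text on stdin and
# print the first thing that looks like a Jupyter login URL.
extract_jupyter_url() {
    # -E: extended regex, -o: print only the matching part of the line
    grep -Eo 'http://localhost:[0-9]+/\?token=[0-9a-f]+' | head -n 1
}
```

For example, if you saved the startup output to a file (say jupyter.log, also just an assumed name), running extract_jupyter_url < jupyter.log would print the URL to copy.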

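The next step is to create an SSH tunnel to access our Jupyter instance. Here -f puts ssh in the background, -N skips running a remote command, and -L forwards local port 8888 to port 8888 on the instance: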

ssh -i <rsa-key> -fNL 8888:localhost:8888 ec2-user@<ec2-machine-public-dns>
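If port 8888 is already taken on your laptop, you can forward a different local port to the same remote one. Below is a small sketch that only prints the ssh command so you can review it before running it. The function name jupyter_tunnel_cmd and its argument order are my own invention, and it assumes Jupyter listens on the default port 8888 on the instance:

```shell
# Hypothetical helper: print (do not run) the tunnel command.
#   $1 = path to the private key
#   $2 = public DNS name of the EC2 instance
#   $3 = local port to listen on (optional, defaults to 8888)
jupyter_tunnel_cmd() {
    printf 'ssh -i %s -fNL %s:localhost:8888 ec2-user@%s\n' \
        "$1" "${3:-8888}" "$2"
}
```

If you pick another local port, for example 9999, remember to change the port in the login URL as well: http://localhost:9999/?token=... instead of 8888.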

Now you can open your browser and paste the saved URL.

The last thing you can do (if you want to try data science stuff) is install popular Python packages. But before that you need to install GCC and its prerequisites. On Amazon Linux (and Red Hat) it’s super easy:

sudo yum groupinstall "Development Tools"

Then you can install actual packages using pip:

sudo pip install numpy
sudo pip install pandas
sudo pip install xgboost
sudo pip install scikit-learn

And so on…

Chocolatey: package manager for Windows

We’re all used to package managers in Linux distros: Aptitude in Debian, yum in Red Hat, emerge in Gentoo. The list goes on. I was very surprised when my colleague suggested installing Homebrew on my MacBook to get some tools. I did that and I’m happy. Almost all my tools now get installed with brew install.

Since I’m stuck with Windows on my home laptop, I thought: maybe there is a package manager for Windows? Yes! There is! It’s called Chocolatey! It looks like it’s based on PowerShell, so the installation process is super easy. Plus, reading their guide is enough to make it work.

As of now I have installed a bunch of my tools using choco install: JDK, Groovy, Gradle, Maven and IntelliJ IDEA! Isn’t that neat?