It’s been a while! Let’s start with something easy: I was privileged to be part of the team that put together "Satellite mapping reveals extensive industrial activity at sea", published last month in Nature. This is part of our ongoing effort to figure out where industrial activity is happening on the oceans. Knowledge about this is surprisingly sparse, but Earth observation satellites have improved the situation a lot.
Update to Jupyter on GCE
Quick update: in an earlier post I showed one way to run Jupyter notebooks remotely on GCE. Since then I found there is a simpler way to write the SSH command. Anything after -- in the gcloud compute ssh command is passed directly to ssh. So rather than using multiple instances of --ssh-flag, one can instead use:
gcloud compute ssh img-detection-gpu-3 -- \
-L 9999:localhost:8888
I’ve also taken to using rmate so that I can edit files on GCE with Sublime Text running locally. In this case the command becomes:
gcloud compute ssh img-detection-gpu-3 -- \
-L 9999:localhost:8888 \
-R 52698:localhost:52698
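If you end up juggling several forwards, it can be handy to assemble the command programmatically. The following is only a sketch: `build_tunnel_command` is a hypothetical helper name, not part of gcloud or any library, and it assumes every forward targets localhost on the other end, as in the commands above.

```python
import shlex


def build_tunnel_command(instance, local_forwards=(), remote_forwards=()):
    """Build the argv list for `gcloud compute ssh` with port forwards.

    Hypothetical helper for illustration only.
    local_forwards:  (local_port, remote_port) pairs, emitted as -L flags.
    remote_forwards: (remote_port, local_port) pairs, emitted as -R flags.
    """
    # Everything after "--" is passed straight through to ssh.
    cmd = ["gcloud", "compute", "ssh", instance, "--"]
    for local_port, remote_port in local_forwards:
        cmd += ["-L", f"{local_port}:localhost:{remote_port}"]
    for remote_port, local_port in remote_forwards:
        cmd += ["-R", f"{remote_port}:localhost:{local_port}"]
    return cmd


command = build_tunnel_command(
    "img-detection-gpu-3",
    local_forwards=[(9999, 8888)],       # Jupyter
    remote_forwards=[(52698, 52698)],    # rmate
)
print(shlex.join(command))
```

The list form can be handed to `subprocess.run(command)` as-is, which avoids any shell quoting questions. Note that `shlex.join` needs Python 3.8 or later.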
Curve Fitting
A while back a colleague tweaked me with the joke that machine learning is just glorified curve fitting. This is true as far as it goes, but a large, modern neural net (e.g., VGG-16 with 138 million parameters) has approximately the same relationship with a linear fit (2 parameters) that the bomb dropped on Hiroshima (Little Boy with a yield of 63 TJ) had with a stick of dynamite (1 MJ).
The relative danger is almost certainly not as great, but you are still considerably more likely to cause yourself and others grief with the careless application of modern machine learning methods than with a linear fit.
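For the curious, the two ratios in that comparison really are close. A quick check, using only the figures quoted above (the variable names are mine):

```python
# Figures as quoted in the post; variable names are illustrative.
vgg16_params = 138e6        # parameters in VGG-16
linear_fit_params = 2       # slope and intercept
little_boy_joules = 63e12   # 63 TJ
dynamite_joules = 1e6       # 1 MJ

param_ratio = vgg16_params / linear_fit_params
energy_ratio = little_boy_joules / dynamite_joules

print(f"parameters: {param_ratio:.1e}x, energy: {energy_ratio:.1e}x")
```

Both come out at roughly 60 to 70 million, so the analogy holds to within about 10%.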
Fleet Clustering
Jupyter on GCE
I was recently inspired to set up Jupyter to run remotely on a GCE instance. I have access to a lot of computing resources for work, so it’s silly to run things locally on my laptop, but running interactive Python sessions remotely can be painful due to latency and the vagaries of terminals. Running Jupyter seems like a perfect fit here, since the editing is done locally (no lag) and Jupyter can be nicely self-documenting for moderate-sized projects [1].
Jeff Delaney has a helpful post on setting Jupyter up on GCE, and the Jupyter docs on running a public server also have some useful information. However, the solutions for exposing Jupyter to the web were either not terribly secure, painful to implement, or both. Since I’m only interested in being able to run the server myself, a simple, relatively secure solution is to use SSH tunneling. So rather than exposing ports publicly on GCE, just start the Jupyter server on your GCE instance with the --no-browser option:
jupyter notebook --no-browser
Then, on your local machine run
gcloud compute ssh nnet-inference \
--ssh-flag="-L" \
--ssh-flag="9999:localhost:8888"
And point your browser to http://localhost:9999.
That’s it. Now you can use Jupyter remotely without opening up public ports on your GCE instance [2].
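If the browser can't connect, it helps to first check whether anything is listening on the local end of the tunnel. Here is a minimal sketch using only the standard library; `tunnel_is_up` is a hypothetical name, and port 9999 is just the local port used above. It only tells you that the local port accepts connections, not that Jupyter is actually running on the remote side.

```python
import socket


def tunnel_is_up(port=9999, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: this only checks the local end of the tunnel.
    ssh can be listening locally even when the remote server is down.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("tunnel up" if tunnel_is_up() else "tunnel down, restart it")
```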
References
1. Once projects hit a certain size though, Jupyter becomes inscrutable and really needs to be modularized.
2. A couple of minor notes: I run my notebook inside tmux so that it stays alive if my connection drops. And if the connection drops you’ll need to restart the tunnel.
MIA
I’ve been too busy trying to classify fishing vessels for Global Fishing Watch to post lately, but I’m hoping that I’ll have a bit more time now.
word2vec / doc2vec slides
I put the Jupyter-notebook-based slides from a presentation I gave at DesertPy on 5/23/2016 up on GitHub. You can find them at: https://github.com/bitsofbits/Doc2VecPresentation.
PySpark Translation of Dmitry Petrov’s Spark ML Beginner’s Guide
I translated Dmitry Petrov’s Spark ML Beginner’s Guide from Scala to Python. You can take a look at the resulting Jupyter Notebook here or download the Notebook from GitHub.