Automation — The Easiest Part of the Hardest Data Science Tournament

Suraj Parmar
3 min read · Jun 18, 2023


The easiest way to get started with the hardest data science tournament on the planet.

Just give me the code:

“You should really check out this website, Numerai!” is what I tell almost all of my friends who are interested in data science and finance. Then I send them my Medium articles with easy-to-get-started Colab notebooks.

“I ran your easy Colab notebooks, went through the code base and the example scripts, trained a baseline model, and uploaded the predictions. But the automation process is going to take a while,” most of them reply.

For many, the hardest part of getting started with the hardest data science tournament on the planet was not the data science but setting up the automation, especially for the daily tournaments, where it’s very hard to run a Colab notebook every day within the scheduled window.

The existing automation setup requires some experience with cloud providers. However, the hardest thing in the hardest data science tournament should be the data science, not an infrastructure setup that takes away the fun part!

Model uploads to the rescue 🚀

Recently introduced in the Fireside chat, the model upload feature makes it super easy to automate your daily submissions, and it costs $0.

This was my reaction after uploading my first model in the beta test, within 20 minutes of being added to the beta test channel!

The above notebook has multiple examples, including LightGBM, neural networks, an ensemble of both, and neutralization of the ensemble. LightGBM is the easiest one. I’d suggest uploading all 4 pickle files to different models for testing before you start staking on them.
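For readers new to the terms, here is a minimal sketch of what ensembling and neutralizing can look like. It is not the notebook's code: it assumes a plain average for the ensemble and a linear neutralization (subtracting the least-squares projection of the predictions onto the features), and the names `ensemble`, `neutralize`, `lgbm_preds`, and `nn_preds` are hypothetical placeholders filled with random data.

```python
import numpy as np

def ensemble(*prediction_sets):
    """Average several prediction vectors row by row (a simple equal-weight blend)."""
    return np.mean(np.column_stack(prediction_sets), axis=1)

def neutralize(predictions, features, proportion=1.0):
    """Remove the part of `predictions` that is linearly explained by `features`.

    Assumption: linear neutralization via least squares; proportion=1.0
    removes the full linear exposure, smaller values remove a fraction of it.
    """
    coef, *_ = np.linalg.lstsq(features, predictions, rcond=None)
    neutral = predictions - proportion * (features @ coef)
    return neutral / neutral.std()

# Hypothetical stand-in data; real features and predictions come from your models.
rng = np.random.default_rng(42)
features = rng.normal(size=(500, 5))
lgbm_preds = 0.3 * features[:, 0] + rng.normal(size=500)  # placeholder "LightGBM" output
nn_preds = 0.3 * features[:, 1] + rng.normal(size=500)    # placeholder "neural net" output

blended = ensemble(lgbm_preds, nn_preds)
neutral = neutralize(blended, features)
```

With `proportion=1.0`, the neutralized predictions have essentially zero linear exposure to every feature column, which is the property you would check before trusting the output.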

With this new feature, you can easily:

  1. Train a model in Colab.
  2. Save it as a cloudpickle binary.
  3. Upload it to the models page.

That’s it! 🎉
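The train-and-save steps can be sketched roughly as below. This is a sketch under assumptions, not the exact notebook code: it assumes the uploader expects a pickled function that takes the live features as a DataFrame and returns a DataFrame with a `prediction` column (check the current Numerai docs for the exact contract). The feature names, the random training data, and the least-squares "model" are hypothetical stand-ins for the real dataset and for LightGBM.

```python
import cloudpickle
import numpy as np
import pandas as pd

# Hypothetical stand-in data; real training data comes from the Numerai dataset.
rng = np.random.default_rng(0)
feature_cols = ["feature_a", "feature_b", "feature_c"]  # placeholder names
train = pd.DataFrame(rng.random((200, 3)), columns=feature_cols)
train["target"] = train[feature_cols].mean(axis=1)

# 1. Train a model. A least-squares fit stands in for LightGBM or a neural net.
X = np.column_stack([train[feature_cols].to_numpy(), np.ones(len(train))])
weights, *_ = np.linalg.lstsq(X, train["target"].to_numpy(), rcond=None)

def predict(live_features: pd.DataFrame) -> pd.DataFrame:
    """Return one prediction per live row, indexed like the input."""
    X_live = np.column_stack(
        [live_features[feature_cols].to_numpy(), np.ones(len(live_features))]
    )
    raw = pd.Series(X_live @ weights, index=live_features.index)
    # Rank-normalize so that every prediction lies in (0, 1].
    return raw.rank(pct=True).to_frame("prediction")

# 2. Save the function (and the objects it references) as a cloudpickle binary.
with open("predict.pkl", "wb") as f:
    f.write(cloudpickle.dumps(predict))
```

Pickling the whole `predict` function rather than only the model keeps preprocessing and post-processing (such as the ranking step here) bundled with the weights, so the uploaded file is self-contained. Step 3, the upload itself, happens in the browser.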

Specifications

Unlike the previous automation solutions (compute and compute-lite), where you host your models on your own AWS account, this new feature provides 1 CPU and 4 GB of memory at no cost. The runtime limit is 10 minutes, which should be sufficient for most models to produce live predictions (~6k rows). So you can simply train a model wherever you’d like, upload it, and relax. There is no compute cost for inference. To upload your first model:

  1. You’ll need to enable beta features in the Settings on your profile.

  2. Upload the saved file from Colab to your model on the submissions page:
     https://numer.ai/submissions
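Before uploading, it can help to check locally that inference fits comfortably inside the 10-minute and 4 GB limits mentioned above. The sketch below uses only the standard library; `check_budget` and `dummy_predict` are hypothetical names, and the numbers are only indicative because your machine is not the hosted one and `tracemalloc` only sees memory allocated through Python.

```python
import time
import tracemalloc

RUNTIME_LIMIT_SECONDS = 10 * 60     # 10-minute runtime limit quoted above
MEMORY_LIMIT_BYTES = 4 * 1024 ** 3  # 4 GB of memory quoted above

def check_budget(predict_fn, live_features):
    """Run predict_fn once and report wall time and peak Python-level memory."""
    tracemalloc.start()
    start = time.perf_counter()
    predictions = predict_fn(live_features)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "elapsed_seconds": elapsed,
        "peak_bytes": peak,
        "within_time": elapsed < RUNTIME_LIMIT_SECONDS,
        "within_memory": peak < MEMORY_LIMIT_BYTES,
        "n_predictions": len(predictions),
    }

# Hypothetical stand-in for a real model: averages each row's features.
def dummy_predict(rows):
    return [sum(row) / len(row) for row in rows]

live_rows = [[0.25, 0.5, 0.75]] * 6000  # roughly the ~6k live rows mentioned above
report = check_budget(dummy_predict, live_rows)
print(report)
```

Swapping `dummy_predict` for your own unpickled function gives a rough idea of how much headroom the model has before you rely on it for daily submissions.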

Three automated models: compute-lite (top), model upload (middle), and compute-heavy (bottom), illustrating the ease and cost of setting each one up.

Cost: 🔽 ($0)
Complexity: 🔽
Experiments: 🔼

Now that the compute is taken care of, you can focus more on the modelling and run quicker experiments without the bottleneck of deploying the model.

While this feature gives you free compute, note that you are also handing over your pickled model. If you are concerned about uploading the model, you can use the other automation solutions, compute-heavy and compute-lite, which are hosted on your own AWS instance and upload only the predictions.

In my beta testing experience, I was able to deploy all sorts of models, including ones with neutralization, pretty easily. My friends like it, and I hope you will too!

What’s next:

1. Sign up for Numerai
2. Create new model slots
3. Run the Colab notebook above and save the .pkl files
4. Upload the saved .pkl files (https://numer.ai/submissions)

References:
1. Google Doc
2. Tournament overview
3. Numerai新機能Model Uploadsのご紹介 (Introducing Numerai's New Feature, Model Uploads)
4. An Easy guide to the hardest tournament on the planet

DALLE2 Out painting

Thank you, Natasha-Jade, for the reviews.
