Using r/WallStreetBets data for Numerai Signals submission.
I stumbled upon Arjun Rohlfing-Das’s excellent post on Sentiment Analysis for Trading with Reddit Text Data, which uses r/wallstreetbets data for sentiment analysis that seems to hold predictive power.
This is a ‘Run All’ notebook. Once you have set up the PRAW credentials, all you have to do is click Run all in Colab and it will generate a .csv …
Treating Punctuation restoration as translation with Transformers.
The transcript we get from ASR is often unpunctuated, and to use it in other tasks we need punctuated text. There are many approaches to this, but I wanted to explore seq2seq Transformers for it, and possibly for multilingual applications too.
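To make the "translation" framing concrete, here is a minimal sketch of how training pairs for such a seq2seq model could be built: the source side is the text with punctuation and casing stripped (imitating raw ASR output), and the target side is the original punctuated text. The helper name `make_pair` and the choice to lowercase the source are assumptions for illustration, not necessarily what the post does.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def make_pair(sentence):
    """Return a (source, target) pair for punctuation restoration.

    source: lowercased text without punctuation (ASR-like input).
    target: the original punctuated sentence.
    """
    source = sentence.translate(_STRIP_PUNCT).lower()
    source = " ".join(source.split())  # collapse any repeated whitespace
    return source, sentence


src, tgt = make_pair("Hello, how are you? I am fine.")
print(src)
print(tgt)
```

A seq2seq model would then be trained to map `src` to `tgt`, exactly as it would map a sentence in one language to its translation in another.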
Can you make unique and equally good predictions?
If you think Numerai’s main tournament is hard, then you might want to take a look at Signals! It’s more ambitious, and of course, harder! Signals provide a platform to evaluate your financial models and earn some NMR cryptocurrency too!
“Beating the wisdom of the crowds is harder than recognizing faces or driving cars” — Marcos López de Prado
If you are new to Numerai main tournament, this might help.
From the tournament perspective, the main difference between these two…
Update — DEC 19, 2020: The notebook has been updated according to the new target “Nomi”. TARGET_NAME is now only “target” instead of “target_kazutsugi”.
Note: This isn't a 'Run all' and submit notebook. I have tried to make this flexible so feel free to experiment and customize according to your style and workflow.
This post on Model Diagnostics. It also has links to community-written posts on the metrics.
Also, check out A guide to “The hardest data science tournament on the planet” if you want to get started with submitting your predictions for the tournament.
Now, having already submitted…
Update — DEC 01, 2020: The notebook has been updated according to the new target “Nomi”. TARGET_NAME is now “target” instead of “target_kazutsugi”
💡 The Numerai tournament problem
The Numerai data science problem is like a typical supervised machine learning problem, where the data has several input features and corresponding labels (or targets). And our goal is to learn a mapping from input to targets using various techniques. We usually split data into training and validation parts…
Classifying digits by training a model on the MNIST dataset is really a fun thing to do with the frameworks available, and putting it into production would be great.
We know that neural networks can be seen as ‘Universal Function Estimators’, which means we can train them to map inputs such as images to their correct labels. This is called the supervised learning approach.
What if we don’t have labels? What if we are left with images only? What can we do with them? Now this is getting interesting. We can train a network to improve the resolution of an image, de-noise it, or even generate new samples. …
I have been using Python to create and train my machine learning models, which requires setting up quite a few things (I mostly use Google Colab though). Currently, I am learning machine learning and web development alongside Android app development.
If you are also into deep learning, you have probably done basic linear regression and the MNIST classification challenge, a basic problem in computer vision. So when I learned about TensorFlow Lite, it inspired me to make an app that uses the features of an Android smartphone, and I created this basic MNIST handwritten digit classification app.
It is one of the simplest algorithms to get started with. The goal is to fit a straight line between two or more variables, where one is independent and another is dependent (i.e., Y = f(X), meaning the value of Y changes according to X, while X can be any value in the given range). Thus, we try to find a relationship (the definition of f(X)) between Y and X, which takes the form of a line Y = m*X + b.
We try to predict real values at…
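As a small illustration of fitting Y = m*X + b, the sketch below uses the standard closed-form least-squares solution for a single independent variable. The function name `fit_line` and the toy data are assumptions made for this example; the post itself may fit the line differently (for instance, iteratively).

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


# Toy data generated from y = 2*x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = fit_line(xs, ys)
print(m, b)
```

Once `m` and `b` are known, predicting a real value for a new X is just `m * x + b`.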
According to Wikipedia,
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or ‘predictors’). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or ‘criterion variable’) changes when any one of the independent variables is varied, while the other independent variables are held fixed
Gradient Descent is one of the most popular optimization strategies used in machine learning and deep learning right now. It can be combined with almost every algorithm, yet it is easy to understand. So, everyone planning to go on the journey of machine learning should understand it.
Gradient descent is used to find a local minimum of a given function. When the function is convex, that local minimum is also the global minimum.
It is simply used to find the values of the parameters at which the given function reaches its nearest minimum cost.
We start by defining initial parameters and then with the derivation of the function…
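The idea can be sketched in a few lines: start from an initial parameter value, then repeatedly step against the derivative. The cost function (x - 3)^2, the learning rate, and the step count below are assumptions chosen only to keep the example small; they are not taken from the post.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly move the parameter against the gradient of the cost."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def cost(x):
    # Hypothetical convex cost with its minimum at x = 3.
    return (x - 3.0) ** 2


def cost_grad(x):
    # Derivative of the cost above: d/dx (x - 3)^2 = 2 * (x - 3).
    return 2.0 * (x - 3.0)


x_min = gradient_descent(cost_grad, start=0.0)
print(x_min, cost(x_min))
```

With a learning rate that is too large the updates can overshoot and diverge, and with one that is too small convergence becomes slow, so this value usually needs tuning for the problem at hand.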