The minimum contribution is $10 USD and you can contribute up to $300 USD. The funds raised will go to the winners. There will be transparency throughout the whole process: you will know who passed to the next round and how they did it (through the Release dataset), and the winning models from the final will be made public. In fact, these models will be shared with everyone who participated, which is very valuable in itself, because even if you are not among the winners, you will still have a chance to learn a great deal from them and their winning models!
At the end of the tournament we will share the winning Machine Learning models with you in Notebook format. This is for educational purposes, as you will be able to study them and learn from the best! You will also learn about the process of participating in a tournament, do your best to advance to the next stage, and enjoy an awesome adrenaline rush once you are competing in the playoffs!
We try to make the data science problems fun, educational and valuable, but above all, we try to make them simulate real life situations. That is why we have decided to do them in different stages, so we can simulate new observations, and re-train the models based on those new observations. For now, we will be launching tournaments with classical Machine Learning (tabular data), but we hope to soon make them more complex with deep learning problems.
They have intermediate complexity, since they are tabular problems. That is our goal at least for the first few tournaments. Once we have run several of them successfully, we will make the move to tournaments with more complex problems that can be solved with deep learning tools.
We constantly have tournaments open to join and support. We try to have at least one active tournament per month. Of course, for more data science and ML practice, as well as more opportunities to earn cash, we encourage you to join our regular, sponsored competitions.
Each tournament will last between 1 and 1.5 months. The regular season lasts between 2 and 3 weeks. The playoffs consist of the quarterfinals, semifinals and final, and each of these rounds lasts one week.
Running in stages has several purposes and benefits. The first is to simulate the arrival of new real data as time goes by. This new data is used to evaluate how well the submitted models generalize and, in turn, to re-train the models for the next phase. Another benefit is that a winning model must generalize well across all stages: it is tested in different scenarios, which helps reduce the overfitting that we try so hard to avoid in real life. Finally, there is the thrill of advancing to higher levels and going head-to-head with other data scientists. Enjoy the adrenaline rush!
During this period we will be promoting participation in the tournament and its funding. We will also have a prize pool goal, which could range from USD $500 to USD $5,000 or more, depending on demand. In order to run a tournament we must have a minimum of 8 participants, so that we can run the playoff stages and match the competitors correctly. The amount raised could be larger or smaller than the prize pool goal. The important thing is to reach a minimum of 8 participants. If the target number of participants is not reached by the given deadline, we may postpone the start date of the tournament. If we still do not reach 8 participants after an acceptable period of time, the money raised will be returned in full to those who contributed it.
In the regular season stage of the tournament, you will be playing against all the tournament participants. You can track the action on a public leaderboard, sorted in ascending or descending order depending on the evaluation metric, which clearly shows who would advance to the next round (the top 8). Your goal in this stage is to finish in one of the top 8 positions. During this stage it is still possible to accept new competitors and add money to the prize pool, because everyone is still competing against everyone else. So, we encourage you to invite and challenge your friends and colleagues to participate and win!
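As a rough illustration of how a leaderboard like this could be ordered, here is a minimal Python sketch. The function names, the `higher_is_better` flag and the tie-breaking behaviour are assumptions made for the example; they are not taken from the platform's actual implementation.

```python
from typing import Dict, List, Tuple


def rank_leaderboard(scores: Dict[str, float], higher_is_better: bool) -> List[Tuple[str, float]]:
    """Return (competitor, score) pairs ordered from best to worst.

    For metrics such as accuracy, higher is better (descending order).
    For error metrics such as RMSE, lower is better (ascending order).
    Ties keep their original insertion order (an assumption of this sketch).
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=higher_is_better)


def qualifiers(scores: Dict[str, float], higher_is_better: bool, spots: int = 8) -> List[str]:
    """Names of the competitors currently in the qualifying positions."""
    ranked = rank_leaderboard(scores, higher_is_better)
    return [name for name, _ in ranked[:spots]]
```

With ten hypothetical competitors, `qualifiers(scores, higher_is_better=True)` would return the eight names with the highest scores, best first.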
This is where the real excitement begins! The system will randomly pair up the competitors into 4 groups of 2 people. At this point you start competing directly with your opponent, not with everyone else. Your adrenaline will rise every time you submit a new solution, as you want to prove that you are the best in that bracket! A new dataset called QuarterfinalsTest.csv will be released immediately, with the samples you must now predict in the new round. This file will contain the original data plus the true labels from the regular season. This is where we "reset" the competition, as you must re-train your model with more data and make predictions on the newly expanded dataset, all for the same data science problem. Our system will highlight the best score within each pair of competitors in yellow. Keep the timing in mind, as the quarterfinals will only last 1 week! At this point, no new participants or new contributions will be admitted, and we will announce the final prize pool to be distributed among the winners.
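The random draw described above could look something like the following sketch. The function name `draw_brackets` and the optional `seed` parameter are hypothetical and only serve the illustration; how the platform actually performs the draw is not specified here.

```python
import random
from typing import List, Optional, Tuple


def draw_brackets(players: List[str], seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Randomly split an even number of players into head-to-head pairs."""
    if len(players) % 2 != 0:
        raise ValueError("an even number of players is required")
    rng = random.Random(seed)   # a seed makes the draw reproducible (assumption)
    shuffled = list(players)    # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    # Consecutive players in the shuffled list become opponents.
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled), 2)]
```

Calling `draw_brackets` with the 8 qualifiers gives 4 brackets; calling it again with the 4 quarterfinal winners gives the 2 semifinal brackets.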
When the quarterfinals are over, the system will automatically recognize the best of each bracket, and will randomly assign each winner a new opponent for this new round! You will continue to compete head-to-head with your new opponent. Get ready for your excitement level to step up every time you submit a new solution, as you want to prove that you are the best in that bracket! A new dataset called SemifinalsTest.csv will be released immediately, with the samples you must now predict. This file will contain the original data plus the true labels of the regular season and quarterfinals. Here we "reset" the competition again, as you must re-train your model with more data and make predictions on the newly expanded dataset, once again all for the same data science problem. Our system will highlight the best score between the two competitors in each bracket. Keep the timing in mind, as the semifinals will only last 1 week!
The final is where the top two competitors will go head-to-head! Get ready for adrenaline-fueled, knock-down, drag-out data science action like you’ve never seen before! Our two finalists will compete for a week, which will be divided into two parts: the Final and the Final Shot. In the Final you will have the opportunity to re-train the model on the final datasets and send multiple solutions to the system to get the scores. In the Final Shot, you will have only ONE opportunity to submit your final solution. This guarantees that the model has correctly generated the predictions and takes into account all the previous stages. In this last submission you must send the Notebook through the same submission form. It will be used to validate the results, and these are the notebooks that will be delivered to all the competitors who participated. At the end you will stand on the (virtual) Podium and bask in the glory as the best!
At the end of each stage we will release the true values for these datasets. This allows each competitor to evaluate their own model and check that our scores were correct. But the most important thing will happen at the end of the competition, where the winning models, in Notebook format, will be shared with ALL the competitors, thus maintaining transparency and ensuring the good practices and ethics of the winners. Of course, this also offers the benefit of a great learning opportunity, by allowing all the competitors to study what the winning players have done.
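Once the true values are released, checking a score yourself can be as simple as the sketch below. It assumes a classification problem scored with plain accuracy; the actual metric depends on each tournament, and the function name is illustrative only.

```python
from typing import Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the released true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

For example, with true labels `[1, 0, 1, 1]` and predictions `[1, 0, 0, 1]`, three of the four predictions match, so the function returns `0.75`. You would compare this value with the score shown on the leaderboard for the same submission.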
The IP on the winning models is intended for purely academic and educational purposes, not for commercial use. This is in order to encourage participation, as well as to facilitate learning in real life scenarios. If a commercial application is of interest, please contact our team to discuss our options, including a regular data science competition that we can conduct on our platform.
The prize pool will be distributed as follows: 50% to the winner, 30% to the second place and 20% to the third place.
Third place goes to the competitor with the best semifinal score among those who did not advance to the grand final.
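To make the third-place rule concrete, here is a small sketch. The data layout (one dictionary of best scores per semifinal bracket) and the `higher_is_better` flag are assumptions for the example, not the platform's data model.

```python
from typing import Dict, List


def third_place(semifinals: List[Dict[str, float]], higher_is_better: bool = True) -> str:
    """Pick third place: the best-scoring competitor among the semifinal losers.

    `semifinals` holds one dict per bracket, mapping each of the two
    competitors to their best semifinal score.
    """
    pick_best = max if higher_is_better else min
    pick_worst = min if higher_is_better else max

    losers: Dict[str, float] = {}
    for bracket in semifinals:
        # The competitor with the worse score in a bracket does not advance.
        loser = pick_worst(bracket, key=bracket.get)
        losers[loser] = bracket[loser]

    # Among those who did not advance, the best score takes third place.
    return pick_best(losers, key=losers.get)
```

With hypothetical brackets `{"ana": 0.91, "ben": 0.88}` and `{"cy": 0.95, "dee": 0.90}` and a higher-is-better metric, ben and dee are eliminated, and dee takes third place with 0.90.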
To support the platform and its promotion, which keeps bringing in more participants, we collect a portion of the total amount raised in the competition. Our fee is 20% of the total collected.
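The arithmetic can be sketched as follows. This example assumes the 20% fee is deducted from the total raised first and that the 50/30/20 split is then applied to what remains; the text does not spell out that order, so treat it as an assumption. The function and constant names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict

FEE_RATE = Decimal("0.20")  # platform fee on the total collected
SHARES = {                   # split of the prize pool among the winners
    "first": Decimal("0.50"),
    "second": Decimal("0.30"),
    "third": Decimal("0.20"),
}
CENT = Decimal("0.01")


def split_prizes(total_raised) -> Dict[str, Decimal]:
    """Split the amount raised into the fee and the three prizes (in USD).

    Assumption: the fee is taken off the top, then the remainder is
    split 50/30/20. Amounts are rounded to the nearest cent.
    """
    total = Decimal(str(total_raised))
    fee = (total * FEE_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    pool = total - fee
    result = {"fee": fee, "pool": pool}
    for place, share in SHARES.items():
        result[place] = (pool * share).quantize(CENT, rounding=ROUND_HALF_UP)
    return result
```

Under these assumptions, raising $1,000 would mean a $200 fee and an $800 prize pool, paid out as $400, $240 and $160. Because each prize is rounded to the cent independently, the three prizes can differ from the pool by a cent for some totals.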
Keep up to date by participating in our global community of data scientists and AI enthusiasts. We discuss the latest developments in data science competitions, new techniques for solving complex challenges, AI and machine learning models, and much more!