Arsa Nikzad
Posted 3 months ago · DATATHON ORGANISER
Competition Evaluation Metric
I wanted to share a thought regarding the competition's evaluation metric and get your input.
Since the competition's metric resembles MAE, and given that (1) we are dealing with sparse, zero-inflated data and (2) we are not evaluating models at a higher level of the hierarchy, could we end up selecting, as the "best" model, one that systematically under-forecasts the non-zero, high-volume items?
MAE tends to select models that best estimate the median of the data, and in zero-inflated datasets, the median is often zero (or close to it). While we do have a bias term in the evaluation metric to help balance things, I’m curious if it’s strong enough to prevent this issue.
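To make that concrete, here is a minimal sketch (the intermittency rate and demand distribution are made up, not taken from the competition data) showing how an all-zero forecast can beat an aggregate-unbiased forecast on MAE while under-forecasting total demand entirely:

```python
import numpy as np

# Hypothetical zero-inflated demand series: ~80% of periods have zero sales,
# the remaining ~20% are high-volume (Poisson around 50 units).
rng = np.random.default_rng(0)
n = 1_000
nonzero = rng.random(n) < 0.2
actuals = np.where(nonzero, rng.poisson(50, n), 0)

def mae(forecast, actual):
    return np.mean(np.abs(forecast - actual))

zero_forecast = np.zeros(n)                 # predict the median (zero) everywhere
mean_forecast = np.full(n, actuals.mean())  # unbiased in aggregate

print("MAE, all-zero forecast:", mae(zero_forecast, actuals))
print("MAE, mean forecast:    ", mae(mean_forecast, actuals))
print("Total demand missed by the zero forecast:", actuals.sum())
```

On data like this the all-zero forecast typically gets the lower MAE, even though it misses 100% of the volume, which is exactly the failure mode I'm worried the metric might reward unless the bias term is strong enough to offset it.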
Typically, the dollar value of that minority of high-volume items is as significant as, if not greater than, that of the majority of low-volume items, which makes under-forecasting them particularly costly.
Also, you can probably improve the forecast by not feeding the full history for every product. For new product introductions, there is no need to pass the entire time span to your forecasting model: the long run of zeros before launch will drag the forecast down and cause it to under-forecast. A sketch of that trimming step is below.
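Here is a minimal sketch of that idea, assuming each product's history is a pandas Series indexed by period; `trim_pre_launch` is a hypothetical helper, not part of any starter code:

```python
import pandas as pd

def trim_pre_launch(series: pd.Series) -> pd.Series:
    """Drop the leading run of zeros before a product's first recorded sale,
    so the pre-launch period does not pull the model toward zero."""
    has_sales = series.ne(0)
    if not has_sales.any():
        return series                     # never sold: nothing to trim
    first_sale = has_sales.idxmax()       # index of the first non-zero period
    return series.loc[first_sale:]

# Example: product launched in week 9 of a 12-week history.
history = pd.Series([0] * 8 + [12, 15, 9, 14],
                    index=pd.RangeIndex(12, name="week"))
print(trim_pre_launch(history))           # keeps only weeks 8..11
```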
Those were my five cents; at least that's how I structure the problem/solution. My score says my approach sucks, though.