VN1 Forecasting - Accuracy Challenge Phase 1
Finished

Flieber, Syrup Tech, and SupChains Launch an AI-Driven Supply Chain Forecasting Competition

VN1 Forecasting - Accuracy Challenge
Machine Learning/AI
Enterprise
E-commerce/Retail
Total Prize 20,000

Arsa Nikzad

Posted 4 months ago DATATHON ORGANISER
EVALUATION LOSS

Competition Evaluation Metric

I wanted to share a thought regarding the competition's evaluation metric and get your input.
The competition's metric resembles MAE, and 1) we are dealing with sparse, zero-inflated data and 2) we are not evaluating models at a higher level of the hierarchy. Could we end up selecting, as the best model, one that systematically under-forecasts the non-zero, high-volume items?

MAE tends to select models that best estimate the median of the data, and in zero-inflated datasets, the median is often zero (or close to it). While we do have a bias term in the evaluation metric to help balance things, I’m curious if it’s strong enough to prevent this issue.
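To make the median argument concrete, here is a minimal sketch with made-up numbers (the `demand` series is hypothetical, not competition data). It searches a grid of constant forecasts and reports which one minimises MAE and which one minimises MSE.

```python
# Hypothetical zero-inflated demand for one item (illustrative numbers only).
demand = [0, 0, 0, 0, 0, 0, 3, 0, 12, 0]

def mae(actuals, forecast):
    """Mean absolute error of a single constant forecast."""
    return sum(abs(a - forecast) for a in actuals) / len(actuals)

def mse(actuals, forecast):
    """Mean squared error of a single constant forecast."""
    return sum((a - forecast) ** 2 for a in actuals) / len(actuals)

# Candidate constant forecasts: 0.0, 0.5, ..., 12.0
candidates = [x / 2 for x in range(25)]

best_mae = min(candidates, key=lambda f: mae(demand, f))
best_mse = min(candidates, key=lambda f: mse(demand, f))

print("MAE-optimal constant:", best_mae)  # the median of the series
print("MSE-optimal constant:", best_mse)  # the mean of the series
```

On this series the MAE-optimal constant is 0 (the median) while the MSE-optimal constant is 1.5 (the mean), which is the under-forecasting pull described above.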

Typically, the dollar value of that minority of high-volume items is as significant as, if not greater than, that of the majority of low-volume items, making under-forecasting them particularly costly.


There might be different ways to tackle the under-forecasting part, but you need to process each Client-Warehouse-Product combination one by one (you can't vectorize all combinations with the same model), for example with a segmentation approach that selects the best forecasting model based on the time series characteristics. You can also optimize a different KPI for each Client-Warehouse-Product combination based on the same "segmentation".
Also, you can probably improve the forecast if you don't use the full time range for all the products. For new product introductions, you don't need to send the full period to your forecasting model, because it will under-forecast.
Those were my 5 cents, at least that's how I structure the problem/solution. My score says my approach sucks though. 
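A minimal sketch of the "don't send the full period" idea, assuming that leading zeros mean the product had not launched yet (rather than genuine zero demand). The helper name `trim_to_launch` and the numbers are hypothetical.

```python
def trim_to_launch(history):
    """Drop the leading zero periods before the first observed sale.

    Assumption: leading zeros are pre-launch periods, not real zero demand.
    """
    for i, qty in enumerate(history):
        if qty > 0:
            return history[i:]
    return []  # never sold: nothing to train on

# Hypothetical product introduced in period 7 of 10.
history = [0, 0, 0, 0, 0, 0, 4, 6, 5, 7]

full_mean = sum(history) / len(history)      # diluted by pre-launch zeros
trimmed = trim_to_launch(history)
trimmed_mean = sum(trimmed) / len(trimmed)   # level since launch

print(full_mean, trimmed_mean)
```

Even a plain average shows the effect: 2.2 on the full period versus 5.5 once the pre-launch zeros are removed.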
Interesting, thanks. 
Evaluating each Client-Warehouse-Product combination individually would be ideal. However, it might not be feasible when dealing with millions of items, especially when using a global forecasting model. With global models, we need a loss function that delivers reasonable accuracy (RMSE) and bias for both slow-moving and fast-moving items, while minimizing the need for manual post-processing adjustments.
What if you do something in the middle? For example, cluster the Client-Warehouse-Product combinations based on their time-series characteristics (seasonality periods, stationarity, number of periods with zero demand, average demand, trend, etc.) so you can classify them into, let's say, 5 groups, and then you just run 5 models (one per group, of course).
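A rough sketch of that middle ground, using simple rule-based buckets instead of a real clustering algorithm. The features, the thresholds (`zero_cut`, `volume_cut`), the three segment names and the example keys are all hypothetical choices for illustration.

```python
def describe(series):
    """Compute a few cheap per-series features (needs at least 2 periods)."""
    n = len(series)
    half = n // 2
    zero_share = sum(1 for x in series if x == 0) / n
    avg_demand = sum(series) / n
    # Crude trend proxy: mean of the second half minus mean of the first half.
    trend = sum(series[half:]) / (n - half) - sum(series[:half]) / half
    return {"zero_share": zero_share, "avg_demand": avg_demand, "trend": trend}

def segment(series, zero_cut=0.5, volume_cut=10.0):
    """Assign a series to a bucket; the thresholds are arbitrary assumptions."""
    features = describe(series)
    if features["zero_share"] >= zero_cut:
        return "intermittent"
    return "fast" if features["avg_demand"] >= volume_cut else "slow"

# Hypothetical (client, warehouse, product) keys with made-up demand.
series_by_key = {
    ("c1", "w1", "p1"): [0, 0, 0, 2, 0, 0, 1, 0],
    ("c1", "w1", "p2"): [20, 25, 22, 30, 28, 35, 31, 40],
    ("c2", "w3", "p9"): [1, 2, 0, 3, 2, 1, 2, 3],
}

segments = {key: segment(values) for key, values in series_by_key.items()}
for key, label in segments.items():
    print(key, label)
```

Each bucket could then get its own model or its own KPI. Whether such features are informative on very sparse series is a separate question.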
I'm afraid clustering time series is even more complicated than forecasting them :). You can barely extract any meaningful features from sparse time series.
Indeed, all metrics based on absolute error (AE) will incentivize under-forecasting on skewed datasets (where the median is lower than the average demand).

That’s why we also look at the bias. 

But even when looking at both, we might still slightly incentivize low forecasts.
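As a sketch of how an absolute-error term and a bias term interact, assume a score of the form (sum of absolute errors + absolute value of the summed error) / total actual demand. This form is an assumption for illustration. The thread only says the metric resembles MAE and includes a bias term, so the organisers' exact formula may differ. The function name and the numbers are hypothetical.

```python
def mae_plus_bias(actuals, forecasts):
    """Assumed score: normalised absolute error plus normalised absolute bias."""
    total = sum(actuals)
    abs_err = sum(abs(f - a) for a, f in zip(actuals, forecasts))
    bias = sum(f - a for a, f in zip(actuals, forecasts))
    return abs_err / total + abs(bias) / total

# Made-up zero-inflated actuals with two high-volume periods.
actuals = [0, 0, 0, 0, 0, 0, 0, 0, 20, 30]
all_zero = [0] * 10   # the median forecast
flat_mean = [5] * 10  # the mean forecast

score_zero = mae_plus_bias(actuals, all_zero)
score_mean = mae_plus_bias(actuals, flat_mean)
print(score_zero, score_mean)
```

Here the all-zero forecast has the lower absolute error (50 vs 80) but scores 2.0 against 1.6 for the flat mean, so the bias term does push back in this toy case. Because the bias is summed over everything, over-forecasts on some items can still cancel under-forecasts on others.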

For the sake of discussion, what would be your perfect KPI?
I tend to favor using RMSSE (or even scale-dependent RMSE) along with BIAS as two simultaneous evaluation terms. Ideally, we would also evaluate at higher levels of the hierarchy and differentiate between slow-moving and fast-moving items.
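For reference, a minimal sketch of RMSSE under one common definition: the forecast MSE divided by the in-sample MSE of a one-step naive forecast, then square-rooted. This may not be exactly the variant meant above, and the numbers are made up.

```python
from math import sqrt

def rmsse(train, actuals, forecasts):
    """Root mean squared scaled error.

    Scale = mean squared one-step naive error on the training history.
    Assumes len(train) >= 2 and a non-constant history (scale > 0).
    """
    scale = sum(
        (train[t] - train[t - 1]) ** 2 for t in range(1, len(train))
    ) / (len(train) - 1)
    mse = sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)
    return sqrt(mse / scale)

# Hypothetical history and a two-period forecast.
train = [0, 2, 0, 4, 0]
actuals = [3, 0]
forecasts = [1, 1]

value = rmsse(train, actuals, forecasts)
print(value)
```

Because the error is squared, large misses on high-volume items weigh more than they do under absolute error, and the scaling keeps slow and fast movers comparable.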
I think the best KPI depends on the characteristics of each Client-Warehouse-Product time series; that's why forecasting is artisanal.
I like the Wasserstein distance for daily demand forecasts that need to sum up fairly well but can tolerate some flexibility on when the demand occurs (stores that are stocking up normally don't care too much if demand was off by a day, but they would care if the forecast missed the demand entirely).
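A small sketch of that intuition: on a daily grid, summing the absolute gaps between cumulative actuals and cumulative forecasts gives the 1-D Wasserstein (earth mover's) distance when both series have the same total. When the totals differ, it is only a related cumulative-error measure. The function name and the numbers are hypothetical.

```python
from itertools import accumulate

def cumulative_gap(actuals, forecasts):
    """Sum of absolute differences between the two cumulative demand curves."""
    cum_actual = accumulate(actuals)
    cum_forecast = accumulate(forecasts)
    return sum(abs(a - f) for a, f in zip(cum_actual, cum_forecast))

def sum_abs_error(actuals, forecasts):
    """Plain day-by-day absolute error, for comparison."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts))

# Made-up daily demand: 5 units sold on day 2.
actual = [0, 5, 0, 0, 0]
one_day_late = [0, 0, 5, 0, 0]  # right quantity, one day off
missed = [0, 0, 0, 0, 0]        # demand missed entirely

late_gap = cumulative_gap(actual, one_day_late)
missed_gap = cumulative_gap(actual, missed)
late_abs = sum_abs_error(actual, one_day_late)
missed_abs = sum_abs_error(actual, missed)
print(late_gap, missed_gap, late_abs, missed_abs)
```

Day-by-day absolute error prefers the all-zero forecast (5 vs 10), while the cumulative measure prefers the forecast that is one day late (5 vs 20).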
Thanks for sharing, was not aware of this measure. 