Siddhartha Venkatayogi

My team’s DataHack 2026 project! Made in 6 hours.


A skip prediction system for the music streaming platform Lyra that applies generative retrieval concepts to session modeling. Tracks are encoded as hierarchical semantic IDs via 3-level k-means clustering on audio features. Sessions are then modeled as token sequences by a decoder-only transformer trained with causal language modeling, and its outputs are ensembled with LightGBM for the final predictions.
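The project code isn't public, so what follows is only a rough NumPy sketch of the tokenization idea. It is not the project's implementation. It assumes nested k-means, where each level clusters the tracks inside its parent cluster, and a branching factor of 8 per level. It also assumes per-level vocabulary offsets, so each level gets its own token range. Random vectors stand in for real audio features, and the helper names (`kmeans`, `semantic_ids`, `session_to_tokens`) are hypothetical.

```python
import numpy as np

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    x = x.astype(float)
    rng = np.random.default_rng(seed)
    k = min(k, len(x))  # a small sub-cluster may have fewer points than k
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every point to every center, shape (n, k)
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return labels

def semantic_ids(features, branching=(8, 8, 8), seed=0):
    """Assumed scheme: give each track a (c0, c1, c2) code by recursively
    clustering inside each parent cluster (hierarchical k-means)."""
    n = len(features)
    ids = np.zeros((n, len(branching)), dtype=int)

    def recurse(idx, level):
        if level == len(branching) or len(idx) == 0:
            return
        labels = kmeans(features[idx], branching[level], seed=seed)
        ids[idx, level] = labels
        for c in np.unique(labels):
            recurse(idx[labels == c], level + 1)

    recurse(np.arange(n), 0)
    return ids

def session_to_tokens(session, ids, branching=(8, 8, 8)):
    """Flatten a session of track indices into one token stream.
    Level l codes are shifted by sum(branching[:l]) so levels don't share tokens."""
    offsets = np.concatenate([[0], np.cumsum(branching)[:-1]])
    return (ids[session] + offsets).reshape(-1)

# Toy usage: random vectors stand in for per-track audio features (assumption)
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
ids = semantic_ids(features)
tokens = session_to_tokens(np.array([3, 17, 42]), ids)
# causal LM training pairs: predict each token from the ones before it
inputs, targets = tokens[:-1], tokens[1:]
```

Two tracks can end up with the same 3-level code. One common fix is to append an extra disambiguating token. The transformer itself and the LightGBM ensemble are left out of this sketch.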


Achieved the best average rank across both tasks (6th in skip prediction and 3rd in CLV score)!


Unfortunately I can’t make the GitHub repo public because it’s forked from MLDS’s private parent repo (starter code containing baseline solutions and data loading).

Feel free to message me if you’d like to see the code!