My implementation of TIGER, from the 2023 paper Recommender Systems with Generative Retrieval (Rajput et al.), trained and evaluated on Amazon Beauty.
| Metric | Mine | Paper |
|---|---|---|
| Recall@5 | 0.0312 | 0.0454 |
| NDCG@5 | 0.0210 | 0.0321 |
| Recall@10 | 0.0486 | 0.0648 |
| NDCG@10 | 0.0265 | 0.0384 |
Invalid-ID rate @10 ≈ 0.0006. Best checkpoint at step 20K (val NDCG@10 = 0.0377).
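As a rough illustration of how an invalid-ID rate like the one above could be computed, here is a minimal sketch. It assumes the rate is the fraction of generated semantic-ID tuples, across the top-K beams of every test user, that do not map to any real item. The function name `invalid_id_rate` and its arguments are hypothetical, and the actual evaluation code may count this differently.

```python
def invalid_id_rate(beams_per_user, valid_ids, k=10):
    """Fraction of top-k generated semantic IDs that match no real item.

    beams_per_user: list (one entry per user) of ranked lists of semantic-ID tuples.
    valid_ids: set of semantic-ID tuples that correspond to items in the catalog.
    Both structures are assumptions made for this sketch.
    """
    total = 0
    invalid = 0
    for beams in beams_per_user:
        for semantic_id in beams[:k]:
            total += 1
            if tuple(semantic_id) not in valid_ids:
                invalid += 1
    # Guard against an empty evaluation set.
    return invalid / total if total else 0.0


if __name__ == "__main__":
    catalog = {(1, 2, 3, 0), (4, 5, 6, 0)}
    beams = [
        [(1, 2, 3, 0), (9, 9, 9, 9)],  # second beam is not a real item
        [(4, 5, 6, 0), (1, 2, 3, 0)],
    ]
    print(invalid_id_rate(beams, catalog, k=10))
```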
NDCG (Normalized Discounted Cumulative Gain) measures ranking quality, rewarding the correct item appearing higher in the top-K list. Hits are discounted by their rank and normalized so a perfect ranking scores 1.0.
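To make the metric definitions concrete, the sketch below computes Recall@K and NDCG@K for the next-item setting, assuming exactly one ground-truth item per user. Under that assumption the ideal DCG is 1, so NDCG@K reduces to `1 / log2(rank + 1)` when the target appears at 1-based position `rank` within the top K, and 0 otherwise. The function names are illustrative and not taken from the repository.

```python
import math


def recall_at_k(ranked_items, target, k):
    """1.0 if the target item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with a single relevant item (ideal DCG is 1)."""
    for position, item in enumerate(ranked_items[:k]):
        if item == target:
            # position is 0-based, so 1-based rank + 1 == position + 2
            return 1.0 / math.log2(position + 2)
    return 0.0


def evaluate(predictions, targets, k):
    """Average Recall@k and NDCG@k over all users."""
    n = len(targets)
    recall = sum(recall_at_k(p, t, k) for p, t in zip(predictions, targets)) / n
    ndcg = sum(ndcg_at_k(p, t, k) for p, t in zip(predictions, targets)) / n
    return recall, ndcg


if __name__ == "__main__":
    predictions = [["a", "b", "c"], ["x", "y", "z"]]
    targets = ["b", "q"]  # first user hit at rank 2, second user missed
    print(evaluate(predictions, targets, k=3))
```

A hit at rank 1 scores 1.0, a hit at rank 2 scores about 0.63, and a miss scores 0, which is why NDCG@K is never larger than Recall@K in the table above.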
Future Work
- Tune hyperparameters or training behavior to match or exceed the original paper's reported metrics. Current test metrics land at roughly 65-75% of the paper's numbers, depending on the metric. I suspect the gap comes from suboptimal RQ-VAE training.
- Add implementations for PLUM and STATIC
References
- Recommender Systems with Generative Retrieval — Rajput et al., NeurIPS 2023.
- Autoregressive Image Generation using Residual Quantization — Lee et al. (RQ-VAE).