Show HN: LLMadness – March Madness Model Evals
4.2 (10,000 reviews)
Freemium · from US$20.00/month
Launched in 2026
About
I wanted to play around with the non-coding agentic capabilities of the top LLMs, so I built a model eval that predicts the March Madness bracket.

After playing around a bit with the format, I went with the following setup:

- 63 single-game predictions rather than a full one-shot bracket
- A maximum of 10 tool calls per game
- Upset-specific instruction in the system prompt
- Exponential scoring by round (1, 2, 4, 8, 16, 32)

There were some interesting learnings:

- Unsurprisingly, most brackets a
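
To make the exponential scoring by round (1, 2, 4, 8, 16, 32) concrete, here is a minimal sketch of how a bracket could be scored under that scheme. It is not the project's actual code. The names `ROUND_POINTS`, `GAMES_PER_ROUND`, and `score_bracket`, and the input format (one list of winners per round), are assumptions made for illustration.

```python
# Hypothetical sketch of exponential bracket scoring; not taken from LLMadness.
# Points per correct pick in each of the six rounds, as listed in the post.
ROUND_POINTS = [1, 2, 4, 8, 16, 32]
# Games per round in a 64-team single-elimination bracket (63 games total).
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]


def score_bracket(predicted, actual):
    """Return the total score for a set of predictions.

    Both arguments are assumed to be lists of six lists, one per round,
    holding the winner of each game in the same order.
    """
    total = 0
    for points, pred_round, actual_round in zip(ROUND_POINTS, predicted, actual):
        # Count the games in this round where the predicted winner was right.
        correct = sum(p == a for p, a in zip(pred_round, actual_round))
        total += points * correct
    return total


# Highest possible score: every game in every round predicted correctly.
MAX_SCORE = sum(p * g for p, g in zip(ROUND_POINTS, GAMES_PER_ROUND))
```

Under these weights each round is worth the same 32 points in total, so one correct champion pick counts as much as a perfect first round.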
Pros
- +Ability to make March Madness predictions
- +LLM evaluation model
- +Option to predict individual games or a one-shot bracket
Cons
- −Limits on the number of tool calls per game
- −Need for upset-specific instructions