Programming · Education · Design & UX

Show HN: LLMadness – March Madness Model Evals

4.2 (10,000 reviews)

Freemium · From US$20.00/month

Launched in 2026

About

I wanted to play around with the non-coding agentic capabilities of the top LLMs, so I built a model eval predicting the March Madness bracket.

After playing around a bit with the format, I went with the following setup:
- 63 single-game predictions vs. full one-shot bracket
- Maxed out at 10 tool calls per game
- Upset-specific instruction in the system prompt
- Exponential scoring by round (1, 2, 4, 8, 16, 32)

There were some interesting learnings:
- Unsurprisingly, most brackets a
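As a rough illustration of the exponential scoring by round mentioned in the setup, here is a minimal sketch. The point values (1, 2, 4, 8, 16, 32) come from the description. The function name `score_bracket`, the input shape, and the per-round game counts are assumptions made for this example (a standard 64-team single-elimination bracket, which matches the 63 games mentioned), not the project's actual code.

```python
# Points awarded per correct pick in each round, as listed in the description.
ROUND_POINTS = [1, 2, 4, 8, 16, 32]

# Games per round in a 64-team single-elimination bracket (assumed format).
# These sum to 63, matching the 63 single-game predictions.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]


def score_bracket(correct_by_round):
    """Return the total score given the number of correct picks in each round.

    `correct_by_round` is a hypothetical input: a list of six integers,
    one per round, from the first round through the championship game.
    """
    if len(correct_by_round) != len(ROUND_POINTS):
        raise ValueError("expected one count per round (six rounds)")
    for correct, games in zip(correct_by_round, GAMES_PER_ROUND):
        if not 0 <= correct <= games:
            raise ValueError("correct picks must be between 0 and games played")
    return sum(c * p for c, p in zip(correct_by_round, ROUND_POINTS))


# A perfect bracket earns 32 points in every round: 6 * 32 = 192.
MAX_SCORE = score_bracket(GAMES_PER_ROUND)

# Example: 24, 10, 4, 2, 1, 0 correct picks across the six rounds.
example = score_bracket([24, 10, 4, 2, 1, 0])  # 24 + 20 + 16 + 16 + 16 + 0 = 92
```

Under this weighting every round is worth the same maximum of 32 points, so a single correct champion pick counts as much as a perfect first round.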

Pros

+Can make March Madness predictions
+Model eval for LLMs
+Option to predict individual games or a one-shot bracket

Cons

−Limits on the number of tool calls per game
−Requires upset-specific instructions

You may also like

Cursor

Claude

ChatGPT