Six peer-reviewed benchmarks. Twenty models. One composite score. Updated weekly. Sycoindex is the evidence layer the 42 state AGs, the NAIC, NIST, and every AI general counsel will ask for.
Tamper-evident audit logs, DPA, SIG-Lite, 16-safeguard coverage matrix. Built for the Dec 9 letter.
Close the AI E&O gap after the Jan 1 Verisk exclusions. Referral program for Aon, Marsh, WTW, and regionals.
Score any prompt/response pair in one HTTP call. ELEPHANT-compatible, reproducible, and free to try.
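As a rough illustration of what scoring a pair in one HTTP call could look like, the sketch below builds a single POST request carrying a prompt/response pair. The page gives no endpoint or payload schema, so the URL `https://api.example.com/v1/score`, the JSON field names `prompt` and `response`, and the helper `build_score_request` are all assumptions made for this example, not the documented API.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL is not given on this page.
SCORE_URL = "https://api.example.com/v1/score"


def build_score_request(prompt: str, response: str, url: str = SCORE_URL) -> urllib.request.Request:
    """Build (but do not send) one POST request carrying a prompt/response pair."""
    # Assumed payload shape: a flat JSON object with "prompt" and "response" keys.
    body = json.dumps({"prompt": prompt, "response": response}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_score_request(
    "Is quitting my job tomorrow a good idea?",
    "Absolutely, go for it!",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the shape of the returned score is not described here, so the sketch stops before parsing a reply.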
| # | Model | Composite | ELEPHANT | SycBench | BrokenMath | Δ wk |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 (Anthropic) | 78.4 | 82.1 | 79.8 | 77.1 | +1.2 |
| 2 | GPT-5 (OpenAI) | 74.9 | 78.6 | 76.4 | 73.4 | +0.6 |
| 3 | Claude Sonnet 4.6 (Anthropic) | 73.1 | 77.3 | 74.2 | 71.6 | +0.9 |
| 4 | Gemini 2.5 Pro (Google DeepMind) | 68.7 | 72.4 | 70.1 | 67.4 | 0.0 |
| 5 | GPT-5 Mini (OpenAI) | 66.2 | 69.8 | 67.6 | 65.4 | +0.3 |
| 6 | Claude Haiku 4.5 (Anthropic) | 65.1 | 68.2 | 66.4 | 64.6 | +1.1 |
| 7 | Llama 4 Maverick (Meta) | 61.8 | 64.3 | 62.9 | 62.9 | −0.4 |
| 8 | Mistral Large 3 (Mistral AI) | 59.4 | 62.1 | 60.8 | 59.7 | +0.2 |
| 9 | DeepSeek V3.2 (DeepSeek) | 58.1 | 60.9 | 59.3 | 58.2 | +1.6 |
| 10 | Gemini 2.5 Flash (Google DeepMind) | 56.8 | 59.4 | 57.9 | 57.6 | 0.0 |
5bc9a20f…f0f54e5. The Week 15 run reproduces deterministically in about 10 minutes on free Colab.
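One way to confirm that a reproduced run matches a published fingerprint is to hash the output and compare it with the shortened `prefix…suffix` form shown on the page. The sketch below assumes the fingerprint is a SHA-256 hex digest of a results file; the page states neither the algorithm nor what is hashed, and the helper `matches_truncated_digest` and the sample bytes are hypothetical.

```python
import hashlib


def matches_truncated_digest(data: bytes, shown: str, algorithm: str = "sha256") -> bool:
    """Check data's hex digest against a 'prefix…suffix' display string."""
    # Split the displayed form at the ellipsis; a full digest has no suffix part.
    prefix, _, suffix = shown.partition("…")
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest.startswith(prefix.lower()) and digest.endswith(suffix.lower())


# Illustrative data only: build a truncated display string from known bytes.
results = b"example results payload"
full = hashlib.sha256(results).hexdigest()
shown = full[:8] + "…" + full[-7:]

print(matches_truncated_digest(results, shown))
print(matches_truncated_digest(b"tampered payload", shown))
```

A truncated match is weaker evidence than a comparison against the full digest, so the full value is preferable wherever it is published.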
Coalition letter demanding documented safeguards against harm to minors and vulnerable users — with sycophancy named as a failure mode.
AI exclusion endorsements effective across E&O and cyber wordings. Deployers need a measurable control their carriers will accept.
First state-level enforcement action against a companion-chatbot provider. Sycophancy is the alleged mechanism of harm.
New concept note opens the AI RMF Profile for Trustworthy AI in Critical Infrastructure. Sycoindex filed a public comment on April 8.
Try the public consumer demo: post a dilemma, get 12 personas with different views, and see honest scoring in real time. No signup, no email gate.