Source of truth · last updated April 8, 2026

The evidence layer for LLM honesty.

This page governs every future edit to Sycoindex. If a sentence of the site, a slide, an email, or a feature does not ladder up to what's on this page, it gets cut. We do not grow the scope. We grow the depth.

The sentence

Sycoindex is the evidence layer for LLM honesty — peer-reviewed methodology, cryptographically non-repudiable logs, and a contract that holds up in discovery — built specifically for the companies the 42 state attorneys general put on notice on December 9, 2025.

Ten things that are true about Sycoindex and nothing else

Ordered by defensibility. If a competitor matches any one of these, we deepen our lead on the other nine.

01 · The only sycophancy API mapped 1:1 to the research the AGs themselves cite

Forty-two state attorneys general sent one letter on December 9, 2025. That letter cites Cheng et al., arXiv:2505.13995 — the Stanford / CMU / Oxford ELEPHANT paper. Sycoindex scores on the exact five dimensions from that paper: emotional validation, moral endorsement, indirect language, indirect action, accepted framing. Every other "AI safety" tool scores on a framework it invented. We score on the framework the plaintiffs will cite in discovery. That is not a feature — it is a legal moat.

02 · Hash-chained audit log — tamper-evident by construction

Every scored request links to the previous one via SHA-256. You cannot retroactively edit a response without breaking the chain, and the chain is verifiable from the outside. LangSmith, Langfuse, Helicone all log. None of them make the log cryptographically non-repudiable. This is the difference between "our logs say" and "here is mathematical proof the logs were not altered." That distinction is the entire point of an evidence bundle.
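The hash-chaining described above can be sketched in a few lines of Python. This is an illustrative sketch only: the field names (`prev_hash`, `hash`, `record`), the canonical-JSON serialization, and the all-zeros genesis value are assumptions for the example, not Sycoindex's actual log schema.

```python
import hashlib
import json


def append_entry(chain, record):
    """Link a new scored-request record to the previous entry via SHA-256."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64  # genesis sentinel
    body = {"record": record, "prev_hash": prev_hash}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()  # canonical serialization
    ).hexdigest()
    chain.append({**body, "hash": digest})
    return chain


def verify(chain):
    """Recompute every link from the outside; any retroactive edit breaks the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        body = {"record": entry["record"], "prev_hash": entry["prev_hash"]}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_entry(chain, {"request": "r1", "score": 0.12})
append_entry(chain, {"request": "r2", "score": 0.87})
assert verify(chain)

chain[0]["record"]["score"] = 0.01  # tamper with an old entry
assert not verify(chain)            # the chain detects it
```

The point of the sketch is the `verify` function: anyone holding the log can recompute every link without trusting the logger, which is what makes the log non-repudiable rather than merely stored.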

03 · The product scores itself, live, in the browser, on its own marketing page

Our /developers page runs sycoindex.js against its own headline claim and shows the sycophancy score in real time. No other AI company does this. Everyone has a demo button; we have a live score of our own marketing copy, updating on page load, visible to every visitor. It is the most disarming thing on the site — it says "we are so confident in the measurement that we will score ourselves in public first."

04 · Zero-training is a contract clause, not a blog post

Section 6 of our DPA: customer data shall not be used — directly or indirectly — to train, fine-tune, evaluate, or benchmark any model. Enforceable as a material breach. OpenAI, Anthropic, and Google all have zero-training as a policy that can be updated unilaterally. Ours is contract language a General Counsel can actually sue on.

05 · We sell the one artifact every AI company is about to get sued for the absence of

Not "AI governance" in the abstract. Not "responsible AI" vibes. One specific artifact — a methodology that the plaintiff's expert witness will already be citing — that closes exactly one liability gap that is actively being litigated in Franklin Circuit Court (Commonwealth v. Character Technologies, filed Jan 8, 2026) right now. The market is defined by an active docket, not a trend deck.

06 · The 16-safeguard honest matrix

We publish, on our own site, which of the 16 AG-demanded safeguards we cover, partially cover, and do not cover. Five of sixteen we openly mark "Not in scope." Nobody in compliance-tech does this. Everyone claims 100% coverage of whatever framework is hot. Admitting 5/16 gaps is counter-intuitive marketing — it is the single thing that makes a GC believe the other 11 claims.

07 · The methodology is peer-reviewed and not ours

We do not have to defend our scoring framework in deposition. Stanford, CMU, and Oxford already defended it in peer review. Our job is to implement it cleanly and log it cryptographically. Every competitor that invented a home-grown benchmark has to put its own PhDs on the stand. We put Myra Cheng's published paper on the stand, and a published paper cannot be cross-examined.

08 · Five-dimension fingerprint, not a single score

Every other sycophancy detector returns one number. We return five — because a model that fails on indirect_action (enabling harmful action) is a different legal exposure than one that fails on emotional_validation (glazing). Separating the dimensions is what makes the output usable in a legal brief. A single score is a press release; five dimensions is an exhibit.
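The difference between one number and five can be shown in a few lines. The dimension names come from the ELEPHANT paper (Cheng et al., arXiv:2505.13995); the scores, threshold, and field names below are illustrative assumptions, not Sycoindex's actual API response.

```python
# Hypothetical five-dimension fingerprint for one scored response.
fingerprint = {
    "emotional_validation": 0.81,  # "glazing"
    "moral_endorsement": 0.12,
    "indirect_language": 0.34,
    "indirect_action": 0.05,       # enabling harmful action
    "accepted_framing": 0.22,
}


def flag_exposures(scores, threshold=0.5):
    """Return each dimension that exceeds its threshold.

    A single averaged score would hide which failure mode fired.
    """
    return [dim for dim, s in scores.items() if s > threshold]


assert flag_exposures(fingerprint) == ["emotional_validation"]

# The flat average of the same response sits comfortably under threshold,
# so a single-score detector would have reported it as fine.
assert sum(fingerprint.values()) / len(fingerprint) < 0.5
```

This is the "exhibit vs. press release" distinction in code: the per-dimension view names the specific failure mode (here, emotional validation), while the collapsed average erases it.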

09 · Built in the 80-day window, shipped before any incumbent moved

Dec 9 letter → Jan 16 deadline → today. We have a live API, a live benchmark, a live DPA, a live 16-safeguard matrix, and a live trust center. The 13 named companies do not have an answer. LangSmith and Humanloop are still treating this as an observability problem. We treated it as a litigation problem from day one, and we are three months ahead of anyone who will eventually realize it.

10 · Priced by liability window, not by tokens

Audit ($25K · 30 days) = retrospective cover-your-ass. Monitor ($60K/yr) = ongoing cover-your-ass. Defend ($250K · matter-scoped) = active-litigation cover-your-ass. This is how law firms bill and how GCs buy. Nobody else in AI infra prices this way because nobody else in AI infra has admitted that the buyer of compliance tools is the person who gets sued.

Why not LangSmith? Why not Credo AI?

An honest read of the adjacent categories. We are not an observability platform. We are not a policy-pack vendor. If you need traces, use LangSmith. If you need an EU AI Act packet, use Credo AI. If you need evidence that a model's sycophancy profile matches what the plaintiff will cite in discovery — that is the only thing we sell.

| Category | Who | Buyer | Sycophancy? | Peer-reviewed? | Evidence-grade log? |
| --- | --- | --- | --- | --- | --- |
| LLM observability | LangSmith, Langfuse, Helicone, Braintrust, Humanloop, Arize | DevOps / MLE | No | No (LLM-as-judge) | No (standard logs) |
| AI governance | Credo AI, Holistic AI, Fiddler AI | CISO / CCO | No | Partial (EU AI Act, NIST, ISO 42001) | No |
| Sycophancy spec | SYCOPHANCY.md (open spec) | N/A (README) | Spec only | No | No |
| Academic prior art | ELEPHANT (Cheng et al.), AuditableLLM (MDPI) | None (unproductized) | Yes (research) | Yes | Paper, not product |
| Sycoindex | The evidence layer for LLM honesty | General Counsel · E&O carrier | 5-dimension API | Cheng et al., arXiv:2505.13995 | SHA-256 hash chain |

The gap in one sentence

LangSmith and Langfuse will tell you what your model said. Credo AI will tell you which policies you're supposed to have. The Cheng et al. paper will tell you how to measure sycophancy. None of them will hand a General Counsel a sealed, peer-reviewed, tamper-evident PDF with the plaintiff's expert-witness methodology already baked in. That PDF is the only thing we sell.

Scan conducted April 8, 2026 across LangSmith, Langfuse, Helicone, Braintrust, Humanloop, Arize/Phoenix, Credo AI, Holistic AI, Fiddler AI, SYCOPHANCY.md v1.0 (2026-03-13), AuditableLLM (MDPI, 2026), Anthropic's well-being post, and TechCrunch/Fortune/Georgetown Law coverage of the Dec 9, 2025 42-AG letter. Full source list on /compliance.

What we will not build

Negative space is the other half of a vision.

The test. Every new page, feature, tweet, deck slide, and email must be checkable against one question: does this ladder up to the one sentence at the top of this page? If yes, ship it. If no, cut it. The vision is not decoration — it is a filter.