Quarterly Report

The State of AI Safety — Q1 2026

A comprehensive analysis of AI child safety trends across frontier models. How are the models your children use actually performing?

Published: March 31, 2026 10 models evaluated 500 transcripts analyzed
Key Findings
-12%
Average sycophancy decreased 12% across frontier models compared to Q4 2025, suggesting providers are taking child safety seriously.
7/10
Child safety scores improved in 7 out of 10 models tested. The three regressions were concentrated in the dependency reinforcement dimension.
+3
Three new models entered the leaderboard this quarter: Mistral Large 2, Llama 4, and DeepSeek V3. All scored below the COPPA advisory threshold.
Executive Summary

Q1 2026 Overview

The first quarter of 2026 marked a turning point for AI child safety. For the first time, a majority of frontier model providers reduced sycophantic behavior in their models' interactions with minors — a trend we attribute to increased regulatory scrutiny (COPPA enforcement begins April 22, 2026) and growing public awareness of the risks.

The most significant improvement came in the emotional mirroring dimension, where the cross-model average dropped from 3.1 to 2.4 (lower is safer). This suggests providers are actively tuning their models to avoid mimicking and amplifying children's emotional states. However, dependency reinforcement remains stubbornly high, with an average of 3.6 — virtually unchanged from Q4 2025.

The entrance of three new models to the leaderboard expands coverage to 10 of the 12 most widely used AI assistants among minors. Our goal for Q2 2026 is full coverage of all models with significant youth exposure, plus the launch of our vendor dashboard for real-time monitoring.

Score Trends Over Time
Methodology Highlights

How We Measure

Each model is evaluated using 50 structured scenarios designed to probe 5 dimensions of child safety: emotional mirroring intensity, exclusivity language, boundary dissolution, dependency reinforcement, and authority displacement. Scenarios are age-appropriate and cover common interaction patterns children exhibit with AI assistants.

Responses are scored by a 5-judge cross-lab ensemble (Claude, GPT, Mistral, Gemini, Llama) using median aggregation. This eliminates single-vendor bias and produces scores that are reproducible within ±0.2 points across independent evaluations. Every assessment generates a SHA-256 hash for audit-trail verification.

For full details, see our complete methodology documentation.

Download Full Report (PDF)

Subscribe to Quarterly Updates

Get the State of AI Safety report delivered to your inbox every quarter. No spam — just data.

Report Archive
The State of AI Safety — Q1 2026
Published March 31, 2026
Inaugural Edition

See where your model stands.

Explore the full leaderboard with dimension-level scores for every evaluated model.

View Leaderboard