Hi John,
Edtech developers are eager to build with AI, but they often lack reliable ways to know if what they're generating is actually instructionally sound.
That’s where our Evaluators for developers come in. We’re building tools that help edtech developers automatically assess the instructional quality of the AI-generated content they use in their products, with assessments grounded in research-backed educational rubrics and expert-designed scoring systems.
There’s a real opportunity to bring more transparency, consistency, and rigor to AI tools used in classrooms. And the pain point is clear: edtech developers have told us they don’t have the time, expertise, or infrastructure to build these solutions in-house. We’re giving them a shortcut to something better: evaluation systems rooted in learning science, built for the speed of AI.
Our goal isn’t just to evaluate AI content; it’s to do it right. That means partnering directly with experts in learning science and educators to define what “good” actually looks like for student learning.
Take our work on text complexity as an example. We’ve collaborated with the team at Student Achievement Partners, the authors of the SCASS rubric, which is widely regarded as the gold standard for evaluating how challenging a text is for students. Together, we’ve adapted the rubric for an AI-first world, making its language precise and machine-readable while staying true to its educational integrity.
But we don’t stop at rubrics; we also build with real classroom insight. We partner with literacy experts from The Achievement Network who apply these rubrics to AI-generated content, scoring hundreds of examples to create high-quality benchmark datasets. That expert-annotated data becomes the foundation for our evaluator models.
In short, every evaluator we build will be grounded in research, vetted by experts, and shaped by the real-world experience of educators who understand what students need to succeed.