Model-based Testing Examples

Top AI models underperform in languages other than English

This illustrates a widespread problem affecting large language models (LLMs): even when an English-language version passes a safety test, it can still hallucinate dangerous misinformation in other ...

AI model enables personalized blood glucose predictions for type one diabetes

Type 1 diabetes (T1D) is an autoimmune condition in which the body's own immune system attacks insulin-producing cells. As a result, patients with T1D must closely monitor their blood glucose (BG) ...

13d

OpenAI's new GPT-5.4 clobbers humans on pro-level work in tests - by 83%

GPT-5.4 is also more reliable, producing 18% fewer errors and 33% fewer false claims than GPT-5.2, according to OpenAI.

Communications of the ACM

Measuring What Matters in Large Language Model Performance

As large language models (LLMs) gain momentum worldwide, there’s a growing need for reliable ways to measure their performance. Benchmarks that evaluate LLM outputs allow developers to track ...

Science News

A precise proton measurement helps put a core theory of physics to the test

For over a decade, confusion over the size of the proton has held scientists back. Disagreeing measurements of the subatomic particle’s radius meant that scientists couldn’t test one of their key ...

Ars Technica

Waymo leverages Genie 3 to create a world model for self-driving cars

Google-spinoff Waymo is in the midst of expanding its self-driving car fleet into new regions. Waymo touts more than 200 million miles of driving that informs how the vehicles navigate roads, but the ...

CBS News

OpenAI says it will start testing ads on ChatGPT in the coming weeks

OpenAI announced Friday that it will begin testing ads on ChatGPT in the coming weeks, opening the door to another potential revenue stream for the AI company in addition to its subscription-based ...

ministryoftesting.com

The future of testing: Autonomous agents, ethical AI, and human oversight

The role of the tester has never been static! From the personal touch of verification to automated regressions, Quality Assurance (QA), and now Quality Engineering, software testing has evolved ...

Business Wire

U.S. Army Selects Striveworks for AI Test and Evaluation

AUSTIN, Texas--(BUSINESS WIRE)--Striveworks, a leading developer of cutting-edge artificial intelligence solutions, has been selected to provide AI test and evaluation services for the U.S. Army under ...
