CES 2026 - OpenAI Makes the Most Accessible Code (for an LLM)

At CES 2026, the GAAD Foundation and ServiceNow unveiled the AI Model Accessibility Checker. OpenAI’s GPT-5.2 secured the top spots for accessible code generation, while Google’s Gemini 3.0 Pro ranked last, revealing a major gap in inclusive software development priorities.

At CES 2026, the GAAD Foundation, in partnership with ServiceNow, unveiled the results of its inaugural AI Model Accessibility Checker (AAC), a new benchmark designed to evaluate the quality and inclusivity of code generated by large language models (LLMs). While OpenAI’s GPT-5.2 series secured the top rankings for producing accessible web code, Google’s Gemini 3.0 Pro finished last among the 36 models tested, highlighting a significant disparity in how major AI providers are prioritizing inclusive software development.

Key Points

  • Top Performer: OpenAI’s GPT-5.2 models claimed the top four spots in the AAC benchmark, demonstrating superior adherence to web accessibility standards.
  • Critical Failure: Despite Google owning the industry-standard Lighthouse testing tool, its Gemini 3.0 Pro model ranked 36th out of 36 models tested.
  • Common Errors: Color contrast violations accounted for 80% to 90% of the accessibility failures detected across all models.
  • Market Reality: Current data shows 94% of web pages and 72% of common mobile app user journeys still fail basic accessibility tests.

Benchmarking AI Code Generation

The AAC initiative marks a shift in how the tech industry evaluates artificial intelligence. Rather than solely measuring speed or conversational accuracy, this benchmark assesses the output code against established web accessibility standards. The initiative is led by the GAAD Foundation, the organization behind Global Accessibility Awareness Day, to encourage foundational model companies to prioritize inclusivity at the code level.

According to the test results released at CES, OpenAI has established a clear lead. Five of the top ten performing models belonged to OpenAI, with their GPT-5.2 series occupying the top four positions. In contrast, Anthropic’s Claude models placed in the middle of the pack, while Google suffered a notable defeat.

"The biggest shocker of all to me was Gemini 3.0 Pro came in dead last—36 out of 36. They have Lighthouse. If they just trained on Lighthouse, they would get a perfect score."
— Joe Devon, Chair of the GAAD Foundation

Devon noted the irony that Google, which develops Lighthouse—a premier developer tool for automated accessibility and SEO testing—failed to integrate those same standards effectively into its flagship AI model’s training data.

Methodology and Technical Findings

The AAC utilizes axe-core, an automated testing engine, to evaluate the generated code. Because automated tools can typically detect only 30% to 50% of accessibility issues, the benchmark focuses heavily on programmatic errors that machine validation can catch reliably.
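Checks like axe-core's form-label rule are exactly the kind of programmatic validation the benchmark relies on. As a rough illustration only (this is not axe-core's implementation, and it is deliberately simplified), a sketch of such a check using Python's standard library:

```python
# Minimal sketch of a "missing label" check in the spirit of automated
# engines like axe-core: flag form controls that have no accessible name
# from a <label for="...">, aria-label, or aria-labelledby attribute.
# Real engines handle many more cases (wrapping labels, title, roles).
from html.parser import HTMLParser

class LabelAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.labeled_ids = set()   # ids referenced by <label for="...">
        self.controls = []         # attribute dicts of labelable controls

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label" and "for" in a:
            self.labeled_ids.add(a["for"])
        elif tag in ("input", "select", "textarea"):
            if a.get("type") in ("hidden", "submit", "button"):
                return  # these types don't require a visible label
            self.controls.append(a)

    def violations(self):
        return [a for a in self.controls
                if a.get("id") not in self.labeled_ids
                and not a.get("aria-label")
                and not a.get("aria-labelledby")]

audit = LabelAudit()
audit.feed("""
<form>
  <label for="email">Email</label>
  <input id="email" type="email">
  <input type="text" name="nickname">
</form>
""")
print(len(audit.violations()))  # → 1 (the unlabeled text input)
```

Checks like this are deterministic, which is why automated tooling catches them reliably even though it misses the larger set of issues that require human judgment.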

The analysis revealed that the vast majority of AI-generated code failures stem from basic design implementation:

  • Color Contrast: Approximately 80% to 90% of all flagged issues related to insufficient contrast between text and background, making content difficult for visually impaired users to read.
  • Missing Labels: A significant number of models failed to generate proper labels for form elements and interactive controls.
  • HTML Structure: The benchmark analyzed over 1,000 HTML pages across 28 categories to see how models handled structural elements.
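The contrast failures that dominate these results are measured against the WCAG 2.x contrast-ratio formula, which is fully mechanical. A minimal Python sketch of that computation (the formula comes from the WCAG specification; the function names here are illustrative):

```python
# WCAG 2.x contrast ratio between two sRGB colors.
def relative_luminance(rgb):
    """Relative luminance per WCAG, for a color as 0-255 ints."""
    def channel(c):
        c = c / 255
        # Linearize the sRGB gamma curve.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Ratio from 1:1 (identical colors) to 21:1 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# WCAG AA requires at least 4.5:1 for normal-size body text.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))      # → 21.0
print(contrast_ratio((119, 119, 119), (255, 255, 255)) >= 4.5)   # → False
```

The second check shows why contrast violations are so common: mid-gray text (#777777) on white looks plausible to a model but comes in just under the 4.5:1 AA threshold.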

Interestingly, the study found a divergence between accessibility and other typographic standards. Models that performed well on accessibility metrics often performed poorly on the correct usage of em-dashes, suggesting that current training data may treat semantic code quality and typographic nuance as separate optimization tracks.

The Business Imperative and Demographic Shifts

Beyond the technical benchmarks, the release of the AAC highlights a stagnant progress curve in digital inclusion. Devon cited the WebAIM Million report, which indicates that the percentage of inaccessible web pages has only improved from 97% to roughly 94% over the last six years. Similarly, a "State of Mobile App Accessibility" report by ArcTouch found that 72% of common user journeys in top mobile apps result in poor or failing experiences.

While AI-generated code boosts development efficiency, failing to prioritize accessibility poses a long-term business risk, particularly given shifting demographics. With the Millennial generation entering their 40s, the prevalence of age-related disabilities, such as vision and hearing loss, is set to rise sharply.

"We have 50% of the population... that's above 40. This is a demographic explosion that's going to happen, and the companies that are not focused on accessibility are going to learn very quickly. It's just like that tipping point where they're going to be like, 'Oh, okay. We actually have to pay attention to this because most of the world has some kind of disability that's age-related.'"
— Joe Devon, Chair of the GAAD Foundation

Devon argues that AI developers must treat accessibility not as a compliance checklist, but as a dataset of "edge cases." In machine learning, solving for edge cases—the diverse range of human abilities—typically results in a more robust and capable model for all users.

Future Developments

The GAAD Foundation intends to evolve the AAC from a static benchmark into an interactive feedback loop. Future versions of the tool are expected to feed failure data back to the models to test whether they can self-correct and generate improved code on subsequent attempts.

As the industry moves toward the 15th anniversary of Global Accessibility Awareness Day this May, the pressure is now on foundational model providers like Google and Anthropic to close the gap with OpenAI. The data suggests that without intentional intervention in training sets, AI risks perpetuating the digital barriers that have plagued the web for decades.
