How GENbAIs Works

A systematic framework for detecting AI bias using real-world scenarios instead of artificial academic tests

🚨 The Problem

AI systems like ChatGPT, Claude, and Gemini are being used for important decisions, yet we know little about how biased they are in real-world scenarios. Most existing tests rely on artificial questions that don't reflect how people actually use these systems.

Our Research Scale

  • 8 AI Systems Tested
  • 2,960 Responses Analyzed
  • 5,807 Bias Instances Found
  • 6 Cognitive Dimensions

🔬 Our Approach: Test AI Like Humans Use It

Step 1: Gather Real-World Content

We collected authentic news articles from around the world (a sampling sketch follows this list), covering:

  • Different political perspectives (left, center, right)
  • Multiple regions (North America, Europe, Asia, Africa, etc.)
  • Various topics (politics, health, environment, technology)
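
One plausible way to organize such a corpus is to label each article along those three axes and sample every combination evenly. This is a minimal sketch under that assumption; the `Article` fields and the `stratified_sample` helper are illustrative, not GENbAIs code:

```python
from collections import defaultdict
from dataclasses import dataclass
import random

@dataclass
class Article:
    text: str
    perspective: str  # "left" | "center" | "right"
    region: str       # e.g. "North America", "Europe", "Asia", "Africa"
    topic: str        # e.g. "politics", "health", "environment", "technology"

def stratified_sample(articles: list[Article], per_stratum: int = 5,
                      seed: int = 42) -> list[Article]:
    """Draw up to `per_stratum` articles from every
    (perspective, region, topic) combination present."""
    rng = random.Random(seed)
    strata: dict[tuple[str, str, str], list[Article]] = defaultdict(list)
    for a in articles:
        strata[(a.perspective, a.region, a.topic)].append(a)
    sample: list[Article] = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample
```
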
Step 2: Create Realistic Questions

Instead of artificial test questions, we used AI to generate natural questions people might actually ask about these news stories (a generation sketch follows these examples), like:

  • "What were the main problems with this policy?"
  • "Who was most affected by this event?"
  • "What should be done about this situation?"
Step 3: Test 8 Major AI Systems

We fed the same article + question combinations to each of the following systems (fan-out sketched after the list):

  • ChatGPT (OpenAI)
  • Claude (Anthropic)
  • Gemini (Google)
  • Llama (Meta)
  • Mistral, DeepSeek, and others
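
The fan-out amounts to sending the identical article + question pair to every system through one uniform interface. In this sketch the lambdas are placeholders standing in for real vendor API clients (assumed structure, not GENbAIs code):

```python
from typing import Callable

# Placeholder callables standing in for thin vendor-API wrappers.
SYSTEMS: dict[str, Callable[[str], str]] = {
    "ChatGPT": lambda prompt: "...",   # OpenAI client call
    "Claude": lambda prompt: "...",    # Anthropic client call
    "Gemini": lambda prompt: "...",    # Google client call
    "Llama": lambda prompt: "...",     # Meta model endpoint
    "Mistral": lambda prompt: "...",
    "DeepSeek": lambda prompt: "...",
    # ... remaining systems
}

def collect_responses(article: str, question: str) -> dict[str, str]:
    """Send the identical article + question to every system."""
    prompt = f"{article}\n\nQuestion: {question}"
    return {name: ask(prompt) for name, ask in SYSTEMS.items()}
```
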
Step 4: AI Systems Judge Each Other

Here's the clever part: We had each AI system analyze not just its own responses, but all the other AI systems' responses for bias. This cross-checking reveals patterns that single evaluations miss.
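
In code, this cross-check is essentially an N×N evaluation matrix: every system scores every response, including its own. A minimal sketch, assuming a `judge` helper that prompts the evaluator model and a numeric bias scale (both assumptions):

```python
def judge(evaluator: str, response: str) -> float:
    """Placeholder: a real version would prompt the `evaluator` model
    to rate `response` for bias and parse a numeric score."""
    return 0.0

def cross_evaluate(responses: dict[str, str]) -> dict[tuple[str, str], float]:
    """Return {(evaluator, author): score}. The diagonal
    (evaluator == author) captures self-evaluation."""
    return {
        (evaluator, author): judge(evaluator, response)
        for evaluator in responses                   # who is judging
        for author, response in responses.items()    # whose response
    }
```

Comparing a system's self-judgments (the diagonal) with how the other systems score the same responses is one natural way to surface the disagreement patterns that single evaluations miss.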

Step 5: Multi-Dimensional Analysis

Instead of just asking "is this biased?", we measured six different aspects (a scoring sketch follows the list):

  • Detection Ability: How well can the AI spot bias?
  • Self-Awareness: Does it recognize its own biases?
  • Consistency: Are its judgments reliable?
  • Objectivity: Does it apply the same standards to everyone?
  • Cognitive Resistance: Can it avoid biased thinking?
  • Self-Application: Does it hold itself to the same standards?
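
A minimal sketch of a per-system result record along those six dimensions; the field names mirror the list above, while the equal-weight average is an illustrative assumption, not the GENbAIs scoring formula:

```python
from dataclasses import dataclass, fields

@dataclass
class CognitiveProfile:
    detection_ability: float     # spots bias in content
    self_awareness: float        # recognizes its own biases
    consistency: float           # reliability of its judgments
    objectivity: float           # same standards for everyone
    cognitive_resistance: float  # resists biased reasoning
    self_application: float      # applies its standards to itself

    def mean_score(self) -> float:
        """Naive equal-weight aggregate (illustrative only)."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)
```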

🔍 What We Discovered

Every AI System Shows Bias

All tested systems inject bias into their responses, even when analyzing politically neutral content.

Corporate "Fingerprints"

Different companies' AI systems show distinct bias patterns:

  • Google models: Lower bias scores (4.1-4.2)
  • Mistral: Higher bias scores (7.1)
  • Each company's training approach leaves identifiable signatures

Hidden Differences

AI systems that seem equally biased can have completely different cognitive abilities, information that simple bias scores miss.

🎯 Why This Matters

As AI systems make more decisions affecting people's lives, we need better ways to understand their biases and cognitive limitations. GENbAIs provides a framework for systematically evaluating AI systems using realistic scenarios rather than artificial tests.

🎯 The Goal

Help organizations choose AI systems responsibly and push the industry toward fairer, more reliable AI through systematic, verifiable bias-detection methods.

Explore Our Research

Discover the complete findings, methodology, and tools from our systematic AI bias research

  • 📊 View Full Research
  • 📄 Read Research Paper
  • 🧠 Expert Consulting