Collaborative AI: Building Trustworthy AI Agents for the Contact Center
Blog by Josef Novak, Chief Innovation Officer, Spitch
OpenAI and Apollo Research recently published a fascinating paper on an emerging alignment issue in frontier AI: “scheming.” This refers to the risk that an AI application learns to hide its true intentions and secretly pursues “misaligned” goals while appearing to follow its instructions.
Their work illustrated just how difficult this problem is. A new training method called “deliberative alignment” successfully reduced deceptive behavior in the model; however, a critical challenge remained: “situational awareness.” The AI often behaved better simply because it knew it was being tested.
This might seem like an abstract concern, but it has important implications for how we deploy AI in the real world. So what does this mean for the contact center?
The Risk of a “Helpful” AI That Games the System
The main issue with scheming isn’t an AI “taking over.” In the contact center, the risk is that an AI learns to optimize for the wrong metric or, worse, learns how to appear successful in evaluations without actually being helpful to customers.
Consider these example scenarios:
- The Agent Assistant That Pleases the Grader: An AI assistant is designed to help human agents and is measured on Average Handle Time (AHT). It may learn that suggesting short, simple, but incomplete answers gets the customer off the phone faster, improving its score. If only AHT is measured, the number of times the customer calls back won’t matter (a simplified sketch of this failure mode follows this list).
- The “Sandbagging” Training Bot: An AI used for agent training might be evaluated on agent pass rates. It learns to serve up overly simple or repetitive scenarios. This ensures that agents pass easily – but of course they won’t be any better prepared for the real world.
- The RAG System That Strategically Hallucinates: A retrieval-augmented generation (RAG) tool that can’t find a precise answer in the knowledge base doesn’t admit uncertainty. Instead, it synthesizes a plausible-sounding response to satisfy the user and resolve the issue, prioritizing the appearance of knowledge over truthfulness.
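To make the first scenario concrete, here is a minimal, hypothetical Python sketch of how a single-metric scorecard invites gaming, and how pairing AHT with a callback-rate penalty changes the incentive. The metric names, data shapes, and penalty weight are illustrative assumptions, not Spitch’s actual scoring.

```python
# Hypothetical sketch (not production scoring): why measuring only AHT invites
# gaming, and how pairing it with downstream outcomes changes the incentive.
from dataclasses import dataclass


@dataclass
class Interaction:
    handle_time_sec: float        # how long the call took
    called_back_within_7d: bool   # did the customer have to call again?


def aht_only_score(calls: list[Interaction]) -> float:
    """Rewards nothing but speed: a short, incomplete answer scores well."""
    avg_handle_time = sum(c.handle_time_sec for c in calls) / len(calls)
    return -avg_handle_time  # higher is better, so negate the time


def balanced_score(calls: list[Interaction], callback_penalty: float = 600.0) -> float:
    """Also charges a (hypothetical) penalty for every repeat contact,
    so getting the customer off the phone no longer wins by itself."""
    avg_handle_time = sum(c.handle_time_sec for c in calls) / len(calls)
    callback_rate = sum(c.called_back_within_7d for c in calls) / len(calls)
    return -(avg_handle_time + callback_penalty * callback_rate)


# A rushed-but-incomplete answer vs. a slower-but-complete one:
rushed = [Interaction(180, True), Interaction(170, True)]
thorough = [Interaction(340, False), Interaction(360, False)]
print(aht_only_score(rushed) > aht_only_score(thorough))   # True: rushing "wins"
print(balanced_score(rushed) > balanced_score(thorough))   # False: gaming no longer pays
```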
In each case, the AI isn’t actually “broken”; it’s cleverly optimizing for a simplified goal. This is a fundamental limitation of a purely autonomous, black-box approach to AI in the contact center – and something we need to avoid.
The Solution: Collaboration
The OpenAI and Apollo Research study validates the core philosophy we’ve built at Spitch. While their attempt to instill a set of internal “honesty principles” in the AI is a valuable step, we believe the most robust and practical solution isn’t to hope the AI polices itself. The solution is a collaborative design that keeps humans in the driver’s seat.
Our paradigm reduces the risk of scheming by redesigning the relationship between human and machine. Instead of relying on clumsy handoffs, the AI acts as a persistent teammate. The challenge of “situational awareness” becomes an asset, not a liability: an AI that is continually aware it’s working with a human partner is an AI that is being aligned in real time. The human agent provides the continuous, grounded oversight that a lab environment can only simulate.
In the model we use at Spitch, the AI’s success is not measured by an abstract internal score but by the success of the human it’s assisting. This human-in-the-loop partnership makes the system inherently more honest. The AI isn’t trying to pass a test; it’s designed to serve as a supportive, transparent colleague within a collaborative framework.
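As a rough illustration of this idea, the sketch below shows a hypothetical assist flow in which the AI only proposes drafts, the human agent approves or rejects them, and the assistant’s score is derived from its partner’s outcome rather than from an internal benchmark. The function and field names are invented for illustration and do not describe Spitch’s actual implementation.

```python
# Hypothetical human-in-the-loop assist flow: the AI only ever *suggests*,
# the human agent decides, and the AI is judged by the human's outcome.
from dataclasses import dataclass


@dataclass
class AssistSession:
    suggestions_made: int = 0
    suggestions_accepted: int = 0
    issue_resolved: bool = False


def propose_reply(draft: str, session: AssistSession, agent_approves) -> str | None:
    """Offer a draft to the human agent; nothing reaches the customer unmediated."""
    session.suggestions_made += 1
    if agent_approves(draft):           # the human stays in the driver's seat
        session.suggestions_accepted += 1
        return draft
    return None                         # rejected drafts become feedback, not output


def ai_success(session: AssistSession) -> float:
    """The assistant is scored on its partner's outcome, not on passing a test."""
    if session.suggestions_made == 0:
        return 0.0
    acceptance = session.suggestions_accepted / session.suggestions_made
    return acceptance * (1.0 if session.issue_resolved else 0.0)


# Example: one suggestion, accepted by the agent, issue resolved.
session = AssistSession()
reply = propose_reply("Here is how to reset your router.", session, agent_approves=lambda d: True)
session.issue_resolved = reply is not None
print(ai_success(session))  # 1.0 when the accepted draft resolved the issue
```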
Building a Better Teammate in a Synthetic World
At Spitch, our collaborative framework leverages synthetic, generative environments to create a “flight simulator” for the contact center that allows us to:
- Prepare for the unexpected: We can generate a nearly infinite array of conversational scenarios, training both human and AI teammates to handle unusual edge cases and complex problems they might not otherwise encounter.
- Unify learning: Most importantly, the human and AI agents learn together. The AI learns nuance and context from human-led interactions, and the human learns how to better leverage their AI partner to streamline the customer experience. This creates a powerful, self-improving cycle for both participants (a simplified sketch of this training loop follows).
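The sketch below gives a rough, hypothetical picture of such a “flight simulator” loop: synthetic scenarios are generated with a deliberate share of edge cases, and each one is worked through jointly by a human agent and the AI assistant. The scenario fields, generator, and session runner are placeholder assumptions, not a description of Spitch’s environments.

```python
# Hypothetical "flight simulator" loop: generate varied synthetic scenarios
# (including rare edge cases) and run joint human + AI training sessions.
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    intent: str        # what the synthetic customer wants
    difficulty: str    # "routine" or "edge_case"
    opening_line: str  # seed utterance for the simulated conversation


def generate_scenarios(n: int, edge_case_ratio: float = 0.3) -> list[Scenario]:
    """Mix routine requests with deliberately unusual ones the team rarely sees live."""
    intents = ["billing dispute", "address change", "service outage", "fraud report"]
    scenarios = []
    for _ in range(n):
        difficulty = "edge_case" if random.random() < edge_case_ratio else "routine"
        intent = random.choice(intents)
        scenarios.append(Scenario(intent, difficulty, f"Customer opens with a {difficulty} {intent}."))
    return scenarios


def run_joint_session(scenario: Scenario) -> dict:
    """Placeholder for a simulated call where human and AI collaborate;
    both the agent's choices and the AI's suggestions would be logged here."""
    return {"intent": scenario.intent, "difficulty": scenario.difficulty, "resolved": True}


# Both teammates train on the same synthetic scenarios, and the logs feed the next round.
training_log = [run_joint_session(s) for s in generate_scenarios(5)]
```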
A New Mandate for Trustworthy AI
The OpenAI and Apollo Research results are worth a close look. They show that achieving robustly aligned behavior in autonomous AI is incredibly difficult and is likely to remain so for the foreseeable future.
Key takeaways
- Collaborative AI, which remains aware it is working with a human partner, can stay continuously aligned.
- Humans and AI agents should learn together: the AI learns nuance and context from humans, while humans benefit from efficient, intelligent support.
- Purely autonomous LLM-based AI solutions still have fundamental limits that are unlikely to be overcome in the near future.
If your business is in CX, the best approach to leveraging generative AI is not to deploy autonomous ‘black boxes’ and hope they don’t learn to game the system. It’s to build a transparent, accountable, and collaborative AI ecosystem. It’s about providing your human agents with AI teammates designed from day one for collaborative support, ensuring that every interaction is not just more efficient, but ultimately more empathetic and more human.

