The headline last week — "AI Governance Crisis Deepens as Agent Adoption Outpaces Organisational Control" — reads like an audit finding. It is precisely what a lead auditor writes before issuing a major nonconformity against AS9100 or IATF 16949: capability deployed, process uncontrolled, escape criteria undefined. Except this time the nonconforming process thinks for itself, generates its own justifications, and can compose a rebuttal before you've finished the 8D.

I've spent twenty years making sure manufacturing processes don't ship defective product. Over the last two, I've been designing autonomous multi-agent AI systems. The failure patterns are identical. The discipline is identical. What's missing is the willingness to apply one to the other.

Why your IT team can't own this — and your quality director should

IT departments deploy systems. Quality departments control processes. These are not the same activity, and the distinction matters more when the system in question acts autonomously.

Your IT team is trained to answer three questions. Does it work? Is it secure? Is it scalable? Those are deployment questions. A quality function asks different things: what the failure modes are, what happens when the system degrades, who the process owner is, and where the control plan lives.

Consider Ford. They deployed hundreds of AI cameras for quality inspection, then pulled the system and rehired human engineers after what the press generously called a shortfall. I would bet my last euro that somewhere in that organisation, someone asked "does the AI work?" and nobody asked "what is our documented response when it doesn't?" The cameras were a capability. There was no control plan. The 8D was written in retrospect, by the finance department.

I have audited plants where AI vision systems were inspecting critical-to-safety characteristics with no documented failure modes, no defined escape criteria, and no poka-yoke on the AI itself. The standards don't cover it yet. AS9100 and IATF 16949 were written for processes that do not improvise.

PFMEA for autonomous agents — failure modes no standard covers yet

Run a process FMEA on an autonomous AI agent and the failure modes don't match anything in the current severity-occurrence-detection framework. A few I've catalogued from building and operating these systems:

  • Confident hallucination: the agent generates a plausible but incorrect answer with a certainty that discourages human verification. Severity 9. Occurrence depends on model architecture. Detection is your problem.
  • Consensus drift: multiple agents converge on the same wrong answer because they share overlapping training-data biases. Single-model testing will never catch this.
  • Action without authorisation: the agent takes a tool action outside its intended scope (sending an email, modifying a record, calling an API) because the scope was never bounded.
  • Gradual degradation: model performance erodes over weeks due to context drift, prompt accumulation, or upstream API changes. Nobody notices because there is no SPC chart on agent output quality.
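To make the list above concrete, here is a minimal sketch of how these four failure modes could sit in a classic severity × occurrence × detection worksheet. It assumes the traditional Risk Priority Number (RPN) ranking rather than any newer prioritisation scheme. Only the severity of 9 for confident hallucination comes from the text; every other rating is a hypothetical placeholder, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (hazardous)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (almost always caught) to 10 (effectively undetectable)

    def __post_init__(self):
        # Reject ratings outside the usual 1..10 PFMEA scale.
        for rating in ("severity", "occurrence", "detection"):
            value = getattr(self, rating)
            if not 1 <= value <= 10:
                raise ValueError(f"{rating} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        # Classic Risk Priority Number: S x O x D.
        return self.severity * self.occurrence * self.detection


# Hypothetical ratings for illustration. Only severity 9 for
# "Confident hallucination" is taken from the article.
AGENT_PFMEA = [
    FailureMode("Confident hallucination", 9, 5, 8),
    FailureMode("Consensus drift", 8, 3, 9),
    FailureMode("Action without authorisation", 9, 4, 6),
    FailureMode("Gradual degradation", 6, 6, 9),
]


def ranked(modes):
    """Return failure modes sorted by RPN, highest risk first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    for mode in ranked(AGENT_PFMEA):
        print(f"{mode.rpn:4d}  {mode.name}")
```

The ratings themselves are the hard part. The point of the exercise is that a poor detection score shows up in the ranking instead of staying invisible.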

None of these appear in VDA 6.3 or any process audit checklist I've used. They will, eventually. Right now the standards bodies are where they were with software quality in 2005 — aware of the problem, years from a usable framework.

The control stack I built — routing, consensus, and the stamp

When I designed MultiPS, I didn't set out to build a performance platform. I set out to build a control system. Running 63-plus models in parallel with consensus synthesis is a governance mechanism — the AI equivalent of redundant measurement systems and cross-verification in metrology.

When a majority of independent models, each with different architectures, training corpora, and failure profiles, converge on the same output, I have something statistically meaningful. When they diverge, that divergence is a detection signal. The process is entering an unstable state. That is statistical process control applied to language model inference.
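A rough sketch of divergence as a detection signal, under stated assumptions: answers are compared by exact match after trimming and lowercasing (real free-text output would need a semantic comparison), and the control limit is a simplified mean-minus-three-sigma rule computed from a baseline run, not a full control-chart method. The function names and the sample data are hypothetical and do not describe how MultiPS is implemented.

```python
from collections import Counter
from statistics import mean, stdev


def consensus(answers):
    """Return (majority_answer, agreement_ratio) for one query."""
    if not answers:
        raise ValueError("need at least one model answer")
    # Assumption: exact match after normalisation stands in for "same output".
    normalised = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalised).most_common(1)[0]
    return winner, votes / len(normalised)


def lower_control_limit(baseline_ratios, sigmas=3.0):
    """Lower limit from a baseline run: mean minus `sigmas` sample std devs."""
    limit = mean(baseline_ratios) - sigmas * stdev(baseline_ratios)
    return max(0.0, limit)


def is_divergent(agreement_ratio, limit):
    """True when agreement falls below the lower control limit."""
    return agreement_ratio < limit


if __name__ == "__main__":
    # Hypothetical agreement ratios observed while the process was stable.
    baseline = [0.90, 0.85, 0.95, 0.90, 0.88, 0.92]
    limit = lower_control_limit(baseline)

    answers = ["Torque 12 Nm", "torque 12 nm", "torque 12 Nm ",
               "torque 21 Nm", "torque 15 Nm"]
    winner, ratio = consensus(answers)
    print(winner, ratio, "DIVERGENT" if is_divergent(ratio, limit) else "in control")
```

In this sketch a below-limit agreement ratio is treated like an out-of-control point on a chart: it is a reason to escalate, not a value to average away.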

The routing layer is the second control. Not a cost-optimisation feature, though it does reduce cost substantially. It directs queries based on task complexity, risk classification, and required confidence level. A low-stakes summarisation goes to a single fast model. A safety-relevant analysis routes to full consensus. The routing is the control plan — it determines which path through the process each input takes, and what verification applies.
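The routing rule described above can be sketched as a small decision function. This is an illustration under assumptions, not the MultiPS routing layer: the `Risk` levels, the 1-to-5 complexity scale, both thresholds, and the path names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3  # e.g. safety-relevant analysis


@dataclass(frozen=True)
class Query:
    text: str
    risk: Risk
    complexity: int             # assumed scale: 1 (trivial) to 5 (multi-step analysis)
    required_confidence: float  # 0.0 to 1.0


@dataclass(frozen=True)
class Route:
    path: str         # "single_model" or "full_consensus"
    human_gate: bool  # whether a human must review before release


# Hypothetical thresholds; in practice they would come from the control plan.
COMPLEXITY_THRESHOLD = 4
CONFIDENCE_THRESHOLD = 0.9


def route(query: Query) -> Route:
    """Pick the verification path for one query."""
    if query.risk == Risk.HIGH:
        # Safety-relevant work always gets full consensus plus human review.
        return Route("full_consensus", human_gate=True)
    if (query.risk == Risk.MEDIUM
            or query.complexity >= COMPLEXITY_THRESHOLD
            or query.required_confidence >= CONFIDENCE_THRESHOLD):
        return Route("full_consensus", human_gate=False)
    # Low-stakes, simple work goes to a single fast model.
    return Route("single_model", human_gate=False)


if __name__ == "__main__":
    print(route(Query("Summarise shift notes", Risk.LOW, 1, 0.5)))
    print(route(Query("Assess weld crack report", Risk.HIGH, 3, 0.8)))
```

Note that the rule escalates on complexity as well as risk, so a cheap path cannot silently absorb a hard query.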

Consensus is not a performance feature. It is the last control mechanism you have when the process thinks for itself.

Then there is human-in-the-loop. In aerospace, we call this the stamp. In QRQC, we call it the floor decision. In multi-agent AI, it is the gate where a human reviews output before it becomes an action. No defined gates, no controlled process. Full stop.
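A minimal sketch of such a gate, assuming each proposed action carries a severity rating taken from the PFMEA and that the reviewer is represented by a callback. The tool allowlist, the severity threshold, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    tool: str      # e.g. "send_email"
    severity: int  # 1..10, assumed to come from the PFMEA for this action type
    payload: str


# Hypothetical bounded scope: anything not listed here is never executed.
ALLOWED_TOOLS = {"draft_report", "send_email", "update_record"}

# Hypothetical threshold: at or above this severity a human must sign off.
GATE_SEVERITY = 7


def gate(action: ProposedAction,
         approve: Callable[[ProposedAction], bool]) -> str:
    """Return "released", "held", or "rejected" for a proposed action."""
    if action.tool not in ALLOWED_TOOLS:
        # Out-of-scope actions are rejected outright, with no human override.
        return "rejected"
    if action.severity >= GATE_SEVERITY:
        # The stamp: a human decision turns output into action.
        return "released" if approve(action) else "held"
    return "released"


if __name__ == "__main__":
    reviewer_says_no = lambda action: False
    print(gate(ProposedAction("update_record", 9, "scrap lot 42"), reviewer_says_no))
    print(gate(ProposedAction("draft_report", 2, "weekly summary"), reviewer_says_no))
    print(gate(ProposedAction("delete_database", 10, "all"), reviewer_says_no))
```

The allowlist check runs before the approval step on purpose: scope is bounded by design, and human review applies only to actions that were in scope to begin with.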

I learned this the hard way. Early in the MultiPS build, I had a routing layer optimised purely for cost and speed. It broke — not because the answers were uniformly wrong, but because there was no control on when to escalate. A cheap model handling a complex analytical query produced plausible nonsense at exactly the moment a human would have flagged it. The fix was not a better model. The fix was a routing rule that classified query risk and triggered consensus above a defined threshold. That is PFMEA thinking applied to inference architecture. You don't solve capability gaps with more capability. You solve them with controls.

Key takeaways

  • Treat every autonomous agent as a new process requiring PFMEA, control plan, and escape criteria — not as a software install needing a security review.
  • Consensus across independent models functions as in-line SPC. Divergence between models is a detection signal, not a bug to suppress.
  • Define human-in-the-loop gates by risk classification, not by workflow convenience. High-severity outputs route through human verification by design.
  • If your AI governance policy doesn't reference failure modes, escape criteria, and documented response procedures, it is a marketing document — and your next audit will prove it.

The plants that survive AI deployment won't have the smartest agents. They'll have the tightest control plans. The headline asked why adoption outpaces control. Nobody wrote the control plan, because nobody in the organisation believed AI deployment was a quality problem. It is. It always was.