I have stood in front of a lot of whiteboards with plant managers who had just spent six figures on a vision system. The pitch was always the same: one camera, one neural net, fewer inspectors, faster line. Six months later I would be back in the same room, looking at the same whiteboard, and nobody could explain why good parts were getting scrapped while suspect parts kept walking out the door.

The answer was in the news, twice, in the same week.

Two headlines, same week, opposite outcomes

Ford rehired roughly 350 inspectors and technicians it had replaced with an AI quality-control system. The system could not hold the line in production. Same week, Sandia National Laboratories went live with an AI-assisted inspection workflow for ceramic components. Working. Deployed. Quiet about it.

Read those side by side and most of the industry concludes: Ford bought a bad model, Sandia bought a good one. Wrong read. The contrast is architectural, not algorithmic.

In my world — Airbus manufacturing engineering under EASA oversight — if a false acceptance walks out the door, nobody asks what your validation F1 score was. They ask for the evidence trail proving a nonconforming part could not reach a customer undetected. That regulatory frame forces a specific discipline: you design inspection as a system, not as a benchmark. Sandia designed a system. Ford bolted on a model.

What Sandia actually built — a workflow, not an oracle

Sandia's deployment is described as AI-assisted inspection. That word matters. Assisted. Not autonomous. Not replacement. They built a workflow where the AI sits inside a human-in-the-loop process — flagging, prioritising, triaging — not adjudicating alone.

This is the architecture most manufacturers skip because it sounds unimpressive in a board presentation. They want the AI to make the call so they can remove headcount. Sandia built it to support the call so they can remove ambiguity. The difference is the entire game.

An AI that surfaces the ambiguous, routes the clear-cut, and escalates edge cases to a trained inspector is a tool. An AI that makes final accept–reject decisions on a regulated part with no override path is a liability waiting for an audit. In aerospace we run PFMEA on every inspection step for exactly this reason — you design the process assuming every node can fail, and you engineer a control for each failure mode. Sandia applied that logic to AI. Ford applied hope.
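To make the tool-versus-liability distinction concrete, here is a minimal triage sketch. It is not Sandia's implementation, which is not described in that detail. The defect score, the two thresholds, and the names `Route`, `triage`, and `final_decision` are all hypothetical, chosen for illustration.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    AUTO_ACCEPT = "auto_accept"      # clear-cut conforming part
    AUTO_REJECT = "auto_reject"      # clear-cut nonconforming part
    HUMAN_REVIEW = "human_review"    # ambiguous: a trained inspector decides


def triage(defect_score: float,
           accept_below: float = 0.10,
           reject_above: float = 0.90) -> Route:
    """Route a part from a defect score in [0, 1].

    The score and both thresholds are assumptions for this sketch; real
    limits would come out of the control plan, not out of the code.
    """
    if not 0.0 <= defect_score <= 1.0:
        # A score outside the expected range (including NaN) is itself
        # an edge case, so it goes to a person.
        return Route.HUMAN_REVIEW
    if defect_score < accept_below:
        return Route.AUTO_ACCEPT
    if defect_score > reject_above:
        return Route.AUTO_REJECT
    return Route.HUMAN_REVIEW


def final_decision(route: Route, inspector_override: Optional[Route] = None) -> Route:
    """The override path: an inspector's call always wins over the model's."""
    return inspector_override if inspector_override is not None else route
```

The point of the sketch is the middle band and the override: the model never gets the last word on a part it is unsure about, and a person can overrule it even when it is sure.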

The single-model trap

I designed and built MultiPS — a platform that runs 63 AI models in parallel and synthesises their outputs through consensus. Not for vanity. Single-model systems are brittle by construction. A single model has one worldview, one set of training biases, one failure surface. When it is wrong, it is wrong with complete confidence. It does not hesitate. It does not flag its own uncertainty. It gives you a number and moves on to the next part.

I have watched this failure mode on the floor. A vision model trained on parts under controlled lighting performs beautifully in the lab. On the shop floor, with vibration, dust accumulation on the lens, and a raw-material lot from a different supplier, it starts making decisions that a first-week apprentice would question. But the model does not question itself. It has no mechanism for doubt.

A single model does not know when it is wrong. A consensus of models shows you exactly where the disagreement lives — and disagreement is the most valuable signal in automated inspection.

That disagreement is what triggers human review. When models agree, you proceed with confidence. When they split, you escalate. When they all panic, you stop the line. Three states, each with a designed response. A single-model deployment gives you two states, yes or no, and no signal about when the answer is wrong.
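The three states can be sketched in a few lines. This is an illustration, not the MultiPS implementation: the `Vote` structure, the 80% agreement threshold, and the reading of "all panic" as "every model reports low confidence" are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Action(Enum):
    PROCEED = "proceed"        # models agree: act on the majority verdict
    ESCALATE = "escalate"      # models split: send to a human inspector
    STOP_LINE = "stop_line"    # every model is unsure: halt and investigate


@dataclass(frozen=True)
class Vote:
    model_id: str
    reject: bool         # True if this model calls the part nonconforming
    confidence: float    # self-reported confidence in [0, 1] (assumed available)


def consensus_action(votes: Sequence[Vote],
                     min_agreement: float = 0.8,
                     panic_confidence: float = 0.5) -> Action:
    """Map a set of model votes onto one of three designed responses."""
    if not votes:
        # No opinions at all is treated as a system fault, not as a pass.
        return Action.STOP_LINE
    if all(v.confidence < panic_confidence for v in votes):
        return Action.STOP_LINE
    rejects = sum(v.reject for v in votes)
    majority = max(rejects, len(votes) - rejects)
    if majority / len(votes) >= min_agreement:
        return Action.PROCEED
    return Action.ESCALATE
```

Note that `PROCEED` means "act on the agreed verdict", which may be a reject. The stop-line check runs first on purpose: five unsure models that happen to agree should not count as agreement.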

The cost structure looks deceptive on the capital request. One model, one camera, one integration — the vendor quote reads like a bargain. Then the bill arrives: rework, escapes, customer escalations, and eventually the rehiring of inspectors you should never have displaced. I have seen rework costs from a single AI-blind deployment wipe out a year of projected labour savings inside one quarter. Ford's rollback is not a PR problem. It is a systems-engineering invoice.

Before sign-off

The question is not "how accurate is the model." Validation accuracy is a lab number. The question is: what happens when the model is wrong in production, and how does the system detect it?

If the answer involves a dashboard turning red, you do not have a system. You have an alarm with no response plan. If the answer involves consensus disagreement, human escalation paths, defined fallback modes, and an audit trail that an EASA or IATF auditor can follow — then you have architecture.
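One way to picture an audit trail someone can actually follow is an append-only log in which each record carries a hash of the one before it, so a later edit to any record is detectable. The sketch below uses only the Python standard library. The field names and the functions `append_audit_record` and `verify_chain` are hypothetical, and nothing here is a statement of what EASA or IATF require.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(record_body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record (hash field excluded)."""
    payload = json.dumps(record_body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_audit_record(log: list, part_id: str, route: str,
                        model_votes: dict, reviewer: str = None) -> dict:
    """Append one inspection decision to the log, chained to the previous record."""
    record = {
        "part_id": part_id,
        "route": route,                  # e.g. "proceed", "escalate", "stop_line"
        "model_votes": model_votes,      # what each model said, kept verbatim
        "reviewer": reviewer,            # who made the call if it was escalated
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    record["hash"] = _digest(record)
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Return True only if no record has been altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash or _digest(body) != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True
```

A red dashboard tells you something went wrong. A log like this tells you which part, which models disagreed, who made the call, and whether anyone touched the record afterwards.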

Key takeaways

  • Assisted beats autonomous. AI that triages and flags inside a human-in-the-loop workflow holds up in production where AI that makes final accept–reject calls with no override does not. Sandia's deployment illustrates this; the principle maps directly to AS9100 and IATF 16949 control-plan logic.
  • Single-model deployments fail silently. One model has no internal mechanism for doubt. When conditions drift from training data, it keeps producing confident wrong answers — and your dashboard stays green until a customer finds the defect.
  • Consensus creates detectable failure. Running multiple models in parallel converts silent errors into visible disagreement, giving you a trigger for human review before parts ship. This is the architectural difference between a tool and a liability.
  • Integration strategy, not model selection, determines outcome. The 8D investigation into a failed AI deployment almost never identifies the algorithm as root cause. It identifies the absence of a system around it.

AI inspection does not fail at the algorithm. It fails at the architecture — at the whiteboard decision, made months before any hardware arrives, to treat a model as a system rather than as a component inside one. Sandia drew the right diagram. Ford drew the wrong one. The models were beside the point.