When the Nidec ransomware story broke, every CISO reached for their incident response binder. Every quality director I know reached for something else — a phone, to call their plant managers and ask a question nobody in the press was asking: what happens to our traceability when the OT network comes back up? The headlines covered data exfiltration, negotiation timelines, stock price impact. What they missed is that a ransomware shutdown in a manufacturing environment is not primarily a cybersecurity event. It is a quality event with a delayed fuse. The breach is the IT story. The restart is where parts go wrong.
What actually dies when OT goes dark
Most incident response plans treat operational technology the way they treat a file server: isolate, wipe, restore from backup, bring online. That framing collapses the moment you are responsible for a line that torque-bolts aerospace assemblies or stamps tolerance-critical components. I hold a CEH certification and 30-plus security credentials, and I spent years on the manufacturing side as a process operational manager at Airbus. The gap between how security teams and quality teams understand a "successful recovery" is wider than most executives realise.
When OT goes dark, three things break that no backup restores cleanly.
- Process parameters drift. PLCs, SCADA tags, and machine programs don't just "come back." Restoring them from backup restores the configuration, not the physical state of the process. If a line was mid-cycle when the freeze hit, if a heat-treat furnace cooled unevenly during the outage, if a coating line's viscosity drifted because temperature control was lost for six hours, none of those events is in any log. The log is gone. You are shipping blind.
- Calibration chains break. Every gauge, torque wrench, and CMM in a modern plant carries a calibration chain tying it back to a national standard through scheduled verifications. When the maintenance management system is encrypted, you lose the calibration status of every instrument on the floor. Some expired during the outage. Some were "borrowed" by a shift that could not access the locked-out system and needed to keep running.
- Electronic traveller records become untrusted. Serial-level genealogy, material lot-to-part mapping, torque-and-time stamps — all of it lives in MES and QMS databases. After ransomware, even if the data is recovered, its integrity is in question. Were records modified during the dwell time before encryption? Were there gaps in the audit trail? You cannot assume the data is clean. You must prove it.
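One small piece of "proving it" can be sketched in code. The example below is a minimal illustration, not a validated integrity check: it scans recovered record timestamps for holes longer than an expected interval. The function name `find_gaps`, the five-minute threshold, and the sample timestamps are all assumptions made for illustration. It only catches missing records; it says nothing about records that were altered in place.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_interval):
    """Return (start, end) pairs where consecutive records are
    further apart than max_interval."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_interval:
            gaps.append((earlier, later))
    return gaps


# Hypothetical recovered torque-record timestamps for one station.
records = [
    datetime(2024, 1, 9, 6, 0),
    datetime(2024, 1, 9, 6, 1),
    datetime(2024, 1, 9, 6, 2),
    datetime(2024, 1, 9, 9, 30),
    datetime(2024, 1, 9, 9, 31),
]

# The 5-minute threshold is an assumed cycle-time bound, not a standard value.
gaps = find_gaps(records, max_interval=timedelta(minutes=5))
for start, end in gaps:
    print(f"No records between {start} and {end}")
```

Every gap this surfaces is a window where serial-level genealogy cannot be taken on trust and the affected parts need another form of evidence.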
I built a greenfield QA/QC department from zero for over 900 employees at SNOP. I know what it costs in engineering hours and scrap to validate a manufacturing system when you have no trusted baseline, no inherited procedures, no "we've always done it this way." Post-ransomware revalidation looks almost identical. You are starting from zero, whether you planned to or not.
The restart is the real quality event
Here is an uncomfortable observation. The hardest part of a ransomware incident in a manufacturing plant is not the breach. It is the first shift after restoration. That is when a process engineer has to answer a question most plants have never formalised: is this line still making conforming parts?
Re-validating a line after an OT incident is harder than initial validation, because you are fighting two adversaries simultaneously — unknown process drift during the outage, and compromised data integrity in the systems that would normally tell you whether drift occurred. Your PFMEA assumed stable conditions. Your control plan assumed trustworthy measurements. Your SPC charts assumed continuous, unbroken data collection. All three assumptions just failed.
Almost no manufacturer I have audited or consulted for has a written procedure for this. I have reviewed incident response plans across automotive and aerospace suppliers — they have an IT chapter, a legal chapter, a communications chapter. What they do not have is a quality chapter. No protocol for quarantining parts produced during the dwell period. No re-validation requirement per critical process. No defined count of consecutive conforming parts before line release. No clarity on who signs the restart — the plant quality manager, the customer's SQE, or the authority having jurisdiction.
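To make the "consecutive conforming parts" idea concrete, here is a minimal sketch of a release gate for a single characteristic. The function name `line_release_ready`, the spec limits, and the required count of five are hypothetical; the real count belongs in your restart procedure and, where applicable, in agreement with your customer.

```python
def line_release_ready(measurements, lsl, usl, required_consecutive):
    """Return True if the most recent `required_consecutive` parts
    all fall within the specification limits [lsl, usl]."""
    if required_consecutive <= 0:
        raise ValueError("required_consecutive must be positive")
    if len(measurements) < required_consecutive:
        # Not enough parts measured since restart to make a decision.
        return False
    recent = measurements[-required_consecutive:]
    return all(lsl <= m <= usl for m in recent)


# Hypothetical post-restart torque readings in Nm, spec 48.0 to 52.0.
readings = [50.1, 53.2, 49.8, 50.0, 50.3, 49.9, 50.2]

# The second part (53.2) is out of spec, so the count restarts after it.
ready = line_release_ready(readings, lsl=48.0, usl=52.0, required_consecutive=5)
print("Release" if ready else "Hold")
```

A gate like this only answers the technical half of the question. Who is authorised to act on a "Release" result is still a decision the procedure has to name.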
A clean EASA audit means nothing if the calibration records that proved it were encrypted last Tuesday.
Your PFMEA needs a ransomware failure mode
The standard IATF 16949 and AS9100 toolset was not built for cyber-physical failure modes. PFMEA templates give you columns for failure cause, failure effect, severity, occurrence, and detection. They assume mechanical, thermal, or human causes. They do not have a row for "SCADA historian unavailable for 11 hours during a ransomware dwell period, SPC continuity lost, process capability index uncomputable."
That row needs to exist. At Airbus, where I serve as Head of Manufacturing Engineering Technical Authority for North America, we treat cyber-resilience as a quality system property — not an IT add-on. If IT owns it, the recovery target is "systems are back online." If quality owns it jointly, the recovery target is "we can prove every part shipped since restoration conforms to specification." Different finish lines. Only one of them protects your customer and your certification.
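As a rough illustration of what the missing PFMEA row could look like as data, the sketch below models one failure mode with the classic 1-to-10 severity, occurrence, and detection scales and the traditional risk priority number (their product). The class name, the process step, and the scores are placeholders I made up for the example, not ratings from any real PFMEA. If your organisation follows the AIAG and VDA handbook, you would likely rank by Action Priority rather than RPN.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    process_step: str
    failure_mode: str
    effect: str
    cause: str
    severity: int    # 1 (negligible) to 10 (most severe)
    occurrence: int  # 1 (remote) to 10 (very frequent)
    detection: int   # 1 (almost certain to detect) to 10 (undetectable)

    def __post_init__(self):
        # Reject scores outside the conventional 1-10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self):
        """Traditional risk priority number: S x O x D."""
        return self.severity * self.occurrence * self.detection


# Scores and process step below are illustrative placeholders.
historian_loss = FailureMode(
    process_step="Torque station",
    failure_mode="SCADA historian unavailable for 11 hours",
    effect="SPC continuity lost, process capability index uncomputable",
    cause="Ransomware dwell period",
    severity=8,
    occurrence=3,
    detection=7,
)
print(historian_loss.rpn)
```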
Building this into the quality system means your control plan carries an alternate verification path for every CTQ characteristic — a paper traveller, a manual gauge, a backup capability study. Your calibration management needs offline redundancy. Your QRQC process needs a cyber-incident trigger that treats OT data loss as a containment event, not an IT ticket. And your 8D response to a ransomware restart should be written before the ransomware arrives, not improvised in a crisis room at 2 a.m.
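The backup capability study mentioned above does not need the SPC system to be online. As a minimal sketch, assuming manual gauge readings typed in from a paper traveller, the textbook Cpk formula can be computed with nothing but the Python standard library. The readings and spec limits are invented, and five values are far too few for a real study. This version uses the overall sample standard deviation, whereas many SPC packages estimate sigma from within-subgroup variation, so treat the result as an approximation.

```python
import statistics


def cpk(values, lsl, usl):
    """Process capability index: min(USL - mean, mean - LSL) / (3 * sigma),
    using the overall sample standard deviation as the sigma estimate."""
    if len(values) < 2:
        raise ValueError("need at least two readings")
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)
    if sigma == 0:
        raise ValueError("zero variation; Cpk is undefined")
    return min(usl - mean, mean - lsl) / (3 * sigma)


# Hypothetical manual torque readings in Nm, spec 48.0 to 52.0.
manual_readings = [50.0, 50.2, 49.8, 50.1, 49.9]
result = cpk(manual_readings, lsl=48.0, usl=52.0)
print(f"Cpk = {result:.2f}")
```

The point of keeping something this simple on hand is that it still works when the historian does not.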
Key takeaways
- Quarantine everything produced during the dwell window. If you cannot prove continuous process control, you cannot ship. Treat the gap as a containment event, not a data gap.
- Write a quality restart procedure before you need it. Define which processes require re-validation, how many consecutive conforming parts clear a line, and who holds restart authority.
- Add cyber failure modes to your PFMEA and control plan. OT unavailability, historian corruption, and calibration-system loss are process failure modes — model them with severity, occurrence, and detection scores like any other.
- Give quality a seat at the IR table from day one. If the first time quality learns about an OT incident is after systems are restored, you are already behind on traceability and already at risk of shipping unverified parts.
If your incident response plan has an IT chapter, a legal chapter and a comms chapter but no quality chapter, you are planning for the wrong recovery. You are planning to bring systems back online. You should be planning to bring trust back online — every parameter, every measurement, every traveller, every part that leaves your dock after the all-clear sounds. The breach makes the news. The recall ruins the company. The distance between the two is a document most manufacturers have not yet written.