“When you’re fighting fires in maintenance,” says reliability expert Chris Napier, “you have to understand you may be the one holding the matches.”
It may sound like a funny thing to say about a profession built around fixing things, but Chris Napier has worked with enough maintenance teams across a wide range of industrial facilities to know what he’s talking about.
The fires that consume most maintenance teams (unplanned failures, emergency shutdowns and last-minute scrambles) are largely self-generated. They result from deferred decisions, missing information and habits that feel normal until you see what normal could look like.
The previous piece in this series made the case for multi-signal diagnostics: moving from single-signal monitoring to a fuller picture turns alerts into actual diagnoses. This one picks up where that one left off.
A good diagnosis is only as valuable as the repair it enables. Between a confirmed fault and a completed fix lies a set of decisions (work order quality, parts staging, scheduling windows, technician preparation, post-repair validation, etc.) that determine whether the diagnostic investment pays off.
This is the last mile of reliability. It’s where many programs quietly stall.
The work order that doesn’t work
Ask Chris what a bad work order looks like, and he sums it up in three words: “machines not running.” That’s the whole description. Your technician shows up, figures it out, fixes it, closes the order with “problem fixed” and moves on.
“But you don’t actually know what happened,” says Chris. “All you have is quantitative data: downtime recorded, parts scanned out. You may have the man-hours, but you don’t have any details of the maintenance performed: that qualitative data needed to determine the root cause or prevent that event from happening again. You’re just blind to what happened, and I’ve seen that a hundred million times.”
A good work order, by contrast, gives your technician a description of the problem, detailed step-by-step job plans, a complete parts list, and explicit requirements for what to record upon close. It’s a communication tool. When it fails at that job, the whole chain breaks.
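As a rough illustration of that difference, the sketch below models a work order as a small Python record and lists what is missing before it reaches a technician. The field names, the `missing_fields` helper, the five-word threshold and the example assets are all hypothetical, chosen for this example rather than taken from any particular maintenance system.

```python
from dataclasses import dataclass, field

# Arbitrary threshold assumed for this sketch: a usable problem
# description should say more than "machines not running".
MIN_DESCRIPTION_WORDS = 5


@dataclass
class WorkOrder:
    """Hypothetical work order record; field names are illustrative only."""
    asset: str
    problem_description: str
    job_plan_steps: list[str] = field(default_factory=list)
    parts_list: list[str] = field(default_factory=list)
    close_out_requirements: list[str] = field(default_factory=list)


def missing_fields(order: WorkOrder) -> list[str]:
    """Return the names of the elements this work order still lacks."""
    checks = {
        "problem_description":
            len(order.problem_description.split()) >= MIN_DESCRIPTION_WORDS,
        "job_plan_steps": bool(order.job_plan_steps),
        "parts_list": bool(order.parts_list),
        "close_out_requirements": bool(order.close_out_requirements),
    }
    return [name for name, ok in checks.items() if not ok]


# Made-up examples: one bare-bones order, one fully planned order.
bad = WorkOrder(asset="Extruder 2", problem_description="machines not running")
good = WorkOrder(
    asset="Extruder 2",
    problem_description="Gearbox output bearing running hot with rising vibration",
    job_plan_steps=["Lock out and tag out", "Replace output bearing", "Realign coupling"],
    parts_list=["Output bearing", "Shaft seal"],
    close_out_requirements=["Condition of removed bearing", "Suspected cause"],
)

print(missing_fields(bad))
print(missing_fields(good))
```

The point of a check like this is timing: gaps get caught while the planner can still fix them, not when the technician is already standing at the machine.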
The planning gap nobody talks about
Even well-written work orders can get stuck if the scheduling conversation never happens. Chris knows where that breakdown occurs: “The planners for maintenance and reliability don’t talk to the planners for production scheduling.” It’s not as if they are enemies; they just live in separate worlds. And that separation has a cost.
“Maintenance needs to work around the production schedule,” he says. “But first, we have to find out what it is. That comes from communication.” The production schedule exists because of customer orders. Maintenance teams that don’t understand that context push for access at the wrong times and lose credibility when they do. Teams that do understand it can make the case for a window and make it stick.
Don’t let the knowledge walk out the door
The maintenance skills gap is real and worsening as experienced workers retire. Chris’s practical answer cuts to the chase: “Before your senior millwrights retire, document what they do in certain situations and make it the SOP. Get that information down on paper so it can be passed on to the next generation. If they retire and take that information with them, it’s lost.”
The training framework he recommends builds on failure mode and effects analysis: essentially, a structured inventory of everything that can go wrong. “Identify all those failure modes and train your mechanics on them so they know what to expect. Put them in the best possible position to correct what could or will happen.”
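One common way to turn a failure mode inventory into a training order is to rank each mode by its risk priority number (RPN): the product of severity, occurrence and detection scores, each typically rated from 1 to 10. Chris doesn’t specify a scoring method, so treat the sketch below as one assumed approach. The failure modes and scores are made-up examples, not data from any facility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a simplified, hypothetical FMEA worksheet."""
    component: str
    mode: str
    severity: int    # 1 (minor) to 10 (hazardous)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (easily detected) to 10 (almost undetectable)

    @property
    def rpn(self) -> int:
        # Risk priority number: severity x occurrence x detection.
        return self.severity * self.occurrence * self.detection


def training_priorities(modes: list[FailureMode]) -> list[FailureMode]:
    """Order failure modes so the highest-risk ones are trained first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Illustrative inventory; every score here is invented for the example.
inventory = [
    FailureMode("gearbox", "bearing wear", severity=7, occurrence=5, detection=4),
    FailureMode("motor", "winding insulation breakdown", severity=8, occurrence=3, detection=6),
    FailureMode("conveyor", "belt misalignment", severity=4, occurrence=6, detection=2),
]

ranked = training_priorities(inventory)
for fm in ranked:
    print(f"{fm.rpn:>4}  {fm.component}: {fm.mode}")
```

With these invented scores, the motor fault ranks first even though it occurs least often, because it is both severe and hard to detect. That is the kind of failure mechanics are least likely to have seen before it happens to them.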
The step that gets skipped
Among all the places the last mile breaks down, post-repair validation is the most consistent culprit. Chris describes the typical scenario: “A mechanic goes to a machine, fixes it, it works and he walks away. He’s not there with the operator to confirm the repair was correct, and nothing gets recorded.”
The consequences compound over time. Your reliability engineer can’t use data that doesn’t exist. Root cause analysis becomes guesswork. The feedback loop that should improve the diagnostic model goes silent. “Once that data is lost, you can’t go back,” Chris says. “Both the quantitative and qualitative data are valuable. We miss one part of it far too often.”
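To make that concrete, a close-out step could refuse to close a work order until both kinds of data are present. The sketch below is a hypothetical check, not a description of any specific system: the `CloseOut` fields, the `can_close` function and the sample records are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CloseOut:
    """Hypothetical close-out record for a completed repair."""
    downtime_minutes: float   # quantitative
    labor_hours: float        # quantitative
    parts_used: list[str]     # quantitative
    work_performed: str       # qualitative: what the technician actually did
    suspected_cause: str      # qualitative: input for root cause analysis
    operator_confirmed: bool  # repair verified together with the operator


def can_close(record: CloseOut) -> tuple[bool, list[str]]:
    """Return (ok, problems); ok is True only when nothing is missing."""
    problems = []
    summary = record.work_performed.strip().lower()
    if not summary or summary == "problem fixed":
        problems.append("describe the work performed")
    if not record.suspected_cause.strip():
        problems.append("record the suspected cause")
    if not record.operator_confirmed:
        problems.append("confirm the repair with the operator")
    return (not problems, problems)


# Made-up records: the walk-away repair versus a validated one.
walk_away = CloseOut(95, 2.5, ["bearing"], "problem fixed", "", False)
validated = CloseOut(
    95, 2.5, ["bearing"],
    "Replaced output bearing and realigned coupling",
    "Lubrication starvation from a blocked grease line",
    True,
)

print(can_close(walk_away))
print(can_close(validated))
```

Note that the walk-away record already holds all the quantitative data. It fails only on the qualitative fields and the operator sign-off, which is exactly the half that goes missing.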
From 100% reactive to fully in the green
Here’s what it looks like when a team gets all of it right.
A major US plastics recycling facility knew the advantages of predictive maintenance: cost savings, fewer headaches, and stability. What they were living with was the opposite: 100% reactive, running machines to failure, and absorbing the consequences. Missed customer orders. Unplanned downtime. Overtime to make up for it. And a revolving door of staff, because a reactive environment is a tough place to work.
Sensors were already deployed. Alerts weren’t being acted on. The gap wasn’t technology. It was culture.
Then they decided to change. They started acting on alerts, planning and scheduling maintenance instead of reacting to breakdowns, and building communication between teams that hadn’t been talking.
In under two years, every machine they monitor was running in the green.
“They’ve had a total culture change driven around this,” says Chris Napier. “I know it can be done.”
What’s coming
Help is closer than most teams realize. Chris describes a near future in which AI agents handle tasks that often fall through the cracks: closing work orders, capturing technician observations, and surfacing the right information at the right moment. He’s particularly enthusiastic about voice-to-work-order functionality, which lets a technician record what they did in the field before the details fade. “Instead of sitting down in front of a computer after a long day, trying to remember what happened, he can just talk to his app,” Chris says. The data is captured. The feedback loop closes.
Which brings us back to the matches. The goal of a mature reliability program isn’t to get better at fighting fires. It’s to stop starting them. That shift runs through every step of the last mile: the work order that actually communicates, the repair completed within a real window, the technician who knows what they’re walking into, and the validation that confirms it worked.
Once all those steps work together, you’re no longer fighting fires, because you’ve stopped starting them. “It’s about changing the way we operate,” says Chris. “Put reliability first in everything you do.”
Ready to close the loop from diagnosis to done? See how AI in predictive maintenance works.