Many industrial accidents are investigated thoroughly.
Reports are written.
Root causes are identified.
Recommendations are issued.
Yet similar incidents continue to occur.
Why? Because the real gap is not in analysis after failure.
It is in anticipation before failure.
🔍 A Recent Reminder
A major incident at a thermal power plant involved a boiler explosion caused by the rupture of a high-pressure steam ring header tube, leading to multiple fatalities and severe injuries.
The investigation highlighted a cascade of issues:
- Lube oil leakage in the primary air (PA) fan system
- Restart of equipment without resolving the underlying fault
- Sensor / pressure signal inconsistencies
- Failure of the hot air gate to close during a mill trip
- Sudden furnace pressure surge beyond safe limits
This was not a single-point failure.
It was a system failure.
❗ The Real Question
After such incidents, we often ask:
“What went wrong?”
But the more powerful question is:
“Where did we fail to anticipate this?”
⚙️ This is Where FMEA Comes In
Failure Mode & Effects Analysis (FMEA) is not just a documentation exercise.
It is a structured way to ask:
- What can fail?
- How can it fail?
- What would be the impact?
- How likely is it?
- How do we detect or prevent it?
If applied rigorously, FMEA forces teams to visualize failure before it happens.
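In a classic FMEA worksheet, the questions "How likely is it?" and "How do we detect it?" are combined with the severity of the impact. A widely used scoring convention is the Risk Priority Number: Severity (S), Occurrence (O) and Detection (D) are each rated on a 1 to 10 scale and multiplied, so RPN = S × O × D. The Python sketch below assumes that convention. The failure modes come from the incident described above, but every rating is a hypothetical number chosen for illustration. None of them is a finding from the investigation.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of an FMEA worksheet (ratings assumed on a 1-10 scale)."""
    item: str
    mode: str
    effect: str
    severity: int    # How bad is the effect? (10 = catastrophic)
    occurrence: int  # How likely is it? (10 = almost certain)
    detection: int   # How hard is it to detect before harm? (10 = undetectable)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number = Severity x Occurrence x Detection."""
        return self.severity * self.occurrence * self.detection


def prioritise(rows: list[FailureMode]) -> list[FailureMode]:
    """Rank failure modes so the highest-RPN ones are addressed first."""
    return sorted(rows, key=lambda r: r.rpn, reverse=True)


# Hypothetical ratings for illustration only; not taken from the investigation.
worksheet = [
    FailureMode("PA fan lube oil system", "Leakage in discharge line",
                "Pressure drop, unstable fan operation", 7, 5, 6),
    FailureMode("Hot air gate", "Fails to close on mill trip",
                "Air-fuel imbalance, furnace pressure surge", 10, 3, 8),
    FailureMode("Furnace pressure transmitter", "Faulty or misleading signal",
                "Incorrect operator decisions", 8, 4, 8),
]

if __name__ == "__main__":
    for row in prioritise(worksheet):
        print(f"RPN {row.rpn:4d}  {row.item}: {row.mode}")
```

With these made-up ratings, the ranking comes out as the transmitter (256), then the hot air gate (240), then the PA fan (210). The severity-10 gate failure does not top the list. This is a known weakness of a pure RPN product, and it is one reason the 2019 AIAG & VDA FMEA Handbook replaced RPN with Action Priority tables. Many teams also treat any high-severity failure mode as a priority regardless of its RPN.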
🔗 Mapping the Incident Through an FMEA Lens
Let’s look at some key elements from the incident:
1. Lube Oil Leakage in PA Fan System
- Failure Mode: Leakage in discharge line
- Effect: Pressure drop, instability in fan operation
- Missed Opportunity: Preventive detection + shutdown interlock
2. Restart Without Root Cause Resolution
- Failure Mode: System restart in a degraded condition
- Effect: Escalation under higher load
- Missed Opportunity: FMEA-based restart permissive controls
3. Sensor / Pressure Mismatch
- Failure Mode: Faulty or misleading signals
- Effect: Incorrect decision-making
- Missed Opportunity: Redundancy / validation logic
4. Failure of Hot Air Gate Closure
- Failure Mode: Interlock failure during mill trip
- Effect: Air–fuel imbalance → pressure surge
- Missed Opportunity: Automatic isolation logic (poka-yoke)
5. Sudden Furnace Pressure Surge
- Failure Mode: Uncontrolled pressure rise
- Effect: Structural rupture and explosion
- Missed Opportunity: Early warning + escalation thresholds
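Several of the missed opportunities above are control-logic ideas. One is a restart permissive that refuses a restart while a fault is still open. Another is redundancy plus validation for pressure signals. A third is automatic isolation of the hot air gate on a mill trip. The Python sketch below shows the general shape of that logic under stated assumptions. The signal names, the 2-out-of-3 median voting scheme, the 20 mmWC deviation limit and the 10 s gate-closure timeout are all hypothetical. Real protection logic lives in the plant's DCS and burner management system and is engineered to the applicable codes.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical limits, for illustration only.
DEVIATION_LIMIT_MMWC = 20.0  # max spread allowed between one transmitter and the voted value
MIN_AGREEING_SENSORS = 2     # 2-out-of-3 voting needs at least two transmitters that agree
GATE_CLOSE_TIMEOUT_S = 10.0  # time allowed for the hot air gate to prove closed after a trip


def vote_2oo3(readings: list[float]) -> tuple[float, list[int]]:
    """Median-select three redundant readings and return the voted value plus
    the indices of transmitters that deviate from it by more than the limit."""
    if len(readings) != 3:
        raise ValueError("expected exactly three redundant readings")
    voted = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - voted) > DEVIATION_LIMIT_MMWC]
    return voted, suspects


@dataclass
class PlantState:
    open_fault_count: int           # faults logged but not yet closed by root-cause review
    lube_oil_pressure_ok: bool
    pressure_readings: list[float]  # three redundant furnace pressure transmitters


def restart_permitted(state: PlantState) -> tuple[bool, list[str]]:
    """Restart permissive: all conditions must hold; every blocking reason is reported."""
    reasons: list[str] = []
    if state.open_fault_count > 0:
        reasons.append("unresolved fault(s) on record")
    if not state.lube_oil_pressure_ok:
        reasons.append("lube oil pressure not healthy")
    _, suspects = vote_2oo3(state.pressure_readings)
    if len(state.pressure_readings) - len(suspects) < MIN_AGREEING_SENSORS:
        reasons.append("pressure signals inconsistent")
    return not reasons, reasons


def hot_air_gate_action(mill_tripped: bool, gate_closed: bool, seconds_since_trip: float) -> str:
    """Automatic isolation on mill trip, escalating if the gate does not prove closed in time."""
    if not mill_tripped:
        return "NORMAL"
    if gate_closed:
        return "ISOLATED"
    if seconds_since_trip <= GATE_CLOSE_TIMEOUT_S:
        return "CLOSE_GATE"  # command closure without waiting for the operator
    return "ESCALATE"        # gate failed to close in time: alarm and hand off to the next protective layer
```

A design choice worth noting: each function reports why it blocks or escalates, not just that it does. A permissive that only says "no" invites operators to bypass it, which is the kind of behaviour behind a restart without root-cause resolution.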
🔄 The Pattern is Clear
None of these failures were unimaginable.
They were unanticipated in a structured way.
That is exactly what FMEA is meant to prevent.
⚠️ Why Traditional Approaches Fall Short
In many plants:
- FMEA exists, but is not actively used
- It is created during commissioning, then not updated
- It is disconnected from real-time operations
- Lessons learned from incidents are not fed back
So the system remains:
Compliant — but not preventive
🔁 What Effective FMEA Should Look Like
A strong FMEA system should:
- Be continuously updated with field learnings
- Be linked to control logic and interlocks
- Drive preventive actions, not just documentation
- Be integrated with operational decision-making
- Enable early detection of abnormal patterns
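"Early detection of abnormal patterns" can be as simple as watching the rate of change as well as the absolute value. A fast-rising signal then gets flagged before it crosses a limit. The Python sketch below applies tiered escalation (warning, alarm, trip) to furnace pressure readings. The thresholds, the rate-of-rise limit and the one-step escalation rule are hypothetical values chosen for illustration. The sketch also watches only positive pressure excursions.

```python
from enum import Enum


class Level(Enum):
    NORMAL = 0
    WARNING = 1
    ALARM = 2
    TRIP = 3


# Hypothetical limits in mmWC, for illustration only; real values come from the
# boiler OEM and the plant's protection philosophy.
WARNING_MMWC = 100.0
ALARM_MMWC = 150.0
TRIP_MMWC = 250.0
MAX_RISE_MMWC_PER_S = 30.0


def classify(samples: list[float], dt_s: float = 1.0) -> Level:
    """Classify the latest furnace pressure sample by level and rate of rise.

    `samples` are evenly spaced readings, oldest first; `dt_s` is the interval.
    """
    if not samples:
        raise ValueError("need at least one sample")
    latest = samples[-1]
    rise_rate = (samples[-1] - samples[-2]) / dt_s if len(samples) > 1 else 0.0

    # Absolute-value tiers.
    if latest >= TRIP_MMWC:
        level = Level.TRIP
    elif latest >= ALARM_MMWC:
        level = Level.ALARM
    elif latest >= WARNING_MMWC:
        level = Level.WARNING
    else:
        level = Level.NORMAL

    # A fast rise is an early-warning pattern even while the value is still low:
    # bump the level one step, but never up to TRIP on rate alone.
    if rise_rate > MAX_RISE_MMWC_PER_S and level.value < Level.ALARM.value:
        level = Level(level.value + 1)
    return level
```

In this sketch, a reading of 60 mmWC that jumped from 20 mmWC a second earlier is raised from NORMAL to WARNING. A steady 60 mmWC stays NORMAL. Only the absolute trip limit can produce TRIP.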
🧠 The Real Shift
We need to move from:
Post-incident diagnosis
to
Pre-incident anticipation
From:
“Why did it fail?”
to
“How could this fail — and how do we prevent it?”
🔐 Final Thought
In high-risk environments like thermal power plants:
Failures are rarely caused by a single event.
They are caused by a chain of small, unmanaged risks.
FMEA exists to break that chain — before it becomes irreversible.
Because in such systems:
Prevention is not just a quality goal.
It is a safety necessity.