|
|---|
|
|---|
| Root Cause Analysis |
|
|
|
|
Intermittents make us crazy. The sparser the intermittent, the crazier it makes us. The sparsest possible intermittent is the event, which happens exactly once. You can't solve events, or extremely sparse intermittents, with the Universal Troubleshooting Process or any of its contemporaries. Even generic problem solving methodologies do poorly against extremely sparse intermittents.
When safety concerns permit, we often ignore sparse intermittents. How many invoices come back "could not reproduce problem". But in an electric power station, to name one example, future occurrence must be prevented. And that means finding the root cause. Root Cause Analysis is a troubleshooting process optimized specifically for sparse intermittents. Max Ammerman, formerly of Florida Power & Light, literally "wrote the book on the subject". His book is called "The Root Cause Analysis Handbook". This 135 page book is certainly not simple, but it's understandable by a person who solves problems for a living, especially if he's studied any other problem solving methodology. In the introduction, the book describes Root Cause Analysis as using the following process:
In fact, every chapter offers hints, tips, guidelines and pitfalls. He even tells you exactly how to interview a person, how to avoid "leading" him, and what to do if the person appears to be hiding something (a likelihood when interviewing someone about a serious messup). It's the little "how to" explanations that make this book so useful. Task analysis it the act of analyzing the task related to the event. Task analysis reveals how the task should be done, as a baseline to evaluate what actually happened. Change analysis is the comparison of the activities during or leading up to the event, contrasted with those same activities done successfully. The trick here is to ask questions leading to distinctions. These distinctions prove useful in later steps. Control barrier analysis is the study of failed control barriers. A control barrier is anything, be it physical, policy, or anything else, that prevents problems. Either the existing control barriers are insufficient and need augmentation, or a control barrier failed to prevent the event. In the latter case, evaluate what caused the control barrier to fail. It's absolutely vital you identify all control barriers. Event and causal factor charting is the creation of a flow chart detailing all the activities before, during, and after the event, as well as conditions, changes, control barriers. The idea is to trace the cause and effect back to the various causes, so that in a later step a root cause can be correctly identified. This step is complex, so read chapter 5 carefully. We all think we know how to conduct interviews. Chapter 6 goes over the process with a fine tooth comb. Many times during the chapter, I thought to myself "that's a good point -- I didn't think of that". The final work product of the interview is an interview sheet and an observation sheet. These are used to reveal and to substantiate hypotheses necessary to finish the Root Cause Analysis. Determining the root cause is where the rubber meets the road. A wrong root cause will doubtlessly create a flawed solution, at best a "coathanger" solution fixing a symptom but creating or allowing side effects. At worst it will simply not work, or make things worse. In this respect Root Cause Analysis is identical to the Universal Troubleshooting Process. Chapter 7 discusses how to put together all the info gleaned earlier to correctly deduce the root cause. Developing corrective actions requires knowledge of the root cause, failed barriers, distinctions between functional and dysfunctional tasks, economics, company policies, government regulations, and a host of other information. In the terminology of generic problem solving, this step analyzes the solved state. The final work product of the Root Cause Analysis is the report. It must give conclusions and recommendations, and must be presented correctly. Root Cause Analysis is vital for sparse intermittents, and for safety critical situations where letting the problem recur or reproducing the problem is out of the question. It's an excellent tool for the diagnosis of failed processes, policies, even human interactions. When you read this book, be aware that it mentions PIC without ever defining it. PIC stands for Problem Identification & Correction, Florida Power & Light's internal name for the Root Cause Analysis process. If you solve problems, you need to know Root Cause Analysis. Max Ammerman's book is called "The Root Cause Analysis Handbook", ISBN 0-527-76326-8. It's widely available. Get it. |
Recommended
Powered by Auto-365.Com










