Thursday 29th of July 2010

Places I love



Root Cause Analysis PDF Print E-mail
Intermittents make us crazy. The sparser the intermittent, the crazier it makes us. The sparsest possible intermittent is the event, which happens exactly once. You can't solve events, or extremely sparse intermittents, with the Universal Troubleshooting Process or any of its contemporaries. Even generic problem solving methodologies do poorly against extremely sparse intermittents.

When safety concerns permit, we often ignore sparse intermittents. How many invoices come back "could not reproduce problem". But in an electric power station, to name one example, future occurrence must be prevented. And that means finding the root cause.

Root Cause Analysis is a troubleshooting process optimized specifically for sparse intermittents. Max Ammerman, formerly of Florida Power & Light, literally "wrote the book on the subject". His book is called "The Root Cause Analysis Handbook". This 135 page book is certainly not simple, but it's understandable by a person who solves problems for a living, especially if he's studied any other problem solving methodology.

In the introduction, the book describes Root Cause Analysis as using the following process:

  • Significant event occurs
  • Problem definition and initial data collection
  • Task analysis
  • Change analysis
  • Control barrier analysis
  • Begin the event causal factor chart
  • Conduct interviews
  • Determine root causes
  • Recommend corrective actions
  • Report conclusions
Notice I didn't number these steps. Ammerman often treats them like tools, and it's frequently mentioned that in some situations certain steps can be skipped. Each of the above steps, except the first one, consumes a chapter in the book. The first two steps above are obvious to anyone who solves problems. Chapter 1 tells how to define the problem and collect data, and discusses hints, tips, guidelines and pitfalls.

In fact, every chapter offers hints, tips, guidelines and pitfalls. He even tells you exactly how to interview a person, how to avoid "leading" him, and what to do if the person appears to be hiding something (a likelihood when interviewing someone about a serious messup). It's the little "how to" explanations that make this book so useful.

Task analysis it the act of analyzing the task related to the event. Task analysis reveals how the task should be done, as a baseline to evaluate what actually happened.

Change analysis is the comparison of the activities during or leading up to the event, contrasted with those same activities done successfully. The trick here is to ask questions leading to distinctions. These distinctions prove useful in later steps.

Control barrier analysis is the study of failed control barriers. A control barrier is anything, be it physical, policy, or anything else, that prevents problems. Either the existing control barriers are insufficient and need augmentation, or a control barrier failed to prevent the event. In the latter case, evaluate what caused the control barrier to fail. It's absolutely vital you identify all control barriers.

Event and causal factor charting is the creation of a flow chart detailing all the activities before, during, and after the event, as well as conditions, changes, control barriers. The idea is to trace the cause and effect back to the various causes, so that in a later step a root cause can be correctly identified. This step is complex, so read chapter 5 carefully.

We all think we know how to conduct interviews. Chapter 6 goes over the process with a fine tooth comb. Many times during the chapter, I thought to myself "that's a good point -- I didn't think of that". The final work product of the interview is an interview sheet and an observation sheet. These are used to reveal and to substantiate hypotheses necessary to finish the Root Cause Analysis.

Determining the root cause is where the rubber meets the road. A wrong root cause will doubtlessly create a flawed solution, at best a "coathanger" solution fixing a symptom but creating or allowing side effects. At worst it will simply not work, or make things worse. In this respect Root Cause Analysis is identical to the Universal Troubleshooting Process. Chapter 7 discusses how to put together all the info gleaned earlier to correctly deduce the root cause.

Developing corrective actions requires knowledge of the root cause, failed barriers, distinctions between functional and dysfunctional tasks, economics, company policies, government regulations, and a host of other information. In the terminology of generic problem solving, this step analyzes the solved state.

The final work product of the Root Cause Analysis is the report. It must give conclusions and recommendations, and must be presented correctly.

Root Cause Analysis is vital for sparse intermittents, and for safety critical situations where letting the problem recur or reproducing the problem is out of the question. It's an excellent tool for the diagnosis of failed processes, policies, even human interactions. When you read this book, be aware that it mentions PIC without ever defining it. PIC stands for Problem Identification & Correction, Florida Power & Light's internal name for the Root Cause Analysis process.

If you solve problems, you need to know Root Cause Analysis. Max Ammerman's book is called "The Root Cause Analysis Handbook", ISBN 0-527-76326-8. It's widely available. Get it.

 

Recommended


Powered by Auto-365.Com