1. Problem Solving within IT Service Management
IT Service Management (ITSM)
IT Service Management is a discipline for managing IT systems and technology, centered on identifying and delivering the IT Services used by the business. These IT Services are defined in business terms and are the final outcome of IT systems and technology. Within the practice of ITSM, the ITIL (IT Infrastructure Library) framework links Root Cause Analysis to the practice of Problem Management. Together, the ITSM discipline and the ITIL framework give IT Service Support professionals a sound foundation for successful IT-related Problem Solving. It is therefore worth considering the following definitions:
Problem and Problem Management
- A Problem is the unknown cause of one or more related Incidents.
- Problem Management manages the investigation into the cause of these related Incidents and ensures an appropriate resolution to the Problem is found.
Incident and Incident Management
- An Incident is an unplanned event that deviates from normal service (as defined by the Service Level Agreement) and affects an IT Service. This deviation could include:
  - Disruption to the agreed service
  - Reduction in the quality of the agreed service
  - Something that could lead to a disruption or reduced quality of the agreed service
- Incident Management manages the quick response to these Incidents and the restoration of normal service, and may escalate issues to Problem Management for further investigation.
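The relationship between Incidents and Problems can be illustrated with a minimal Python sketch. It assumes a hypothetical `Incident` record that carries a service name and a symptom. It also assumes a hypothetical rule: when two or more Incidents share the same service and symptom, they are treated as related and become a candidate for escalation to Problem Management. The grouping key and the threshold are illustrative assumptions, not ITIL requirements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    # Hypothetical incident record: only the fields needed for this sketch.
    incident_id: str
    service: str
    symptom: str


def problem_candidates(incidents, threshold=2):
    """Group Incidents by (service, symptom) and return the groups large
    enough to escalate to Problem Management. The grouping key and the
    threshold are illustrative assumptions."""
    groups = defaultdict(list)
    for incident in incidents:
        groups[(incident.service, incident.symptom)].append(incident.incident_id)
    return {key: ids for key, ids in groups.items() if len(ids) >= threshold}


# Hypothetical sample data.
incidents = [
    Incident("INC-1", "Marketing App", "error on save"),
    Incident("INC-2", "Marketing App", "error on save"),
    Incident("INC-3", "Email", "slow delivery"),
]
print(problem_candidates(incidents))
# {('Marketing App', 'error on save'): ['INC-1', 'INC-2']}
```

In practice, deciding which Incidents are "related" is itself a judgment call. The simple key used here only stands in for that analysis.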
Priority, Impact, Urgency
Priority is a generic ITSM term for the order in which issues, such as Incidents and Problems, are dealt with. The Priority of an issue is a combination of its Impact and Urgency, defined as follows:
- Impact: the degree of positive or negative business effect of something.
- Urgency: the response time required to address an “impact” event.
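Combining Impact and Urgency is often done with a lookup matrix. The minimal Python sketch below makes several assumptions: a three-point scale for both dimensions, an additive rule that yields priority codes from 1 (most urgent) to 5, and the label names shown. Each organization defines its own matrix, often in the SLA.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed three-point scale: a lower value means more severe.
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Assumed labels for the resulting priority codes 1 to 5.
PRIORITY_LABELS = {1: "Critical", 2: "High", 3: "Medium", 4: "Low", 5: "Planning"}


def priority(impact: Level, urgency: Level) -> int:
    """Combine Impact and Urgency into a priority code from 1 (handle first)
    to 5 (handle last). The additive rule is an illustrative assumption."""
    return impact + urgency - 1


p = priority(Level.HIGH, Level.MEDIUM)
print(p, PRIORITY_LABELS[p])  # 2 High
```

An explicit dictionary keyed by `(impact, urgency)` would work equally well. It is the better choice when an organization's matrix does not follow a simple arithmetic pattern.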
Service Level Agreement (SLA)
A written or understood agreement between an IT Service Provider and the Business (through identification of the Business Customer) that outlines all arrangements for the use, performance levels, and management of an IT Service. Incident and Problem priority objectives and other details are often documented and agreed in this Agreement.
2. Problem Solving
What is Problem Solving?
Problem Solving is a basic and key life skill that is often considered to be one of the most complex human intellectual functions. Fortunately, it’s also something that can be continually learned, practiced and improved.
In its basic sense, Problem Solving is a mental practice for thinking and reasoning. It can be refined and improved when broken into the separate, but related, parts of Problem Finding and Problem Shaping.
Identifying the Problem is the first step to good Problem Solving. The problem statement becomes the target that is being solved for, and getting this target right or wrong has serious consequences either way. In many cases, identifying the problem is more complex than actually solving it. Good Problem Finding relies on Creative Thinking.
Problem Shaping follows Problem Finding. Once the Problem has been correctly identified, questions need to be asked that shape the direction and findings of the problem investigation. Each question gives insight into the underlying Problem Cause(s), which in turn refines and shapes the further questions to be asked. Good Problem Shaping relies on Critical Thinking.
3. Problem-Solving Perspectives
Adjust your Thinking and Reasoning
Problem Solving as a skill requires the problem solver to adjust their approach and perspective to fit the problem. Failing to adjust, or taking the wrong perspective from the outset, is one of the primary reasons problem solvers fail or fail to act in a timely manner. Making the right adjustment starts with understanding the differences between the following approaches:
Critical Thinking: Logical
Critical Thinking in Problem Solving can be considered logical thinking. It is a methodical approach that involves the cognitive skills of goal clarification, observation, interpretation, analysis, categorizing, relating, inference making, evaluation of results, assessment, and explanation of conclusions.
Deductive Reasoning: Top down
Critical Thinking is associated with Deductive Reasoning. This starts from a set of propositions and relies on investigation and factual discovery to bring the root cause of a Problem to light. It tends to be a top-down approach to Problem Solving. It requires detailed knowledge or experience, combined with a logical practice that confirms, one proposition at a time, which propositions are NOT the source of the Problem.
When to Use Critical Thinking and Deductive Reasoning
– The problem is familiar, or of a familiar type
– The problem solver has sufficient skill and experience
Example: A critical marketing application shows several different user error messages across the marketing department. From our programming experience, we know that each error message is triggered by the application's error-trapping code. We therefore deduce that we should investigate the code of the application modules that produced the error messages, in order to confirm the application logic.
Creative Thinking: Generative
Creative Thinking in Problem Solving can be thought of as finding options and alternatives. It is more commonly described as “thinking outside the box” of common, tried solutions.
Inductive Reasoning: Bottom up
Creative Thinking is associated with Inductive Reasoning, which draws assumptions or generalized conclusions from a set of observations. These assumptions are not necessarily valid conclusions; they are starting points to be further investigated and validated. It tends to be a bottom-up approach to Problem Solving, based on intuition and on educated guesses drawn from experience. It must be followed up by verifying these guesses, or assumptions, typically through an experimental approach.
When to Use Creative Thinking and Inductive Reasoning
– The problem is unfamiliar
– Deductive reasoning has reached a dead end and more alternatives are needed
Example: A critical marketing application shows several different user error messages across the marketing department. We have no programming skill, but we have observed in the past that shared applications are run from a central server. The marketing application is a shared application, so we induce (assume) that the Problem lies with the server. Our investigation will now follow this path.
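The deductive practice of confirming that each proposition is NOT the source can be sketched as an elimination loop. In this minimal Python sketch, each proposition is paired with a check that returns `True` when the evidence rules it out. The module names and observation values are hypothetical stand-ins for real test results from the marketing application example.

```python
from typing import Callable, Dict, List


def eliminate(propositions: Dict[str, Callable[[], bool]]) -> List[str]:
    """Deductive elimination: run one check per proposition and keep only
    the propositions that could NOT be ruled out. Each check returns True
    when the evidence confirms that proposition is not the source."""
    remaining = []
    for name, is_ruled_out in propositions.items():
        if not is_ruled_out():
            remaining.append(name)
    return remaining


# Hypothetical observations standing in for real test results.
observations = {"orders_module_ok": True, "reports_module_ok": False}

propositions = {
    "orders module error-trapping logic": lambda: observations["orders_module_ok"],
    "reports module error-trapping logic": lambda: observations["reports_module_ok"],
}
print(eliminate(propositions))  # ['reports module error-trapping logic']
```

If every proposition is ruled out, the returned list is empty. In the terms used above, deductive reasoning has then reached a dead end, and creative, inductive alternatives are needed.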
4. Problem Solving as a Structured Practice
The Problem Solving practice seeks to prevent Problems from ever recurring by taking effective corrective actions. It involves a structured and methodical approach to problem solving. In general, this structure involves:
- Correctly defining the problem,
- Finding the root cause(s) of the problem through Root Cause Analysis,
- Determining the most effective corrective actions to take, and
- Implementing the solution to successfully manage the problem.
Root Cause Analysis (RCA)
Root Cause Analysis is a sub-practice of the larger Problem Solving practice. It applies Problem Solving skills together with a methodical, systematic approach to identifying the true root cause(s) of Problems.
All Root Cause Analysis approaches share a common aim: to avoid focusing on and solving the symptoms of a problem, and instead to drill down to identify and solve its true root cause.
A key assumption of Root Cause Analysis is that there is always one true root cause for any given problem. This leads to a key challenge: having sufficient focus and perseverance to find that one true root cause.
The primary goal of Root Cause Analysis is to determine the lowest-level “root” cause of a Problem, which is the level that supports taking the most effective corrective actions. Revealing the correct root cause matters because, without it, we cannot determine which corrective actions will be effective.
Kepner-Tregoe Root Cause Analysis Method
The Kepner-Tregoe method for analyzing problems was developed by researchers Dr. Charles Kepner and Dr. Benjamin Tregoe. This method emphasizes a structured approach to problem solving that relies on setting priorities and drawing on technician knowledge and experience. The method includes five steps to problem solving.
The Problem Solving Plan Using Kepner-Tregoe
A structured Problem Solving Plan should be created when solving any Problem. The plan should follow a set of problem solving steps, such as those defined by Kepner-Tregoe, and should include the business goals and objectives that need to be achieved. Each step can then be managed at an appropriate level based on priorities, time pressures, and the availability of information.
The Problem Solving Plan is iterative: new facts and observations shed new and increasingly accurate light on both the Problem definition and the root cause investigation.
5. RCA Methods and Techniques
Comparison of Methods and Techniques
There are many Root Cause Analysis methodologies and techniques. The more common ones, along with their primary characteristics, are listed in the following table.
Each has particular strengths that make it suitable for use in specific situations. These are defined more fully in the subsequent pages.
The Journalism Standard
The Journalism Standard is a technique that is focused on factual reporting and analysis, where emotion and assumptions are removed from consideration. It can be thought of as listing and considering “just the facts”. The Journalism Standard reminds the Problem Solver to research and list the basic facts of the situation first, to seek interviews and independent confirmation, and then to evaluate using a neutral approach.
This technique also reminds the Problem Solver to avoid some of the more common Problem Solving mistakes, which wrongly eliminate or limit the possible causes:
Avoid Jumping to Assumptions: Avoid eliminating possible causes because of one or more incorrect assumptions. Making assumptions is a necessary part of life. However, when the stakes are high and the risk of failure increases, making assumptions can be dangerous! Many Problems have escalated or dragged on because an assumption eliminated a checkpoint.
Example: A technician may skip checking the application drivers on the Server PC, “knowing that it can’t be the Server as no one has updated the Server since it was last working…”
Avoid Tunnel Vision: Avoid missing possible causes because of an obsession with, or narrow focus on, one assumption or a small range of assumptions. Jumping to a conclusion may produce a quick diagnosis of a Problem. More often than not, however, that diagnosis fails to identify the correct root cause, and the Problem escalates as the investigation drags on.
Example: A technician may limit the investigation to the Server PC, “concluding that it must be the Server as the application error is generated from the software residing on the Server…”
The 5 Whys
The 5 Whys is a method for persevering in Problem Shaping until the true root cause is found, rather than stopping at superficial symptoms and assumptions. This basic cause-and-effect method is simple in concept. To investigate the possible cause of a problem, ask the question “why did this happen?” five times in succession, each time about the answer to the previous question. The technique was originally used within Toyota Motor Corporation. It is now also a key component of problem solving in Kaizen, lean manufacturing, and Six Sigma. The 5 Whys is a question-asking method that pushes the problem solver to dig deeper. The method is by no means limited to five levels of detail, but five iterations of asking why are generally accepted as sufficient to reach a root cause.
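The successive questioning can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool. The `ask_why` callback and the sample answers are hypothetical; in a real investigation each answer comes from evidence gathered by the problem solver. The `max_depth` default of five reflects the convention described above rather than a hard limit.

```python
from typing import Callable, List, Optional


def five_whys(problem: str,
              ask_why: Callable[[str], Optional[str]],
              max_depth: int = 5) -> List[str]:
    """Repeatedly ask "why did this happen?", starting from the problem
    statement. Returns the chain [problem, cause1, cause2, ...]; the last
    entry is the candidate root cause. Stops early when no deeper cause is
    known, or after max_depth whys."""
    chain = [problem]
    for _ in range(max_depth):
        cause = ask_why(chain[-1])  # ask "why?" about the latest answer
        if cause is None:
            break
        chain.append(cause)
    return chain


# Hypothetical answers an investigator might record.
answers = {
    "Users see an error when saving": "The application cannot write to its database",
    "The application cannot write to its database": "The database disk is full",
    "The database disk is full": "Old log files are never purged",
    "Old log files are never purged": "The cleanup job was disabled during a migration",
}

chain = five_whys("Users see an error when saving", answers.get)
for depth, step in enumerate(chain):
    print(depth, step)
print("Candidate root cause:", chain[-1])
```

The result is only a candidate root cause. It still needs to be verified before corrective actions are chosen.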
The 5 W’s
The 5 W’s is a simple and commonly known rule for gathering the facts (the list below also includes “how”):
- who was involved
- what events happened
- when did the sequence of events happen
- where did the sequence of events happen
- how did the sequence of events happen
- why did the problem happen (initial inductions and deductions)
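In the spirit of the Journalism Standard's “just the facts”, the questions above can be kept as a simple fact sheet, which makes unanswered questions visible. This minimal Python sketch uses a hypothetical `FactSheet` class whose field names mirror the list above. It is an illustration only, not part of any ITSM tool, and the sample values are invented.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FactSheet:
    # One field per question in the list above. "why" holds only initial
    # inductions and deductions, which must be verified later.
    who: Optional[str] = None
    what: Optional[str] = None
    when: Optional[str] = None
    where: Optional[str] = None
    how: Optional[str] = None
    why: Optional[str] = None

    def missing(self) -> List[str]:
        """Return the questions that have not been answered yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical facts gathered so far.
sheet = FactSheet(
    who="Marketing department users",
    what="Several different error messages in the marketing application",
    where="All marketing workstations",
)
print(sheet.missing())  # ['when', 'how', 'why']
```

Keeping “why” as a separate field helps separate verified facts from the initial assumptions that the Journalism Standard warns against treating as conclusions.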