ITIL tells you what to do when addressing Incident Management; it does not tell or show you “how” to do it. In more than 95% of the cases where we personally helped a client solve a vexing problem or incident using Problem-solving for ITIL, we found that the reason they could not solve the incident themselves was that their description of the fault was too general. The devil is in the detail, so we need to get down to that level ourselves to establish the true path of events and interpret it into a plausible answer.
Let’s illustrate this with an example. The most common occurrence is the infamous fault of a service or website being “SLOW”. To get more specific, we normally want to know what they mean by “slow”. Does it take too long to open a page, does the “NEXT” button need to be pushed more than once, or does the screen eventually freeze? I think you would agree that each of these possibilities could have a different answer, right?
Incident investigation is subject to many pitfalls in specificity that could delay a team in getting to the correct answer quickly, accurately and permanently. Let’s look at a few of these:
Respondents in an investigation describe faults as not working, failing, dead, fallen over, or not responding, just to mention a few. These descriptions reflect an “end-stage” or a consequence of the fault and are not the fault at all. Your team needs to develop questions they can ask to move away from these supposed faults, such as “What happened that is not supposed to happen?” or “What is supposed to happen but did not?”
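As a rough illustration of this first pitfall, a service desk tool could flag end-stage descriptions and prompt the two follow-up questions. This is only a sketch: the function name follow_ups and the simple substring matching are assumptions made for the example, and the term list contains only the descriptions quoted above.

```python
from __future__ import annotations

# End-stage descriptions quoted in the text; extend this for your own environment.
END_STAGE_TERMS = ("not working", "failing", "dead", "fallen over", "not responding")

# The two questions suggested for moving away from a supposed fault.
FOLLOW_UP_QUESTIONS = (
    "What happened that is not supposed to happen?",
    "What is supposed to happen but did not?",
)


def follow_ups(description: str) -> tuple[str, ...]:
    """Return the follow-up questions when a description contains an end-stage term."""
    text = description.lower()
    if any(term in text for term in END_STAGE_TERMS):
        return FOLLOW_UP_QUESTIONS
    return ()


# An end-stage description triggers both questions; a specific one triggers none.
print(follow_ups("The payment service is dead"))
print(follow_ups("Checkout page returns HTTP 502 after login"))
```

Substring matching is deliberately crude here; the point is to prompt the investigator for a better description, not to classify incidents automatically.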
Answers such as “it is happening to everybody, it is happening everywhere, and it is happening all the time” are a clear sign that the team does not know the answers.
When you ask a general question you are going to get a general answer, which does not help the team get to the bottom of an incident. Working with investigation teams, I’ve heard questions such as “Is your system green?” or “Have you guys changed anything over the weekend?” The answer in almost all cases would be “yes” or “no”, which does not give you anything more to work on.
So, the obvious answer is a systematic approach that ensures your OBJECT and FAULT components are very specific and are then broken down into specific worked questions within a factor analysis framework. A question such as “Tell me what would explain a connection dropping for the ABC website, but not for the other websites on the same LAN?” would improve your chances significantly and move you towards the answer quickly and efficiently.
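To make the OBJECT and FAULT idea concrete, here is a minimal sketch of how a specific incident statement could be turned into a worked question of this kind. The class name IncidentStatement, its field names and the function worked_question are hypothetical, chosen for this illustration only; they are not part of ITIL or of any tool.

```python
from dataclasses import dataclass


@dataclass
class IncidentStatement:
    obj: str             # the specific OBJECT, e.g. "the ABC website"
    fault: str           # the specific FAULT, e.g. "a connection dropping"
    unaffected: str      # comparable objects that do not show the fault
    shared_context: str  # what the affected and unaffected objects have in common


def worked_question(s: IncidentStatement) -> str:
    """Combine the specific OBJECT and FAULT with a comparison into one question."""
    return (
        f"Tell me what would explain {s.fault} for {s.obj}, "
        f"but not for {s.unaffected} on {s.shared_context}?"
    )


stmt = IncidentStatement(
    obj="the ABC website",
    fault="a connection dropping",
    unaffected="the other websites",
    shared_context="the same LAN",
)
print(worked_question(stmt))
```

The question cannot be built until every field is filled in, which forces the team to name both the object and the fault specifically before asking anything.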
How to Get There
As stated above, ITIL does not tell or show “how” to solve incidents, and coming up with the “Tell me what would explain …” question above usually isn’t simple (unless you’re trained, of course). However, ITIL does refer to the four levels of wisdom, which is spot-on for solving incidents and problems. In incident analysis, there are at least four levels of data. Level one states the facts as they are; that is the “data” level. It is not easy to solve incidents on this level with factual data only. The ease of problem-solving increases with the level of data being used. The second level is “information”, the third is “knowledge” and the fourth is “intelligence or wisdom”.
So the question now is: how do we get to the other levels? We suggest the following questioning techniques. Here they are, using the example above:
Ask: “Which website do we have a problem with?” (IS Question). The answer would be “ABC”. This is pure factual data and not that helpful in finding the answer.
Ask: “If it is the ABC website we are having a problem with, which other websites on the same LAN could have had the same problem but did not?” (BUT NOT Question). The answer would be “The Mango and E-Xpress sites”. The SME would find this intriguing: those sites are on the same LAN, so the difference does not make sense.
Ask: “Why do we have a problem with the ABC website and not the other ones?” (WHY or WHY NOT Question). The challenge for the Networks SME is now to find a plausible explanation, or an expert guess, as to why that could be possible.
Ask: “As the SME, how do you think the fact that the ABC site ‘has a set of unique proxy rules’ could have caused the incident?” (Possible Cause Question).
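The four questions above follow a fixed pattern, so they can be kept in a simple template. The sketch below is one possible way to do that; the function worked_questions and its parameter names are assumptions made for this example, and the wording is taken from the ABC website case.

```python
from __future__ import annotations


def worked_questions(affected: str, shared: str, distinction: str) -> dict[str, str]:
    """Build the four worked questions in order: IS, BUT NOT, WHY, POSSIBLE CAUSE."""
    return {
        "IS": "Which website do we have a problem with?",
        "BUT NOT": (
            f"If it is the {affected} website we are having a problem with, which other "
            f"websites on {shared} could have had the same problem but did not?"
        ),
        "WHY": f"Why do we have a problem with the {affected} website and not the other ones?",
        "POSSIBLE CAUSE": (
            f"As the SME, how do you think the fact that the {affected} site "
            f"{distinction} could have caused the incident?"
        ),
    }


# 'distinction' is whatever the SME identified as unique to the affected object.
questions = worked_questions("ABC", "the same LAN", "has a set of unique proxy rules")
for kind, text in questions.items():
    print(f"[{kind}] {text}")
```

In practice the distinction is only known after the WHY question has been answered, so the last question would be filled in during the investigation rather than up front.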
This systematic Problem-solving for ITIL thinking approach, with worked questions coupled with the disciplined use of a template, could drastically increase the leverage you get from your in-house content knowledge.