ITIL Problem Solving

By Leigh Wilkinson, OIT

One element of the ITIL-based service model is the Continual Service Improvement (CSI) component. CSI is dedicated to the following objectives:

  • Review, analyze, and make recommendations on improvement opportunities in each lifecycle component of OIT
  • Review and analyze service levels
  • Identify and implement activities to improve OIT services
  • Improve cost effectiveness of OIT services
  • Ensure applicable quality management methods are used to support continual improvement activities

These objectives can be reached through frequent review of metrics and management of information currently available, assisting others in OIT in implementing additional metrics, and auditing activities to verify process compliance.
CSI often requires that we adopt a root cause problem-solving approach to a difficult problem. Problems in ITIL are described at two levels:

  1. Incidents – any time a process fails to meet the promised standard
  2. Problems – incidents that recur with enough frequency that standard solutions known as ‘workarounds’ can be determined and become part of a regular process
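
The distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Incident fields, the problem_candidates function, and the threshold of three occurrences are hypothetical and are not taken from ITIL or from OIT practice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    process: str  # hypothetical: the process that missed its promised standard
    symptom: str  # hypothetical: short description of what went wrong


def problem_candidates(incidents, min_occurrences=3):
    """Return (process, symptom) pairs seen at least min_occurrences times.

    The threshold is an assumed value; a real team would choose its own.
    """
    counts = Counter((i.process, i.symptom) for i in incidents)
    return sorted(key for key, n in counts.items() if n >= min_occurrences)


# Made-up sample data for illustration only.
incidents = [
    Incident("email", "delivery delayed"),
    Incident("email", "delivery delayed"),
    Incident("vpn", "login timeout"),
    Incident("email", "delivery delayed"),
]
print(problem_candidates(incidents))
```

With this sample data, only the repeated email incident crosses the assumed threshold and would be reviewed as a problem; the single VPN incident remains an incident.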

When a problem is particularly difficult and a solution cannot easily be found, it is assigned to a root cause problem analysis team. This team is usually composed of representatives from each formal work group that has any involvement in the failing process. These members generally come from OIT but may also include the customer when appropriate. The team is led by a facilitator trained in the use of root cause process analysis tools. These tools come from the quality and process improvement disciplines, often referred to as TQM, Lean, Kaizen, or Six Sigma.
The group gathers to define the problem in detail, including its impact on customers. It then collects any available metrics that measure the performance issue, along with any baseline data that shows what good performance looks like.
Next, the group outlines possible causes and begins systematically eliminating them. The goal is to produce a matrix of all related causal factors and to create a better understanding of how to resolve performance issues while mitigating the possibility of the problem returning. During this process the group also explores the opportunity to improve on the effectiveness and efficiency of the baseline process.
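One way to picture the cause matrix is as a simple list of records that the team updates as each candidate cause is tested. The following Python sketch is an assumption for illustration only: the Cause fields, the rule_out and remaining helpers, and the sample causes are all hypothetical and do not describe a tool the teams actually use.

```python
from dataclasses import dataclass


@dataclass
class Cause:
    description: str        # hypothetical: the suspected causal factor
    work_group: str         # hypothetical: the group that owns this area
    ruled_out: bool = False
    evidence: str = ""      # why the cause was eliminated


def rule_out(causes, description, evidence):
    """Mark one cause as eliminated and record the supporting evidence."""
    for cause in causes:
        if cause.description == description:
            cause.ruled_out = True
            cause.evidence = evidence
            return
    raise KeyError(description)


def remaining(causes):
    """Return the descriptions of causes that have not been eliminated."""
    return [c.description for c in causes if not c.ruled_out]


# Made-up sample matrix for illustration only.
matrix = [
    Cause("network congestion", "networking"),
    Cause("undersized mail queue", "messaging"),
    Cause("expired certificate", "security"),
]
rule_out(matrix, "expired certificate", "certificate verified as valid")
print(remaining(matrix))
```

Keeping the evidence next to each eliminated cause means the record is still useful later, when the team documents what it learned.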
Once the problem has been resolved, a session is held to elicit lessons learned and documentation is kept for use in similar situations. The ultimate goal is to avoid having to relive the problem over and over again.
We have implemented a few of these root cause teams in the last year, and while the teams have achieved the desired results, we haven’t always been happy with the amount of time it took to get them. The good news is that we are getting better at it and the time frames are generally shrinking. We are asking better questions, collecting better data, and getting faster results.
We have drawn a few significant lessons from this experience:

  • Clear and accurate definition of the problem and impact is critical
  • Clear definition of terms is critical
  • When faced with a difficult problem, look at the process first
  • Use what data you have and note where missing data is critical – take steps to begin gathering data where needed
  • Make data gathering and documentation a priority before, during, and after problems occur – gathering data after the fact isn’t easy or always possible
  • Trust the process and tool sets to get results
  • Create short term and long term improvement plans – agree on priorities
  • Follow up on those plans to ensure that they are being implemented

Two of W. Edwards Deming’s 14 Points for quality are worth considering here: "Improve constantly and forever" and "The transformation is everybody's job". We are on our way to full ITIL-based best practices as we adopt problem solving and resolution processes to ensure that we eliminate as much variation and as many performance problems as possible across OIT.