IT Operations: Getting 50% more work done with the same staff

Would you not like to be able to deliver 30-50% more with the same staff?

Then read on, but do read the whole article, to get the whole picture.

The perspective is that of an internal IT organisation with the common setup of two departments: IT Infrastructure & Operations and IT Development.

My assumption is that this is a rather mature organisation based on ITIL processes, especially when it comes to the core operational processes.

For the last couple of years there has been a trend on the development side to move to agile development methodologies, which have their origin in Lean and TOC (Theory of Constraints), with a focus on getting rid of waste and improving throughput. DevOps has been another buzzword: bringing the development and operations departments closer together to work towards common objectives.

Agile for IT Development focuses on getting more of what the customers want out, and faster.

This article is intended to trigger the same journey in IT Operations, by increasing the capability to deliver:

  • Significantly faster
  • With significantly higher reliability
  • Significantly more

Under the following restrictions:

  • Without adding resources
  • With no degradation in quality
  • With no added stress to the staff

Do you want to learn how you do this?

The approach described in this article does not require you to change your organisation; instead, the focus is on how we work and how we handle incoming requests.

One interesting aspect for operations is this fact: “yesterday’s solutions are causing most of today’s problems” (Peter Senge’s bestselling classic, The Fifth Discipline).

This article is a theoretical analysis and has not been tested in practice, but the conclusions are based on knowledge and logical validation.

IT Operations Goal

Let me state a generic goal definition for IT Operations, so we have this as a baseline to identify what we need to change.

To define the goal of IT Operations properly, we should put it into the context of the company, so that we can relate its goal to the company goal.

But let’s be pragmatic: the company’s goal is to earn money by delivering products and services, as fast as possible, to customers who want to buy from a reliable supplier. In other words, it wants to maximize delivery to the market (throughput) while minimizing inventory and operational expense.

A company has two main cost factors:

  • the inventory cost, which is hardware, software, desktops and so on: everything you buy
  • the operational expenses: your staff, external costs for renting your facilities and so on

The main income is based on throughput: how much of your cost is converted into something that leaves the company and is consumed by the customers. In other words, throughput is the output of the inventory and operational expenses.

IT Operations is part of IT. The goal of IT is to provide the digital solutions for the products and services the company sells, in the volumes and at the speed at which the company can sell them.

IT has three main objectives:

  1. Provide continuity in the existing services and products
  2. Provide new features and improved existing features for current services and products
  3. Provide new services and products

IT Operations and IT Development share these three objectives. They also inherit from the company level the demand to maximize throughput, minimize inventory and minimize operational expense.

IT Development has started with agile projects and a journey around CI/CD (Continuous Integration & Continuous Delivery) to improve throughput.

IT Operations has started with automation, but what else can we do to improve throughput?

If we sum this up in a Goal Tree for IT Operations, it can look like this.

You must of course tune it to reflect your own organisation, but it should give you an understanding of the content and details that is sufficient for the scope of this article.

IT Operations flow of work

We interact with our customers through our first line of support, which is backed up by second and third line specialists grouped into competence silos.

First line support is staffed 24/7 and reacts to requests and to events coming in from our monitoring solution. It does the initial work of solving incidents.

Requests are then passed to second line support, which is part of third line support.

Third line support has the deeply skilled specialists, who are grouped in technology silos.

Third line support staffs second line support by having one person on day duty and a second person on night duty.

Your organisation might have a slightly different setup. That is fine; the article will still be applicable.

IT Operations provided services

IT Infrastructure & Operations provides services in the scope of ITIL processes, infrastructure services and project resources.

The picture below summarizes the essential parts without describing everything in detail. You can easily complement the list with your own services.

Obstacles to maximizing throughput

Why maximize throughput?

Throughput is the speed at which you get products and services out to your customers.

In your company, there are factors that limit your throughput. I call them constraints.

The constraint is one of two things:

  • internal constraint
    • we have something in our system (IT Operations) that is not producing with the speed needed
      • we are not selling enough (market can consume more than we sell)
      • we have a bottleneck
    • we do not have the needed money
    • we have policies that prevent us from delivering
  • external constraint
    • our market segment is not interested in our services or products
    • we can produce more than market can consume

If we look at the internal constraint, it is usually a bottleneck, and we can visualize it like this:

The picture shows that one work centre cannot keep up with the incoming load, and is therefore slowing down the company and preventing it from delivering.

In this chapter I visualize some of the common constraints that exist in IT Operations organisations and suggest how we should handle them to improve throughput.

Negative effects of your Policies

Policies can either contribute positively or prevent your organisation from performing.

Remember this simple rule:

  • The way you measure me will be reflected in how I behave.

So make sure the policies you have and the metrics you use are helping your organisation perform well.

As Eli Goldratt put it: “Tell me how you measure me and I will tell you how I will behave. If you measure me in an illogical way… do not complain about illogical behavior…”

In the next chapter we will talk about the effects of resource allocation. Most IT Operations organisations work with resource allocation and have a policy of measuring each team based on it.

The level of resource allocation also determines whether a team can recruit or not.

The focus is on the load on individuals instead of on the overall team load or the load on competence groups within the team.

This creates an unbalanced team, as the team is doomed to have individuals who are periodically overloaded while others at the same time have spare capacity.

So instead of getting a positive effect, it will drive cost and increase stress.

This is a very important area to analyse: if you have a policy that is creating negative effects, changing it has “no cost” for implementation, and you will get value back fast.

Take a look at how you measure your organisation and ask yourself how each measure contributes to:

  • Minimize inventory
  • Minimize operational expense
  • Maximize throughput

If a measurement does not contribute to any of the above objectives, consider what negative effects it has on them, and how you should handle these negative effects.

Negative effect of resource allocation

In all the IT departments I have worked in, we have used resource allocation plans to allocate resources to projects. This is a natural way of working in a project-oriented organisation, so it is rather expected.

In general, the concept is that resources are assigned to projects based on a forecast. This can be done monthly or sometimes on a more granular basis.

Normally staff are allocated to projects for 30-70% of their time; the rest of the time is spent on daily operations.

The positive effect of resource allocation is continuity for the project: the project team has the possibility to get to know each other.

The negative effects are:

  • A resource can be sick, and this might delay the project
  • A resource might be tied up with incidents, which delays projects
  • A resource is allocated to many projects
    • The resource is not available when needed
    • The resource is periodically overloaded; more than one project needs assistance at the same time
  • The team might have spare capacity while individuals are overloaded with many project requests at the same time
  • Responsibility for a delivery lies with an individual and is not a team responsibility
  • Staff focus on their own work, which does not stimulate team cooperation

UDE/Interference | Injection | Desired Effect
Allocated project resource is not available when the project needs them | Assign work to team | Resources are available to execute the request in time
Allocated resources are overloaded | Assign work to team | Balance the work over the team
Individual responsibility for the delivery to a project | Assign work to team | Team takes responsibility for the delivery and will offload individual pressure
Staff is focusing on their own work | Assign work to team | Stimulate team cooperation
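
The difference between looking at individual load and at team load can be illustrated with a small sketch. The names, the hours and the 40-hour weekly capacity are invented for illustration and are not taken from any real allocation plan.

```python
# Hypothetical weekly demand in hours per person when work is allocated to individuals.
CAPACITY_PER_PERSON = 40  # assumed weekly capacity per person

individual_demand = {"Anna": 55, "Ben": 20, "Cara": 48, "Dev": 25}


def overloaded(demand, capacity=CAPACITY_PER_PERSON):
    """Return the people whose allocated work exceeds their own capacity."""
    return sorted(name for name, hours in demand.items() if hours > capacity)


def team_utilisation(demand, capacity=CAPACITY_PER_PERSON):
    """Return total demand divided by the total capacity of the team."""
    return sum(demand.values()) / (capacity * len(demand))


print(overloaded(individual_demand))        # who is overloaded individually
print(team_utilisation(individual_demand))  # share of team capacity that is needed
```

With these assumed numbers two people are overloaded although the team as a whole needs less than its full capacity, which is the situation the "Assign work to team" injection is meant to remove.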

Negative effect of multitasking on throughput

Our staff get requests via multiple channels, which interrupts their ongoing work.

Our staff are allocated to more than one project and must multitask to support all of them.

On top of this comes the effect of ITIL processes such as incidents and problems, where an incident often takes priority over project tasks.

To sum it up, a person has a flood of unsynchronised work coming in through multiple channels:

  • ITIL process related tasks like incidents, changes, problems and service requests
  • Project tasks
  • Assisting a colleague in solving a problem
  • Responding to on-demand requests via chat, mail and phone
  • Doing a favour for “friends”


Yes, operations staff often must be masters of multitasking.

And I think you all know the negative effects of trying to multitask (which is not really possible for a human being; what we actually do is switch between the tasks):

  • Execution time of tasks increases
  • Quality goes down
  • You get stressed and your health goes down

You can run a very quick test of how multitasking affects you.

The task is to write the following 3 strings of characters:

  • Multitask
  • Abcdefghi
  • 123456789

Test 1: multitasking

  • Write the first character of each string, then the second, and so on until all 3 strings are written and look like the list above

Test 2: focus on one task at a time

  • Write the first string, then the second string, and finally the third string

This was my result:

  • Test 1: 54 seconds
  • Test 2: 12 seconds

Even though this was a very simple task, I sometimes struggled during Test 1 to know which character to write. Had this been a more complex task, the likelihood of errors would have gone up dramatically.
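
The same experiment can be sketched as a small simulation. It is only a model: each character counts as one unit of work, and the `switch_cost` of 0.5 time units per task switch is an assumed value, not something I measured. The function name and parameters are my own.

```python
def completion_times(work_units, switch_cost=0.0, multitask=False):
    """Return the time at which each task finishes.

    work_units  -- units of work per task (one unit takes one time unit)
    switch_cost -- assumed extra time paid every time we change task
    multitask   -- True: one unit per task in turn; False: one task at a time
    """
    remaining = list(work_units)
    finished = [None] * len(remaining)
    clock = 0.0
    current = None
    while any(r > 0 for r in remaining):
        for i in range(len(remaining)):
            if remaining[i] == 0:
                continue
            chunk = 1 if multitask else remaining[i]
            if current is not None and current != i:
                clock += switch_cost  # pay for the context switch
            current = i
            clock += chunk
            remaining[i] -= chunk
            if remaining[i] == 0:
                finished[i] = clock
    return finished


strings = [9, 9, 9]  # three strings of nine characters each
print(completion_times(strings, switch_cost=0.5))                  # [9.0, 18.5, 28.0]
print(completion_times(strings, switch_cost=0.5, multitask=True))  # [37.0, 38.5, 40.0]
```

In this model the first string is finished after 9 time units when we focus, but only after 37 when we multitask, and every task finishes later. Even with a switch cost of zero, multitasking delays the first two tasks to 25 and 26 time units without making the last one any faster.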

UDE/Interference | Injection | Desired Effect
Staff is multitasking | All tasks shall come into one queue | WIP goes down; staff is less stressed; lead time goes down; quality goes up; time is freed up for team-mate support; time is freed up for the team to continuously improve its way of working

WIP stands for Work In Progress.

Negative effect of how we work with tasks

Here is some common knowledge about how people work with assignments:

  • Not starting the task until the last moment (Student Syndrome)
    • If you have kids in school you know this is true.
  • Delaying (or pacing) completion of the task (Parkinson’s Law)
    • Some people delay tasks and keep fine-tuning them, just to prove that their estimate was correct.
  • Cherry picking tasks
    • Pick the fun stuff first.

Negative effect of current task Prioritization

Work prioritization is based on:

  • Priority scale used by
    • Incidents and problems
    • Ordering of projects
    • Used as a general way of deciding what to do first
  • Task dependency order
    • Project tasks
  • Due date
    • Project tasks
    • Change requests
    • Service requests
  • Poking
    • The more someone interrupts you to check whether you have helped them, the more attention they get
    • “Friends” reach out and request assistance, and you always help friends as you might need a favour later
  • Fun stuff
    • Fun tasks get priority over boring stuff

It is very hard for an individual to know which activity is most important, which results in wrong prioritizations.

In a throughput world, the only unified priority model is due date, as it works for all kinds of tasks, incidents as well as project tasks.

UDE/Interference | Injection | Desired Effect
Unclear what task is the most important | Unified prioritisation model based on task due date | It is always clear what task is the most important
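
A minimal sketch of the unified model: every task, whatever its origin, carries a due date, and the next task to work on is simply the one with the earliest due date. The `Task` class, its fields and the example tasks are hypothetical and only there to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    title: str
    kind: str   # e.g. "incident", "project", "service request"
    due: date   # when the result must be finished


def next_task(queue):
    """Pick the task with the earliest due date, whatever its kind."""
    return min(queue, key=lambda task: task.due)


# Invented example queue mixing different kinds of work.
queue = [
    Task("Upgrade database for project X", "project", date(2017, 6, 20)),
    Task("Restore failed backup job", "incident", date(2017, 6, 12)),
    Task("Open firewall port", "service request", date(2017, 6, 14)),
]

print(next_task(queue).title)
```

Nothing in the selection depends on the kind of task, on who asked, or on how often they asked, which is the point of a single prioritisation rule.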

Negative effects of Due date

The due date for project tasks is based on:

  • An estimate of the time required to execute the task; the estimator adds a buffer, and the project manager usually adds one as well
  • Dependencies on the execution order of other project tasks
  • The project manager wanting work started as early as possible, to minimize the risk of delays

The negative effects are:

  • Because of the way the due date is defined, project managers request the work to be done as early as possible, as operations is not trusted to deliver on time
  • The estimate contains a large buffer, added by both the estimator and the project manager

If we set the due date to the time when the task actually needs to be finished, we can use it for prioritising our work.

UDE/Interference | Injection | Desired Effect
Due date is set to as early as possible | Set due date to when the task must be finished | Work is done when it is needed to be delivered

Negative effects of task lead time estimation

It’s hard to estimate lead time. In Scrum projects, reference tasks are used as a baseline for estimating lead time.

The reference tasks are work that the team has done previously and that everyone in the team understands and can relate to.

How do we normally estimate task lead time?

Estimating lead time:

  • The estimator estimates the lead time based on a high probability of success
  • The estimator adds a buffer
  • The project manager usually also adds a buffer, since experience says that things run late

We have a lot of time added in buffers, and it can be considered waste, as it should not be needed if all goes well.

TOCPM (Theory of Constraints Project Management) teaches us to estimate lead time based on a 50% probability of success, which means no waste.

In TOCPM the task buffers are moved into a shared project buffer (TOCPM experts: yes, there are other buffers as well, but they do not need to be described here).

The buffer gives the project the balance it needs to handle exceptions, and at the same time it makes the project staff focus on the short lead time: with no slack, the work must be done in a focused way to be successful.

To estimate without hidden safety margins, assume the following parameters:

  • People have everything they need to perform a task.
  • The task is completed without unforeseen problems.
  • Work is focused on the task, without interruption or multi-tasking.

Let’s use TOCPM task lead time estimation.

Let’s also add a buffer of 10-25%, based on the task complexity.

But let’s add it to the due date and not to the lead time. In other words, we move the due date earlier by the size of the buffer.

This will make sure we have full focus on the delivery, and we will deliver slightly ahead of schedule.
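
The arithmetic can be sketched as follows. I read the rule as "the due date is set earlier than the date the result is really needed, by the size of the buffer". The function name, the parameters and the use of plain calendar days (ignoring weekends) are my assumptions for the sketch.

```python
from datetime import datetime, timedelta


def plan_task(needed_by, focused_days, buffer_share):
    """Return (due_date, latest_start) for a task.

    needed_by    -- when the requester really needs the result
    focused_days -- 50%-probability estimate of focused, uninterrupted work
    buffer_share -- buffer as a share of the estimate, 0.10 to 0.25
    """
    if not 0.10 <= buffer_share <= 0.25:
        raise ValueError("buffer_share is expected to be between 0.10 and 0.25")
    buffer = timedelta(days=focused_days * buffer_share)
    due_date = needed_by - buffer  # the buffer moves the due date earlier
    latest_start = due_date - timedelta(days=focused_days)
    return due_date, latest_start


due, start = plan_task(datetime(2017, 6, 30, 17, 0), focused_days=10, buffer_share=0.20)
print(due)    # 2017-06-28 17:00:00
print(start)  # 2017-06-18 17:00:00
```

The estimate itself stays tight at 10 days; the 2 days of buffer sit between the due date and the date the result is needed, so a task that runs slightly late still arrives in time.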

UDE/Interference | Injection | Desired Effect
Huge lead time buffer, added by both estimator and project manager | Estimate tasks to a 50% probability of success and add a time buffer on the due date | Forces the assignee to focus on the delivery; removes waste; the buffer on the due date means most tasks should be delivered ahead of schedule

Negative effect by ticket flows

The requester of a ticket assigns it to the team they believe will solve it. In some cases this is not the right team, and in other cases the team cannot solve the whole request.

Some requests are very small and take from one to 10 minutes to execute, yet they might be a blocker for the requester and for the project or work the requester is working on. Many times the requester chooses not to use the ticketing system and makes an ad hoc request to someone they know in the team.

Incident resolution time is sometimes long because it is hard to identify the cause of the problem, especially when a cross-team analysis is required. This can result in the “no error here” syndrome, where everyone is waiting for someone else to find the problem.

The negative effects are:

  • Tickets are sent from one team to the next
    • In each queue the ticket has a wait time
    • Some tickets bounce between teams until they are solved
  • Small tickets have a long lead time
  • Small tickets are not always created, and the requester interrupts team members via direct contact
  • Incidents with unknown root cause take a long time to solve

UDE/Interference | Injection | Desired Effect
Small tickets have a long lead time and generate delays for the requester | One ticket queue, two execution pipelines: speed line and normal line | Prevent interrupts of staff members; responsive handling of the tasks
Tickets are bouncing between teams | Use the speed line for assistance from other teams; locate the speed line resources from all teams in the same location; stand up and ask for assistance, join forces | Tickets are solved fast and do not bounce between teams
Urgent complex incidents are causing the “no error here” syndrome | No injection yet | Work together as a task force to quickly isolate the cause and solve the incident
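
The "one queue, two execution pipelines" injection can be sketched as a simple routing rule. The 10-minute threshold comes from the description of small requests above; the `Ticket` class, the line names and the example tickets are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

SPEED_LINE_LIMIT_MINUTES = 10  # assumed threshold for a "small" request


@dataclass
class Ticket:
    title: str
    estimated_minutes: int


def route(ticket):
    """Decide which execution pipeline a ticket belongs to."""
    if ticket.estimated_minutes <= SPEED_LINE_LIMIT_MINUTES:
        return "speed"
    return "normal"


def dispatch(incoming):
    """Split one incoming queue into the two execution pipelines, keeping order."""
    lines = {"speed": deque(), "normal": deque()}
    for ticket in incoming:
        lines[route(ticket)].append(ticket)
    return lines


# Invented example tickets arriving in a single queue.
incoming = [
    Ticket("Reset service account password", 5),
    Ticket("Build new test environment", 480),
    Ticket("Add DNS record", 10),
]
lines = dispatch(incoming)
print([t.title for t in lines["speed"]])
print([t.title for t in lines["normal"]])
```

Requesters still use a single entry point; only the execution is split, so small blocking requests do not wait behind large ones and nobody needs to be interrupted directly.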

Negative effect of unique competence

All companies have people who are “the specialist”, to whom we turn with complicated problems.

In the long run this person might end up being involved in many key activities and thereby turn into a bottleneck.

To handle this situation we need to free up this resource, so that there is time both to execute the tasks that require the unique competence and to train other team members in this knowledge.

A bit of a background

Last year I attended a training course in France, learning how to think with clear, validated logic. I now have a method for knowing whether I am changing the right thing.

I also learned how to share the logical analysis in a visual way that anyone can understand. There is power in seeing (a picture is worth a thousand words, they say).

Change is difficult; often only the one proposing a change wants it. To bring others with you, the following questions must be answered in a logical way:

WHY change?

WHAT to change?

What to change TO?

HOW to cause the change?

Even this might not be enough to bring people with you: you must also make sure that people feel safe or see something good for themselves in the change.

Process of change

If you answer the four questions and take it step by step, you have a high probability of success.

Before starting a change, you need to know the boundaries of what you control. I call this the system.

You can only change the things you control. Around that there are areas that you can influence, and outside of that are all the things you just have to accept and adjust to.

A simplified view of this is that you live in a flat:

  • You control what happens in the flat and what it looks like
  • You can influence your neighbours
  • You have no control over the weather outside

When you make changes, make them to the things you control.

Why change?

There are many reasons for changing, but they can be summed up in one sentence: we have a constraint.

The constraint is one of two things:

  • internal constraint
    • we have something in our system (IT Operations) that is not producing with the speed needed
    • we do not have the needed money
    • we are not selling enough (market can consume more than we sell)
  • external constraint
    • our market segment is not interested in our services or products
    • we can produce more than market can consume

We need to find out where in IT Operations we have our constraint.

What to change?

Let’s look at an example: we want to shorten the time it takes to get to work.

To improve our get-to-work system we need to find the constraint. In a system there is always one and only one constraint at any point in time, and we always want to focus on solving the constraint, as it has the biggest impact on the system.

If we take our get-to-work system, line up the process and add timings based on normal variation, with a 50% likelihood of success, we get:

Action | Distance (km) | Time (minutes) | Speed (km/min) | % time spent | % travel
Walk to the train | 1.5 | 15 | 0.1 | 24 | 5
Wait for the train | 0 | 6 | - | 10 | 0
Go by train to next stop | 20 | 15 | 1.33 | 24 | 47
Wait for the train | 0 | 6 | - | 10 | 0
Go by train | 10 | 15 | 0.67 | 24 | 47
Walk to office | 0.5 | 5 | 0.1 | 8 | 16

From this table, we can clearly see that waiting for the train is the constraint. But if we assume that we need to stick to this route and focus on the steps that are actual travel, we can see that the constraint is walking to the train.
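
The time and speed figures for the commute can be recomputed with a few lines of code. The distances and minutes are the ones given for each step; the rule used here to pick the constraint among the travel steps (lowest speed, and the longer step if speeds are equal) is my own simplification.

```python
# (action, distance in km, time in minutes) for each step of the commute
steps = [
    ("Walk to the train", 1.5, 15),
    ("Wait for the train", 0.0, 6),
    ("Go by train to next stop", 20.0, 15),
    ("Wait for the train", 0.0, 6),
    ("Go by train", 10.0, 15),
    ("Walk to office", 0.5, 5),
]

total_minutes = sum(minutes for _, _, minutes in steps)


def time_share(minutes):
    """Share of the total commute time, as a whole percentage."""
    return round(100 * minutes / total_minutes)


# Only the steps where we actually move, with their speed in km per minute.
travel = [(action, km / minutes, minutes) for action, km, minutes in steps if km > 0]

# Assumed rule: the slowest travel step is the constraint; ties go to the longer step.
constraint = min(travel, key=lambda step: (round(step[1], 2), -step[2]))

print(total_minutes)   # 62
print(time_share(15))  # 24
print(constraint[0])   # Walk to the train
```

Both walks are equally slow at 0.1 km per minute, but the walk to the train takes three times as long, which is why it is the step worth attacking first.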

What to change to?

We continue with our get-to-work system, and must find injections: alternative ways of delivering the same result.

Our constraint is walking to the train. We can come up with alternatives: we can run, use a kick scooter, drive a car or ride a bicycle. We can analyse them all to find the best solution, which is likely to be a bicycle.

We make some calculations and choose to go by bicycle.

How to cause the change?

We have decided to use a bicycle to get to the station. How shall we make this happen?

It’s time to identify how we will do this, what negative effects we will get, and how we shall mitigate them.

I am not talking about a project plan. It’s a high-level plan that also analyses the negative effects of using a bicycle to get to the train. This is what a product manager would do when planning to launch a new service: they need a high-level plan to understand what is needed and what obstacles lie on the road to it.

When this is done, we have a baseline for starting the implementation, and at that point we create the project plan with the small details.

Maximize throughput in IT Operations

We analysed “why change?” in chapter 3 and identified injections for each constraint. In doing so we also identified “what to change?” and suggested how to make the change.

The “how to” requires some more work!

Here is the table of suggested changes and their expected results:

Injection Number (INx) | UDE/Interference | Injection | Desired Effect
IN1 | Allocated project resource is not available when the project needs them | Assign work to team | Resources are available to execute the request in time
IN2 | Allocated resources are overloaded | Assign work to team | Balance the work over the team
IN3 | Individual responsibility for the delivery to a project | Assign work to team | Team takes responsibility for the delivery and will offload individual pressure
IN4 | Staff is focusing on their own work | Assign work to team | Stimulate team cooperation
IN5 | Staff is multitasking | All tasks shall come into one queue | WIP goes down; staff is less stressed; lead time goes down; quality goes up; time is freed up for team-mate support; time is freed up for the team to continuously improve its way of working
IN6 | Unclear what task is the most important | Unified prioritisation model | It shall always be clear what task is the most important
IN7 | Due date is set to as early as possible and not to when delivery is needed | Set due date to when the task must be finished | Work is done when it is needed to be delivered
IN8 | Tasks have a huge lead time buffer, added by both estimator and project manager | Estimate tasks to a 50% probability of success and add a time buffer based on this | Forces the assignee to focus on the delivery; removes waste
IN9 | Small tickets have a long lead time and generate delays for the requester | One ticket queue, two execution pipelines: speed line and normal line | Prevent interrupts of staff members; responsive
IN10 | Tickets are bouncing between teams | Use the speed line for assistance from other teams; locate the speed line resources from all teams in the same location; stand up and ask for assistance, join forces | Tickets are solved fast and do not bounce between teams
IN11 | Urgent complex incidents are causing the “no error here” syndrome | No injection yet | Work together as a task force to quickly isolate the cause and solve the incident

One way of verifying that the injections lead to improved throughput is to use a whiteboard with post-it notes. We will get a tree looking something like this.

It’s important both to validate that the injections lead to the expected result and to identify all the negative effects they can also cause, and analyse whether these can be mitigated.

The tree is a cause and effect tree. At the bottom we have the identified undesired effects. We run a cause and effect analysis with our injections and see whether they lead to improved throughput.

We also catch any obstacles on the way and identify how we can mitigate them.

The tree above is not a complete cause and effect analysis. I have slimmed it down, so it now has some long jumps, as the tree otherwise gets too big to fit into this article.

But it is here to visualize how to do the analysis in practice.

Work In Progress determines the load on the staff in your organisation

Work In Progress controls the load on the team, and with a good balance we can stop multitasking.

WIP is more than task management. WIP can also be used to avoid overloading the IT organisation as a whole, by controlling when a new project can start. This is based on what the Theory of Constraints calls the drum, which is the pace at which the organisation can consume projects. The drum can, for example, be a specific part of the organisation.

In general, limiting WIP will stop your staff from being overloaded, and as a result they will deliver on time and they will deliver more.
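
One way to put numbers on the effect of WIP is Little's Law, a standard result from queueing theory that is not part of the analysis in this article: for a stable system, average lead time equals average WIP divided by average throughput. The figures below are invented, and the sketch assumes throughput stays constant when WIP is reduced, which is a conservative assumption since less multitasking should rather improve it.

```python
def average_lead_time(wip, throughput_per_week):
    """Little's Law: average lead time (weeks) = average WIP / average throughput."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_week


# Hypothetical team that finishes 10 tasks per week.
print(average_lead_time(wip=40, throughput_per_week=10))  # 4.0 weeks
print(average_lead_time(wip=10, throughput_per_week=10))  # 1.0 week
```

With the same staff and the same throughput, the only thing that changed between the two lines is how much work was started at the same time.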

Control WIP in your team: allocate work to teams instead of individuals

The classic assignment methodology is to assign resources to projects, as described earlier in this article. This causes peaks at the individual level, because projects do not coordinate their activities with other projects or across resources.

As team members are assigned to projects, they work individually, which does not stimulate team cooperation and shared responsibility.

Another phenomenon is that the workload in the team is not balanced. This can be solved very easily by breaking up the project assignments into small tasks that can be distributed over the team.

Many project managers have their favourite people and always request them. This is good for the individual, as it is nice to be requested (it feels good), but it is not good for the overall knowledge base of the team.

We also have the project issue that assigned resources are sick or absent at times when tasks need to be delivered.

To solve the negative effects above, and the others described earlier in this article, tasks must be assigned to the team.

The incoming tasks require different competences and competence levels; some tasks require resources with deep or long experience to be solved quickly.

We can group tasks into three groups:

  1. Tasks that the whole team shall be able to do
  2. Tasks that the whole team shall be able to do and that require less than 10 minutes to solve
  3. Tasks that need special knowledge, which few in the group possess

It is important that the team as a whole grows in experience and competence level, so that more tasks fall into the first category.

We want to keep WIP for each resource in the team low, preferably with only one task assigned to each person at any time, to keep focus.

Be aware that many people want to have many tasks on their table to feel important, so watch out and help your staff focus on one task at a time.

Breaking up the assignments into smaller tasks that can be consumed by the whole team will increase the delivery capacity, increase the reliability of meeting due dates and balance the workload better within the team.
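
A team queue with a WIP limit can be sketched as a pull system: a person only gets a new task from the shared queue when they have free capacity. The function, the limit of one task per person and the example names are assumptions made for illustration.

```python
from collections import deque

WIP_LIMIT = 1  # assumed: preferably one task per person at any time


def pull_work(team_queue, in_progress, wip_limit=WIP_LIMIT):
    """Hand the next tasks from the shared team queue to people with free capacity.

    team_queue  -- deque of task names, most urgent first (it is consumed)
    in_progress -- dict mapping person to the list of tasks they are working on
    """
    for tasks in in_progress.values():
        while len(tasks) < wip_limit and team_queue:
            tasks.append(team_queue.popleft())
    return in_progress


# Invented example: Anna is busy, Ben and Cara are free.
team_queue = deque(["Patch web servers", "Extend storage volume", "Renew certificate"])
in_progress = {"Anna": ["Investigate incident"], "Ben": [], "Cara": []}

pull_work(team_queue, in_progress)
print(in_progress)
print(list(team_queue))  # work that waits until someone is free
```

The third task stays in the team queue instead of landing on top of somebody's current work, so nobody is forced to multitask.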

In all teams there are specialists with knowledge or experience that few others have. It is important that their team mates can offload these resources, so that they do not end up as constraints (bottlenecks) and can focus on solving the complicated tasks and helping the team with them: category 3 in the list of task types.

Scrum product owners break down stories into tasks for IT Operations

The title assumes that IT Development is using the agile methodology Scrum, but the concept is the same if you use waterfall or most other project methodologies.

I will use the term product owner; you can replace it with project manager if you have waterfall-based projects.

The point is that IT Operations needs to help the product owners break down the epics into stories and the stories into tasks for IT Operations.

The best approach is to create stories for your team and then break them up into tasks that are clear and concise in terms of when they need to be done and what the criteria for “done” are.

When creating the tasks, one very important thing is to offload the specialists in the team so they can focus on the complex parts. If possible, break up the tasks into smaller tasks just to offload the specialists.

It is also important that IT Operations feeds back information on predicted resource availability to the product owners, so that they can adjust their planning of the next sprint based on available resources.

To be able to do this, you need to have resources participating in the planning meeting held by the product owner. Be careful when assigning this work: you need to select a resource who is skilled enough to understand what the project needs and who is able to define the stories in a way that the team can break down.

It might be that the infrastructure architect in your team is the prime person to participate in this meeting and to break up the stories into tasks for the team.

To do a good job here, we need to know which specialist competences we have in the team and who possesses the knowledge.

Create a competence matrix to help during planning.
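
Such a matrix can be as simple as a mapping from person to competences. The sketch below uses invented names and skills; it answers the two planning questions "who can do this?" and "which competences are held by only one person?".

```python
# Hypothetical competence matrix: person -> set of competences.
competence = {
    "Anna": {"network", "firewall"},
    "Ben": {"database"},
    "Cara": {"network", "database", "storage"},
}


def who_can(skill, matrix=competence):
    """Return everyone in the team who has the given competence."""
    return sorted(person for person, skills in matrix.items() if skill in skills)


def single_points_of_knowledge(matrix=competence):
    """Return the competences held by exactly one person, with that person."""
    all_skills = set().union(*matrix.values())
    return {
        skill: who_can(skill, matrix)[0]
        for skill in sorted(all_skills)
        if len(who_can(skill, matrix)) == 1
    }


print(who_can("network"))
print(single_points_of_knowledge())
```

The competences that come out of the second function are the ones where a specialist risks becoming a bottleneck, and where training a second person gives the team the most relief.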

Multiple scrum team interdependency scenario

Why should projects want to work with us as a team instead of having dedicated resources?

A project manager always wants their own resources, so that they feel that they are in control.

The only way to control a resource is to have that resource 100% of the time and to be able to keep that resource busy.

In all other situations, the resource is working on other things and the project manager is not in control.

Many IT Development departments have adopted or are adopting agile methodologies, and Scrum is the most popular at the time of writing.

When a scrum team is created, it is staffed with the resources that are expected to be used throughout the project’s life cycle, and it almost never contains all the competences the project needs.

Many times there are also generic scrum teams that focus on a competence area. One example is a scrum team doing frontend development that is utilized by many other scrum teams.

In the case of multiple scrum teams, there are always dependencies and communication between the teams.

We can hand over work to the frontend scrum team in two ways: give them a task ad hoc, or give them a story that they bring into their sprint.

As they are a scrum team, they will prefer the second option: getting stories into their backlog and breaking them down into smaller stories if needed, so they can be consumed. Together with their product owner and the other team’s product owner, they agree on the prioritisation of the stories and on which of them will go into the next sprint.

Resource allocation

I talked earlier about resource allocation planning. You must still continue with this, but move from an individual perspective to a team perspective. The planning is needed to get an overview of the expected load on the team, which makes it easier to plan internal projects.

For some companies this will cause a problem, as manning sheets are one of the key policies driving the measurement of team performance, and they are a method for visualizing that the team needs more resources and is thereby allowed to grow.

But remember, “the way you measure will reflect in how you behave”, so make sure the policies you have and the metrics you use are helping your organisation perform and not preventing it from performing well.

When you participate in the scrum planning meeting, you must have a feeling for the load on your team and be able to predict when your team can deliver. During some periods there is a higher load on your team, and this has to be communicated to the product owners so they can take it into account. Product owners can often shift the focus and order of the sprint content a little, letting their main scrum team work on other things so that your team can balance out the work and deliver with quality.

Plan and prioritize maintenance tasks in the same way as project tasks

Team projects and maintenance tasks usually end up with the lowest priority, and are done when nothing else is requested.

This results in long lead times, and the work is often done a lot later than it is needed.

To sort this out, these tasks must be planned and prioritised with the same method as other tasks, so they get done when they are needed.

Single task queue, with speed lane

Take control of the incoming requests, put all requests into the same queue, the team queue!

Your staff get flooded by requests from various channels. Below you see the most common:

  • ITIL process related tasks like incidents, change, problem, service requests
  • Project tasks
  • Assisting a colleague solving a problem
  • Responding to on-demand requests via chat, mail and phone
  • Doing a favour for “friends”

To be able to deliver what you have promised the projects you must put all requests into one single queue.

To be able to stop multitasking, you must stop people interrupting you. Make sure that all tasks are registered in your team queue instead of being requested through direct contact with you.

We need a single queue, where we can sort, group and prioritise the work.

We can group the tasks into three groups:

  • Standard tasks that all in the team can work on
  • Specialist tasks, that one or few in the team can work on
  • Small tasks that take 1 to 10 minutes to solve

The small tasks can be time-critical for the requester, and today they are usually requested through direct contact with a person the requester knows. It might be to restart an application, to assist a project with checking a configuration setting in an application server, or to reset a password.

We also have the same need within IT Operations: a team needs assistance from another team, with a quick response to solve a task.

If we start separating out the quick tasks from other tasks, we can put them in a FIFO queue, First In First Out prioritization order, to be executed by dedicated resources that will watch this queue.

The rest of the requests should end up in their own queue, which we sort in priority order based on due date.

In the picture below the queues are referred to as pipelines, but as we are an ITIL-based organization we will be using some task system, and it is likely easier to have two or three queues.

Tasks that require specialist competence should be prioritized the same way as all the other tasks, but have to be treated specially when it comes to planning the execution, as they can only be done by a few resources.
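The split described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Task` fields and the `sort_incoming` function are hypothetical names, and the 10-minute limit is taken from the definition of small tasks above.

```python
from collections import deque
from dataclasses import dataclass
from datetime import date

EXPRESS_LIMIT_MINUTES = 10  # small tasks take 1 to 10 minutes to solve


@dataclass
class Task:
    title: str
    estimated_minutes: int
    due: date


def sort_incoming(tasks):
    """Split incoming tasks into an express FIFO queue and a standard queue.

    The express queue keeps arrival order (First In First Out).
    The standard queue is sorted with the earliest due date first.
    """
    express = deque()
    standard = []
    for task in tasks:
        if task.estimated_minutes <= EXPRESS_LIMIT_MINUTES:
            express.append(task)
        else:
            standard.append(task)
    standard.sort(key=lambda t: t.due)
    return express, standard
```

The person watching the express queue takes tasks with `express.popleft()`, while the rest of the team works the standard list from the top.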

How many pipelines / queues can a team have?

A team should have one incoming queue and then break it up into as many task pipelines as needed.

The task pipelines are internal queues, and depending on what kind of task load you have, you might have to assign a different number of people to each queue.

For example, if there are many tasks in the speed pipeline (the express queue), you might have to move more of your staff to that queue, as long as you do not jeopardise any due dates in the queue you moved the resources from.

 

Here is one example with three queues

We have the team’s incoming queue:

  • We separate out the tasks for the express queue
  • We sort the rest of the tasks based on due date
    • We separate tasks into specialist queue and the generic queue

 

The express queue is staffed based on the volume in the express queue, but in balance with the standard queue.

If there are urgent tasks in the standard queue, like the red task, we have to move some of the staff from the express queue to the standard queue.

The specialists focus on the tasks in the specialist queue as long as there is no red or yellow task in the standard queue.

The team’s focus is to secure that all tasks remain green and are finished before their due dates, which means that staff have to move around to optimize throughput based on need.

Set uniform task priorities

We have two phases of the lifecycle of a task:

  1. Waiting for execution
  2. Executing the task until it is done

During phase 1, “Waiting for execution”, the later we start, the higher the risk of not finishing in time.

We can express this as how much buffer we have before we have to start working on the task: the less buffer, the more urgent it is to get started.

Buffer = “days left to due date” – “lead time”

During phase 2, “Executing the task until it is done”, the risk is defined by how little time is left in relation to the amount of work left.

We can express this as how much buffer we have before we are late.

Buffer = “time left to due date” – “remaining lead time”

As long as Buffer remains positive we are on the safe side.
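The two buffer formulas are simple subtractions, which a small Python sketch makes concrete. The function names are my own, and I assume all values are given in days.

```python
def buffer_before_start(days_left_to_due_date, lead_time):
    """Phase 1: how many days we can wait before work has to start."""
    return days_left_to_due_date - lead_time


def buffer_during_execution(time_left_to_due_date, remaining_lead_time):
    """Phase 2: how many days of margin are left before the task is late."""
    return time_left_to_due_date - remaining_lead_time
```

For example, a task due in 10 days with a lead time of 4 days has a buffer of 6 days, so there is no rush to start. If the same task is in progress with 3 days left to the due date and 4 days of work remaining, the buffer is -1 day and the task is already in trouble.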

We can create a scale to help us define the priority. It could look like this.

I have not defined a proper formula for mapping buffer to the scale, so here is something you can contribute to!

I am using a Kanban board to visualize the setup.

We have one queue of work, and it is prioritised based on time left in buffer of the task, before we have to start working on it.

Buffer = “days left to due date” – “lead time”

The less buffer the higher priority.

I have selected a 3-graded scale plus a VIP priority that overrides all other activity. Use it for major incidents and other things that have a huge impact on your company.

When we have moved the task to “Work in progress”, we need to track how it goes. If the task turns out to be more difficult, we might need more time than estimated to finish it.

The task should then increase in priority to reflect that we have an issue here that needs more attention.

The lead time is estimated based on a 50% probability for success, but as we added a small buffer, we have a small margin that will handle problems.

The less buffer the higher priority.

Buffer = “time left to due date” – “remaining lead time”
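As I have not defined a formula for mapping buffer to the scale, the Python sketch below is only one possible starting point and not a recommendation. It assumes the colours green, yellow and red for the 3-graded scale and black for VIP, and the threshold `YELLOW_FRACTION` is purely hypothetical: a task turns yellow when its buffer is less than 25% of its lead time, and red when the buffer is negative.

```python
from dataclasses import dataclass

# Hypothetical threshold, not defined in the article: a task turns yellow
# when its buffer is smaller than this fraction of its lead time.
YELLOW_FRACTION = 0.25


@dataclass
class Task:
    title: str
    days_to_due: float
    lead_time_days: float  # remaining lead time once work has started
    vip: bool = False

    @property
    def buffer(self):
        """Buffer = time left to due date - (remaining) lead time."""
        return self.days_to_due - self.lead_time_days


def priority(task):
    """Map a task to black (VIP), red, yellow or green."""
    if task.vip:
        return "black"
    if task.buffer < 0:
        return "red"
    if task.buffer < YELLOW_FRACTION * task.lead_time_days:
        return "yellow"
    return "green"


def prioritised(tasks):
    """Order the queue: VIP tasks first, then the less buffer the higher priority."""
    return sorted(tasks, key=lambda t: (not t.vip, t.buffer))
```

The same function works in both phases of the task lifecycle: before the start you pass the full lead time, and during execution you pass the remaining lead time, so a task that turns out to be harder than estimated climbs in priority by itself.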

Variations, can we control them?

In IT Operations we are focusing on standardisation, and have been doing this for years to cut cost and lead time.

Even with the best standardisation you can come up with, there are variations in the system that you cannot control.

If I ask you how long it will take you to get to work, you will answer “about x minutes”. The word “about” refers to the variations in the system; in this case the system is what takes you from home to the office. So even if you have standardised how you get to work, there are variations. The variations are anything that happens along the way: there might be a traffic jam, an electricity outage or a failure in the signal system for your local train.

So instead of trying to get rid of all the variations, we must focus on managing them. You need a buffer in your system that lets all parts flow at maximum speed.

This is simplest to explain with an assembly line with two stations, which we name A and B. A can assemble 10 units and B can assemble 5 units. B is our constraint, so to optimize our capacity we need to secure that B is always working at its maximum capacity of 5 units. This is managed by adding a buffer in front of B. A can produce 10 units but B can only consume 5, so we cannot let A go at full speed, but we can let A produce at a speed that secures that we have, for example, 5 units in the buffer. If A breaks down, B will continue working, and when A starts up again it can go at full capacity until the buffer is rebuilt.

If you have this kind of process step dependency, try to implement the buffer technique, as it will help you balance the work.
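The assembly line example can be played through with a small simulation. The Python sketch below is a simplified model under my own assumptions: time moves in discrete steps, in each step A produces first and B consumes afterwards, and A only produces what is needed to leave the buffer at its target once B has consumed. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(a_capacity, b_capacity, buffer_target, a_running):
    """Simulate stations A and B with a buffer in front of B.

    a_running holds one boolean per time step, True when A is up.
    Returns (units produced by A per step, units completed by B per step).
    """
    buffer = buffer_target  # start with a full buffer
    a_output, b_output = [], []
    for running in a_running:
        produced = 0
        if running:
            # A is throttled: it only produces what is needed to leave the
            # buffer at its target after B has consumed in this step.
            produced = min(a_capacity, buffer_target + b_capacity - buffer)
        buffer += produced
        consumed = min(b_capacity, buffer)  # B can never exceed its capacity
        buffer -= consumed
        a_output.append(produced)
        b_output.append(consumed)
    return a_output, b_output
```

With A at 10 units, B at 5 units, a buffer of 5 units and A down in the second step, `simulate(10, 5, 5, [True, False, True, True])` gives A an output of 5, 0, 10, 5 and B a steady 5, 5, 5, 5. Without the buffer (`buffer_target=0`), B produces nothing in the step where A is down, and that lost capacity at the constraint cannot be recovered.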

Wrapping it up

The focus in this article has been on how we can improve throughput by understanding what things prevent us from maximizing our throughput.

For implementation, I have a few generic suggestions on roles and the flow of work, they are summed up in this chapter.

Plan and prioritize the team’s task queues

Sort new tasks into FIFO and standard task queue.

Check if task needs specialist competence.

Plan and follow up so specialist resources are available to handle the specialist tasks.

Remember to free the specialist resources up from other tasks so they do not end up as bottlenecks.

Role: Task work planner

Description:

  • Group incoming tasks into work queues
  • Prioritize the new task and the queue
  • Responsible for making sure tasks are started in time for execution
  • If a specialist is needed, make sure they are freed up to work on the task in time

Plan incoming work with projects

This work requires technical expertise as well as a holistic understanding of the need. I would imagine that the work is performed by the technical architect/s of the team in cooperation with the “Task work planner” and the team.

Projects need assistance in breaking down epics into stories and in creating stories for your team. Help the project plan the execution of the stories in a way that lets your team deliver them when the result is needed by other stories.

Stories need to be broken down into tasks for the team and planned with correct due dates and an estimated lead time based on a 50% chance for success.

The due date must be adjusted to include a buffer of 10-25% of the lead time to secure that we are not delaying the project.

Projects usually have planning meetings one to four times during a sprint to plan ahead. To be able to handle all projects, it is important to work with the product owners so that we attend during the periods in the project when IT Operations is needed.
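Here is one way this arithmetic could look in Python. It rests on my own interpretation, so treat it as a sketch: the task’s due date is set earlier than the date the project needs the result, by a buffer of 10-25% of the lead time rounded up to whole days, and the latest start is the due date minus the lead time. Calendar days are used, so weekends and holidays are ignored. The function name `plan_task` is hypothetical.

```python
import math
from datetime import date, timedelta


def plan_task(needed_by, lead_time_days, buffer_fraction=0.25):
    """Return (due_date, latest_start) for a task the project needs by needed_by.

    buffer_fraction is the share of the lead time used as buffer (0.10 to 0.25).
    """
    buffer_days = math.ceil(lead_time_days * buffer_fraction)
    due_date = needed_by - timedelta(days=buffer_days)
    latest_start = due_date - timedelta(days=lead_time_days)
    return due_date, latest_start
```

For a result needed on 29 March 2024 with a lead time of 8 days and a 25% buffer, the buffer is 2 days, the due date becomes 27 March and the task must start on 19 March at the latest.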

Prioritise ongoing work

When work has started, it is important to keep track of how it goes.

This can be done as a morning stand-up, similar to how scrum teams have their stand-up meetings.

As an alternative, this can be handled by the assignee of the task, as part of their responsibility to update the status of the task and, if the task has a problem, change its priority based on the prioritisation model.

Remember this formula:

Buffer = “time left to due date” – “remaining lead time”

When the buffer turns negative we are at risk of being late, and the more negative it gets, the worse the situation is.

I do recommend some kind of daily stand-up, to stimulate team cooperation, and knowledge sharing.

WIP Leader

When changing the way we are working, we need to help our staff with minimizing WIP (Work In Progress).

The WIP leader can help the staff do this as well as guard them, by educating requesters who are not using the single queue for their requests in how they shall work with us.

This is a journey; respect that changing behaviour takes time.

Role: WIP Leader

Description: Guard the staff from being interrupted and help the staff with minimizing WIP

Speed pipeline assignee

Dedicate people to work on the speed pipeline queue, it might be one or more depending on the load.

The work focuses on quick turnarounds, and if a task turns out to be more complicated than the estimated 10 minutes or less, it should be reassigned to the standard queue.

This role should have a minimum of two people assigned over a day, so there is always one person working on the queue.

I also recommend that this role monitors for incoming VIP tasks such as severe incidents, starts the initial work and then hands it over to a resource working on the standard queue.

Role: Daily on-call

Description: Responsible for executing speed pipeline tasks and for being the team’s first contact when a severe incident happens.

Incident management

Incidents are no different from any other tasks; they shall be prioritised based on the needed due date and in relation to all other tasks.

I hear you saying: “But now!”

If you agree with me that maximizing throughput is one of our three critical success factors, then all tasks are equal, and it is only a matter of when each one needs to be delivered.

Not all incidents have to be solved “now”; some can wait.

If an incident has high impact, it is a VIP task with a due date of now.

Picking a new task

When you are done with a task and are about to start on the next one, you shall always pick the task from the top of the list.

But before you do, check how your colleagues are doing. If they are working on a task that is black, red or yellow in its priority, check if you can assist in any way to solve their task.

Remember that you are part of the team and the team is responsible for delivering in time. If your team has any ongoing tasks that are at risk of delay, help out if you can.
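The rule for picking the next task can be written down as a small decision function. The Python sketch below is only an illustration: the dictionary layout of an ongoing task, the `can_help` callback and the ordering of black before red before yellow are my own assumptions.

```python
# Lower number = more urgent. Green tasks are not at risk and are ignored.
SEVERITY = {"black": 0, "red": 1, "yellow": 2}


def pick_next(queue, colleagues_wip, can_help=lambda task: True):
    """Decide what to do when you are done with a task.

    queue: task titles, already sorted with the highest priority first.
    colleagues_wip: ongoing tasks as dicts with "title" and "colour" keys.
    can_help: hypothetical callback telling whether you can assist with a task.
    """
    at_risk = sorted(
        (task for task in colleagues_wip if task["colour"] in SEVERITY),
        key=lambda task: SEVERITY[task["colour"]],
    )
    for task in at_risk:
        if can_help(task):
            return ("help", task["title"])
    if queue:
        return ("pick", queue[0])  # always take the task from the top of the list
    return ("idle", None)
```

If a colleague is working on a red task and you can contribute, the function tells you to help there first; otherwise you take the top task from the team queue.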

Try it out

Give it a try, and if you need help, reach out, I might be able to guide you.

4 thoughts on “IT Operations Getting 50% more work done, with same staff”

  1. This is great stuff. It is so simple and makes a ton of sense. Are there any organizations that you can point to that have adopted many or all of your recommendations? In theory it is great stuff, but “the proof is in the pudding”.

    Thanks for a great article!

    John

    1. Hi John Roza,
      I can not offer you the pudding yet.

      But if you take the express line and implement only that, I know it will improve your company’s total throughput and free up your company’s operational expenses to be used in producing more of what you do.

      And the express line should be very easy to get acceptance for on all levels of your organization,
      as it is very easy to draw the parallel to a grocery store, but it is also very easy to justify by just looking at all the waste you are preventing by having fewer projects and resources in wait mode for small blocking things to be solved.

      But if you are interested in going the whole way, let me know, and we can have a talk.

      Regards Max

  2. Hi Max. Excellent article!!! Many thanks to put all of this together in a simple way.

    If you allow me I would like to translate this to Portuguese and post here in Brazil (with the proper credits to you). What do you think?

    Best Regards, Ricardo
