Incident Management Process

Incident management processes (for production issues) are usually born out of desperation. After a few crises, out-of-sync stats, and unhappy users, the engineering team decides that a process for handling incidents is a must.

The basic version usually goes like this:

A production issue is reported
A user files a ticket
The ticket is assigned to the engineering team
An owner is assigned
The root cause (or temporary fix) is identified
A fix is ready and deployed
Users are happy again

Of course, most of the time it is not a real issue. The user may have been doing something wrong, the server or the databases may have been temporarily down, or a deployment may have been in progress.
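
The non-issue causes listed above can be checked before anyone starts debugging. Here is a minimal sketch in Python. The `Report` fields and the `triage` function are hypothetical names, and treating "an engineer cannot reproduce it" as a sign of user error is an assumption made for illustration, not a rule from any tool.

```python
from dataclasses import dataclass
from enum import Enum


class TriageResult(Enum):
    USER_ERROR = "user error"
    TEMPORARY_DOWNTIME = "temporary downtime"
    DEPLOYMENT_IN_PROGRESS = "deployment in progress"
    REAL_ISSUE = "real issue"


@dataclass
class Report:
    # Hypothetical fields; fill them in from your own monitoring and deploy logs.
    reproducible_by_engineer: bool
    server_or_db_was_down: bool
    deployment_was_running: bool


def triage(report: Report) -> TriageResult:
    """Rule out the common non-issues before treating a report as an incident."""
    if report.deployment_was_running:
        return TriageResult.DEPLOYMENT_IN_PROGRESS
    if report.server_or_db_was_down:
        return TriageResult.TEMPORARY_DOWNTIME
    # Assumption: if the team cannot reproduce it, suspect user error first.
    if not report.reproducible_by_engineer:
        return TriageResult.USER_ERROR
    return TriageResult.REAL_ISSUE
```

The order of the checks is a design choice: deployments and downtime are cheap to verify from logs, so they come before the more expensive attempt to reproduce the problem.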

When it is a real issue, most of the time it’s not worth working on a definitive fix. A definitive fix usually means scaling, redesigning, refactoring, or upgrading the application, which is outside the scope of an incident.

Update your incident management process and add one step at the beginning:

1) Is this really an issue?

And one step at the end:

2) Is this issue a design or scale problem?
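
The final step could be sketched like this in Python. The `RootCause` categories and the `close_incident` function are hypothetical, and the returned strings only illustrate the idea that design and scale problems leave the incident with a temporary fix and continue as separate work.

```python
from enum import Enum


class RootCause(Enum):
    BUG = "bug"
    DESIGN = "design"
    SCALE = "scale"


def close_incident(root_cause: RootCause) -> str:
    """Decide how an incident ends, based on the answer to the last step."""
    if root_cause in (RootCause.DESIGN, RootCause.SCALE):
        # Redesigning or scaling is outside the scope of an incident.
        return "close with temporary fix and open follow-up work"
    return "close with definitive fix"
```

For example, `close_incident(RootCause.SCALE)` closes the incident with the temporary fix and hands the scaling work to a separate project, while a plain bug is fixed and closed inside the incident.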

Author

Leo Celis

Founder & CEO at InTheValley

I help startups fix engineering teams that should be moving faster. If you're scaling a startup, you've probably felt the pain: great people on paper, but execution feels slow. I've been building remote teams for startups since 2005 — engineers you can trust who actually deliver and know how to leverage AI to ship faster.
