Automating Incident & Problem management – Part III: Using Orchestrator to remediate the issue (Counter edition)

After setting up all the prerequisites from Part I and Part II, it is time to create the Orchestrator runbooks that will provide the auto-remediation for active incidents and problems:

Incident Management Automation: This runbook is triggered by the SQL-related incidents that SCOM alerts auto-generate in SCSM. The “fix” for the underlying issue, covered in Part I, is enclosed in a PowerShell script activity. Once the fix has run, an email is sent to the incident owner, notifying him that an issue was detected and that Orchestrator fixed it.

Problem Management Automation: The counters in this runbook will keep track of the number of instances for which an incident was created. If the same incident is triggered and fixed 5 times, a problem will automatically be generated and the runbook will attach any existing and future similar incidents to it. This goes by the principle that a recurring issue should be identified as a problem, not as an incident. Once the problem is marked as “Resolved” or “Closed”, the counter is reset.

SCSM Incident automation, SCSM Problem Automation

Additionally, from Orchestrator’s perspective, this post will also demonstrate why “Counters” are not a very practical solution in most situations, mainly because they are global and not runbook-specific. This idea is picked up again in Part IV, where a more advanced replacement for counters will be implemented.

Before starting the main runbook, besides completing Part I and Part II and installing the IPs, three prerequisites need to be prepared. These include creating two child-runbooks, which might seem out-of-context at this point, but will make sense once they are integrated in the main runbook.

required Integration Packs:

Prerequisite #1 – Creating “IncidentInstance” counter

A new counter needs to be created for this activity. Call it “IncidentInstance”, or another name that reminds you it keeps track of how many times the same incident was generated. I will not go through defining a counter again, but you can find something similar in this post. Set the default value to “0”.

Prerequisite #2 – Child Runbook: Get Owner Email

Since the incident template defined in Part II defines only the owner, this runbook extracts the Active Directory user identified as the owner of the active incident and returns his email address.

Child Runbook

To start, create a new runbook and name it “Get Owner Email”. Open the runbook properties, go to the “Return Data” tab and add the parameter “Owner email”.

Initialize Data

Define one string parameter and name it “Incident GUID”.

Runbook initialization

Get AD User Relationship

With the incident GUID from “Initialize Data”, use “Get-Relationship” activity to identify the associated Active Directory User.

orchestrator get object

Get Owner

Use “Get-Object” to extract the user defined within the incident. You can do this by finding a “user” object that matches: {Related Object GUID from “Get AD User Relationship”}.

get object active directory user

Get User

The next step is getting additional info about this user. To do this, use the “Get-User” activity from the Active Directory IP and match on the “Display Name”.

get object filter

Return Data

For the “Owner email” parameter, use the published data to fill in the value {Email from “Get User”}

runbook result

Check-in the “Get Owner Email” runbook.
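The lookup chain used by this child runbook can be sketched in plain PowerShell as a series of hashtable lookups. This is only an in-memory model: the sample GUIDs, the user data and the function name Get-OwnerEmail are hypothetical, and no SCSM or Active Directory cmdlets are called.

```powershell
# In-memory stand-ins for SCSM and Active Directory data (all values are made up).
$relationships = @{ 'incident-guid-1' = 'user-guid-1' }   # incident GUID -> related user GUID
$scsmUsers     = @{ 'user-guid-1' = @{ DisplayName = 'Jane Doe' } }
$adUsers       = @{ 'Jane Doe' = @{ Email = 'jane.doe@example.com' } }

function Get-OwnerEmail {
    param([Parameter(Mandatory)][string]$IncidentGuid)

    # Step 1, "Get AD User Relationship": find the user object related to the incident.
    $userGuid = $relationships[$IncidentGuid]
    if (-not $userGuid) { return $null }

    # Step 2, "Get Owner": read the user object to obtain its display name.
    $displayName = $scsmUsers[$userGuid].DisplayName

    # Step 3, "Get User": match the display name in Active Directory.
    # Step 4, "Return Data": hand back the email address.
    return $adUsers[$displayName].Email
}
```

Note that display names are not guaranteed to be unique in Active Directory, so in a real environment a match on display name could return more than one user.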

Prerequisite #3 – Child Runbook: Monitor Problem

When the same incident is triggered 5 times and a problem is created, this runbook will start running, waiting for the problem to be “Resolved” or “Closed”. Once the problem is no longer active, it will reset the “IncidentInstance” counter.


orchestrator child runbook

Initialize Data

Define two string parameters, one for the “IncidentName” and one for the “Problem ID”.

runbook initialization

Get-Object

Configure a Get-Object activity, in order to get the problem whose “SC Object Guid” matches the one received as parameter in the “Initialize Data” activity.

Problem SCSM integration pack

Enable “Looping” for this activity. Set the exit condition to trigger when the status of the problem changes to “Closed” or “Resolved”.

orchestrator loop

Modify counter

Once the status of the problem changes to “Closed” or “Resolved”, the “IncidentInstance” counter is reset.

counter reset

Check-in this runbook.
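The behavior of this child runbook (poll the problem, exit the loop on a terminal status, then reset the counter) can be modeled in a few lines of PowerShell. The function name Wait-ProblemClosed, the poll limit and the simulated status history are assumptions of this sketch, as is the use of “Resolved” and “Closed” as the terminal status names; nothing here talks to SCSM.

```powershell
# Hypothetical model of the "Monitor Problem" loop.
function Wait-ProblemClosed {
    param(
        [Parameter(Mandatory)][scriptblock]$GetStatus,  # returns the current problem status
        [int]$MaxPolls = 100,                            # safety limit so the sketch cannot loop forever
        [int]$DelaySeconds = 0                           # pause between polls
    )
    for ($poll = 1; $poll -le $MaxPolls; $poll++) {
        $status = & $GetStatus
        # Exit condition of the loop: the problem is no longer active.
        if ($status -in 'Resolved', 'Closed') { return $poll }
        if ($DelaySeconds -gt 0) { Start-Sleep -Seconds $DelaySeconds }
    }
    throw "Problem still active after $MaxPolls polls"
}

# Simulated status history: active twice, then resolved.
$history = [System.Collections.Generic.Queue[string]]::new()
'Active', 'Active', 'Resolved' | ForEach-Object { $history.Enqueue($_) }

$incidentInstance = 5                                   # counter value when the problem was created
$polls = Wait-ProblemClosed -GetStatus { $history.Dequeue() }
$incidentInstance = 0                                   # "Modify Counter": reset once the loop exits
```

The poll limit is a design choice of the sketch only. The runbook described here keeps running until the problem actually changes status.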

Main Runbook: Incident and Problem Automation

SCSM ITPA

Create a new runbook and name it “Incident and Problem Automation”.

Initialize Data

Drag the “Initialize Data” activity. Define a string parameter called “RA ID”. This parameter will receive the Runbook Activity ID from the “Incident Template” defined in Part II.

runbook start

Modify Counter

Drag the “Modify Counter” activity and configure it to increment the counter by “1”.

increment counter by 1

Note: As you might have already figured out, one counter can be used for only one type of incident. This means that the current version of the “main runbook” has a strong limitation: it cannot reliably deal with more than one type of incident. As stated above, the focus in Part IV will be to replace the “counters” with a better solution.
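To see why a single global counter becomes unreliable with more than one type of incident, consider this small simulation. The incident titles are made up, and the per-title hashtable is only a hypothetical comparison, not the replacement implemented in Part IV.

```powershell
# One shared counter, as in this runbook: every incident type increments the same value.
$globalCounter = 0
# Hypothetical alternative for comparison: one count per incident title.
$perTitle = @{}

$incidents = 'SQL service stopped', 'Disk full', 'SQL service stopped', 'Disk full', 'Disk full'
foreach ($title in $incidents) {
    $globalCounter++
    # [int]$null evaluates to 0, so a missing key starts counting from 1.
    $perTitle[$title] = 1 + [int]$perTitle[$title]
}
```

After the loop the shared counter holds 5 and would trigger problem creation, although neither incident type occurred five times (2 and 3 occurrences respectively).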

Get RA

Use the “Get Object” activity from the SCSM IP and choose the class “Runbook Automation Activity”. Filter by ID, matching the value from the “Initialize Data” activity.

SCSM integration pack, RB

Get IR Relationship

There is not much info inside the Runbook Automation Activity, so in order to get to the Incident that triggered this action, use the “Get-Relationship” activity. The incident was generated using the SCOM connector, therefore it is identified by a special class: “Operations Manager-Generated Incident”. This class extends the default incident class with extra fields filled with data from SCOM (i.e. the alert that generated this incident).

get relationship SCSM IP

Get IR

Next, use the “Get Object” activity again, but this time choose the “Operations Manager-Generated Incident” class. Filter by “SC Object Guid”, matching the value from {Related Object Guid from “Get IR Relationship”}.

get Operations manager-Generated Incident

Get Owner (child runbook)

Drag the activity “Invoke Runbook” and choose the “Get Owner Email” runbook. Match “Incident GUID” parameter with {SC Object GUID from “Get IR”}.

Enable the “Wait for completion” option.

extract owner from incident

Get CI Relationship

Use the “Get Relationship” activity again, this time to identify the computer affected by this incident. The resulting “computer info” was populated by the SCOM alert that generated the incident.

incident windows computer relationship

Get Computer

Extract the affected computer with a “Get Object” activity configured with the “Windows Computer” class, matching “SC Object Guid” with the value returned by “Get CI Relationship” above: {Related Object Guid from “Get CI Relationship”}.

get windows computer object

The computer is in the pipeline now.

Check Occurrence

Using the “Get Counter” activity, check the value of the counter. Reminder: in our scenario, if the incident occurs five times or more, a problem should be created and the incident should be attached to it.

check counter value

The runbook splits into branches from this point on: it will continue on the green branch if the number of occurrences is less than 5, and on the red branch if the number of occurrences is equal to or greater than 5.
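The two link conditions amount to a single threshold comparison. A minimal sketch, assuming a hypothetical helper named Get-RunbookBranch and the threshold of 5 used in this scenario:

```powershell
function Get-RunbookBranch {
    param(
        [Parameter(Mandatory)][int]$Occurrences,   # value read by "Check Occurrence"
        [int]$Threshold = 5                        # occurrences at which a problem is raised
    )
    # Green: auto-remediate and notify. Red: create or update a problem.
    if ($Occurrences -lt $Threshold) { 'Green' } else { 'Red' }
}
```

Because the counter is incremented at the start of the runbook, before it is checked, the fifth occurrence of the incident already takes the red branch.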

Green branch: Execute PowerShell script

First, define the link condition for the green branch.

powershell

This would probably work with the default “Run .NET Script” activity, but I prefer using the Orchestrator Integration Pack for PowerShell Script Execution 1.2 when running PowerShell commands.

Include in this script the “fix” from Part I, inside the “PS Script 01” field:

Write-EventLog -LogName Application -EntryType Error -Source MSSQLSERVER -EventId 2222 -Message "Everything is OK now"

In the hostname field, type {DNS Name from “Get Computer”}. Fill the rest of the fields with valid credentials and make sure that the appropriate ports are open to allow remote execution of PowerShell commands.
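For comparison, roughly the same remote call can be expressed with standard PowerShell remoting. This sketch is not a description of what the Integration Pack does internally. It assumes WinRM is enabled on the target (default ports 5985 for HTTP and 5986 for HTTPS); the helper name New-RemediationCall and the host name are hypothetical.

```powershell
# Build the arguments for Invoke-Command without running anything remotely.
function New-RemediationCall {
    param([Parameter(Mandatory)][string]$ComputerName)
    @{
        ComputerName = $ComputerName
        ScriptBlock  = {
            # The "fix" from Part I, executed on the affected server.
            Write-EventLog -LogName Application -EntryType Error -Source MSSQLSERVER -EventId 2222 -Message "Everything is OK now"
        }
    }
}

$call = New-RemediationCall -ComputerName 'sql01.contoso.local'   # hypothetical host name

# To actually run it (requires WinRM access and suitable credentials):
# Invoke-Command @call -Credential (Get-Credential)
```

Keeping the arguments in a hashtable and splatting them into Invoke-Command makes it easy to inspect the call before anything is executed on the remote server.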

configure script execution activity

Green branch: Create and Send Email.

Now it is time to use the child runbook which extracts the owner’s email address for the incident. You can also customize the body of the email with the fixed “Incident ID”: “Orchestrator took care of the {ID from “Get IR”} Incident”

notify incident owner

Red Branch

As stated above, the “red branch” covers the scenario where the number of incident occurrences is equal to or greater than 5. The red branch also splits into two sub-branches, depending on whether a problem for the active incident already exists. If the problem does not exist yet, or it was previously “resolved”/“closed”, it will be created and the incident will be attached to it. If the problem already exists and is not yet “resolved”/“closed”, the incident will only be attached to it.

link condition

Red branch: Check Problem

Using Get Object, configure the class “Problem” and look for a match with the incident title.

get object filter

The red branch continues under the assumption that the problem does not already exist.
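The choice between the two sub-branches can be summarized as follows. This is a sketch over in-memory objects: the function name Get-ProblemAction, the returned action names and the sample problems are hypothetical, and “Resolved”/“Closed” are assumed to be the status names of inactive problems.

```powershell
function Get-ProblemAction {
    param(
        [Parameter(Mandatory)][string]$IncidentTitle,
        [object[]]$Problems = @()      # existing problems, each with a Title and a Status
    )
    # Only problems that are still open count as "already existing".
    $open = @($Problems | Where-Object {
        $_.Title -eq $IncidentTitle -and $_.Status -notin 'Resolved', 'Closed'
    })
    if ($open.Count -gt 0) { 'AttachToExisting' } else { 'CreateAndAttach' }
}

# Sample data (made up): one closed problem and one active problem.
$problems = @(
    [pscustomobject]@{ Title = 'SQL service stopped'; Status = 'Closed' }
    [pscustomobject]@{ Title = 'Disk full';           Status = 'Active' }
)
```

With this data, an incident titled 'SQL service stopped' leads to a new problem because the earlier one is closed, while a 'Disk full' incident is attached to the problem that is still active.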

link condition

Red branch: New Problem

Configure the “Create Object” activity to create a new problem, giving it the same title as the incident that triggered this runbook. Fill the rest of the fields with a helpful description and classification.

get object SCSM problem

Red branch: Attach Incident

Now that the problem is created, it is time to attach the incident to it. Use the “Create Relationship” activity and define the source as the {SC Object Guid from “Get IR”} and the target as {SC Object Guid from “Create Problem”}. The relationship type has to be “Is Related to Work Item”.

create relationship incident problem

Red branch: Monitor Problem (Child Runbook)

Using “Invoke Runbook”, trigger the “Monitor Problem” runbook. This runbook will remain active until the problem is Resolved/Closed.

Configure the “IncidentName” parameter to receive {Title from “Get IR”} and the “Problem ID” parameter to receive {SC Object Guid from “Create Problem”}.

invoke runbook

Red branch: Problem Temp Fix

Even if this is now a problem, you can still apply the usual fix, until further investigations are conducted.

As on the green branch, use the “Execute PowerShell Script” activity again and include the “fix” from Part I inside the “PS Script 01” field.

Write-EventLog -LogName Application -EntryType Error -Source MSSQLSERVER -EventId 2222 -Message "Everything is OK now"

execute script

Red branch: Create and send email

Using again the email address provided by the child runbook “Get owner”, send the following email: “Problem {Title From “Get IR”} was created/updated”.
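The two notification texts used in this runbook differ only in the branch that produced them. A small sketch, assuming a hypothetical helper named Get-NotificationBody that receives the values published by “Get IR”:

```powershell
function Get-NotificationBody {
    param(
        [Parameter(Mandatory)][ValidateSet('Green', 'Red')][string]$Branch,
        [Parameter(Mandatory)][string]$IncidentId,      # {ID from "Get IR"}
        [Parameter(Mandatory)][string]$IncidentTitle    # {Title from "Get IR"}
    )
    if ($Branch -eq 'Green') {
        # Green branch: the incident was fixed automatically.
        "Orchestrator took care of the $IncidentId Incident"
    } else {
        # Red branch: a problem was created or updated.
        "Problem $IncidentTitle was created/updated"
    }
}
```

For example, Get-NotificationBody -Branch Red -IncidentId 'IR123' -IncidentTitle 'SQL service stopped' builds the text for the red branch; the ID and title are sample values.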

One idea on how to extend this, since there is a “problem”: you could try identifying the direct manager of the incident owner and also notify him through this email.

notify incident owner

Red branch alternative path (blue sub-branch)

The blue sub-branch, part of the red branch of this runbook, follows the scenario where the problem for the active incident was already created.

link condition orchestrator branch

Attach the incident to the existing problem.

create relationship scsm incident and problem

Check-in this runbook.

Connecting the main runbook to the SCOM-generated incident.

In SCSM, go to the Administration tab and synchronize the Orchestrator connector. Then go to the Library tab and create a new template. Specify the template class as “Runbook Activity Template”, give it a self-explanatory name and save it to the management pack you used before, when creating the incident template.

SCSM Template

Name the runbook activity and, most importantly, enable “Is Ready for Automation”. Without this check, the runbook will not be triggered automatically when a new incident is created.

Configure SCSM runbook activity template

On the runbook tab, select the main runbook you created in Orchestrator. As you do this, two parameters will appear under the “parameter mapping”. For the parameter “RA ID”, by using “Edit Mapping”, match it with the “ID” value found under “Work Item”.

scsm template

Open the incident template created in Part II, go to the “Activities” tab and add the “Runbook Activity” defined above. This means that when a new incident based on this template is created, the “runbook activity” will start, send its ID to the Orchestrator runbook and run all the activities within it.

incident activity

Testing the runbook

Go again through the process described in Part II. Once the incident is created, the Orchestrator runbook should start running. If everything is OK, then the log history will look like this:

Orchestrator log history

Back in SCSM, the Incident will be marked as resolved.

scsm incident windows

When the counter reaches a value of 5, a problem should be generated.

problem closed

The current incidents and any future incidents will also be attached to it.

problem incident connection

In Part IV, we will revise this runbook and replace the counters with something more flexible.


5 thoughts on “Automating Incident & Problem management – Part III: Using Orchestrator to remediate the issue (Counter edition)”

  1. Pingback: Automating Incident & Problem management – Part IV: Replacing Counters with Database tables | System Center Automation Blog

  2. great, thanks for sharing. i would like to ask you if you don’t mind, i know very little of SCOM, but i need to have a way to filter the SCOM alerts coming in to SCSM so that i can apply the correct template and in turn run the corresponding SCORCH runbook, is it something that i have to do with custom properties? , thanks for your advice.

    • While the integration between SCOM and SCSM is still not what I would like it to be, you can filter in depth on the exact SCOM class or management pack in the SCOM Connector. Custom fields can be a little tricky; you should read more about them before considering using them. I would stick with class/MP filtering only. You could go further and try to filter specific incidents/alerts at the beginning of the runbook, but this means you are triggering (part of) the runbook anyway.
