[Bigsister-general] simple escalation of alerts [SOLVED!]


Escalation CAN be done out-of-the-box!

I'm answering my own question (from the -devel list) for the list
archive in the hope that this will save someone time some day.

--GOAL:

When an alarm first occurs, notify someone (e.g., send an email message).
After some period of time, if the alarm is still active, notify
*additional* people. (As in "Bob didn't fix it after a half hour, better
notify Mary too!")

--SOLUTION:

A little bit of Perl eval() gymnastics in PAGER{} rules within your
bb_event_generator.cfg file. We'll make the first notification of an alarm
go to one recipient (or a set of recipients). Then we'll make subsequent
*reminders* for the alarm go to more recipients (a lightweight sort of
escalation).

The trick is to use:

(time()-$time)

in your PAGER{} preconditions. That's it, that's all the magic.

time() of course evaluates to the current wallclock time in seconds, and
$time evaluates to the wallclock time in seconds when the alarm was first
raised, so the difference is the alarm's age in seconds. Aside: in my
first post, I pined for an $age variable in the preconditions evaluation.
An $age variable could easily be defined in
share/bigsister/bin/bs_evgen.pm, but this eval() hack works without having
to edit any code.
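
To make the arithmetic concrete, here is a small Python model of what the
precondition computes. It is only an illustration, not Big Sister code:
alarm_age and raised_at are hypothetical names standing in for
(time()-$time) and $time.

```python
import time


def alarm_age(raised_at, now=None):
    """Seconds elapsed since the alarm was first raised.

    Models the Perl expression (time()-$time); raised_at plays the
    role of $time. Both names are made up for this sketch.
    """
    if now is None:
        now = time.time()
    return now - raised_at


# An alarm raised at t=1000 and evaluated at t=2800 is 1800 seconds
# (30 minutes) old.
print(alarm_age(1000, now=2800))
```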

--EXAMPLE:

# there should be a TAB after the *.* (all hosts, all tests)
# and bear in mind that this rule will generate *reminders* of
# this alarm every 10 minutes (repeat=10) unless some other rule
# changes that.
*.* delay=0 repeat=10 mail=my_escalation_group

# Now use PAGER{} to direct alerts that are to "my_escalation_group"
# to where they need to go based on how long this alarm has been
# raised.
#
# Note that the (time()-$time) yields units of seconds, while the
# repeat=10 equate is in minutes. So <=1800 is less-than-or-equal
# 30min. So I'm using <=1850 (50 seconds of elbow room) to make
# sure the initial alert, and the 10min, 20min and 30min reminders
# go to one set of mail= recipients. And the >1850 makes the
# 40min, etc reminders go to the second set of recipients.
#
# Notice that the initial (<30min) recipients are also in the
# second (>30min) set so they get the "up" message too when
# the alarm clears.
#
# You could also add more preconditions; e.g., use weekday and daytime
# preconditions to make schedules if you have different people on
# call at different times/days.

PAGER{ $mail eq 'my_escalation_group' and (time()-$time)<=1850 }
suppressed
PAGER{ $mail eq 'my_escalation_group' and (time()-$time)>1850 }
suppressed,suppressed
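
As a sanity check on the numbers in the example above, this Python sketch
lists which recipient set each notification would reach. It assumes the
reminders fire exactly on the 10-minute schedule (real ones may drift by a
few seconds, which is what the 50 seconds of elbow room is for). The
function names and the "first"/"second" labels are made up for the sketch.

```python
REPEAT_MINUTES = 10        # repeat=10 from the alarm rule
THRESHOLD_SECONDS = 1850   # cut-over used in the two PAGER{} preconditions


def recipient_set(age_seconds):
    """Mirror the two preconditions: <=1850 -> first set, >1850 -> second."""
    return "first" if age_seconds <= THRESHOLD_SECONDS else "second"


def schedule(count):
    """(minutes, recipient set) for the initial alert and the reminders."""
    return [
        (n * REPEAT_MINUTES, recipient_set(n * REPEAT_MINUTES * 60))
        for n in range(count)
    ]


for minutes, who in schedule(6):
    print(f"{minutes:>3} min -> {who} set")
```

Under those assumptions the alerts at 0, 10, 20 and 30 minutes go to the
first set and the ones at 40 and 50 minutes go to the second.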

--PITFALLS:

If you're going to use my idea of "fake" recipients and then expand them
in PAGER{} rules, be *certain* you have a matching PAGER{} rule for all
wallclock times. If your initial recipient "set" is just one email address,
just set that in the mail= equate on the alarm rule. Then use a PAGER{}
precondition with a timeframe to add more recipients by setting the mail=
equate to suppressed,suppressed . The "fake" recipients just
give you fine-grained control to break out handling of schedules and such.
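
One way to catch a gap before deploying is to model the age conditions
outside Big Sister and sample them. The Python sketch below does that; it
is a sampling check, not a proof, and uncovered_ages plus the two rule
lists are hypothetical names for illustration only.

```python
def uncovered_ages(rules, max_age=7200, step=10):
    """Return the sampled ages (in seconds) that no rule matches."""
    return [
        age
        for age in range(0, max_age + 1, step)
        if not any(rule(age) for rule in rules)
    ]


# The pair from the example: every age matches one of the two rules.
complete = [lambda age: age <= 1850, lambda age: age > 1850]

# A subtly broken pair: nothing matches an age of exactly 1850 seconds.
gappy = [lambda age: age < 1850, lambda age: age > 1850]

print(uncovered_ages(complete))
print(uncovered_ages(gappy))
```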
