Draft 2: 6/22/2001
The information in this document is current as of the Cricket 1.0.3 release.
Although Cricket was designed as a real-time data collection and trending tool, real-time alerts (or alarms) are a natural extension of its functionality. Unfortunately, because alerts are not part of the core design, the alert mechanisms in Cricket are not as cleanly implemented or as efficient as they could be. If your interest is purely in a tool to generate real-time alerts, then Cricket is probably not the best choice. But if you already use Cricket for data collection and real-time trend analysis and you also need a light real-time alerting mechanism, then Cricket can meet your needs.
In Cricket, the alert mechanism is called a monitor threshold. Monitor thresholds are set (or enabled) for specific data sources through the monitor-thresholds target dictionary tag. After the data collection pass, Cricket processes each monitor threshold by retrieving the most recent value of a data source from the RRD file and applying criteria specific to the monitor threshold type. The criteria yield either a pass or a fail condition. Depending on that result and on the setting of the persistent-alarms tag for the target, Cricket may execute a specified action.
Note that the most recent value of a data source from the RRD file will not necessarily agree with the most recent value fetched by the collector, because RRDtool interpolates. For those familiar with RRDtool internals, the "most recent value" is retrieved from the first RRA in the file with a consolidation function of AVERAGE. The order of RRAs in the file is specified by the rra tag in the targetType dictionary.
Note that a monitor threshold configured for a multi-instance (aka vector instances) target will be checked and an action possibly executed for each instance. Monitor thresholds are not supported for multi-targets (as multi-targets are purely a construct of the Cricket grapher).
Syntax:
monitor-thresholds = "<monitor-threshold> [ , <monitor-threshold> ... ]"
<monitor-threshold> := <data source>:<monitor type>:<monitor type args>:<action>:<action args>
<data source> := a data source defined for the target
<monitor type> := One of the six supported types: exact, value,
relation, hunt, quotient, or failures.
<monitor type args> := a colon-delimited list of arguments specific
to each monitor type
<action> := One of five supported actions: SNMP, MAIL, EXEC, FUNC,
or FILE.
<action args> := a colon-delimited list of arguments specific to
each action.
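The grammar above can be illustrated with a small Python sketch that splits a monitor-thresholds value into its parts. This is not how Cricket itself (which is written in Perl) parses the tag; the function name parse_monitor_thresholds and the strategy of locating the action by looking for one of the five action keywords are assumptions made for illustration. A specification with no action keyword is treated as SNMP, the default action.

```python
# Illustrative sketch only: Cricket's real parser lives in its Perl modules.
ACTIONS = {"SNMP", "MAIL", "EXEC", "FUNC", "FILE"}


def parse_monitor_thresholds(tag_value):
    """Split a monitor-thresholds value into one dict per monitor threshold."""
    parsed = []
    for spec in tag_value.split(","):
        fields = [field.strip() for field in spec.split(":")]
        if len(fields) < 2:
            continue  # skip empty or incomplete specifications
        data_source, monitor_type, rest = fields[0], fields[1], fields[2:]
        # Assumption: the first field matching an action keyword starts the action.
        action_at = next((i for i, f in enumerate(rest) if f in ACTIONS), None)
        if action_at is None:
            type_args, action, action_args = rest, "SNMP", []
        else:
            type_args = rest[:action_at]
            action = rest[action_at]
            action_args = rest[action_at + 1:]
        parsed.append({
            "data_source": data_source,
            "monitor_type": monitor_type,
            "type_args": type_args,
            "action": action,
            "action_args": action_args,
        })
    return parsed


example = ("ifInOctets : value : n : 262144 : SNMP, "
           "users : hunt : 40 : pop-1 : users : FILE : /var/log/cricket-alerts")
for entry in parse_monitor_thresholds(example):
    print(entry)
```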
Examples:
 
target --default--
    mail-pgm = /usr/bin/mailx
    trap-address = 127.0.0.1
    persistent-alarms = true
    monitor-thresholds =
        "ifInOctets : value : n : 262144 : SNMP,
        ifInOctets : relation : <10 pct : : : 300 : MAIL : %mail-pgm% : me\\\@mydomain.com,
        ifInErrors : quotient : 0.1 pct : : ifInUcastPackets : SNMP"
target pop-2
    persistent-alarms = false
    monitor-thresholds = "users : hunt : 40 : pop-1 : users : FILE : /var/log/cricket-alerts"
The first target, presumably a network link, has three monitor thresholds. The first generates an SNMP trap whenever the bandwidth exceeds 2 Mbps. Assuming a polling interval of 300 seconds, the second monitor threshold checks to see if the current ifInOctets value is within 10% of the value recorded for the last interval (300 seconds ago). It computes abs(ifInOctets_now - ifInOctets_then)/ifInOctets_then and compares this with 10% (0.1). If traffic levels have changed by more than 10% over the interval, it invokes mailx to send a mail message to me@mydomain.com (note the escaped backslash and escaped '@'). The third monitor threshold checks to see if input errors exceed 0.1% of input packets. If errors exceed this threshold, Cricket generates an SNMP trap.
The second target, pop-2, has a single monitor threshold. Cricket will append an entry to the file /var/log/cricket-alerts when a non-zero number of users are on pop-2 yet pop-1 has not reached 40 users. Once pop-1 reaches 40 users, or pop-2 returns to a zero user count, the entry will be cleared from the file.
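The arithmetic behind the first two monitor thresholds in the example can be checked with a short Python sketch. It assumes that ifInOctets is stored as a rate in bytes per second and that "2 Mbps" means 2 x 1024 x 1024 bits per second; the function names are hypothetical and the sample traffic values are made up for illustration.

```python
def bytes_per_sec_to_mbps(rate):
    """Convert a byte rate to megabits per second (binary megabits assumed)."""
    return rate * 8 / (1024 * 1024)


def relative_change(now, then):
    """abs(now - then) / then, the expression used by the relation example."""
    return abs(now - then) / then


# The value threshold of 262144 bytes per second corresponds to 2 Mbps.
print(bytes_per_sec_to_mbps(262144))  # 2.0

# Made-up samples taken 300 seconds apart: 1000 then, 1150 now.
change = relative_change(1150.0, 1000.0)
# "<10 pct" passes while the change stays below 10%, so this sample fails.
print("relation check passes:", change < 0.10)
```

Because the expression uses an absolute value, a drop from 1150 to 1000 would be evaluated the same way as a rise, only with 1150 as the denominator.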
Persistent Alarms
By default, the target tag persistent-alarms is set to false. With this setting, the first time a monitor threshold criteria fails, the action is executed. Specifically, the Alarm() subroutine in the Monitor.pm module is invoked; the action and its arguments are passed as arguments. If the criteria continues to fail (at subsequent data collection passes), the action is not executed again. After one or more failures, the first time the monitor threshold criteria passes, the action is executed. In this case, the Clear() subroutine in the Monitor.pm module is invoked, with appropriate action and action arguments. Thus the default behavior is like a switch that toggles states when the result of the monitor threshold criteria changes.
If the target tag persistent-alarms is true, the action is executed (the Alarm subroutine is invoked) every time the monitor threshold criteria fails. An action (and Clear subroutine) is still executed once the first time the criteria passes after a string of failures. With persistent-alarms set to true, monitor threshold behavior is like a bell. It keeps ringing until the problem stops.
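The switch and bell behaviors can be compared with a small Python simulation. This is a sketch of the behavior described above, not a translation of Monitor.pm; the function name actions_fired and the "alarm"/"clear" labels (standing in for calls to Alarm() and Clear()) are assumptions for illustration.

```python
def actions_fired(results, persistent):
    """Return what is executed at each collection pass.

    results    -- sequence of booleans, True meaning the criteria passed
    persistent -- the value of the persistent-alarms tag
    Each entry of the returned list is "alarm", "clear", or None.
    """
    fired = []
    failing = False  # True once a failure has been seen and not yet cleared
    for passed in results:
        if not passed:
            # Switch: alarm only on the first failure. Bell: alarm every time.
            fired.append("alarm" if (persistent or not failing) else None)
            failing = True
        else:
            # In both modes, clear once on the first pass after failures.
            fired.append("clear" if failing else None)
            failing = False
    return fired


passes = [True, False, False, False, True, True]
print(actions_fired(passes, persistent=False))
print(actions_fired(passes, persistent=True))
```

With the sample sequence, the non-persistent run raises one alarm and one clear, while the persistent run raises an alarm on each of the three failing passes followed by one clear.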
Monitor Types
The monitor type determines the criteria used to check a monitor threshold.
exact : Exact monitor thresholds are the simplest to use and configure; they let you monitor a data source for an exact match. This is useful in cases where an enumerated (or boolean) SNMP object instruments a condition where a transition to a specific state requires attention. For example, a data source might return either true(1) or false(2), depending on whether or not a power supply has failed. The exact monitor expects one argument: the value on which the monitor will trigger. For example, monitor-thresholds = "dsPowerFail:exact:1" would cause Cricket to send a trap (the default action) when the last value of the "dsPowerFail" data source in the RRD file for this target is 1.
value : The next simplest monitor type, value monitor thresholds take two arguments, a minimum and maximum value. If the data source strays outside of this interval, the monitor threshold criteria fails. To omit the minimum or maximum value, use the character "n".
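A value check of this kind can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions rather than documented behavior: values exactly equal to a bound are treated as passing, and the bounds are parsed as floating-point numbers.

```python
def parse_bound(text):
    """Return None for an omitted bound ("n"), otherwise the bound as a float."""
    text = text.strip()
    return None if text == "n" else float(text)


def value_check_passes(value, minimum, maximum):
    """True when value lies inside the interval given by the two arguments."""
    low = parse_bound(minimum)
    high = parse_bound(maximum)
    if low is not None and value < low:
        return False  # strayed below the minimum
    if high is not None and value > high:
        return False  # strayed above the maximum
    return True


# "ifInOctets : value : n : 262144" has no minimum and a maximum of 262144.
print(value_check_passes(300000.0, "n", "262144"))
print(value_check_passes(1000.0, "n", "262144"))
```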
relation : Relation monitor thresholds are very flexible. A relation monitor considers the difference between two data sources (possibly from different targets), or alternatively, the difference between two temporally distinct values for the same data source. The first data source is the data source for which the relation monitor threshold is defined. The difference can be expressed as absolute value, or as a percentage of the second data source (comparison) value. This difference is compared to a threshold argument with either the greater than or less than operator. The criteria fails when the expression (<absolute or relative difference> <either greater-than or less-than> <threshold>) evaluates to false. The four colon-delimited arguments for a relation monitor are:
Actions
After the monitor threshold is checked for the current value, Cricket may execute one of several actions. Each action requires one or more arguments, which appear as a colon-delimited list following the action tag in the monitor threshold specification.
SNMP: Generating an SNMP trap is the default action if the action tag is omitted from a monitor threshold specification. To support this default and for backwards compatibility, the action SNMP does not use the action arguments field in the monitor threshold specification. The SNMP action instead requires the attribute trap-address to be set for the target. The traps Cricket sends are marked with the enterprise OID ".1.3.6.1.4.1.2595.1.1". The generic type is 6 and the specific type is 4 for failure (violation) of the monitor threshold criteria and 5 for success (recall the trap is cleared on the first success after one or more failures). There are currently six varbinds: the monitor type, the monitor threshold string, the target name, data source name, cricket user name (set to "cricket" on Win32 platforms), and instance number (to distinguish targets with multiple instances). These varbinds are set (and could be customized) in the sendMonitorTrap() subroutine in Monitor.pm.
MAIL: This action sends email to a specified address via a Berkeley mailx compatible mail program. The first action argument is the program to invoke to send email. It is assumed that this program is compatible with Berkeley mailx. That is, the program accepts piped input as the message body, and supports a "-s" command flag to specify the subject. If you don't have such a program on your system, you may wish to customize the code in the sendEmail() subroutine in Monitor.pm to utilize your email program. The second action argument is the recipient's email address. Note that as in the example, you may need to escape special characters. Both arguments are required. The mail message body includes the following information: the monitor type, the monitor threshold string, the target name, data source name, the value of the data source retrieved from the RRD file, and the instance number (to distinguish targets with multiple instances). To change the contents of the message, customize the sendEmail() subroutine in Monitor.pm.
FILE: This action appends and deletes entries (lines) from a file. When the monitor threshold criteria first fails, a line containing details in a space-delimited format is appended to the file specified as the action argument (the FILE action has only one argument). Subsequent failures do not add multiple lines to the file. The FILE action essentially ignores persistent-alarms = true (though some overhead is incurred to detect duplicate lines, so persistent-alarms should be set to false when possible for targets using the FILE action). When the monitor threshold passes again after one or more failures, the line is deleted from the file. The line details include the target name and the data source name. To include other details, customize the LogToFile() subroutine in Monitor.pm.
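The append-and-delete behavior of the FILE action can be sketched in Python as shown here. This is not the LogToFile() subroutine; the function names are hypothetical, and the line content ("pop-2 users", target name then data source name) is an assumed format used only for illustration.

```python
import os


def log_failure(path, line):
    """Append line to the file unless an identical line is already present."""
    existing = []
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read().splitlines()
    if line not in existing:  # repeated failures do not add duplicate lines
        with open(path, "a") as handle:
            handle.write(line + "\n")


def clear_failure(path, line):
    """Remove every copy of line from the file, keeping all other lines."""
    if not os.path.exists(path):
        return
    with open(path) as handle:
        lines = handle.read().splitlines()
    kept = [entry for entry in lines if entry != line]
    with open(path, "w") as handle:
        for entry in kept:
            handle.write(entry + "\n")


# Usage (assumed line format): call log_failure(path, "pop-2 users") when the
# criteria fails and clear_failure(path, "pop-2 users") when it passes again.
```

The duplicate check in log_failure reads the whole file on every failure, which illustrates the overhead mentioned above when persistent-alarms is set to true.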
EXEC: This action executes a shell command or script. The first action argument is the shell command or script to execute when the monitor threshold criteria fails. The second action argument is the shell command or script to execute when the monitor threshold passes again after one or more failures. The EXEC action provides a mechanism by which automated corrective action can be taken.
FUNC: This action is similar to EXEC, except that a perl subroutine defined in the Cricket scope is executed. The first action argument is the function invoked when the monitor threshold criteria fails. The second action argument is the function invoked when the monitor threshold passes again after one or more failures. To use this action, you must first modify the entry in the func.pm module to set the global variable gMonFuncEnabled. Using this action requires customization (you must write the subroutines). While this mechanism provides complete flexibility in handling special cases, the invoked subroutines cannot easily accept arguments (this can be done, but the argument list must be included by name in the action arguments which can quickly become unwieldy). If your function requires access to arguments available in the Alarm() and Clear() functions, you might consider adding a new action tag (and sharing your work with the Cricket community).
Acknowledgements
Javier Muniz first implemented Monitor Thresholds in Cricket. Jeff Younker implemented the EXEC and FUNC actions. Adam Meltzer implemented the first cut of the MAIL action. Jeff Jensen and Jeff Allen made contributions too numerous to count.
Questions or comments: contact Jake Brutlag