Discussion:
[Check_mk (english)] Alert off a stale check_mk check
Christopher Bowlby
2018-06-14 13:59:48 UTC
Permalink
Hi,

I ran into an issue the other day where the check_mk agent check stalled on
a target server and simply got stuck. A custom script hung while checking,
which caused the check_mk agent to hang. I will be looking into preventing
that in our custom check, but it did bring to light that we did not receive
an alert reporting that the results had gone stale.

My question is: is it possible to alert off a stale state, and if so, where
would I configure it to trigger after 5 minutes?
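
One way to keep a hanging custom script from blocking the agent is to run it
under a hard time limit. The following is only a minimal sketch, assuming the
custom check can be started as a subprocess and reports through the local
check line format (status, service name, perfdata, text). The function name
`run_check`, the service name `Custom_Check`, and the sleeping stand-in
command are hypothetical and not taken from the thread.

```python
import subprocess
import sys


def run_check(cmd, timeout_s, service_name="Custom_Check"):
    """Run cmd and return one local-check style line, even if cmd hangs.

    The service name and the output layout are illustrative assumptions.
    """
    try:
        # subprocess.run kills the child process when the timeout expires.
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # Report CRIT (2) instead of blocking the caller indefinitely.
        return f"2 {service_name} - timed out after {timeout_s}s"

    if proc.returncode != 0:
        return f"2 {service_name} - exited with code {proc.returncode}"

    lines = proc.stdout.strip().splitlines()
    summary = lines[0] if lines else "no output"
    return f"0 {service_name} - {summary}"


if __name__ == "__main__":
    # Hypothetical stand-in for a hanging script: sleeps far past the limit.
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_check(hanging, timeout_s=1))
```

With a wrapper like this the script's failure shows up as a CRIT result for
that one service rather than as a stalled agent.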
Paul
2018-06-14 14:08:18 UTC
Permalink
You can alert off the (active) Check_MK service. It seems like it should have
been in a CRITICAL state per your description, either due to a timeout or
some other scenario.
_______________________________________________
checkmk-en mailing list
Manage your subscription or unsubscribe
http://lists.mathias-kettner.de/mailman/listinfo/checkmk-en
Christopher Bowlby
2018-06-14 15:36:25 UTC
Permalink
Hi Paul,

It was in a critical state, which is why I was confused about not seeing an
alert.

I've upgraded the instance from 1.2.8pxx to the latest 1.4.x stable and
configured a fallback address to see if that will help address not seeing
an alert.

Prior to this change I had no blocking or filtering of any service
notifications in any way, and we have received other alerts from the
monitoring instance in the past, so I'm still unsure why no alert was
triggered.
Paul
2018-06-14 15:49:58 UTC
Permalink
Are you receiving notifications from other hosts/services?
Christopher Bowlby
2018-06-14 16:11:12 UTC
Permalink
Yes, and I confirmed that by setting services and hosts into a test
critical state.
Paul
2018-06-14 16:51:10 UTC
Permalink
So maybe the check went stale before it had a chance to alert/notify.

You could try increasing your stale service setting to allow the host/service
to notify. I think the default is 1.5 check cycles, which is short in my
opinion. Mine is set to 5.

The setting is found under Global Settings in WATO: "Staleness value to
mark hosts / services stale".
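
To make the arithmetic behind that setting concrete, here is a small sketch.
It assumes staleness is the age of the last check result divided by the
check interval, and that a service counts as stale once that ratio reaches
the configured threshold. The function names and the 60-second interval are
illustrative assumptions, not values from the thread.

```python
def staleness(now_s, last_check_s, check_interval_s):
    """Age of the last result, expressed in check intervals."""
    return (now_s - last_check_s) / check_interval_s


def is_stale(now_s, last_check_s, check_interval_s, threshold=1.5):
    """True once the last result is at least `threshold` intervals old."""
    return staleness(now_s, last_check_s, check_interval_s) >= threshold


if __name__ == "__main__":
    interval = 60  # assumed one-minute check interval
    for age in (80, 100, 300):
        print(
            age,
            is_stale(age, 0, interval, threshold=1.5),
            is_stale(age, 0, interval, threshold=5),
        )
```

Under these assumptions a threshold of 1.5 marks a one-minute check stale
after 90 seconds without a result, while a threshold of 5 does so after
5 minutes.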
Christopher Bowlby
2018-06-14 17:09:31 UTC
Permalink
Hi Paul,

I've made the adjustment. I'll keep an eye on it and let you know.
Andreas Döhler
2018-06-14 20:52:03 UTC
Permalink
The stale state is only a state in the web view.
From the core's point of view everything is OK, even if some services are stale.
The notification problem may be that the service entered the critical state,
but only as a soft state.
Notifications are sent only when the service or host reaches the hard state.
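
The soft/hard distinction can be sketched with a simplified model. It assumes
the usual Nagios-style rule that a problem becomes a hard state only after
`max_check_attempts` consecutive non-OK results. The class below is
hypothetical and deliberately ignores recovery notifications and repeated
notifications.

```python
class Service:
    """Simplified soft/hard state model (illustrative, not the real core)."""

    def __init__(self, max_check_attempts=3):
        self.max_check_attempts = max_check_attempts
        self.failures = 0          # consecutive non-OK results so far
        self.state_type = "HARD"   # an OK service is in a hard OK state

    def process(self, state):
        """Feed one check result; return True if a problem notification fires."""
        if state == "OK":
            self.failures = 0
            self.state_type = "HARD"
            return False

        self.failures += 1
        if self.failures < self.max_check_attempts:
            # Still retrying: soft state, no notification yet.
            self.state_type = "SOFT"
            return False

        # Notify only on the result that first makes the problem hard.
        self.state_type = "HARD"
        return self.failures == self.max_check_attempts


if __name__ == "__main__":
    svc = Service(max_check_attempts=3)
    for result in ("CRIT", "CRIT", "CRIT", "CRIT"):
        print(result, svc.process(result), svc.state_type)
```

In this model a service that goes critical once or twice and then stops
being updated never reaches the hard state, so no notification is sent.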

Best regards
Andreas