[Check_mk (english)] SNMP host - check_mk service: CRIT - SNMP Error in 6.0 sec - possible to increase critical time limit?

Discussion:

Chris C

2011-03-24 12:45:03 UTC

Hi everyone,
I know this post was from a while ago, but I would like to ask the question again.

Is it possible to increase the execution time limit for TCP agents and
SNMP-based checks?

My Nagios / Check_mk host has 210 hosts and 3200 services. Hosts that
are strictly SNMP are using bulkwalk. I don't get a lot of false
alarms, maybe fewer than 10 per night, but that would still be 10 fewer
alerts for my guys.

Thanks!
/Chris C

Lander, Scott

2011-03-24 12:50:55 UTC

Chris C

2011-03-24 13:00:49 UTC

Hm. Are you sure I should be looking at the Nagios side of the house?

I'm trying to resolve issues like this:
2011-03-24 08:07:30 SERVICE ALERT netapp Check_MK HARD OK - Agent version (unknown), execution time 7.5 sec
2011-03-24 08:06:34 SERVICE ALERT netapp Check_MK HARD CRIT - SNMP Error on 172.16.10.44, execution time 11.5 sec

The netapp host is an SNMP bulkwalk-based host. It is under heavier
load during this time because of filer-side NDMP backups and
client-side filesystem backups. SNMP is a low-priority process on the
filer, so execution times will be slower.

Is there a way to make an adjustment so I don't get an alert on the
slower execution time?

Thanks,
/Chris C

Post by Lander, Scott
Look around in your nagios.cfg file for something like this:
# TIMEOUT VALUES
# These options control how much time Nagios will allow various
# types of commands to execute before killing them off. Options
# are available for controlling maximum time allotted for
# service checks, host checks, event handlers, notifications, the
# ocsp command, and performance data commands. All values are in
# seconds.
service_check_timeout=900
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
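To check whether a slow Check_MK run could actually be hitting one of these limits, a minimal Python sketch can read the timeout settings out of nagios.cfg-style text and compare the two execution times from the alerts in this thread against service_check_timeout. The function name parse_timeouts is hypothetical, the sample text is the excerpt quoted above, and only simple `key=value` lines are handled.

```python
def parse_timeouts(cfg_text):
    """Return {option: seconds} for every *_timeout=value line in nagios.cfg text."""
    timeouts = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().endswith("_timeout"):
            timeouts[key.strip()] = int(value.strip())
    return timeouts


NAGIOS_CFG_EXCERPT = """\
# TIMEOUT VALUES
service_check_timeout=900
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
"""

timeouts = parse_timeouts(NAGIOS_CFG_EXCERPT)

# Execution times reported by the two Check_MK alerts in this thread
for seconds in (7.5, 11.5):
    exceeded = seconds >= timeouts["service_check_timeout"]
    print(f"{seconds} sec -> exceeds service_check_timeout: {exceeded}")
```

Both execution times fall far below the 900-second limit in this excerpt, which fits the later point in the thread that these alerts are SNMP errors rather than Nagios timeouts.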
_______________________________________________
checkmk-en mailing list
http://lists.mathias-kettner.de/mailman/listinfo/checkmk-en

Lander, Scott

2011-03-24 13:19:15 UTC

Nope - those errors do not look like timeouts on the Nagios side. Sorry. In fact, from what you have posted here, I'm not sure why you think it's a timeout error.

7.5 seconds and 11.5 seconds are within the default timeouts. This is really an error being returned from the clients. My advice would be to change max_check_attempts to something greater than 1, so it will retry a few times before notifying you.

To do that, you would do something like:

extra_service_config['max_check_attempts"] = [
("2", ALL_HOSTS, "Check_MK"),
]

To change it to 2 tries, for instance.

I also set my ping hosts to allow a few failures before notifying:

extra_service_config['max_check_attempts"] = [
("3", ALL_HOSTS, "PING"),
]

This slows down how quickly you get notified, but is well worth it to me to get rid of the majority of false alarms.
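The trade-off can be illustrated with a small Python model of the Nagios soft/hard state logic that max_check_attempts controls. This is a simplified sketch, not Nagios code: it counts checks rather than time (the real delay also depends on the retry interval), treats every non-OK result as the same problem, and ignores re-notifications. The function name hard_problem_checks is hypothetical.

```python
def hard_problem_checks(results, max_check_attempts):
    """Return the indexes of checks at which a problem becomes a HARD state.

    Simplified model of Nagios soft/hard states (not Nagios source code):
    consecutive non-OK results count up the attempt number, the problem
    turns HARD (the point where a notification would go out) once the
    attempt number reaches max_check_attempts, and an OK result resets it.
    """
    hard_at = []
    attempt = 0
    hard = False
    for index, state in enumerate(results):
        if state == "OK":
            # Recovery: start counting from scratch on the next failure
            attempt = 0
            hard = False
            continue
        attempt += 1
        if not hard and attempt >= max_check_attempts:
            hard = True
            hard_at.append(index)
    return hard_at


# One transient SNMP error versus a problem that persists
transient = ["OK", "CRIT", "OK", "OK"]
persistent = ["OK", "CRIT", "CRIT", "CRIT"]

for attempts in (1, 2, 3):
    print(attempts,
          hard_problem_checks(transient, attempts),
          hard_problem_checks(persistent, attempts))
```

With max_check_attempts=1, a single transient SNMP error (like the ones during the backup window described above) becomes a HARD problem immediately; with 2 or more it is absorbed, while a persistent failure is still reported, one check later for each extra attempt.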

Scott

------------------------------------------------------------------------------------
This e-mail message is intended only for the personal use of the recipient(s) named above. If you are not an intended recipient, you may not review, copy or distribute this message. If you have received this communication in error, please notify the Hearst Service Center (***@hearstsc.com) immediately by email and delete the original message.
------------------------------------------------------------------------------------

Chris C

2011-03-24 13:28:27 UTC

Thanks, exactly what I was looking for.

Is there a stalking option in check_mk?

Thanks,
/C

Lander, Scott

2011-03-24 13:33:31 UTC

You're right, of course. I typed it instead of copying and pasting it. Sorry.

Chris C

2011-03-24 13:50:54 UTC

I read his mind. I knew what he meant. ;)

/C

Sebastian Talmon

2011-03-24 13:42:25 UTC

Post by Lander, Scott
extra_service_config['max_check_attempts"] = [
("2", ALL_HOSTS, "Check_MK"),
]

should be "extra_service_conf"

extra_service_conf['max_check_attempts'] = [
("2", ALL_HOSTS, "Check_MK"),
("3", ALL_HOSTS, "PING"),
]
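To see which value a given service ends up with, the corrected rule list can be modelled in a few lines of Python. This is a hypothetical sketch, not Check_MK's code: ALL_HOSTS and the empty extra_service_conf dict are stand-ins for what Check_MK defines itself, the helper lookup_service_option is made up for illustration, and only the three-element (value, hosts, services) rule form is handled. The matching rules (first match wins, service patterns as regular expressions anchored at the start of the service description) and writing the service patterns as one-element lists are assumptions to verify against the documentation of the Check_MK version in use.

```python
import re

# Stand-ins for names Check_MK predefines before it reads main.mk
# (assumption: ALL_HOSTS is modelled as a list meaning "every host").
ALL_HOSTS = ["@all"]
extra_service_conf = {}

# The corrected rule list from this thread, with the service patterns
# written as one-element lists (assumed rule format).
extra_service_conf["max_check_attempts"] = [
    ("2", ALL_HOSTS, ["Check_MK"]),
    ("3", ALL_HOSTS, ["PING"]),
]


def lookup_service_option(rules, host, service, default=None):
    """Return the value of the first (value, hosts, services) rule that matches.

    Assumed semantics for this sketch: the first matching rule wins, and each
    service pattern is a regular expression matched at the start of the
    service description.
    """
    for value, hosts, service_patterns in rules:
        host_matches = "@all" in hosts or host in hosts
        service_matches = any(re.match(p, service) for p in service_patterns)
        if host_matches and service_matches:
            return value
    return default


rules = extra_service_conf["max_check_attempts"]
for service in ("Check_MK", "PING", "CPU load"):
    print(service, lookup_service_option(rules, "netapp", service))
```

The values stay strings ("2", "3"), as in the posts above. Under the assumed prefix matching, a pattern like "Check_MK" would also match any other service description that begins with Check_MK.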

Greetings

Sebastian Talmon