Trahan, Jonathan
2016-05-25 15:11:36 UTC
Hi,
We are getting some slow execution time with the 1.2.8 Windows agent when wmi_cpuload and msexch sections are included in the .ini configuration file of the agent. We poll the agent every 5 minutes, and we get 1 slow poll out of 4 executions.
06:30:38 06:46:44 2.43% OK Agent version 1.2.8, execution time 2.6 sec
06:46:44 06:51:47 0.76% CRIT CRIT - Check_MK timed out after 60 seconds
06:51:47 07:07:52 2.42% OK Agent version 1.2.8, execution time 2.5 sec
07:07:52 07:12:55 0.76% CRIT CRIT - Check_MK timed out after 60 seconds
07:12:55 07:29:00 2.42% OK Agent version 1.2.8, execution time 2.6 sec
07:29:00 07:34:41 0.86% CRIT CRIT - Check_MK timed out after 60 seconds
07:34:41 07:50:47 2.43% OK Agent version 1.2.8, execution time 2.5 sec
07:50:47 07:55:49 0.76% CRIT CRIT - Check_MK timed out after 60 seconds
07:55:49 08:11:55 2.43% OK Agent version 1.2.8, execution time 2.5 sec
08:11:55 08:16:58 0.76% CRIT CRIT - Check_MK timed out after 60 seconds
The slowness seems to come directly from the agent execution, as a strace on the python execution when the agent is polled shows that it is seems to wait for the result to get back.
44507 17:28:31 connect(3, {sa_family=AF_INET, sin_port=htons(6556), sin_addr=inet_addr("10.16.30.98")}, 16) = -1 EINPROGRESS (Operation now in progress)
44507 17:28:31 poll([{fd=3, events=POLLOUT}], 1, 5000) = 1 ([{fd=3, revents=POLLOUT}])
44507 17:28:31 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
44507 17:28:31 fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
44507 17:28:31 fcntl(3, F_SETFL, O_RDWR) = 0
44507 17:28:31 recvfrom(3, "<<<check_mk>>>\nVersion: 1.2.8\nBu"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "veur DCOM\ndefragsvc stopped/dema"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "osoft Exchange\nMSExchangeTranspo"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "ing/auto HP ProLiant System Shut"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "89120859161 131085989120859161 1"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "760 7601664 14270908928 53718016"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "54 1713826986 1671394714 1499169"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:30:59 recvfrom(3, "el,Name,NameFormat,NetworkServer"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:31:00 recvfrom(3, "etoProcessaFederatedFreeBusyRequ"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:31:00 recvfrom(3, "0,,0,0,1,0,1,11024,0,0,0,0,0,0,0"..., 4096, MSG_WAITALL, NULL, NULL) = 1339
44507 17:31:01 recvfrom(3, "", 4096, MSG_WAITALL, NULL, NULL) = 0
44507 17:31:01 close(3) = 0
Is this a known behaviour? Should I allow more time than 60 seconds for the agent to execute if those sections are included?
Is there a way to get more information from the Windows agent? I am not that familiar with C++ so I do not know if there is any debugging tool that could help me get more information to diagnose the issue.
Cheers,
Jonathan
We are getting some slow execution time with the 1.2.8 Windows agent when wmi_cpuload and msexch sections are included in the .ini configuration file of the agent. We poll the agent every 5 minutes, and we get 1 slow poll out of 4 executions.
06:30:38 06:46:44 2.43% OK Agent version 1.2.8, execution time 2.6 sec
06:46:44 06:51:47 0.76% CRIT CRIT - Check_MK timed out after 60 seconds
06:51:47 07:07:52 2.42% OK Agent version 1.2.8, execution time 2.5 sec
07:07:52 07:12:55 0.76% CRIT CRIT - Check_MK timed out after 60 seconds
07:12:55 07:29:00 2.42% OK Agent version 1.2.8, execution time 2.6 sec
07:29:00 07:34:41 0.86% CRIT CRIT - Check_MK timed out after 60 seconds
07:34:41 07:50:47 2.43% OK Agent version 1.2.8, execution time 2.5 sec
07:50:47 07:55:49 0.76% CRIT CRIT - Check_MK timed out after 60 seconds
07:55:49 08:11:55 2.43% OK Agent version 1.2.8, execution time 2.5 sec
08:11:55 08:16:58 0.76% CRIT CRIT - Check_MK timed out after 60 seconds
The slowness seems to come directly from the agent execution, as a strace on the python execution when the agent is polled shows that it is seems to wait for the result to get back.
44507 17:28:31 connect(3, {sa_family=AF_INET, sin_port=htons(6556), sin_addr=inet_addr("10.16.30.98")}, 16) = -1 EINPROGRESS (Operation now in progress)
44507 17:28:31 poll([{fd=3, events=POLLOUT}], 1, 5000) = 1 ([{fd=3, revents=POLLOUT}])
44507 17:28:31 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
44507 17:28:31 fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
44507 17:28:31 fcntl(3, F_SETFL, O_RDWR) = 0
44507 17:28:31 recvfrom(3, "<<<check_mk>>>\nVersion: 1.2.8\nBu"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "veur DCOM\ndefragsvc stopped/dema"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "osoft Exchange\nMSExchangeTranspo"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "ing/auto HP ProLiant System Shut"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "89120859161 131085989120859161 1"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "760 7601664 14270908928 53718016"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:28:31 recvfrom(3, "54 1713826986 1671394714 1499169"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:30:59 recvfrom(3, "el,Name,NameFormat,NetworkServer"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:31:00 recvfrom(3, "etoProcessaFederatedFreeBusyRequ"..., 4096, MSG_WAITALL, NULL, NULL) = 4096
44507 17:31:00 recvfrom(3, "0,,0,0,1,0,1,11024,0,0,0,0,0,0,0"..., 4096, MSG_WAITALL, NULL, NULL) = 1339
44507 17:31:01 recvfrom(3, "", 4096, MSG_WAITALL, NULL, NULL) = 0
44507 17:31:01 close(3) = 0
Is this a known behaviour? Should I allow more time than 60 seconds for the agent to execute if those sections are included?
Is there a way to get more information from the Windows agent? I am not that familiar with C++ so I do not know if there is any debugging tool that could help me get more information to diagnose the issue.
Cheers,
Jonathan