Adam Chesterton
2018-09-24 23:09:41 UTC
Hi Everyone,
Got another problem cropping up after our upgrade from 1.4.0p7 to 1.5.0p2
(running on CentOS 7). Periodically, the CMC will stop running, and we have
to manually recover it. This happens every 5 days or so.
I've looked at the Check_MK logs and turned on debug-level logging for
cmc.log, but this hasn't revealed anything new. An entry appears in the
alerts.log file saying that the configuration has changed and it is
restarting itself, and at the same time cmc.log records a traceback from an
error. Shortly after this, we get another error in cmc.log ("could not
read signal byte: Connection reset by peer") and then things just stop.
Does anyone have any ideas on what is causing this and/or how to resolve it?
An extract of the logs is below.
Regards,
Adam Chesterton
----
ALERTS.LOG
07:52:07 Configuration has changed. Restarting myself.
CMC.LOG
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: Traceback (most recent call last):
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File "/omd/sites/melbourne/bin/cmk", line 96, in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: exit_status = modes.call(o, a, opts, args)
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/cmk_base/modes/__init__.py", line 80, in
call
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: return mode.handler_function(*handler_args)
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File "/omd/sites/melbourne/lib/python/cmk_base/modes/cee.py",
line 216, in mode_handle_alerts
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: import cmk_base.cee.alert_handling as alert_handling
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/cmk_base/cee/alert_handling.py", line 45,
in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: import cmk_base.events as events
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File "/omd/sites/melbourne/lib/python/cmk_base/events.py", line
46, in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: import cmk_base.core as core
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File "/omd/sites/melbourne/lib/python/cmk_base/core.py", line 44,
in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: import cmk_base.core_nagios as core_nagios
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File "/omd/sites/melbourne/lib/python/cmk_base/core_nagios.py",
line 45, in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: import cmk_base.data_sources as data_sources
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/cmk_base/data_sources/__init__.py", line
62, in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: from .ipmi import IPMIManagementBoardDataSource
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/cmk_base/data_sources/ipmi.py", line 27,
in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: import pyghmi.ipmi.command as ipmi_cmd
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File "/omd/sites/melbourne/lib/python/pyghmi/ipmi/command.py",
line 25, in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: from pyghmi.ipmi.oem.lookup import get_oem_handler
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File "/omd/sites/melbourne/lib/python/pyghmi/ipmi/oem/lookup.py",
line 16, in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: import pyghmi.ipmi.oem.lenovo.handler as lenovo
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/pyghmi/ipmi/oem/lenovo/handler.py", line
33, in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: from pyghmi.ipmi.oem.lenovo import imm
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/pyghmi/ipmi/oem/lenovo/imm.py", line 25,
in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: import pyghmi.ipmi.private.session as ipmisession
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/pyghmi/ipmi/private/session.py", line 273,
in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: class Session(object):
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/pyghmi/ipmi/private/session.py", line 309,
in Session
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: _crypto_backend = default_backend()
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/cryptography/hazmat/backends/__init__.py",
line 15, in default_backend
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: from cryptography.hazmat.backends.openssl.backend import backend
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/cryptography/hazmat/backends/openssl/__init__.py",
line 7, in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: from cryptography.hazmat.backends.openssl.backend import backend
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/cryptography/hazmat/backends/openssl/backend.py",
line 16, in <module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: from cryptography import utils, x509
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: File
"/omd/sites/melbourne/lib/python/cryptography/x509/__init__.py", line 8, in
<module>
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: from cryptography.x509 import certificate_transparency
2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert
helper: ImportError: cannot import name certificate_transparency
..........
2018-09-25 07:52:30 [0] [alert helper 32312] could not read signal byte:
Connection reset by peer
2018-09-25 07:52:30 [5] [alert helper 32312] still 1 unsent events, sending
them now
2018-09-25 07:52:30 [0] [alert helper 32312] could not read signal byte:
Connection reset by peer
2018-09-25 07:52:30 [5] [alert helper 32312] still 1 unsent events, sending
them now
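In case it helps anyone reading the extract, here is a minimal Python sketch for pulling the helper's traceback out of cmc.log without the repeated per-line prefix. It assumes that each entry sits on a single line in the actual file (the extract above is wrapped by my mail client) and that the prefix always has the form shown above. The names PREFIX and extract_traceback are just illustrative, not part of Check_MK.

```python
import re

# One cmc.log entry, in the form seen in the extract above:
#   <date> <time> [<level>] [alert helper <pid>] Invalid response from alert helper: <text>
# This pattern is an assumption based only on that extract.
PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[\d+\] \[alert helper \d+\] "
    r"Invalid response from alert helper: ?(.*)$"
)


def extract_traceback(lines):
    """Return the alert helper's output with the log prefix stripped.

    Lines that do not carry the "Invalid response from alert helper:"
    prefix (for example the "could not read signal byte" entries) are skipped.
    """
    out = []
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if match:
            out.append(match.group(1))
    return "\n".join(out)


if __name__ == "__main__":
    # Three entries copied from the extract, joined back onto single lines.
    sample = [
        "2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert helper: Traceback (most recent call last):",
        "2018-09-25 07:52:07 [3] [alert helper 32312] Invalid response from alert helper: ImportError: cannot import name certificate_transparency",
        "2018-09-25 07:52:30 [0] [alert helper 32312] could not read signal byte: Connection reset by peer",
    ]
    print(extract_traceback(sample))
```

To run it against the real log, replace the sample list with the lines read from the site's cmc.log.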