Discussion:
[Check_mk (english)] 1.4.0p34 python check_mk.py --keepalive process is coredumping.
Sune Bonde Folkmann
2018-10-16 13:11:07 UTC
Hi

A couple of months ago we upgraded our master, which at the time was running SuSE 11 SP4, to SuSE 12 SP3, and Check_mk from 1.2.8p27 to 1.4.0p34.

Since then the server has been using a lot more CPU than before, and the check_mk.py --keepalive process is core dumping, with no further information to be found.
Like this:
dkamonp-ns01:/etc/sysconfig # coredumpctl dump
PID: 43202 (python)
UID: 109 (nagios01)
GID: 1000 (nagios01)
Signal: 11 (SEGV)
Timestamp: Tue 2018-10-16 14:04:30 CEST (1s ago)
Command Line: python /omd/sites/nagios01/share/check_mk/modules/check_mk.py --keepalive
Executable: /opt/omd/versions/1.4.0p34.cee/bin/python2.7
Control Group: /system.slice/sshd.service
Unit: sshd.service
Slice: system.slice
Boot ID: 699e6f42aa774226a7b9fc36b42b244e
Machine ID: 419d15c7672e90948173fbd353576920
Hostname: dkamonp-ns01
Message: Process 43202 (python) of user 109 dumped core.
Refusing to dump core to tty.

This happens about every 4 seconds on average.

These are the lines from /var/log/messages:
2018-10-16T14:24:36.346230+02:00 dkamonp-ns01 kernel: [11744.456266] python[47529]: segfault at ffffffffffffffff ip 00007ffb79970339 sp 00007fffe625f6b0 error 5 in libpython2.7.so.1.0[7ffb798a4000+21b000]
2018-10-16T14:24:36.360628+02:00 dkamonp-ns01 systemd-coredump[48417]: Core Dumping has been disabled for process 47529 (python).
2018-10-16T14:24:36.361051+02:00 dkamonp-ns01 systemd-coredump[48417]: Process 47529 (python) of user 109 dumped core.
2018-10-16T14:24:37.170229+02:00 dkamonp-ns01 kernel: [11745.277670] traps: python[47997] general protection ip:7f8bac1d3339 sp:7ffccb870930 error:0 in libpython2.7.so.1.0[7f8bac107000+21b000]traps:
2018-10-16T14:24:37.184636+02:00 dkamonp-ns01 systemd-coredump[48419]: Core Dumping has been disabled for process 47997 (python).
2018-10-16T14:24:37.185053+02:00 dkamonp-ns01 systemd-coredump[48419]: Process 47997 (python) of user 109 dumped core.
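
The two kernel lines above contain enough to work out where inside libpython2.7.so.1.0 each crash happened: subtract the mapping base (the first number in the square brackets) from the instruction pointer. Below is a minimal sketch of that arithmetic. The function name crash_offset and the regular expression are only illustrative, and the pattern is written for just the two message shapes shown in this log, so it may need adjusting for other kernels.

```python
import re

# Covers the two kernel message shapes seen in the log above:
#   "segfault at ... ip <hex> sp ... in <lib>[<base>+<size>]"
#   "general protection ip:<hex> sp:... in <lib>[<base>+<size>]"
PATTERN = re.compile(
    r"ip[ :](?P<ip>[0-9a-f]+) .* in (?P<lib>\S+?)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def crash_offset(line):
    """Return (library, offset of the faulting instruction) or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Offset inside the library = instruction pointer - mapping base.
    offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return m.group("lib"), hex(offset)


# The two kernel lines from /var/log/messages, without the syslog prefix.
LINES = [
    "python[47529]: segfault at ffffffffffffffff ip 00007ffb79970339 "
    "sp 00007fffe625f6b0 error 5 in libpython2.7.so.1.0[7ffb798a4000+21b000]",
    "traps: python[47997] general protection ip:7f8bac1d3339 "
    "sp:7ffccb870930 error:0 in libpython2.7.so.1.0[7f8bac107000+21b000]",
]

for line in LINES:
    print(crash_offset(line))
```

Both lines give the same offset, 0xcc339, so the two processes appear to die at the same instruction in libpython rather than at random places. If debug symbols for that library are available, the offset could then be looked up with a tool such as addr2line or gdb.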

We have gone through all Python environment references and corrected them, with no result.

The site is actually still running, but very slowly, and sometimes with a lot of timeouts and errors where it cannot connect to the services.

Has anyone seen anything like this?

We have also tried a fresh server with a clean install of 1.4.0p34 and a new site, and that works.
As soon as we import a backup from the original site and restore it, the dumps come back.


/Sune Folkmann
