
yahoo-eng-team team mailing list archive

[Bug 1961068] [NEW] nova-ceph-multistore job fails with mysqld got oom-killed

 

Public bug reported:

Searching through recent job results shows that the nova-ceph-multistore
job fails from time to time because the database crashes after running
out of memory.

In the tempest errors the following message can be seen:

tempest.lib.exceptions.ServerFault: Got server fault
Details: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_db.exception.DBConnectionError'>

In the mysqld error log (controller/logs/mysql/error_log.txt) the crash
recovery is visible:

2022-02-15T19:26:40.245179Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
2022-02-15T19:26:40.268204Z 0 [System] [MY-010232] [Server] XA crash recovery finished.
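
For triaging other runs, the crash recovery marker can be searched for
mechanically. The following is only a minimal sketch: it assumes the error
log keeps the format shown above and keys on the MY-010229 code from the
excerpt. The function name crash_recovery_times is hypothetical, not part
of any OpenStack or MySQL tooling.

```python
import re
from datetime import datetime

# Assumption: every crash recovery start is logged with the MY-010229 code
# and a UTC timestamp ending in "Z", exactly as in the excerpt above.
RECOVERY_RE = re.compile(
    r"^(?P<ts>\S+Z) \d+ \[System\] \[MY-010229\] \[Server\] "
    r"Starting XA crash recovery"
)


def crash_recovery_times(lines):
    """Return the (naive, UTC) datetimes at which crash recovery started."""
    times = []
    for line in lines:
        match = RECOVERY_RE.match(line.strip())
        if match:
            times.append(
                datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            )
    return times


sample = [
    "2022-02-15T19:26:40.245179Z 0 [System] [MY-010229] [Server] "
    "Starting XA crash recovery...",
    "2022-02-15T19:26:40.268204Z 0 [System] [MY-010232] [Server] "
    "XA crash recovery finished.",
]
for started in crash_recovery_times(sample):
    print(started.isoformat())
```

Each returned timestamp marks a mysqld restart after an unclean shutdown,
which is the point in time to look for in syslog.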

Around the same time, syslog (controller/logs/syslog.txt) shows the
out-of-memory messages:

Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mysql.service,task=mysqld,pid=67959,uid=116
Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: Out of memory: Killed process 67959 (mysqld) total-vm:5127600kB, anon-rss:756064kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:2388kB oom_score_adj:0
Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom_reaper: reaped process 67959 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
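
To check other job logs for the same signature, the kernel's kill message
can be extracted with a small script. This is a rough sketch that assumes
the "Out of memory: Killed process" line keeps the layout shown above; the
helper name find_oom_kills and the dictionary keys are made up for
illustration.

```python
import re

# Assumption: the kernel message layout matches the syslog excerpt above.
OOM_KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(lines):
    """Return one dict per OOM kill message found in the given syslog lines."""
    kills = []
    for line in lines:
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append(
                {
                    "pid": int(match.group("pid")),
                    "comm": match.group("comm"),
                    "total_vm_kb": int(match.group("total_vm_kb")),
                    "anon_rss_kb": int(match.group("anon_rss_kb")),
                }
            )
    return kills


sample = [
    "Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: "
    "Out of memory: Killed process 67959 (mysqld) total-vm:5127600kB, "
    "anon-rss:756064kB, file-rss:0kB, shmem-rss:0kB, UID:116 "
    "pgtables:2388kB oom_score_adj:0",
    "Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: "
    "oom_reaper: reaped process 67959 (mysqld), now anon-rss:0kB, "
    "file-rss:0kB, shmem-rss:0kB",
]
for kill in find_oom_kills(sample):
    print(kill)
```

Filtering the result on comm == "mysqld" separates this failure from OOM
kills of other processes in the same job.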


The error occurs only in the nova-ceph-multistore job (see recent occurrences via logsearch: https://paste.opendev.org/show/bQNKfoaMafUyNFCyQ0kN/ ). It mostly happens on the current master branch (yoga), but an example was found on wallaby as well: https://zuul.opendev.org/t/openstack/build/d8a6a9c1496346dda6986db00c06a616

** Affects: nova
     Importance: High
         Status: Confirmed


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1961068

Title:
  nova-ceph-multistore job fails with mysqld got oom-killed

Status in OpenStack Compute (nova):
  Confirmed


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1961068/+subscriptions


