yahoo-eng-team team mailing list archive

[Bug 2073582] [NEW] random failures in scenario jobs as processes oom-killed

 

Public bug reported:

Scenario jobs have started failing randomly because processes are OOM-killed once swap is fully utilized. The linuxbridge/ovs jobs are configured with 2GB of swap; this needs to be increased further, as the flavor RAM was recently raised from 128MB to 192MB with [1].
Tempest tests fail randomly, as one API test or another fails due to a DB error or some other error:-

in syslogs:-
Jul 18 17:13:29 np0038010029 kernel: Out of memory: Killed process 153162 (mysqld) total-vm:4847476kB, anon-rss:477648kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:1548kB oom_score_adj:0
Jul 18 17:13:44 np0038010029 kernel: Out of memory: Killed process 153605 (mysqld) total-vm:4026888kB, anon-rss:421676kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:1336kB oom_score_adj:0
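
When triaging a job log, it can help to pull the OOM-kill entries out of the syslog and see which processes were hit. The following is only a minimal sketch, assuming the kernel message layout matches the two lines quoted above; the names `OOM_RE`, `parse_oom_kills` and `count_by_process` are hypothetical helpers and not part of neutron, devstack or tempest.

```python
import re
from collections import Counter

# Assumption: the kernel message follows the layout of the two syslog
# lines quoted in this report. Other kernel versions may format it differently.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kills(lines):
    """Return one dict per OOM-kill line found in an iterable of syslog lines."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match is None:
            continue  # not an OOM-kill line
        kills.append(
            {
                "pid": int(match["pid"]),
                "name": match["name"],
                "total_vm_kb": int(match["total_vm_kb"]),
                "anon_rss_kb": int(match["anon_rss_kb"]),
            }
        )
    return kills


def count_by_process(kills):
    """Count how many times each process name was OOM-killed."""
    return Counter(kill["name"] for kill in kills)


# The two lines from this report, used as sample input.
SAMPLE = [
    "Jul 18 17:13:29 np0038010029 kernel: Out of memory: Killed process 153162 "
    "(mysqld) total-vm:4847476kB, anon-rss:477648kB, file-rss:0kB, "
    "shmem-rss:0kB, UID:116 pgtables:1548kB oom_score_adj:0",
    "Jul 18 17:13:44 np0038010029 kernel: Out of memory: Killed process 153605 "
    "(mysqld) total-vm:4026888kB, anon-rss:421676kB, file-rss:0kB, "
    "shmem-rss:0kB, UID:116 pgtables:1336kB oom_score_adj:0",
]

if __name__ == "__main__":
    kills = parse_oom_kills(SAMPLE)
    for kill in kills:
        print(kill["pid"], kill["name"], kill["anon_rss_kb"], "kB anon-rss")
    print(dict(count_by_process(kills)))
```

To use it on a real job, the same function could be fed the lines of a downloaded syslog file instead of `SAMPLE`.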

Example failures:-
- https://b7cc7c5a5c34fa4ca82f-a41f0516fe3b5b007f4b4a96d9d64f43.ssl.cf5.rackcdn.com/924386/3/check/neutron-tempest-plugin-openvswitch-enforce-scope-old-defaults/0f4d608/testr_results.html
- https://bd90009aa1732b7b8d4a-e998c5625939f617052baaae6f827bb8.ssl.cf5.rackcdn.com/924386/3/check/neutron-tempest-plugin-openvswitch-6/2a7f5e5/testr_results.html
- https://9666b76ef4c07993fbc8-c862a7cee9789a90e28aadd42eae2397.ssl.cf1.rackcdn.com/924386/3/check/neutron-tempest-plugin-openvswitch-9/bf476ca/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_05a/924386/3/check/neutron-tempest-plugin-linuxbridge-4/05a17ea/testr_results.html


[1] https://review.opendev.org/c/openstack/devstack/+/924094

** Affects: neutron
     Importance: Critical
     Assignee: yatin (yatinkarel)
         Status: In Progress


** Tags: gate-failure

** Changed in: neutron
     Assignee: (unassigned) => yatin (yatinkarel)

** Changed in: neutron
       Status: New => Triaged

** Changed in: neutron
   Importance: Undecided => Critical

** Tags added: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2073582

Title:
  random failures in scenario jobs as processes oom-killed

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2073582/+subscriptions


