yahoo-eng-team team mailing list archive
Message #95055
[Bug 2091855] [NEW] functional test job randomly failing when running on nodes with 4 CPUs
Public bug reported:
Functional test jobs fail randomly, with some processes getting OOM-killed,
when running on nodes with 4 CPUs. Currently raxflex-sjc3 provider nodes
have 4 CPUs (nodes normally have 8 CPUs).
When the jobs run on these nodes they use all available memory and swap,
which results in some processes being OOM-killed and thus in test failures.
Example failures:
- https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_fce/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-oslo-master/fce5738/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_18a/937106/2/gate/neutron-functional/18ac225/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7aa/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-oslo-master/7aa3e1d/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_238/936845/3/check/neutron-functional/238854b/testr_results.html
OpenSearch:
https://opensearch.logs.openstack.org/_dashboards/app/data-explorer/discover#?_a=(discover:(columns:!(_source),isDirty:!f,sort:!()),metadata:(indexPattern:'94869730-aea8-11ec-9e6a-83741af3fdcd',view:discover))&_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_q=(filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!f,params:(query:neutron-functional),type:phrase),query:(match_phrase:(build_name:neutron-functional))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_status,negate:!f,params:(query:FAILURE),type:phrase),query:(match_phrase:(build_status:FAILURE)))),query:(language:kuery,query:'message:%22localhost%20%7C%20Provider:%20raxflex-sjc3%22'))
We need to check whether memory utilization can be improved in these tests/jobs.
For CI itself, we can split the test runs into groups to unblock this.
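As a rough illustration of splitting the runs into groups, the sketch below partitions a list of test IDs round-robin so that each group can be run as its own, smaller invocation. It is only a sketch under assumptions: the function name, the grouping strategy and the example test IDs are hypothetical and do not describe how the neutron-functional job is actually configured.

```python
from typing import List


def split_into_groups(test_ids: List[str], group_count: int) -> List[List[str]]:
    """Partition test IDs into at most group_count round-robin groups.

    Hypothetical helper: each returned group is meant to be passed to a
    separate test run so that fewer tests compete for memory at once.
    """
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    groups: List[List[str]] = [[] for _ in range(group_count)]
    # Sort first so the partition is stable between runs.
    for index, test_id in enumerate(sorted(test_ids)):
        groups[index % group_count].append(test_id)
    # Drop empty groups so callers never start a run with no tests.
    return [group for group in groups if group]


if __name__ == "__main__":
    # Made-up test IDs, for illustration only.
    example_ids = [f"example.tests.functional.test_case_{n}" for n in range(5)]
    for number, group in enumerate(split_into_groups(example_ids, 2), start=1):
        print(f"group {number}: {len(group)} tests")
```

Whether grouping by count, by module, or by known memory-heavy tests works best would have to be measured on the affected nodes.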
** Affects: neutron
Importance: Critical
Status: New
** Tags: functional-tests gate-failure
** Changed in: neutron
Importance: Undecided => Critical
** Tags added: functional-tests gate-failure
https://bugs.launchpad.net/bugs/2091855