yahoo-eng-team team mailing list archive
Message #75493
[Bug 1800204] [NEW] n-cpu.service consuming 100% of CPU indeterminately
Public bug reported:
Description
==============
I used fault injection to assess the robustness of nova-conductor. By injecting a specific sequence of faults, I observed a failure that threatens the robustness of the system: applying these faults at the nova-conductor interface prevents nova-compute from provisioning new instances.
Steps to reproduce
=====================
I reproduced this bug in 10 out of 10 attempts, using devstack/queens.
The workload consists of the following steps:
1) Create a VM with the following flavor: 64 MB RAM, 1 VCPU, 0 disk; use a reference image (for instance, 'cirros.0.3.4'); all other settings can be the defaults of the admin account;
2) Rebuild with an alternative image: for instance, 'cirros 0.4.0';
3) Rebuild with the reference image again;
4) Shelve the instance;
5) Delete the instance;
Below, I describe the faultload. Each time a fault is injected, the workload is executed from the beginning. The steps are:
1) Intercept the first RPC message (i.e. AMQP) that calls for 'schedule_and_build_instances';
2) Inject the 'fault' in 'schedule_and_build_instances.args.build_requests->'nova_object.data'.instance.'nova_object.data'.flavor.'nova_object.data'.vcpus'
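The injection step above can be sketched in Python as follows. This is only an illustration, not the tool used for this report: it assumes the intercepted RPC message has already been decoded into nested dicts, that 'build_requests' is a list whose first element is the target, and the helper name inject_fault and the sample message are hypothetical.

```python
import copy

# Field path taken from the report. The list index 0 for 'build_requests'
# is an assumption about how the decoded message is laid out.
VCPUS_PATH = (
    "args", "build_requests", 0,
    "nova_object.data", "instance",
    "nova_object.data", "flavor",
    "nova_object.data", "vcpus",
)


def inject_fault(message, fault, path=VCPUS_PATH):
    """Return a copy of a decoded RPC message with `fault` written at `path`."""
    mutated = copy.deepcopy(message)  # leave the intercepted message untouched
    node = mutated
    for key in path[:-1]:
        node = node[key]              # walk down to the parent of the target field
    node[path[-1]] = fault
    return mutated


# Hypothetical, heavily trimmed message used only to exercise the helper.
sample = {
    "method": "schedule_and_build_instances",
    "args": {"build_requests": [{"nova_object.data": {"instance": {
        "nova_object.data": {"flavor": {"nova_object.data": {"vcpus": 1}}}}}}]},
}

faulty = inject_fault(sample, "10000000000000000000000")
```

The faults are kept as strings, exactly as listed in the pseudo-algorithm, so the receiving side sees a value of the wrong type and magnitude.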
The pseudo-algorithm:
1. execute workload
2. for each fault in ['2', '-10000000000000000000001', '10000000000000000000000']
2.1. execute workload in parallel with faultload(fault)
3. see the CPU activity for the process n-cpu.service of devstack
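A minimal Python sketch of the pseudo-algorithm, assuming the workload, the faultload and the CPU check are supplied as callables. The names run_campaign, run_workload, run_faultload and check_cpu are hypothetical placeholders and do not come from the original test harness.

```python
import threading

# Fault values exactly as listed in the pseudo-algorithm.
FAULTS = ['2', '-10000000000000000000001', '10000000000000000000000']


def run_campaign(run_workload, run_faultload, check_cpu, faults=FAULTS):
    """Run one fault-free pass, then one pass per fault, then check the CPU."""
    run_workload()                          # step 1: fault-free run
    for fault in faults:                    # step 2: one run per fault
        injector = threading.Thread(target=run_faultload, args=(fault,))
        injector.start()                    # step 2.1: faultload runs in parallel
        run_workload()
        injector.join()
    return check_cpu()                      # step 3: inspect n-cpu.service
```

Each callable hides the environment-specific part (API calls for the workload, message interception for the faultload), so the loop itself stays independent of the deployment.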
Expected result
==================
nova-compute handles the faults without affecting future requests.
Actual result
================
nova-compute consumes 100% of CPU and new instances are set to the 'error' state without any clue about the cause, so it is not possible to create new instances without restarting n-cpu.service.
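One way to quantify the CPU symptom on Linux is to sample the process's accumulated CPU time from /proc/<pid>/stat twice and compare. The sketch below is an assumption about how this could be measured, not how the figure in this report was obtained; the function names and the sample line are hypothetical, and the default of 100 clock ticks per second should be replaced with the value of os.sysconf('SC_CLK_TCK') on the target machine.

```python
def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' before indexing the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])    # fields 14 (utime) and 15 (stime)


def cpu_percent(ticks_before, ticks_after, interval_s, hz=100):
    """CPU usage over the interval as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (hz * interval_s)


# Hypothetical stat line, truncated after the fields used here.
line = "1234 (nova-compute) R 1 1234 1234 0 -1 4194304 100 0 0 0 500 250 0 0 20 0 1 0"
ticks = cpu_ticks(line)
```

Taking two samples a few seconds apart and passing both tick counts to cpu_percent gives a value near 100 when the process keeps one core fully busy.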
Environment
==============
Devstack/Queens on a single machine with default settings.
Logs & Configs
=================
Logs attached.
** Affects: nova
Importance: Undecided
Status: New
** Attachment added: "Logs from before to after applying the tests"
https://bugs.launchpad.net/bugs/1800204/+attachment/5205956/+files/sys-100p-now.logs
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1800204
Title:
n-cpu.service consuming 100% of CPU indeterminately
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1800204/+subscriptions