[Bug 1949051] [NEW] nova-compute service running IronicDriver may leak memory
Public bug reported:
Description
===========
We run the nova-compute service with the IronicDriver in a Kubernetes cluster as a StatefulSet pod, with a 1 GiB memory limit and only this service in the pod.
There are about 40 nodes in our test environment. Most of them have instances and are in the `active` provision state.
Some nodes fail to connect over IPMI, so their power status cannot be obtained.
Within about 12 hours, the memory limit is exceeded and the pod is restarted.
Steps to reproduce
==================
Nothing special needs to be done; the leak occurs during normal operation.
Note the following:
1. The more nodes there are, the faster memory grows and the sooner the limit is exceeded.
2. Even with only one node the memory limit is eventually exceeded, though it takes much longer.
3. In our environment memory grows in steps of roughly 10 minutes, so we suspect a periodic task, possibly `_sync_power_states` (the default `sync_power_state_interval` is 600 seconds, which matches); see the monitoring sketch after this list.
4. I am not sure whether the failing IPMI connections have any impact.
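To confirm the 10-minute step pattern, one low-effort check is to sample the resident set size (RSS) of the nova-compute process from a debug shell or sidecar. A minimal sketch, assuming `psutil` is installed and that the process is identifiable by its command line (both are assumptions, not part of the original report):

    import time

    import psutil

    def find_nova_compute():
        # Locate the nova-compute process by its command line.
        for proc in psutil.process_iter(['pid', 'cmdline']):
            cmdline = proc.info['cmdline'] or []
            if any('nova-compute' in part for part in cmdline):
                return proc
        raise RuntimeError('nova-compute process not found')

    proc = find_nova_compute()
    while True:
        # RSS in MiB; a step every ~10 minutes would point at a
        # periodic task rather than request-driven allocations.
        rss_mib = proc.memory_info().rss / (1024 * 1024)
        print(time.strftime('%H:%M:%S'), 'rss=%.1f MiB' % rss_mib)
        time.sleep(60)

If the log shows a step roughly every 10 minutes, that supports the periodic-task theory above.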
Expected result
===============
The pod's memory usage should remain stable when we are not performing operations on nodes/instances.
Actual result
=============
Memory keeps increasing until the limit is exceeded and the pod is restarted.
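To narrow down which allocations are retained, one approach is to compare `tracemalloc` snapshots taken a couple of sync intervals apart inside the nova-compute process. How to hook this into the running service (e.g. a patched entry point or an injection tool such as pyrasite) is left as an assumption; this only sketches the technique:

    import time
    import tracemalloc

    # Keep up to 25 stack frames per allocation so the diff shows
    # meaningful call sites inside nova.
    tracemalloc.start(25)

    before = tracemalloc.take_snapshot()
    time.sleep(2 * 600)  # two default sync_power_state_interval periods
    after = tracemalloc.take_snapshot()

    # Print the ten call sites whose net allocations grew the most
    # between the two snapshots.
    for stat in after.compare_to(before, 'traceback')[:10]:
        print(stat)

The top entries of the diff should point at the call sites that keep growing between runs of the periodic tasks.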
Environment
===========
OpenStack versions:
- nova: 22.0.1
- ironic: 16.0.1
Logs & Configs
==============
** Affects: nova
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1949051