[Bug 1949051] [NEW] nova compute service running IronicDriver may leak memory

 

Public bug reported:

Description
===========
We run the nova-compute service with the IronicDriver in a k8s cluster as a StatefulSet pod, with a 1 GiB memory limit and only this service in the pod.
There are about 40 nodes in our test environment. Most of them have instances and are in the active provision state.
Some nodes fail to connect to IPMI, so their power status cannot be obtained.
Within about 12 hours, the memory limit is exceeded and the pod is restarted.
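
A minimal sketch of one way to chart the growth from inside the container (assuming cgroup v1 paths; under cgroup v2 the file would be /sys/fs/cgroup/memory.current instead):

```python
# Minimal sketch: poll the container's cgroup memory usage once a minute
# so the growth pattern (steady vs. stepwise) is visible in the output.
# Assumes cgroup v1; adjust CGROUP_FILE for cgroup v2.
import time

CGROUP_FILE = "/sys/fs/cgroup/memory/memory.usage_in_bytes"

while True:
    with open(CGROUP_FILE) as f:
        usage_mib = int(f.read()) / (1024 * 1024)
    print(f"{time.strftime('%H:%M:%S')} usage: {usage_mib:.1f} MiB")
    time.sleep(60)
```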

Steps to reproduce
==================
No specific action is needed; the leak occurs during normal operation.
Note the following:
1. The more nodes there are, the faster the memory grows and the sooner the limit is exceeded.
2. Even with only one node the memory limit is eventually exceeded, though it takes much longer.
3. In our environment the memory grows in steps roughly every 10 minutes, so we suspect a periodic task, possibly the `_sync_power_states` task (see the sketch after this list).
4. I am not sure whether the failing IPMI connections have any impact.
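
For what it's worth, `_sync_power_states` runs every `sync_power_state_interval` seconds (600 by default), which matches the observed ~10-minute growth steps. One way to test the hypothesis is to compare heap snapshots one interval apart with Python's tracemalloc module; a rough diagnostic sketch, not nova code, where the 600 s sleep is an assumption based on the default interval:

```python
# Hedged diagnostic sketch: take two heap snapshots one suspected
# periodic-task interval apart and print the call sites whose
# allocations grew the most between them.
import time
import tracemalloc

tracemalloc.start(25)          # keep up to 25 frames per allocation

before = tracemalloc.take_snapshot()
time.sleep(600)                # assumed interval, matching the ~10 min steps
after = tracemalloc.take_snapshot()

for stat in after.compare_to(before, "traceback")[:10]:
    print(stat)                # net size/count delta for this call site
    for line in stat.traceback.format():
        print("   ", line)
```

If the top entries pointed into the Ironic client or the power-sync code path, that would support the suspicion.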


Expected result
===============
The pod's memory usage should stay stable while we are not performing operations on nodes/instances.

Actual result
=============
Memory keeps increasing until the limit is exceeded and the pod is restarted.

Environment
===========
OpenStack versions
   - nova: 22.0.1
   - ironic: 16.0.1

Logs & Configs
==============

** Affects: nova
     Importance: Undecided
         Status: New



