
yahoo-eng-team team mailing list archive

[Bug 2121607] Re: Nova-api showing latency after upgrading to Caracal

 

** Also affects: python-attrs (Ubuntu Jammy)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2121607

Title:
  Nova-api showing latency after upgrading to Caracal

Status in OpenStack Compute (nova):
  Confirmed
Status in python-attrs package in Ubuntu:
  New
Status in python-attrs source package in Jammy:
  New

Bug description:
  After upgrading to Caracal, we noticed that the duration of GET calls to
  nova-api increases over time, as does the memory usage of nova-api. We
  first saw this in our telegraf metrics. To validate it, I created a
  brand-new cluster of VMs without telegraf, with a single headnode running
  nova-api, and had multiple nodes send GET requests to it while I
  monitored the request duration.

  Script to send requests:
  # --- Get a fresh token (requires an openrc file to be sourced first) ---
  get_token() {
    openstack token issue -f value -c id
  }
  # Fail early if the compute endpoint URL has not been provided.
  : "${NOVA_URL:?NOVA_URL must be set to the nova-api endpoint URL}"
  OS_TOKEN=$(get_token)
  echo "Using token: $OS_TOKEN"

  # --- Send up to 10 requests per second ---
  COUNT=0
  while true; do
    COUNT=$((COUNT+1))
    STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
      -H "X-Auth-Token: $OS_TOKEN" \
      -H "Accept: application/json" \
      "$NOVA_URL/servers/detail")

    echo "$(date +'%F %T') [$COUNT] HTTP $STATUS"

    if [ "$STATUS" = "401" ]; then
      echo "[$(date)] Got 401 → refreshing token..."
      OS_TOKEN=$(get_token)
      continue   # retry on the next iteration with the fresh token
    fi

    # 0.1 s pause → at most 10 requests/s (each curl call adds its own time)
    sleep 0.1
  done
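The loop above records only the HTTP status. A minimal sketch of also capturing the client-side duration of each call, using curl's %{time_total} write-out variable, so it can be compared with the server-side "time:" values in nova-api.log. The function name request_once is hypothetical, and NOVA_URL and OS_TOKEN are assumed to be set as in the script above.

```shell
# Hypothetical helper: send one GET /servers/detail and print a CSV line
# "YYYY-MM-DD HH:MM:SS,<http_code>,<seconds>" as measured by curl itself.
request_once() {
  local url=$1 token=$2 result
  # %{http_code} and %{time_total} are standard curl -w write-out variables.
  result=$(curl -s -o /dev/null -w '%{http_code},%{time_total}' \
    -H "X-Auth-Token: $token" \
    -H "Accept: application/json" \
    "$url/servers/detail")
  printf '%s,%s\n' "$(date +'%F %T')" "$result"
}
```

For example, inside the same loop: request_once "$NOVA_URL" "$OS_TOKEN" >> client_latency.csv. The 401 token-refresh check would then read the status from the second CSV field. If client-side and server-side durations grow together, the slowdown is inside nova-api rather than in the network path.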

  Script to monitor the duration (average per 5-minute bucket). It needs
  gawk: the three-argument match() is a gawk extension and fails under
  mawk, which is Ubuntu's default awk.
  grep 'servers/detail' /var/log/nova/nova-api.log | gawk '
    # Example log line (format only; the grep above selects servers/detail):
    # 2025-08-21 17:27:08.859 ... "GET /v2.1/os-quota-sets/..." ... time: 0.6598654
    match($0, /^([0-9-]+) ([0-9]{2}):([0-9]{2}):([0-9]{2})(\.[0-9]+)?.* time: ([0-9.]+)/, m) {
        ymd = m[1]; hh = m[2]; mm = m[3]; dur = m[6]
        bmin = int(mm/5)*5                           # floor minute to 5-min bucket
        key = sprintf("%s %s:%02d", ymd, hh, bmin)   # e.g., 2025-08-21 17:25
        sum[key] += dur; cnt[key]++
    }
    END {
        for (k in sum) printf "%s,%.3f\n", k, sum[k]/cnt[k] | "sort"
    }'
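On hosts where only mawk is installed, here is a portable sketch of the same 5-minute averaging. It avoids gawk-only features by splitting fields instead of using match() captures. Like the script above, it assumes the date and time are the first two whitespace-separated fields and that the duration follows the literal " time: " marker. The function name bucket_avg is hypothetical.

```shell
# Hypothetical helper: read nova-api log lines on stdin and print
# "YYYY-MM-DD HH:MM,<average seconds>" per 5-minute bucket, sorted by time.
bucket_avg() {
  awk '
    / time: [0-9.]+/ {
      # $1 is the date, $2 the time of day (HH:MM:SS.mmm).
      split($2, t, ":")
      # Keep only the number that follows the last " time: " marker.
      dur = $0
      sub(/.* time: /, "", dur)
      sub(/[^0-9.].*$/, "", dur)
      bmin = int(t[2] / 5) * 5                 # floor minute to 5-min bucket
      key = sprintf("%s %s:%02d", $1, t[1], bmin)
      sum[key] += dur; cnt[key]++
    }
    END { for (k in sum) printf "%s,%.3f\n", k, sum[k] / cnt[k] }
  ' | sort
}
```

Usage: grep 'servers/detail' /var/log/nova/nova-api.log | bucket_avg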

  I used systemctl status to track memory usage: it grew by about 500 MB
  over a weekend (on a small test cluster). The duration of the GET
  requests also increased noticeably and shows no sign of leveling off.
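Reading memory off systemctl status by hand makes the growth hard to quantify. A sketch of sampling it periodically instead, assuming the API runs as a systemd unit named nova-api (override SERVICE otherwise) and that memory accounting is enabled for that unit, since otherwise MemoryCurrent may not report a number. The helper names sample_memory and memory_growth_mib are hypothetical.

```shell
# Assumption: nova-api runs as a systemd unit with this name.
SERVICE="${SERVICE:-nova-api}"

# Print one CSV sample "YYYY-MM-DD HH:MM:SS,<bytes>" of the unit's memory.
sample_memory() {
  local bytes
  bytes=$(systemctl show -p MemoryCurrent --value "$SERVICE")
  printf '%s,%s\n' "$(date +'%F %T')" "$bytes"
}

# Print the growth in MiB between the first and last samples of a CSV file.
memory_growth_mib() {
  awk -F, 'NR == 1 { first = $2 } { last = $2 }
           END { printf "%.1f\n", (last - first) / 1048576 }' "$1"
}
```

For example, run while true; do sample_memory >> nova_api_mem.csv; sleep 300; done alongside the load script, then memory_growth_mib nova_api_mem.csv. A 300-second interval lines up with the 5-minute latency buckets, so memory and latency can be compared per bucket.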

  I suspect a memory leak, but would like confirmation from the team.
  Thanks.



