yahoo-eng-team team mailing list archive
Message #96344
[Bug 2121607] [NEW] Nova-api showing latency after upgrading to Caracal
Public bug reported:
After upgrading to Caracal, we noticed that the duration of GET calls to
nova-api increases over time, and so does the memory usage of nova-api.
We first saw this in our telegraf metrics. To validate it, I created a
brand-new cluster of VMs without telegraf, with only one head node
running nova-api, and had multiple nodes send GET requests to it while
monitoring the request duration.
Script to send requests:
#!/bin/bash
# Requires an openrc file to be sourced first, and NOVA_URL set to the
# compute API endpoint (e.g. http://<headnode>:8774/v2.1).
: "${NOVA_URL:?NOVA_URL must be set to the compute API endpoint}"
# --- Get a fresh token ---
get_token() {
    openstack token issue -f value -c id
}
OS_TOKEN=$(get_token)
echo "Using token: $OS_TOKEN"
# --- Send requests, pausing 0.1 s between them ---
COUNT=0
while true; do
    COUNT=$((COUNT+1))
    STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        -H "X-Auth-Token: $OS_TOKEN" \
        -H "Accept: application/json" \
        "$NOVA_URL/servers/detail")
    echo "$(date +'%F %T') [$COUNT] HTTP $STATUS"
    if [ "$STATUS" = "401" ]; then
        echo "[$(date)] Got 401 → refreshing token..."
        OS_TOKEN=$(get_token)
        continue  # retry immediately with the fresh token
    fi
    sleep 0.1  # plus request time, so at most ~10 requests/s per client
done
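As a cross-check that does not depend on the server log format, latency can also be recorded on the client side with curl's %{time_total} write-out variable. This is a minimal sketch, assuming NOVA_URL and OS_TOKEN are set as in the request script above; timed_get, to_csv_row and the client-latency.csv file name are hypothetical.

```shell
#!/bin/sh
# Sketch with hypothetical helper names: measure GET latency client-side.
# Assumes NOVA_URL and OS_TOKEN are set, as in the request script.

# Print "<http_code> <seconds>" for one request. %{http_code} and
# %{time_total} are standard curl --write-out (-w) variables.
timed_get() {
    curl -s -o /dev/null \
        -w '%{http_code} %{time_total}\n' \
        -H "X-Auth-Token: $OS_TOKEN" \
        -H "Accept: application/json" \
        "$NOVA_URL/servers/detail"
}

# Turn "<http_code> <seconds>" into "date time,http_code,seconds".
to_csv_row() {
    row_status=${1%% *}   # text before the first space
    row_secs=${1#* }      # text after the first space
    printf '%s,%s,%s\n' "$(date +'%F %T')" "$row_status" "$row_secs"
}

# Usage inside the request loop (appends one sample per iteration):
#   to_csv_row "$(timed_get)" >> client-latency.csv
```

Client-side times also include network overhead, so they will sit somewhat above the time: values in nova-api.log, but the trend over hours should match if the slowdown is on the server side.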
Script to monitor the duration (average per 5-minute bucket; needs GNU awk for the three-argument match()):
grep 'servers/detail' /var/log/nova/nova-api.log | gawk '
# Example line format:
# 2025-08-21 17:27:08.859 ... "GET /v2.1/os-quota-sets/..." ... time: 0.6598654
match($0, /^([0-9-]+) ([0-9]{2}):([0-9]{2}):([0-9]{2})(\.[0-9]+)?.* time: ([0-9.]+)/, m) {
    ymd = m[1]; hh = m[2]; mm = m[3]; dur = m[6]
    bmin = int(mm/5)*5                           # floor minute to 5-min bucket
    key = sprintf("%s %s:%02d", ymd, hh, bmin)   # e.g., 2025-08-21 17:25
    sum[key] += dur; cnt[key]++
}
END {
    for (k in sum) printf "%s,%.3f\n", k, sum[k]/cnt[k] | "sort"
}'
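The monitoring script relies on the three-argument form of match(), which is a GNU awk extension; mawk, the default awk on Debian and Ubuntu, rejects it. A minimal portable variant could look like the sketch below, assuming every log line starts with the date and time as in the example line and contains a " time: <seconds>" field (bucket_avg is a hypothetical name).

```shell
#!/bin/sh
# Sketch: the same 5-minute averaging without gawk's three-argument match(),
# so it also runs under mawk or BusyBox awk. Assumes each log line starts
# with "YYYY-MM-DD HH:MM:SS.fff" and contains " time: <seconds>".
# bucket_avg is a hypothetical helper name.
bucket_avg() {
    awk '
    / time: [0-9.]+/ {
        split($2, t, ":")                 # $2 is HH:MM:SS.fff
        bmin = int(t[2] / 5) * 5          # floor minute to 5-min bucket
        key = sprintf("%s %s:%02d", $1, t[1], bmin)
        match($0, / time: [0-9.]+/)       # POSIX form: sets RSTART, RLENGTH
        dur = substr($0, RSTART + 7, RLENGTH - 7)   # skip " time: " (7 chars)
        sum[key] += dur; cnt[key]++
    }
    END {
        for (k in sum) printf "%s,%.3f\n", k, sum[k] / cnt[k] | "sort"
        close("sort")
    }'
}

# Usage:
#   grep 'servers/detail' /var/log/nova/nova-api.log | bucket_avg
```

It prints rows in the same "YYYY-MM-DD HH:MM,average" format as the gawk version, so the two outputs can be compared directly.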
I used systemctl status to track the memory usage: it increased by about
500 MB over a weekend (I'm testing on a small cluster). The duration of
the GET requests also increased noticeably, and the growth does not seem
to level off.
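To put a number on the memory growth instead of reading systemctl status by hand, the same counter can be sampled periodically and fitted with a straight line. This is a sketch under assumptions: the unit name nova-api.service is a placeholder (the real unit depends on the distro and on whether nova-api runs standalone or under Apache/uWSGI, in which case the cgroup covers the whole web server), memory accounting must be enabled for that unit, and sample_memory and slope_mib_per_hour are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: sample nova-api memory and estimate its growth rate.
# UNIT is an assumption: set it to the systemd unit that actually runs
# nova-api on your host. Helper names are hypothetical.
UNIT=${UNIT:-nova-api.service}

# Convert a byte count to MiB with one decimal place.
bytes_to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

# Print one "epoch_seconds,MiB" sample of the unit's MemoryCurrent, the
# cgroup counter that systemctl status summarizes on its "Memory:" line.
sample_memory() {
    mem_bytes=$(systemctl show -p MemoryCurrent --value "$UNIT")
    case "$mem_bytes" in
        ''|*[!0-9]*|18446744073709551615)
            # e.g. "[not set]" when accounting is off; older systemd
            # versions may print 2^64-1 instead
            echo "no MemoryCurrent for $UNIT: $mem_bytes" >&2
            return 1 ;;
    esac
    echo "$(date +%s),$(bytes_to_mib "$mem_bytes")"
}

# Read "epoch_seconds,MiB" lines on stdin and print the least-squares
# slope in MiB per hour; x is centered on its mean to keep precision.
slope_mib_per_hour() {
    awk -F, '
    { x[NR] = $1; y[NR] = $2; sx += $1; sy += $2 }
    END {
        if (NR < 2) { print "need at least two samples" > "/dev/stderr"; exit 1 }
        mx = sx / NR; my = sy / NR
        for (i = 1; i <= NR; i++) {
            num += (x[i] - mx) * (y[i] - my)
            den += (x[i] - mx) * (x[i] - mx)
        }
        if (den == 0) { print "all samples share one timestamp" > "/dev/stderr"; exit 1 }
        printf "%.2f\n", num / den * 3600
    }'
}
```

For example, collect samples over the weekend with: while sample_memory; do sleep 300; done >> nova-api-mem.csv, then run: slope_mib_per_hour < nova-api-mem.csv. A slope that stays positive as the sampling window grows is consistent with a leak, though on its own it does not prove one.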
I'm wondering whether this is a memory leak and would like confirmation
from the team. Thanks.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2121607