
openstack team mailing list archive

Help, erroneous resource tracker preventing instances from starting

 

Hi All,

I have a growing problem in which compute nodes are, puzzlingly,
over-reporting their resource utilization and so appear to be over-utilized
when they are in fact empty. The system is Ubuntu 12.04 using the Cloud
Archive Folsom packages (2012.2-0ubuntu5~cloud0). The problem appeared on a
single node after an upgrade from Essex some months ago and has now spread
to 5 nodes (the lowest-numbered 5 nodes, both by IP and lexically by name).

For example on the compute node "nova-1":

2013-01-07 10:39:43 INFO nova.compute.manager [-] Updating host status
2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free ram (MB): -397134
2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free disk (GB): -3318
2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -215
2013-01-07 10:41:02 INFO nova.compute.resource_tracker [-] Compute_service record updated for nova-1

Oddly, even though no instances are scheduled, the reported utilization does
vary. For example, over the last 5 hours:

root@nova-1:~# grep 'Free VCPUS:' /var/log/nova/nova-compute.log | awk '{print $NF}' | sort -n | uniq -c
    156 -218
      3 -216
      5 -215
      2 -214
      2 -212
      1 -211
      1 -210
      5 -209
      5 -208

# but no instances are running
root@nova-1:~# virsh list
 Id    Name                           State
----------------------------------------------------

root@nova-1:~#

# nor does OpenStack seem to *think* any instances are running or reserved
# by any projects, as seen by nova-manage service describe_resource nova-1:

HOST                              PROJECT     cpu mem(mb)     hdd
nova-1          (total)                        24   48295     602
nova-1          (used_now)                    233  433141    3740
nova-1          (used_max)                      0       0       0
# note the lack of any per-tenant rows here
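As a sanity check, the negative "Free" values in the log look consistent with
free = total - used_now, computed per resource from the describe_resource
figures. That formula is an assumption here, not something confirmed against
the Folsom resource tracker source.

```python
from typing import Dict

# Figures copied from the describe_resource output for nova-1 above.
TOTAL = {"cpu": 24, "mem_mb": 48295, "hdd_gb": 602}
USED_NOW = {"cpu": 233, "mem_mb": 433141, "hdd_gb": 3740}


def free_resources(total: Dict[str, int], used: Dict[str, int]) -> Dict[str, int]:
    """Assumed relationship: free = total - used, for each resource."""
    return {name: total[name] - used[name] for name in total}


print(free_resources(TOTAL, USED_NOW))
# {'cpu': -209, 'mem_mb': -384846, 'hdd_gb': -3138}
```

The -209 free VCPUs matches one of the values in the histogram; memory and
disk differ from the 10:41 log line, presumably because the snapshot was taken
at a different time. Either way, the problem seems to be an inflated "used"
figure rather than a wrong "total".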

I can't reproduce the issue intentionally, but I also can't clear the
apparent resource utilization. I tried manipulating the database directly,
but the values get reset by the compute nodes' periodic reports; I also tried
rebooting the nodes. I can always fall back to reinstalling them, but since
this is still a pre-production cluster I'd like to understand what is
happening.
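One way to picture why the database edits don't stick: if the tracker
rebuilds usage from scratch each cycle from whatever instance records it sees
for the host (an assumption for illustration, not verified against nova's
code), any manual edit to the stored totals is simply recomputed away on the
next report. The classes below are a hypothetical toy model, not nova's
schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyComputeNode:
    """Hypothetical stand-in for a compute node's stored usage row."""
    vcpus_used: int = 0


@dataclass
class ToyTracker:
    """Toy periodic tracker that recomputes usage from instance records.

    Assumption for illustration only: every report overwrites the stored
    value with the sum over the instance records assigned to the host.
    """
    instance_vcpus: List[int] = field(default_factory=list)

    def report(self, node: ToyComputeNode) -> None:
        node.vcpus_used = sum(self.instance_vcpus)


node = ToyComputeNode()
tracker = ToyTracker(instance_vcpus=[4, 8, 2])  # hypothetical records
tracker.report(node)
node.vcpus_used = 0   # a manual "fix" in the database
tracker.report(node)  # the next periodic report
print(node.vcpus_used)  # 14
```

Under that assumption, the inflated numbers would have to come from records
the tracker still counts, rather than from the stored row itself.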

Does anyone have any insight into why nova.compute.resource_tracker is so
confused, or how I can force it to recognize which resources are actually in
use? Operationally it isn't painful to reinstall, but it does hurt a bit not
knowing what's going on here.

Thanks,
-Jon
