openstack team mailing list archive

Re: Help, erroneous resource tracker preventing instances from starting

To: Jonathan Proulx <jon@xxxxxxxxxxxxx>
From: Byron McCollum <byron.mccollum@xxxxxxxxxxxxx>
Date: Tue, 8 Jan 2013 01:02:05 +0000
Cc: "<openstack@xxxxxxxxxxxxxxxxxxx>" <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CABZB-sjq5cJq2pUTXCQ+ES8TroUmaJK27PBpTLJv+2bd7zHSUA@mail.gmail.com>
Thread-topic: [Openstack] Help, erroneous resource tracker preventing instances from starting

See if this bug might be related to your problem...

https://bugs.launchpad.net/nova/+bug/1060363

Byron


Begin forwarded message "[Openstack] Base images removed in upgrade essex -> folsom and other stories":

> We also came across an issue where some compute nodes were reporting bogus resource stats, e.g.:
> 
> 2012-11-13 05:04:38 INFO nova.compute.manager [-] Updating host status
> 2012-11-13 05:06:14 AUDIT nova.compute.resource_tracker [-] Free ram (MB): -739665
> 2012-11-13 05:06:14 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 12654
> 2012-11-13 05:06:14 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -188
> 2012-11-13 05:06:14 INFO nova.compute.resource_tracker [-] Compute_service record updated for np-rcc6
> 
> This turned out to be addressed by the following bug: the DB filter does a regex match on the host name.
> https://bugs.launchpad.net/nova/+bug/1060363
> 
> So a compute node named np-rcc5 would also pull in np-rcc50, np-rcc51, and so on.
> 
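The host-name matching problem described above can be sketched in a few lines of Python. This is only an illustration of the behaviour reported in the bug, not nova's actual query code; the function names `regex_filter` and `exact_filter` and the host list are made up for the example.

```python
import re

# Hypothetical host list, using the names mentioned in the message above.
HOSTS = ["np-rcc5", "np-rcc50", "np-rcc51", "np-rcc6"]


def regex_filter(hosts, wanted):
    """Unanchored regex search: matches every host that *contains* the pattern."""
    return [h for h in hosts if re.search(wanted, h)]


def exact_filter(hosts, wanted):
    """Exact comparison: matches only the host with that precise name."""
    return [h for h in hosts if h == wanted]


if __name__ == "__main__":
    print(regex_filter(HOSTS, "np-rcc5"))  # also picks up np-rcc50 and np-rcc51
    print(exact_filter(HOSTS, "np-rcc5"))  # only np-rcc5
```

If instances are selected per host with the first kind of match, a node whose name is a prefix of other node names would be charged for their instances too, which would fit a report in which only the lowest-numbered nodes are affected.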


On Jan 7, 2013, at 9:50 AM, Jonathan Proulx <jon@xxxxxxxxxxxxx> wrote:

> Hi All,
> 
> I have a growing problem in which compute nodes are puzzlingly over-reporting their resource utilization and thus appear to be over-utilized when they are in fact empty. The system is Ubuntu 12.04 using the cloud archive Folsom packages (2012.2-0ubuntu5~cloud0). The problem appeared on a single node after the upgrade from Essex some months ago and has now grown to 5 nodes (the lowest-numbered 5 nodes, both by IP and lexically by name).
> 
> For example on the compute node "nova-1":
> 
> 2013-01-07 10:39:43 INFO nova.compute.manager [-] Updating host status
> 2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free ram (MB): -397134
> 2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free disk (GB): -3318
> 2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -215
> 2013-01-07 10:41:02 INFO nova.compute.resource_tracker [-] Compute_service record updated for nova-1 
> 
> Oddly, even though no instances are scheduled, the resource utilization does vary, for example in the last 5 hours:
> 
> root@nova-1:~# grep 'Free VCPUS:' /var/log/nova/nova-compute.log|awk '{print $NF}'|sort -n |uniq -c
>     156 -218
>       3 -216
>       5 -215
>       2 -214
>       2 -212
>       1 -211
>       1 -210
>       5 -209
>       5 -208
> 
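For anyone who prefers to do the same tally in Python, here is a rough equivalent of the grep/awk/sort/uniq pipeline above. It assumes the log lines look exactly like the AUDIT lines quoted in this thread; `tally_free_vcpus` is a name invented for this sketch.

```python
import re
from collections import Counter

# Matches the "Free VCPUS: <n>" lines emitted by nova.compute.resource_tracker,
# in the format quoted in this thread.
FREE_VCPUS = re.compile(r"Free VCPUS: (-?\d+)")


def tally_free_vcpus(lines):
    """Return (value, count) pairs sorted numerically by value, like sort -n | uniq -c."""
    counts = Counter()
    for line in lines:
        match = FREE_VCPUS.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return sorted(counts.items())


# Usage, with the log path taken from the command above:
#   with open("/var/log/nova/nova-compute.log") as log:
#       for value, count in tally_free_vcpus(log):
#           print(f"{count:7d} {value}")
```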
> # but no instances are running
> root@nova-1:~# virsh list
>  Id    Name                           State
> ----------------------------------------------------
> 
> root@nova-1:~# 
> 
> # nor does OpenStack seem to *think* any instances are running or reserved by any projects
> # as seen by nova-manage service describe_resource nova-1
> 
> HOST                              PROJECT     cpu mem(mb)     hdd
> nova-1          (total)                        24   48295     602
> nova-1          (used_now)                    233  433141    3740
> nova-1          (used_max)                      0       0       0
> # note lack of a list of tenants here
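As a sanity check, the "free" figures in the log can be compared with the table above. The sketch below assumes that free is simply total minus used_now, which is an assumption about how the tracker derives its numbers rather than something stated in this thread; the dictionary keys are invented labels for the three columns.

```python
# Figures copied from the describe_resource output above.
total = {"cpu": 24, "mem_mb": 48295, "hdd_gb": 602}
used_now = {"cpu": 233, "mem_mb": 433141, "hdd_gb": 3740}


def free_resources(total, used):
    """Assumed relation: free = total - used, per resource."""
    return {name: total[name] - used[name] for name in total}


if __name__ == "__main__":
    print(free_resources(total, used_now))
    # {'cpu': -209, 'mem_mb': -384846, 'hdd_gb': -3138}
```

The -209 VCPUs figure is one of the values in the tally above. The RAM and disk figures differ from the 10:41 log lines (-397134 MB, -3318 GB), which is consistent with the reported utilization drifting between samples.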
> 
> I can't replicate the issue intentionally, but I also can't clear the apparent resource utilization. I tried direct manipulation of the database, but that gets reset by the compute node reports, and I tried rebooting the nodes. I can always fall back to just reinstalling them, but since this is still a pre-production cluster I'd like to understand what is happening.
> 
> Does anyone have any insight into why nova.compute.resource_tracker is so confused, or how I can force it to understand what resources are in use? Operationally it isn't painful to reinstall, but it does hurt a bit not knowing what's going on here.
> 
> Thanks,
> -Jon

References

Help, erroneous resource tracker preventing instances from starting
From: Jonathan Proulx, 2013-01-07