openstack team mailing list archive
Message #19871
Re: Help, erroneous resource tracker preventing instances from starting
This bug might be related to your problem:
https://bugs.launchpad.net/nova/+bug/1060363
Byron
Begin forwarded message "[Openstack] Base images removed in upgrade essex -> folsom and other stories":
> We also came across an issue where some compute nodes were reporting bogus resource stats, e.g.:
>
> 2012-11-13 05:04:38 INFO nova.compute.manager [-] Updating host status
> 2012-11-13 05:06:14 AUDIT nova.compute.resource_tracker [-] Free ram (MB): -739665
> 2012-11-13 05:06:14 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 12654
> 2012-11-13 05:06:14 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -188
> 2012-11-13 05:06:14 INFO nova.compute.resource_tracker [-] Compute_service record updated for np-rcc6
>
> This turned out to be covered by the following bug: the DB filter on host is a regex match rather than an exact match.
> https://bugs.launchpad.net/nova/+bug/1060363
>
> So the compute node np-rcc5 would also pull in np-rcc50, np-rcc51, and so on.
>
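To make the failure mode in the forwarded message concrete, here is a minimal Python sketch contrasting an unanchored regex match with an exact comparison on host names. It is an illustration under stated assumptions, not Nova's actual DB code: the helpers `regex_filter` and `exact_filter` and the host lists are hypothetical, modelled on the np-rccN and nova-N names in this thread.

```python
import re

# Hypothetical host names, modelled on the np-rccN naming in the forwarded message.
HOSTS = ["np-rcc5", "np-rcc50", "np-rcc51", "np-rcc6"]


def regex_filter(names, pattern):
    """Keep names where `pattern` matches anywhere (unanchored re.search)."""
    return [n for n in names if re.search(pattern, n)]


def exact_filter(names, value):
    """Keep only names exactly equal to `value`."""
    return [n for n in names if n == value]


# The unanchored match also pulls in np-rcc50 and np-rcc51.
print(regex_filter(HOSTS, "np-rcc5"))  # ['np-rcc5', 'np-rcc50', 'np-rcc51']
# The exact comparison returns only the intended host.
print(exact_filter(HOSTS, "np-rcc5"))  # ['np-rcc5']
# Same effect with names like Jon's: nova-1 also matches nova-10 and nova-11.
print(regex_filter(["nova-1", "nova-10", "nova-11", "nova-2"], "nova-1"))
```

If the host filter behaves like `regex_filter`, only nodes whose names are prefixes of other node names would be affected, and each newly added node (say nova-50) could start affecting an existing one (nova-5). That would be consistent with the problem hitting the lowest-numbered nodes first and growing over time, though the thread does not confirm it.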
On Jan 7, 2013, at 9:50 AM, Jonathan Proulx <jon@xxxxxxxxxxxxx> wrote:
> Hi All,
>
> I have a growing problem in which compute nodes are, puzzlingly, over-reporting their resource utilization and thus appear to be over-utilized when they are in fact empty. The system is Ubuntu 12.04 using the Cloud Archive Folsom packages (2012.2-0ubuntu5~cloud0). The problem appeared on a single node after an upgrade from Essex some months ago and has now grown to 5 nodes (the lowest-numbered 5 nodes, both by IP and lexically by name).
>
> For example, on the compute node "nova-1":
>
> 2013-01-07 10:39:43 INFO nova.compute.manager [-] Updating host status
> 2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free ram (MB): -397134
> 2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free disk (GB): -3318
> 2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -215
> 2013-01-07 10:41:02 INFO nova.compute.resource_tracker [-] Compute_service record updated for nova-1
>
> Oddly, even though no instances are scheduled, the reported resource utilization does vary; for example, over the last 5 hours:
>
> root@nova-1:~# grep 'Free VCPUS:' /var/log/nova/nova-compute.log|awk '{print $NF}'|sort -n |uniq -c
> 156 -218
> 3 -216
> 5 -215
> 2 -214
> 2 -212
> 1 -211
> 1 -210
> 5 -209
> 5 -208
>
> # but no instances are running
> root@nova-1:~# virsh list
> Id Name State
> ----------------------------------------------------
>
> root@nova-1:~#
>
> # nor does OpenStack seem to *think* any instances are running or reserved by any projects
> # as seen by nova-manage service describe_resource nova-1
>
> HOST PROJECT cpu mem(mb) hdd
> nova-1 (total) 24 48295 602
> nova-1 (used_now) 233 433141 3740
> nova-1 (used_max) 0 0 0
> # note lack of a list of tenants here
>
> I can't replicate the issue intentionally, but I also can't clear the apparent resource utilization. I tried manipulating the database directly, but that gets reset by the compute node reports; I also tried rebooting the nodes. I can always fall back to reinstalling them, but since this is still a pre-production cluster I'd like to understand what is happening.
>
> Does anyone have any insight into why nova.compute.resource_tracker is so confused, or how I can force it to recognize what resources are actually in use? Operationally it isn't painful to reinstall, but it does hurt a bit not knowing what's going on here.
>
> Thanks,
> -Jon
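A quick arithmetic check on the describe_resource numbers quoted for nova-1 is consistent with the tracker reporting free = total - used, with no floor at zero. That formula is an assumption read off the numbers, not taken from the Nova source; the helper name `free_resources` is hypothetical.

```python
# Figures copied from the quoted `nova-manage service describe_resource nova-1` output.
TOTAL = {"vcpus": 24, "memory_mb": 48295, "disk_gb": 602}
USED_NOW = {"vcpus": 233, "memory_mb": 433141, "disk_gb": 3740}


def free_resources(total, used):
    """Assumed model: free = total - used, allowed to go negative."""
    return {key: total[key] - used[key] for key in total}


free = free_resources(TOTAL, USED_NOW)
print(free)  # {'vcpus': -209, 'memory_mb': -384846, 'disk_gb': -3138}
```

The -209 free VCPUs matches one of the values in Jon's grep tally. The RAM and disk results differ from the 10:41 log lines, presumably because the snapshot was taken at a different moment and the reported usage drifts. A used_now of 233 VCPUs on a 24-VCPU node with nothing running would be consistent with the tracker summing instances that belong to other hosts, as in the regex-filter bug above.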