openstack team mailing list archive
Message #24249
Folsom's `nova-compute` reports wrong amount of free disk space
Hello,
we are experiencing failures in starting new instances on our local
OpenStack (Folsom) installation. The problem seems to be that VMs are
being started on nodes that do not have enough free space. Relevant
extract from `nova-compute.log` on a compute node:
... 19314 TRACE nova.compute.manager [instance: ...] ProcessExecutionError: Unexpected error while running command.
... 19314 TRACE nova.compute.manager [instance: ...] Command: qemu-img convert -O raw /var/lib/nova/instances/_base/77f4a9fce5e923d379c1514ca8078ff3c7e2835f.part /var/lib/nova/instances/_base/77f4a9fce5e923d379c1514ca8078ff3c7e2835f.converted
... 19314 TRACE nova.compute.manager [instance: ...] Exit code: 1
... 19314 TRACE nova.compute.manager [instance: ...] Stdout: ''
... 19314 TRACE nova.compute.manager [instance: ...] Stderr: 'qemu-img: error while writing sector 1720383: No space left on device\n'
Now, we have two questions:
(1) Why does this happen, i.e., how do we debug the misreporting of
free disk space?
(2) Most disk space seems to be wasted by old (and presumably failed)
image conversion attempts in `/var/lib/nova/instances/_base`. How
can we clean those up? Can we just delete the files, or is some
special care or a database edit needed?
More details on the node state follow:
(1) There is a curious mismatch between the values that Nova gets from
the hypervisor and those that it apparently reports to the scheduler:
... 19314 DEBUG nova.compute.resource_tracker [-] Hypervisor: free ram (MB): 30284 _report_hypervisor_resource_view /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:470
... 19314 DEBUG nova.compute.resource_tracker [-] Hypervisor: free disk (GB): 7 _report_hypervisor_resource_view /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:471
... 19314 DEBUG nova.compute.resource_tracker [-] Hypervisor: free VCPUs: 6 _report_hypervisor_resource_view /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:476
... 19314 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 30189
... 19314 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 87
... 19314 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 5
... 19314 INFO nova.compute.resource_tracker [-] Compute_service record updated for node-09-02-11
Why are 7 GB of free disk detected, but 87 GB reported? And why does
this not happen for the free RAM?
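The size of the mismatch can be made explicit with a little arithmetic. The sketch below uses only figures quoted in this message (the two log lines and the `df` output for the root filesystem); the variable names are made up for illustration. The reading in the comments, that the reported figure comes from per-instance accounting rather than from the filesystem, is an assumption and is not confirmed by the logs.

```python
# Figures copied from this message; nothing here queries a live system.
FS_SIZE_GB = 103          # `df -h`: size of the root filesystem
FS_USED_GB = 96           # `df -h`: space used on the root filesystem
HYPERVISOR_FREE_GB = 7    # DEBUG line "Hypervisor: free disk (GB)"
REPORTED_FREE_GB = 87     # AUDIT line "Free disk (GB)"

# The hypervisor figure agrees with "size minus used" on the filesystem.
fs_free_gb = FS_SIZE_GB - FS_USED_GB

# ASSUMPTION: if the reported figure were "total disk minus disk accounted
# to instances", this is how much disk would be accounted for.
implied_accounted_gb = FS_SIZE_GB - REPORTED_FREE_GB

# How far the reported figure overstates what the hypervisor sees.
overstatement_gb = REPORTED_FREE_GB - HYPERVISOR_FREE_GB

print("free according to the filesystem (GB):", fs_free_gb)
print("disk implied as accounted for (GB):   ", implied_accounted_gb)
print("overstatement of free disk (GB):      ", overstatement_gb)
```

If that reading holds, whatever occupies the remaining space on the filesystem is not part of the accounting behind the reported figure.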
There are only two instances actually running on the node:
# pgrep -lf kvm
1419 kvm-irqfd-clean
3225 /usr/bin/kvm -name instance-00002ab1 ...
3228 kvm-pit-wq
24110 /usr/bin/kvm -name instance-00002800 ...
24113 kvm-pit-wq
(2) The disk on the node is nearly full (`/var/lib/nova` is on the root fs):
root@node-09-02-11:/var/lib/nova/instances# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/node--09--02--11-root 103G 96G 2,0G 99% /
...
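Note that `df` shows 103G - 96G = 7G not in use, yet only 2,0G available. A gap like this is typical of blocks reserved for root (an assumption here, since the filesystem type is not stated). The sketch below reads both figures with `os.statvfs`; the function name `disk_figures` is made up for illustration, and the default path `/` should be replaced by the directory of interest, e.g. `/var/lib/nova/instances`.

```python
import os

GIB = 1024 ** 3


def disk_figures(path="/"):
    """Return (total, free, available) in GiB for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize / GIB
    # f_bfree counts every free block, including those reserved for root.
    free = st.f_bfree * st.f_frsize / GIB
    # f_bavail counts only the blocks an unprivileged process may use.
    available = st.f_bavail * st.f_frsize / GIB
    return total, free, available


if __name__ == "__main__":
    total, free, available = disk_figures("/")
    print("total:     %.1f GiB" % total)
    print("free:      %.1f GiB" % free)
    print("available: %.1f GiB" % available)
```

A process running as an unprivileged user can write only up to the "available" figure, so it can hit "No space left on device" while the "free" figure is still several GiB.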
Most of the space is used in the `instances/_base` subdirectory:
root@node-09-02-11:/var/lib/nova/instances# du -sch *
93G _base
672M instance-00002800
8,0K instance-00002835
17M instance-00002ab1
8,0K instance-00002ab2
8,0K instance-00002ab3
4,0K snapshots
93G total
root@node-09-02-11:/var/lib/nova/instances/_base# ls -l
total 96645084
-rw-r--r-- 1 nova nova 5368709120 mag 14 19:12 04df06d78ce815d1e1dbac931318253f5423480f
-rw-r--r-- 1 libvirt-qemu kvm 5368709120 mag 14 19:13 04df06d78ce815d1e1dbac931318253f5423480f_5
-rw-r--r-- 1 nova nova 10737418240 mag 22 14:24 0c09def8c324019bff1d98d88fa3266a964acd78
-rw-r--r-- 1 libvirt-qemu kvm 21474836480 mag 22 14:25 0c09def8c324019bff1d98d88fa3266a964acd78_20
-rw-rw-r-- 1 nova nova 1276903424 mag 22 14:24 0c09def8c324019bff1d98d88fa3266a964acd78.part
-rw-r--r-- 1 nova nova 21474836480 mag 31 11:33 23f7cddeb51dd8892cbaa076eef7693ee43fa295
-rw-r--r-- 1 libvirt-qemu kvm 21474836480 mag 31 11:34 23f7cddeb51dd8892cbaa076eef7693ee43fa295_20
-rw-r--r-- 1 libvirt-qemu kvm 42949672960 giu 3 10:11 23f7cddeb51dd8892cbaa076eef7693ee43fa295_40
-rw-rw-r-- 1 nova nova 2400321536 mag 31 11:32 23f7cddeb51dd8892cbaa076eef7693ee43fa295.part
-rw-r--r-- 1 nova nova 21474836480 mag 30 15:30 3ea65202fb7a4664fc1983dff33bf3d007627549
-rw-r--r-- 1 libvirt-qemu kvm 21474836480 mag 30 15:30 3ea65202fb7a4664fc1983dff33bf3d007627549_20
-rw-r--r-- 1 libvirt-qemu kvm 42949672960 mag 31 10:04 3ea65202fb7a4664fc1983dff33bf3d007627549_40
-rw-rw-r-- 1 nova nova 1581711360 mag 30 15:30 3ea65202fb7a4664fc1983dff33bf3d007627549.part
-rw-r--r-- 1 nova nova 21474836480 mag 2 22:02 4a00a749fd7741fde579b69202475606c800a053
-rw-r--r-- 1 libvirt-qemu kvm 21474836480 mag 2 22:03 4a00a749fd7741fde579b69202475606c800a053_5
-rw-r--r-- 1 nova nova 21474836480 apr 26 14:27 7e93ee4745fe5a825aff33d0731b12ade576fffc
-rw-r--r-- 1 libvirt-qemu kvm 21474836480 apr 26 14:35 7e93ee4745fe5a825aff33d0731b12ade576fffc_20
-rw-r--r-- 1 nova nova 5368709120 apr 10 11:24 90551e161412bd971a2f323fd9cb9d8b96a56f5f
-rw-r--r-- 1 libvirt-qemu kvm 107374182400 apr 29 17:16 90551e161412bd971a2f323fd9cb9d8b96a56f5f_100
-rw-r--r-- 1 libvirt-qemu kvm 21474836480 apr 10 11:37 90551e161412bd971a2f323fd9cb9d8b96a56f5f_20
-rw-r--r-- 1 libvirt-qemu kvm 5368709120 apr 10 11:24 90551e161412bd971a2f323fd9cb9d8b96a56f5f_5
-rw-r--r-- 1 libvirt-qemu kvm 85899345920 apr 17 20:42 90551e161412bd971a2f323fd9cb9d8b96a56f5f_80
-rw-r--r-- 1 nova nova 21474836480 mag 14 16:40 db5ff3aeff506b37035f0ff1c07acf96311f1172
-rw-r--r-- 1 libvirt-qemu kvm 21474836480 mag 14 16:43 db5ff3aeff506b37035f0ff1c07acf96311f1172_20
-rw-r--r-- 1 nova nova 107374182400 apr 29 17:17 ephemeral_0_100_None
-rw-r--r-- 1 libvirt-qemu kvm 107374182400 apr 29 17:17 ephemeral_0_100_None_100
-rw-r--r-- 1 nova nova 21474836480 mag 27 14:46 ephemeral_0_20_None
-rw-r--r-- 1 libvirt-qemu kvm 21474836480 mag 27 14:46 ephemeral_0_20_None_20
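The sizes in the listing above add up to far more than the 93G that `du` reports (the three 107374182400-byte files alone are 300 GiB), so many of these files must be sparse. To see where the space really goes, one can total allocated blocks rather than apparent sizes. The following is a minimal sketch: the category labels are this sketch's own guesses based on file-name patterns seen in the listing, not Nova terminology, and `classify` and `summarize` are hypothetical helper names.

```python
import os
import re
from collections import defaultdict


def classify(name):
    """Guess a file's role from its name (labels are this sketch's own)."""
    if name.endswith(".part") or name.endswith(".converted"):
        return "partial"
    if name.startswith("ephemeral_"):
        return "ephemeral"
    if re.fullmatch(r"[0-9a-f]{40}_\d+", name):
        return "size-suffixed copy"
    if re.fullmatch(r"[0-9a-f]{40}", name):
        return "base image"
    return "other"


def summarize(directory):
    """Return {category: (apparent_bytes, allocated_bytes)} for regular files."""
    totals = defaultdict(lambda: [0, 0])
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        bucket = totals[classify(entry.name)]
        bucket[0] += st.st_size          # size as shown by `ls -l`
        bucket[1] += st.st_blocks * 512  # on Linux, st_blocks is in 512-byte units
    return {category: tuple(sizes) for category, sizes in totals.items()}


if __name__ == "__main__":
    # Path taken from this message; adjust as needed.
    for category, (apparent, allocated) in sorted(
            summarize("/var/lib/nova/instances/_base").items()):
        print("%-20s apparent %6.1f GiB  allocated %6.1f GiB"
              % (category, apparent / 1024 ** 3, allocated / 1024 ** 3))
```

The script only reads file metadata; it does not modify or delete anything.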
However, only one file in `instances/_base` is in use (both instances
use the same base image):
root@node-09-02-11:/var/lib/nova/instances/_base# fuser -v *
USER PID ACCESS COMMAND
/var/lib/nova/instances/_base/90551e161412bd971a2f323fd9cb9d8b96a56f5f_5:
libvirt-qemu 3225 f.... kvm
libvirt-qemu 24110 f.... kvm
Can the other files in `_base` be safely deleted?
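The `fuser` check above can also be scripted. The sketch below scans `/proc/<pid>/fd` (Linux only) and lists the files directly inside a directory that no process currently holds open; `open_files_in` and `not_open` are hypothetical names. It looks only at open file descriptors, and it needs root to see other users' processes. A file that is not open is not necessarily unused: the disk of a stopped instance may still reference a backing file in `_base`. The script therefore only lists candidates and deletes nothing.

```python
import os


def open_files_in(directory, proc_root="/proc"):
    """Return the set of files directly inside directory that are open somewhere.

    Processes whose descriptor table cannot be read (insufficient
    permissions, or the process has already exited) are silently skipped.
    """
    directory = os.path.realpath(directory)
    in_use = set()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if os.path.dirname(target) == directory:
                in_use.add(target)
    return in_use


def not_open(directory):
    """Return the sorted names in directory that no visible process has open."""
    directory = os.path.realpath(directory)
    in_use = open_files_in(directory)
    return sorted(name for name in os.listdir(directory)
                  if os.path.join(directory, name) not in in_use)


if __name__ == "__main__":
    # Path taken from this message; adjust as needed.
    for name in not_open("/var/lib/nova/instances/_base"):
        print(name)
```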
Thanks for any help,
Riccardo
--
Riccardo Murri
http://www.gc3.uzh.ch/people/rm
Grid Computing Competence Centre
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222
Fax: +41 44 635 6888