Folsom's `nova-compute` reports wrong amount of free disk space

Hello,

we are experiencing failures when starting new instances on our local
OpenStack (Folsom) installation.  The problem seems to be that VMs are
being scheduled onto compute nodes that do not have enough free disk
space.  Here is the relevant extract from `nova-compute.log` on one
such node:

    ... 19314 TRACE nova.compute.manager [instance: ...] ProcessExecutionError: Unexpected error while running command.
    ... 19314 TRACE nova.compute.manager [instance: ...] Command: qemu-img convert -O raw /var/lib/nova/instances/_base/77f4a9fce5e923d379c1514ca8078ff3c7e2835f.part /var/lib/nova/instances/_base/77f4a9fce5e923d379c1514ca8078ff3c7e2835f.converted
    ... 19314 TRACE nova.compute.manager [instance: ...] Exit code: 1
    ... 19314 TRACE nova.compute.manager [instance: ...] Stdout: ''
    ... 19314 TRACE nova.compute.manager [instance: ...] Stderr: 'qemu-img: error while writing sector 1720383: No space left on device\n'

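A quick way to confirm that the conversion simply cannot fit is to
compare the image's virtual size with the space left on the instances
filesystem, e.g. with the sketch below.  The `.part` file name is the
one from the trace above; the only other assumption is that `qemu-img
info` prints its usual "virtual size: ... (NNN bytes)" line.

    import os
    import subprocess

    # The .part file from the trace above.
    part = ('/var/lib/nova/instances/_base/'
            '77f4a9fce5e923d379c1514ca8078ff3c7e2835f.part')

    info = subprocess.check_output(['qemu-img', 'info', part]).decode()
    virtual_bytes = None
    for line in info.splitlines():
        if line.startswith('virtual size:'):
            # e.g. "virtual size: 10G (10737418240 bytes)"
            virtual_bytes = int(line.split('(')[1].split()[0])

    # Free space on the filesystem that holds the instances directory.
    st = os.statvfs(os.path.dirname(part))
    free_bytes = st.f_bavail * st.f_frsize

    print('virtual size %.1f GB vs. %.1f GB free'
          % (virtual_bytes / 1e9, free_bytes / 1e9))
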
Now, we have two questions:

(1) Why does this happen, i.e., how do we debug the misreporting of
    free disk space?

(2) Most disk space seems to be wasted by old (and presumably failed)
    image conversion attempts in `/var/lib/nova/instances/_base`.  How
    can we clean those up?  Can we just delete the files, or is some
    special care (e.g. a database edit) needed?

More details on the node state follow:

(1) There is a curious mismatch between the values that Nova gets from
the hypervisor and those that it apparently reports to the scheduler:

    ... 19314 DEBUG nova.compute.resource_tracker [-] Hypervisor: free ram (MB): 30284 _report_hypervisor_resource_view /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:470
    ... 19314 DEBUG nova.compute.resource_tracker [-] Hypervisor: free disk (GB): 7 _report_hypervisor_resource_view /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:471
    ... 19314 DEBUG nova.compute.resource_tracker [-] Hypervisor: free VCPUs: 6 _report_hypervisor_resource_view /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:476
    ... 19314 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 30189
    ... 19314 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 87
    ... 19314 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 5
    ... 19314 INFO nova.compute.resource_tracker [-] Compute_service record updated for node-09-02-11

Why are 7 GB of free disk detected, but 87 GB reported?  And why does
the same mismatch not happen for the free RAM?

There are only two instances actually running on the node:

    # pgrep -lf kvm
    1419 kvm-irqfd-clean
    3225 /usr/bin/kvm -name instance-00002ab1 ...
    3228 kvm-pit-wq
    24110 /usr/bin/kvm -name instance-00002800 ...
    24113 kvm-pit-wq
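
We have not yet looked at the nova database directly; the cross-check
we have in mind is sketched below.  It assumes a MySQL backend, made-up
credentials, and that the Folsom `compute_nodes` table carries the
`local_gb`, `local_gb_used` and `free_disk_gb` columns (the column
names are an assumption on our part):

    import MySQLdb   # python-mysqldb; host/user/password are placeholders

    conn = MySQLdb.connect(host='nova-db.example.org', user='nova',
                           passwd='secret', db='nova')
    cur = conn.cursor()
    # Values the resource tracker last wrote for this node, i.e. what
    # the scheduler actually sees.
    cur.execute(
        "SELECT local_gb, local_gb_used, free_disk_gb"
        "  FROM compute_nodes JOIN services ON services.id = service_id"
        " WHERE services.host = %s AND services.deleted = 0",
        ('node-09-02-11',))
    for local_gb, local_gb_used, free_disk_gb in cur.fetchall():
        print('local_gb=%s  local_gb_used=%s  free_disk_gb=%s'
              % (local_gb, local_gb_used, free_disk_gb))

If the `free_disk_gb` stored there matches the 87 GB from the AUDIT
line rather than the 7 GB the hypervisor sees, then the scheduler is
presumably basing its placement decisions on that value.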


(2) The disk on the node is nearly full (`/var/lib/nova` is on the root fs):

    root@node-09-02-11:/var/lib/nova/instances# df -h
    Filesystem                         Size  Used Avail Use% Mounted on
    /dev/mapper/node--09--02--11-root  103G   96G  2,0G  99% /
    ...

Most of the space is used in the `instances/_base` subdirectory:

    root@node-09-02-11:/var/lib/nova/instances# du -sch *
    93G	_base
    672M	instance-00002800
    8,0K	instance-00002835
    17M	instance-00002ab1
    8,0K	instance-00002ab2
    8,0K	instance-00002ab3
    4,0K	snapshots
    93G	total

    root@node-09-02-11:/var/lib/nova/instances/_base# ls -l
    total 96645084
    -rw-r--r-- 1 nova         nova   5368709120 mag 14 19:12 04df06d78ce815d1e1dbac931318253f5423480f
    -rw-r--r-- 1 libvirt-qemu kvm    5368709120 mag 14 19:13 04df06d78ce815d1e1dbac931318253f5423480f_5
    -rw-r--r-- 1 nova         nova  10737418240 mag 22 14:24 0c09def8c324019bff1d98d88fa3266a964acd78
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 22 14:25 0c09def8c324019bff1d98d88fa3266a964acd78_20
    -rw-rw-r-- 1 nova         nova   1276903424 mag 22 14:24 0c09def8c324019bff1d98d88fa3266a964acd78.part
    -rw-r--r-- 1 nova         nova  21474836480 mag 31 11:33 23f7cddeb51dd8892cbaa076eef7693ee43fa295
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 31 11:34 23f7cddeb51dd8892cbaa076eef7693ee43fa295_20
    -rw-r--r-- 1 libvirt-qemu kvm   42949672960 giu  3 10:11 23f7cddeb51dd8892cbaa076eef7693ee43fa295_40
    -rw-rw-r-- 1 nova         nova   2400321536 mag 31 11:32 23f7cddeb51dd8892cbaa076eef7693ee43fa295.part
    -rw-r--r-- 1 nova         nova  21474836480 mag 30 15:30 3ea65202fb7a4664fc1983dff33bf3d007627549
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 30 15:30 3ea65202fb7a4664fc1983dff33bf3d007627549_20
    -rw-r--r-- 1 libvirt-qemu kvm   42949672960 mag 31 10:04 3ea65202fb7a4664fc1983dff33bf3d007627549_40
    -rw-rw-r-- 1 nova         nova   1581711360 mag 30 15:30 3ea65202fb7a4664fc1983dff33bf3d007627549.part
    -rw-r--r-- 1 nova         nova  21474836480 mag  2 22:02 4a00a749fd7741fde579b69202475606c800a053
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag  2 22:03 4a00a749fd7741fde579b69202475606c800a053_5
    -rw-r--r-- 1 nova         nova  21474836480 apr 26 14:27 7e93ee4745fe5a825aff33d0731b12ade576fffc
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 apr 26 14:35 7e93ee4745fe5a825aff33d0731b12ade576fffc_20
    -rw-r--r-- 1 nova         nova   5368709120 apr 10 11:24 90551e161412bd971a2f323fd9cb9d8b96a56f5f
    -rw-r--r-- 1 libvirt-qemu kvm  107374182400 apr 29 17:16 90551e161412bd971a2f323fd9cb9d8b96a56f5f_100
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 apr 10 11:37 90551e161412bd971a2f323fd9cb9d8b96a56f5f_20
    -rw-r--r-- 1 libvirt-qemu kvm    5368709120 apr 10 11:24 90551e161412bd971a2f323fd9cb9d8b96a56f5f_5
    -rw-r--r-- 1 libvirt-qemu kvm   85899345920 apr 17 20:42 90551e161412bd971a2f323fd9cb9d8b96a56f5f_80
    -rw-r--r-- 1 nova         nova  21474836480 mag 14 16:40 db5ff3aeff506b37035f0ff1c07acf96311f1172
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 14 16:43 db5ff3aeff506b37035f0ff1c07acf96311f1172_20
    -rw-r--r-- 1 nova         nova 107374182400 apr 29 17:17 ephemeral_0_100_None
    -rw-r--r-- 1 libvirt-qemu kvm  107374182400 apr 29 17:17 ephemeral_0_100_None_100
    -rw-r--r-- 1 nova         nova  21474836480 mag 27 14:46 ephemeral_0_20_None
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 27 14:46 ephemeral_0_20_None_20
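
Note that the apparent sizes in this listing add up to several hundred
GB on a 103 GB filesystem, while the `total` line agrees with `du` on
roughly 93 GB actually allocated, so most of these images must be
sparse.  A small sketch to list apparent vs. allocated size per file
(plain `os.stat`, nothing Nova-specific):

    import os

    base = '/var/lib/nova/instances/_base'
    for name in sorted(os.listdir(base)):
        st = os.stat(os.path.join(base, name))
        # st_size is the apparent size; st_blocks counts 512-byte
        # blocks actually allocated on disk.
        print('%-55s apparent %7.1f GB  allocated %7.1f GB'
              % (name, st.st_size / 1e9, st.st_blocks * 512 / 1e9))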

However, only one file in `instances/_base` is in use (both instances
use the same base image):

    root@node-09-02-11:/var/lib/nova/instances/_base# fuser -v *
                         USER        PID ACCESS COMMAND
    /var/lib/nova/instances/_base/90551e161412bd971a2f323fd9cb9d8b96a56f5f_5:
                         libvirt-qemu   3225 f.... kvm
                         libvirt-qemu  24110 f.... kvm

Can the other files in `_base` be safely deleted?
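
Before removing anything we would also like to cross-check which
`_base` entries are still referenced as qcow2 backing files by the
instance disks themselves.  A sketch of the check we have in mind (it
assumes the usual `/var/lib/nova/instances/instance-*/disk*` layout
and simply parses the `backing file:` line printed by `qemu-img info`):

    import glob
    import os
    import subprocess

    base = '/var/lib/nova/instances/_base'
    referenced = set()

    # Collect the backing files of every instance disk on this node.
    # Assumes absolute backing-file paths, as Nova writes them.
    for disk in glob.glob('/var/lib/nova/instances/instance-*/disk*'):
        info = subprocess.check_output(['qemu-img', 'info', disk]).decode()
        for line in info.splitlines():
            if line.startswith('backing file:'):
                backing = line.split(':', 1)[1].split()[0]
                referenced.add(os.path.realpath(backing))

    for name in sorted(os.listdir(base)):
        path = os.path.realpath(os.path.join(base, name))
        print('%-7s %s'
              % ('in-use' if path in referenced else 'unused', name))

This is only a filesystem-level check, of course, which is why we are
also asking whether any database state needs to be updated when files
are removed.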

Thanks for any help,
Riccardo

--
Riccardo Murri
http://www.gc3.uzh.ch/people/rm

Grid Computing Competence Centre
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222
Fax: +41 44 635 6888

