[Bug 1493760] Re: rbd backend reports wrong 'local_gb_used' for compute node
If you look at https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4628
you can see the three different functions used for getting available disk space:
- 'get_volume_group_info'
- 'get_pool_info'
- 'get_fs_info'
All of these methods return the ACTUAL disk space used, rather than
the theoretical maximum of all the instance sizes. This is because
locally stored disks are kept as sparse qcow images, and LVM disks are
sparse volumes.
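As a quick illustration (a self-contained sketch, not nova code), a
sparse file's apparent size and its actually allocated size can differ
by orders of magnitude; a sparse qcow image behaves the same way:

    import os

    # Create a 1 GiB sparse file: the apparent size is 1 GiB, but almost
    # no blocks are actually allocated until data is written into it.
    path = '/tmp/sparse-demo.img'
    with open(path, 'wb') as f:
        f.truncate(1024 ** 3)

    st = os.stat(path)
    print('apparent size :', st.st_size)          # 1073741824 bytes
    print('allocated size:', st.st_blocks * 512)  # close to 0 bytes

    os.unlink(path)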
I believe that the intention of 'local_gb_used' is to report the actual
disk space used.
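For the file-backed case this is exactly what a statvfs-based helper
reports; a minimal sketch in the same spirit as 'get_fs_info' (not the
actual nova code, and the path is just an example) shows why the number
reflects real, not provisioned, usage:

    import os

    def get_fs_info(path):
        # statvfs reports real block usage, so sparse files only count
        # their allocated blocks, not their apparent size.
        st = os.statvfs(path)
        total = st.f_frsize * st.f_blocks
        free = st.f_frsize * st.f_bavail
        return {'total': total, 'free': free, 'used': total - free}

    print(get_fs_info('/'))  # e.g. /var/lib/nova/instances on a compute node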
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1493760
Title:
rbd backend reports wrong 'local_gb_used' for compute node
Status in OpenStack Compute (nova):
Invalid
Bug description:
When an instance's disk is on the rbd backend, the compute node reports
the whole Ceph cluster status, which makes sense. We get the local_gb
usage in
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L313
def get_pool_info(self):
    with RADOSClient(self) as client:
        stats = client.cluster.get_cluster_stats()
        return {'total': stats['kb'] * units.Ki,
                'free': stats['kb_avail'] * units.Ki,
                'used': stats['kb_used'] * units.Ki}
This reports the same disk usage as the command 'ceph -s', for example:
[root@node-1 ~]# ceph -s
    cluster e598930a-0807-491b-b191-d57244d3c8e2
     health HEALTH_OK
     monmap e1: 1 mons at {node-1=192.168.0.1:6789/0}, election epoch 1, quorum 0 node-1
     osdmap e28: 2 osds: 2 up, 2 in
      pgmap v3985: 576 pgs, 5 pools, 295 MB data, 57 objects
            21149 MB used, 76479 MB / 97628 MB avail
                 576 active+clean
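These numbers are the cluster-wide statistics; a small check with the
python rados bindings (the conffile path is an assumption for the
example) prints the same totals that get_pool_info() converts with
units.Ki:

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        stats = cluster.get_cluster_stats()
        # Same fields used by get_pool_info(), shown here in MB.
        print('total MB:', stats['kb'] // 1024)
        print('used  MB:', stats['kb_used'] // 1024)
        print('avail MB:', stats['kb_avail'] // 1024)
    finally:
        cluster.shutdown()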
[root@node-1 ~]# rbd -p compute ls
45892200-97cb-4fa4-9c29-35a0ff9e16f6_disk
8c6a5555-394f-4c24-b7ff-e05fdf322155_disk
944d9028-ac59-45fd-9be3-69066c8bc4e5
9ea375dc-f0b8-472e-ba53-4d83e5721771_disk
9fce4606-6871-40ca-bf8f-6146c05068e6_disk
cedce585-8747-4798-885f-0c47337f0f6f_disk
e17c9391-2032-4144-8fa1-85b092239e66_disk
e19143c7-228c-4f89-9735-c27c333adce4_disk
f9caf4a7-2b62-46c2-b2e1-f99cb4ce3f57_disk
[root@node-1 ~]# rbd -p compute info 45892200-97cb-4fa4-9c29-35a0ff9e16f6_disk
rbd image '45892200-97cb-4fa4-9c29-35a0ff9e16f6_disk':
    size 20480 MB in 2560 objects
    order 23 (8192 kB objects)
    block_name_prefix: rbd_data.39ab250fe98b
    format: 2
    features: layering
    parent: compute/944d9028-ac59-45fd-9be3-69066c8bc4e5@snap
    overlap: 40162 kB
In the above example, we have two compute nodes and can create 4
instances with a 20G disk on each compute node. The interesting thing
is that the total local_gb is 95G, yet 160G is allocated to instances.
The root cause is that client.cluster.get_cluster_stats() returns the actual used size, which means a 20G instance disk may occupy only about 200M. This is dangerous when instances use all of their disk.
An alternative solution is to calculate all instances' disk sizes in
some way and report that as local_gb_used.
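A rough sketch of that idea with the python rbd bindings (the pool name
and ceph.conf path are assumptions, and this is not nova code) would
sum the provisioned size of every image in the pool:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('compute')
    try:
        total_provisioned = 0
        for name in rbd.RBD().list(ioctx):
            image = rbd.Image(ioctx, name)
            try:
                # size() is the provisioned (virtual) size, not the space
                # the thin-provisioned image actually consumes.
                total_provisioned += image.size()
            finally:
                image.close()
        print('provisioned GB:', total_provisioned // (1024 ** 3))
    finally:
        ioctx.close()
        cluster.shutdown()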
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1493760/+subscriptions