canonical-ubuntu-qa team mailing list archive
Message #04465
[Bug 2067628] Re: openstack core count may be inaccurate
I've built a tiny script to compare what is reported by the quota and a
manual count.
It prints the quota, then the count, then the quota again. The manual
count can take a bit of time to compute, so printing the quota before and
after quickly shows whether the reported quota changed in the meantime.
During the manual count, it also prints any instance that is neither `ACTIVE` nor `ERROR`: those two statuses are very common, and it has been verified that they are correctly counted in the quota. `SHUTOFF` has also been verified as counting in the quota, but it is rare enough not to make too much noise, and printing it also helps with cleaning up those usually old VMs.
So far, all the printed VMs are in the `BUILD` state. Since I've always observed inconsistencies between the reported quota and the manual count, my guess is that VMs get added to the quota at some point during this state.
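For readers without the attachment at hand, the manual count described above can be sketched roughly as follows. This is not the attached `check-quota.py`, only a minimal illustration: the shape of the `servers` entries, the `flavor_cores` mapping and the `manual_count` name are assumptions, and fetching the server list from OpenStack is left out on purpose.

```python
# Statuses already verified as counted in the quota, and too common to print.
QUIET_STATUSES = {"ACTIVE", "ERROR"}


def manual_count(servers, flavor_cores):
    """Count instances and cores from an already fetched server listing.

    servers: iterable of dicts with "name", "status" and "flavor" keys
             (hypothetical shape, however the listing was obtained).
    flavor_cores: mapping of flavor name to its number of cores (assumed).
    Returns (count, noisy), where noisy holds one printable line per
    server whose status is not in QUIET_STATUSES.
    """
    count = {"core": 0, "instance": 0}
    noisy = []
    for server in servers:
        cores = flavor_cores[server["flavor"]]
        # Every listed server is counted, whatever its status.
        count["core"] += cores
        count["instance"] += 1
        if server["status"] not in QUIET_STATUSES:
            noisy.append(
                f'{server["name"]} - {server["status"]} - {server["flavor"]} - {cores}'
            )
    return count, noisy
```

The real script would print the quota before and after calling something like this, so that a quota change during the count is visible.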
Here are example outputs from the script, with some comments:
# lcy02:
quota: {'core': 92, 'instance': 43}
adt-noble-amd64-liblocale-us-perl-20231129-134245-juju-7f2275-prod-proposed-migration-environment-3 - BUILD - autopkgtest - 2
adt-focal-amd64-nvidia-graphics-drivers-525-20231129-140832-juju-7f2275-prod-proposed-migration-environment-2 - BUILD - autopkgtest - 2
adt-noble-amd64-systemd-upstream-20240625-110038-juju-7f2275-prod-proposed-migration-environment-3-2792f97d-63b5-4c80-a259-8ef43a4d062e - BUILD - autopkgtest - 2
adt-oracular-amd64-dgit-20240625-082704-juju-7f2275-prod-proposed-migration-environment-3-1cb642ce-cf37-4f8e-a233-f561d63f5557 - BUILD - autopkgtest - 2
adt-noble-amd64-systemd-upstream-20240625-101614-juju-7f2275-prod-proposed-migration-environment-3-93c90fad-c9eb-4c82-9f69-b8d4a267b8e2 - BUILD - autopkgtest - 2
count: {'core': 96, 'instance': 45}
quota: {'core': 92, 'instance': 43}
This is a very common example: a few VMs in `BUILD`, and the reported
quota is a bit below the manual count.
# bos03-arm64:
quota: {'core': 186, 'instance': 42}
adt-oracular-arm64-r-cran-ps-20240617-073206-juju-7f2275-prod-proposed-migration-environment-3-82b1810f-7af3-447f-b772-c474b3675c87 - BUILD - autopkgtest - 2
count: {'core': 188, 'instance': 43}
quota: {'core': 186, 'instance': 42}
This one is interesting, because the delta between the counted and
reported values exactly matches the only VM that is displayed. In addition,
trying to `openstack server show` this VM reports `No server with a name
or ID of '[...]' exists.`, confirming that OpenStack is clearly
inconsistent about this one.
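The arithmetic behind that observation can be spelled out with a small sketch. The helper names `quota_delta` and `explained_by` are made up for illustration and are not part of the attached script; the numbers are the ones from the lcy02 and bos03-arm64 outputs above.

```python
def quota_delta(quota, count):
    """Return (manual count - reported quota) for each key."""
    return {key: count[key] - quota[key] for key in quota}


def explained_by(delta, displayed_cores):
    """True if the displayed VMs account exactly for the delta."""
    return delta == {
        "core": sum(displayed_cores),
        "instance": len(displayed_cores),
    }


# bos03-arm64: one displayed VM with 2 cores.
delta = quota_delta({"core": 186, "instance": 42}, {"core": 188, "instance": 43})
print(delta, explained_by(delta, [2]))
# {'core': 2, 'instance': 1} True

# lcy02: five displayed VMs with 2 cores each.
delta = quota_delta({"core": 92, "instance": 43}, {"core": 96, "instance": 45})
print(delta, explained_by(delta, [2, 2, 2, 2, 2]))
# {'core': 4, 'instance': 2} False
```

On lcy02 the five displayed VMs add up to 10 cores while the delta is only 4, so there the displayed VMs alone do not account for the difference.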
# bos02-arm64:
quota: {'core': 145, 'instance': 84}
count: {'core': 55, 'instance': 25}
quota: {'core': 145, 'instance': 84}
This one is really weird: lots of instances counted in the quota, but only about a third of them displayed in `openstack server list`. This is probably a case where we should ask IS to run some magic.
# bos03-s390x and bos03-ppc64el:
quota: {'core': 0, 'instance': 0}
count: {'core': 0, 'instance': 0}
quota: {'core': 0, 'instance': 0}
Not very interesting, but at least it's consistent: since we don't use
those OpenStack clouds, everything is zero.
** Attachment added: "check-quota.py"
https://bugs.launchpad.net/auto-package-testing/+bug/2067628/+attachment/5792230/+files/check-quota.py
** Changed in: auto-package-testing
Status: New => In Progress
--
You received this bug notification because you are a member of
Canonical's Ubuntu QA, which is subscribed to Auto Package Testing.
https://bugs.launchpad.net/bugs/2067628
Title:
openstack core count may be inaccurate
Status in Auto Package Testing:
In Progress
Bug description:
It has been reported (thanks ginggs) that failures can happen in
autopkgtest-cloud with error:
Quota exceeded for cores: Requested 2, but already used 514 of 515
cores (HTTP 403)
Full log at [1]. I checked the number of running VMs shortly after, and I
counted fewer than 400 cores in use (taking into account that
autopkgtest-big instances have 4 cores).
This may be a side effect of dropping the flock [2]. It may be that
the instance deletion is asynchronous, and cores are freed only after
the delete operation is complete.
We should do something like:
1. Figure out a way to query openstack for the current quota usage, and check how it matches the number of running VMs.
2. Check if in the worker we can do something like instance.delete(wait=True) so that we wait for the VM to be deleted before proceeding. I made that option up, but given that the CLI tool has a wait parameter, delete() is likely to also have something like that.
3. Check whether this improves the comparison of point (1.)
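If it turns out that delete() has no such option, point 2 could also be done by polling. The following is only a sketch under that assumption: `delete_and_wait` is a hypothetical helper, and the `delete` and `exists` callables are placeholders for whatever the worker actually uses to talk to OpenStack.

```python
import time


def delete_and_wait(delete, exists, timeout=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Request a deletion, then poll until the instance is gone.

    delete: zero-argument callable that requests the deletion.
    exists: zero-argument callable returning True while the instance
            is still known to OpenStack.
    Returns True if the instance disappeared before the timeout,
    False otherwise.
    """
    delete()
    deadline = clock() + timeout
    while exists():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```

Returning False on timeout rather than raising leaves it to the worker to decide whether to proceed anyway or to retry.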
[1] https://autopkgtest.ubuntu.com/results/autopkgtest-xenial/xenial/i386/u/ubuntu-advantage-tools/20240528_152118_2d212@/log.gz
[2] https://salsa.debian.org/ubuntu-ci-team/autopkgtest/-/commit/49f5760dddcdf7b3f70c177f3000391d1db0dbdd