yahoo-eng-team team mailing list archive

[Bug 2045168] Re: instances page fails to load if it takes more than 26 seconds

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Rodrigo Barbieri <2045168@xxxxxxxxxxxxxxxxxx>
Date: Fri, 16 Feb 2024 14:22:56 -0000
Reply-to: Bug 2045168 <2045168@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx

I am removing Horizon from the affected projects because I found out
that the issue is caused by haproxy. The charmed installation of Horizon
installs and configures haproxy under the hood. I did not know this and
had assumed that the installation was equivalent to upstream, but I was
wrong: charms usually add haproxy only optionally, for HA, but the
horizon charm is an exception. That haproxy instance is what imposes the
30 second limit. The problem is solved by configuring the charm with the
following setting:

juju config openstack-dashboard haproxy-server-timeout=300000
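
As a rough illustration of why raising the timeout helps, the sketch below models a proxy that gives up on any backend response slower than its server timeout. It assumes the charm option is expressed in milliseconds (the default unit for HAProxy timeouts) and that the previous effective limit was the 30 seconds mentioned above. The function name and the 45 second page load are hypothetical, and this is not HAProxy's or the charm's code.

```python
def fits_within_timeout(backend_seconds, server_timeout_ms):
    """Return True if a response taking backend_seconds arrives before
    the proxy's server timeout (given in milliseconds) expires."""
    return backend_seconds * 1000 < server_timeout_ms


OLD_TIMEOUT_MS = 30_000    # the 30 second limit described above
NEW_TIMEOUT_MS = 300_000   # value passed to juju config (assumed to be ms)

# Hypothetical slow page load, for illustration only.
slow_page_seconds = 45

print(fits_within_timeout(slow_page_seconds, OLD_TIMEOUT_MS))  # False
print(fits_within_timeout(slow_page_seconds, NEW_TIMEOUT_MS))  # True
```

Under that assumption, 300000 corresponds to 300 seconds, which matches the 5 minute browser timeout reported in the bug description.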

** No longer affects: horizon

** Changed in: charm-openstack-dashboard
Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/2045168

Title:
instances page fails to load if it takes more than 26 seconds

Status in OpenStack Dashboard Charm:
Invalid

Bug description:
Focal-ussuri customer env with lots of resources.

When trying to load the project > instances page, if loading the data
takes more than 26 seconds in total, the page enters a reload loop
until the browser times out after 5 minutes.

The 26 seconds number was obtained in the following way:

1) A 5 minute browser timeout was observed when trying to load the page.
2) The logs were inspected, and some queries were found to take very long, e.g. glance ~12 seconds and neutron ~8 seconds. Queries to nova take at most 3 seconds.
3) In a separate env with zero resources, where the page would load instantly, I added a time.sleep in the api/glance.py file when invoking glance for images (glance is invoked multiple times when loading the instances page). With a 14 second sleep the page times out after 5 minutes, but with a 13 second sleep it does not time out and loads quickly. When it timed out with 14 seconds, I tailed the logs and noticed that the same group of requests was being repeated for a while, always starting with the flavors request. With the 13 second sleep the requests did not repeat.
4) I removed the sleep from the api/glance.py file and added a sleep of 26 seconds in the get_data method of the project/instances/views.py file, right after

    image_dict, flavor_dict, volume_dict = \
        futurist_utils.call_functions_parallel(
            self._get_images, self._get_flavors, self._get_volumes)

With a 26 second sleep it neither times out nor repeats the requests,
and the page loads fine. But with a 27 second sleep it times out after
5 minutes and keeps repeating the requests in the logs.
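
The threshold search in steps 3 and 4 can be sketched as a bisection over the injected delay. This is only an illustration: in the experiment above each probe was a manual page load, whereas here `simulated_page_loads` is a hypothetical stand-in, and its 30 second limit and 3.5 seconds of other work are assumed values chosen so that the simulation reproduces the reported 26/27 second boundary.

```python
def largest_tolerated_delay(page_loads, low, high):
    """Return the largest integer delay in [low, high) for which
    page_loads(delay) is True.

    Assumes page_loads(low) is True, page_loads(high) is False, and
    the outcome flips exactly once in between.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if page_loads(mid):
            low = mid    # page still loads: threshold is at or above mid
        else:
            high = mid   # page fails: threshold is below mid
    return low


def simulated_page_loads(delay_seconds, limit_seconds=30,
                         other_work_seconds=3.5):
    """Hypothetical stand-in for one manual page load with an injected
    sleep. Both default values are assumptions, not measurements."""
    return delay_seconds + other_work_seconds < limit_seconds


print(largest_tolerated_delay(simulated_page_loads, 0, 60))  # 26
```

Bisection needs only about six probes to cover a 0 to 60 second range, which matters when every probe can cost a 5 minute browser timeout.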

My conclusion is that the get_data method does not tolerate taking
longer than 26 seconds to finish loading the page, and "reloads"
itself, entering a loop that never finishes if the page cannot be
loaded in less than 26 seconds.

Ideally this internal timeout that causes a reload loop should be
configurable and more tolerant by default.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-openstack-dashboard/+bug/2045168/+subscriptions

References

[Bug 2045168] [NEW] instances page fails to load if it takes more than 26 seconds
From: Rodrigo Barbieri, 2023-11-29