yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #32071
[Bug 1445675] [NEW] missing index on virtual_interfaces can cause long queries that can cause timeouts in launching instances
Public bug reported:
In a load test where a nova environment w/ networking enabled was set up
to have ~250K instances, attempting to launch 50 instances would cause
many to time out, with the error "Timeout while waiting on RPC response
- topic: "network", RPC method: "allocate_for_instance". The tester
isolated the latency here to queries against the virtual_interfaces
table, which in this test is executed some 500 times, spending ~.5
seconds per query for a total of 200 seconds. An example query looks
like:
SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '9774e729-7695-4e2b-a9b2-a104a4b020d0'
LIMIT 1;
Query profiling against this table /query directly proceeded as follows:
I scripted up direct DB access to get 250K rows in a blank database:
MariaDB [nova]> select count(*) from virtual_interfaces;
+----------+
| count(*) |
+----------+
| 250000 |
+----------+
1 row in set (0.09 sec)
emitting the query when the row is found, on this particular system is
returning in .03 sec:
MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| created_at | updated_at | deleted_at | deleted | id | address | network_id | instance_uuid | uuid |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| 2014-08-12 22:22:14 | NULL | NULL | 0 | 58393 | address_58393 | 22 | 41f1b859-8c5d-4c27-a52e-3e97652dfe7a | 0a269012-cbc7-4093-9602-35f003a766c4 |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
1 row in set (0.03 sec)
we can see that for a row not found, where it has to scan the whole table, it's 10x longer:
MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c5' LIMIT 1;
Empty set (0.14 sec)
There's nothing mysterious going on here as an EXPLAIN shows plainly that we are doing a full table scan:
MariaDB [nova]> EXPLAIN SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+------+-------------+--------------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | virtual_interfaces | ALL | NULL | NULL | NULL | NULL | 250170 | Using where |
+------+-------------+--------------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
adding an index on the uuid field via "create index vuidx on virtual_interfaces(uuid)", the EXPLAIN now shows the index used:
MariaDB [nova]> EXPLAIN SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+------+-------------+--------------------+------+---------------+-------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+---------------+-------+---------+-------+------+------------------------------------+
| 1 | SIMPLE | virtual_interfaces | ref | vuidx | vuidx | 111 | const | 1 | Using index condition; Using where |
+------+-------------+--------------------+------+---------------+-------+---------+-------+------+------------------------------------+
1 row in set (0.00 sec)
and we get 0.00 response time for both queries:
MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c5' LIMIT 1;
Empty set (0.00 sec)
MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| created_at | updated_at | deleted_at | deleted | id | address | network_id | instance_uuid | uuid |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| 2014-08-12 22:22:14 | NULL | NULL | 0 | 58393 | address_58393 | 22 | 41f1b859-8c5d-4c27-a52e-3e97652dfe7a | 0a269012-cbc7-4093-9602-35f003a766c4 |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
1 row in set (0.00 sec)
whether or not the index includes "deleted" doesn't really matter. If we're searching for UUIDs, we get that UUID row first, then the "deleted=0" is checked, not a big deal.
For an immediate fix, I propose to add the aforementioned index to the
virtual_interfaces.uuid column.
** Affects: nova
Importance: Undecided
Assignee: Mike Bayer (zzzeek)
Status: In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1445675
Title:
missing index on virtual_interfaces can cause long queries that can
cause timeouts in launching instances
Status in OpenStack Compute (Nova):
In Progress
Bug description:
In a load test where a nova environment w/ networking enabled was set
up to have ~250K instances, attempting to launch 50 instances would
cause many to time out, with the error "Timeout while waiting on RPC
response - topic: "network", RPC method: "allocate_for_instance".
The tester isolated the latency here to queries against the
virtual_interfaces table, which in this test is executed some 500
times, spending ~.5 seconds per query for a total of 200 seconds. An
example query looks like:
SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '9774e729-7695-4e2b-a9b2-a104a4b020d0'
LIMIT 1;
Query profiling against this table /query directly proceeded as
follows:
I scripted up direct DB access to get 250K rows in a blank database:
MariaDB [nova]> select count(*) from virtual_interfaces;
+----------+
| count(*) |
+----------+
| 250000 |
+----------+
1 row in set (0.09 sec)
emitting the query when the row is found, on this particular system is
returning in .03 sec:
MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| created_at | updated_at | deleted_at | deleted | id | address | network_id | instance_uuid | uuid |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| 2014-08-12 22:22:14 | NULL | NULL | 0 | 58393 | address_58393 | 22 | 41f1b859-8c5d-4c27-a52e-3e97652dfe7a | 0a269012-cbc7-4093-9602-35f003a766c4 |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
1 row in set (0.03 sec)
we can see that for a row not found, where it has to scan the whole table, it's 10x longer:
MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c5' LIMIT 1;
Empty set (0.14 sec)
There's nothing mysterious going on here as an EXPLAIN shows plainly that we are doing a full table scan:
MariaDB [nova]> EXPLAIN SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+------+-------------+--------------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | virtual_interfaces | ALL | NULL | NULL | NULL | NULL | 250170 | Using where |
+------+-------------+--------------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
adding an index on the uuid field via "create index vuidx on virtual_interfaces(uuid)", the EXPLAIN now shows the index used:
MariaDB [nova]> EXPLAIN SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+------+-------------+--------------------+------+---------------+-------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+---------------+-------+---------+-------+------+------------------------------------+
| 1 | SIMPLE | virtual_interfaces | ref | vuidx | vuidx | 111 | const | 1 | Using index condition; Using where |
+------+-------------+--------------------+------+---------------+-------+---------+-------+------+------------------------------------+
1 row in set (0.00 sec)
and we get 0.00 response time for both queries:
MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c5' LIMIT 1;
Empty set (0.00 sec)
MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| created_at | updated_at | deleted_at | deleted | id | address | network_id | instance_uuid | uuid |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| 2014-08-12 22:22:14 | NULL | NULL | 0 | 58393 | address_58393 | 22 | 41f1b859-8c5d-4c27-a52e-3e97652dfe7a | 0a269012-cbc7-4093-9602-35f003a766c4 |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
1 row in set (0.00 sec)
whether or not the index includes "deleted" doesn't really matter. If we're searching for UUIDs, we get that UUID row first, then the "deleted=0" is checked, not a big deal.
For an immediate fix, I propose to add the aforementioned index to
the virtual_interfaces.uuid column.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1445675/+subscriptions
Follow ups
References