yahoo-eng-team team mailing list archive

[Bug 2024258] [NEW] Performance degradation archiving DB with large numbers of FK related records

Public bug reported:

Observed downstream in a large-scale cluster with constant create/delete
server activity and hundreds of thousands of deleted instances rows.

Currently, we archive deleted rows in batches of max_rows parents +
their child rows in a single database transaction. Doing it that way
limits how high a max_rows value the caller can specify, because of
the size of the database transaction it can generate.
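
As a rough illustration of that pattern, here is a minimal sketch,
assuming made-up table and column names and a plain sqlite3 connection
(the real nova code goes through SQLAlchemy and its shadow tables, so
this is not the actual implementation):

    import sqlite3

    def archive_batch(conn, max_rows):
        # One database transaction per batch: commits on success,
        # rolls back on error.
        with conn:
            ids = [row[0] for row in conn.execute(
                "SELECT id FROM instances WHERE deleted != 0 LIMIT ?",
                (max_rows,))]
            if not ids:
                return 0
            marks = ",".join("?" * len(ids))
            # FK-related child rows are moved inside the same
            # transaction, so its size grows with the number of related
            # records per instance, not just with max_rows.
            conn.execute("INSERT INTO shadow_instance_faults "
                         "SELECT * FROM instance_faults "
                         "WHERE instance_id IN (%s)" % marks, ids)
            conn.execute("DELETE FROM instance_faults "
                         "WHERE instance_id IN (%s)" % marks, ids)
            conn.execute("INSERT INTO shadow_instances "
                         "SELECT * FROM instances "
                         "WHERE id IN (%s)" % marks, ids)
            conn.execute("DELETE FROM instances "
                         "WHERE id IN (%s)" % marks, ids)
            return len(ids)

If each instance has, say, dozens of FK-related rows spread across the
child tables, a single batch at max_rows=1000 can easily touch tens of
thousands of rows, which is what makes large max_rows values
problematic, as described below.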

For example, in a large-scale deployment with hundreds of thousands of
deleted rows and constant server creation and deletion activity, a
value of max_rows=1000 might exceed the database's configured maximum
packet size or hit a timeout due to a database deadlock, forcing the
operator to use a much lower max_rows value like 100 or 50.
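
(For context, this archiving is typically driven with something like
"nova-manage db archive_deleted_rows --max_rows 1000 --until-complete",
so max_rows is the main lever operators have to bound each batch.)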

And when an operator is trying to archive e.g. 500,000 deleted
instances rows (and millions of deleted rows total), being forced to
use a max_rows value several orders of magnitude lower than the number
of rows they need to archive is a poor user experience and also makes
it unclear whether archive progress is actually being made.

** Affects: nova
     Importance: Undecided
     Assignee: melanie witt (melwitt)
         Status: New

** Affects: nova/antelope
     Importance: Undecided
         Status: New

** Affects: nova/wallaby
     Importance: Undecided
         Status: New

** Affects: nova/xena
     Importance: Undecided
         Status: New

** Affects: nova/yoga
     Importance: Undecided
         Status: New

** Affects: nova/zed
     Importance: Undecided
         Status: New


** Tags: db performance

** Description changed:

- Observed downstream in a large scale cluster with constant create/delete 
+ Observed downstream in a large scale cluster with constant create/delete
  server activity and hundreds of thousands of deleted instances rows.
  
  Currently, we archive deleted rows in batches of max_rows parents +
  their child rows in a single database transaction. Doing it that way
  limits how high a value of max_rows can be specified by the caller
  because of the size of the database transaction it could generate.
  
  For example, in a large scale deployment with hundreds of thousands of
  deleted rows and constant server creation and deletion activity, a
  value of max_rows=1000 might exceed the database's configured maximum
  packet size or timeout due to a database deadlock, forcing the operator
  to use a much lower max_rows value like 100 or 50.
  
  And when the operator has e.g. 500,000 deleted instances rows (and
  millions of deleted rows total) they are trying to archive, being
  forced to use a max_rows value several orders of magnitude lower than
  the number of rows they need to archive is a poor user experience and
- makes it unclear if archive progress is actually being made.
+ also makes it unclear if archive progress is actually being made.

** Also affects: nova/xena
   Importance: Undecided
       Status: New

** Also affects: nova/antelope
   Importance: Undecided
       Status: New

** Also affects: nova/zed
   Importance: Undecided
       Status: New

** Also affects: nova/wallaby
   Importance: Undecided
       Status: New

** Also affects: nova/yoga
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2024258

Title:
  Performance degradation archiving DB with large numbers of FK related
  records

Status in OpenStack Compute (nova):
  New
Status in OpenStack Compute (nova) antelope series:
  New
Status in OpenStack Compute (nova) wallaby series:
  New
Status in OpenStack Compute (nova) xena series:
  New
Status in OpenStack Compute (nova) yoga series:
  New
Status in OpenStack Compute (nova) zed series:
  New

Bug description:
  Observed downstream in a large-scale cluster with constant create/delete
  server activity and hundreds of thousands of deleted instances rows.

  Currently, we archive deleted rows in batches of max_rows parents +
  their child rows in a single database transaction. Doing it that way
  limits how high a max_rows value the caller can specify, because of
  the size of the database transaction it can generate.

  For example, in a large-scale deployment with hundreds of thousands of
  deleted rows and constant server creation and deletion activity, a
  value of max_rows=1000 might exceed the database's configured maximum
  packet size or hit a timeout due to a database deadlock, forcing the
  operator to use a much lower max_rows value like 100 or 50.

  And when an operator is trying to archive e.g. 500,000 deleted
  instances rows (and millions of deleted rows total), being forced to
  use a max_rows value several orders of magnitude lower than the
  number of rows they need to archive is a poor user experience and
  also makes it unclear whether archive progress is actually being made.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2024258/+subscriptions


