
yahoo-eng-team team mailing list archive

[Bug 1290700] [NEW] Nova-manage db archive_deleted_rows stops at the first failure with insufficient diagnostic

 

Public bug reported:

1- After a long provisioning run that creates and deletes VMs for days,
we have a long history of deleted instances in the DB that could be
archived

2- We attempted to run:
nova-manage db archive_deleted_rows --max_rows 1000000

The command was accepted but did not complete within a reasonable time,
and was eventually stopped with Ctrl+C.

Possibly the same happens when you hit a time-out, because the command performs a multi-row insert into the target shadow table and only afterwards deletes the entries that were logically deleted in the on-line table. It is not clear whether both statements are in the same commit cycle, and whether the first is rolled back when the second is unable to complete.
It is also not clear whether the command can safely be executed concurrently by multiple users. It happened to us that the DB was left in an inconsistent state, with rows still present in the on-line tables and already copied to the shadow tables.
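
For reference, one archiving pass behaves roughly like the sketch below. This is a sketch only: the table and column names follow nova's instances/shadow_instances schema, but the exact statements nova generates may differ. Whether the inconsistent state above can occur depends on whether the two statements share one commit cycle:

  -- Sketch of one archiving pass (assumed statements, not nova's
  -- literal SQL). 'deleted != 0' marks soft-deleted rows.
  BEGIN;
  INSERT INTO shadow_instances
      SELECT * FROM instances
      WHERE deleted != 0
      LIMIT 1000000;          -- bounded by --max_rows
  -- If the session is interrupted here and the two statements are in
  -- separate commit cycles, the rows now exist in both tables.
  DELETE FROM instances
      WHERE deleted != 0
      LIMIT 1000000;
  COMMIT;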

3- As a consequence of this situation, any further invocation of the
command, even with a small --max_rows value, will fail. This is not
good: it would be better to skip the row in error and continue with the
others, reporting which one failed and needs further action. As it
stands, the user is left with the suspicion that archiving does not
work at all, as many are saying in the OpenStack forums
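
Assuming the same instances/shadow_instances naming as above, a query along the following lines (hypothetical, and to be repeated per table) would list the rows stuck in both tables, i.e. the ones that make every subsequent archive run fail on a duplicate-key error:

  -- Rows present in both the on-line and the shadow table
  -- (duplicate-key candidates for the next archive run).
  SELECT i.id
    FROM instances AS i
    JOIN shadow_instances AS s ON s.id = i.id
   WHERE i.deleted != 0;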

4- The problem here is one of serviceability. First, the command
returns no output at all when everything went fine, which does not help
to make things clear

5- Second, when something goes wrong the output of the command is not clear about what happened: it just lists the SQL statement that failed. If that statement is a multi-row insert with a large set of values, which may be the case with a high --max_rows parameter, the output can only show the final part of the statement.
If the --max_rows parameter is big, the part of the output that fits in the shell is just a list of the values of the last field of the multi-row insert, usually the content of the 'deleted' field for the rows processed, which is a counter and not very meaningful to the user, e.g.:
.......1401601, 1401602, 1401603, 1401604, 1401605, 1401606, 1401607, 1401608, 1401609, 1401610, 1401611)
Please note that in this case the command may have been partially executed, and any further attempt blocks at the same point.

6- As a workaround, the user can only execute the command with
--max_rows 1, inspect the output, and fix every problem manually in the
DB. This is not really practical for the purpose of the command.
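
Under the same naming assumptions as above, the manual fix boils down to removing one side of each duplicate, for instance (a sketch only; back up both tables first):

  -- Drop the half-archived copies from the shadow table so the next
  -- archive pass can copy and delete the rows cleanly.
  DELETE FROM shadow_instances
   WHERE id IN (SELECT id FROM instances WHERE deleted != 0);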

** Affects: nova-project
     Importance: Undecided
         Status: New

** Project changed: neutron => nova-project

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1290700

Title:
  Nova-manage db archive_deleted_rows stops at the first failure with
  insufficient diagnostic

Status in The Nova Project:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova-project/+bug/1290700/+subscriptions

