← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1755906] [NEW] Occasional deadlock during db_sync --contract during Newton to Pike live upgrade

 

Public bug reported:

We are testing upgrade from Newton to Pike (skip Ocata). The Newton
system has two controllers. On controller0 (active) we upgrade
controller1 to Pike. Then we swact to make controller1 active and
controller0 standby, then we upgrade controller0 to Pike. After that we
upgrade the other nodes to Pike.

Now with all the nodes running Pike and controller1 as active, we
finalize the upgrade by running "keystone-manage db_sync --contract".
Occasionally the contract failed with a keystone DB deadlock.

The log from postgres during the deadlock:
------------------------------------------
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-1] db=keystone,user=admin-keystone ERROR:  deadlock detected
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-2] db=keystone,user=admin-keystone DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-3]     Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-4]     Process 76959: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-5]     Process 76955: SELECT local_user.id AS local_user_id, local_user.user_id AS local_user_user_id, local_user.domain_id AS local_user_domain_id, local_user.name AS local_user_name, local_user.failed_auth_count AS local_user_failed_auth_count, local_user.failed_auth_at AS local_user_failed_auth_at, anon_1.user_id AS anon_1_user_id, anon_1.user_domain_id AS anon_1_user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-6]     FROM (SELECT "user".id AS user_id, "user".domain_id AS user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-7]     FROM "user"
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-8]     WHERE "user".id = '20bd4216910340bc8e6019f6d826f9d7') AS anon_1 JOIN local_user ON anon_1.user_id = local_user.user_id AND anon_1.user_domain_id = local_user.domain_id ORDER BY anon_1.user_id, anon_1.user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-9] db=keystone,user=admin-keystone HINT:  See server log for query details.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-10] db=keystone,user=admin-keystone STATEMENT:  ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE

The log from keystone during the deadlock:
------------------------------------------

keystone:log 2018-02-22 21:26:47.447 76427 CRITICAL keystone [-] Unhandled error: DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
HINT:  See server log for query details.
 [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
2018-02-22 21:26:47.447 76427 ERROR keystone Traceback (most recent call last):
2018-02-22 21:26:47.447 76427 ERROR keystone   File "/bin/keystone-manage", line 10, in <module>
2018-02-22 21:26:47.447 76427 ERROR keystone     sys.exit(main())
2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 44, in main
2018-02-22 21:26:47.447 76427 ERROR keystone     cli.main(argv=sys.argv, config_files=config_files)
2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1337, in main
2018-02-22 21:26:47.447 76427 ERROR keystone     CONF.command.cmd_class.main()
2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 528, in main
2018-02-22 21:26:47.447 76427 ERROR keystone     upgrades.contract_schema()
2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 346, in contract_schema
2018-02-22 21:26:47.447 76427 ERROR keystone     _sync_repo(repo_name=CONTRACT_REPO)
2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 187, in _sync_repo
2018-02-22 21:26:47.447 76427 ERROR keystone     init_version=init_version, sanity_check=False)
2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/migration.py", line 81, in db_sync
2018-02-22 21:26:47.447 76427 ERROR keystone     raise exception.DbMigrationError(ex)
2018-02-22 21:26:47.447 76427 ERROR keystone DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
2018-02-22 21:26:47.447 76427 ERROR keystone DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
2018-02-22 21:26:47.447 76427 ERROR keystone Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
2018-02-22 21:26:47.447 76427 ERROR keystone HINT:  See server log for query details.
2018-02-22 21:26:47.447 76427 ERROR keystone  [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
2018-02-22 21:26:47.447 76427 ERROR keystone

Looks like the two processes involves are the keystone-management
command and a query from a openstack service. The keystone tables
involved are "user" and "local_user". But we do see the deadlock
happened on nonlocal_user table as well.

When re-run keystone-manage db_sync --contract, it would failed and complaining:
--------------------------------------------------------------------------------
2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-1] db=keystone,user=admin-keystone ERROR:  relation "ixu_user_id_domain_id" already exists
2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-2] db=keystone,user=admin-keystone STATEMENT:  ALTER TABLE "user" ADD CONSTRAINT ixu_user_id_domain_id UNIQUE (id, domain_id)

This seems to be because the previous failed (deadlock) run successfully
created the unique constraint in step 014 of contract_repo, just before
the deadlock happens.

If the deadlock can't be fixed completely, making db_sync --contract re-
runnable seems to be a reasonable solution as well.

** Affects: keystone
     Importance: Undecided
         Status: New

** Description changed:

  We are testing upgrade from Newton to Pike (skip Ocata). The Newton
  system has two controllers. On controller0 (active) we upgrade
  controller1 to Pike. Then we swact to make controller1 active and
  controller0 standby, then we upgrade controller0 to Pike. After that we
  upgrade the other nodes to Pike.
  
  Now with all the nodes running Pike and controller1 as active, we
  finalize the upgrade by running "keystone-manage db_sync --contract".
  Occasionally the contract failed with a keystone DB deadlock.
  
-  
  The log from postgres during the deadlock:
  ------------------------------------------
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-1] db=keystone,user=admin-keystone ERROR:  deadlock detected
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-2] db=keystone,user=admin-keystone DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-3]     Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-4]     Process 76959: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-5]     Process 76955: SELECT local_user.id AS local_user_id, local_user.user_id AS local_user_user_id, local_user.domain_id AS local_user_domain_id, local_user.name AS local_user_name, local_user.failed_auth_count AS local_user_failed_auth_count, local_user.failed_auth_at AS local_user_failed_auth_at, anon_1.user_id AS anon_1_user_id, anon_1.user_domain_id AS anon_1_user_domain_id
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-6]     FROM (SELECT "user".id AS user_id, "user".domain_id AS user_domain_id
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-7]     FROM "user"
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-8]     WHERE "user".id = '20bd4216910340bc8e6019f6d826f9d7') AS anon_1 JOIN local_user ON anon_1.user_id = local_user.user_id AND anon_1.user_domain_id = local_user.domain_id ORDER BY anon_1.user_id, anon_1.user_domain_id
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-9] db=keystone,user=admin-keystone HINT:  See server log for query details.
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-10] db=keystone,user=admin-keystone STATEMENT:  ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
  
  The log from keystone during the deadlock:
  ------------------------------------------
  
  keystone:log 2018-02-22 21:26:47.447 76427 CRITICAL keystone [-] Unhandled error: DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
  DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
  Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
  HINT:  See server log for query details.
-  [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
+  [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
  2018-02-22 21:26:47.447 76427 ERROR keystone Traceback (most recent call last):
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/bin/keystone-manage", line 10, in <module>
  2018-02-22 21:26:47.447 76427 ERROR keystone     sys.exit(main())
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 44, in main
  2018-02-22 21:26:47.447 76427 ERROR keystone     cli.main(argv=sys.argv, config_files=config_files)
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1337, in main
  2018-02-22 21:26:47.447 76427 ERROR keystone     CONF.command.cmd_class.main()
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 528, in main
  2018-02-22 21:26:47.447 76427 ERROR keystone     upgrades.contract_schema()
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 346, in contract_schema
  2018-02-22 21:26:47.447 76427 ERROR keystone     _sync_repo(repo_name=CONTRACT_REPO)
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 187, in _sync_repo
  2018-02-22 21:26:47.447 76427 ERROR keystone     init_version=init_version, sanity_check=False)
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/migration.py", line 81, in db_sync
  2018-02-22 21:26:47.447 76427 ERROR keystone     raise exception.DbMigrationError(ex)
  2018-02-22 21:26:47.447 76427 ERROR keystone DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
  2018-02-22 21:26:47.447 76427 ERROR keystone DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
  2018-02-22 21:26:47.447 76427 ERROR keystone Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
  2018-02-22 21:26:47.447 76427 ERROR keystone HINT:  See server log for query details.
  2018-02-22 21:26:47.447 76427 ERROR keystone  [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
  2018-02-22 21:26:47.447 76427 ERROR keystone
  
  Looks like the two processes involves are the keystone-management
  command and a query from a openstack service. The keystone tables
- involved are "user" and "local_user".
+ involved are "user" and "local_user". But we do see the deadlock
+ happened on nonlocal_user table as well.
  
  When re-run keystone-manage db_sync --contract, it would failed and complaining:
  --------------------------------------------------------------------------------
  2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-1] db=keystone,user=admin-keystone ERROR:  relation "ixu_user_id_domain_id" already exists
  2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-2] db=keystone,user=admin-keystone STATEMENT:  ALTER TABLE "user" ADD CONSTRAINT ixu_user_id_domain_id UNIQUE (id, domain_id)
  
  This seems to be because the previous failed (deadlock) run successfully
  created the unique constraint in step 014 of contract_repo, just before
  the deadlock happens.
  
  If the deadlock can't be fixed completely, making db_sync --contract re-
  runnable seems to be a reasonable solution as well.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1755906

Title:
  Occasional deadlock during db_sync --contract during Newton to Pike
  live upgrade

Status in OpenStack Identity (keystone):
  New

Bug description:
  We are testing upgrade from Newton to Pike (skip Ocata). The Newton
  system has two controllers. On controller0 (active) we upgrade
  controller1 to Pike. Then we swact to make controller1 active and
  controller0 standby, then we upgrade controller0 to Pike. After that
  we upgrade the other nodes to Pike.

  Now with all the nodes running Pike and controller1 as active, we
  finalize the upgrade by running "keystone-manage db_sync --contract".
  Occasionally the contract failed with a keystone DB deadlock.

  The log from postgres during the deadlock:
  ------------------------------------------
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-1] db=keystone,user=admin-keystone ERROR:  deadlock detected
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-2] db=keystone,user=admin-keystone DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-3]     Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-4]     Process 76959: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-5]     Process 76955: SELECT local_user.id AS local_user_id, local_user.user_id AS local_user_user_id, local_user.domain_id AS local_user_domain_id, local_user.name AS local_user_name, local_user.failed_auth_count AS local_user_failed_auth_count, local_user.failed_auth_at AS local_user_failed_auth_at, anon_1.user_id AS anon_1_user_id, anon_1.user_domain_id AS anon_1_user_domain_id
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-6]     FROM (SELECT "user".id AS user_id, "user".domain_id AS user_domain_id
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-7]     FROM "user"
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-8]     WHERE "user".id = '20bd4216910340bc8e6019f6d826f9d7') AS anon_1 JOIN local_user ON anon_1.user_id = local_user.user_id AND anon_1.user_domain_id = local_user.domain_id ORDER BY anon_1.user_id, anon_1.user_domain_id
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-9] db=keystone,user=admin-keystone HINT:  See server log for query details.
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-10] db=keystone,user=admin-keystone STATEMENT:  ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE

  The log from keystone during the deadlock:
  ------------------------------------------

  keystone:log 2018-02-22 21:26:47.447 76427 CRITICAL keystone [-] Unhandled error: DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
  DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
  Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
  HINT:  See server log for query details.
   [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
  2018-02-22 21:26:47.447 76427 ERROR keystone Traceback (most recent call last):
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/bin/keystone-manage", line 10, in <module>
  2018-02-22 21:26:47.447 76427 ERROR keystone     sys.exit(main())
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 44, in main
  2018-02-22 21:26:47.447 76427 ERROR keystone     cli.main(argv=sys.argv, config_files=config_files)
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1337, in main
  2018-02-22 21:26:47.447 76427 ERROR keystone     CONF.command.cmd_class.main()
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 528, in main
  2018-02-22 21:26:47.447 76427 ERROR keystone     upgrades.contract_schema()
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 346, in contract_schema
  2018-02-22 21:26:47.447 76427 ERROR keystone     _sync_repo(repo_name=CONTRACT_REPO)
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 187, in _sync_repo
  2018-02-22 21:26:47.447 76427 ERROR keystone     init_version=init_version, sanity_check=False)
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/migration.py", line 81, in db_sync
  2018-02-22 21:26:47.447 76427 ERROR keystone     raise exception.DbMigrationError(ex)
  2018-02-22 21:26:47.447 76427 ERROR keystone DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
  2018-02-22 21:26:47.447 76427 ERROR keystone DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
  2018-02-22 21:26:47.447 76427 ERROR keystone Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
  2018-02-22 21:26:47.447 76427 ERROR keystone HINT:  See server log for query details.
  2018-02-22 21:26:47.447 76427 ERROR keystone  [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
  2018-02-22 21:26:47.447 76427 ERROR keystone

  Looks like the two processes involves are the keystone-management
  command and a query from a openstack service. The keystone tables
  involved are "user" and "local_user". But we do see the deadlock
  happened on nonlocal_user table as well.

  When re-run keystone-manage db_sync --contract, it would failed and complaining:
  --------------------------------------------------------------------------------
  2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-1] db=keystone,user=admin-keystone ERROR:  relation "ixu_user_id_domain_id" already exists
  2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-2] db=keystone,user=admin-keystone STATEMENT:  ALTER TABLE "user" ADD CONSTRAINT ixu_user_id_domain_id UNIQUE (id, domain_id)

  This seems to be because the previous failed (deadlock) run
  successfully created the unique constraint in step 014 of
  contract_repo, just before the deadlock happens.

  If the deadlock can't be fixed completely, making db_sync --contract
  re-runnable seems to be a reasonable solution as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1755906/+subscriptions


Follow ups