← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1755906] Re: Occasional deadlock during db_sync --contract during Newton to Pike live upgrade

 

[Expired for OpenStack Identity (keystone) because there has been no
activity for 60 days.]

** Changed in: keystone
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1755906

Title:
  Occasional deadlock during db_sync --contract during Newton to Pike
  live upgrade

Status in OpenStack Identity (keystone):
  Expired

Bug description:
  We are testing upgrade from Newton to Pike (skip Ocata). The Newton
  system has two controllers. On controller0 (active) we upgrade
  controller1 to Pike. Then we swact to make controller1 active and
  controller0 standby, then we upgrade controller0 to Pike. After that
  we upgrade the other nodes to Pike.

  Now with all the nodes running Pike and controller1 as active, we
  finalize the upgrade by running "keystone-manage db_sync --contract".
  Occasionally the contract failed with a keystone DB deadlock.

  The log from postgres during the deadlock:
  ------------------------------------------
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-1] db=keystone,user=admin-keystone ERROR:  deadlock detected
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-2] db=keystone,user=admin-keystone DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-3]     Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-4]     Process 76959: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-5]     Process 76955: SELECT local_user.id AS local_user_id, local_user.user_id AS local_user_user_id, local_user.domain_id AS local_user_domain_id, local_user.name AS local_user_name, local_user.failed_auth_count AS local_user_failed_auth_count, local_user.failed_auth_at AS local_user_failed_auth_at, anon_1.user_id AS anon_1_user_id, anon_1.user_domain_id AS anon_1_user_domain_id
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-6]     FROM (SELECT "user".id AS user_id, "user".domain_id AS user_domain_id
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-7]     FROM "user"
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-8]     WHERE "user".id = '20bd4216910340bc8e6019f6d826f9d7') AS anon_1 JOIN local_user ON anon_1.user_id = local_user.user_id AND anon_1.user_domain_id = local_user.domain_id ORDER BY anon_1.user_id, anon_1.user_domain_id
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-9] db=keystone,user=admin-keystone HINT:  See server log for query details.
  2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-10] db=keystone,user=admin-keystone STATEMENT:  ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE

  The log from keystone during the deadlock:
  ------------------------------------------

  keystone:log 2018-02-22 21:26:47.447 76427 CRITICAL keystone [-] Unhandled error: DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
  DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
  Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
  HINT:  See server log for query details.
   [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
  2018-02-22 21:26:47.447 76427 ERROR keystone Traceback (most recent call last):
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/bin/keystone-manage", line 10, in <module>
  2018-02-22 21:26:47.447 76427 ERROR keystone     sys.exit(main())
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 44, in main
  2018-02-22 21:26:47.447 76427 ERROR keystone     cli.main(argv=sys.argv, config_files=config_files)
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1337, in main
  2018-02-22 21:26:47.447 76427 ERROR keystone     CONF.command.cmd_class.main()
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 528, in main
  2018-02-22 21:26:47.447 76427 ERROR keystone     upgrades.contract_schema()
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 346, in contract_schema
  2018-02-22 21:26:47.447 76427 ERROR keystone     _sync_repo(repo_name=CONTRACT_REPO)
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 187, in _sync_repo
  2018-02-22 21:26:47.447 76427 ERROR keystone     init_version=init_version, sanity_check=False)
  2018-02-22 21:26:47.447 76427 ERROR keystone   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/migration.py", line 81, in db_sync
  2018-02-22 21:26:47.447 76427 ERROR keystone     raise exception.DbMigrationError(ex)
  2018-02-22 21:26:47.447 76427 ERROR keystone DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
  2018-02-22 21:26:47.447 76427 ERROR keystone DETAIL:  Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
  2018-02-22 21:26:47.447 76427 ERROR keystone Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
  2018-02-22 21:26:47.447 76427 ERROR keystone HINT:  See server log for query details.
  2018-02-22 21:26:47.447 76427 ERROR keystone  [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
  2018-02-22 21:26:47.447 76427 ERROR keystone

  Looks like the two processes involves are the keystone-management
  command and a query from a openstack service. The keystone tables
  involved are "user" and "local_user". But we do see the deadlock
  happened on nonlocal_user table as well.

  When re-run keystone-manage db_sync --contract, it would failed and complaining:
  --------------------------------------------------------------------------------
  2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-1] db=keystone,user=admin-keystone ERROR:  relation "ixu_user_id_domain_id" already exists
  2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-2] db=keystone,user=admin-keystone STATEMENT:  ALTER TABLE "user" ADD CONSTRAINT ixu_user_id_domain_id UNIQUE (id, domain_id)

  This seems to be because the previous failed (deadlock) run
  successfully created the unique constraint in step 014 of
  contract_repo, just before the deadlock happens.

  If the deadlock can't be fixed completely, making db_sync --contract
  re-runnable seems to be a reasonable solution as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1755906/+subscriptions


References