yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #71674
[Bug 1755906] [NEW] Occasional deadlock during db_sync --contract during Newton to Pike live upgrade
Public bug reported:
We are testing upgrade from Newton to Pike (skip Ocata). The Newton
system has two controllers. On controller0 (active) we upgrade
controller1 to Pike. Then we swact to make controller1 active and
controller0 standby, then we upgrade controller0 to Pike. After that we
upgrade the other nodes to Pike.
Now with all the nodes running Pike and controller1 as active, we
finalize the upgrade by running "keystone-manage db_sync --contract".
Occasionally the contract failed with a keystone DB deadlock.
The log from postgres during the deadlock:
------------------------------------------
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-1] db=keystone,user=admin-keystone ERROR: deadlock detected
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-2] db=keystone,user=admin-keystone DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-3] Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-4] Process 76959: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-5] Process 76955: SELECT local_user.id AS local_user_id, local_user.user_id AS local_user_user_id, local_user.domain_id AS local_user_domain_id, local_user.name AS local_user_name, local_user.failed_auth_count AS local_user_failed_auth_count, local_user.failed_auth_at AS local_user_failed_auth_at, anon_1.user_id AS anon_1_user_id, anon_1.user_domain_id AS anon_1_user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-6] FROM (SELECT "user".id AS user_id, "user".domain_id AS user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-7] FROM "user"
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-8] WHERE "user".id = '20bd4216910340bc8e6019f6d826f9d7') AS anon_1 JOIN local_user ON anon_1.user_id = local_user.user_id AND anon_1.user_domain_id = local_user.domain_id ORDER BY anon_1.user_id, anon_1.user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-9] db=keystone,user=admin-keystone HINT: See server log for query details.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-10] db=keystone,user=admin-keystone STATEMENT: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
The log from keystone during the deadlock:
------------------------------------------
keystone:log 2018-02-22 21:26:47.447 76427 CRITICAL keystone [-] Unhandled error: DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
HINT: See server log for query details.
[SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
2018-02-22 21:26:47.447 76427 ERROR keystone Traceback (most recent call last):
2018-02-22 21:26:47.447 76427 ERROR keystone File "/bin/keystone-manage", line 10, in <module>
2018-02-22 21:26:47.447 76427 ERROR keystone sys.exit(main())
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 44, in main
2018-02-22 21:26:47.447 76427 ERROR keystone cli.main(argv=sys.argv, config_files=config_files)
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1337, in main
2018-02-22 21:26:47.447 76427 ERROR keystone CONF.command.cmd_class.main()
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 528, in main
2018-02-22 21:26:47.447 76427 ERROR keystone upgrades.contract_schema()
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 346, in contract_schema
2018-02-22 21:26:47.447 76427 ERROR keystone _sync_repo(repo_name=CONTRACT_REPO)
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 187, in _sync_repo
2018-02-22 21:26:47.447 76427 ERROR keystone init_version=init_version, sanity_check=False)
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/migration.py", line 81, in db_sync
2018-02-22 21:26:47.447 76427 ERROR keystone raise exception.DbMigrationError(ex)
2018-02-22 21:26:47.447 76427 ERROR keystone DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
2018-02-22 21:26:47.447 76427 ERROR keystone DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
2018-02-22 21:26:47.447 76427 ERROR keystone Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
2018-02-22 21:26:47.447 76427 ERROR keystone HINT: See server log for query details.
2018-02-22 21:26:47.447 76427 ERROR keystone [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
2018-02-22 21:26:47.447 76427 ERROR keystone
Looks like the two processes involves are the keystone-management
command and a query from a openstack service. The keystone tables
involved are "user" and "local_user". But we do see the deadlock
happened on nonlocal_user table as well.
When re-run keystone-manage db_sync --contract, it would failed and complaining:
--------------------------------------------------------------------------------
2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-1] db=keystone,user=admin-keystone ERROR: relation "ixu_user_id_domain_id" already exists
2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-2] db=keystone,user=admin-keystone STATEMENT: ALTER TABLE "user" ADD CONSTRAINT ixu_user_id_domain_id UNIQUE (id, domain_id)
This seems to be because the previous failed (deadlock) run successfully
created the unique constraint in step 014 of contract_repo, just before
the deadlock happens.
If the deadlock can't be fixed completely, making db_sync --contract re-
runnable seems to be a reasonable solution as well.
** Affects: keystone
Importance: Undecided
Status: New
** Description changed:
We are testing upgrade from Newton to Pike (skip Ocata). The Newton
system has two controllers. On controller0 (active) we upgrade
controller1 to Pike. Then we swact to make controller1 active and
controller0 standby, then we upgrade controller0 to Pike. After that we
upgrade the other nodes to Pike.
Now with all the nodes running Pike and controller1 as active, we
finalize the upgrade by running "keystone-manage db_sync --contract".
Occasionally the contract failed with a keystone DB deadlock.
-
The log from postgres during the deadlock:
------------------------------------------
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-1] db=keystone,user=admin-keystone ERROR: deadlock detected
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-2] db=keystone,user=admin-keystone DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-3] Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-4] Process 76959: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-5] Process 76955: SELECT local_user.id AS local_user_id, local_user.user_id AS local_user_user_id, local_user.domain_id AS local_user_domain_id, local_user.name AS local_user_name, local_user.failed_auth_count AS local_user_failed_auth_count, local_user.failed_auth_at AS local_user_failed_auth_at, anon_1.user_id AS anon_1_user_id, anon_1.user_domain_id AS anon_1_user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-6] FROM (SELECT "user".id AS user_id, "user".domain_id AS user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-7] FROM "user"
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-8] WHERE "user".id = '20bd4216910340bc8e6019f6d826f9d7') AS anon_1 JOIN local_user ON anon_1.user_id = local_user.user_id AND anon_1.user_domain_id = local_user.domain_id ORDER BY anon_1.user_id, anon_1.user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-9] db=keystone,user=admin-keystone HINT: See server log for query details.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-10] db=keystone,user=admin-keystone STATEMENT: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
The log from keystone during the deadlock:
------------------------------------------
keystone:log 2018-02-22 21:26:47.447 76427 CRITICAL keystone [-] Unhandled error: DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
HINT: See server log for query details.
- [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
+ [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
2018-02-22 21:26:47.447 76427 ERROR keystone Traceback (most recent call last):
2018-02-22 21:26:47.447 76427 ERROR keystone File "/bin/keystone-manage", line 10, in <module>
2018-02-22 21:26:47.447 76427 ERROR keystone sys.exit(main())
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 44, in main
2018-02-22 21:26:47.447 76427 ERROR keystone cli.main(argv=sys.argv, config_files=config_files)
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1337, in main
2018-02-22 21:26:47.447 76427 ERROR keystone CONF.command.cmd_class.main()
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 528, in main
2018-02-22 21:26:47.447 76427 ERROR keystone upgrades.contract_schema()
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 346, in contract_schema
2018-02-22 21:26:47.447 76427 ERROR keystone _sync_repo(repo_name=CONTRACT_REPO)
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 187, in _sync_repo
2018-02-22 21:26:47.447 76427 ERROR keystone init_version=init_version, sanity_check=False)
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/migration.py", line 81, in db_sync
2018-02-22 21:26:47.447 76427 ERROR keystone raise exception.DbMigrationError(ex)
2018-02-22 21:26:47.447 76427 ERROR keystone DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
2018-02-22 21:26:47.447 76427 ERROR keystone DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
2018-02-22 21:26:47.447 76427 ERROR keystone Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
2018-02-22 21:26:47.447 76427 ERROR keystone HINT: See server log for query details.
2018-02-22 21:26:47.447 76427 ERROR keystone [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
2018-02-22 21:26:47.447 76427 ERROR keystone
Looks like the two processes involves are the keystone-management
command and a query from a openstack service. The keystone tables
- involved are "user" and "local_user".
+ involved are "user" and "local_user". But we do see the deadlock
+ happened on nonlocal_user table as well.
When re-run keystone-manage db_sync --contract, it would failed and complaining:
--------------------------------------------------------------------------------
2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-1] db=keystone,user=admin-keystone ERROR: relation "ixu_user_id_domain_id" already exists
2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-2] db=keystone,user=admin-keystone STATEMENT: ALTER TABLE "user" ADD CONSTRAINT ixu_user_id_domain_id UNIQUE (id, domain_id)
This seems to be because the previous failed (deadlock) run successfully
created the unique constraint in step 014 of contract_repo, just before
the deadlock happens.
If the deadlock can't be fixed completely, making db_sync --contract re-
runnable seems to be a reasonable solution as well.
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1755906
Title:
Occasional deadlock during db_sync --contract during Newton to Pike
live upgrade
Status in OpenStack Identity (keystone):
New
Bug description:
We are testing upgrade from Newton to Pike (skip Ocata). The Newton
system has two controllers. On controller0 (active) we upgrade
controller1 to Pike. Then we swact to make controller1 active and
controller0 standby, then we upgrade controller0 to Pike. After that
we upgrade the other nodes to Pike.
Now with all the nodes running Pike and controller1 as active, we
finalize the upgrade by running "keystone-manage db_sync --contract".
Occasionally the contract failed with a keystone DB deadlock.
The log from postgres during the deadlock:
------------------------------------------
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-1] db=keystone,user=admin-keystone ERROR: deadlock detected
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-2] db=keystone,user=admin-keystone DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-3] Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-4] Process 76959: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-5] Process 76955: SELECT local_user.id AS local_user_id, local_user.user_id AS local_user_user_id, local_user.domain_id AS local_user_domain_id, local_user.name AS local_user_name, local_user.failed_auth_count AS local_user_failed_auth_count, local_user.failed_auth_at AS local_user_failed_auth_at, anon_1.user_id AS anon_1_user_id, anon_1.user_domain_id AS anon_1_user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-6] FROM (SELECT "user".id AS user_id, "user".domain_id AS user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-7] FROM "user"
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-8] WHERE "user".id = '20bd4216910340bc8e6019f6d826f9d7') AS anon_1 JOIN local_user ON anon_1.user_id = local_user.user_id AND anon_1.user_domain_id = local_user.domain_id ORDER BY anon_1.user_id, anon_1.user_domain_id
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-9] db=keystone,user=admin-keystone HINT: See server log for query details.
2018-02-22T21:26:47.000 controller-1 postgres[76959]: warning [2-10] db=keystone,user=admin-keystone STATEMENT: ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE
The log from keystone during the deadlock:
------------------------------------------
keystone:log 2018-02-22 21:26:47.447 76427 CRITICAL keystone [-] Unhandled error: DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
HINT: See server log for query details.
[SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
2018-02-22 21:26:47.447 76427 ERROR keystone Traceback (most recent call last):
2018-02-22 21:26:47.447 76427 ERROR keystone File "/bin/keystone-manage", line 10, in <module>
2018-02-22 21:26:47.447 76427 ERROR keystone sys.exit(main())
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 44, in main
2018-02-22 21:26:47.447 76427 ERROR keystone cli.main(argv=sys.argv, config_files=config_files)
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1337, in main
2018-02-22 21:26:47.447 76427 ERROR keystone CONF.command.cmd_class.main()
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 528, in main
2018-02-22 21:26:47.447 76427 ERROR keystone upgrades.contract_schema()
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 346, in contract_schema
2018-02-22 21:26:47.447 76427 ERROR keystone _sync_repo(repo_name=CONTRACT_REPO)
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 187, in _sync_repo
2018-02-22 21:26:47.447 76427 ERROR keystone init_version=init_version, sanity_check=False)
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/migration.py", line 81, in db_sync
2018-02-22 21:26:47.447 76427 ERROR keystone raise exception.DbMigrationError(ex)
2018-02-22 21:26:47.447 76427 ERROR keystone DbMigrationError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
2018-02-22 21:26:47.447 76427 ERROR keystone DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
2018-02-22 21:26:47.447 76427 ERROR keystone Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
2018-02-22 21:26:47.447 76427 ERROR keystone HINT: See server log for query details.
2018-02-22 21:26:47.447 76427 ERROR keystone [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_user_id_fkey FOREIGN KEY(user_id, domain_id) REFERENCES "user" (id, domain_id) ON DELETE CASCADE ON UPDATE CASCADE']
2018-02-22 21:26:47.447 76427 ERROR keystone
Looks like the two processes involves are the keystone-management
command and a query from a openstack service. The keystone tables
involved are "user" and "local_user". But we do see the deadlock
happened on nonlocal_user table as well.
When re-run keystone-manage db_sync --contract, it would failed and complaining:
--------------------------------------------------------------------------------
2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-1] db=keystone,user=admin-keystone ERROR: relation "ixu_user_id_domain_id" already exists
2018-02-23T20:27:27.000 controller-1 postgres[13174]: warning [2-2] db=keystone,user=admin-keystone STATEMENT: ALTER TABLE "user" ADD CONSTRAINT ixu_user_id_domain_id UNIQUE (id, domain_id)
This seems to be because the previous failed (deadlock) run
successfully created the unique constraint in step 014 of
contract_repo, just before the deadlock happens.
If the deadlock can't be fixed completely, making db_sync --contract
re-runnable seems to be a reasonable solution as well.
To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1755906/+subscriptions
Follow ups