
touch-packages team mailing list archive

[Bug 1347147] Re: krb5 database operations enter infinite loop

 

** Description changed:

- In some conditions, propagating a kerberos database to a slave KDC server or performing other database operations can stall.  As we've investigated the issue, it looks like a database with more than a few hundred principals is very likely to run into this issue.
+ [Impact]
+ 
+ On krb5 KDC databases with more than a few hundred principals,
+ operations can enter an infinite loop in the database library.  This
+ affects both read and write operations.  If operators are fortunate,
+ they will encounter this bug while testing a migration.  If they are not
+ so fortunate, they will encounter this bug in a production KDC when the
+ number of principals crosses the threshold where this bug manifests,
+ resulting in a service outage and possible database corruption.
+ Probably the only way to restore service in that situation is to install
+ a patched KDC or to downgrade to an unaffected version.
+ 
+ Both Trusty and Utopic amd64 have been verified to have this issue.
+ 
+ One concrete reported example is an invocation of kdb5_util load (as
+ part of a slave KDC propagation) spinning:
+ 
+ http://mailman.mit.edu/pipermail/kerberos/2014-July/020007.html
+ 
+ Additional failure modes are likely.
+ 
+ The proposed fix at https://launchpad.net/~hartmans/+archive/ubuntu/ubuntu-fixes
+ works around a compiler optimizer bug in the gcc-4.8 series, which incorrectly deduces that a strict aliasing violation has occurred and miscompiles part of the bundled libdb2 library that the KDC database back end depends upon.  The miscompilation causes a data structure to contain an inappropriate cycle, which leads to an infinite loop when the structure is traversed.
+ 
+ [Test Case]
+ 
+ apt-get install krb5-kdc krb5-admin-server
+ kdb5_util -W -r T create -s
+ awk 'BEGIN{ for (i = 0; i < 1024; i++) { printf("ank -randkey a%06d\n", i) } }' /dev/null | kadmin.local -r T
+ 
+ (Enter any password for the master key when requested.)
+ 
+ On platforms with this issue, kadmin.local spins consuming 100% CPU
+ after a few hundred principals have been created.  (In two observed
+ cases, the stall occurred at principal "a000762".)
+ 
+ To clean up,
+ 
+ rm /etc/krb5kdc/principal*
+ 
+ or
+ 
+ kdb5_util -r T destroy
+ 
+ but the latter can possibly enter the same infinite loop.
+ 
+ [Regression Potential]
+ 
+ Negligible.
+ 
+ It is theoretically possible that our upstream workaround, which
+ involves using TAILQ macros instead of CIRCLEQ macros in the bundled
+ libdb2 that backs the KDC database, will have some as-yet undiscovered
+ bugs or compiler interactions with consequences worse than this current
+ issue.  I think this is rather unlikely.
+ 
+ The patched libdb2 passes both the extensive libdb2 test suite and the
+ rest of the krb5 test suite.  Prior to patching, compiling krb5 with an
+ affected gcc would cause the krb5 test suite to stall when it reached
+ the libdb2 test suite.  (The test suite stall is how we became aware of
+ the gcc optimizer bug.)
+ 
+ The BSD TAILQ macros are generally considered to be safer than the
+ CIRCLEQ macros, and the various open-source BSD derivatives made the
+ corresponding change to their libdb sources years ago, with no reported
+ ill effects that I can see.
+ 
+ 
+ Original report from Ben Kaduk:
+ 
+ ==========
+ 
+ In some conditions, propagating a kerberos database to a slave KDC server can stall.
  This is due to a misoptimization by gcc 4.8 of the CIRCLEQ family of macros, apparently due to overzealous strict aliasing deductions.
  
  One case of this stall is reported at
  http://mailman.mit.edu/pipermail/kerberos/2014-July/020007.html (and the
  rest of the thread), and there is an entry in the upstream bugtracker at
  http://krbdev.mit.edu/rt/Ticket/Display.html?id=7860 .
  
  gcc 4.9 (as used in Debian unstable at present) is not believed to
  induce this problem.  Upstream has patched their code to use the TAILQ
  family of macros instead, as a workaround, but that workaround has not
  yet appeared in an upstream release:
  https://github.com/krb5/krb5/commit/26d8744129
  
- A branch is linked including this upstream workaround and two other
- patches to bugs already nominated for trusty applied to the krb5 in
- trusty.  We believe the impact is significant because this is likely to
- be a problem for sites with a large database running trusty.  The
- regression potential is very small.  The upstream workaround changes
- from one family of queue macros that are stable and well-tested to
- another.
- 
- For utopic, the simplest fix is to rebuild krb5 with the compiler
- currently in utopic.  An alternative is to request that the Debian
- maintainers (both monitoring this bug for such a request) upload the
- upstream workaround to Debian and sync that.  You could do an
- Ubuntu-specific upload, but it seems undesirable to introduce a change
- between Ubuntu and Debian when all the right parties are happy to avoid
- it.
- 
  Because of the different compiler versions used on Debian and Ubuntu, I
  am filing this as an Ubuntu-specific bug.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to gcc-4.8 in Ubuntu.
https://bugs.launchpad.net/bugs/1347147

Title:
  krb5 database operations enter infinite loop

Status in The GNU Compiler Collection:
  Unknown
Status in Network Authentication System:
  Unknown
Status in “gcc-4.8” package in Ubuntu:
  Confirmed
Status in “krb5” package in Ubuntu:
  Triaged

Bug description:
  [Impact]

  On krb5 KDC databases with more than a few hundred principals,
  operations can enter an infinite loop in the database library.  This
  affects both read and write operations.  If operators are fortunate,
  they will encounter this bug while testing a migration.  If they are
  not so fortunate, they will encounter this bug in a production KDC
  when the number of principals crosses the threshold where this bug
  manifests, resulting in a service outage and possible database
  corruption.  Probably the only way to restore service in that
  situation is to install a patched KDC or to downgrade to an unaffected
  version.

  Both Trusty and Utopic amd64 have been verified to have this issue.

  One concrete reported example is an invocation of kdb5_util load (as
  part of a slave KDC propagation) spinning:

  http://mailman.mit.edu/pipermail/kerberos/2014-July/020007.html

  Additional failure modes are likely.

  The proposed fix at https://launchpad.net/~hartmans/+archive/ubuntu/ubuntu-fixes
  works around a compiler optimizer bug in the gcc-4.8 series, which incorrectly deduces that a strict aliasing violation has occurred and miscompiles part of the bundled libdb2 library that the KDC database back end depends upon.  The miscompilation causes a data structure to contain an inappropriate cycle, which leads to an infinite loop when the structure is traversed.

  [Test Case]

  apt-get install krb5-kdc krb5-admin-server
  kdb5_util -W -r T create -s
  awk 'BEGIN{ for (i = 0; i < 1024; i++) { printf("ank -randkey a%06d\n", i) } }' /dev/null | kadmin.local -r T

  (Enter any password for the master key when requested.)

  On platforms with this issue, kadmin.local spins consuming 100% CPU
  after a few hundred principals have been created.  (In two observed
  cases, the stall occurred at principal "a000762".)
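
Because the failure mode is a spin rather than a crash, it can help to run the test case under a time limit. The following is a minimal sketch, not part of the original report: it wraps the same awk generator in a hypothetical gen_principals function and relies on coreutils timeout, which exits with status 124 when the limit is reached. The function names and any limit you pass are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: run the reproduction with a time limit so that a spinning
# kadmin.local is killed and reported instead of hanging the shell.

# Emit one "ank -randkey" command per principal: a000000 .. a<COUNT-1>.
# Usage: gen_principals COUNT
gen_principals() {
    awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) printf("ank -randkey a%06d\n", i) }'
}

# Feed the generated commands to kadmin.local and give up after LIMIT seconds.
# Usage: repro REALM COUNT LIMIT
repro() {
    gen_principals "$2" | timeout "$3" kadmin.local -r "$1"
    status=$?   # exit status of timeout, the last command in the pipeline
    if [ "$status" -eq 124 ]; then
        echo "kadmin.local exceeded ${3}s: possibly affected"
    else
        echo "kadmin.local exited with status $status"
    fi
    return "$status"
}
```

For example, `repro T 1024 300` would run the test case against realm T and report a timeout if kadmin.local is still running after five minutes. A slow but healthy machine could also hit the limit, so a timeout is a hint rather than proof.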

  To clean up,

  rm /etc/krb5kdc/principal*

  or

  kdb5_util -r T destroy

  but the latter can possibly enter the same infinite loop.

  [Regression Potential]

  Negligible.

  It is theoretically possible that our upstream workaround, which
  involves using TAILQ macros instead of CIRCLEQ macros in the bundled
  libdb2 that backs the KDC database, will have some as-yet undiscovered
  bugs or compiler interactions with consequences worse than this
  current issue.  I think this is rather unlikely.

  The patched libdb2 passes both the extensive libdb2 test suite and the
  rest of the krb5 test suite.  Prior to patching, compiling krb5 with
  an affected gcc would cause the krb5 test suite to stall when it
  reached the libdb2 test suite.  (The test suite stall is how we became
  aware of the gcc optimizer bug.)

  The BSD TAILQ macros are generally considered to be safer than the
  CIRCLEQ macros, and the various open-source BSD derivatives made the
  corresponding change to their libdb sources years ago, with no
  reported ill effects that I can see.

  
  Original report from Ben Kaduk:

  ==========

  In some conditions, propagating a kerberos database to a slave KDC server can stall.
  This is due to a misoptimization by gcc 4.8 of the CIRCLEQ family of macros, apparently due to overzealous strict aliasing deductions.

  One case of this stall is reported at
  http://mailman.mit.edu/pipermail/kerberos/2014-July/020007.html (and
  the rest of the thread), and there is an entry in the upstream
  bugtracker at http://krbdev.mit.edu/rt/Ticket/Display.html?id=7860 .

  gcc 4.9 (as used in Debian unstable at present) is not believed to
  induce this problem.  Upstream has patched their code to use the TAILQ
  family of macros instead, as a workaround, but that workaround has not
  yet appeared in an upstream release:
  https://github.com/krb5/krb5/commit/26d8744129
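
As a rough aid when rebuilding krb5 locally, the compiler series can be checked with a few lines of shell. This is only a sketch, assuming (as stated above) that the gcc 4.8 series is the affected one; the function name is hypothetical, and the check says nothing about which compiler built an already installed krb5 package.

```shell
#!/bin/sh
# Return 0 if VERSION belongs to the gcc 4.8 series named in this report,
# 1 otherwise.  Accepts both "4.8" and "4.8.x" forms.
# Usage: is_gcc_48 VERSION
is_gcc_48() {
    case "$1" in
        4.8|4.8.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use is `is_gcc_48 "$(gcc -dumpversion)" && echo "gcc 4.8 series: build may be affected"`.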

  Because of the different compiler versions used on Debian and Ubuntu,
  I am filing this as an Ubuntu-specific bug.

To manage notifications about this bug go to:
https://bugs.launchpad.net/gcc/+bug/1347147/+subscriptions

