← Back to team overview

touch-packages team mailing list archive

[Bug 1347147] Re: krb5 database operations enter infinite loop

 

Launchpad has imported 19 comments from the remote bug at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61964.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2014-07-30T13:21:26+00:00 Anders Kaseorg wrote:

Kerberos is miscompiled by gcc-4.8.  The impact is detailed at
https://bugs.launchpad.net/bugs/1347147, but here is a reduced test
case.  The expected return is 0, but when compiled with gcc-4.8 -O2, it
returns 1.

$ cat bug.c

struct node { struct node *next, *prev; } node;
struct head { struct node *first; } heads[5];
int k = 2;
struct head *head = &heads[2];

int main()
{
  node.prev = (void *)head;
  head->first = &node;

  struct node *n = head->first;
  struct head *h = &heads[k];

  if (n->prev == (void *)h)
    h->first = n->next;
  else
    n->prev->next = n->next;

  n->next = h->first;
  return n->next == &node;
}

$ gcc-4.7 -Wall -O2 bug.c -o bug; ./bug; echo $?
0
$ gcc-4.8 -Wall -O2 bug.c -o bug; ./bug; echo $?
1
$ gcc-4.9 -Wall -O2 bug.c -o bug; ./bug; echo $?
0
$ dpkg -l gcc-4.7 gcc-4.8 gcc-4.9
[…]
ii  gcc-4.7  4.7.4-2ubuntu1  amd64  GNU C compiler
ii  gcc-4.8  4.8.3-6ubuntu1  amd64  GNU C compiler
ii  gcc-4.9  4.9.1-3ubuntu2  amd64  GNU C compiler

I bisected the point where the problem disappeared between 4.8 and 4.9
at r202525.  However, I don’t understand why.  I’m scared by the fact
that r202525 was intended to fix a “missed-optimization” bug (bug
58404).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/7

------------------------------------------------------------------------
On 2014-07-30T13:39:40+00:00 Rguenth wrote:

The testcase is violating strict-aliasing rules as you access a struct head
as struct node here:

  if (n->prev == (void *)h)
    h->first = n->next;
  else
    n->prev->next = n->next;

as n->prev points to &heads[0] while h is &heads[2] (an out-of-bound pointer).
So n->prev is a struct head and you access a next field of a struct node of it.

Changing k to 0 makes the testcase pass (now you don't run into the bogus
path).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/8

------------------------------------------------------------------------
On 2014-07-30T14:31:43+00:00 Greg Hudson wrote:

How do you conclude that n->prev points to &heads[0]?  node.prev
receives the value (void *)head, where head is initialized to &heads[2].
I cannot see any uses of &heads[0] in the test program.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/9

------------------------------------------------------------------------
On 2014-07-30T20:21:04+00:00 Anders Kaseorg wrote:

(In reply to Richard Biener from comment #1)
> The testcase is violating strict-aliasing rules as you access a struct head
> as struct node here:

Agree with ghudson here: n->prev points to &heads[2], not &heads[0].

Although assigning a casted struct head * to a struct node * is arguably
sloppy, the standard does not prohibit it, as long as it is never
dereferenced in that form.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/10

------------------------------------------------------------------------
On 2014-07-31T04:19:53+00:00 Anders Kaseorg wrote:

Another bisect between 4.7 and 4.8 shows that the bug appeared with
r189321 (bug 52009).

My test case has triggers the bug in more versions than Kerberos does:
as far as I can tell, Kerberos was unaffected until r192604.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/18

------------------------------------------------------------------------
On 2014-07-31T09:29:28+00:00 Mikpelinux wrote:

I've been staring as this test case, and I cannot find any dereference
of a wrong-typed pointer value.  The only oddity I can find is that at

  if (n->prev == (void *)h)

n == &node, n->prev == (struct node *)&heads[2] (so wrong-typed), h ==
&heads[2], so there is a '==' being applied to a wrong-typed pointer.
Is that undefined behaviour?  I'll note that changing the test to

  if ((void *)n->prev == (void *)h)

still reproduces the wrong-code while looking technically Ok.

Also, there is no out-of-bounds error.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/19

------------------------------------------------------------------------
On 2014-07-31T09:48:33+00:00 Andreas Schwab wrote:

Equality test against pointer to void is explicitly allowed by the
standard and implicitly converts the other pointer to void*.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/20

------------------------------------------------------------------------
On 2014-07-31T10:05:28+00:00 Rguenth wrote:

(In reply to Anders Kaseorg from comment #4)
> Another bisect between 4.7 and 4.8 shows that the bug appeared with r189321
> (bug 52009).
> 
> My test case has triggers the bug in more versions than Kerberos does: as
> far as I can tell, Kerberos was unaffected until r192604.

Thanks - that pin-points it.  tail-merging concludes that

  <bb 3>:
  _11 = n_7->next;
  MEM[(struct head *)_10].first = _11;
  goto <bb 5>;

and

  <bb 4>:
  _13 = n_7->next;
  _10->next = _13;

are equivalent.  But they are not - the stores are performed using
different alias sets.

Note that the actual miscompile happens during RTL instruction
scheduling.

In 4.9 and trunk tail-merging is faced with

  <bb 3>:
  _11 = n_7->next;
  MEM[(struct head *)&heads][k.1_8].first = _11;
  goto <bb 5>;

  <bb 4>:
  _13 = n_7->next;
  _10->next = _13;

instead but I bet the issue is still there.

So it simply does (on the 4.8 branch):

    case GIMPLE_ASSIGN:
      lhs1 = gimple_get_lhs (s1);
      lhs2 = gimple_get_lhs (s2);
      if (TREE_CODE (lhs1) != SSA_NAME
          && TREE_CODE (lhs2) != SSA_NAME)
        return (vn_valueize (gimple_vdef (s1))
                == vn_valueize (gimple_vdef (s2)));

which shows that we value-number the VDEFs the same.

IMHO VDEF value-numbering is dangerous here.

I have a patch.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/21

------------------------------------------------------------------------
On 2014-07-31T12:19:46+00:00 Vries wrote:

Using this patch on the example from the description field, I can modify the example on the command line:
...
$ diff -u bug-orig.c bug-mod.c 
--- bug-orig.c	2014-07-31 14:00:50.663275717 +0200
+++ bug-mod.c	2014-07-31 14:01:49.403276412 +0200
@@ -11,7 +11,7 @@
   struct node *n = head->first;
   struct head *h = &heads[k];
 
-  if (n->prev == (void *)h)
+  if (FORCE n->prev == (void *)h)
     h->first = n->next;
   else
     n->prev->next = n->next;
...

1. -DFORCE="" gives the original
2. -DFORCE="1 ||" forces the condition to true
3. -DFORCE="0 &&" forces the confition to false

In this experiment, we don't use tree-tail-merge:
...
$ gcc -DFORCE="1 ||" bug-mod.c -O2 -fno-strict-aliasing -fno-tree-tail-merge && ./a.out ; echo $?
0
$ gcc -DFORCE="1 ||" bug-mod.c -O2 -fstrict-aliasing -fno-tree-tail-merge && ./a.out ; echo $?
0
$ gcc -DFORCE="0 &&" bug-mod.c -O2 -fno-strict-aliasing -fno-tree-tail-merge && ./a.out ; echo $?
0
$ gcc -DFORCE="0 &&" bug-mod.c -O2 -fstrict-aliasing -fno-tree-tail-merge && ./a.out ; echo $?
1
...

The last result seems to suggest that the example is not type-safe.

My understanding is that the problem is in the line:
  n->prev->next = n->next;
where we effectively do:
  /* ((struct node*)&heads[2])->next = node.next */
which is type-unsafe.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/22

------------------------------------------------------------------------
On 2014-07-31T12:24:06+00:00 Rguenther-suse wrote:

On Thu, 31 Jul 2014, vries at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61964
> 
> vries at gcc dot gnu.org changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |vries at gcc dot gnu.org
> 
> --- Comment #8 from vries at gcc dot gnu.org ---
> Using this patch on the example from the description field, I can modify the
> example on the command line:
> ...
> $ diff -u bug-orig.c bug-mod.c 
> --- bug-orig.c    2014-07-31 14:00:50.663275717 +0200
> +++ bug-mod.c    2014-07-31 14:01:49.403276412 +0200
> @@ -11,7 +11,7 @@
>    struct node *n = head->first;
>    struct head *h = &heads[k];
> 
> -  if (n->prev == (void *)h)
> +  if (FORCE n->prev == (void *)h)
>      h->first = n->next;
>    else
>      n->prev->next = n->next;
> ...
> 
> 1. -DFORCE="" gives the original
> 2. -DFORCE="1 ||" forces the condition to true
> 3. -DFORCE="0 &&" forces the confition to false
> 
> In this experiment, we don't use tree-tail-merge:
> ...
> $ gcc -DFORCE="1 ||" bug-mod.c -O2 -fno-strict-aliasing -fno-tree-tail-merge &&
> ./a.out ; echo $?
> 0
> $ gcc -DFORCE="1 ||" bug-mod.c -O2 -fstrict-aliasing -fno-tree-tail-merge &&
> ./a.out ; echo $?
> 0
> $ gcc -DFORCE="0 &&" bug-mod.c -O2 -fno-strict-aliasing -fno-tree-tail-merge &&
> ./a.out ; echo $?
> 0
> $ gcc -DFORCE="0 &&" bug-mod.c -O2 -fstrict-aliasing -fno-tree-tail-merge &&
> ./a.out ; echo $?
> 1
> ...
> 
> The last result seems to suggest that the example is not type-safe.
> 
> My understanding is that the problem is in the line:
>   n->prev->next = n->next;
> where we effectively do:
>   /* ((struct node*)&heads[2])->next = node.next */
> which is type-unsafe.

But that line is never executed at runtime (well, unless tail
merging comes along and makes it the only version present).

Richard.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/23

------------------------------------------------------------------------
On 2014-07-31T14:07:31+00:00 Rguenth wrote:

Author: rguenth
Date: Thu Jul 31 14:06:59 2014
New Revision: 213375

URL: https://gcc.gnu.org/viewcvs?rev=213375&root=gcc&view=rev
Log:
2014-07-31  Richard Biener  <rguenther@xxxxxxx>

	PR tree-optimization/61964
	* tree-ssa-tail-merge.c (gimple_equal_p): Handle non-SSA LHS solely
	by structural equality.

        * gcc.dg/torture/pr61964.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr61964.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-tail-merge.c

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/24

------------------------------------------------------------------------
On 2014-07-31T14:10:24+00:00 Rguenth wrote:

Fixed on trunk sofar.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/25

------------------------------------------------------------------------
On 2014-07-31T17:09:47+00:00 Vries wrote:

Created attachment 33220
Alternative patch

> But that line is never executed at runtime (well, unless tail
> merging comes along and makes it the only version present).

Ah, right, we consider a program with dead type-unsafe code valid.

This follow-up patch attempts to fix things less conservatively on
trunk. Shall I put this through testing or do you see a problem with
this approach?

Furthermore, I suspect that a similar issue exists for loads, I'll look
into that.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/31

------------------------------------------------------------------------
On 2014-08-01T07:32:07+00:00 Rguenth wrote:

(In reply to vries from comment #12)
> Created attachment 33220 [details]
> Alternative patch
> 
> > But that line is never executed at runtime (well, unless tail
> > merging comes along and makes it the only version present).
> 
> Ah, right, we consider a program with dead type-unsafe code valid.
> 
> This follow-up patch attempts to fix things less conservatively on trunk.
> Shall I put this through testing or do you see a problem with this approach?

Hum.  I don't like guarding optimizations with !flag_strict_aliasing, that is,
-fno-strict-aliasing shouldn't get us more optimization.

Also on trunk I'd like to rip out the use of the SCCVN lattice from
tail-merging as there FRE/PRE value-replace every SSA name which means
we don't need it.  The tight entanglement between PRE and tail-merge has
given me more headaches recently.

> Furthermore, I suspect that a similar issue exists for loads, I'll look into
> that.

I don't think so.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/34

------------------------------------------------------------------------
On 2014-08-01T07:36:48+00:00 Rguenth wrote:

Author: rguenth
Date: Fri Aug  1 07:36:16 2014
New Revision: 213404

URL: https://gcc.gnu.org/viewcvs?rev=213404&root=gcc&view=rev
Log:
2014-08-01  Richard Biener  <rguenther@xxxxxxx>

	PR tree-optimization/61964
	* tree-ssa-tail-merge.c (gimple_equal_p): Handle non-SSA LHS solely
        by structural equality.

	* gcc.dg/torture/pr61964.c: New testcase.
	* gcc.dg/pr51879-18.c: XFAIL.

Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/torture/pr61964.c
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/pr51879-18.c
    branches/gcc-4_9-branch/gcc/tree-ssa-tail-merge.c

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/35

------------------------------------------------------------------------
On 2014-08-01T07:40:33+00:00 Rguenth wrote:

Author: rguenth
Date: Fri Aug  1 07:40:01 2014
New Revision: 213405

URL: https://gcc.gnu.org/viewcvs?rev=213405&root=gcc&view=rev
Log:
2014-08-01  Richard Biener  <rguenther@xxxxxxx>

	PR tree-optimization/61964
	* tree-ssa-tail-merge.c (gimple_operand_equal_value_p): New
	function merged from trunk.
	(gimple_equal_p): Handle non-SSA LHS solely by structural
	equality.

	* gcc.dg/torture/pr61964.c: New testcase.
	* gcc.dg/pr51879-18.c: XFAIL.

Added:
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/torture/pr61964.c
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/pr51879-18.c
    branches/gcc-4_8-branch/gcc/tree-ssa-tail-merge.c

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/36

------------------------------------------------------------------------
On 2014-08-01T08:17:18+00:00 Rguenth wrote:

Fixed.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/38

------------------------------------------------------------------------
On 2014-08-01T08:18:37+00:00 Anders Kaseorg wrote:

Thanks.  I verified that GCC 4.8 r213405 fixes my test case and the
original Kerberos problem.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/39

------------------------------------------------------------------------
On 2014-08-04T00:24:57+00:00 Vries wrote:

(In reply to Richard Biener from comment #13)
> (In reply to vries from comment #12)
> > Furthermore, I suspect that a similar issue exists for loads, I'll look into
> > that.
> 
> I don't think so.

How about PR62004 ?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/krb5/+bug/1347147/comments/43


** Changed in: gcc
       Status: Unknown => Fix Released

** Changed in: gcc
   Importance: Unknown => High

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to gcc-4.8 in Ubuntu.
https://bugs.launchpad.net/bugs/1347147

Title:
  krb5 database operations enter infinite loop

Status in The GNU Compiler Collection:
  Fix Released
Status in Network Authentication System:
  Unknown
Status in “gcc-4.8” package in Ubuntu:
  Fix Released
Status in “gcc-4.9” package in Ubuntu:
  Fix Released
Status in “krb5” package in Ubuntu:
  Triaged
Status in “gcc-4.8” source package in Trusty:
  New
Status in “krb5” source package in Trusty:
  Triaged

Bug description:
  [Impact]

  On krb5 KDC databases with more than a few hundred principals,
  operations can enter an infinite loop in the database library.  This
  affects both read and write operations.  If operators are fortunate,
  they will encounter this bug while testing a migration.  If they are
  not so fortunate, they will encounter this bug in a production KDC
  when the number of principals crosses the threshold where this bug
  manifests, resulting in a service outage and possible database
  corruption.  Probably the only way to restore service in that
  situation is to install a patched KDC or to downgrade to an unaffected
  version.

  Both Trusty and Utopic amd64 have been verified to have this issue.

  One concrete reported example is an invocation of kdb5_util load (as
  part of a slave KDC propagation) spinning:

  http://mailman.mit.edu/pipermail/kerberos/2014-July/020007.html

  Additional failure modes are likely

  A branch is linked including the upstream work around for this bug,
  along with two other patches to bugs already nominated for trusty
  applied to the krb5 in trusty.

  For utopic, the simplest fix is to rebuild krb5 with the compiler
  currently in utopic.  An alternative is to request that the Debian
  maintainers (both monitoring this bug for such a request) upload the
  upstream work around to Debian and sync that.  You could do an ubuntu-
  specific upload but it seems undesirable to introduce a change between
  Ubuntu and Debian when all the right parties are happy to avoid it.

  The upstream patch works around a compiler optimizer bug in the
  gcc-4.8 series, which incorrectly deduces that a strict aliasing
  violation has occurred and miscompiles part of the bundled libdb2
  library that the KDC database back end depends upon.  The
  miscompilation causes a data structure to contain an inappropriate
  cycle, which leads to an infinite loop when the structure is
  traversed.

  [Test Case]

  apt-get install krb5-kdc krb5-admin-server
  kdb5_util -W -r T create -s
  awk 'BEGIN{ for (i = 0; i < 1024; i++) { printf("ank -randkey a%06d\n", i) } }' /dev/null | kadmin.local -r T

  (Enter any password for the master key when requested.)

  On platforms with this issue, kadmin.local spins consuming 100% CPU
  after a few hundred principals have been created.  (This is "a000762"
  on two examples.)

  To clean up,

  rm /etc/krb5kdc/principal*

  or

  krb5kdc -r T destroy

  but the latter can possibly enter the same infinite loop.

  [Regression Potential]

  Negligible.

  It is theoretically possible that our upstream workaround, which
  involves using TAILQ macros instead of CIRCLEQ macros in the bundled
  libdb2 that backs the KDC database, will have some as-yet undiscovered
  bugs or compiler interactions with consequences worse than this
  current issue.  I think this is rather unlikely.

  The patched libdb2 passes both the extensive libdb2 test suite and the
  rest of the krb5 test suite.  Prior to patching, compiling krb5 with
  an affected gcc would cause the krb5 test suite to stall when it
  reached the libdb2 test suite.  (The test suite stall is how we became
  aware of the gcc optimizer bug.)

  The BSD TAILQ macros are generally considered to be safer than the
  CIRCLEQ macros, and the various open-source BSD derivatives have made
  the corresponding change to their libdb sources years ago, with no
  reported ill effects that I can see.

  Original report from Ben Kaduk:

  ==========

  In some conditions, propagating a kerberos database to a slave KDC server can stall.
  This is due to a misoptimization by gcc 4.8 of the CIRCLEQ famliy of macros, apparently due to overzealous strict aliasing deductions.

  One case of this stall is reported at
  http://mailman.mit.edu/pipermail/kerberos/2014-July/020007.html (and
  the rest of the thread), and there is an entry in the upstream
  bugtracker at http://krbdev.mit.edu/rt/Ticket/Display.html?id=7860 .

  gcc 4.9 (as used in Debian unstable at present) is not believed to
  induce this problem.  Upstream has patched their code to use the TAILQ
  family of macros instead, as a workaround, but that workaround has not
  yet appeared in an upstream release:
  https://github.com/krb5/krb5/commit/26d8744129

  Because of the different compiler versions used on Debian and Ubuntu,
  I am filing this as an Ubuntu-specific bug.

To manage notifications about this bug go to:
https://bugs.launchpad.net/gcc/+bug/1347147/+subscriptions


References