
openjdk team mailing list archive

[Bug 309407] [NEW] Strange openjdk hang in FUTEX_WAIT

 

Public bug reported:

Best way to reproduce:

1) Go to my ppa:  http://launchpad.net/~pktoss/+archive

2) Copy the eclipse - 3.4.1-0~pkt2 package to your own ppa, or download it
to an intrepid machine

3) Start a build: either amd64 or i386 will work :(
   - or, on your own machine: cd eclipse-3.4.1 && debuild

4) Wait 40-50 minutes (hey, this is eclipse we are talking about :)

5) Observe it hang with "Generate X" where X is between 1 and 5

In reality, for every value of X it is hanging inside a java application that generates metadata (a different
app for each value of X). It is hanging in futex(..., FUTEX_WAIT, ...), as an strace will convince you.

A potentially interesting fact is that the "val" in the above call is always PID+1, i.e., if the PID of the
hung java process is 5000, the above call will look like futex(<an_addr>, FUTEX_WAIT, 5001, NULL, ...)
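To check the PID+1 observation across several hung builds without reading each strace line by eye, something like the following could be used. It is only a sketch: the regular expression assumes strace prints the call in the shape quoted above, and the function names and the sample address are made up for illustration.

```python
import re

# Assumed shape of an strace line for the call quoted in the report:
#   futex(<an_addr>, FUTEX_WAIT, <val>, NULL, ...)
# Newer strace versions may print FUTEX_WAIT_PRIVATE, so that is accepted too.
FUTEX_WAIT_RE = re.compile(
    r"futex\((0x[0-9a-fA-F]+),\s*FUTEX_WAIT(?:_PRIVATE)?,\s*(\d+),"
)


def futex_wait_val(strace_line):
    """Return the integer "val" of a FUTEX_WAIT call, or None if the line is not one."""
    match = FUTEX_WAIT_RE.search(strace_line)
    return int(match.group(2)) if match else None


def val_is_pid_plus_one(strace_line, pid):
    """True when the line is a FUTEX_WAIT call whose "val" equals pid + 1."""
    return futex_wait_val(strace_line) == pid + 1


# Hypothetical line for a hung process with PID 5000 (the address is invented).
sample = "futex(0x7f3a5c0009d0, FUTEX_WAIT, 5001, NULL"
print(val_is_pid_plus_one(sample, 5000))
```

Feeding it the last line strace prints after attaching to the hung java process, together with that process's PID, shows whether the pattern holds for that run.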

Unfortunately you won't be able to (at least I couldn't) reproduce it by running just the app,
or even by just running the install.sh script in debian/scripts that contains this command.

You have to run the full "debuild" for the bug to appear :(

Another perhaps useful fact is that the package build completes fine in debian sid, which has
a slightly older openjdk (b11 instead of the b12 in intrepid).

The bug has been reproduced in the following kernel/arch configurations:

       * 2.6.27/amd64 (latest intrepid kernel) inside a KVM VM
       * 2.6.28-rc8/amd64 slightly customized (small trivial one liners - network card bugfixes) physical machine
       * 2.6.21/i386 (an EC2 node)
       * Whatever the autobuilders run / both i386 and amd64

In a debian sid chroot on the customized 2.6.28-rc8/amd64 machine the "debuild" has succeeded every time so
far.

The problem is of course that debian sid also has a different libc ;-)

Unfortunately, I don't have the time to completely debug this (e.g., one might want to know what files/streams
are open by the hung process, etc) and I also have no familiarity with the openjdk internals.
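For anyone who wants to pick this up, the open files/streams of the hung process can be read from /proc on Linux. The snippet below is a minimal sketch of that, not something that was run against the hung build; the function name is made up, and it needs to run as the same user as the java process (or as root).

```python
import os


def open_files(pid):
    """Map each open file descriptor of process `pid` to what it points at.

    Reads the symlinks under /proc/<pid>/fd, so this is Linux-only.
    """
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result
```

Calling open_files(5000) for a hung java process with PID 5000 would give a dict such as {0: '/dev/null', 1: 'pipe:[...]', ...}; pipes and sockets show up as pipe:[inode] and socket:[inode], which may help tell whether the process is stuck on something inherited from debuild.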

So, I'm filing this in case anyone is interested in taking a look, and I will try to "hack around" this in the build
until there is a "proper" solution.

Thanks

** Affects: openjdk-6 (Ubuntu)
     Importance: Undecided
         Status: New

-- 
Strange openjdk hang in FUTEX_WAIT
https://bugs.launchpad.net/bugs/309407
You received this bug notification because you are a member of OpenJDK,
which is subscribed to openjdk-6 in ubuntu.

Status in “openjdk-6” source package in Ubuntu: New

