openjdk team mailing list archive

[Bug 309407] Re: Strange openjdk hang in FUTEX_WAIT

 

I get the same problem with a FUTEX_WAIT hang when *starting* Eclipse
3.5 on my Debian amd64 squeeze/sid system (kernel 2.6.31.4, libc6
2.10.1-5, openjdk-6-jre 6b16-1.6.1-2). No fakeroot involved.

-- 
Strange openjdk hang in FUTEX_WAIT
https://bugs.launchpad.net/bugs/309407
You received this bug notification because you are a member of OpenJDK,
which is subscribed to openjdk-6 in ubuntu.

Status in “openjdk-6” package in Ubuntu: Invalid

Bug description:
Best way to reproduce:

1) Go to my ppa:  http://launchpad.net/~pktoss/+archive

2) Copy the eclipse 3.4.1-0~pkt2 package to your own PPA, or download it to an intrepid machine

3) Start a build: either amd64 or i386 will reproduce it :(
   - or, on a local machine: cd eclipse-3.4.1 && debuild

4) Wait 40-50 minutes (hey, this is eclipse we are talking about :)

5) Observe it hang at "Generate X", where X is between 1 and 5

In every case the hang is inside a Java application that generates metadata (a different app
for each value of X). The process is blocked in futex(..., FUTEX_WAIT, ...), as an strace of it will show.

A potentially interesting fact is that the "val" argument in the above call is always PID+1, i.e., if the PID
of the hung java process is 5000, the call will look like futex(<an_addr>, FUTEX_WAIT, 5001, NULL, ...)
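
For anyone collecting strace output from several hung builds, the PID+1 observation above can be checked mechanically. The following is only a rough sketch: the helper names (parse_futex, waits_on_pid_plus_one) and the sample address are made up for illustration, and the regular expression assumes the usual strace rendering of a futex call.

```python
import re

# Matches the start of an strace line such as:
#   futex(0x7f00deadb9d0, FUTEX_WAIT, 5001, NULL
# (the address above is a made-up example, not taken from the bug report)
FUTEX_RE = re.compile(r"futex\((0x[0-9a-fA-F]+),\s*(FUTEX_\w+),\s*(\d+)")


def parse_futex(line):
    """Return (address, operation, val) for a futex strace line, or None."""
    m = FUTEX_RE.search(line)
    if m is None:
        return None
    addr, op, val = m.groups()
    return int(addr, 16), op, int(val)


def waits_on_pid_plus_one(line, pid):
    """True if the line is a plain FUTEX_WAIT whose val equals pid + 1."""
    parsed = parse_futex(line)
    if parsed is None:
        return False
    _, op, val = parsed
    return op == "FUTEX_WAIT" and val == pid + 1


if __name__ == "__main__":
    # Hypothetical line modelled on the example in the report (PID 5000).
    sample = "futex(0x7f00deadb9d0, FUTEX_WAIT, 5001, NULL"
    print(waits_on_pid_plus_one(sample, 5000))
```

Feeding it the last line printed by strace -p <pid> together with that PID should be enough to confirm or refute the pattern on a given machine.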

Unfortunately you won't be able to (at least I couldn't) reproduce it by running just the app
or even by just running the install.sh script in debian/scripts that contains this command.

You have to run the full "debuild" for the bug to appear :(

Another perhaps useful fact is that the package build will complete fine in debian sid which has
a slightly older openjdk (b11 instead of b12 in intrepid).

The bug has been reproduced in the following kernel/arch configurations:

       * 2.6.27/amd64 (latest intrepid kernel) inside a KVM VM
       * 2.6.28-rc8/amd64, slightly customized (trivial one-line network card bugfixes), on a physical machine
       * 2.6.21/i386 (an EC2 node)
       * Whatever the autobuilders run / both i386 and amd64

In a Debian sid chroot on the customized 2.6.28-rc8/amd64 machine, the "debuild" has succeeded every time so
far.

The problem is of course that debian sid also has a different libc ;-)

Unfortunately, I don't have the time to debug this completely (e.g., one might want to know which files/streams
the hung process has open, etc.) and I am also not familiar with the openjdk internals.
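
As a starting point for that kind of inspection, the open files and streams of a hung process can be read from /proc on Linux. This is a minimal sketch, not something that was run against the hung build; the function name open_files is made up here, and it assumes a Linux /proc filesystem and permission to read the target process's fd directory.

```python
import os


def open_files(pid):
    """Map each open file descriptor of `pid` to the target it points at.

    Reads the symlinks under /proc/<pid>/fd (Linux only). Descriptors that
    disappear while we are looking are skipped.
    """
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result


if __name__ == "__main__":
    # Demonstration on the current process; substitute the PID of the
    # hung java process when debugging the build.
    for fd, target in sorted(open_files(os.getpid()).items()):
        print(fd, "->", target)
```

Run it as the same user as the hung process (or as root); pipes and sockets show up as entries like pipe:[...] and socket:[...], which may hint at what the process is waiting for.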

So I'm filing this in case anyone is interested in taking a look, and I will try to "hack around" it in the build
until there is a "proper" solution.

Thanks


