openjdk team mailing list archive

[Bug 309407] Re: Strange openjdk hang in FUTEX_WAIT

Updating fakeroot to the latest Debian sid version (it is in my PPA) seems
to fix the problem. So the bug should probably be closed as invalid,
since openjdk does not appear to be at fault (?)

-- 
Strange openjdk hang in FUTEX_WAIT
https://bugs.launchpad.net/bugs/309407
You received this bug notification because you are a member of OpenJDK,
which is subscribed to openjdk-6 in ubuntu.

Status in “openjdk-6” source package in Ubuntu: New

Bug description:
Best way to reproduce:

1) Go to my ppa:  http://launchpad.net/~pktoss/+archive

2) Copy the eclipse - 3.4.1-0~pkt2 package to your own ppa or download to an intrepid machine

3) Start a build: either amd64 or i386 will work :(
   - or, on a local machine: cd eclipse-3.4.1 && debuild

4) Wait 40-50 minutes (hey, this is eclipse we are talking about :)

5) Observe it hang with "Generate X" where X is between 1 and 5

In every case the hang is inside a Java application that generates metadata (a different app
for each value of X). The process is blocked in futex(..., FUTEX_WAIT, ...), as an strace will show.

A potentially interesting fact is that the "val" in the above call is always PID+1, i.e., if the PID of the
hung java process is 5000, the call looks like futex(<an_addr>, FUTEX_WAIT, 5001, NULL, ...)
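
To check the PID+1 observation across several captured strace lines, something like the following could be used. It is only a sketch: the function names are made up for illustration, and the regular expression assumes strace prints the call in the form shown above (optionally as FUTEX_WAIT_PRIVATE), which may differ between strace versions.

```python
import re

# Assumed strace output shape, e.g.:
#   futex(0x7f3a5c0009d0, FUTEX_WAIT, 5001, NULL
# FUTEX_WAIT_PRIVATE is accepted as well, since newer glibc uses it.
FUTEX_WAIT_RE = re.compile(
    r"futex\((0x[0-9a-fA-F]+),\s*FUTEX_WAIT(?:_PRIVATE)?,\s*(\d+),"
)


def futex_wait_val(strace_line):
    """Return the 'val' argument of a FUTEX_WAIT call, or None if absent."""
    match = FUTEX_WAIT_RE.search(strace_line)
    return int(match.group(2)) if match else None


def waits_on_pid_plus_one(pid, strace_line):
    """True if the line is a FUTEX_WAIT whose val equals pid + 1."""
    return futex_wait_val(strace_line) == pid + 1
```

For the example above, waits_on_pid_plus_one(5000, line) would be true for a line containing "futex(0x..., FUTEX_WAIT, 5001, NULL".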

Unfortunately you won't be able to (at least I couldn't) reproduce by running just the app
or even by just running the install.sh script in debian/scripts that contains this command.

You have to run the full "debuild" for the bug to appear :(

Another perhaps useful fact is that the package build completes fine in debian sid, which has
a slightly older openjdk (b11 instead of b12 in intrepid).

The bug has been reproduced in the following kernel/arch configurations:

       * 2.6.27/amd64 (latest intrepid kernel) inside a KVM VM
       * 2.6.28-rc8/amd64, slightly customized (trivial one-line network card bugfixes), on a physical machine
       * 2.6.21/i386 (an EC2 node)
       * Whatever the autobuilders run / both i386 and amd64

In a debian sid chroot on the customized 2.6.28-rc8/amd64 machine the "debuild" has succeeded every time so
far.

The problem is of course that debian sid also has a different libc ;-)

Unfortunately, I don't have the time to completely debug this (e.g., one might want to know what files/streams
are held open by the hung process, etc.) and I also have no familiarity with the openjdk internals.
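
For anyone who does want to look at the open files/streams of the hung process, a rough sketch follows. It assumes a Linux machine with /proc mounted and enough permission to read the target's fd directory; the function name open_files is invented for this example.

```python
import os


def open_files(pid):
    """Map each open file descriptor of `pid` to the target it points at.

    Reads the symlinks under /proc/<pid>/fd (Linux only). Returns an empty
    dict if that directory does not exist, e.g. the process has exited.
    """
    fd_dir = f"/proc/{pid}/fd"
    if not os.path.isdir(fd_dir):
        return {}
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result
```

Calling open_files(5000) on the hung java process from the example would list its pipes, sockets and files, which might show what it is waiting on.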

So, I'm filing this in case anyone is interested in taking a look, and I will try to "hack around" this in the build
until there is a "proper" solution.

Thanks


