openjdk team mailing list archive
Message #02656
[Bug 309407] Re: Strange openjdk hang in FUTEX_WAIT
I get the same problem with a FUTEX_WAIT hang when *starting* Eclipse
3.5 on my Debian amd64 squeeze/sid system (kernel 2.6.31.4, libc6
2.10.1-5, openjdk-6-jre 6b16-1.6.1-2). No fakeroot involved.
--
Strange openjdk hang in FUTEX_WAIT
https://bugs.launchpad.net/bugs/309407
You received this bug notification because you are a member of OpenJDK,
which is subscribed to openjdk-6 in ubuntu.
Status in “openjdk-6” package in Ubuntu: Invalid
Bug description:
Best way to reproduce:
1) Go to my ppa: http://launchpad.net/~pktoss/+archive
2) Copy the eclipse - 3.4.1-0~pkt2 package to your own ppa or download to an intrepid machine
3) Start a build: either amd64 or i386 will work :(
- or when at home: cd eclipse-3.4.1 && debuild
4) Wait 40-50 minutes (hey, this is eclipse we are talking about :)
5) Observe it hang at "Generate X", where X is between 1 and 5
In every case the hang is inside a Java application that generates metadata (a different app
for each value of X). The process is blocked in futex(..., FUTEX_WAIT, ...), as an strace will convince you.
A potentially interesting fact is that the "val" in the above call is always PID+1, i.e., if the PID of the
hung java process is 5000, the call looks like futex(<an_addr>, FUTEX_WAIT, 5001, NULL, ...)
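
To check the PID+1 observation across several hung processes, a small script can pull the "val" argument out of an strace line. This is only a sketch: the function names and the sample line in the comment are illustrative assumptions, not taken from the report, and the pattern assumes strace prints the call in the form quoted above.

```python
import re

# Matches strace output such as (illustrative line, not from the report):
#   futex(0x7f3a5c0009d0, FUTEX_WAIT, 5001, NULL
# The optional _PRIVATE suffix is an assumption to also cover private futexes.
FUTEX_WAIT_RE = re.compile(
    r"futex\((0x[0-9a-fA-F]+),\s*FUTEX_WAIT(?:_PRIVATE)?,\s*(\d+),"
)


def futex_wait_val(strace_line):
    """Return the 'val' argument of a FUTEX_WAIT call, or None if absent."""
    match = FUTEX_WAIT_RE.search(strace_line)
    return int(match.group(2)) if match else None


def waits_on_pid_plus_one(pid, strace_line):
    """True if the line is a FUTEX_WAIT whose val equals pid + 1."""
    return futex_wait_val(strace_line) == pid + 1
```

For the example in the description, waits_on_pid_plus_one(5000, line) would be true for a line containing "FUTEX_WAIT, 5001, NULL".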
Unfortunately you won't be able to reproduce it (at least I couldn't) by running just the app
or even by just running the install.sh script in debian/scripts that contains this command.
You have to run the full "debuild" for the bug to appear :(
Another perhaps useful fact is that the package build completes fine in Debian sid, which has
a slightly older openjdk (b11 instead of b12 in intrepid).
The bug has been reproduced in the following kernel/arch configurations:
* 2.6.27/amd64 (latest intrepid kernel) inside a KVM VM
* 2.6.28-rc8/amd64, slightly customized (trivial one-line network card bugfixes), on a physical machine
* 2.6.21/i386 (an EC2 node)
* Whatever the autobuilders run / both i386 and amd64
In a Debian sid chroot on the customized 2.6.28-rc8/amd64 machine the "debuild" has succeeded every time so
far.
The problem is of course that Debian sid also has a different libc ;-)
Unfortunately, I don't have the time to completely debug this (e.g., one might want to know what files/streams
are open by the hung process, etc.) and I also have no familiarity with the openjdk internals.
So, I'm filing this in case anyone would be interested to look, and I will try to "hack around" this in the build
until there is a "proper" solution.
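
For anyone who picks this up, the files and streams held open by the hung process can be listed from /proc on Linux. The following is a minimal sketch, assuming a Linux /proc filesystem and sufficient permissions to read the target's fd directory; the function name is hypothetical.

```python
import os


def list_open_fds(pid):
    """Return sorted (fd, target) pairs for a process, read from /proc/<pid>/fd.

    Returns an empty list if the directory cannot be read (no such process,
    no /proc filesystem, or insufficient permissions).
    """
    fd_dir = "/proc/%d/fd" % pid
    entries = []
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return entries
    for name in names:
        try:
            # Each entry is a symlink to the file, pipe, or socket behind the fd.
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        entries.append((int(name), target))
    return sorted(entries)
```

Calling list_open_fds(5000) for a hung java process with PID 5000 would show, for instance, whether its stdout is a pipe or a file.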
Thanks