registry team mailing list archive
-
registry team
-
Mailing list archive
-
Message #15193
[Bug 507062] Re: synaptic assert failure: synaptic: ../../src/xcb_io.c:385: _XAllocID: Assertion `ret != inval_id' failed.
Launchpad has imported 12 comments from the remote bug at
http://bugs.freedesktop.org/show_bug.cgi?id=27552.
If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.
------------------------------------------------------------------------
On 2010-04-09T06:02:34+00:00 Martin Pitt wrote:
libx11, 1.3.2 (sorry, I didn't see a libX11 component)
We are getting tons of crash reports in Ubuntu
(https://launchpad.net/bugs/507062) about programs crashing with
_XAllocID: Assertion `ret != inval_id' failed
At first we thought this would be the same problem that was recently
discussed and fixed in libXext:
http://lists.freedesktop.org/archives/xcb/2009-October/005102.html
http://cgit.freedesktop.org/xorg/lib/libXext/commit/?id=956fd30e1046e5779ac0b6c07ec4f0e87250869a
However, we already have that fixed version, and the stack traces of
above bug reports does not go through XShmAttach(), so it does seem to
be a different cause.
They all have this piece in common:
#4 0xb74d7199 in _XAllocID (dpy=0x8116770) at ../../src/xcb_io.c:378
ret = 4294967295
__PRETTY_FUNCTION__ = "_XAllocID"
#5 0xb74ad048 in XCreatePixmap (dpy=0x8116770, d=265, width=24, height=24,
depth=32) at ../../src/CrPixmap.c:58
i. e. they all come through XCreatePixmap() (which is called from
various functions in the duplicate bugs, like XcursorImageLoadCursor(),
_cairo_xlib_surface_create_similar_with_format(), etc.)
I checked that the current libX11's XCreatePixmap() already calls
_XAllocID() in a LockDisplay() block, so it's not the same cause as the
recent libXext fix.
Beyond that I'm afraid I don't know enough about this API to be able to
continue debugging on my own. Obviously nothing must call _XAllocID()
two times in succession without an _XIDHandler() in between (the only
other place where next_xid is set is _XConnectXCB(), but that's only
called on program initialization through XOpenDisplay(), right?)
Do you have some further hints how to debug this, or what could go wrong
here?
For reference, here are some links to the full stack traces:
http://launchpadlibrarian.net/23717282/Stacktrace.txt
http://launchpadlibrarian.net/35702381/Stacktrace.txt
http://launchpadlibrarian.net/39577843/Stacktrace.txt
http://launchpadlibrarian.net/37855566/Stacktrace.txt
Thanks!
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/11
------------------------------------------------------------------------
On 2010-04-09T17:35:03+00:00 Jamey-minilop wrote:
It would be really nice to know where the last call to XIDAlloc was
before the one that fails, and for that I'd love to have a minimal test
case.
Any chance these applications are using libX11 from more than one
thread? I notice in some stack traces, like
<http://launchpadlibrarian.net/31837770/ThreadStacktrace.txt> that a
second thread is waiting in libpulse; and in
<http://launchpadlibrarian.net/31032259/ThreadStacktrace.txt>,
g_main_loop_run is in a different thread than the cairo call that
precedes the assert.
I could believe there's a race in how Xlib asks XCB for XIDs, but I'd
like to know if that would be a plausible explanation for the reports
you're actually seeing. It looks like a fair number of the reported
stack traces are single-threaded, making that an unlikely explanation
for those reports. :-) It's possible there are two separate bugs here, I
suppose...
In lieu of a minimal test, perhaps you can tell me which X extensions
these applications have in common? Perhaps the client library for some
extension allocates an XID but is missing a SyncHandle call; if the next
X request also tries to allocate an XID you'd get this assertion
failure.
I've just reviewed everything in xorg/lib, just in case, for bad use of
SyncHandle, XAllocID, or XAllocIDs. I found quite a few things to
quibble with, but nothing that seems likely to have caused these
reports. I've pushed fixes in these modules, so you can check whether
anything there seems plausible.
libXcomposite/
libXdamage/
libXext/
libXfixes/
libXi/
libXrender/
libXTrap/
In short, I don't have a good hypothesis yet.
> http://launchpadlibrarian.net/23717282/Stacktrace.txt
I'm ignoring this one: the application was exiting, and a destructor
tried to access the Display to free a resource. I'd guess the display
was already gone at that point. I've seen that kind of pattern in quite
a few Qt stack traces.
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/17
------------------------------------------------------------------------
On 2010-04-15T05:48:01+00:00 Martin Pitt wrote:
(In reply to comment #1)
> It would be really nice to know where the last call to XIDAlloc was before the
> one that fails, and for that I'd love to have a minimal test case.
Indeed; I'm afraid I can't say more about this myself, but I asked for a
reproduction recipe on the downstream bug a few days ago. Once we have a
way to reproduce this, we can strip that down to a test case.
> Any chance these applications are using libX11 from more than one
thread?
Some certainly do, but amongst the many duplicates are a lot of progams
which are single-threaded; for example services-admin (from gnome-
system-tools, LP#432122) or gigolo (LP#468544)
> It looks like a fair number of the reported stack traces are
> single-threaded, making that an unlikely explanation for those reports. :-)
> It's possible there are two separate bugs here, I suppose...
Possibly. It seems that about half of the reports are in gnome-panel and
firefox, but since they are also much more popular than the apps
mentioned above that kind of bias is expected.
> In lieu of a minimal test, perhaps you can tell me which X extensions these
> applications have in common? Perhaps the client library for some extension
> allocates an XID but is missing a SyncHandle call; if the next X request also
> tries to allocate an XID you'd get this assertion failure.
The bugs unfortunately don't have Xorg logs, but I don't think that they
will differ that much between different installations. I searched
Launchpad for a few Xorg logs for various drivers, and in fact for a
given graphics driver the extension lists were exactly identical (I took
samples of about 5 per driver):
common for intel, ati, fglrx, nvidia: MIT-SCREEN-SAVER
XFree86-VidModeExtension XFree86-DGA DPMS XVideo XVideo-
MotionCompensation X-Resource DOUBLE-BUFFER GLX RECORD XFree86-DRI DRI2
extra extensions for fglrx: ATIFGLRXDRI FGLRXEXTENSION GLESX AMDXVOPL (I
suppose those are uninteresting)
I asked in the downstram bug for a few Xorg logs from those people who
can reproduce the crash. Perhaps it will turn out that they are all
using the same graphics driver.
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/19
------------------------------------------------------------------------
On 2010-04-15T09:58:31+00:00 Jamey-minilop wrote:
(In reply to comment #2)
> > In lieu of a minimal test, perhaps you can tell me which X extensions these
> > applications have in common? Perhaps the client library for some extension
> > allocates an XID but is missing a SyncHandle call; if the next X request also
> > tries to allocate an XID you'd get this assertion failure.
>
> I asked in the downstram bug for a few Xorg logs from those people who can
> reproduce the crash. Perhaps it will turn out that they are all using the same
> graphics driver.
I don't expect it to be a server-side issue, but then there was a recent
report that turned out to be a byte-swapping bug in the server, and I
didn't expect it then either... :-)
I meant, which extensions are these applications actually using? I'm
hoping to pin the blame on a client-side extension library rather than
libX11 itself. ;-) The list of loaded libraries from GDB would be a
start. Is there some Launchpad way to download all the attachments from
all the public duplicate bugs at once?
I don't have very high hopes though, especially after I went through
checking for SyncHandle bugs in every extension library I could find,
last week.
Regardless, I hope your effort to get more information from downstream
bug reporters works out.
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/20
------------------------------------------------------------------------
On 2010-04-16T01:58:10+00:00 Martin Pitt wrote:
Just to collect more puzzle pieces, I got three replies by now about my
"which driver" question, and these three are all using the proprietary
NVIDIA driver. It might be just pure luck with a sample size of three,
of course, so I asked whether anyone who got this crash has a different
card/driver.
(In reply to comment #3)
> I meant, which extensions are these applications actually using? I'm hoping to
> pin the blame on a client-side extension library rather than libX11 itself. ;-)
> The list of loaded libraries from GDB would be a start.
We routinely collect /proc/<pid>/maps in crash reports, they all have a
"ProcMaps.txt" attachment. However, this just says which libraries the
program dynamically links against, not which extensions it's actually
using, right?
Is there a better way to see which extensions a program is currently
using? Some command that I could ask the bug reporters to run? We could
even add that to our standard Apport hooks, so that it's collected for
all crash reports.
> Is there some Launchpad way to download all the attachments from all
the public duplicate bugs at once?
Bryce Harrington (CC added) has put together a lot of scripts for these
purposes (X.org bug triaging). Bryce, do you have a script like that?
Anyway, it's really easy using the Python API, so for this particular
purpose I wrote a quick hack which downloads a particular attachment
name from a bug and all its duplicates and dumps them to stdout:
$ ./cat-bugattachment.py 507062 ProcMaps.txt
---- 507062 ---
00110000-001ae000 r-xp 00000000 08:05 526394 /usr/lib/libgdk-x11-2.0.so.0.1902.0
[...]
---- 340441 ---
00400000-00408000 r-xp 00000000 09:01 94936 /usr/lib/firefox-3.0.7/firefox
[...]
and so on. The script takes the master bug number and an attachment
name, I put it on http://people.canonical.com/~pitti/scripts/cat-
bugattachment.py for now.
When I do some simple post-processing of the output of above command
(which I directed into file "out") I get this:
$ grep lib/libX out | awk '{print $6}' | sort -u
/usr/lib/libX11.so.6.2.0
/usr/lib/libX11.so.6.3.0
/usr/lib/libXau.so.6.0.0
/usr/lib/libXcomposite.so.1.0.0
/usr/lib/libXcursor.so.1.0.2
/usr/lib/libXdamage.so.1.1.0
/usr/lib/libXdmcp.so.6.0.0
/usr/lib/libXext.so.6.4.0
/usr/lib/libXfixes.so.3.1.0
/usr/lib/libXft.so.2.1.13
/usr/lib/libXinerama.so.1.0.0
/usr/lib/libXi.so.6.0.0
/usr/lib/libXi.so.6.1.0
/usr/lib/libXmu.so.6.2.0
/usr/lib/libXrandr.so.2.2.0
/usr/lib/libXrender.so.1.3.0
/usr/lib/libXRes.so.1.0.0
/usr/lib/libXss.so.1.0.0
/usr/lib/libXt.so.6.0.0
/usr/lib/libXtst.so.6.1.0
/usr/lib/libXxf86misc.so.1.1.0
/usr/lib/libXxf86vm.so.1.0.0
Not sure how helpful that actually is, though, since those are still a
fair number of extensions.
Thank you!
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/26
------------------------------------------------------------------------
On 2010-04-16T02:09:29+00:00 Martin Pitt wrote:
(In reply to comment #4)
> Just to collect more puzzle pieces, I got three replies by now about my "which
> driver" question, and these three are all using the proprietary NVIDIA driver.
> It might be just pure luck with a sample size of three, of course, so I asked
> whether anyone who got this crash has a different card/driver.
Got another reply that someone is using Intel, so let's discard the
"driver dependent" theory for now.
> $ grep lib/libX out | awk '{print $6}' | sort -u
Ah, BS, sorry. We want the intersection, not the union.
$ sed -n '/^-/ p; /lib\/libX/ { s/^.*usr.lib.//; s/\.so\..*$//; p}'
out|uniq|less
By inspection, it seems that those bugs have only a very small set:
---- 528503 ---
libXfixes
libXrender
libXcursor
libXdmcp
libXau
libX11
---- 558407 ---
libXt
libXext
libX11
libXmu
libXau
libXdmcp
So I believe the common set is libXdmcp, libXau, and libX11 itself.
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/27
------------------------------------------------------------------------
On 2010-06-21T09:13:45+00:00 Joris Guisson wrote:
The same problem has been reported for ktorrent:
https://bugs.kde.org/show_bug.cgi?id=242306
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/39
------------------------------------------------------------------------
On 2010-06-21T11:19:06+00:00 Jamey-minilop wrote:
(In reply to comment #6)
> The same problem has been reported for ktorrent:
>
> https://bugs.kde.org/show_bug.cgi?id=242306
In that bug report, you wrote, "Nothing we can do about, the X people
have to figure this one out." You could try providing information I've
asked for in this bug. Above all, if I have a minimal test application
that demonstrates the bug, I'm likely to be able to fix it quickly.
As things stand now, I can't reproduce the problem and I have no
reasonable hypotheses about its cause, so it is unlikely that I'll be
able to fix it.
One thing I hadn't asked for yet that might help: Run a failing
application under `xtrace` and attach the log to this bug.
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/40
------------------------------------------------------------------------
On 2010-06-21T23:35:57+00:00 Pauli wrote:
Has anyone tried to run any of multi-threaded applications that are
crashing in hellgrind? That would give pretty fast conclusion if the bug
is about locking problems in those cases.
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/41
------------------------------------------------------------------------
On 2010-08-12T03:47:00+00:00 Hules wrote:
I have a small application that manages to reproduce that crash quite
easily on ubuntu 10.04, but only when running under valgrind. But that
application itself behaves quite badly as it calls xlib from different
threads using the same "Display*" struct for all threads, so I'm not
sure if that might be of any use for tracking this issue. However I
think it calls XInitThreads() on startup, so that should allow it to
share the same display for all threads, isn't it ?
hellgrind outputs loads of stuff like:
==2950== Possible data race during write of size 4 at 0x44bd06c by thread #1
==2950== at 0x4084764: ??? (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x4069352: XGetWindowProperty (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x81E0F46: juce::LinuxComponentPeer::isMinimised() const (juce_linux_Windowing.cpp:902)
==2950== by 0x8164890: juce::ComponentPeer::handleMovedOrResized() (juce_amalgamated.cpp:71167)
==2950== by 0x81E296B: juce::LinuxComponentPeer::handleWindowMessage(_XEvent*) (juce_linux_Windowing.cpp:1579)
==2950== by 0x81E549F: juce::juce_windowMessageReceive(_XEvent*) (juce_linux_Windowing.cpp:2617)
==2950== by 0x81DE71F: juce::juce_dispatchNextXEvent() (juce_linux_Messaging.cpp:384)
==2950== by 0x81DE899: juce::juce_dispatchNextMessageOnSystemQueue(bool) (juce_linux_Messaging.cpp:431)
==2950== by 0x8137170: juce::MessageManager::dispatchNextMessage(bool, bool*) (juce_amalgamated.cpp:34719)
==2950== by 0x8137204: juce::MessageManager::runDispatchLoop() (juce_amalgamated.cpp:34763)
==2950== by 0x81348A1: juce::JUCEApplication::main(juce::StringArray&, juce::JUCEApplication*) (juce_amalgamated.cpp:14770)
==2950== by 0x8134A7B: juce::JUCEApplication::main(int, char**, juce::JUCEApplication*) (juce_amalgamated.cpp:14859)
==2950== This conflicts with a previous read of size 4 by thread #6
==2950== at 0x4082F0C: _XSetPrivSyncFunction (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x408BBEF: _XAllocID (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x40611A7: XCreatePixmap (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x50BF8ED: XcursorImageLoadCursor (in /usr/lib/libXcursor.so.1.0.2)
==2950== by 0x50C0765: XcursorImagesLoadCursors (in /usr/lib/libXcursor.so.1.0.2)
==2950== by 0x50C0886: XcursorImagesLoadCursor (in /usr/lib/libXcursor.so.1.0.2)
==2950== by 0x50C4187: XcursorTryShapeCursor (in /usr/lib/libXcursor.so.1.0.2)
==2950== by 0x4060C5C: XCreateGlyphCursor (in /usr/lib/libX11.so.6.3.0)
before the failed assertion.
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/44
------------------------------------------------------------------------
On 2010-08-12T09:10:59+00:00 Jamey-minilop wrote:
(In reply to comment #9)
> I have a small application that manages to reproduce that crash quite easily on
> ubuntu 10.04, but only when running under valgrind.
Cool, what's the application? Can I get the source somewhere? I see it's
using a toolkit (juce) that I hadn't heard of before though, so I'm not
sure it's close to a minimal test case yet...
> hellgrind outputs loads of stuff like:
>
> ==2950== Possible data race during write of size 4 at 0x44bd06c by thread #1
> ==2950== at 0x4084764: ??? (in /usr/lib/libX11.so.6.3.0)
> ==2950== by 0x4069352: XGetWindowProperty (in /usr/lib/libX11.so.6.3.0)
> ...
> ==2950== This conflicts with a previous read of size 4 by thread #6
> ==2950== at 0x4082F0C: _XSetPrivSyncFunction (in /usr/lib/libX11.so.6.3.0)
> ==2950== by 0x408BBEF: _XAllocID (in /usr/lib/libX11.so.6.3.0)
> ==2950== by 0x40611A7: XCreatePixmap (in /usr/lib/libX11.so.6.3.0)
> ...
>
> before the failed assertion.
I'm guessing that's the bug we fixed in git commit
a6d974dc59f2722b36e2df9d4f07aeee4f83ce43, which is included in libX11
1.3.4 and later. What version of libX11 do you have?
> But that application itself
> behaves quite badly as it calls xlib from different threads using the same
> "Display*" struct for all threads, so I'm not sure if that might be of any use
> for tracking this issue. However I think it calls XInitThreads() on startup, so
> that should allow it to share the same display for all threads, isn't it ?
Yes, that's supposed to be safe. However, I introduced a variety of bugs
for multi-threaded apps some time ago, and only got most of them fixed
in libX11 1.3.4. You should definitely re-test with that version or
later if you don't already have it.
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/45
------------------------------------------------------------------------
On 2010-08-13T02:52:51+00:00 Hules wrote:
I have put the sources for this test here:
http://hules.free.fr/xcbbug.tbz2
JUCE is a small cross-platform toolkit, but this is definitively not a
minimal test case as there are ~250000 lines of code.. The test
application is not using it properly with respect to multithreading so
if you run it in helgrind it will spit a lot of warnings. But when run
in valgrind or helgrind, it ends up 75% of time with the _XallocID
assert failed.
I have the libx11 of ubuntu 10.04, that is libx11 1.3.2. I'm not sure to
proceed to test 1.3.4
Reply at: https://bugs.launchpad.net/xlibs/+bug/507062/comments/46
** Changed in: xlibs
Status: Unknown => Confirmed
** Changed in: xlibs
Importance: Unknown => Medium
** Bug watch added: KDE Bug Tracking System #242306
http://bugs.kde.org/show_bug.cgi?id=242306
--
synaptic assert failure: synaptic: ../../src/xcb_io.c:385: _XAllocID: Assertion `ret != inval_id' failed.
https://bugs.launchpad.net/bugs/507062
You received this bug notification because you are a member of Registry
Administrators, which is the registrant for xlibs.