ubuntu-x-swat team mailing list archive

[Bug 567696] Re: [mi] EQ overflowing. The server is probably stuck in an infinite loop

 

I've been focusing on recreating this bug to capture a full stack
backtrace from X over a remote serial gdb session.

The target (system exhibiting the bug) and host (system running the debugger) are connected via a serial-to-USB cable.
(I first tried a network connection, but after several crashes in which the host's gdb lost contact with the target I switched to serial: the hardware interface is much simpler and hopefully less likely to fail if the kernel becomes unstable.)

The steps to reproduce this debugging session are:

On the host, install the debug symbol libraries for the Xorg core and
the video driver used on the faulting system.

You may need to install Debug Symbol packages (ddeb) from the Ubuntu
ddeb archive for some packages. See
https://wiki.ubuntu.com/DebuggingProgramCrash for help on working with
debug symbol packages.

In my case the faulting system uses the Radeon driver, so:

sudo apt-get install xserver-xorg-core-dbg xserver-xorg-video-radeon-dbg

In addition you'll likely need the -dbgsym packages for the installed
version of libdrm2 and libc6. Use the wiki article referred to above to
identify the package version to be installed. In this case I needed:

sudo apt-get install libdrm2-dbgsym=2.4.18-1ubuntu3 libc6-i686-dbgsym=2.11.1-0ubuntu7.8
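
The version pin has to match the library version installed on the faulting system. A minimal sketch of building the pinned package spec, assuming dpkg-query is available where the library is installed; the helper name dbgsym_spec is hypothetical and not part of any package:

```shell
# dbgsym_spec PACKAGE VERSION
# Print the pinned -dbgsym package spec for apt-get (hypothetical helper).
dbgsym_spec() {
    printf '%s-dbgsym=%s\n' "$1" "$2"
}

# Example (dpkg-query reports the installed version of the library):
#   ver=$(dpkg-query -W -f='${Version}' libdrm2)
#   sudo apt-get install "$(dbgsym_spec libdrm2 "$ver")"
```

The function only formats the string; whether a matching ddeb exists in the archive still has to be checked as described in the wiki article.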


Unfortunately, libdrm2-dbgsym object files are missing the symbol tables so gdb does not show the symbols for functions in that library:
-----
objdump -t /usr/lib/debug/lib/libdrm.so.2.4.0

/usr/lib/debug/lib/libdrm.so.2.4.0:     file format elf32-i386

SYMBOL TABLE:
no symbols

(gdb) symbol /lib/libdrm.so.2
Load new symbol table from "/lib/libdrm.so.2"? (y or n) y
Reading symbols from /lib/libdrm.so.2...Load new symbol table from "/usr/lib/debug/lib/libdrm.so.2.4.0"? (y or n) y
Reading symbols from /usr/lib/debug/lib/libdrm.so.2.4.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
-----
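
The objdump check above can be scripted so that other debug files can be tested the same way. This is only a sketch: the helper name has_symbol_table is hypothetical, and it assumes objdump reports a missing table with the "no symbols" line shown in the output above.

```shell
# has_symbol_table: read `objdump -t` output on stdin and succeed only if
# it does not contain the "no symbols" marker (hypothetical helper).
has_symbol_table() {
    ! grep -q 'no symbols'
}

# Example:
#   objdump -t /usr/lib/debug/lib/libdrm.so.2.4.0 | has_symbol_table \
#       || echo "symbol table missing"
```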

On the target install gdbserver which will act as a proxy for gdb
running on the host:

sudo apt-get install gdbserver

On the target configure the serial port baud rate (setserial doesn't
appear to do it using baud_base so I use screen). I do this via an SSH
session started from the host:

ssh A7M266D

screen /dev/ttyS0 115200,cs8,-ixon,istrip
# ... press Ctrl+A then : to get the screen command prompt in the lower-left corner, then type "quit" to exit screen

From the SSH session, on the target start the gdbserver:

sudo gdbserver /dev/ttyS0 --attach $(pidof X)
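
If pidof X returns nothing (or more than one PID), the gdbserver command line above ends up malformed. A small pre-flight check can catch that before attaching. This is a sketch, and attach_args is a hypothetical helper name:

```shell
# attach_args DEVICE PID
# Print the gdbserver arguments after checking that DEVICE is a character
# device and PID is a single non-empty number (hypothetical helper).
attach_args() {
    dev=$1 pid=$2
    [ -c "$dev" ] || { echo "not a character device: $dev" >&2; return 1; }
    case $pid in
        ''|*[!0-9]*) echo "unexpected pid: '$pid'" >&2; return 1 ;;
    esac
    printf '%s --attach %s\n' "$dev" "$pid"
}

# Example:
#   args=$(attach_args /dev/ttyS0 "$(pidof X)") && sudo gdbserver $args
```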


On the host start gdb and once started connect to the target:

gdb /usr/bin/Xorg
...
(gdb) target remote /dev/ttyUSB0
Remote debugging using /dev/ttyUSB0
Reading symbols from /lib/libudev.so.0...(no debugging symbols found)...done.
...
Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2
0x00641422 in __kernel_vsyscall ()
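
To keep a copy of the session on the host, gdb's logging can be switched on before connecting. The sketch below writes a gdb command file that does this; write_gdb_init and the file names are placeholders of my own, and I have not checked the exact behaviour on every gdb version (newer releases prefer "set logging enabled on").

```shell
# write_gdb_init FILE LOGFILE DEVICE
# Write a gdb command file that turns on logging and then connects to the
# remote target (hypothetical helper; file names are placeholders).
write_gdb_init() {
    cat > "$1" <<EOF
set pagination off
set logging file $2
set logging on
target remote $3
EOF
}

# Example:
#   write_gdb_init xorg.gdb xorg-bt.log /dev/ttyUSB0
#   gdb -x xorg.gdb /usr/bin/Xorg
```

After the command file has run, gdb drops to its normal prompt with the target stopped, so the session continues exactly as it does when connecting by hand.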


At this point gdb has stopped the X server process so it needs to be unfrozen (continued):

(gdb) cont
Continuing.


Now log in as normal on the target system and start using applications that will trigger the fault.

As anything using 3D acceleration seems to be a candidate you may find
that Firefox web browsing is sufficient. We also found that Open Office
Draw will do it - just create some random objects/lines/fills.

Once the fault occurs, on the host gdb session, interrupt the target by
pressing Ctrl+C:

^C
Program received signal SIGINT, Interrupt.
0x00641422 in __kernel_vsyscall ()


Now that gdb has control, review the call stack:

(gdb) bt
#0  0x00641422 in __kernel_vsyscall ()
#1  0x004ef619 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#2  0x00f9dd0d in drmDMA () from /lib/libdrm.so.2
#3  0x008df525 in RADEONCPGetBuffer (pScrn=0x9b82d60) at ../../src/radeon_accel.c:696
#4  0x009701fa in Emit2DStateCP (pScrn=0x9b82d60, op=1) at ../../src/radeon_exa_funcs.c:94
#5  0x00970512 in RADEONPrepareCopyCP (pSrc=0xadbb2008, pDst=0xad22e008, xdir=1, ydir=1, rop=3, planemask=4294967295)
    at ../../src/radeon_exa_funcs.c:316
#6  0x002a83e8 in exaHWCopyNtoN (pSrcDrawable=0xadbb2008, pDstDrawable=0xad22e008, pGC=0x9fb1ff8, pbox=0xbff7f1e4, nbox=1, dx=0, dy=0, 
    reverse=0, upsidedown=0) at ../../exa/exa_accel.c:488
#7  0x002a86d0 in exaCopyNtoN (pSrcDrawable=0xadbb2008, pDstDrawable=0xad22e008, pGC=0x9fb1ff8, pbox=0xbff7f1e4, nbox=1, dx=0, dy=0, 
    reverse=0, upsidedown=0, bitplane=0, closure=0x0) at ../../exa/exa_accel.c:577
#8  0x0819cc8b in miCopyRegion (pSrcDrawable=0xadbb2008, pDstDrawable=0xad22e008, pGC=0x9fb1ff8, pDstRegion=0xbff7f1e4, dx=0, dy=0, 
    copyProc=0x2a8630 <exaCopyNtoN>, bitPlane=0, closure=0x0) at ../../mi/micopy.c:138
#9  0x0819d1ad in miDoCopy (pSrcDrawable=0xadbb2008, pDstDrawable=0xad22e008, pGC=0x9fb1ff8, xIn=121, yIn=2, widthSrc=353, heightSrc=353, 
    xOut=121, yOut=2, copyProc=0x2a8630 <exaCopyNtoN>, bitPlane=0, closure=0x0) at ../../mi/micopy.c:338
#10 0x002a7a6f in exaCopyArea (pSrcDrawable=0xadbb2008, pDstDrawable=0xad22e008, pGC=0x9fb1ff8, srcx=121, srcy=2, width=353, height=353, 
    dstx=121, dsty=2) at ../../exa/exa_accel.c:601
#11 0x08122b73 in damageCopyArea (pSrc=0xadbb2008, pDst=0xad22e008, pGC=0x9fb1ff8, srcx=121, srcy=2, width=353, height=353, dstx=121, dsty=2)
    at ../../../miext/damage/damage.c:949
#12 0x08070df5 in ProcCopyArea (client=0x9f16058) at ../../dix/dispatch.c:1725
#13 0x08072477 in Dispatch () at ../../dix/dispatch.c:439
#14 0x08066d7a in main (argc=9, argv=0xbff7f4c4, envp=0xbff7f4ec) at ../../dix/main.c:285

Note that frame #2 shows no debug info for drmDMA(), even when the
symbol file is loaded manually, because the library in the dbgsym
package was built without a symbol table.

To see the local variables of each frame as well as the arguments, use the "full" option:

(gdb) bt full
#0  0x00641422 in __kernel_vsyscall ()
No symbol table info available.
#1  0x004ef619 in ioctl () at ../sysdeps/unix/syscall-template.S:82
No locals.
#2  0x00f9dd0d in drmDMA () from /lib/libdrm.so.2
No symbol table info available.
#3  0x008df525 in RADEONCPGetBuffer (pScrn=0x9b82d60) at ../../src/radeon_accel.c:696
        info = 0x9b83608
        dma = {context = 1, send_count = 0, send_list = 0x0, send_sizes = 0x0, flags = 0, request_count = 1, request_size = 65536, 
          request_list = 0xbff7eeac, request_sizes = 0xbff7eea8, granted_count = 0}
        buf = <value optimised out>
        indx = 0
        size = 0
        i = 1162
        ret = <value optimised out>
        __FUNCTION__ = "RADEONCPGetBuffer"
#4  0x009701fa in Emit2DStateCP (pScrn=0x9b82d60, op=1) at ../../src/radeon_exa_funcs.c:94
        info = 0x9b83608
        has_src = -1074270392
        __head = 0x0
        __expected = -1074269396
        __count = 0
        __func__ = "Emit2DStateCP"
#5  0x00970512 in RADEONPrepareCopyCP (pSrc=0xadbb2008, pDst=0xad22e008, xdir=1, ydir=1, rop=3, planemask=4294967295)
    at ../../src/radeon_exa_funcs.c:316
        pScrn = 0x9b82d60
        datatype = 6
        src_pitch_offset = 167037872
        dst_pitch_offset = 167036868
#6  0x002a83e8 in exaHWCopyNtoN (pSrcDrawable=0xadbb2008, pDstDrawable=0xad22e008, pGC=0x9fb1ff8, pbox=0xbff7f1e4, nbox=1, dx=0, dy=0, 
    reverse=0, upsidedown=0) at ../../exa/exa_accel.c:488
        pSrcPixmap = 0xadbb2008
        pDstPixmap = 0xad22e008
        pSrcExaPixmap = 0x9f627a0
        src_off_x = <value optimised out>
        src_off_y = <value optimised out>
        dst_off_x = 0
        dst_off_y = 0
        srcregion = 0x9f976e0
        dstregion = 0x9fb5768
        ret = <value optimised out>
#7  0x002a86d0 in exaCopyNtoN (pSrcDrawable=0xadbb2008, pDstDrawable=0xad22e008, pGC=0x9fb1ff8, pbox=0xbff7f1e4, nbox=1, dx=0, dy=0, 
    reverse=0, upsidedown=0, bitplane=0, closure=0x0) at ../../exa/exa_accel.c:577
No locals.
#8  0x0819cc8b in ?? ()
No symbol table info available.
#9  0x0819d1ad in ?? ()
No symbol table info available.
#10 0x002a7a6f in exaCopyArea (pSrcDrawable=0xadbb2008, pDstDrawable=0xadbb2008, pGC=0x9fb1ff8, srcx=121, srcy=2, width=353, height=353, 
    dstx=1, dsty=2) at ../../exa/exa_accel.c:601
No locals.
#11 0x08122b73 in ?? ()
No symbol table info available.
#12 0x08070df5 in ?? ()
No symbol table info available.
#13 0x08072477 in ?? ()
No symbol table info available.
#14 0x08066d7a in ?? ()
No symbol table info available.
#15 0x00440bd6 in __libc_start_main (main=0x8066a00, argc=9, ubp_av=0xbff7f4c4, init=0x81c7720, fini=0x81c7710, rtld_fini=0xf56030, 
    stack_end=0xbff7f4bc) at libc-start.c:226
        result = <value optimised out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {5767156, 0, 0, -1074269032, -1941077759, -340593026}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x9, 0x8066940}, data = {prev = 0x0, cleanup = 0x0, canceltype = 9}}}
        not_first_call = <value optimised out>
#16 0x08066961 in ?? ()
No symbol table info available.
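
When the "bt full" output has been saved to a file, the frames that still lack symbols can be listed to see which debug packages are missing. A minimal sketch, assuming the output format shown above; frames_without_symbols is a hypothetical helper name:

```shell
# frames_without_symbols: read `bt full` output on stdin and print the
# first header line of every frame for which gdb reported
# "No symbol table info available." (hypothetical helper).
frames_without_symbols() {
    awk '
        /^#[0-9]+ / { frame = $0 }
        /^No symbol table info available\./ { print frame }
    '
}

# Example:
#   frames_without_symbols < xorg-bt.log
```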


Searching for info on RADEONPrepareCopyCP() leads to https://bugs.freedesktop.org/show_bug.cgi?id=27957 which in comment #6 has a patch that has solved the issue for the R600 series.

That report describes a buffer leak. However, the R200 code path
involved in this fault (radeon_accel.c) is different.

Examining the source, I see a loop in RADEONCPGetBuffer() in
"src/radeon_accel.c" (frame #3 above) that would spin forever if it
never reaches the "return buf;" statement, which would account for the
100% CPU usage of the /usr/bin/X process:

while (1) {
	do {
         ...

See: http://cgit.freedesktop.org/xorg/driver/xf86-video-ati/tree/src/radeon_accel.c?id=801e83227a59a29eea425ea612083bbf2b536c30#n708
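
The 100% CPU symptom can be confirmed from the SSH session by sampling the CPU time of the X process twice. This is a rough sketch that assumes a Linux /proc filesystem; cpu_ticks is a hypothetical helper name:

```shell
# cpu_ticks PID
# Print utime+stime (in clock ticks) for PID, read from /proc/PID/stat.
cpu_ticks() {
    stat=$(cat "/proc/$1/stat") || return 1
    rest=${stat##*') '}        # drop "pid (comm) "; comm may contain spaces
    set -- $rest               # $1 is now field 3 (state)
    echo $(( ${12} + ${13} ))  # fields 14 (utime) and 15 (stime)
}

# Example:
#   pid=$(pidof X); a=$(cpu_ticks "$pid"); sleep 1; b=$(cpu_ticks "$pid")
#   echo "ticks used in 1s: $((b - a)) of $(getconf CLK_TCK)"
```

A difference close to the CLK_TCK value means the process kept one CPU busy for the whole second, which is what a spinning loop would look like.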

I intend to dig into this further as time permits.

** Bug watch added: freedesktop.org Bugzilla #27957
   http://bugs.freedesktop.org/show_bug.cgi?id=27957

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to xorg-server in Ubuntu.
https://bugs.launchpad.net/bugs/567696
