group.of.nepali.translators team mailing list archive

[Bug 1630302] Re: Multi-threaded luaJIT application hangs; apparent deadlock in GLIBC

 

As I understand it this is fixed upstream in glibc 2.24, which means
Ubuntu 16.10 already has the fix; please reopen this task if this is
incorrect.
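
For anyone who wants to check which glibc a given machine is running, here
is a minimal sketch. It assumes a Linux system whose Python exposes
os.confstr("CS_GNU_LIBC_VERSION"); the helper names are made up for
illustration. The version number is only a rough indicator, because a
distribution can backport a fix without changing the upstream version.

```python
import os

# Upstream release named above as containing the fix.
FIXED_IN = (2, 24)


def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.23' into a tuple such as (2, 23)."""
    version = text.split()[-1]
    return tuple(int(part) for part in version.split(".")[:2])


def predates_upstream_fix(text):
    """True if the reported version is older than the upstream fix release."""
    return parse_glibc_version(text) < FIXED_IN


def running_glibc():
    """Return the running glibc version string, or None if unavailable."""
    confstr = getattr(os, "confstr", None)
    if confstr is None:
        return None
    try:
        return confstr("CS_GNU_LIBC_VERSION")
    except (ValueError, OSError):
        return None


if __name__ == "__main__":
    running = running_glibc()
    if running is None:
        print("Could not determine the glibc version on this system.")
    elif predates_upstream_fix(running):
        print(running, "predates 2.24; check whether the fix was backported.")
    else:
        print(running, "is at or beyond the upstream fix release.")
```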

** Changed in: glibc (Ubuntu)
       Status: New => Fix Released

** Also affects: glibc (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Changed in: glibc (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: glibc (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: glibc (Ubuntu Xenial)
     Assignee: (unassigned) => Adam Conrad (adconrad)

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1630302

Title:
  Multi-threaded luaJIT application hangs; apparent deadlock in GLIBC

Status in glibc package in Ubuntu:
  Fix Released
Status in glibc source package in Xenial:
  Triaged

Bug description:
  ---Problem Description---
  Multi-threaded luaJIT application hangs due to apparent deadlock in GLIBC.
    
  ---uname output---
  Linux p10a102 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:00:57 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
    
  ---Steps to Reproduce---
   Build LuaJIT + Torch and run the following Lua program:

  local Threads = require 'threads'
  nthreads = 8
  thrds = Threads(nthreads,
           function()  print('Starting thread ') end,
           function()  require 'image' end
        );
   
  thrds:synchronize()
  print "Done"
   
  Userspace tool common name: GLIBC 
   
  The userspace tool has the following bit modes: 64-bit 

  Userspace package: GLIBC 2.23

  Userspace tool obtained from project website:  https://github.com/PPC64/torch-distro.git   ecb487a3e807c3bfc901cab0b9e8767a853e085d 
   
  Here's a sample run of the Lua application and stack backtraces for all the threads.  You can see that all the worker threads are blocked in GLIBC's low-level lock wait (__lll_lock_wait_private(), or __lll_lock_wait() for thread 3), waiting on various locks. They reach their lock waits by various paths: some threads in IO paths, some in allocation paths, and some in dlsym().

  The problem is easily reproducible and becomes more likely as the
  number of threads grows.

  I can provide a core file, or a package with the luajit/torch binaries,
  etc., as needed.

  $ which luajit
  /opt/DL/torch/bin/luajit

  $ luajit -v
  LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/

  $ cat t.lua
  local Threads = require 'threads'
  nthreads = 8
  thrds = Threads(nthreads,
           function()  print('Starting thread ') end,
           function()  require 'image' end
        );
   
  thrds:synchronize()
  print "Done"

  $ gdb /opt/DL/torch/bin/luajit
  GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
  [...]

  (gdb) run t.lua
  Starting program: /opt/DL/torch/bin/luajit t.lua
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
  [New Thread 0x3ffd3693f1a0 (LWP 22251)]
  [New Thread 0x3ffd3611f1a0 (LWP 22252)]
  [New Thread 0x3ffd358ff1a0 (LWP 22253)]
  [New Thread 0x3ffd350df1a0 (LWP 22254)]
  [New Thread 0x3ffd348bf1a0 (LWP 22255)]
  [New Thread 0x3ffd27fff1a0 (LWP 22256)]
  [New Thread 0x3ffd277ff1a0 (LWP 22257)]
  [New Thread 0x3ffd26fff1a0 (LWP 22258)]
  Starting thread 
  Starting thread 
  Starting thread 
  Starting thread 
  Starting thread 
  Starting thread 
  Starting thread 
  Starting thread 

  ^C

  Thread 1 "luajit" received signal SIGINT, Interrupt.
  0x00003fffb7e1127c in __pthread_cond_wait (cond=0x10095ba0, mutex=0x10095ad0) at pthread_cond_wait.c:186
  186	pthread_cond_wait.c: No such file or directory.

  (gdb) info threads
    Id   Target Id         Frame 
  * 1    Thread 0x3fffb7ff68a0 (LWP 22248) "luajit" 0x00003fffb7e1127c in __pthread_cond_wait (cond=0x10095ba0, mutex=0x10095ad0) at pthread_cond_wait.c:186
    2    Thread 0x3ffd3693f1a0 (LWP 22251) "luajit" 0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
    3    Thread 0x3ffd3611f1a0 (LWP 22252) "luajit" 0x00003fffb7e15fa8 in __lll_lock_wait (futex=0x200, private=<optimized out>) at lowlevellock.c:46
    4    Thread 0x3ffd358ff1a0 (LWP 22253) "luajit" 0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
    5    Thread 0x3ffd350df1a0 (LWP 22254) "luajit" 0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
    6    Thread 0x3ffd348bf1a0 (LWP 22255) "luajit" 0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
    7    Thread 0x3ffd27fff1a0 (LWP 22256) "luajit" 0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
    8    Thread 0x3ffd277ff1a0 (LWP 22257) "luajit" 0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
    9    Thread 0x3ffd26fff1a0 (LWP 22258) "luajit" 0x00003fffb7d2c408 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:33

  (gdb) thread apply all where

  Thread 9 (Thread 0x3ffd26fff1a0 (LWP 22258)):
  #0  0x00003fffb7d2c408 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:33
  #1  0x00003fffb7c7e2d4 in _IO_flush_all_lockp (do_lock=<optimized out>) at genops.c:777
  #2  0x00003fffb7c7e63c in __GI__IO_flush_all () at genops.c:817
  #3  0x00003fffb7c687a4 in __GI__IO_fflush (fp=<optimized out>) at iofflush.c:34
  [...]
  #12 0x00000000100460f8 in lua_pcall ()
  #13 0x00003ffd36940ad8 in THThread_main () from /opt/DL/torch/lib/libthreadsmain.so
  #14 0x00003fffb7e084a0 in start_thread (arg=0x3ffd26fff1a0) at pthread_create.c:335
  #15 0x00003fffb7d17e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

  Thread 8 (Thread 0x3ffd277ff1a0 (LWP 22257)):
  #0  0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
  #1  0x00003fffb7c87c28 in malloc_atfork (sz=65536, caller=<optimized out>) at arena.c:179
  #2  0x00003fffb7c88034 in __GI___libc_malloc (bytes=<optimized out>) at malloc.c:2910
  #3  0x00003fffb7c67e8c in __GI__IO_file_doallocate (fp=0x3ffd180037c0) at filedoalloc.c:127
  #4  0x00003fffb7c7ce74 in __GI__IO_doallocbuf (fp=0x3ffd180037c0) at genops.c:398
  #5  0x00003fffb7c7b77c in _IO_new_file_underflow (fp=0x3ffd180037c0) at fileops.c:556
  #6  0x00003fffb7c7d2c4 in __GI___underflow (fp=0x3ffd180037c0) at genops.c:342
  #7  __GI__IO_default_xsgetn (fp=0x3ffd180037c0, data=<optimized out>, n=8192) at genops.c:504
  #8  0x00003fffb7c7d168 in __GI__IO_sgetn (fp=<optimized out>, data=<optimized out>, n=<optimized out>) at genops.c:467
  #9  0x00003fffb7c696f4 in __GI__IO_fread (buf=0x3ffd2602af10, size=1, count=8192, fp=0x3ffd180037c0) at iofread.c:38
  [...]
  #18 0x00000000100460f8 in lua_pcall ()
  #19 0x00003ffd36940ad8 in THThread_main () from /opt/DL/torch/lib/libthreadsmain.so
  #20 0x00003fffb7e084a0 in start_thread (arg=0x3ffd277ff1a0) at pthread_create.c:335
  #21 0x00003fffb7d17e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

  Thread 7 (Thread 0x3ffd27fff1a0 (LWP 22256)):
  #0  0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
  #1  0x00003fffb7c800e8 in ptmalloc_lock_all () at arena.c:235
  #2  0x00003fffb7ccea44 in __libc_fork () at ../sysdeps/nptl/fork.c:90
  #3  0x00003fffb7c6b0fc in _IO_new_proc_open (fp=0x3ffd140037c0, command=0x3ffd2628f100 "which lua 2>&1", mode=<optimized out>) at iopopen.c:180
  #4  0x00003fffb7c6b4f8 in _IO_new_popen (command=0x3ffd2628f100 "which lua 2>&1", mode=0x100606d8 "r") at iopopen.c:296
  [...]
  #13 0x00000000100460f8 in lua_pcall ()
  #14 0x00003ffd36940ad8 in THThread_main () from /opt/DL/torch/lib/libthreadsmain.so
  #15 0x00003fffb7e084a0 in start_thread (arg=0x3ffd27fff1a0) at pthread_create.c:335
  #16 0x00003fffb7d17e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

  Thread 6 (Thread 0x3ffd348bf1a0 (LWP 22255)):
  #0  0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
  #1  0x00003fffb7c87c28 in malloc_atfork (sz=63, caller=<optimized out>) at arena.c:179
  #2  0x00003fffb7c88034 in __GI___libc_malloc (bytes=<optimized out>) at malloc.c:2910
  #3  0x00003fffb7fbb39c in _dl_signal_error (errcode=0, objname=0x3ffffffffa43 "/opt/DL/torch/bin/luajit", occation=0x3fffb7fd0780 "symbol lookup error", 
      errstring=0x3ffd348bd230 "undefined symbol: luaJIT_BC_sys_fpath") at dl-error.c:90
  #4  0x00003fffb7fbb640 in _dl_signal_cerror (errcode=<optimized out>, objname=0x3ffffffffa43 "/opt/DL/torch/bin/luajit", occation=0x3fffb7fd0780 "symbol lookup error", 
      errstring=0x3ffd348bd230 "undefined symbol: luaJIT_BC_sys_fpath") at dl-error.c:155
  #5  0x00003fffb7fb424c in _dl_lookup_symbol_x (undef_name=<optimized out>, undef_map=<optimized out>, ref=0x3ffd348bd730, symbol_scope=0x3fffb7ff14a8, version=<optimized out>, 
      type_class=<optimized out>, flags=<optimized out>, skip_map=<optimized out>) at dl-lookup.c:871
  #6  0x00003fffb7d6fd34 in call_dl_lookup (ptr=0x3ffd348bd6f0) at dl-sym.c:79
  #7  0x00003fffb7fbb6e8 in _dl_catch_error (objname=0x3ffd348bd728, errstring=0x3ffd348bd720, mallocedp=0x3ffd348bd738, operate=0x3fffb7d6fce0 <call_dl_lookup>, args=0x3ffd348bd6f0)
      at dl-error.c:187
  #8  0x00003fffb7d701ac in do_sym (handle=0x0, name=0x3ffd260f9470 "luaJIT_BC_sys_fpath", who=<optimized out>, vers=0x0, flags=<optimized out>) at dl-sym.c:126
  #9  0x00003fffb7f3138c in dlsym_doit (a=0x3ffd348bdb50) at dlsym.c:50
  #10 0x00003fffb7fbb6e8 in _dl_catch_error (objname=0x3ffd1c0008d0, errstring=0x3ffd1c0008d8, mallocedp=0x3ffd1c0008c8, operate=0x3fffb7f31360 <dlsym_doit>, args=0x3ffd348bdb50)
      at dl-error.c:187
  #11 0x00003fffb7f31cc8 in _dlerror_run (operate=0x3fffb7f31360 <dlsym_doit>, args=0x3ffd348bdb50) at dlerror.c:163
  #12 0x00003fffb7f31438 in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:70
  [...]
  #23 0x00000000100460f8 in lua_pcall ()
  #24 0x00003ffd36940ad8 in THThread_main () from /opt/DL/torch/lib/libthreadsmain.so
  #25 0x00003fffb7e084a0 in start_thread (arg=0x3ffd348bf1a0) at pthread_create.c:335
  #26 0x00003fffb7d17e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

  Thread 5 (Thread 0x3ffd350df1a0 (LWP 22254)):
  #0  0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
  #1  0x00003fffb7c7f194 in __GI__IO_list_lock () at genops.c:1210
  #2  0x00003fffb7ccea70 in __libc_fork () at ../sysdeps/nptl/fork.c:112
  #3  0x00003fffb7c6b0fc in _IO_new_proc_open (fp=0x3ffd200037c0, command=0x3ffd260dd328 "uname -a 2>&1", mode=<optimized out>) at iopopen.c:180
  #4  0x00003fffb7c6b4f8 in _IO_new_popen (command=0x3ffd260dd328 "uname -a 2>&1", mode=0x100606d8 "r") at iopopen.c:296
  [...]
  #13 0x00000000100460f8 in lua_pcall ()
  #14 0x00003ffd36940ad8 in THThread_main () from /opt/DL/torch/lib/libthreadsmain.so
  #15 0x00003fffb7e084a0 in start_thread (arg=0x3ffd350df1a0) at pthread_create.c:335
  #16 0x00003fffb7d17e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

  Thread 4 (Thread 0x3ffd358ff1a0 (LWP 22253)):
  #0  0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
  #1  0x00003fffb7c800e8 in ptmalloc_lock_all () at arena.c:235
  #2  0x00003fffb7ccea44 in __libc_fork () at ../sysdeps/nptl/fork.c:90
  #3  0x00003fffb7c6b0fc in _IO_new_proc_open (fp=0x3ffd2c0037c0, command=0x3ffd260bc4c8 "uname -a 2>&1", mode=<optimized out>) at iopopen.c:180
  #4  0x00003fffb7c6b4f8 in _IO_new_popen (command=0x3ffd260bc4c8 "uname -a 2>&1", mode=0x100606d8 "r") at iopopen.c:296
  [...]
  #13 0x00000000100460f8 in lua_pcall ()
  #14 0x00003ffd36940ad8 in THThread_main () from /opt/DL/torch/lib/libthreadsmain.so
  #15 0x00003fffb7e084a0 in start_thread (arg=0x3ffd358ff1a0) at pthread_create.c:335
  #16 0x00003fffb7d17e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

  Thread 3 (Thread 0x3ffd3611f1a0 (LWP 22252)):
  #0  0x00003fffb7e15fa8 in __lll_lock_wait (futex=0x200, private=<optimized out>) at lowlevellock.c:46
  #1  0x00003fffb7e0bdec in __GI___pthread_mutex_lock (mutex=<optimized out>) at ../nptl/pthread_mutex_lock.c:115
  #2  0x00003fffb7f31424 in __dlsym (handle=0x0, name=<optimized out>) at dlsym.c:68
  [...]
  #11 0x00000000100460f8 in lua_pcall ()
  #12 0x00003ffd36940ad8 in THThread_main () from /opt/DL/torch/lib/libthreadsmain.so
  #13 0x00003fffb7e084a0 in start_thread (arg=0x3ffd3611f1a0) at pthread_create.c:335
  #14 0x00003fffb7d17e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

  Thread 2 (Thread 0x3ffd3693f1a0 (LWP 22251)):
  #0  0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
  #1  0x00003fffb7c87c28 in malloc_atfork (sz=65536, caller=<optimized out>) at arena.c:179
  #2  0x00003fffb7c88034 in __GI___libc_malloc (bytes=<optimized out>) at malloc.c:2910
  #3  0x00003fffb7c67e8c in __GI__IO_file_doallocate (fp=0x3ffd30003bd0) at filedoalloc.c:127
  #4  0x00003fffb7c7ce74 in __GI__IO_doallocbuf (fp=0x3ffd30003bd0) at genops.c:398
  #5  0x00003fffb7c7b77c in _IO_new_file_underflow (fp=0x3ffd30003bd0) at fileops.c:556
  #6  0x00003fffb7c7d2c4 in __GI___underflow (fp=0x3ffd30003bd0) at genops.c:342
  #7  __GI__IO_default_xsgetn (fp=0x3ffd30003bd0, data=<optimized out>, n=8192) at genops.c:504
  #8  0x00003fffb7c7d168 in __GI__IO_sgetn (fp=<optimized out>, data=<optimized out>, n=<optimized out>) at genops.c:467
  #9  0x00003fffb7c696f4 in __GI__IO_fread (buf=0x3ffd2617b358, size=1, count=8192, fp=0x3ffd30003bd0) at iofread.c:38
  [...]
  #18 0x00000000100460f8 in lua_pcall ()
  #19 0x00003ffd36940ad8 in THThread_main () from /opt/DL/torch/lib/libthreadsmain.so
  #20 0x00003fffb7e084a0 in start_thread (arg=0x3ffd3693f1a0) at pthread_create.c:335
  #21 0x00003fffb7d17e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

  Thread 1 (Thread 0x3fffb7ff68a0 (LWP 22248)):
  #0  0x00003fffb7e1127c in __pthread_cond_wait (cond=0x10095ba0, mutex=0x10095ad0) at pthread_cond_wait.c:186
  #1  0x00003fffb7bc5308 in THCondition_wait () from /opt/DL/torch/lib/lua/5.1/libthreads.so
  #2  0x00003fffb7bc269c in ?? () from /opt/DL/torch/lib/lua/5.1/libthreads.so
  #3  0x000000001005a930 in ?? ()
  #4  0x00000000100460f8 in lua_pcall ()
  #5  0x0000000010006884 in ?? ()
  #6  0x000000001005a930 in ?? ()
  #7  0x00000000100461f0 in lua_cpcall ()
  #8  0x00000000100041a8 in main ()
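
With many threads, it can help to summarize which caller each thread was
in when it blocked. The following is a rough sketch that parses text shaped
like the "thread apply all where" output above and reports frame #1 for
each thread. The regular expressions and function names are assumptions
made for illustration, and the sample is a short excerpt of the backtrace
above.

```python
import re
from collections import defaultdict

# Matches headers such as "Thread 9 (Thread 0x3ffd26fff1a0 (LWP 22258)):".
THREAD_RE = re.compile(r"^\s*Thread (\d+) \(")
# Matches frame lines with or without a leading address.
FRAME_RE = re.compile(r"^\s*#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def blocked_callers(backtrace_text):
    """Map each thread id to the function named in its frame #1."""
    callers = {}
    current = None
    for line in backtrace_text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and int(frame.group(1)) == 1:
            callers[current] = frame.group(2)
    return callers


def group_by_caller(callers):
    """Invert the mapping: function name -> sorted list of thread ids."""
    groups = defaultdict(list)
    for thread_id, function in sorted(callers.items()):
        groups[function].append(thread_id)
    return dict(groups)


SAMPLE = """\
Thread 9 (Thread 0x3ffd26fff1a0 (LWP 22258)):
#0  0x00003fffb7d2c408 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:33
#1  0x00003fffb7c7e2d4 in _IO_flush_all_lockp (do_lock=<optimized out>) at genops.c:777

Thread 8 (Thread 0x3ffd277ff1a0 (LWP 22257)):
#0  0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
#1  0x00003fffb7c87c28 in malloc_atfork (sz=65536, caller=<optimized out>) at arena.c:179

Thread 7 (Thread 0x3ffd27fff1a0 (LWP 22256)):
#0  0x00003fffb7d2c460 in __lll_lock_wait_private (futex=0x200) at ./lowlevellock.c:30
#1  0x00003fffb7c800e8 in ptmalloc_lock_all () at arena.c:235
"""

if __name__ == "__main__":
    for function, threads in group_by_caller(blocked_callers(SAMPLE)).items():
        print(function, threads)
```

Grouping by frame #1 makes the pattern in the full backtrace easier to see:
flush, fork, and malloc paths are each waiting on a lock that another path
holds.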

  
  This is caused by glibc bug https://sourceware.org/bugzilla/show_bug.cgi?id=19431.

  Ubuntu 16.04 is missing the following patches (already backported to
  release/2.23/master):

  commit 888d9a0146b4b8364e065ab359eae5b3db5badb9
  Author: Florian Weimer <fweimer@xxxxxxxxxx>
  Date:   Thu Apr 14 12:53:03 2016 +0200

      malloc: Add missing internal_function attributes on function definitions
      
      Fixes build on i386 after commit 29d794863cd6e03115d3670707cc873a9965ba92.
      
      (cherry picked from commit 186fe877f3df0b84d57dfbf0386f6332c6aa69bc)

  commit 927170dd59787d9443e07eeb0b22329c4eff1530
  Author: Florian Weimer <fweimer@xxxxxxxxxx>
  Date:   Thu Apr 14 09:18:30 2016 +0200

      malloc: Remove malloc hooks from fork handler
      
      The fork handler now runs so late that there is no risk anymore that
      other fork handlers in the same thread use malloc, so it is no
      longer necessary to install malloc hooks which made a subset
      of malloc functionality available to the thread that called fork.
      
      (cherry picked from commit 8a727af925be63aa6ea0f5f90e16751fd541626b)

  commit 2a71cf409681b89ffb8892b35cac64de79b7adb8
  Author: Florian Weimer <fweimer@xxxxxxxxxx>
  Date:   Thu Apr 14 09:17:02 2016 +0200

      malloc: Run fork handler as late as possible [BZ #19431]
      
      Previously, a thread M invoking fork would acquire locks in this order:
      
        (M1) malloc arena locks (in the registered fork handler)
        (M2) libio list lock
      
      A thread F invoking flush (NULL) would acquire locks in this order:
      
        (F1) libio list lock
        (F2) individual _IO_FILE locks
      
      A thread G running getdelim would use this order:
      
        (G1) _IO_FILE lock
        (G2) malloc arena lock
      
      After executing (M1), (F1), (G1), none of the threads can make progress.
      
      This commit changes the fork lock order to:
      
        (M'1) libio list lock
        (M'2) malloc arena locks
      
      It explicitly encodes the lock order in the implementations of fork,
      and does not rely on the registration order, thus avoiding the deadlock.
      
      (cherry picked from commit 29d794863cd6e03115d3670707cc873a9965ba92)
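
  The lock-order argument in the commit message above can be checked
  mechanically. The sketch below is an illustration only, not glibc code: it
  treats each group of locks (malloc arena locks, libio list lock, _IO_FILE
  locks) as a single node, which is a simplifying assumption, and all names
  in it are hypothetical. It builds a "held before" graph from each thread's
  acquisition order and looks for a cycle, which indicates a possible
  deadlock.

```python
def lock_order_edges(orders):
    """Build edges held -> wanted from each thread's acquisition sequence."""
    edges = {}
    for sequence in orders.values():
        for index, held in enumerate(sequence):
            for wanted in sequence[index + 1:]:
                edges.setdefault(held, set()).add(wanted)
    return edges


def has_cycle(edges):
    """Depth-first search; a back edge to an in-progress node means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = GREY
        for nxt in edges.get(node, ()):
            seen = state.get(nxt, WHITE)
            if seen == GREY:
                return True
            if seen == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(node, WHITE) == WHITE and visit(node)
               for node in list(edges))


# Orders as described in the commit message (lock groups collapsed to one node).
OLD_ORDER = {
    "M (fork)": ["malloc arena", "libio list"],
    "F (flush)": ["libio list", "_IO_FILE"],
    "G (getdelim)": ["_IO_FILE", "malloc arena"],
}
NEW_ORDER = dict(OLD_ORDER, **{"M (fork)": ["libio list", "malloc arena"]})

if __name__ == "__main__":
    print("old order can deadlock:", has_cycle(lock_order_edges(OLD_ORDER)))
    print("new order can deadlock:", has_cycle(lock_order_edges(NEW_ORDER)))
```

  With the old order the graph contains the cycle malloc arena -> libio list
  -> _IO_FILE -> malloc arena; swapping the two locks taken by fork removes
  it.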

  
  commit a5c2f42566460fc73755c768e8e1c59dbd5a4bb2
  Author: Samuel Thibault <samuel.thibault@xxxxxxxxxxxx>
  Date:   Tue Mar 22 09:58:48 2016 +0100

      Fix malloc threaded tests link on non-Linux
      
      	* malloc/Makefile (tst-malloc-backtrace,
      	tst-malloc-thread-exit, tst-malloc-thread-fail): Use
      	 instead of hardcoding the path to libpthread.
      
      (cherry picked from commit b87e41378beca3c98ec3464d64835e66cc788497)

  
  commit f69ae17e843b00d3495b736f4381c1fa64dc02bc
  Author: Florian Weimer <fweimer@xxxxxxxxxx>
  Date:   Fri Feb 19 17:07:45 2016 +0100

      malloc: Remove NO_THREADS
      
      No functional change.  It was not possible to build without
      threading support before.
      
      (cherry picked from commit 59eda029a8a35e5f4e5cd7be0f84c6629e48ec6e)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1630302/+subscriptions