kernel-packages team mailing list archive

[Bug 1308474] [NEW] CPU-intensive processes killed by OS

 

Public bug reported:

While testing an algorithm I ran the test program several times with
different parameters. Each run was expected to take about four hours,
so to make full use of my computer (4 CPUs) I started four instances
in parallel at lowered priority (so that I could keep doing other work
on the computer in the meantime). All four processes ran in parallel
for a few minutes, but then two of them were killed with the error
message:

Command terminated by signal 9
[1]   Exit 137                /usr/bin/time -f '%U+%S sec, avg %Kk max %Mk memory, major %F minor %R faults, %I+%O i/o' ./testu01long -d9689 -b3 > cat9689.tub

Here, "/usr/bin/time ..." is the command I launched for the first
process; the second error message was similar, but for the second
process. The first process ran for about 6 minutes and the second for
about 14 minutes, so the KILL signals (signal 9) were issued at
different times (all processes had been started within one minute of
each other). The two other processes continued without problems. The
point is that I did NOT send a KILL signal to the processes myself, so
I conclude that the OS issued them. The effect can be reproduced to
some extent: if another CPU-intensive process is started (e.g. a
lengthy compilation), an already running CPU-intensive process may be
killed, and if so it is the one started first.
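
For readers comparing the two figures in the error message, here is a
minimal sketch of how a shell-style exit status such as 137 relates to a
signal number. It assumes the common shell convention that a status above
128 means "terminated by signal (status - 128)"; the function name
decode_shell_status is made up for this illustration, and a program can
in principle also return such a value itself as an ordinary exit code.

```python
import signal


def decode_shell_status(status):
    """Interpret a shell exit status under the 128+N convention.

    Returns ("signal", N, name) when status > 128, otherwise
    ("exit", status, None). Hypothetical helper for illustration only.
    """
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals maps a number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown"
        return ("signal", signum, name)
    return ("exit", status, None)


if __name__ == "__main__":
    # 137 is the status reported in the error message above.
    print(decode_shell_status(137))
```

Under that convention 137 - 128 = 9, which matches "Command terminated
by signal 9" in the message.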

When a killed program was relaunched with the same parameters but
without other CPU-intensive processes running concurrently (only
editing and reading e-mail in parallel), everything was fine, so the
kill does not appear to be caused by the program itself. The program
uses no external devices and performs no I/O except for writing a
summary to stdout at program end (i.e. not at the time the processes
were killed).

The question is: why were the processes killed? Is the OS unable to manage this many running processes?
I consider this a bug in the OS, in its process management.
The bug implies that the computer can be used for lightweight tasks only.

With regards
Dr. Wolfgang Jansen

PS: 
Additional platform info (output of "ulimit -a", and of "top" while two CPU-intensive processes were still running):

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30092
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 30092
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

-----------------------------------------------

Tasks: 244 total,   3 running, 241 sleeping,   0 stopped,   0 zombie
%Cpu(s): 15.5 us,  1.2 sy, 27.5 ni, 54.6 id,  1.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   3930376 total,  2263872 used,  1666504 free,    36756 buffers
KiB Swap:     9212 total,     9212 used,        0 free,   705612 cached

  PID USER      PR  NI  VIRT  RES  SHR S P  %CPU %MEM   TIME COMMAND            
18804 wolfgang  30  10 14632  960  648 R 1 103.0  0.0  95:09 testu01long        
18817 wolfgang  30  10 14632  960  648 R 0 103.0  0.0  94:03 testu01long        
 2491 root      20   0  444m 128m 103m S 2   6.4  3.3   7:04 Xorg               
 2945 wolfgang  20   0 1950m 205m  11m S 2   6.4  5.4  13:08 cinnamon           
    1 root      20   0 27336 1536    0 S 1   0.0  0.0   0:01 init               
    2 root      20   0     0    0    0 S 3   0.0  0.0   0:00 kthreadd           
    3 root      20   0     0    0    0 S 0   0.0  0.0   0:00 ksoftirqd/0        
    5 root       0 -20     0    0    0 S 0   0.0  0.0   0:00 kworker/0:0H       
    7 root      rt   0     0    0    0 S 0   0.0  0.0   0:00 migration/0        
    8 root      20   0     0    0    0 S 0   0.0  0.0   0:00 rcu_bh             
    9 root      20   0     0    0    0 S 0   0.0  0.0   0:00 rcuob/0            
   10 root      20   0     0    0    0 S 0   0.0  0.0   0:00 rcuob/1            
   11 root      20   0     0    0    0 S 0   0.0  0.0   0:00 rcuob/2            
   12 root      20   0     0    0    0 S 0   0.0  0.0   0:00 rcuob/3            
   13 root      20   0     0    0    0 S 3   0.0  0.0   0:27 rcu_sched          
   14 root      20   0     0    0    0 S 3   0.0  0.0   0:06 rcuos/0            
   15 root      20   0     0    0    0 S 1   0.0  0.0   0:07 rcuos/1    
         
Remark: The critical figure seems to be the swap space.
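
To put numbers on that remark, the following minimal sketch reads the
two memory summary lines from the "top" output quoted above and computes
how full the swap space is and how large it is relative to RAM. The
helper name parse_top_line and the regular expression are assumptions
made for this illustration; the input strings are copied from the
output above. The sketch only summarizes the figures and does not by
itself show why the processes were killed.

```python
import re


def parse_top_line(line):
    """Extract the total/used/free values (KiB) from a top summary line.

    Hypothetical helper: it only looks for "<number> total|used|free"
    and ignores other fields such as "buffers" or "cached".
    """
    pairs = re.findall(r"(\d+)\s+(total|used|free)", line)
    return {name: int(value) for value, name in pairs}


# Lines copied from the top output in this report.
mem = parse_top_line(
    "KiB Mem:   3930376 total,  2263872 used,  1666504 free,    36756 buffers"
)
swap = parse_top_line(
    "KiB Swap:     9212 total,     9212 used,        0 free,   705612 cached"
)

# Guard against a system without swap before dividing.
swap_used_pct = 100.0 * swap["used"] / swap["total"] if swap["total"] else 0.0
swap_to_ram_pct = 100.0 * swap["total"] / mem["total"]

if __name__ == "__main__":
    print(f"swap used: {swap_used_pct:.1f}% of {swap['total']} KiB")
    print(f"swap size relative to RAM: {swap_to_ram_pct:.2f}%")
```

With the figures above, the swap space is completely used, and its total
size is well under one percent of the installed RAM.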

ProblemType: Bug
DistroRelease: Ubuntu 13.10
Package: linux-image-3.11.0-19-generic 3.11.0-19.33
ProcVersionSignature: Ubuntu 3.11.0-19.33-generic 3.11.10.5
Uname: Linux 3.11.0-19-generic x86_64
ApportVersion: 2.12.5-0ubuntu2.2
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/controlC1:  wolfgang   2911 F.... pulseaudio
 /dev/snd/controlC0:  wolfgang   2911 F.... pulseaudio
CurrentDmesg: Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
Date: Wed Apr 16 11:35:44 2014
InstallationDate: Installed on 2014-03-20 (26 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Release amd64 (20130424)
IwConfig:
 eth0      no wireless extensions.
 
 lo        no wireless extensions.
MachineType: Acer Aspire XC600
MarkForUpload: True
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.11.0-19-generic root=UUID=10374065-1d70-4588-aca9-25093062a44a ro quiet splash vt.handoff=7
RfKill:
 
SourcePackage: linux
UpgradeStatus: Upgraded to saucy on 2014-03-21 (25 days ago)
WifiSyslog:
 
dmi.bios.date: 11/02/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P11-A3
dmi.board.name: Aspire XC600
dmi.board.vendor: Acer
dmi.board.version: v1.0
dmi.chassis.type: 3
dmi.chassis.vendor: Acer
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP11-A3:bd11/02/2012:svnAcer:pnAspireXC600:pvr:rvnAcer:rnAspireXC600:rvrv1.0:cvnAcer:ct3:cvr:
dmi.product.name: Aspire XC600
dmi.sys.vendor: Acer

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug saucy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1308474

Title:
  CPU-intensive processes killed by OS

Status in “linux” package in Ubuntu:
  New



