[Bug 1874032] [NEW] nova-compute becomes stuck when doing I/O on a busy file system
Public bug reported:
Description
===========
It seems that when the nova-compute process runs an I/O-intensive task on a busy file system,
it can become stuck and get disconnected from the RabbitMQ cluster.
From my understanding, nova-compute does not use true OS multithreading,
but Python-internal multitasking, also called cooperative multitasking.
This has some limitations, such as using only one core at a time,
and tasks do not get scheduled well under I/O workloads [1].
It looks like a "known" limitation [2]:
"Periodic tasks are important to understand because of limitations
in the threading model that OpenStack uses.
OpenStack uses cooperative threading in Python,
which means that if something long and complicated is running,
it will block other tasks inside that process from running unless it voluntarily yields execution to another cooperative thread."
I think we hit the same limitation as the periodic tasks when we do a
snapshot in a qcow2/raw environment, during the glance upload, if the snapshot
directory is on a busy file system.
[1] https://www.dabeaz.com/python/UnderstandingGIL.pdf
[2] https://docs.openstack.org/operations-guide/ops-advanced-configuration.html
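To illustrate the limitation, here is a minimal standalone sketch (not Nova code), assuming eventlet-style cooperative threading as nova-compute uses:

    import time
    import eventlet
    eventlet.monkey_patch()  # cooperative threading, as in nova-compute

    def heartbeat():
        while True:
            print("heartbeat", time.time())
            eventlet.sleep(1)  # yields cooperatively

    def blocking_io():
        # eventlet cannot make regular disk file I/O non-blocking; on a
        # saturated file system each write() blocks in the kernel
        # (state D) and the hub cannot switch greenthreads meanwhile.
        with open("bigfile", "wb", buffering=0) as f:
            for _ in range(10 ** 6):
                f.write(b"\0" * 4096)  # no voluntary yield between writes

    eventlet.spawn(heartbeat)
    eventlet.spawn(blocking_io).wait()

While blocking_io() is inside the write loop there is no yield point at all, so the heartbeat greenthread cannot run; on a busy disk each write() additionally blocks in the kernel, which is exactly the situation described above.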
Steps to reproduce
==================
On a master devstack:
1) apt-get install fio
2) Create some user data in an init file:
$ cat bigfile.sh
#!/bin/sh
while true ; do cat /bin/busybox >> bigfile ; done
3) Spawn an m1.small instance with the init script (we must generate a snapshot file that is fat enough):
$ openstack server create --flavor m1.small --image cirros-0.4.0-x86_64-disk --user-data bigfile.sh --nic net-id=private bigfile
4) Wait a while until the instance becomes fat enough (the flavor disk is 20GB):
$ du -sh /opt/stack/data/nova/instances/85998418-bf2c-4246-8204-c08962876d18/
18G /opt/stack/data/nova/instances/85998418-bf2c-4246-8204-c08962876d18/
5) Stop the instance to reach the bug quicker (this avoids the qemu blockcopy job):
$ openstack server stop 85998418-bf2c-4246-8204-c08962876d18
6) Make the file system busy (during the snapshot) in another terminal:
$ fio --ioengine libaio --iodepth 32 --direct=1 --runtime=50000 --size=100m --time_based --bs=4k --rw=randwrite --numjobs 12 --name=job
7) Start the snapshot:
$ openstack server image create 85998418-bf2c-4246-8204-c08962876d18 --name snap1
8) In top you may see qemu-img generating the snapshot file; after that,
nova-compute goes into state D, waiting for I/O.
9) Check in the logs the rabbit disconnection that occurs because nova-compute is stuck:
$ sudo journalctl -u devstack@n-cpu -f
Feb 27 13:42:26 alex-devstack nova-compute[31997]: DEBUG oslo_service.periodic_task [None req-e86ab239-c81d-4674-b275-9438999abfff None None] Running periodic task ComputeManager._check_instance_build_time
Feb 27 13:42:26 alex-devstack nova-compute[31997]: ERROR oslo.messaging._drivers.impl_rabbit [-] [813d5cb3-ce73-4294-8d4a-20422c2088c6] AMQP server on 54.39.183.136:5672 is unreachable: Too many heartbeats missed. Trying again in 1 seconds.:amqp.exceptions.ConnectionForced: Too many heartbeats missed
Feb 27 13:42:26 alex-devstack nova-compute[31997]: ERROR oslo.messaging._drivers.impl_rabbit [-] [e61dc56c-e72a-4a6b-b459-e064fa4544f7] AMQP server on 54.39.183.136:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
Feb 27 13:42:26 alex-devstack nova-compute[31997]: INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
Feb 27 13:42:26 alex-devstack nova-compute[31997]: INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
Feb 27 13:42:26 alex-devstack nova-compute[31997]: INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
Feb 27 13:42:26 alex-devstack nova-compute[31997]: INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
NOTE:
- The issue happens only during the glance upload, after the snapshot file has been "extracted" by a qemu-img convert, because the upload I/Os are made inside the nova-compute process using the native Python glance client library.
- The qemu-img convert that runs just before is executed via subprocess(), so outside of nova-compute, and there is no issue there (see the sketch after these notes).
- This fio setup is a bit overkill (12 jobs) because I run on an SSD disk. In production the issue may happen relatively quickly if the snapshot directory is on:
  - a platter disk: with only two simultaneous snapshots nova-compute can start to hang;
  - a network drive with limited resources (e.g. an rbd mount): it can start with only one snapshot (the whole world is backing up at 00:00).
- "Busy" means that sar -d 1 10 reports the disk as 100% busy.
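To make that asymmetry concrete, a hedged sketch of the two patterns (simplified, not the actual Nova code paths; send_to_glance() is a hypothetical stand-in for the glance client):

    import subprocess

    # The convert step runs in a child process: the kernel blocks
    # qemu-img, not nova-compute, so the eventlet hub keeps running.
    subprocess.check_call(
        ["qemu-img", "convert", "-O", "qcow2", "disk", "snap.qcow2"])

    # The upload step reads the snapshot inside the nova-compute
    # process: every read() on the busy file system can block the hub.
    def upload(path, chunk_size=64 * 1024):
        with open(path, "rb") as f:
            chunk = f.read(chunk_size)  # blocking disk read, in-process
            while chunk:
                send_to_glance(chunk)  # hypothetical helper
                chunk = f.read(chunk_size)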
SOLUTION?
=========
- Probably the best solution is to use true multithreading/workers in nova-compute?
It seems that nova-compute is the only nova service that does not use oslo service workers:
$ grep service.serve nova/cmd/*
nova/cmd/api.py: launcher.launch_service(server, workers=server.workers or 1)
nova/cmd/api_metadata.py: service.serve(server, workers=server.workers)
nova/cmd/api_os_compute.py: service.serve(server, workers=server.workers)
nova/cmd/compute.py: service.serve(server)
nova/cmd/conductor.py: service.serve(server, workers=workers)
nova/cmd/scheduler.py: service.serve(server, workers=workers)
Maybe there is a reason not to use them? Would it be a costly
change (some redesign)?
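For reference, a hedged sketch of what the workers parameter does in oslo.service (CONF and server are stand-ins for the real objects in nova/cmd, not actual Nova code):

    from oslo_service import service

    # With workers > 1, oslo.service pre-forks that many child
    # processes, each running its own copy of the service with its own
    # eventlet hub, so one blocked child would not stall its siblings.
    launcher = service.launch(CONF, server, workers=4)  # stand-in objects
    launcher.wait()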
- Alternatively, we must assume that all I/O workload should be moved out
of the nova-compute process (fork()), as sketched below.
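A hedged sketch of that alternative (not a proposed patch; upload_worker() and send_to_glance() are illustrative): fork the upload into a child process and only poll it cooperatively from the hub:

    import multiprocessing
    import eventlet

    def upload_worker(path):
        # All blocking disk reads happen here, in a separate process,
        # so they cannot stall the nova-compute eventlet hub.
        with open(path, "rb") as f:
            chunk = f.read(64 * 1024)
            while chunk:
                send_to_glance(chunk)  # hypothetical helper
                chunk = f.read(64 * 1024)

    proc = multiprocessing.Process(target=upload_worker,
                                   args=("snap.qcow2",))
    proc.start()
    while proc.is_alive():
        eventlet.sleep(1)  # the hub keeps servicing heartbeats meanwhile

A lighter-weight variant would be eventlet.tpool.execute(), which moves the blocking call into a real OS thread instead of a forked process.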
Expected result
===============
nova-compute should stay healthy when doing I/O on a busy file system
(even if the task runs slower).
Actual result
=============
nova-compute becomes stuck and gets disconnected from RabbitMQ.
** Affects: nova
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1874032