yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #83212
[Bug 1874032] Re: nova-compute become stuck when doing IO on busy file system
Reviewed: https://review.opendev.org/734776
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=be9b7358473ca5276cff3c6491f88cd512d118b8
Submitter: Zuul
Branch: master
commit be9b7358473ca5276cff3c6491f88cd512d118b8
Author: Alexandre Arents <alexandre.arents@xxxxxxxxxxxx>
Date: Tue Jun 9 14:17:37 2020 +0000
Snapshot: offload glance upload in a native thread
Execute glance upload in a native thread as it may block the current
coroutine until it completes.
Despite the fact we use eventlet monkey_patching [1] to achieve cooperative
yielding for network IO, file IO on busy file system may still get
nova-compute hanging.
Stick those IO in a native thread using eventlet tpool.execute() [2]
avoid this issue.
[1] https://eventlet.net/doc/patching.html
[2] https://eventlet.net/doc/threading.html
Closes-Bug: #1874032
Change-Id: I8dbc579e0037969aab4f2bb500fccfbde4190726
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1874032
Title:
nova-compute become stuck when doing IO on busy file system
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
It seems that when nova-compute process run I/O intensive task on a busy file system,
it can become stuck and got disconnected from rabbitmq cluster.
From my understanding nova-compute do not use true OS multithreading,
but internal python multi-tasking also called cooperative multitasking,
This have some limitations like using only one core at a time,
and task did not schedule well under IO workload [1]
It look likes a "known" limitation [2]:
"Periodic tasks are important to understand because of limitations
in the threading model that OpenStack uses.
OpenStack uses cooperative threading in Python,
which means that if something long and complicated is running,
it will block other tasks inside that process from running unless it voluntarily yields execution to another cooperative thread."
I think we hit the same limitation than periodic task when we do a
snapshot in qcow2/raw environment, during glance upload, if the
snapshot directory is on a busy file system.
[1] https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=6&ved=2ahUKEwjpmujvpvfoAhUD0BoKHRkgBtIQFjAFegQICBAB&url=https%3A%2F%2Fwww.dabeaz.com%2Fpython%2FUnderstandingGIL.pdf&usg=AOvVaw33_e5qkePlFOreCu8NrE2J
[2] https://docs.openstack.org/operations-guide/ops-advanced-configuration.html
Steps to reproduce
==================
On master devstack
1)apt-get install fio
2) # create some user data in an initfile
$ cat bigfile.sh
#!/bin/sh
while true ; do cat /bin/busybox >> bigfile ; done
3)Spawn an m1.small instance with init script (we must generate snapshot file fat enought)
$openstack server create --flavor m1.small --image cirros-0.4.0-x86_64-disk --user-data bigfile.sh --nic net-id=private bigfile
4)wait few time that instance become fat enought (flavor is 20GB)
$du -sh /opt/stack/data/nova/instances/85998418-bf2c-4246-8204-c08962876d18/
18G /opt/stack/data/nova/instances/85998418-bf2c-4246-8204-c08962876d18/
5) stop instance to reach bug quicker (avoid qemu blockcopy job)
$openstack server stop 85998418-bf2c-4246-8204-c08962876d18
6) get file system busy(during snapshot) in another terminal:
$fio --ioengine libaio --iodepth 32 --direct=1 --runtime=50000 --size=100m --time_based --bs=4k --rw=randwrite --numjobs 12 --name=job
7) start snapshot:
$openstack server image create 85998418-bf2c-4246-8204-c08962876d18 --name snap1
8) you may check in top qemu-img generating snapshot file, after that
nova-compute become in state D waiting for IO
9) check in log rabbit disconnection because nova-compute become stuck:
$ sudo journalctl -u devstack@n-cpu -f
Feb 27 13:42:26 alex-devstack nova-compute[31997]: DEBUG oslo_service.periodic_task [None req-e86ab239-c81d-4674-b275-9438999abfff None None] Running periodic task ComputeManager._check_instance_build_time
Feb 27 13:42:26 alex-devstack nova-compute[31997]: ERROR oslo.messaging._drivers.impl_rabbit [-] [813d5cb3-ce73-4294-8d4a-20422c2088c6] AMQP server on 54.39.183.136:5672 is unreachable: Too many heartbeats missed. Trying again in 1 seconds.:amqp.exceptions.ConnectionForced: Too many heartbeats missed
Feb 27 13:42:26 alex-devstack nova-compute[31997]: ERROR oslo.messaging._drivers.impl_rabbit [-] [e61dc56c-e72a-4a6b-b459-e064fa4544f7] AMQP server on 54.39.183.136:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
Feb 27 13:42:26 alex-devstack nova-compute[31997]: INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
Feb 27 13:42:26 alex-devstack nova-compute[31997]: INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
Feb 27 13:42:26 alex-devstack nova-compute[31997]: INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
Feb 27 13:42:26 alex-devstack nova-compute[31997]: INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
NOTE:
-Issue happens only during glance upload, after snapshot file is "extracted" by a qemu-img convert, because upload IOs are made in nova-compute process using native python glance client lib.
-qemu-img convert which is done just before, is run in a subprocess(), so outside of nova-compute, and there is no issue here.
-This fio is a bit overkill (12 jobs) because I run on a ssd disk. In operation the issue may happen relatively quickly if snapshot dirextory is on:
-a platter disk, only two simultenous snapshot and nova-compute can start to hang.
-on a network drive with limited resource(ex: rbd mound) it can start with only one snapshot. (whole earth is backing up at 00 am)
-busy mean sar -d 1 10 say disk is 100% busy
SOLUTION?
==========
-Probably best solution is to use use true multithreading/workers on nova-compute to solve this ?
It seems that nova-compute is the only nova service that did not use oslo service workers:
$grep service.serve nova/cmd/*
nova/cmd/api.py: launcher.launch_service(server, workers=server.workers or 1)
nova/cmd/api_metadata.py: service.serve(server, workers=server.workers)
nova/cmd/api_os_compute.py: service.serve(server, workers=server.workers)
nova/cmd/compute.py: service.serve(server)
nova/cmd/conductor.py: service.serve(server, workers=workers)
nova/cmd/scheduler.py: service.serve(server, workers=workers)
Maybe there is reason to not using it ? Would it be a costly
change(some redesign)?
-Alternatively we must assume that all I/O workload should be put out
of nova-compute process(fork()).
Expected result
===============
nova-compute should stay healthy when doing IO on busy fs
(even if task run slower)
Actual result
=============
nova-compute become stuck and disconnected from rabbitmq.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1874032/+subscriptions
References