group.of.nepali.translators team mailing list archive
Message #38323
[Bug 1903733] Re: Out of memory issue for websocket client
This bug was fixed in the package python-tornado - 4.5.3-1ubuntu0.2
---------------
python-tornado (4.5.3-1ubuntu0.2) bionic; urgency=low
* d/p/0001-lp1903733-read-queue-of-1-message.patch (LP: #1903733)
- fixes potential oom error with python-tornado client websocket
* d/p/0002-lp1903733-Remove-unused-import.patch
- code clean up
-- Heather Lemon <heather.lemon@xxxxxxxxxxxxx> Tue, 12 Jan 2021 20:57:40 +0000
** Changed in: python-tornado (Ubuntu Bionic)
Status: Fix Committed => Fix Released
** Changed in: python-tornado (Ubuntu Xenial)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह (Nepali language coordinators group), which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1903733
Title:
Out of memory issue for websocket client
Status in python-tornado package in Ubuntu:
Fix Released
Status in python-tornado source package in Xenial:
Fix Released
Status in python-tornado source package in Bionic:
Fix Released
Status in python-tornado source package in Focal:
Fix Released
Status in python-tornado source package in Groovy:
Fix Released
Status in python-tornado source package in Hirsute:
Fix Released
Bug description:
[Impact]
Applications using python-tornado releases earlier than 5.1.0 are susceptible to an out-of-memory error in the websocket client. The upstream fix first appears in v5.1.0b1 (see the git describe output below).
[Other Info]
Upstream commit(s):
https://github.com/tornadoweb/tornado/pull/2351/commits/20becca336caae61cd24f7afba0e177c0a210c70
$ git remote -v
origin https://github.com/tornadoweb/tornado.git (fetch)
origin https://github.com/tornadoweb/tornado.git (push)
$ git describe --contains 20becca3
v5.1.0b1~28^2~1
$ rmadison python3-tornado
=> python3-tornado | 4.2.1-1ubuntu3 | xenial
python3-tornado | 4.5.3-1 | bionic/universe
=> python3-tornado | 4.5.3-1ubuntu0.1 | bionic-updates/universe
python3-tornado | 6.0.3+really5.1.1-3 | focal/universe
python3-tornado | 6.0.4-2 | groovy/universe
python3-tornado | 6.0.4-3 | hirsute/universe
python3-tornado | 6.1.0-1 | hirsute-proposed/universe
[Original Description]
Tornado has no flow control for websockets (in the sense of TCP flow control [8]). A websocket client receives data as fast as it can and stores it in a deque. If that data is not consumed as fast as it is written, the deque grows without bound, eventually exhausting memory and getting the process killed.
The fix is to replace the deque with a Queue: on the client side, incoming messages are put into the queue and read back out of it.
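The difference between the two designs can be sketched with the standard library's asyncio. This is an illustration under stated assumptions, not Tornado's code: asyncio stands in for Tornado's IOLoop and queues, and the producer/consumer functions and N_MESSAGES are hypothetical. A fast producer appending to an unbounded deque leaves it growing, while awaiting put() on a Queue(maxsize=1) suspends the producer until the slow consumer has taken the previous message.

```python
import asyncio
from collections import deque

N_MESSAGES = 100  # hypothetical number of incoming websocket messages


async def unbounded_demo() -> int:
    """Fast producer, slow consumer, unbounded deque: return the peak length."""
    buf: deque = deque()
    peak = 0

    async def producer() -> None:
        nonlocal peak
        for i in range(N_MESSAGES):
            buf.append(i)  # never waits: nothing limits the deque
            peak = max(peak, len(buf))
            await asyncio.sleep(0)  # yield to the loop, then keep writing

    async def consumer() -> None:
        for _ in range(N_MESSAGES):
            while not buf:
                await asyncio.sleep(0)  # wait for data to arrive
            buf.popleft()
            await asyncio.sleep(0.001)  # simulate slow message handling

    await asyncio.gather(producer(), consumer())
    return peak


async def bounded_demo() -> int:
    """Same workload with a Queue(maxsize=1): return the peak size."""
    q: asyncio.Queue = asyncio.Queue(maxsize=1)
    peak = 0

    async def producer() -> None:
        nonlocal peak
        for i in range(N_MESSAGES):
            await q.put(i)  # suspends while the single slot is occupied
            peak = max(peak, q.qsize())

    async def consumer() -> None:
        for _ in range(N_MESSAGES):
            await q.get()
            await asyncio.sleep(0.001)  # simulate slow message handling

    await asyncio.gather(producer(), consumer())
    return peak


if __name__ == "__main__":
    print("deque peak length:", asyncio.run(unbounded_demo()))
    print("Queue(1) peak size:", asyncio.run(bounded_demo()))
```

With the deque, memory use scales with how far the consumer falls behind; with Queue(1), the backlog is moved out of the process and the writer simply waits.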
Patch file [0]
Commit history [1]
GitHub [2]
Issue [3]
[0] https://patch-diff.githubusercontent.com/raw/tornadoweb/tornado/pull/2351.patch
[1] https://github.com/tornadoweb/tornado/pull/2351/commits
[2] https://github.com/tornadoweb/tornado
[3] https://github.com/tornadoweb/tornado/issues/2341
[Test Case]
I will be attaching two Python test files:
client.py
server.py
# create an LXC container, limit its memory and turn off swap
$ sudo apt-get install lxc lxd
$ lxd init
$ lxc launch ubuntu:18.04 bionic-python-tornado
# shrink the container: limit it to 2 CPUs
$ lxc config set bionic-python-tornado limits.cpu 2
# limit RAM to 150MB
$ lxc config set bionic-python-tornado limits.memory 150MB
# severely limit the amount of swap used [4]
$ lxc config set bionic-python-tornado limits.memory.swap false
# install dev tools and download source code
$ lxc exec bionic-python-tornado bash
$ apt-get update
$ apt install ubuntu-dev-tools -y
$ pull-lp-source python-tornado bionic
$ cd python-tornado-4.5.3
$ sudo apt build-dep .
# copy client.py and server.py into ~/python-tornado-4.5.3/demos
# (with scp, or create them there with touch and paste the contents)
# build code
$ python3 setup.py build
$ python3 setup.py install
# I have 3 terminals open: two for executing Python (one for the
# client, one for the server) and one running top to watch memory
# run server.py, client.py, and top in separate terminals
$ python3 demos/server.py
$ python3 demos/client.py
$ top
client.py prints the length of the collections.deque.
server.py prints messages like:
message: keep alive
* in top, press ctrl+E to show memory in MB
top shows that swap is off (or running very low) and that memory is limited to 150MB.
Although I never hit the out-of-memory exception that is expected to be thrown,
you can confirm the kernel's OOM kill with dmesg:
$ sudo dmesg | grep -i python
The output looks similar to this:
[ 3250.067833] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=lxc.payload.iptest,mems_allowed=0,oom_memcg=/lxc.payload.iptest,task_memcg=/lxc.payload.iptest,task=python3,pid=44889,uid=1000000
[ 3250.067838] Memory cgroup out of memory: Killed process 44889 (python3) total-vm:304616kB, anon-rss:235152kB, file-rss:0kB, shmem-rss:0kB, UID:1000000 pgtables:628kB oom_score_adj:0
[ 3250.075096] oom_reaper: reaped process 44889 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0k
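To check for the kill without reading the log by eye, a small helper can extract the relevant fields from the "Killed process" lines. This is a hypothetical helper (find_oom_kills is not one of the attached test files), and its regular expression assumes exactly the line format in the excerpt above; other kernel versions may word the message differently.

```python
import re
from typing import Dict, List, Union

# Assumes the "Killed process" line format from the dmesg excerpt above.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(dmesg_text: str) -> List[Dict[str, Union[int, str]]]:
    """Return pid, command name and anon-rss (kB) for each OOM-killed process."""
    kills = []
    for line in dmesg_text.splitlines():
        match = KILLED_RE.search(line)
        if match:
            kills.append({
                "pid": int(match.group("pid")),
                "comm": match.group("comm"),
                "anon_rss_kb": int(match.group("anon_rss_kb")),
            })
    return kills
```

The input would be the text printed by `sudo dmesg`; the oom_reaper line is ignored because it reports the process after its memory has already been reclaimed.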
After either applying the patch or using the focal (or later) package:
$ pull-lp-source python-tornado focal
we can run exactly the same setup again, and this time the new queue
object has a length of only 1.
This shows that, before the patch, the structure used to store incoming
messages was unbounded and could keep growing: "If maxlen is not specified
or is None, deques may grow to an arbitrary length over time." [6]
Upstream then switched to a blocking queue of size 1 (Queue(1)), so there
is only ever one item in the queue at a time. [7]
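The quoted deque behaviour can be checked in isolation (a standalone illustration, not Tornado code). Without maxlen the deque keeps every appended item; with maxlen=1 it stays at one item, but by silently discarding the older ones rather than making the writer wait, which is why a blocking Queue(1) was used instead of a bounded deque.

```python
from collections import deque

unbounded = deque()
for i in range(1000):
    unbounded.append(i)  # no maxlen: every item is kept
print(len(unbounded))  # 1000

bounded = deque(maxlen=1)
for i in range(1000):
    bounded.append(i)  # maxlen=1: each append evicts the previous item
print(len(bounded), bounded[0])  # 1 999
```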
[4] https://discuss.linuxcontainers.org/t/limiting-swap-in-lxc-container/6343/4
[5] attached screenshot
[6] https://docs.python.org/3/library/collections.html?highlight=collections%20deque#collections.deque
[7] https://www.tornadoweb.org/en/stable/queues.html
[8] https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Flow_control
[Where Problems Could Occur]
Problems can occur if messages are not being consumed from the queue, for example:
"""
File "/usr/local/lib/python3.8/dist-packages/tornado-5.1.1-py3.8-linux-x86_64.egg/tornado/queues.py", line 202, in put_nowait
raise QueueFull
tornado.queues.QueueFull
"""
If messages are never taken off the queue, it stays full, and any further put_nowait() raises QueueFull as shown above.
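The same failure mode can be reproduced with asyncio.Queue from the standard library. It is used here as a stand-in for tornado.queues.Queue, which is assumed to behave the same way for this case (both take a maxsize and raise a QueueFull exception from put_nowait() when full, but they are separate classes). The overfill function is hypothetical: with maxsize=1 and no consumer, the second put_nowait() fails immediately.

```python
import asyncio


async def overfill(maxsize: int = 1) -> str:
    """Put two messages without ever calling get(); report what happens."""
    q: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    q.put_nowait("first message")  # fills the only slot when maxsize=1
    try:
        q.put_nowait("second message")  # nobody has consumed the first one
    except asyncio.QueueFull:
        return "QueueFull"
    return "accepted"


print(asyncio.run(overfill()))   # QueueFull
print(asyncio.run(overfill(0)))  # accepted: maxsize <= 0 means unbounded
```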
[Extra Notes]
Upstream solved the OOM error by using a blocking queue of size one,
Queue(1) [9], built on Tornado's own queue implementation [10]. If
items are not consumed from that queue, there will still be problems.
The fix pushes the issue from the application/library onto the
networking side: dropped data/packets in the network become more
likely, since there will only ever be one item in the queue at a time.
" Constructor for a FIFO queue. maxsize is an integer that sets the
upperbound limit on the number of items that can be placed in the
queue. Insertion will block once this size has been reached, until
queue items are consumed. If maxsize is less than or equal to zero,
the queue size is infinite." [11]
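The quoted semantics can be checked directly with the standard library's thread-based queue.Queue, which is the class the quote describes (not Tornado's coroutine-based queue [10]). Both helper functions are hypothetical; a timeout is used so that a put() on a full queue gives up instead of hanging forever.

```python
import queue
import threading


def put_would_block(maxsize: int) -> bool:
    """Fill the queue (3 items if unbounded), then try one more timed put()."""
    q = queue.Queue(maxsize=maxsize)
    for i in range(maxsize if maxsize > 0 else 3):
        q.put(i)
    try:
        q.put("extra", timeout=0.1)  # waits up to 0.1 s for a free slot
        return False
    except queue.Full:
        return True


def put_after_consumer(maxsize: int = 1) -> int:
    """A consumer thread takes one item, which unblocks the producer's put()."""
    q = queue.Queue(maxsize=maxsize)
    q.put("first")
    consumer = threading.Timer(0.05, q.get)  # consume one item after 50 ms
    consumer.start()
    q.put("second")  # blocks until the consumer has freed the slot
    consumer.join()
    return q.qsize()


print(put_would_block(1))     # True: insertion blocks once maxsize is reached
print(put_would_block(0))     # False: maxsize <= 0 means an infinite queue
print(put_after_consumer())   # 1
```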
[9] https://github.com/tornadoweb/tornado/blob/master/tornado/websocket.py#L1376
[10] https://github.com/tornadoweb/tornado/blob/f399f28fecc741667b63b7c20b930d7926d34ac3/tornado/queues.py#L81
[11] https://docs.python.org/3/library/queue.html