
group.of.nepali.translators team mailing list archive

[Bug 1903733] Re: Out of memory issue for websocket client

 

This bug was fixed in the package python-tornado - 4.5.3-1ubuntu0.2

---------------
python-tornado (4.5.3-1ubuntu0.2) bionic; urgency=low

  * d/p/0001-lp1903733-read-queue-of-1-message.patch (LP: #1903733)
    - fixes potential oom error with python-tornado client websocket
  * d/p/0002-lp1903733-Remove-unused-import.patch
    - code clean up

 -- Heather Lemon <heather.lemon@xxxxxxxxxxxxx>  Tue, 12 Jan 2021 20:57:40 +0000

** Changed in: python-tornado (Ubuntu Bionic)
       Status: Fix Committed => Fix Released

** Changed in: python-tornado (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह (Nepali Language Coordinators' Group), which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1903733

Title:
  Out of memory issue for websocket client

Status in python-tornado package in Ubuntu:
  Fix Released
Status in python-tornado source package in Xenial:
  Fix Released
Status in python-tornado source package in Bionic:
  Fix Released
Status in python-tornado source package in Focal:
  Fix Released
Status in python-tornado source package in Groovy:
  Fix Released
Status in python-tornado source package in Hirsute:
  Fix Released

Bug description:
  [Impact]
  Applications using python-tornado releases earlier than 5.1 (for example 4.2.1 in Xenial and 4.5.3 in Bionic) are susceptible to an out-of-memory error in the websocket client.

  [Other Info]

  Upstream commit(s):
  https://github.com/tornadoweb/tornado/pull/2351/commits/20becca336caae61cd24f7afba0e177c0a210c70

  $ git remote -v
  origin	https://github.com/tornadoweb/tornado.git (fetch)
  origin	https://github.com/tornadoweb/tornado.git (push)

  $ git describe --contains 20becca3
  v5.1.0b1~28^2~1

  $ rmadison python3-tornado
   => python3-tornado | 4.2.1-1ubuntu3      | xenial
   python3-tornado | 4.5.3-1             | bionic/universe
   => python3-tornado | 4.5.3-1ubuntu0.1    | bionic-updates/universe
   python3-tornado | 6.0.3+really5.1.1-3 | focal/universe
   python3-tornado | 6.0.4-2             | groovy/universe
   python3-tornado | 6.0.4-3             | hirsute/universe
   python3-tornado | 6.1.0-1             | hirsute-proposed/universe
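  The table can also be read mechanically: map each package version to the upstream Tornado version it ships and compare that with 5.1.0, since git describe places the fix in v5.1.0b1. A minimal sketch, assuming the Debian "+really" convention (the upstream version actually shipped is the part after "+really"); upstream_version and parse_rmadison are hypothetical helper names. It knows nothing about backported patches such as 4.5.3-1ubuntu0.2, so it only says whether the upstream code base already contains the change.

```python
import re

# First upstream tag containing commit 20becca3, per `git describe --contains`.
FIRST_FIXED = (5, 1, 0)

# The rmadison output quoted above.
RMADISON_OUTPUT = """\
 => python3-tornado | 4.2.1-1ubuntu3      | xenial
 python3-tornado | 4.5.3-1             | bionic/universe
 => python3-tornado | 4.5.3-1ubuntu0.1    | bionic-updates/universe
 python3-tornado | 6.0.3+really5.1.1-3 | focal/universe
 python3-tornado | 6.0.4-2             | groovy/universe
 python3-tornado | 6.0.4-3             | hirsute/universe
 python3-tornado | 6.1.0-1             | hirsute-proposed/universe
"""


def upstream_version(debian_version):
    """Map a Debian/Ubuntu package version to a numeric upstream version tuple."""
    version = debian_version.split(":", 1)[-1]  # drop an epoch such as "1:"
    version = version.rsplit("-", 1)[0]         # drop the Debian revision, e.g. "-1ubuntu3"
    if "+really" in version:
        # "6.0.3+really5.1.1" actually ships upstream 5.1.1
        version = version.split("+really", 1)[1]
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split("."))


def parse_rmadison(output):
    """Return {suite: True if the upstream version already contains the fix}."""
    result = {}
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _package, version, suite = fields
        result[suite] = upstream_version(version) >= FIRST_FIXED
    return result


if __name__ == "__main__":
    for suite, fixed in parse_rmadison(RMADISON_OUTPUT).items():
        print(f"{suite:28} {'upstream >= 5.1' if fixed else 'upstream < 5.1'}")
```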

  [Original Description]

  Tornado has no flow control for websockets, in the sense of TCP flow control [8]. A websocket client receives data as fast as it can and stores each message in a collections.deque. If the data is not consumed as fast as it is written, the deque grows without bound, ultimately leading to a memory error and the process being killed.
  The fix is to use a Queue of size 1 (Queue(1)) instead: the client reads messages by getting them from the queue.
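  To make the failure mode concrete, here is a simplified, hypothetical model of the pre-patch buffering (the class and function names are illustrative, not Tornado's code): every incoming message is appended to an unbounded collections.deque, and the application removes one only when it asks for it. When the server sends faster than the application reads, the backlog, which is the length client.py prints in the test case, grows in proportion to the number of messages sent.

```python
from collections import deque


class UnboundedReadBuffer:
    """Hypothetical model of a client that buffers incoming messages in a deque."""

    def __init__(self):
        self.read_queue = deque()  # no maxlen, so nothing limits its length

    def on_message(self, message):
        # Called for every frame as fast as the network delivers it;
        # it never waits for the application to catch up.
        self.read_queue.append(message)

    def read_message(self):
        # The application takes the oldest buffered message, or None if empty.
        return self.read_queue.popleft() if self.read_queue else None


def backlog_after(messages_sent, read_every):
    """Server sends `messages_sent` messages; the app reads one every `read_every`."""
    buf = UnboundedReadBuffer()
    for i in range(messages_sent):
        buf.on_message(f"keep alive {i}")
        if (i + 1) % read_every == 0:
            buf.read_message()
    return len(buf.read_queue)


if __name__ == "__main__":
    # Reading half as fast as the server sends leaves half of everything buffered.
    print(backlog_after(100_000, read_every=2))  # 50000
```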

  Patch file [0]
  Commit history [1]
  GitHub [2]
  Issue [3]

  [0] https://patch-diff.githubusercontent.com/raw/tornadoweb/tornado/pull/2351.patch
  [1] https://github.com/tornadoweb/tornado/pull/2351/commits
  [2] https://github.com/tornadoweb/tornado
  [3] https://github.com/tornadoweb/tornado/issues/2341

  [Test Case]

  I will be attaching two Python test files:
  client.py
  server.py

  # create an LXD container, limit its memory, and turn off swap
  $ sudo apt-get install lxc lxd
  $ lxd init
  $ lxc launch ubuntu:18.04 bionic-python-tornado

  # limit the container to 2 CPUs
  $ lxc config set bionic-python-tornado limits.cpu 2
  # limit the container's RAM to 150MB
  $ lxc config set bionic-python-tornado limits.memory 150MB
  # severely limit the amount of swap used [4]
  $ lxc config set bionic-python-tornado limits.memory.swap false

  # install dev tools and download the source code
  $ lxc exec bionic-python-tornado bash
  $ apt-get update
  $ apt install ubuntu-dev-tools -y
  $ pull-lp-source python-tornado bionic
  $ cd python-tornado-4.5.3
  $ sudo apt build-dep .

  # copy client.py and server.py into ~/python-tornado-4.5.3/demos,
  # e.g. with scp, or by creating the files there and pasting in their contents

  # build code
  $ python3 setup.py build
  $ python3 setup.py install

  # I have 3 terminals open: one for the server, one for the client,
  # and one running top to watch memory usage

  # run server.py, client.py, and top in separate terminals
  $ python3 demos/server.py
  $ python3 demos/client.py
  $ top

  client.py prints the length of the collections.deque.

  server.py prints messages like:
  message: keep alive

  * in top, press E (Shift+e) to show memory in MB
  top shows that swap is off or very low and that memory is capped at 150MB

  Although I never saw the expected out-of-memory exception raised in
  the Python process itself, the kernel's OOM killer entry can be found
  with dmesg:
  $ sudo dmesg | grep -i python

  It looks similar to this:
  [ 3250.067833] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=lxc.payload.iptest,mems_allowed=0,oom_memcg=/lxc.payload.iptest,task_memcg=/lxc.payload.iptest,task=python3,pid=44889,uid=1000000
  [ 3250.067838] Memory cgroup out of memory: Killed process 44889 (python3) total-vm:304616kB, anon-rss:235152kB, file-rss:0kB, shmem-rss:0kB, UID:1000000 pgtables:628kB oom_score_adj:0
  [ 3250.075096] oom_reaper: reaped process 44889 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0k
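  Instead of reading the log by eye, the "Killed process" entry can be picked out programmatically. A minimal sketch; the regular expression is an assumption based only on the sample lines above, and kernel log wording can vary between kernel versions.

```python
import re

# The three kernel log lines quoted above, used as sample input.
SAMPLE_DMESG = """\
[ 3250.067833] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=lxc.payload.iptest,mems_allowed=0,oom_memcg=/lxc.payload.iptest,task_memcg=/lxc.payload.iptest,task=python3,pid=44889,uid=1000000
[ 3250.067838] Memory cgroup out of memory: Killed process 44889 (python3) total-vm:304616kB, anon-rss:235152kB, file-rss:0kB, shmem-rss:0kB, UID:1000000 pgtables:628kB oom_score_adj:0
[ 3250.075096] oom_reaper: reaped process 44889 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0k
"""

# Matches "Killed process <pid> (<command>) ... anon-rss:<n>kB" within one line.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(dmesg_text, command="python3"):
    """Return (pid, anon-rss in kB) for every OOM kill of `command`."""
    kills = []
    for match in KILLED_RE.finditer(dmesg_text):
        if match.group("comm") == command:
            kills.append((int(match.group("pid")), int(match.group("anon_rss_kb"))))
    return kills


if __name__ == "__main__":
    print(find_oom_kills(SAMPLE_DMESG))  # [(44889, 235152)]
```

  On a live system you would pass it the text printed by sudo dmesg instead of the sample.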

  After either applying the patch or switching to Focal or later
  (pull-lp-source python-tornado focal), we can run the exact same setup
  again, and this time the new queue object only ever has a length of 1.

  This shows that before the patch, the collections.deque used to store
  incoming messages was unbounded and could grow: "If maxlen is not
  specified or is None, deques may grow to an arbitrary length over
  time." [6] Upstream switched to a blocking queue of size 1 (Queue(1)),
  so there is only ever one item in the queue at a time. [7]
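  A minimal sketch of the bounded approach, using asyncio.Queue(maxsize=1) from the standard library as a stand-in for tornado.queues.Queue(1), which offers a similar put()/get()/qsize() interface. The producer and consumer coroutines are hypothetical: the producer plays the role of the websocket receive loop and awaits put() for each message, and the consumer is a deliberately slow application. However many messages are sent, the queue never holds more than one.

```python
import asyncio


async def run_bounded(messages_sent, consumer_delay=0.001):
    """Return (largest queue length seen, messages received) for a Queue(1) pipeline."""
    queue = asyncio.Queue(maxsize=1)  # stand-in for tornado.queues.Queue(1)
    max_seen = 0

    async def producer():
        # Plays the role of the websocket receive loop (hypothetical).
        nonlocal max_seen
        for i in range(messages_sent):
            # put() waits while the queue already holds a message, so the
            # producer can never get more than one message ahead.
            await queue.put(f"keep alive {i}")
            max_seen = max(max_seen, queue.qsize())
        await queue.put(None)  # sentinel: no more messages

    async def consumer():
        # A deliberately slow application reading one message at a time.
        received = 0
        while True:
            message = await queue.get()
            if message is None:
                return received
            received += 1
            await asyncio.sleep(consumer_delay)

    _, received = await asyncio.gather(producer(), consumer())
    return max_seen, received


if __name__ == "__main__":
    print(asyncio.run(run_bounded(100)))  # (1, 100)
```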

  [4] https://discuss.linuxcontainers.org/t/limiting-swap-in-lxc-container/6343/4
  [5] attached screenshot
  [6] https://docs.python.org/3/library/collections.html?highlight=collections%20deque#collections.deque
  [7] https://www.tornadoweb.org/en/stable/queues.html
  [8] https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Flow_control

  [Where Problems Could Occur]

  Problems could occur if messages are not being consumed from the queue, for example:
  """
    File "/usr/local/lib/python3.8/dist-packages/tornado-5.1.1-py3.8-linux-x86_64.egg/tornado/queues.py", line 202, in put_nowait
      raise QueueFull
  tornado.queues.QueueFull
  """
  If messages are never taken off the queue, it always holds an item: put_nowait() raises QueueFull, as in the traceback above, and put() waits until an item is consumed.
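  The failure mode can be reproduced in isolation. This sketch uses asyncio.Queue(maxsize=1) and asyncio.QueueFull as stand-ins for tornado.queues.Queue(1) and tornado.queues.QueueFull: with nothing consuming, put_nowait() fails at once, while put() simply waits (here cut off by an arbitrary 50 ms timeout chosen for the demo).

```python
import asyncio


async def demo():
    queue = asyncio.Queue(maxsize=1)   # stand-in for tornado.queues.Queue(1)
    queue.put_nowait("first message")  # fills the only slot

    # Nothing consumes, so a non-blocking put fails immediately...
    try:
        queue.put_nowait("second message")
        nowait_result = "accepted"
    except asyncio.QueueFull:
        nowait_result = "QueueFull"

    # ...and a blocking put just waits; give up after 50 ms.
    try:
        await asyncio.wait_for(queue.put("second message"), timeout=0.05)
        put_result = "accepted"
    except asyncio.TimeoutError:
        put_result = "timed out"

    return nowait_result, put_result, queue.qsize()


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('QueueFull', 'timed out', 1)
```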

  [Extra Notes]

  Upstream solved the OOM error by using a blocking queue of size one,
  Queue(1) [9], which is Tornado's own queue implementation [10] rather
  than the standard library's. If items are not consumed from this queue,
  there will be problems there as well.

  This pushes the issue from the application/library onto the network
  side: because the client stops reading new messages while the queue is
  full, unread data backs up in the connection, and dropped data/packets
  in the network become more likely, since there is only ever one item in
  the queue at a time.
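  The backpressure argument can be modelled as follows, assuming (as the upstream change appears to do) that the receive loop waits for put() to finish before reading the next frame. The names are hypothetical, asyncio.Queue stands in for tornado.queues.Queue, and a counter stands in for frames read from the socket. While the application reads nothing, only two frames are taken off the "socket" (one in the queue, one held by the pending put()); the rest stay unread, which is where TCP flow control would take over.

```python
import asyncio


async def frames_read_while_stalled(total_frames=1000, stall=0.05):
    """Count frames pulled off the simulated socket while the app reads nothing."""
    queue = asyncio.Queue(maxsize=1)  # stand-in for tornado.queues.Queue(1)
    frames_read = 0

    async def receive_loop():
        nonlocal frames_read
        for i in range(total_frames):
            frames_read += 1    # one more frame taken off the socket
            await queue.put(i)  # waits here while the queue is full

    task = asyncio.create_task(receive_loop())
    await asyncio.sleep(stall)  # the application is busy and never calls get()
    stalled_count = frames_read
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return stalled_count


if __name__ == "__main__":
    print(asyncio.run(frames_read_while_stalled()))  # 2
```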

  " Constructor for a FIFO queue. maxsize is an integer that sets the
  upperbound limit on the number of items that can be placed in the
  queue. Insertion will block once this size has been reached, until
  queue items are consumed. If maxsize is less than or equal to zero,
  the queue size is infinite." [11]
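  The quoted passage describes the standard library's queue.Queue [11], not Tornado's queue, but the maxsize semantics it states are easy to check directly. A short sketch:

```python
import queue

# maxsize <= 0: the queue size is infinite, so puts never block or fail.
unbounded = queue.Queue(maxsize=0)
for i in range(10_000):
    unbounded.put_nowait(i)
print(unbounded.qsize())  # 10000

negative = queue.Queue(maxsize=-1)  # also treated as infinite
negative.put_nowait("x")
print(negative.full())  # False

# maxsize = 1: a second insertion blocks until an item is consumed.
bounded = queue.Queue(maxsize=1)
bounded.put("only slot")
print(bounded.full())  # True
try:
    bounded.put("second", timeout=0.01)  # block for at most 10 ms
except queue.Full:
    print("queue.Full after timeout")
```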

  [9] https://github.com/tornadoweb/tornado/blob/master/tornado/websocket.py#L1376
  [10] https://github.com/tornadoweb/tornado/blob/f399f28fecc741667b63b7c20b930d7926d34ac3/tornado/queues.py#L81
  [11] https://docs.python.org/3/library/queue.html
