
yahoo-eng-team team mailing list archive

[Bug 1793411] [NEW] Dashboard memory leaks

 

Public bug reported:

1. Issue description

Recently, we found that the server hosting the Horizon dashboard went
OOM several times because of the Horizon services. After restarting the
dashboard, memory usage climbs very quickly whenever the
/project/network_topology/ path is accessed.

2. How to reproduce

Log into the dashboard, go to the 'Network Topology' tab and leave it
open (it auto-refreshes every 10s by default), then monitor the memory
usage on the host.

3. Versions and Components

Dashboard:  Stable/Pike
Server:   uWSGI 1.9.17-1
OS:       Ubuntu 14.04 trusty
Python:   2.7.6

As the memoized code has changed very little since Pike, you should be
able to reproduce this on the Queens/Rocky releases as well.

4. The investigation

The root cause of the memory leak is the memoized decorator
(horizon/utils/memoized.py), which is used to cache function calls in
Horizon.
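
Roughly speaking, the decorator keeps a per-function cache keyed on the
call arguments, using weak references for arguments that support them.
The following is only a simplified sketch of that idea (not the actual
horizon.utils.memoized code), assuming hashable arguments:

  import functools
  import weakref

  def memoized_sketch(func):
      # Simplified stand-in for the memoized decorator; illustration only.
      cache = {}

      @functools.wraps(func)
      def wrapper(*args):
          key = []
          for arg in args:
              try:
                  # Prefer a weak reference so the cache alone does not
                  # keep the argument alive ...
                  key.append(weakref.ref(arg))
              except TypeError:
                  # ... but fall back to the object itself for types that
                  # cannot be weakly referenced (str, int, tuple, ...).
                  key.append(arg)
          key = tuple(key)
          if key not in cache:
              cache[key] = func(*args)
          return cache[key]

      return wrapper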

After disabling it, the memory growth is kept under control.

The following is a comparison of the memory growth (measured with
guppy) for each request to /project/network_topology; a sketch of the
measurement follows the list:

 - original (no code change)        684kb

 - do garbage collection manually   185kb

 - disable memorize cache           10kb
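
The per-request deltas above were collected with guppy's heapy; the
exact hook points in the request cycle are an assumption here, but the
general pattern looks like this:

  # Sketch: measure heap growth for a single request with guppy/heapy.
  from guppy import hpy

  hp = hpy()
  hp.setrelheap()     # take the current heap as the reference point

  # ... handle one /project/network_topology request here ...

  delta = hp.heap()   # only objects allocated since setrelheap()
  print(delta)        # "Partition of a set of N objects. Total size = ..."
  print(delta.byrcs)  # the same objects grouped by referrers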

As we know, memoized uses weakref to cache objects. A weak reference to
an object is not enough to keep the object alive: when the only
remaining references to a referent are weak references, garbage
collection is free to destroy the referent and reuse its memory for
something else.
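
As a quick standard-library illustration of that behaviour:

  import gc
  import weakref

  class Thing(object):
      pass

  obj = Thing()
  ref = weakref.ref(obj)
  print(ref() is obj)   # True: the referent is still alive

  del obj               # drop the last strong reference
  gc.collect()
  print(ref())          # None: the referent has been collected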

In memory we can indeed see lots of weakref-related objects; the
following is an example:

Partition of a set of 394 objects. Total size = 37824 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0    197  50    18912  50     18912  50 _cffi_backend.CDataGCP
     1    197  50    18912  50     37824 100 weakref.KeyedRefq

But the rest of the leaked objects are not weak references. The
following is the per-request change in memory objects for
/project/network_topology after running garbage collection manually:

Partition of a set of 1017 objects. Total size = 183680 bytes.
 Index  Count   %     Size   % Cumulative  % Referrers by Kind (class / dict of class)
     0    419  41    58320  32     58320  32 dict (no owner)
     1    100  10    23416  13     81736  44 list
     2    135  13    15184   8     96920  53 <Nothing>
     3      2   0     6704   4    103624  56 urllib3.connection.VerifiedHTTPSConnection
     4      2   0     6704   4    110328  60 urllib3.connectionpool.HTTPSConnectionPool
     5      1   0     3352   2    113680  62 novaclient.v2.client.Client
     6      2   0     2096   1    115776  63 OpenSSL.SSL.Connection
     7      2   0     2096   1    117872  64 OpenSSL.SSL.Context
     8      2   0     2096   1    119968  65 Queue.LifoQueue
     9     12   1     2096   1    122064  66 dict of urllib3.connectionpool.HTTPSConnectionPool

Most of them are dicts. The following shows those dicts broken down by
class; as you can see, most of the entries are not weakref objects:

Partition of a set of 419 objects. Total size = 58320 bytes.
 Index  Count   %     Size   % Cumulative  % Class
     0    362  86    50712  87     50712  87 unicode
     1     27   6     3736   6     54448  93 list
     2      5   1     2168   4     56616  97 dict
     3     22   5     1448   2     58064 100 str
     4      2   0      192   0     58256 100 weakref.KeyedRef
     5      1   0       64   0     58320 100 keystoneauth1.discover.Discover

5. The issue

So the problem is that memoized does not work the way we expect: it
allocates memory to cache objects, but some of those objects can never
be released.

** Affects: horizon
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1793411

Title:
  Dashboard memory leaks

Status in OpenStack Dashboard (Horizon):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1793411/+subscriptions

