kernel-packages team mailing list archive

[Bug 1328088] Re: Kernel network namespace performance regression during rcu development on kernels above 3.8

 

** Description changed:

  Please follow this at: http://people.canonical.com/~inaddy/lp1328088/.
  The same description is kept there, updated on a daily basis.
  
  --
- 
- It was brought to my attention that "fake router creation" scalability
- was affected during kernel development.
+ It was brought to my attention that network namespace creation scalability was affected during kernel development.
  
  The following scripts were used for all the tests and chart generation:
  
  http://people.canonical.com/~inaddy/lp1328088/make_fake_routers.sh
  http://people.canonical.com/~inaddy/lp1328088/parse.py
  
  I measured how many "fake routers" (using the script above) could be added
  per second, from 0 up to the 4000 created routers mark. Using this script
  and a git bisect on the kernel tree I was led to one specific commit causing
- regression: #911af505 "rcu: Provide compile-time control for no-CBs
- CPUs".
+ regression: #911af50 "rcu: Provide compile-time control for no-CBs
+ CPUs". Even Though this change was experimental at that point, it
+ introduced a performance scalability regression (explained below) that
+ still last and seems to be the default option for distributions
+ nowadays.
  
- It appeared that rcu, rcu callbacks and no-cb cpus were causing the
- issue so every commit that changed any of this files: "kernel/rcutree.c
- kernel/rcutree.h kernel/rcutree_plugin.h include/trace/events/rcu.h
- include/linux/rcupdate.h" was tested. The idea was to check performance
- regression during rcu development. In the worst case I would have data
- for performance regression during kernel development (since we have rcu
- commits from 3.8 to 3.14).
+ RCU-related code looked to be responsible for the problem. With that,
+ every commit from v3.8..master that changed any of these files:
+ "kernel/rcutree.c kernel/rcutree.h kernel/rcutree_plugin.h
+ include/trace/events/rcu.h include/linux/rcupdate.h" was tested. The
+ idea was to check for performance regressions during rcu development.
+ In the worst case, with the regression not being related to rcu, I would
+ still have data to interpret the performance/scalability regression.
  
  All text below refers to 2 groups of charts generated during the
  study:
  
  1) Kernel git tags from 3.8 to 3.14.
  http://people.canonical.com/~inaddy/lp1328088/charts/250-tag.html
  
  2) Kernel git commits for rcu development (111 commits).
  http://people.canonical.com/~inaddy/lp1328088/charts/250.html
  
  Since there were differences in results depending on how many cpus were
  used or how the no-cb cpus were configured, 3 kernel config options were
  used for every measurement:
  
- - CONFIG_RCU_NOCB_CPU (disabled: nocbno)
- - CONFIG_RCU_NOCB_CPU_ALL (enabled: nocball)
- - CONFIG_RCU_NOCB_CPU_NONE (enabled: nocbnone)
+ - CONFIG_RCU_NOCB_CPU (disabled): nocbno
+ - CONFIG_RCU_NOCB_CPU_ALL (enabled): nocball
+ - CONFIG_RCU_NOCB_CPU_NONE (enabled): nocbnone
  
- After charts generation and study it was clear that NOCB_CPU_ALL (4
- cpus) affected the "fake routers" creation process performance and this
+ Obs: For the 1-cpu case, nocbno, nocbnone and nocball behave the same,
+ since with only 1 cpu there is no no-cb cpu.
+ 
+ After the charts were generated it was clear that NOCB_CPU_ALL (4 cpus)
+ affected the "fake routers" creation process performance, and this
  regression continues up to the upstream version. It was also clear that,
- after this commit, there is no scalability executing this test with more
- than 1 cpu.
+ after commit #911af50, having more than 1 cpu does not improve
+ performance/scalability for netns; it makes it worse.
+ 
+ #911af50
+ ...
+ +#ifdef CONFIG_RCU_NOCB_CPU_ALL
+ +   pr_info("\tExperimental no-CBs for all CPUs\n");
+ +   cpumask_setall(rcu_nocb_mask);
+ +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
+ ...
  
  Comparing standing out points (see charts):
  
  #81e5949 - good
  #911af50 - bad
- #6faf728 - not good enough
  
- I was able to see that from the script above the following lines were
- affected:
+ I was able to see that, from the script above, the following lines
+ cause a major impact on netns scalability/performance:
  
- 1) ip netns add -> huge performance regression
+ 1) ip netns add -> huge performance regression:
+     1 cpu: no regression
+     4 cpu: regression for NOCB_CPU_ALL
+     obs: regression from 250 netns/sec to 50 netns/sec
+          on the 500 netns already created mark
+ 
  2) ip netns exec -> some performance regression
+     1 cpu: no regression
+     4 cpu: regression for NOCB_CPU_ALL
+     obs: regression from 40 netns/sec (+1 exec per netns
+          creation) to 20 netns/sec on the 500 netns
+          created mark
  
- #
- # Assumption
- #
+ # Assumption (to be confirmed)
  
  rcu callbacks being offloaded to other cpus caused regression in
- unshare(CLONE_NEWNET) code.
- 
- # Specific kernel entry being investigated:
- 
- unshare(CLONE_NEWNET)
+ copy_net_ns() <- create_new_namespaces(), i.e. unshare(CLONE_NEWNET).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1328088

Title:
  Kernel network namespace performance regression during rcu development
  on kernels above 3.8

Status in The Linux Kernel:
  In Progress
Status in “linux” package in Ubuntu:
  New

Bug description:
  Please follow this at:
  http://people.canonical.com/~inaddy/lp1328088/. The same description
  is kept there, updated on a daily basis.

  --
  It was brought to my attention that network namespace creation scalability was affected during kernel development.

  The following scripts were used for all the tests and chart generation:

  http://people.canonical.com/~inaddy/lp1328088/make_fake_routers.sh
  http://people.canonical.com/~inaddy/lp1328088/parse.py
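
  The script itself is not reproduced here. As a rough sketch of what
  each "fake router" amounts to (an assumption based on the "ip netns
  add"/"ip netns exec" lines discussed below, not the literal contents
  of make_fake_routers.sh), a loop along these lines exercises the same
  path:

    #!/bin/bash
    # hypothetical sketch; names like "fr-$i" are illustrative only
    total=${1:-4000}
    start=$(date +%s)
    for i in $(seq 1 "$total"); do
        ip netns add "fr-$i"                       # the "ip netns add" line
        ip netns exec "fr-$i" ip link set lo up    # the "ip netns exec" line
        if [ $((i % 250)) -eq 0 ]; then            # report rate every 250 netns
            now=$(date +%s)
            echo "$i netns created, ~$((i / (now - start + 1))) netns/sec avg"
        fi
    done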

  I measured how many "fake routers" (using the script above) could be
  added per second, from 0 up to the 4000 created routers mark. Using
  this script and a git bisect on the kernel tree I was led to one
  specific commit causing the regression: #911af50 "rcu: Provide
  compile-time control for no-CBs CPUs". Even though this change was
  experimental at that point, it introduced a performance/scalability
  regression (explained below) that still lasts, and it seems to be the
  default option for distributions nowadays.
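
  The bisect itself followed the usual workflow (sketch below; the
  good/bad endpoints shown are illustrative, not necessarily the exact
  ones used):

    git bisect start
    git bisect good v3.8      # endpoint known to scale well
    git bisect bad v3.9       # endpoint showing the regression
    # at each step: build, boot, run make_fake_routers.sh, parse the
    # rate, then mark the revision accordingly:
    git bisect good           # or: git bisect bad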

  RCU-related code looked to be responsible for the problem. With that,
  every commit from v3.8..master that changed any of these files:
  "kernel/rcutree.c kernel/rcutree.h kernel/rcutree_plugin.h
  include/trace/events/rcu.h include/linux/rcupdate.h" was tested. The
  idea was to check for performance regressions during rcu development.
  In the worst case, with the regression not being related to rcu, I
  would still have data to interpret the performance/scalability
  regression.

  All text below refers to 2 groups of charts generated during the
  study:

  1) Kernel git tags from 3.8 to 3.14.
  http://people.canonical.com/~inaddy/lp1328088/charts/250-tag.html

  2) Kernel git commits for rcu development (111 commits).
  http://people.canonical.com/~inaddy/lp1328088/charts/250.html

  Since there were differences in results depending on how many cpus
  were used or how the no-cb cpus were configured, 3 kernel config
  options were used for every measurement:

  - CONFIG_RCU_NOCB_CPU (disabled): nocbno
  - CONFIG_RCU_NOCB_CPU_ALL (enabled): nocball
  - CONFIG_RCU_NOCB_CPU_NONE (enabled): nocbnone

  Obs: For the 1-cpu case, nocbno, nocbnone and nocball behave the same,
  since with only 1 cpu there is no no-cb cpu.
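
  The three variants above can be produced from a kernel tree with
  something like the following (a sketch, assuming scripts/config and an
  existing .config; the exact build procedure used for the charts is not
  documented here):

    # nocbno: callback offloading compiled out
    scripts/config --disable RCU_NOCB_CPU

    # nocball: offload RCU callbacks from every cpu
    scripts/config --enable RCU_NOCB_CPU --enable RCU_NOCB_CPU_ALL

    # nocbnone: offloading compiled in, but no cpu in the no-CBs set
    scripts/config --enable RCU_NOCB_CPU --enable RCU_NOCB_CPU_NONE

    make olddefconfig && make -j"$(nproc)"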

  After the charts were generated it was clear that NOCB_CPU_ALL (4
  cpus) affected the "fake routers" creation process performance, and
  this regression continues up to the upstream version. It was also
  clear that, after commit #911af50, having more than 1 cpu does not
  improve performance/scalability for netns; it makes it worse.

  #911af50
  ...
  +#ifdef CONFIG_RCU_NOCB_CPU_ALL
  +   pr_info("\tExperimental no-CBs for all CPUs\n");
  +   cpumask_setall(rcu_nocb_mask);
  +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
  ...

  Comparing standing out points (see charts):

  #81e5949 - good
  #911af50 - bad

  I was able to see that, from the script above, the following lines
  cause a major impact on netns scalability/performance:

  1) ip netns add -> huge performance regression:
      1 cpu: no regression
      4 cpu: regression for NOCB_CPU_ALL
      obs: regression from 250 netns/sec to 50 netns/sec
           on the 500 netns already created mark

  2) ip netns exec -> some performance regression
      1 cpu: no regression
      4 cpu: regression for NOCB_CPU_ALL
      obs: regression from 40 netns/sec (+1 exec per netns
           creation) to 20 netns/sec on the 500 netns
           created mark
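
  The per-second figures above can be approximated with a crude loop
  like the one below (a simplified sketch, not the parse.py processing
  used for the charts; namespace names are illustrative):

    # pre-create 500 namespaces, then count how many more fit in 10s
    for i in $(seq 1 500); do ip netns add "pre-$i"; done
    count=0; end=$(( $(date +%s) + 10 ))
    while [ "$(date +%s)" -lt "$end" ]; do
        ip netns add "probe-$count" && count=$((count + 1))
    done
    echo "~$((count / 10)) netns/sec with 500 netns already present"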

  # Assumption (to be confirmed)

  rcu callbacks being offloaded to other cpus caused regression in
  copy_net_ns() <- create_new_namespaces(), i.e. unshare(CLONE_NEWNET).
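
  A quick way to confirm that "ip netns add" goes through
  unshare(CLONE_NEWNET), and therefore copy_net_ns(), is to trace the
  syscall (assuming strace is available; the follow-up bind mount of
  /proc/self/ns/net done by iproute2 is omitted here):

    strace -f -e trace=unshare ip netns add test-ns
    # expected output includes a line like (illustrative):
    #   unshare(CLONE_NEWNET) = 0
    ip netns del test-ns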

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1328088/+subscriptions

