touch-packages team mailing list archive

[Bug 1337873] Re: Precise, Trusty, Utopic - ifupdown initialization problems caused by race condition

 

I'm getting back to this after some time. After the discussion was
brought upstream we did not get feedback on the proposed changes, but
after investigating further it is clear that ifupdown suffers from race
conditions that cannot be solved simply by creating:

1) a big lock, since ifup/ifdown/ifquery are reentrant*
2) a big lock, since it does not account for interface order/priority in parallel executions**
3) a fine-grained lock, since it does not account for interface order/priority in parallel executions**

* could be solved by having the up/down scripts set an ENV variable so that child invocations do not lock.
** groups of interfaces, such as a bridge and all interfaces connected to it, or an interface and all VLANs connected to it.
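
To illustrate the first footnote, here is a minimal sketch of a big lock that stays safe for reentrant calls. It is not ifupdown code: with_big_lock, LOCK_FILE and IFUPDOWN_LOCK_HELD are hypothetical names, and it assumes flock(1) from util-linux is available. In this sketch the lock holder exports the marker variable before running anything else.

```shell
#!/bin/sh
# Sketch only: none of these names exist in ifupdown.
LOCK_FILE="${LOCK_FILE:-/tmp/ifupdown-sketch.lock}"

# Run "$@" under one big lock, unless an ancestor process already holds it.
with_big_lock() {
    if [ -n "${IFUPDOWN_LOCK_HELD:-}" ]; then
        # Reentrant call (e.g. from an up/down script): skip locking,
        # otherwise the child would wait forever on its own parent.
        "$@"
        return
    fi
    (
        flock 9 || exit 1          # block until the big lock is free
        IFUPDOWN_LOCK_HELD=1
        export IFUPDOWN_LOCK_HELD  # children will see the marker
        "$@"
    ) 9>"$LOCK_FILE"
}
```

Without the marker variable, a nested call would open the lock file again and block on the lock its own parent holds. Note that this only addresses reentrancy; it does nothing for ordering between interfaces.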

The final approach here will be to guarantee that:

1) interfaces are locked independently across executions
2) locks respect the interface hierarchy (locking groups of inter-connected interfaces such as bridges/interfaces, interfaces/vlans)
3) all up/down scripts are reviewed after any locking mechanism change (to avoid deadlock by reentrant calls)

IMO

1) stanzas should be created to "group" the interfaces to be locked (for parallel executions), respecting the hierarchy/order between them
2) locking and state have to be kept together and independent

FINALLY

The change to guarantee all of that will involve both a code change AND
an interfaces file change (adding special stanzas to make sure
appropriate ordering and locking are applied during interface
activation). It is not clear whether this change will be smooth enough
for a "stable release update". If not, I'll try to provide a PPA with
any needed code change for those who suffer from this issue.

FOR NOW

The only way to guarantee interface activation ordering (without
suffering from intermittent race conditions like the one explained in
this bug) is to activate the interfaces one by one outside the
sysv/upstart scripts OR to use "pre/post" commands with reentrant calls
to ifupdown based on the desired order.
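
As a rough sketch of the "one by one" option, something like the following could be run outside the sysv/upstart scripts. The names ifup_in_order and IFUP_CMD are hypothetical (IFUP_CMD exists only so the command can be swapped for a dry run), and the interface names are just an illustration.

```shell
#!/bin/sh
# Sketch only: bring interfaces up strictly one at a time, in the order given.
# IFUP_CMD is an assumption so that the command can be replaced for a dry run.
IFUP_CMD="${IFUP_CMD:-ifup}"

ifup_in_order() {
    for iface in "$@"; do
        # Stop at the first failure so later interfaces are not started
        # on top of a missing dependency.
        "$IFUP_CMD" "$iface" || return 1
    done
}

# Example (the order is the caller's choice):
#   ifup_in_order eth0 eth0.100
```

Because every ifup finishes before the next one starts, there is no parallel execution left to race against, at the cost of a slower activation.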

Any comments here are much appreciated.

Thank you

Rafael Tinoco

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to ifupdown in Ubuntu.
https://bugs.launchpad.net/bugs/1337873

Title:
  Precise, Trusty, Utopic - ifupdown initialization problems caused by
  race condition

Status in “ifupdown” package in Ubuntu:
  In Progress
Status in “ifupdown” package in Debian:
  New

Bug description:
  * Please note that my bonding examples use eth1 and eth2 as slave
   interfaces.

  ifupdown has some race conditions, explained below. ifenslave does not
  behave well when the sysv networking and upstart network-interface
  scripts run together.

  !!!
  case 1)
  (a) ifup eth0                     (b) ifup -a for eth0
  -----------------------------------------------------------------
  1-1. Lock ifstate.lock file.
                                    1-1. Wait for locking ifstate.lock
                                        file.
  1-2. Read ifstate file to check
       the target NIC.
  1-3. close(=release) ifstate.lock
       file.
  1-4. Judge that the target NIC
       isn't processed.
                                    1-2. Read ifstate file to check
                                         the target NIC.
                                    1-3. close(=release) ifstate.lock
                                         file.
                                    1-4. Judge that the target NIC
                                         isn't processed.
  2. Lock and update ifstate file.
     Release the lock.
                                    2. Lock and update ifstate file.
                                       Release the lock.
  !!!
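
The problem in case 1 is that the check (steps 1-2 to 1-4) and the update (step 2) happen under two separate lock acquisitions, so both processes can decide the NIC is unprocessed. A minimal sketch of doing both under a single lock follows. It is a stand-in and not ifupdown's code: claim_iface and STATE_FILE are hypothetical names, the state file here is a simplified one-name-per-line list rather than the real ifstate format, and flock(1) from util-linux is assumed.

```shell
#!/bin/sh
# Sketch only: a stand-in for the ifstate check, not ifupdown's code.
# STATE_FILE and claim_iface are hypothetical names.
STATE_FILE="${STATE_FILE:-/tmp/ifstate-sketch}"

# Succeeds only for the first caller that claims the interface.
claim_iface() {
    iface=$1
    (
        flock 9 || exit 2                       # one lock for check AND update
        touch "$STATE_FILE"
        if grep -qxF -- "$iface" "$STATE_FILE"; then
            exit 1                              # already claimed elsewhere
        fi
        echo "$iface" >> "$STATE_FILE"          # recorded before the lock drops
    ) 9>"$STATE_FILE.lock"
}
```

With this shape, process (b) would see the entry written by (a) and back off instead of processing the same NIC a second time.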

  to be explained

  !!!
  case 2)
  (a) ifenslave of eth0             (b) ifenslave of eth0
  ------------------------------------------------------------------
  3. Execute ifenslave of eth0.     3. Execute ifenslave of eth0.
  4. Link down the target NIC.
  5. Write NIC id to
     /sys/class/net/bond0/bonding/
     slaves; the NIC then comes up.
                                    4. Link down the target NIC.
                                    5. Fails to write NIC id to
                                       /sys/class/net/bond0/bonding/
                                       slaves because it is already
                                       written.
  !!!
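
In case 2 the second ifenslave run takes the link down even though the NIC is already enslaved. One possible mitigation, shown here only as a sketch, is to look at the bond's slaves file before touching the link. is_enslaved is a hypothetical helper and not part of ifenslave; on its own it only narrows the window, so it would still need to run under a proper lock.

```shell
#!/bin/sh
# Sketch only: is_enslaved is a hypothetical helper, not part of ifenslave.
# $1 = path to the bond's "slaves" file, $2 = interface name.
is_enslaved() {
    [ -r "$1" ] || return 1
    # The sysfs file holds a space-separated list such as "eth1 eth2".
    for s in $(cat "$1"); do
        [ "$s" = "$2" ] && return 0
    done
    return 1
}

# Possible use before touching the link:
#   is_enslaved /sys/class/net/bond0/bonding/slaves eth1 || ip link set eth1 down
```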

  #####################################################################

  #### My setup:

  root@provisioned:~# cat /etc/modprobe.d/bonding.conf
  alias bond0 bonding
  options bonding mode=1 arp_interval=2000

  Both /etc/init.d/networking and upstart network-interface are
  enabled.

  #### Beginning:

  root@provisioned:~# cat /etc/network/interfaces
  # /etc/network/interfaces

  auto lo
  iface lo inet loopback

  auto eth0
  iface eth0 inet dhcp

  I'm able to boot with both scripts (networking and network-interface)
  enabled with no problem. I can also boot with only the "networking"
  script enabled:

  ---
  root@provisioned:~# initctl list | grep network
  network-interface stop/waiting
  networking start/running
  ---

  OR only the script "network-interface" enabled:

  ---
  root@provisioned:~# initctl list | grep network
  network-interface (eth2) start/running
  network-interface (lo) start/running
  network-interface (eth0) start/running
  network-interface (eth1) start/running
  ---

  #### Enabling bonding:

  Following the ifenslave configuration example
  (/usr/share/doc/ifenslave/examples/two_hotplug_ethernet), my
  /etc/network/interfaces has to look like this:

  ---
  auto eth1
  iface eth1 inet manual
      bond-master bond0

  auto eth2
  iface eth2 inet manual
      bond-master bond0

  auto bond0
  iface bond0 inet static
      bond-mode 1
      bond-miimon 100
      bond-primary eth1 eth2
      address 192.168.169.1
      netmask 255.255.255.0
      broadcast 192.168.169.255
  ---

  Having both scripts running does not make any difference here, since
  the "bond-slaves" keyword, which ifenslave needs in order to work, is
  missing and the slave interfaces are set to "manual".

  Ifenslave code:

  """
  for slave in $BOND_SLAVES ; do
  ...
  # Ensure $slave is down.
  ip link set "$slave" down 2>/dev/null
  if ! sysfs_add slaves "$slave" 2>/dev/null ; then
  	echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER ready and a bonding interface ?" >&2
  else
  	# Bring up slave if it is the target of an allow-bondX stanza.
  	# This is usefull to bring up slaves that need extra setup.
  	if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
  		ifup $v --allow "$BOND_MASTER" "$slave"
  	fi
  """

  Without the "bond-slaves" keyword on the master interface declaration,
  ifenslave will NOT bring any slave interface up when ifup is invoked
  for the "master" interface.

  *********** Part 1

  So, with the networking sysv init script AND the upstart
  network-interface script running together, the following example works:

  ---
  root@provisioned:~# cat /etc/network/interfaces
  # /etc/network/interfaces

  auto lo
  iface lo inet loopback

  auto eth0
  iface eth0 inet dhcp

  auto eth1
  iface eth1 inet manual
      bond-master bond0

  auto eth2
  iface eth2 inet manual
      bond-master bond0

  auto bond0
  iface bond0 inet static
      bond-mode 1
      bond-miimon 100
      bond-primary eth1
      bond-slaves eth1 eth2
      address 192.168.169.1
      netmask 255.255.255.0
      broadcast 192.168.169.255
  ---

  The ifenslave script sets the link down on all slave interfaces
  declared by the "bond-slaves" keyword and assigns them to the correct
  bond. The ifenslave script ONLY tries to make a reentrant call to
  ifupdown if the slave interfaces have an "allow-bondX" stanza (not our
  case).

  So this should not work: when ifup is called for the master bonding
  interface (bond0), ifenslave does not configure slaves that lack an
  "allow-bondX" stanza. What is happening, and why is it working?

  If we disable the upstart "network-interface" script, our bonding
  stops working at boot. This is because upstart was the one bringing
  the slave interfaces up (with the configuration above), not the
  sysv networking scripts.

  It is clear that ifenslave, when invoked from the sysv script, can set
  the slave interface down at any time (even during upstart script
  execution), so it might work or it might not:

  """
  ip link set "$slave" down 2>/dev/null
  """

  root@provisioned:~# initctl list | grep network-interface
  network-interface (eth2) start/running
  network-interface (lo) start/running
  network-interface (bond0) start/running
  network-interface (eth0) start/running
  network-interface (eth1) start/running

  Since the interface has to be down before it can be enslaved,
  running both scripts together (upstart and sysv) can create a
  situation where upstart brings the slave interface up but ifenslave,
  called from the sysv script, takes it down and never brings it up
  again (because it does not have an "allow-bondX" stanza).

  *********** Part 2

  What if I disable upstart "network-interface", keep only the sysv
  script, and add the "allow-bondX" stanza to the slave interfaces?

  The funny part begins... without upstart, the ifupdown tool calls
  ifenslave, for bond0 interface, and ifenslave calls this line:

  """
  for slave in $BOND_SLAVES ; do
  ...
  	if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
  		ifup $v --allow "$BOND_MASTER" "$slave"
  	fi
  """

  But ifenslave then waits forever for the bond0 interface to come
  online. We now have a chicken-and-egg situation:

  * ifupdown tries to put bond0 interface online.
  * we are not running upstart network-interface script.
  * ifupdown for bond0 calls ifenslave.
  * ifenslave tries to find interfaces with "allow-bondX" stanza
  * ifenslave tries to ifup slave interfaces with that stanza
  * slave interfaces keep forever waiting for the master
  * master is waiting for the slave interface
  * slave interface is waiting for the master interface
  ... :D

  And we have an infinite loop for ifenslave:

  """ 
  # Wait for the master to be ready
  [ ! -f /run/network/ifenslave.$BOND_MASTER ] && 
  	echo "Waiting for bond master $BOND_MASTER to be ready"
  while :; do
      if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
          break
      fi
      sleep 0.1
  done
  """
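
For comparison, a bounded version of such a wait could look like the sketch below. This is not what ifenslave does: wait_for_file is a hypothetical helper and the poll limit is an arbitrary choice. It would not fix the ordering problem itself, but it would turn an endless hang at boot into an error.

```shell
#!/bin/sh
# Sketch only: wait_for_file is a hypothetical helper, not part of ifenslave.
# $1 = file to wait for, $2 = maximum number of 0.1 s polls.
wait_for_file() {
    tries=0
    while [ ! -f "$1" ]; do
        [ "$tries" -ge "$2" ] && return 1   # give up instead of hanging boot
        tries=$((tries + 1))
        sleep 0.1
    done
    return 0
}

# Possible use:
#   wait_for_file "/run/network/ifenslave.$BOND_MASTER" 100 || exit 1
```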

  *********** Conclusion

  These failures can be triggered if the right conditions are set (like
  the ones I just showed). Without parallel ifupdown executions (sysv
  and upstart, for example), an infinite loop can happen during boot.

  Having parallel ifupdown executions can trigger race conditions
  between:

  1) ifupdown itself (case 1 in the bug description).
  2) ifupdown and the ifenslave script (case 2 in the bug description).
