touch-packages team mailing list archive
Message #134230
[Bug 1337873] Re: ifupdown initialization problems caused by race condition
Hello Max,
My guess (supported by a test I ran in a test environment) is that the problem is caused by these lines under iface bond0:
pre-up ifup eth2
pre-up ifup eth3
They are most probably causing a deadlock, since the new release adds locking to fix the race condition behind the original issue (see the bug description).
Removing those lines brings your configuration in line with the supported convention documented in /usr/share/doc/ifenslave/README.Debian.gz.
In your case ifupdown is responsible for bringing up the eth2 and eth3
devices while setting up bond0, so no additional actions are needed in
the bond0 section; you can rely on this.
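To spot this kind of line in other configurations, a small check along these lines may help. It is only a sketch: find_nested_ifup is a made-up helper name, and the pattern only covers pre-up, up and post-up hooks that call ifup directly.

```shell
#!/bin/sh
# Sketch only: find_nested_ifup is a hypothetical helper, not part of ifupdown.
# It prints "line-number:line" for every hook line that calls ifup itself.
find_nested_ifup() {
    file="${1:-/etc/network/interfaces}"
    grep -nE '^[[:space:]]*(pre-up|up|post-up)[[:space:]]+ifup([[:space:]]|$)' "$file"
}
```

Running find_nested_ifup with no argument checks /etc/network/interfaces; any line it prints is a candidate for removal.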
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to ifupdown in Ubuntu.
https://bugs.launchpad.net/bugs/1337873
Title:
ifupdown initialization problems caused by race condition
Status in ifenslave package in Ubuntu:
Fix Released
Status in ifupdown package in Ubuntu:
Fix Released
Status in ifenslave source package in Precise:
Won't Fix
Status in ifupdown source package in Precise:
Won't Fix
Status in ifenslave source package in Trusty:
Fix Released
Status in ifupdown source package in Trusty:
Fix Released
Status in ifenslave source package in Vivid:
Fix Released
Status in ifupdown source package in Vivid:
Won't Fix
Status in ifenslave source package in Wily:
Fix Released
Status in ifupdown source package in Wily:
Fix Released
Status in ifupdown package in Debian:
New
Bug description:
[Impact]
* Lack of proper synchronization in ifupdown causes a race condition
resulting in occasional incorrect network interface initialization
(e.g. in the bonding case: wrong bonding settings, or network
unavailable because the slave<->master interface initialization order
was wrong).
* This is very annoying in large deployments (e.g. when
bringing up 1000 machines it is almost guaranteed that at least a few
of them will end up with the network down).
* It has been fixed by introducing a hierarchical, per-interface
locking mechanism that ensures the right order of initialization
(provided the order in the /etc/network/interfaces file is also correct).
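The idea of per-interface locking can be illustrated with flock(1) from util-linux. This is not ifupdown's actual code, just a sketch of the principle; LOCK_DIR and with_iface_lock are names invented for the example.

```shell
#!/bin/sh
# Sketch only: LOCK_DIR and with_iface_lock are hypothetical names.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/iface-locks.$$}"
mkdir -p "$LOCK_DIR"

# Run a command while holding an exclusive lock tied to one interface.
# Two callers working on the same interface are serialized; callers
# working on different interfaces do not block each other.
with_iface_lock() {
    iface="$1"
    shift
    flock "$LOCK_DIR/$iface.lock" "$@"
}
```

For example, two concurrent calls of with_iface_lock eth1 run their commands one after the other, never interleaved.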
[Test Case]
1. Create a VM with bonding configured with at least 2 slave interfaces.
2. Reboot.
3. If all interfaces are up - go to 2.
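Step 3 could be checked with something like the sketch below. It assumes that an interface counts as "up" when its operstate file in sysfs reads "up"; all_up is a made-up helper name and the directory is passed in so the check can be tried outside a real system.

```shell
#!/bin/sh
# Sketch only: all_up is a hypothetical helper.
# Usage: all_up <sysfs net dir> <iface>...
# Returns 0 only if every listed interface reports operstate "up".
all_up() {
    net_dir="$1"
    shift
    for ifc in "$@"; do
        [ "$(cat "$net_dir/$ifc/operstate" 2>/dev/null)" = "up" ] || return 1
    done
    return 0
}
```

On the test VM this might be called as all_up /sys/class/net bond0 eth1 eth2 after each boot, with the interface names adjusted to the setup.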
[Regression Potential]
* This change has been introduced upstream in Debian.
* It does not require any config changes to existing installations.
[Other Info]
Original bug description:
* Please note that my bonding examples use eth1 and eth2 as slave
interfaces.
ifupdown has some race conditions, explained below. ifenslave does not
behave well when the sysv networking and upstart network-interface
scripts run together.
!!!!
case 1)
(a) ifup eth0                         (b) ifup -a for eth0
-----------------------------------------------------------------
1-1. Lock ifstate.lock file.
                                      1-1. Wait for locking ifstate.lock
                                           file.
1-2. Read ifstate file to check
     the target NIC.
1-3. close(=release) ifstate.lock
     file.
1-4. Judge that the target NIC
     isn't processed.
                                      1-2. Read ifstate file to check
                                           the target NIC.
                                      1-3. close(=release) ifstate.lock
                                           file.
                                      1-4. Judge that the target NIC
                                           isn't processed.
2. Lock and update ifstate file.
   Release the lock.
                                      2. Lock and update ifstate file.
                                         Release the lock.
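The problem in case 1 is that the lock is released between the check (steps 1-2 to 1-4) and the update (step 2), so both processes conclude the NIC is unprocessed. A sketch of the usual remedy, doing the check and the update under one lock, could look like this. STATE_FILE and claim_iface are invented for the example and this is not ifupdown's code.

```shell
#!/bin/sh
# Sketch only: STATE_FILE and claim_iface are hypothetical names.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/ifstate.demo.$$}"
: > "$STATE_FILE"

# Check and update under ONE lock, so no other caller can slip in between
# the "is it already processed?" test and the state update.
# Returns 0 if this caller claimed the interface, 1 if it was already claimed.
claim_iface() {
    iface="$1"
    (
        flock 9
        if grep -qx "$iface" "$STATE_FILE"; then
            exit 1
        fi
        echo "$iface" >> "$STATE_FILE"
    ) 9>"$STATE_FILE.lock"
}
```

With this shape, of two concurrent claim_iface eth0 calls only one returns 0, and the other knows it must not process the interface again.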
!!!
to be explained
!!!
case 2)
(a) ifenslave of eth0                 (b) ifenslave of eth0
------------------------------------------------------------------
3. Execute ifenslave of eth0.         3. Execute ifenslave of eth0.
4. Link down the target NIC.
5. Write NIC id to
   /sys/class/net/bond0/bonding
   /slaves then NIC gets up
                                      4. Link down the target NIC.
                                      5. Fails to write NIC id to
                                         /sys/class/net/bond0/bonding/
                                         slaves it is already written.
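One way to picture a safer step 4/5 is to look at the slaves file first and skip the link-down and the write when the NIC is already enslaved. The sketch below assumes the slaves file holds a whitespace-separated list of names and that writing "+name" adds a slave; is_enslaved and enslave_once are made-up helpers, and without a lock around them the check is still racy.

```shell
#!/bin/sh
# Sketch only: is_enslaved and enslave_once are hypothetical helpers.
# slaves_file would be e.g. /sys/class/net/bond0/bonding/slaves.
is_enslaved() {
    slaves_file="$1"
    slave="$2"
    for s in $(cat "$slaves_file" 2>/dev/null); do
        [ "$s" = "$slave" ] && return 0
    done
    return 1
}

# Only write to the slaves file when the NIC is not already listed.
enslave_once() {
    slaves_file="$1"
    slave="$2"
    is_enslaved "$slaves_file" "$slave" && return 0
    echo "+$slave" > "$slaves_file"
}
```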
!!!
#####################################################################
#### My setup:
root@provisioned:~# cat /etc/modprobe.d/bonding.conf
alias bond0 bonding
options bonding mode=1 arp_interval=2000
Both /etc/init.d/networking and upstart network-interface are
enabled.
#### Beginning:
root@provisioned:~# cat /etc/network/interfaces
# /etc/network/interfaces
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
I'm able to boot with both scripts (networking and network-interface)
enabled with no problem. I can also boot with only the "networking"
script enabled:
---
root@provisioned:~# initctl list | grep network
network-interface stop/waiting
networking start/running
---
Or with only the "network-interface" script enabled:
---
root@provisioned:~# initctl list | grep network
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running
---
#### Enabling bonding:
Following the ifenslave configuration example
(/usr/share/doc/ifenslave/examples/two_hotplug_ethernet), my
/etc/network/interfaces has to look like this:
---
auto eth1
iface eth1 inet manual
bond-master bond0
auto eth2
iface eth2 inet manual
bond-master bond0
auto bond0
iface bond0 inet static
bond-mode 1
bond-miimon 100
bond-primary eth1 eth2
address 192.168.169.1
netmask 255.255.255.0
broadcast 192.168.169.255
---
Having both scripts running does not make any difference here: the
"bond-slaves" keyword, which ifenslave needs in order to work, is
missing, and the slave interfaces are set to "manual".
Ifenslave code:
"""
for slave in $BOND_SLAVES ; do
    ...
    # Ensure $slave is down.
    ip link set "$slave" down 2>/dev/null
    if ! sysfs_add slaves "$slave" 2>/dev/null ; then
        echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER ready and a bonding interface ?" >&2
    else
        # Bring up slave if it is the target of an allow-bondX stanza.
        # This is usefull to bring up slaves that need extra setup.
        if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
            ifup $v --allow "$BOND_MASTER" "$slave"
        fi
"""
Without the keyword "bond-slaves" on the master interface declaration,
ifenslave will NOT bring any slave interface up on the "master"
interface ifup invocation.
*********** Part 1
So, having networking sysv init script AND upstart network-interface
script running together... the following example works:
---
root@provisioned:~# cat /etc/network/interfaces
# /etc/network/interfaces
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
auto eth1
iface eth1 inet manual
bond-master bond0
auto eth2
iface eth2 inet manual
bond-master bond0
auto bond0
iface bond0 inet static
bond-mode 1
bond-miimon 100
bond-primary eth1
bond-slaves eth1 eth2
address 192.168.169.1
netmask 255.255.255.0
broadcast 192.168.169.255
---
The ifenslave script sets the link down on all slave interfaces declared
by the "bond-slaves" keyword and assigns them to the correct bond. It
ONLY makes a reentrant call to ifupdown if the slave
interfaces have an "allow-bondX" stanza (not our case).
So this should not work: when the master bonding interface
(bond0) is brought up, ifenslave does not configure slaves without an
"allow-bondX" stanza. What is happening, and why is it working?
If we disable the upstart "network-interface" script, our bonding stops
working at boot. This is because upstart, not the sysv networking
script, was the one setting the slave interfaces up (with the
configuration above).
It is clear that ifenslave, invoked from the sysv script, can set the
slave interface down at any time (even during upstart script execution),
so it might work and it might not:
"""
ip link set "$slave" down 2>/dev/null
"""
root@provisioned:~# initctl list | grep network-interface
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (bond0) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running
Since having the interface down is a requirement for enslaving it,
running both scripts together (upstart and sysv) could create a
situation where upstart brings the slave interface up but ifenslave,
run from the sysv script, takes it down and never brings it up again
(because it does not have an "allow-bondX" stanza).
*********** Part 2
What if I disable upstart "network-interface", keep only the sysv
script, but add the "allow-bondX" stanza to the slave interfaces?
The funny part begins... without upstart, the ifupdown tool calls
ifenslave for the bond0 interface, and ifenslave runs these lines:
"""
for slave in $BOND_SLAVES ; do
    ...
    if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
        ifup $v --allow "$BOND_MASTER" "$slave"
    fi
"""
But ifenslave then waits forever for the bond0 interface to come
online. We now have a chicken-and-egg situation:
* ifupdown tries to bring the bond0 interface online.
* we are not running the upstart network-interface script.
* ifupdown for bond0 calls ifenslave.
* ifenslave looks for interfaces with an "allow-bondX" stanza
* ifenslave tries to ifup the slave interfaces with that stanza
* the slave interfaces wait forever for the master
* master is waiting for the slave interface
* slave interface is waiting for the master interface
... :D
And we have an infinite loop in ifenslave:
"""
# Wait for the master to be ready
[ ! -f /run/network/ifenslave.$BOND_MASTER ] &&
    echo "Waiting for bond master $BOND_MASTER to be ready"
while :; do
    if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
        break
    fi
    sleep 0.1
done
"""
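For comparison, a bounded version of such a wait loop might look like the sketch below. This is not what ifenslave does; wait_for_master and the default limit of 100 tries (roughly 10 seconds at 0.1 s per try) are assumptions made up for the example.

```shell
#!/bin/sh
# Sketch only: wait_for_master and its default limit are hypothetical.
# Usage: wait_for_master <flag file> [max tries]
# Returns 0 once the flag file exists, 1 if it never shows up in time.
wait_for_master() {
    flag="$1"            # e.g. /run/network/ifenslave.$BOND_MASTER
    tries="${2:-100}"    # 100 x 0.1 s, about 10 seconds
    while [ "$tries" -gt 0 ]; do
        [ -f "$flag" ] && return 0
        sleep 0.1
        tries=$((tries - 1))
    done
    return 1
}
```

A caller could then report an error and give up instead of blocking the boot when the master never becomes ready.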
*********** Conclusion
Both failures can be reproduced if the right triggers are set (like the
ones I just showed). Without parallel ifupdown executions (sysv and
upstart, for example), an infinite loop can happen during boot.
With parallel ifupdown executions, race conditions can be triggered
between:
1) ifupdown itself (case 1 in the bug description).
2) ifupdown and the ifenslave script (case 2 in the bug description).