sts-sponsors team mailing list archive
-
sts-sponsors team
-
Mailing list archive
-
Message #03092
[Bug 1947099] Re: ipconfig does not honour user-requested timeouts in some cases
I've setup a Lab with dnsmasq acting as DHCP Server, which I can use the
dhcp-reply-delay option to introduce a delay between the DHCPDISCOVER
and DHCPOFFER, as in the example below:
Mar 30 18:26:34 focal-dhcpsrv dnsmasq-dhcp[2470]: DHCPDISCOVER(ens3) 52:54:00:d7:10:13
Mar 30 18:26:34 focal-dhcpsrv dnsmasq-dhcp[2470]: 4195928115 reply delay: 3
Mar 30 18:26:37 focal-dhcpsrv dnsmasq-dhcp[2470]: DHCPOFFER(ens3) 192.168.150.119 52:54:00:d7:10:13
Just minor note that the delay only happens between DHCPDISCOVER and
DHCPOFFER, but not between DHCPREQUEST and DHCPACK:
Mar 30 18:22:13 focal-dhcpsrv dnsmasq-dhcp[2470]: DHCPREQUEST(ens3) 192.168.150.119 52:54:00:d7:10:13
Mar 30 18:22:13 focal-dhcpsrv dnsmasq-dhcp[2470]: DHCPACK(ens3) 192.168.150.119 52:54:00:d7:10:13 ubuntu
So, if this is a new interface (new mac) looking for an IP, the delay
will be added, but if this is an interface which was previously
configured and is asking to re-use the same IP (DHCPREQUEST), there
won't be any delays.
I believe this Lab will help reproducing the issue with and without the
proposed patch.
1. In a ideal scenario, where no delays were added (dhcp-reply-delay is
commented out), ipconfig will take approximately to do its job: it sends
a DHCPREQUEST, then DHCPDISCOVER and then another DHCPREQUEST in this
process:
- From the ipconfig perspective:
root@ubuntu:/etc/dhcp# date; /usr/lib/klibc/bin/ipconfig ens9; date
Wed Mar 30 19:25:44 UTC 2022
IP-Config: ens9 hardware address 52:54:00:f3:4e:0e mtu 1500 DHCP RARP
IP-Config: ens9 complete (dhcp from 192.168.150.2):
address: 192.168.150.105 broadcast: 192.168.150.255 netmask: 255.255.255.0
gateway: 192.168.150.1 dns0 : 192.168.150.2 dns1 : 0.0.0.0
domain : sts.lab
rootserver: 192.168.150.2 rootpath:
filename :
Wed Mar 30 19:25:54 UTC 2022
- From the dhcp server perspective:
Mar 30 19:25:45 focal-dhcpsrv dnsmasq-dhcp[2656]: DHCPREQUEST(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:25:45 focal-dhcpsrv dnsmasq-dhcp[2656]: DHCPACK(ens3) 192.168.150.105 52:54:00:f3:4e:0e ubuntu
Mar 30 19:25:54 focal-dhcpsrv dnsmasq-dhcp[2656]: DHCPDISCOVER(ens3) 52:54:00:f3:4e:0e
Mar 30 19:25:54 focal-dhcpsrv dnsmasq-dhcp[2656]: DHCPOFFER(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:25:54 focal-dhcpsrv dnsmasq-dhcp[2656]: DHCPREQUEST(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:25:54 focal-dhcpsrv dnsmasq-dhcp[2656]: DHCPACK(ens3) 192.168.150.105 52:54:00:f3:4e:0e
2. Adding a 5 seconds delay in dnsmasq (dhcp-reply-delay=5) and without enforcing a timeout in ipconfig:
- From ipconfig perspective:
root@ubuntu:/etc/dhcp# date; /usr/lib/klibc/bin/ipconfig ens9; date
Wed Mar 30 19:34:09 UTC 2022
IP-Config: ens9 hardware address 52:54:00:f3:4e:0e mtu 1500 DHCP RARP
IP-Config: ens9 complete (dhcp from 192.168.150.2):
address: 192.168.150.105 broadcast: 192.168.150.255 netmask: 255.255.255.0
gateway: 192.168.150.1 dns0 : 192.168.150.2 dns1 : 0.0.0.0
domain : sts.lab
rootserver: 192.168.150.2 rootpath:
filename :
Wed Mar 30 19:34:27 UTC 2022
- From the dhcpserver perspective:
Mar 30 19:34:10 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPREQUEST(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:34:10 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPACK(ens3) 192.168.150.105 52:54:00:f3:4e:0e ubuntu
Mar 30 19:34:19 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPDISCOVER(ens3) 52:54:00:f3:4e:0e
Mar 30 19:34:19 focal-dhcpsrv dnsmasq-dhcp[2773]: 1004609631 reply delay: 5
Mar 30 19:34:24 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPOFFER(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:34:24 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPDISCOVER(ens3) 52:54:00:f3:4e:0e
Mar 30 19:34:24 focal-dhcpsrv dnsmasq-dhcp[2773]: 1004609631 reply delay: 5
Mar 30 19:34:26 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPOFFER(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:34:26 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPDISCOVER(ens3) 52:54:00:f3:4e:0e
Mar 30 19:34:26 focal-dhcpsrv dnsmasq-dhcp[2773]: 1004609631 reply delay: 5
Mar 30 19:34:28 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPOFFER(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:34:28 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPREQUEST(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:34:28 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPACK(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:34:28 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPREQUEST(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:34:28 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPACK(ens3) 192.168.150.105 52:54:00:f3:4e:0e
It takes 18 seconds for the process to complete, as dnsmasq adds 3x 5
seconds delay in the process
3. Using the -t to specify a 15 seconds timeout (when the actual process takes 18 seconds):
- From the ipconfig perspective:
root@ubuntu:/etc/dhcp# date; /usr/lib/klibc/bin/ipconfig -t 15 ens9; date
Wed Mar 30 19:37:11 UTC 2022
IP-Config: ens9 hardware address 52:54:00:f3:4e:0e mtu 1500 DHCP RARP
IP-Config: no response after 15 secs - giving up
Wed Mar 30 19:37:28 UTC 2022
(the timeout is properly enforced)
- From the dhcp server perspective:
Mar 30 19:37:12 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPREQUEST(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:37:12 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPACK(ens3) 192.168.150.105 52:54:00:f3:4e:0e ubuntu
Mar 30 19:37:21 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPDISCOVER(ens3) 52:54:00:f3:4e:0e
Mar 30 19:37:21 focal-dhcpsrv dnsmasq-dhcp[2773]: 2288189480 reply delay: 5
Mar 30 19:37:26 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPOFFER(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:37:26 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPDISCOVER(ens3) 52:54:00:f3:4e:0e
Mar 30 19:37:26 focal-dhcpsrv dnsmasq-dhcp[2773]: 2288189480 reply delay: 5
Mar 30 19:37:28 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPOFFER(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:37:28 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPDISCOVER(ens3) 52:54:00:f3:4e:0e
Mar 30 19:37:28 focal-dhcpsrv dnsmasq-dhcp[2773]: 2288189480 reply delay: 5
Mar 30 19:37:30 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPOFFER(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:37:30 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPREQUEST(ens3) 192.168.150.105 52:54:00:f3:4e:0e
Mar 30 19:37:30 focal-dhcpsrv dnsmasq-dhcp[2773]: DHCPACK(ens3) 192.168.150.105 52:54:00:f3:4e:0e
4. The problem only happens when the dhcp server is not listening / responding on the network AND the interface is DOWN on the client, then the timeout gets ignored:
root@ubuntu:/etc/dhcp# ip l set ens9 down
root@ubuntu:/etc/dhcp# date; /usr/lib/klibc/bin/ipconfig -t 2 ens9; date
Wed Mar 30 19:40:52 UTC 2022
IP-Config: ens9 hardware address 52:54:00:f3:4e:0e mtu 1500 DHCP RARP
IP-Config: no response after 2 secs - giving up
Wed Mar 30 19:41:02 UTC 2022
If the interface is up (and dhcp client not listening / responding) the
timeout is honored[-ish]:
3: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:f3:4e:0e brd ff:ff:ff:ff:ff:ff
inet6 fe80::5054:ff:fef3:4e0e/64 scope link
valid_lft forever preferred_lft forever
root@ubuntu:/etc/dhcp# date; /usr/lib/klibc/bin/ipconfig -t 2 ens9; date
Wed Mar 30 19:42:00 UTC 2022
IP-Config: ens9 hardware address 52:54:00:f3:4e:0e mtu 1500 DHCP RARP
IP-Config: no response after 2 secs - giving up
Wed Mar 30 19:42:03 UTC 2022
Anyway, if someone can build a klibc-utils package for me with the proposed patch, I can test the ipconfig behavior in this Lab.
For testing the boot time optimization, I believe I will need a linux-
oracle kernel built with the patch, since initramfs will get the
ipconfig binary from bin/ipconfig inside the initrd. If you can build
that, I can get access to an instance where I can validate the boot
optimization improvements.
--
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1947099
Title:
ipconfig does not honour user-requested timeouts in some cases
Status in klibc package in Ubuntu:
New
Status in klibc source package in Bionic:
Incomplete
Bug description:
[Impact]
In some cases, ipconfig can take a longer time than the user-specified timeouts, causing unexpected delays.
[Test Plan]
Any situation where ipconfig encounters an error sending the DHCP packet, it will automatically set a delay of 10 seconds, which could be longer than the user-specified timeout. It can be reproduced by creating a dummy interface and attempting to run ipconfig on it with a timeout value of less than 10:
# ip link add eth1 type dummy
# date; /usr/lib/klibc/bin/ipconfig -t 2 eth1; date
Thu Nov 18 04:46:13 EST 2021
IP-Config: eth1 hardware address ae:e0:f5:9d:7e:00 mtu 1500 DHCP RARP
IP-Config: no response after 2 secs - giving up
Thu Nov 18 04:46:23 EST 2021
^ Notice above, ipconfig thinks that it waited 2 seconds, but the
timestamps show an actual delay of 10 seconds.
[Where problems could occur]
Please see reproduction steps above. We are seeing this in production too (see comment #2).
[Other Info]
A patch to fix the issue is being proposed here. It is a safe fix - it only checks before going into sleep that the timeout never exceeds the user-requested value.
[Original Description]
In some cases, ipconfig can take longer than the user-specified
timeouts, causing unexpected delays.
in main.c, in function loop(), the process can go into
process_timeout_event() (or process_receive_event() ) and if it
encounters an error situation, will set an attempt to "try again
later" at time equal now + 10 seconds by setting
s->expire = now + 10;
This can happen at any time during the main event loop, which can end
up extending the user-specified timeout if "now + 10" is greater than
"start_time + user-specified-timeout".
I believe a patch like the following is needed to avoid this problem:
--- a/usr/kinit/ipconfig/main.c
+++ b/usr/kinit/ipconfig/main.c
@@ -437,6 +437,13 @@ static int loop(void)
if (timeout > s->expire - now.tv_sec)
timeout = s->expire - now.tv_sec;
+
+ /* Compensate for already-lost time */
+ gettimeofday(&now, NULL);
+ if (now.tv_sec + timeout > start + loop_timeout) {
+ timeout = loop_timeout - (now.tv_sec - start);
+ printf("Lowered timeout to match user request = (%d s) \n", timeout);
+ }
}
I believe the current behaviour is buggy. This is confirmed when the
following line is executed:
if (loop_timeout >= 0 &&
now.tv_sec - start >= loop_timeout) {
printf("IP-Config: no response after %d "
"secs - giving up\n", loop_timeout);
rc = -1;
goto bail;
}
'loop_timeout' is the user-specified time-out. With a value of 2, in
case of error, this line prints:
IP-Config: no response after 2 secs - giving up
So it thinks that it waited 2 seconds - however, in reality it had
actually waited for 10 seconds.
The suggested code-change ensures that the timeout that is actually
used never exceeds the user-specified timeout.
[ Regression potential ]
This change ensures that user-specified timeouts are never exceeded, which is a problem that appears to happen only in case of interface errors.
It may be that someone is relying on current behaviour where they receive DHCP offers after their specified timeout (but within the 10-second error timeout). However, 1) that is buggy behaviour and should be exposed. Such a user would need to update their specified timeout to make it long enough to receive the DHCP offer (setting the timeout to 10 would keep the existing behaviour). 2) I think it is unlikely that such a scenario exists at all. The 10-second timeout problem happens when there are problems with the interface that prevent it from even sending out the DHCP request. I think it is very unlikely (or even, impossible) that DHCP offers would be received on a dead interface.
Based on the above points, I consider the regression potential to be
very low for this change. I do not expect anyone who is currently
using ipconfig successfully to notice this change.
I believe the only difference introduced by this is the reduction of
delays caused by dead or problematic network interfaces. Those error
delays are shortened such that they never exceeed user-specified
timeouts.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/klibc/+bug/1947099/+subscriptions