yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #95086
[Bug 2026757] Re: [SRU] dnsmasq on Ubuntu Jammy crashes on neutron-dhcp-agent updates
This bug was fixed in the package dnsmasq - 2.90-6ubuntu1
---------------
dnsmasq (2.90-6ubuntu1) plucky; urgency=medium
* d/p/fix-crash-when-reloading-DHCP-config-on-SIGHUP.patch:
Confusion in the code to free old DHCP configuration when
it's being reloaded causes invalid pointers to be followed
and a crash (LP: #2026757).
-- Miriam España Acebal <miriam.espana@xxxxxxxxxxxxx> Mon, 16 Dec 2024
15:50:16 +0100
** Changed in: dnsmasq (Ubuntu Plucky)
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2026757
Title:
[SRU] dnsmasq on Ubuntu Jammy crashes on neutron-dhcp-agent updates
Status in Ironic:
Triaged
Status in neutron:
New
Status in dnsmasq package in Ubuntu:
Fix Released
Status in dnsmasq source package in Jammy:
Confirmed
Status in dnsmasq source package in Kinetic:
Won't Fix
Status in dnsmasq source package in Mantic:
Invalid
Status in dnsmasq source package in Noble:
Triaged
Status in dnsmasq source package in Oracular:
In Progress
Status in dnsmasq source package in Plucky:
Fix Released
Status in dnsmasq package in Debian:
Fix Released
Bug description:
[ Impact ]
* A crash happens after reloading configurations (possibly more prone
with tags), especially blocking the CI infra of the Ironic project.
* The fix is taking from upstream but not yet released by them, but
incorporated already in Debian
[ Test Plan ]
From comment #46 [1], this might be work:
- to start dnsmasq with --dhcp-optsfile and pass it a file that sets tags (like neutron does)
- to have it answer to one request from the host mentioned in the configuration
- to send SIGHUP to the daemon
Unfortunately, there is no reliable set of steps for reproduction (I
didn't make the crash to happen with the above).
In this case, we only have experimental evidence of the Ironic CI
going well or not with the fixed package, as we already did with a
candidate fix in a PPA (comment #48 [2]). We need that the Ironic
community test the package in -proposed.
[1] https://bugs.launchpad.net/ironic/+bug/2026757/comments/46
[2] https://bugs.launchpad.net/ironic/+bug/2026757/comments/48
[ Where problems could occur ]
* Although there is always room for regressions to happen, this
change fixes the problem that comes after upstream adds new features
at some point in the development process (allowing more than one set
tag when using dhcp-option, allowing re-reading configuration) and
skips some scenarios impacted by them (freeing old structures related
to the old configuration). The free instruction still needs to happen,
so the risk of freeing something different with this fix is minimized
(avoided, I would say) by doing it "manually" in the same workspace as
the request.
[ Other Info ]
* The fix [3] explained deeper in [4] was accepted in Debian [5] after
the patch was forwarded to them by us.
[3] https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=f006be7842104a9f86fbf419326b7aad08ade61d
[4] https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2024q4/017828.html
[5] https://launchpad.net/debian/+source/dnsmasq/2.90-7
[ Original Report ]
------------------------------------------------
The Ironic project's CI has been having major blocking issues moving
to utilizing Ubuntu Jammy and with some investigation we were able to
isolate the issues down to the dhcp updates causing dnsmasq to crash
on Ubuntu Jammy, which ships with dnsmasq 2.86. This issue sounds
similar to an issue known about to the dnsmasq maintainers, where
dnsmasq would crash with updates occurring due to configuration
refresh[0].
This resulted in us upgrading dnsmasq to the version which ships with
Ubuntu Lunar.
Which was no better. Dnsmasq still crashed upon record updates for
addresses and ports getting configuration added/changed/removed.
We later downgraded to the version of dnsmasq shipped in Ubuntu Focal,
and dnsmasq stopped crashing and appeared stable enough to utilize for
CI purposes.
** Kernel log from Ubuntu Jammy Package **
[229798.876726] dnsmasq[81586]: segfault at 7c28 ip 00007f6e8313147e sp 00007fffb3d6f830 error 4 in libc.so.6[7f6e830b4000+195000]
[229798.876745] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[229805.444912] dnsmasq[401428]: segfault at dce8 ip 00007fe63bf6a47e sp 00007ffdb105b440 error 4 in libc.so.6[7fe63beed000+195000]
[229805.444933] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[230414.213448] dnsmasq[401538]: segfault at 78b8 ip 00007f12160e447e sp 00007ffed6ef2190 error 4 in libc.so.6[7f1216067000+195000]
[230414.213467] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[230465.098989] dnsmasq[402665]: segfault at c378 ip 00007f81458f047e sp 00007fff0db334a0 error 4 in libc.so.6[7f8145873000+195000]
[230465.099005] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[231787.247374] dnsmasq[402863]: segfault at 7318 ip 00007f3940b9147e sp 00007ffc8df4f010 error 4 in libc.so.6[7f3940b14000+195000]
[231787.247392] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[231844.886399] dnsmasq[405182]: segfault at dc58 ip 00007f32a29e147e sp 00007ffddedd7480 error 4 in libc.so.6[7f32a2964000+195000]
[231844.886420] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[234692.482154] dnsmasq[405289]: segfault at 67d8 ip 00007fab0c5c447e sp 00007fffd6fd8fa0 error 4 in libc.so.6[7fab0c547000+195000]
[234692.482173] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
** Kernel log entries from Ubuntu Lunar package **
[234724.842339] dnsmasq[409843]: segfault at fffffffffffffffd ip 00007f35a147647e sp 00007ffd536038c0 error 5 in libc.so.6[7f35a13f9000+195000]
[234724.842368] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[234784.918116] dnsmasq[410019]: segfault at fffffffffffffffd ip 00007f634233947e sp 00007fff33877f20 error 5 in libc.so.6[7f63422bc000+195000]
[234784.918133] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[235022.163339] dnsmasq[410151]: segfault at fffffffffffffffd ip 00007f21dd37f47e sp 00007fff9bf416d0 error 5 in libc.so.6[7f21dd302000+195000]
[235022.163362] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[235024.831325] dnsmasq[410445]: segfault at fffffffffffffffd ip 00007f7edf02147e sp 00007ffc4fb19cd0 error 5 in libc.so.6[7f7edefa4000+195000]
[235024.831354] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[236052.793683] dnsmasq[410630]: segfault at fffffffffffffffd ip 00007f3046ca147e sp 00007ffe5583df50 error 5 in libc.so.6[7f3046c24000+195000]
[236052.793704] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[236105.451351] dnsmasq[412107]: segfault at fffffffffffffffd ip 00007f4425bcd47e sp 00007fffd5337560 error 5 in libc.so.6[7f4425b50000+195000]
[236105.451368] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
** The command line the process is launched with **
dnsmasq --no-hosts --pid-
file=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/pid
--dhcp-
hostsfile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/host
--addn-
hosts=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/addn_hosts
--dhcp-
optsfile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/opts
--dhcp-
leasefile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/leases
--dhcp-match=set:ipxe,175 --dhcp-userclass=set:ipxe6,iPXE --local-
service--bind-dynamic --dhcp-
range=set:subnet-3c1445e7-6f7d-4e62-997f-627bc53da72c,10.1.0.0,static,255.255.255.192,86400s
--dhcp-option-force=option:mtu,1380 --dhcp-lease-max=64 --conf-
file=/dev/null --domain=openstacklocal
** Neutron Logging **
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.dhcp.agent [-] neutron.agent.dhcp.agent.DhcpAgentWithStateReport method _port_delete called with arguments ({'port_id': 'bdeaa43c-687c-4e60-a24e-3725d6353828', 'network_id': 'c1ca059e-350d-4d78-9330-600f7315c380', 'fixed_ips': [{'subnet_id': '3c1445e7-6f7d-4e62-997f-627bc53da72c', 'ip_address': '10.1.0.14'}, {'subnet_id': '54bc71f6-bff5-417d-9e4b-1f5f58ed6318', 'ip_address': 'fdd9:92b1:9e2c:0:5054:ff:fe44:5c9f'}], 'priority': 6},) {} {{(pid=60941) wrapper /usr/local/lib/python3.10/dist-packages/oslo_log/helpers.py:65}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.dhcp.agent [-] Calling driver for network: c1ca059e-350d-4d78-9330-600f7315c380/seg=None action: reload_allocations {{(pid=60941) _call_driver /opt/stack/neutron/neutron/agent/dhcp/agent.py:246}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): ip netns exec qdhcp-c1ca059e-350d-4d78-9330-600f7315c380 dhcp_release tapbb6348d9-39 10.1.0.14 52:54:00:44:5c:9f {{(pid=78114) execute /usr/local/lib/python3.10/dist-packages/oslo_concurrency/processutils.py:384}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.processutils [-] CMD "ip netns exec qdhcp-c1ca059e-350d-4d78-9330-600f7315c380 dhcp_release tapbb6348d9-39 10.1.0.14 52:54:00:44:5c:9f" returned: 0 in 0.011s {{(pid=78114) execute /usr/local/lib/python3.10/dist-packages/oslo_concurrency/processutils.py:422}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo.privsep.daemon [-] privsep: reply[8a4f2794-3b63-4f8d-9604-53dd6a4a868c]: (4, ('', '')) {{(pid=78114) _call_back /usr/local/lib/python3.10/dist-packages/oslo_privsep/daemon.py:501}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): ip netns exec qdhcp-c1ca059e-350d-4d78-9330-600f7315c380 dhcp_release tapbb6348d9-39 10.1.0.14 52:54:00:44:5c:9f 01:52:54:00:44:5c:9f {{(pid=78114) execute /usr/local/lib/python3.10/dist-packages/oslo_concurrency/processutils.py:384}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.processutils [-] CMD "ip netns exec qdhcp-c1ca059e-350d-4d78-9330-600f7315c380 dhcp_release tapbb6348d9-39 10.1.0.14 52:54:00:44:5c:9f 01:52:54:00:44:5c:9f" returned: 0 in 0.011s {{(pid=78114) execute /usr/local/lib/python3.10/dist-packages/oslo_concurrency/processutils.py:422}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo.privsep.daemon [-] privsep: reply[33a91aed-bc58-48dd-b673-d4a4d5da54f6]: (4, ('', '')) {{(pid=78114) _call_back /usr/local/lib/python3.10/dist-packages/oslo_privsep/daemon.py:501}}
Jul 10 15:26:02 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.linux.dhcp [-] Building host file: /opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/host {{(pid=60941) _output_hosts_file /opt/stack/neutron/neutron/agent/linux/dhcp.py:956}}
Jul 10 15:26:02 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.linux.utils [-] Running command: ['env', 'LC_ALL=C', 'PATH=/sbin:/usr/sbin', 'dnsmasq', '--test', '--dhcp-host=tag:foo'] {{(pid=60941) create_process /opt/stack/neutron/neutron/agent/linux/utils.py:84}}
Jul 10 15:26:02 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.linux.dhcp [-] Done building host file /opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/host {{(pid=60941) _output_hosts_file /opt/stack/neutron/neutron/agent/linux/dhcp.py:997}}
Jul 10 15:26:02 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo.privsep.daemon [-] privsep: reply[f3dd1224-fe8c-4fb0-8113-699e779df64e]: (4, ('', '', 0)) {{(pid=62248) _call_back /usr/local/lib/python3.10/dist-packages/oslo_privsep/daemon.py:501}}
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.lockutils [-] Acquiring lock "_check_child_processes" by "neutron.agent.linux.external_process.ProcessMonitor._check_child_processes" {{(pid=60941) inner /usr/local/lib/python3.10/dist-packages/oslo_concurrency/lockutils.py:404}}
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.lockutils [-] Lock "_check_child_processes" acquired by "neutron.agent.linux.external_process.ProcessMonitor._check_child_processes" :: waited 0.001s {{(pid=60941) inner /usr/local/lib/python3.10/dist-packages/oslo_concurrency/lockutils.py:409}}
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: ERROR neutron.agent.linux.external_process [-] dnsmasq for dhcp with uuid c1ca059e-350d-4d78-9330-600f7315c380 not found. The process should not have died
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: WARNING neutron.agent.linux.external_process [-] Respawning dnsmasq for uuid c1ca059e-350d-4d78-9330-600f7315c380
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-c1ca059e-350d-4d78-9330-600f7315c380', 'env', 'PROCESS_TAG=dnsmasq-c1ca059e-350d-4d78-9330-600f7315c380', 'dnsmasq', '--no-hosts', '', '--pid-file=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/pid', '--dhcp-hostsfile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/host', '--addn-hosts=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/addn_hosts', '--dhcp-optsfile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/opts', '--dhcp-leasefile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/leases', '--dhcp-match=set:ipxe,175', '--dhcp-userclass=set:ipxe6,iPXE', '--local-service', '--bind-dynamic', '--dhcp-range=set:subnet-3c1445e7-6f7d-4e62-997f-627bc53da72c,10.1.0.0,static,255.255.255.192,86400s', '--dhcp-option-force=option:mtu,1380', '--dhcp-lease-max=64', '--conf-file=/dev/null', '--domain=openstacklocal'] {{(pid=60941) execute_rootwrap_daemon /opt/stack/neutron/neutron/agent/linux/utils.py:108}}
We don't believe this is a neutron bug, at least outright, but suspect
neutron is also likely encountering this issue as well, at least with
any sort of exhaustive test jobs. Most of Ironic's one job tests would
pass with this dnsmasq, it was only where we continually ran new test
scenarios that we would see this issue crop up and cause failures.
In the mean time, the ironic project will likely downgrade dnsmasq to
unblock it's CI.
[0]: https://lists.thekelleys.org.uk/pipermail/dnsmasq-
discuss/2022q3/016562.html
To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/2026757/+subscriptions
References