
cloud-init-dev team mailing list archive

Re: [Merge] ~raharper/cloud-init:feature/cloud-init-hotplug-handler into cloud-init:master

 

Thanks for the review.  I've pushed some of your changes already.  Doing another round now.

Diff comments:

> diff --git a/cloudinit/cmd/devel/hotplug_hook.py b/cloudinit/cmd/devel/hotplug_hook.py
> new file mode 100644
> index 0000000..c24b1ff
> --- /dev/null
> +++ b/cloudinit/cmd/devel/hotplug_hook.py
> @@ -0,0 +1,195 @@
> +# This file is part of cloud-init. See LICENSE file for license information.
> +
> +"""Handle reconfiguration on hotplug events"""
> +import argparse
> +import os
> +import sys
> +
> +from cloudinit.event import EventType
> +from cloudinit import log
> +from cloudinit import reporting
> +from cloudinit.reporting import events
> +from cloudinit import sources
> +from cloudinit.stages import Init
> +from cloudinit.net import read_sys_net_safe
> +from cloudinit.net.network_state import parse_net_config_data
> +
> +
> +LOG = log.getLogger(__name__)
> +NAME = 'hotplug-hook'
> +
> +
> +def get_parser(parser=None):
> +    """Build or extend and arg parser for hotplug-hook utility.
> +
> +    @param parser: Optional existing ArgumentParser instance representing the
> +        subcommand which will be extended to support the args of this utility.
> +
> +    @returns: ArgumentParser with proper argument configuration.
> +    """
> +    if not parser:
> +        parser = argparse.ArgumentParser(prog=NAME, description=__doc__)
> +
> +    parser.add_argument("-d", "--devpath",
> +                        metavar="PATH",
> +                        help="sysfs path to hotplugged device")
> +    parser.add_argument("--hotplug-debug", action='store_true',
> +                        help='enable debug logging to stderr.')
> +    parser.add_argument("-s", "--subsystem",
> +                        choices=['net', 'block'])
> +    parser.add_argument("-u", "--udevaction",
> +                        choices=['add', 'change', 'remove'])
> +
> +    return parser
> +
> +
> +def log_console(msg):

I'd love to but I'm not sure how.  What I want is:

1) hotplug command messages from this code to go to the cloud-init-hotplugd.service unit which is running this code, but without all of the cloudinit.* LOG messages (like the util.py ones) showing up there
2) /var/log/cloud-init.log should get the LOG messages from the hotplug command, and include verbose logging as usual.

That's exactly what I get with this.  I'm happy to change but I've not figured out how to configure that via cloudinit.log module configuration.
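
For reference, this is roughly the stdlib-logging shape I was aiming for (untested sketch; the handler wiring here is my guess, not something cloudinit.log provides today):

    import logging
    import sys

    LOG = logging.getLogger(__name__)
    LOG.setLevel(logging.DEBUG)

    # stderr handler on *this* logger only: hotplug-hook messages reach the
    # cloud-init-hotplugd.service journal, but other cloudinit.* loggers
    # (e.g. util.py) never get this handler, so they stay out of stderr.
    console = logging.StreamHandler(sys.stderr)
    console.setFormatter(logging.Formatter('%(message)s'))
    LOG.addHandler(console)

    # records still propagate to the root logger, so whatever
    # log.setupLogging(cfg) wires up for /var/log/cloud-init.log sees them too
    LOG.debug('Fetching datasource')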

> +    """Log messages to stderr console and configured logging."""
> +    sys.stderr.write(msg + '\n')
> +    sys.stderr.flush()
> +    LOG.debug(msg)
> +
> +
> +def devpath_to_macaddr(devpath):
> +    macaddr = read_sys_net_safe(os.path.basename(devpath), 'address')
> +    log_console('Checking if %s in netconfig' % macaddr)

+1, left-over from a previous iteration.

> +    return macaddr
> +
> +
> +def in_netconfig(unique_id, netconfig):
> +    netstate = parse_net_config_data(netconfig)
> +    found = [iface
> +             for iface in netstate.iter_interfaces()
> +             if iface.get('mac_address') == unique_id]
> +    log_console('Ifaces with ID=%s : %s' % (unique_id, found))
> +    return len(found) > 0
> +
> +
> +class UeventHandler(object):
> +    def __init__(self, ds, devpath, success_fn):
> +        self.datasource = ds
> +        self.devpath = devpath
> +        self.success_fn = success_fn
> +
> +    def apply(self):

+1

> +        raise NotImplementedError()
> +
> +    @property
> +    def config(self):
> +        raise NotImplementedError()
> +
> +    def detect(self, action):
> +        raise NotImplementedError()
> +
> +    def success(self):
> +        return self.success_fn()
> +
> +    def update(self):
> +        self.datasource.update_metadata([EventType.UDEV])

The datasource itself reports whether an update occurred, so I do need to return the value here. However, we need to retry if a successful update didn't include new metadata (think racing the hotplug event against the metadata service update). I'll update this to return the result: if update_metadata() returns False, or we don't detect the device in the updated metadata, we'll go down the retry path.
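
Roughly the shape of the change I have in mind (sketch only, not what's pushed):

    # in UeventHandler:
    def update(self):
        # return the datasource's answer so callers know whether metadata
        # actually refreshed (a hotplug event can race the metadata service)
        return self.datasource.update_metadata([EventType.UDEV])

    # ...and in handle_args()'s retry loop:
    if not event_handler.update():
        # nothing new yet; fall through to the retry/backoff path
        raise RuntimeError('datasource failed to refresh metadata')
    if event_handler.detect(action=args.udevaction):
        log_console('Detected update, apply config change')
        event_handler.apply()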

> +
> +
> +class NetHandler(UeventHandler):
> +    def __init__(self, ds, devpath, success_fn):
> +        super(NetHandler, self).__init__(ds, devpath, success_fn)
> +        self.id = devpath_to_macaddr(self.devpath)
> +
> +    def apply(self):
> +        return self.datasource.distro.apply_network_config(self.config,
> +                                                           bring_up=True)
> +
> +    @property
> +    def config(self):
> +        return self.datasource.network_config
> +
> +    def detect(self, action):
> +        detect_presence = None
> +        if action == 'add':
> +            detect_presence = True
> +        elif action == 'remove':
> +            detect_presence = False
> +        else:
> +            raise ValueError('Cannot detect unknown action: %s' % action)
> +
> +        return detect_presence == in_netconfig(self.id, self.config)
> +
> +
> +UEVENT_HANDLERS = {
> +    'net': NetHandler,
> +}
> +
> +SUBSYSTEM_TO_EVENT = {
> +    'net': 'network',
> +    'block': 'storage',
> +}
> +
> +
> +def handle_args(name, args):
> +    log_console('%s called with args=%s' % (NAME, args))
> +    hotplug_reporter = events.ReportEventStack(NAME, __doc__,
> +                                               reporting_enabled=True)
> +    with hotplug_reporter:
> +        # only handling net udev events for now
> +        event_handler_cls = UEVENT_HANDLERS.get(args.subsystem)
> +        if not event_handler_cls:
> +            log_console('hotplug-hook: cannot handle events for subsystem: '
> +                        '"%s"' % args.subsystem)
> +            return 1
> +
> +        log_console('Reading cloud-init configuration')
> +        hotplug_init = Init(ds_deps=[], reporter=hotplug_reporter)
> +        hotplug_init.read_cfg()
> +
> +        log_console('Configuring logging')
> +        log.setupLogging(hotplug_init.cfg)
> +        if 'reporting' in hotplug_init.cfg:
> +            reporting.update_configuration(hotplug_init.cfg.get('reporting'))
> +
> +        log_console('Fetching datasource')
> +        try:
> +            ds = hotplug_init.fetch(existing="trust")
> +        except sources.DatasourceNotFoundException:
> +            log_console('No Ds found')
> +            return 1
> +
> +        subevent = SUBSYSTEM_TO_EVENT.get(args.subsystem)
> +        if hotplug_init.update_event_allowed(EventType.UDEV, scope=subevent):
> +            log_console('cloud-init not configured to handle udev events')

+1

> +            return
> +
> +        log_console('Creating %s event handler' % args.subsystem)
> +        event_handler = event_handler_cls(ds, args.devpath,
> +                                          hotplug_init._write_to_cache)
> +        retries = [1, 1, 1, 3, 5]
> +        for attempt, wait in enumerate(retries):
> +            log_console('subsystem=%s update attempt %s/%s' % (args.subsystem,
> +                                                               attempt,
> +                                                               len(retries)))
> +            try:
> +                log_console('Refreshing metadata')
> +                event_handler.update()
> +                if event_handler.detect(action=args.udevaction):
> +                    log_console('Detected update, apply config change')
> +                    event_handler.apply()
> +                    log_console('Updating cache')
> +                    event_handler.success()
> +                    break
> +                else:
> +                    raise Exception(

+1

> +                            "Failed to detect device change in metadata")
> +
> +            except Exception as e:
> +                if attempt + 1 >= len(retries):
> +                    raise
> +                log_console('exception while processing hotplug event. %s' % e)
> +
> +        log_console('exiting handler')
> +        reporting.flush_events()
> +
> +
> +if __name__ == '__main__':
> +    if 'TZ' not in os.environ:

+1

> +        os.environ['TZ'] = ":/etc/localtime"
> +    args = get_parser().parse_args()
> +    handle_args(NAME, args)
> +
> +# vi: ts=4 expandtab
> diff --git a/cloudinit/distros/debian.py b/cloudinit/distros/debian.py
> index d517fb8..4f1e6a9 100644
> --- a/cloudinit/distros/debian.py
> +++ b/cloudinit/distros/debian.py
> @@ -114,14 +114,23 @@ class Distro(distros.Distro):
>          return self._supported_write_network_config(netconfig)
>  
>      def _bring_up_interfaces(self, device_names):
> -        use_all = False
> -        for d in device_names:
> -            if d == 'all':
> -                use_all = True
> -        if use_all:
> -            return distros.Distro._bring_up_interface(self, '--all')
> +        render_name = self.net_renderer.name
> +        if render_name == 'eni':
> +            LOG.debug('Bringing up interfaces with eni/ifup')
> +            use_all = False
> +            for d in device_names:
> +                if d == 'all':
> +                    use_all = True
> +            if use_all:
> +                return distros.Distro._bring_up_interface(self, '--all')
> +            else:
> +                return distros.Distro._bring_up_interfaces(self, device_names)
> +        elif render_name == 'netplan':
> +            LOG.debug('Bringing up interfaces with netplan apply')

Yes, I think we could push this bit into renderer.bring_up_interfaces() or renderer.apply().

That would leave the existing Distro._bring_up_interfaces() code around, but we could drop it later.
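
Something like this, hypothetically (bring_up_interfaces() doesn't exist on the renderers today, so the method is just a sketch; both modules already import util and renderer):

    # in cloudinit/net/netplan.py
    class Renderer(renderer.Renderer):
        def bring_up_interfaces(self, device_names):
            # netplan has no per-interface ifup; re-apply the full config
            return util.subp(['netplan', 'apply'])

    # in cloudinit/net/eni.py
    class Renderer(renderer.Renderer):
        def bring_up_interfaces(self, device_names):
            cmd = (['ifup', '--all'] if 'all' in device_names
                   else ['ifup'] + list(device_names))
            return util.subp(cmd)

    # Distro._bring_up_interfaces() would then just delegate:
    #     return self.net_renderer.bring_up_interfaces(device_names)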

> +            util.subp(['netplan', 'apply'])
>          else:
> -            return distros.Distro._bring_up_interfaces(self, device_names)
> +            LOG.warning('Cannot bring up interfaces, unknown renderer: "%s"',
> +                        render_name)
>  
>      def _write_hostname(self, your_hostname, out_fn):
>          conf = None
> diff --git a/cloudinit/event.py b/cloudinit/event.py
> index f7b311f..77ce631 100644
> --- a/cloudinit/event.py
> +++ b/cloudinit/event.py
> @@ -2,16 +2,68 @@
>  
>  """Classes and functions related to event handling."""
>  
> +from cloudinit import log as logging
> +from cloudinit import util
> +
> +
> +LOG = logging.getLogger(__name__)
> +
>  
>  # Event types which can generate maintenance requests for cloud-init.
>  class EventType(object):
>      BOOT = "System boot"
>      BOOT_NEW_INSTANCE = "New instance first boot"
> +    UDEV = "Udev add|change event on net|storage"
>  
>      # TODO: Cloud-init will grow support for the follow event types:
> -    # UDEV
>      # METADATA_CHANGE
>      # USER_REQUEST
>  
> +EventTypeMap = {
> +    'boot': EventType.BOOT,
> +    'boot-new-instance': EventType.BOOT_NEW_INSTANCE,
> +    'udev': EventType.UDEV,

I don't feel strongly, but it really is a udev event; that said, we have other places calling it hotplug, so I should be consistent.

> +}
> +
> +# inverted mapping
> +EventNameMap = {v: k for k, v in EventTypeMap.items()}
> +
> +
> +def get_allowed_events(sys_events, ds_events):
> +    '''Merge datasource capabilities with system config to determine which
> +       update events are allowed.'''
> +
> +    # updates:
> +    #   policy-version: 1
> +    #   network:
> +    #     when: [boot-new-instance, boot, udev]
> +    #   storage:
> +    #     when: [boot-new-instance, udev]
> +    #     watch: http://169.254.169.254/metadata/storage_config/
> +
> +    LOG.debug('updates: system   cfg: %s', sys_events)
> +    LOG.debug('updates: datasrc caps: %s', ds_events)
> +
> +    updates = util.mergemanydict([sys_events, ds_events])
> +    LOG.debug('updates: merged  cfg: %s', updates)
> +
> +    events = {}
> +    for etype in ['network', 'storage']:
> +        events[etype] = (
> +            set([EventTypeMap.get(evt)
> +                 for evt in updates.get(etype, {}).get('when', [])
> +                 if evt in EventTypeMap]))
> +
> +    LOG.debug('updates: allowed events: %s', events)
> +    return events
> +
> +
> +def get_update_events_config(update_events):
> +    '''Return a dictionary of updates config'''
> +    evt_cfg = {'policy-version': 1}
> +    for scope, events in update_events.items():
> +        evt_cfg[scope] = {'when': [EventNameMap[evt] for evt in events]}
> +
> +    return evt_cfg
>  
>  # vi: ts=4 expandtab
> diff --git a/systemd/cloud-init-hotplugd.service b/systemd/cloud-init-hotplugd.service
> new file mode 100644
> index 0000000..6f231cd
> --- /dev/null
> +++ b/systemd/cloud-init-hotplugd.service
> @@ -0,0 +1,11 @@

The @ implies it's a template unit, which this is not. lxd, for example, uses socket activation but does not take in a "socket" as the argument. Services using the @ notation for socket activation make use of %i/%I to have systemd inject the instance value into the service's Exec command.

https://www.freedesktop.org/software/systemd/man/systemd.socket.html

"Example: a socket file foo.socket needs a matching service foo.service if Accept=false is set. If Accept=true is set, a service template file foo@.service must exist from which services are instantiated for each incoming connection."

"Accept=
Takes a boolean argument. If true, a service instance is spawned for each incoming connection and only the connection socket is passed to it. If false, all listening sockets themselves are passed to the started service unit, and only one service unit is spawned for all connections (also see above). This value is ignored for datagram sockets and FIFOs where a single service unit unconditionally handles all incoming traffic. "

Here is what a foo@.service invoked from a socket unit looks like, making use of %i:

https://gist.github.com/drmalex07/28de61c95b8ba7e5017c
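
In outline, that pattern looks like this (generic example of the template style from those docs, not something proposed here):

    # foo.socket
    [Socket]
    ListenStream=/run/foo.sock
    Accept=yes

    # foo@.service -- template unit; one instance is spawned per connection,
    # and %i expands to the instance name systemd generates for it
    [Service]
    ExecStart=/usr/bin/foo-handler %i
    StandardInput=socket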

> +[Unit]
> +Description=cloud-init hotplug hook daemon
> +After=cloud-init-hotplugd.socket
> +
> +[Service]
> +Type=simple
> +ExecStart=/bin/bash -c 'read args <&3; echo "args=$args"; \
> +                        exec /usr/bin/cloud-init devel hotplug-hook $args; \
> +                        exit 0'
> +SyslogIdentifier=cloud-init-hotplugd
> +TimeoutStopSec=5
> diff --git a/tools/hook-hotplug b/tools/hook-hotplug
> new file mode 100755
> index 0000000..697d3ad
> --- /dev/null
> +++ b/tools/hook-hotplug
> @@ -0,0 +1,26 @@
> +#!/bin/bash
> +# This file is part of cloud-init. See LICENSE file for license information.
> +
> +# This script checks if cloud-init has hotplug hooked and if
> +# cloud-init has finished; if so invoke cloud-init hotplug-hook
> +
> +is_finished() {
> +    [ -e /run/cloud-init/result.json ] || return 1

+1; this is in tools/hook-dhclient as well (which I duplicated for this hook).

> +}
> +
> +if is_finished; then
> +    # only hook pci devices at this time
> +    case ${DEVPATH} in
> +        /devices/pci*)
> +            # open cloud-init's hotplug-hook fifo rw
> +            exec 3<>/run/cloud-init/hook-hotplug-cmd

Yes, but opening for read as well is needed; opening the FIFO write-only would block until the other side (the socket unit) has it open.

> +            env_params=( \

Something didn't like it missing, but I'll drop it.

> +                --devpath=${DEVPATH}

They'll all be escaped because they're sysfs paths (not /dev).  I'll quote them anyhow.

> +                --subsystem=${SUBSYSTEM}
> +                --udevaction=${ACTION}
> +            )
> +            # write params to cloud-init's hotplug-hook fifo
> +            echo "--hotplug-debug ${env_params[@]}" >&3
> +            ;;
> +    esac
> +fi


-- 
https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/356152
Your team cloud-init commiters is requested to review the proposed merge of ~raharper/cloud-init:feature/cloud-init-hotplug-handler into cloud-init:master.

