cloud-init-dev team mailing list archive

Thread
Date
Re: [Merge] ~chad.smith/cloud-init:analyze into cloud-init:master

To: mp+328819@xxxxxxxxxxxxxxxxxx
From: Ryan Harper <ryan.harper@xxxxxxxxxxxxx>
Date: Thu, 10 Aug 2017 14:28:47 -0000
In-reply-to: <150233366411.1726.4099977047109734728.launchpad@ackee.canonical.com>
Reply-to: mp+328819@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx
Thanks for getting this started.

Diff comments:

> diff --git a/cloudinit/analyze/README.md b/cloudinit/analyze/README.md
> new file mode 100644
> index 0000000..e496d74
> --- /dev/null
> +++ b/cloudinit/analyze/README.md

We'll want this in docs/rtd

> @@ -0,0 +1,167 @@
> +Getting Started
> +===============
> +
> +You'll need a working LXD setup.  If you can lxc launch a machine then you're
> +good to go.
> +
> +1. lxc init ubuntu-daily:yakkety y1

Use xenial here

> +
> +# for now, we need to adjust cloud-init's logging to emit messages in
> +# a more useful format for parsing.
> +#
> +# NOTE: you need mount-image-callback
> +# from https://code.launchpad.net/~smoser/cloud-utils/trunk.mic-lxd
> +2. sudo ./bin/inject-payload.sh lxd:y1
> +
> +# now you can start it
> +3. lxc start y1
> +

We can remove this inject section as it's being integrated now

> +# now you can analyze it:
> +# % bin/cloudinit-analyze --help
> +# usage: cloudinit analyze <subcommand> [<args>]
> +#
> +# Subcommands:
> +#    blame     Print list of executed stages ordered by time to init
> +#    dump      Print an in-order list of events in JSON format
> +#    show      Print an in-order list of stages with execution times
> +#
> +# Analyze cloud-init logs and data
> +#
> +# positional arguments:
> +#   subcommand     subcommand to run
> +# 
> +# optional arguments:
> +#   -h, --help  show this help message and exit
> +
> +# Print boot events, sections ran, unit time and total time, per boot
> +4. lxc exec y1 -- journalctl-cloudinitlog | ./bin/cloudinit analyze show -i -

We'll want to revert this to something like:
lxc file pull y1/var/log/cloud-init.log - | cloud-init analyze show -i -

> +
> +
> +# % $cmd | tail -n 20
> +# |`->config-byobu previously ran @01.48600s +00.00000s
> +# Finished stage: (modules-config) 00.03300 seconds 
> +# 
> +# Starting stage: modules-final
> +# |`->config-rightscale_userdata previously ran @01.92300s +00.00000s
> +# |`->config-scripts-vendor previously ran @01.92300s +00.00100s
> +# |`->config-scripts-per-once previously ran @01.92400s +00.00000s
> +# |`->config-scripts-per-boot ran successfully @01.92400s +00.00100s
> +# |`->config-scripts-per-instance previously ran @01.92500s +00.00000s
> +# |`->config-scripts-user previously ran @01.92500s +00.00100s
> +# |`->config-ssh-authkey-fingerprints previously ran @01.92600s +00.00000s
> +# |`->config-keys-to-console previously ran @01.92600s +00.00000s
> +# |`->config-phone-home previously ran @01.92700s +00.00000s
> +# |`->config-final-message ran successfully @01.92700s +00.00600s
> +# |`->config-power-state-change previously ran @01.93300s +00.00000s
> +# Finished stage: (modules-final) 00.01800 seconds 
> +# 
> +# Total Time: 0.25700 seconds
> +# 
> +# 2 boot records analyzed
> +
> +
> +# dump json event stream parsted from cloud-init.log
> +5. lxc exec y1 -- journalctl-cloudinitlog | ./bin/cloudinit-analyze dump -i -
> +
> +#  {
> +#   "description": "config-power-state-change previously ran",
> +#   "event_type": "finish",
> +#   "name": "modules-final/config-power-state-change",
> +#   "origin": "cloudinit",
> +#   "result": "SUCCESS",
> +#   "timestamp": 1473722226.711
> +#  },
> +#  {
> +#   "description": "running modules for final",
> +#   "event_type": "finish",
> +#   "name": "modules-final",
> +#   "origin": "cloudinit",
> +#   "result": "SUCCESS",
> +#   "timestamp": 1473722226.713
> +#  }
> +# ]
> +
> +
> +# sort events by time length (blame!)
> +6. lxc exec y1 -- journalctl-cloudinitlog | ./bin/cloudinit-analyze blame -i -
> +
> +# -- Boot Record 01 --
> +#      00.68700s (modules-config/config-locale)
> +#      00.16800s (init-network/config-ssh)
> +#      00.14600s (modules-config/config-grub-dpkg)
> +#      00.07600s (modules-config/config-apt-configure)
> +#      00.01900s (modules-final/config-keys-to-console)
> +#      00.01900s (init-local/search-NoCloud)
> +#      00.01500s (init-network/config-users-groups)
> +#      00.00500s (modules-final/config-final-message)
> +#      00.00500s (init-network/consume-vendor-data)
> +#      00.00500s (init-network/config-growpart)
> +#      00.00400s (init-network/config-set_hostname)
> +#      00.00400s (init-network/config-resizefs)
> +#      00.00300s (init-network/consume-user-data)
> +#      00.00200s (modules-final/config-ssh-authkey-fingerprints)
> +#      00.00200s (modules-final/config-scripts-per-instance)
> +#      00.00200s (modules-final/config-phone-home)
> +#      00.00200s (modules-config/config-snappy)
> +#      00.00200s (modules-config/confg-mounts)
> +#      00.00200s (modules-config/config-fan)
> +
> +
> +Testing with KVM
> +================
> +
> +1. Download the current cloud image
> +
> +% https://cloud-images.ubuntu.com/daily/server/yakkety/current/yakkety-server-cloudimg-amd64.img
> +
> +2. Create snapshot image to preserve the original cloud-image.
> +
> +% qemu-img create -b yakkety-server-cloudimg-amd64.img -f qcow2 test-cloudinit.qcow2
> +
> +3. Inject the cloud-init modifications.  Note, this takes longer as it installs
> +profiling packages.
> +
> +% ./bin/inject-payload.sh ./test-cloudinit.qcow2
> +
> +4. Create a seed image using `cloud-localds`
> +

Drop the integration/injection stuff

> +% cat my-user-data
> +#cloud-config
> +password: passw0rd
> +chpasswd: { expire: False }
> +ssh_pwauth: True
> +ssh_import_id: <gh:userid or lp:userid>
> +% echo "instance-id: $(uuidgen || echo i-abcdefg)" > my-meta-data
> +% cloud-localds my-seed.img my-user-data my-meta-data
> +
> +5. Launch your modifed VM
> +
> +% kvm -m 512 -net nic -net user -redir tcp:2222::22 \
> +   -drive file=test-cloudinit.qcow2,if=virtio,format=qcow2 \
> +   -drive file=my-seed.img,if=virtio,format=raw
> +
> +6. Analyze the boot (blame, dump, show)
> +
> +% ssh -p 2222 ubuntu@localhost 'cat /var/log/cloud-init.log' | ./bin/./bin/cloudinit-analyze blame -i
> +
> +7. Profiling Data is present in /var/lib/cloud/data/cloud-init-<TIMESTAMP>-<PROFILER>
> +
> +On KVM and other VM-based images, with profling enabled, blktrace IO and perf
> +cpu stacks are recorded for later data analysis.
> +
> +# IO tracing on the cloud-image disk and seed device
> +% ssh -p 2222 ubuntu@localhost
> +% cd /var/lib/cloud-init/data/cloud-init-*-blktrace
> +% sudo zcat vda1.blktrace.* | blkparse -i - -s
> +
> +# or for btt data
> +sudo gunzip *blktrace*
> +sudo blkparse -i vda1 -d bp.vda1.bin
> +sudo btt -A -i bp.vda1.bin | sudo tee bp.vda1.txt
> +
> +
> +# Profiling of all CPUs during cloud-init run
> +% ssh -p 2222 ubuntu@localhost
> +% cd /var/lib/cloud-init/data/cloud-init-*-perf
> +% sudo gunzip perf.data.gz
> +% sudo perf report --stdio
> diff --git a/cloudinit/analyze/show.py b/cloudinit/analyze/show.py
> new file mode 100644
> index 0000000..6b4c10c
> --- /dev/null
> +++ b/cloudinit/analyze/show.py
> @@ -0,0 +1,241 @@
> +#!/usr/bin/env python3

Probablu should drop this

> +#   Copyright (C) 2016 Canonical Ltd.
> +#
> +#   Author: Ryan Harper <ryan.harper@xxxxxxxxxxxxx>
> +#
> +# This file is part of cloud-init. See LICENSE file for license information.
> +import argparse
> +import base64
> +import datetime
> +import json
> +import os
> +import sys
> +
> +#  An event:
> +'''
> +{
> +        "description": "executing late commands",
> +        "event_type": "start",
> +        "level": "INFO",
> +        "name": "cmd-install/stage-late"
> +        "origin": "cloudinit",
> +        "timestamp": 1461164249.1590767,
> +},
> +
> +    {
> +        "description": "executing late commands",
> +        "event_type": "finish",
> +        "level": "INFO",
> +        "name": "cmd-install/stage-late",
> +        "origin": "cloudinit",
> +        "result": "SUCCESS",
> +        "timestamp": 1461164249.1590767
> +    }
> +
> +'''
> +format_key = {
> +    '%d': 'delta',
> +    '%D': 'description',
> +    '%E': 'elapsed',
> +    '%e': 'event_type',
> +    '%I': 'indent',
> +    '%l': 'level',
> +    '%n': 'name',
> +    '%o': 'origin',
> +    '%r': 'result',
> +    '%t': 'timestamp',
> +    '%T': 'total_time',
> +}
> +
> +formatting_help = " ".join(["{}: {}".format(k.replace('%', '%%'), v)
> +                           for k, v in format_key.items()])
> +
> +
> +def format_record(msg, event):
> +    for i, j in format_key.items():
> +        if i in msg:
> +            # ensure consistent formatting of time values
> +            if j in ['delta', 'elapsed', 'timestamp']:
> +                msg = msg.replace(i, "{%s:08.5f}" % j)
> +            else:
> +                msg = msg.replace(i, "{%s}" % j)
> +    return msg.format(**event)
> +
> +
> +def dump_event_files(event):
> +    content = {k: v for k, v in event.items() if k not in ['content']}
> +    files = content['files']
> +    saved = []
> +    for f in files:
> +        fname = f['path']
> +        fn_local = os.path.basename(fname)
> +        fcontent = base64.b64decode(f['content']).decode('ascii')

Maybe we can use cloudinit.util.load_file

> +        with open(fn_local, 'w') as fh:
> +            fh.write(fcontent)

and util.write_file

> +        saved.append(fn_local)
> +
> +    return saved
> +
> +
> +def event_name(event):
> +    if event:
> +        return event.get('name')
> +    return None
> +
> +
> +def event_type(event):
> +    if event:
> +        return event.get('event_type')
> +    return None
> +
> +
> +def event_parent(event):
> +    if event:
> +        return event_name(event).split("/")[0]
> +    return None
> +
> +
> +def event_timestamp(event):
> +    return float(event.get('timestamp'))
> +
> +
> +def event_datetime(event):
> +    return datetime.datetime.utcfromtimestamp(event_timestamp(event))
> +
> +
> +def delta_seconds(t1, t2):
> +    return (t2 - t1).total_seconds()
> +
> +
> +def event_duration(start, finish):
> +    return delta_seconds(event_datetime(start), event_datetime(finish))
> +
> +
> +def event_record(start_time, start, finish):
> +    record = finish.copy()
> +    record.update({
> +        'delta': event_duration(start, finish),
> +        'elapsed': delta_seconds(start_time, event_datetime(start)),
> +        'indent': '|' + ' ' * (event_name(start).count('/') - 1) + '`->',
> +    })
> +
> +    return record
> +
> +
> +def total_time_record(total_time):
> +    return 'Total Time: %3.5f seconds\n' % total_time
> +
> +
> +def generate_records(events, blame_sort=False,
> +                     print_format="(%n) %d seconds in %I%D",
> +                     dump_files=False, log_datafiles=False):
> +
> +    sorted_events = sorted(events, key=lambda x: x['timestamp'])
> +    records = []
> +    start_time = None
> +    total_time = 0.0
> +    stage_start_time = {}
> +    stages_seen = []
> +    boot_records = []
> +
> +    unprocessed = []
> +    for e in range(0, len(sorted_events)):
> +        event = events[e]
> +        try:
> +            next_evt = events[e + 1]
> +        except IndexError:
> +            next_evt = None
> +
> +        if event_type(event) == 'start':
> +            if event.get('name') in stages_seen:
> +                records.append(total_time_record(total_time))
> +                boot_records.append(records)
> +                records = []
> +                start_time = None
> +                total_time = 0.0
> +
> +            if start_time is None:
> +                stages_seen = []
> +                start_time = event_datetime(event)
> +                stage_start_time[event_parent(event)] = start_time
> +
> +            # see if we have a pair
> +            if event_name(event) == event_name(next_evt):
> +                if event_type(next_evt) == 'finish':
> +                    records.append(format_record(print_format,
> +                                                 event_record(start_time,
> +                                                              event,
> +                                                              next_evt)))
> +            else:
> +                # This is a parent event
> +                records.append("Starting stage: %s" % event.get('name'))
> +                unprocessed.append(event)
> +                stages_seen.append(event.get('name'))
> +                continue
> +        else:
> +            prev_evt = unprocessed.pop()
> +            if event_name(event) == event_name(prev_evt):
> +                record = event_record(start_time, prev_evt, event)
> +                records.append(format_record("Finished stage: "
> +                                             "(%n) %d seconds ",
> +                                             record) + "\n")
> +                total_time += record.get('delta')
> +            else:
> +                # not a match, put it back
> +                unprocessed.append(prev_evt)
> +
> +    records.append(total_time_record(total_time))
> +    boot_records.append(records)
> +    return boot_records
> +
> +
> +def show_events(events, print_format):
> +    return generate_records(events, print_format=print_format)
> +
> +
> +def load_events(infile, rawdata=None):
> +    if rawdata:
> +        data = rawdata.read()
> +    else:
> +        data = infile.read()
> +
> +    j = None
> +    try:
> +        j = json.loads(data)
> +    except json.JSONDecodeError:
> +        pass
> +
> +    return j, data
> +
> +
> +def main():
> +    parser = argparse.ArgumentParser(
> +        description=('cloudinit-print-log - '

Not sure if we need to drop or modify this.

> +                     'pretty print and sort cloudinit logs'),
> +        prog='cloudinit-print-log')
> +    parser.add_argument('--format', action='store',
> +                        dest='print_format',
> +                        default='%I (%n) %D @%Es +%ds',
> +                        help='specify formatting of output. ' +
> +                        formatting_help)
> +    parser.add_argument('infile', nargs='?', type=argparse.FileType('r'),
> +                        help='Path to log to parse. Use - for stdin')
> +
> +    opts = parser.parse_args(sys.argv[1:])
> +    if not opts.infile:
> +        parser.print_help()
> +        sys.exit(1)
> +
> +    j, _ = load_events(opts.infile)
> +
> +    records = show_events(j, opts.print_format)
> +    for record in records:
> +        print('The total time elapsed since completing an event is printed'
> +              ' after the "@" character.')
> +        print('The time the event takes is printed after the "+" character.')
> +        print()
> +        print("\n".join(record))
> +
> +
> +if __name__ == '__main__':
> +    main()
> diff --git a/cloudinit/cmd/main.py b/cloudinit/cmd/main.py
> index 139e03b..8e02627 100644
> --- a/cloudinit/cmd/main.py
> +++ b/cloudinit/cmd/main.py
> @@ -72,18 +74,18 @@ LOG = logging.getLogger()
>  # and we still want to print exceptions...
>  def print_exc(msg=''):
>      if msg:
> -        sys.stderr.write("%s\n" % (msg))
> +        sys.stderr.write('%s\n' % (msg))
>      sys.stderr.write('-' * 60)
> -    sys.stderr.write("\n")
> +    sys.stderr.write('\n')
>      traceback.print_exc(file=sys.stderr)
>      sys.stderr.write('-' * 60)
> -    sys.stderr.write("\n")
> +    sys.stderr.write('\n')

flag day for " -> ' ?

>  
>  
>  def welcome(action, msg=None):
>      if not msg:
>          msg = welcome_format(action)
> -    util.multi_log("%s\n" % (msg),
> +    util.multi_log('%s\n' % (msg),
>                     console=False, stderr=True, log=LOG)
>      return msg
>  
> @@ -713,83 +715,74 @@ def main(sysv_args=None):
>                          default=False)
>  
>      parser.set_defaults(reporter=None)
> -    subparsers = parser.add_subparsers()
> +    subparsers = parser.add_subparsers(title='Subcommands', dest='subcommand')
> +    subparsers.required = True
>  
>      # Each action and its sub-options (if any)
> -    parser_init = subparsers.add_parser('init',
> -                                        help=('initializes cloud-init and'
> -                                              ' performs initial modules'))
> -    parser_init.add_argument("--local", '-l', action='store_true',
> -                             help="start in local mode (default: %(default)s)",
> -                             default=False)
> -    # This is used so that we can know which action is selected +
> -    # the functor to use to run this subcommand
> +    parser_analyze = subparsers.add_parser(
> +        'analyze', help='Analyze cloud-init logs and data')
> +    # Construct analyze subcommand parser
> +    parser_analyze = analyze_parser(parser_analyze)
> +
> +    parser_init = subparsers.add_parser(
> +        'init', help='initializes cloud-init and performs initial modules')
> +    parser_init.add_argument(
> +        '--local', '-l', action='store_true', default=False,
> +        help='start in local mode (default: %(default)s)')
>      parser_init.set_defaults(action=('init', main_init))
>  
>      # These settings are used for the 'config' and 'final' stages
> -    parser_mod = subparsers.add_parser('modules',
> -                                       help=('activates modules using '
> -                                             'a given configuration key'))
> -    parser_mod.add_argument("--mode", '-m', action='store',
> -                            help=("module configuration name "
> -                                  "to use (default: %(default)s)"),
> +    parser_mod = subparsers.add_parser(

There's a lot of style-refactoring here, but it's not consistent.  It's really hard to pick out what's needed for analyze integration versus style changes.

> +        'modules', help='activates modules using a given configuration key')
> +    parser_mod.add_argument('--mode', '-m', action='store',
> +                            help=('module configuration name '

here, you indent at the open brace, but the init parser you indent one tab in, which was different from before, all of them were indented to the first brace.

> +                                  'to use (default: %(default)s)'),
>                              default='config',
>                              choices=('init', 'config', 'final'))
>      parser_mod.set_defaults(action=('modules', main_modules))
>  
>      # These settings are used when you want to query information
>      # stored in the cloud-init data objects/directories/files
> -    parser_query = subparsers.add_parser('query',
> -                                         help=('query information stored '
> -                                               'in cloud-init'))
> -    parser_query.add_argument("--name", '-n', action="store",
> -                              help="item name to query on",
> -                              required=True,
> -                              choices=QUERY_DATA_TYPES)
> +    parser_query = subparsers.add_parser(
> +        'query', help='query information stored in cloud-init')
> +    parser_query.add_argument(
> +        '--name', '-n', action='store', help='item name to query on',
> +        required=True, choices=QUERY_DATA_TYPES)

here is one tab in, pulled in from first brace

>      parser_query.set_defaults(action=('query', main_query))
>  
>      # This subcommand allows you to run a single module
> -    parser_single = subparsers.add_parser('single',
> -                                          help=('run a single module '))
> -    parser_single.add_argument("--name", '-n', action="store",
> -                               help="module name to run",
> -                               required=True)
> -    parser_single.add_argument("--frequency", action="store",
> -                               help=("frequency of the module"),
> -                               required=False,
> -                               choices=list(FREQ_SHORT_NAMES.keys()))
> -    parser_single.add_argument("--report", action="store_true",
> -                               help="enable reporting",
> -                               required=False)
> -    parser_single.add_argument("module_args", nargs="*",
> -                               metavar='argument',
> -                               help=('any additional arguments to'
> -                                     ' pass to this module'))
> +    parser_single = subparsers.add_parser(
> +        'single', help='run a single module')

that looks like it could fit on one line even

> +    parser_single.add_argument('--name', '-n', action='store', required=True,
> +                               help='module name to run')

this aren't one tab it, they're out to first brace.  If we're going to do all of this code-motion, we should pick something style-wise and be consistent.  And preferrably in it's own MP

> +    parser_single.add_argument('--frequency', action='store',
> +                               choices=list(FREQ_SHORT_NAMES.keys()),
> +                               help='frequency of the module')
> +    parser_single.add_argument('--report', action='store_true',
> +                               help='enable reporting')
> +    parser_single.add_argument('module_args', nargs='*', metavar='argument',
> +                               help=('any additional arguments to pass to '
> +                                     ' this module'))
>      parser_single.set_defaults(action=('single', main_single))
>  
> -    parser_dhclient = subparsers.add_parser('dhclient-hook',
> -                                            help=('run the dhclient hook'
> -                                                  'to record network info'))
> -    parser_dhclient.add_argument("net_action",
> -                                 help=('action taken on the interface'))
> +    parser_dhclient = subparsers.add_parser(
> +        'dhclient-hook', help='run the dhclient hook to record network info')
> +    parser_dhclient.add_argument('net_action',
> +                                 help='action taken on the interface')
>      parser_dhclient.add_argument("net_interface",
>                                   help=('the network interface being acted'
>                                         ' upon'))
>      parser_dhclient.set_defaults(action=('dhclient_hook', dhclient_hook))
>  
> -    parser_features = subparsers.add_parser('features',
> -                                            help=('list defined features'))
> +    parser_features = subparsers.add_parser(
> +        'features', help='list defined features')
>      parser_features.set_defaults(action=('features', main_features))
>  
>      args = parser.parse_args(args=sysv_args)
>  
> -    try:
> -        (name, functor) = args.action
> -    except AttributeError:
> -        parser.error('too few arguments')
> +    (name, functor) = args.action

why do we get to try the try block here?

>  
>      # Setup basic logging to start (until reinitialized)
> -    # iff in debug mode...

is it true we're always configuring logging or did we just accidentally drop this comment?

>      if args.debug:
>          logging.setupBasicLogging()
>  


-- 
https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/328819
Your team cloud-init commiters is requested to review the proposed merge of ~chad.smith/cloud-init:analyze into cloud-init:master.
References

[Merge] ~chad.smith/cloud-init:analyze into cloud-init:master
From: Chad Smith, 2017-08-10