cloud-init-dev team mailing list archive
Message #03062
Re: [Merge] ~chad.smith/cloud-init:analyze into cloud-init:master
Thanks for getting this started.
Diff comments:
> diff --git a/cloudinit/analyze/README.md b/cloudinit/analyze/README.md
> new file mode 100644
> index 0000000..e496d74
> --- /dev/null
> +++ b/cloudinit/analyze/README.md
We'll want this in docs/rtd
> @@ -0,0 +1,167 @@
> +Getting Started
> +===============
> +
> +You'll need a working LXD setup. If you can lxc launch a machine then you're
> +good to go.
> +
> +1. lxc init ubuntu-daily:yakkety y1
Use xenial here
> +
> +# for now, we need to adjust cloud-init's logging to emit messages in
> +# a more useful format for parsing.
> +#
> +# NOTE: you need mount-image-callback
> +# from https://code.launchpad.net/~smoser/cloud-utils/trunk.mic-lxd
> +2. sudo ./bin/inject-payload.sh lxd:y1
> +
> +# now you can start it
> +3. lxc start y1
> +
We can remove this inject section as it's being integrated now
> +# now you can analyze it:
> +# % bin/cloudinit-analyze --help
> +# usage: cloudinit analyze <subcommand> [<args>]
> +#
> +# Subcommands:
> +# blame Print list of executed stages ordered by time to init
> +# dump Print an in-order list of events in JSON format
> +# show Print an in-order list of stages with execution times
> +#
> +# Analyze cloud-init logs and data
> +#
> +# positional arguments:
> +# subcommand subcommand to run
> +#
> +# optional arguments:
> +# -h, --help show this help message and exit
> +
> +# Print boot events, sections ran, unit time and total time, per boot
> +4. lxc exec y1 -- journalctl-cloudinitlog | ./bin/cloudinit analyze show -i -
We'll want to revert this to something like:
lxc file pull y1/var/log/cloud-init.log - | cloud-init analyze show -i -
> +
> +
> +# % $cmd | tail -n 20
> +# |`->config-byobu previously ran @01.48600s +00.00000s
> +# Finished stage: (modules-config) 00.03300 seconds
> +#
> +# Starting stage: modules-final
> +# |`->config-rightscale_userdata previously ran @01.92300s +00.00000s
> +# |`->config-scripts-vendor previously ran @01.92300s +00.00100s
> +# |`->config-scripts-per-once previously ran @01.92400s +00.00000s
> +# |`->config-scripts-per-boot ran successfully @01.92400s +00.00100s
> +# |`->config-scripts-per-instance previously ran @01.92500s +00.00000s
> +# |`->config-scripts-user previously ran @01.92500s +00.00100s
> +# |`->config-ssh-authkey-fingerprints previously ran @01.92600s +00.00000s
> +# |`->config-keys-to-console previously ran @01.92600s +00.00000s
> +# |`->config-phone-home previously ran @01.92700s +00.00000s
> +# |`->config-final-message ran successfully @01.92700s +00.00600s
> +# |`->config-power-state-change previously ran @01.93300s +00.00000s
> +# Finished stage: (modules-final) 00.01800 seconds
> +#
> +# Total Time: 0.25700 seconds
> +#
> +# 2 boot records analyzed
> +
> +
> +# dump json event stream parsed from cloud-init.log
> +5. lxc exec y1 -- journalctl-cloudinitlog | ./bin/cloudinit-analyze dump -i -
> +
> +# {
> +# "description": "config-power-state-change previously ran",
> +# "event_type": "finish",
> +# "name": "modules-final/config-power-state-change",
> +# "origin": "cloudinit",
> +# "result": "SUCCESS",
> +# "timestamp": 1473722226.711
> +# },
> +# {
> +# "description": "running modules for final",
> +# "event_type": "finish",
> +# "name": "modules-final",
> +# "origin": "cloudinit",
> +# "result": "SUCCESS",
> +# "timestamp": 1473722226.713
> +# }
> +# ]
> +
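A side note for readers of this dump format: start/finish events share a `name`, and the deltas shown by `show`/`blame` are just finish minus start on those epoch timestamps. A minimal illustration (the event data below is invented, but matches the shape above):

```python
# Pair start/finish events by shared "name" and compute the delta,
# mirroring what event_duration() in show.py does with this dump.
import json

raw = '''[
 {"name": "modules-final/config-phone-home", "event_type": "start",
  "origin": "cloudinit", "timestamp": 1473722226.700},
 {"name": "modules-final/config-phone-home", "event_type": "finish",
  "origin": "cloudinit", "result": "SUCCESS", "timestamp": 1473722226.711}
]'''

events = json.loads(raw)
# index the start events by name, then walk the finishes
starts = {e['name']: e for e in events if e['event_type'] == 'start'}
for event in events:
    if event['event_type'] == 'finish':
        delta = event['timestamp'] - starts[event['name']]['timestamp']
        print('%08.5f (%s)' % (delta, event['name']))
```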
> +
> +# sort events by time length (blame!)
> +6. lxc exec y1 -- journalctl-cloudinitlog | ./bin/cloudinit-analyze blame -i -
> +
> +# -- Boot Record 01 --
> +# 00.68700s (modules-config/config-locale)
> +# 00.16800s (init-network/config-ssh)
> +# 00.14600s (modules-config/config-grub-dpkg)
> +# 00.07600s (modules-config/config-apt-configure)
> +# 00.01900s (modules-final/config-keys-to-console)
> +# 00.01900s (init-local/search-NoCloud)
> +# 00.01500s (init-network/config-users-groups)
> +# 00.00500s (modules-final/config-final-message)
> +# 00.00500s (init-network/consume-vendor-data)
> +# 00.00500s (init-network/config-growpart)
> +# 00.00400s (init-network/config-set_hostname)
> +# 00.00400s (init-network/config-resizefs)
> +# 00.00300s (init-network/consume-user-data)
> +# 00.00200s (modules-final/config-ssh-authkey-fingerprints)
> +# 00.00200s (modules-final/config-scripts-per-instance)
> +# 00.00200s (modules-final/config-phone-home)
> +# 00.00200s (modules-config/config-snappy)
> +# 00.00200s (modules-config/config-mounts)
> +# 00.00200s (modules-config/config-fan)
> +
> +
> +Testing with KVM
> +================
> +
> +1. Download the current cloud image
> +
> +% https://cloud-images.ubuntu.com/daily/server/yakkety/current/yakkety-server-cloudimg-amd64.img
> +
> +2. Create snapshot image to preserve the original cloud-image.
> +
> +% qemu-img create -b yakkety-server-cloudimg-amd64.img -f qcow2 test-cloudinit.qcow2
> +
> +3. Inject the cloud-init modifications. Note, this takes longer as it installs
> +profiling packages.
> +
> +% ./bin/inject-payload.sh ./test-cloudinit.qcow2
> +
> +4. Create a seed image using `cloud-localds`
> +
Drop the integration/injection stuff
> +% cat my-user-data
> +#cloud-config
> +password: passw0rd
> +chpasswd: { expire: False }
> +ssh_pwauth: True
> +ssh_import_id: <gh:userid or lp:userid>
> +% echo "instance-id: $(uuidgen || echo i-abcdefg)" > my-meta-data
> +% cloud-localds my-seed.img my-user-data my-meta-data
> +
> +5. Launch your modified VM
> +
> +% kvm -m 512 -net nic -net user -redir tcp:2222::22 \
> + -drive file=test-cloudinit.qcow2,if=virtio,format=qcow2 \
> + -drive file=my-seed.img,if=virtio,format=raw
> +
> +6. Analyze the boot (blame, dump, show)
> +
> +% ssh -p 2222 ubuntu@localhost 'cat /var/log/cloud-init.log' | ./bin/cloudinit-analyze blame -i -
> +
> +7. Profiling Data is present in /var/lib/cloud/data/cloud-init-<TIMESTAMP>-<PROFILER>
> +
> +On KVM and other VM-based images, with profiling enabled, blktrace IO and perf
> +cpu stacks are recorded for later data analysis.
> +
> +# IO tracing on the cloud-image disk and seed device
> +% ssh -p 2222 ubuntu@localhost
> +% cd /var/lib/cloud-init/data/cloud-init-*-blktrace
> +% sudo zcat vda1.blktrace.* | blkparse -i - -s
> +
> +# or for btt data
> +sudo gunzip *blktrace*
> +sudo blkparse -i vda1 -d bp.vda1.bin
> +sudo btt -A -i bp.vda1.bin | sudo tee bp.vda1.txt
> +
> +
> +# Profiling of all CPUs during cloud-init run
> +% ssh -p 2222 ubuntu@localhost
> +% cd /var/lib/cloud-init/data/cloud-init-*-perf
> +% sudo gunzip perf.data.gz
> +% sudo perf report --stdio
> diff --git a/cloudinit/analyze/show.py b/cloudinit/analyze/show.py
> new file mode 100644
> index 0000000..6b4c10c
> --- /dev/null
> +++ b/cloudinit/analyze/show.py
> @@ -0,0 +1,241 @@
> +#!/usr/bin/env python3
Probably should drop this
> +# Copyright (C) 2016 Canonical Ltd.
> +#
> +# Author: Ryan Harper <ryan.harper@xxxxxxxxxxxxx>
> +#
> +# This file is part of cloud-init. See LICENSE file for license information.
> +import argparse
> +import base64
> +import datetime
> +import json
> +import os
> +import sys
> +
> +# An event:
> +'''
> +{
> + "description": "executing late commands",
> + "event_type": "start",
> + "level": "INFO",
> + "name": "cmd-install/stage-late",
> + "origin": "cloudinit",
> + "timestamp": 1461164249.1590767,
> +},
> +
> + {
> + "description": "executing late commands",
> + "event_type": "finish",
> + "level": "INFO",
> + "name": "cmd-install/stage-late",
> + "origin": "cloudinit",
> + "result": "SUCCESS",
> + "timestamp": 1461164249.1590767
> + }
> +
> +'''
> +format_key = {
> + '%d': 'delta',
> + '%D': 'description',
> + '%E': 'elapsed',
> + '%e': 'event_type',
> + '%I': 'indent',
> + '%l': 'level',
> + '%n': 'name',
> + '%o': 'origin',
> + '%r': 'result',
> + '%t': 'timestamp',
> + '%T': 'total_time',
> +}
> +
> +formatting_help = " ".join(["{}: {}".format(k.replace('%', '%%'), v)
> + for k, v in format_key.items()])
> +
> +
> +def format_record(msg, event):
> + for i, j in format_key.items():
> + if i in msg:
> + # ensure consistent formatting of time values
> + if j in ['delta', 'elapsed', 'timestamp']:
> + msg = msg.replace(i, "{%s:08.5f}" % j)
> + else:
> + msg = msg.replace(i, "{%s}" % j)
> + return msg.format(**event)
> +
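For anyone puzzling over the %-codes: `format_record` rewrites them into `str.format` fields, forcing time-valued fields to a fixed `08.5f` width. A standalone sketch of the same transformation, trimmed to three codes:

```python
# Same substitution as format_record() above: time-valued fields get a
# fixed "08.5f" width, everything else becomes a plain str.format field.
format_key = {'%n': 'name', '%d': 'delta', '%D': 'description'}

def format_record(msg, event):
    for code, field in format_key.items():
        if code in msg:
            if field in ('delta', 'elapsed', 'timestamp'):
                msg = msg.replace(code, '{%s:08.5f}' % field)
            else:
                msg = msg.replace(code, '{%s}' % field)
    return msg.format(**event)

line = format_record('(%n) %d seconds in %D',
                     {'name': 'init-network/config-ssh',
                      'delta': 0.168,
                      'description': 'config-ssh ran successfully'})
print(line)
# -> (init-network/config-ssh) 00.16800 seconds in config-ssh ran successfully
```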
> +
> +def dump_event_files(event):
> + content = {k: v for k, v in event.items() if k not in ['content']}
> + files = content['files']
> + saved = []
> + for f in files:
> + fname = f['path']
> + fn_local = os.path.basename(fname)
> + fcontent = base64.b64decode(f['content']).decode('ascii')
Maybe we can use cloudinit.util.load_file
> + with open(fn_local, 'w') as fh:
> + fh.write(fcontent)
and util.write_file
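i.e., roughly like this (`write_file` is stubbed in locally so the sketch stands alone; the real helper is `cloudinit.util.write_file`):

```python
# Sketch of dump_event_files() routed through a write_file helper in
# the spirit of cloudinit.util.write_file. The local stand-in keeps
# this example self-contained.
import base64
import os

def write_file(fname, content):
    # stand-in for cloudinit.util.write_file(fname, content)
    with open(fname, 'w') as fh:
        fh.write(content)

def dump_event_files(event):
    saved = []
    for f in event['files']:
        fn_local = os.path.basename(f['path'])
        write_file(fn_local, base64.b64decode(f['content']).decode('ascii'))
        saved.append(fn_local)
    return saved
```

If memory serves, the real util helper also handles modes and parent directories, which is most of the reason to prefer it over a bare open().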
> + saved.append(fn_local)
> +
> + return saved
> +
> +
> +def event_name(event):
> + if event:
> + return event.get('name')
> + return None
> +
> +
> +def event_type(event):
> + if event:
> + return event.get('event_type')
> + return None
> +
> +
> +def event_parent(event):
> + if event:
> + return event_name(event).split("/")[0]
> + return None
> +
> +
> +def event_timestamp(event):
> + return float(event.get('timestamp'))
> +
> +
> +def event_datetime(event):
> + return datetime.datetime.utcfromtimestamp(event_timestamp(event))
> +
> +
> +def delta_seconds(t1, t2):
> + return (t2 - t1).total_seconds()
> +
> +
> +def event_duration(start, finish):
> + return delta_seconds(event_datetime(start), event_datetime(finish))
> +
> +
> +def event_record(start_time, start, finish):
> + record = finish.copy()
> + record.update({
> + 'delta': event_duration(start, finish),
> + 'elapsed': delta_seconds(start_time, event_datetime(start)),
> + 'indent': '|' + ' ' * (event_name(start).count('/') - 1) + '`->',
> + })
> +
> + return record
> +
> +
> +def total_time_record(total_time):
> + return 'Total Time: %3.5f seconds\n' % total_time
> +
> +
> +def generate_records(events, blame_sort=False,
> + print_format="(%n) %d seconds in %I%D",
> + dump_files=False, log_datafiles=False):
> +
> + sorted_events = sorted(events, key=lambda x: x['timestamp'])
> + records = []
> + start_time = None
> + total_time = 0.0
> + stage_start_time = {}
> + stages_seen = []
> + boot_records = []
> +
> + unprocessed = []
> + for e in range(0, len(sorted_events)):
> + event = events[e]
> + try:
> + next_evt = events[e + 1]
> + except IndexError:
> + next_evt = None
> +
> + if event_type(event) == 'start':
> + if event.get('name') in stages_seen:
> + records.append(total_time_record(total_time))
> + boot_records.append(records)
> + records = []
> + start_time = None
> + total_time = 0.0
> +
> + if start_time is None:
> + stages_seen = []
> + start_time = event_datetime(event)
> + stage_start_time[event_parent(event)] = start_time
> +
> + # see if we have a pair
> + if event_name(event) == event_name(next_evt):
> + if event_type(next_evt) == 'finish':
> + records.append(format_record(print_format,
> + event_record(start_time,
> + event,
> + next_evt)))
> + else:
> + # This is a parent event
> + records.append("Starting stage: %s" % event.get('name'))
> + unprocessed.append(event)
> + stages_seen.append(event.get('name'))
> + continue
> + else:
> + prev_evt = unprocessed.pop()
> + if event_name(event) == event_name(prev_evt):
> + record = event_record(start_time, prev_evt, event)
> + records.append(format_record("Finished stage: "
> + "(%n) %d seconds ",
> + record) + "\n")
> + total_time += record.get('delta')
> + else:
> + # not a match, put it back
> + unprocessed.append(prev_evt)
> +
> + records.append(total_time_record(total_time))
> + boot_records.append(records)
> + return boot_records
> +
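The pairing walk above is easier to review against a stripped-down version: a start immediately followed by its matching finish becomes one timed record, while an unmatched start is pushed as an open stage and popped when its finish arrives. A simplified sketch (no boot-record splitting or formatting):

```python
# Minimal version of the pairing loop in generate_records(): adjacent
# start+finish pairs become one record; other starts are open stages
# that stay on the "unprocessed" stack until their finish shows up.
def pair_events(events):
    records, unprocessed = [], []
    for i, event in enumerate(events):
        nxt = events[i + 1] if i + 1 < len(events) else None
        if event['event_type'] == 'start':
            if (nxt and nxt['name'] == event['name']
                    and nxt['event_type'] == 'finish'):
                records.append((event['name'],
                                nxt['timestamp'] - event['timestamp']))
            else:
                unprocessed.append(event)  # open stage
        else:
            start = unprocessed.pop()
            if start['name'] == event['name']:
                records.append((event['name'],
                                event['timestamp'] - start['timestamp']))
            else:
                unprocessed.append(start)  # not a match, put it back
    return records

demo = [
    {'name': 'modules-final', 'event_type': 'start', 'timestamp': 10.0},
    {'name': 'modules-final/config-phone-home', 'event_type': 'start',
     'timestamp': 10.1},
    {'name': 'modules-final/config-phone-home', 'event_type': 'finish',
     'timestamp': 10.2},
    {'name': 'modules-final', 'event_type': 'finish', 'timestamp': 10.3},
]
print(pair_events(demo))
```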
> +
> +def show_events(events, print_format):
> + return generate_records(events, print_format=print_format)
> +
> +
> +def load_events(infile, rawdata=None):
> + if rawdata:
> + data = rawdata.read()
> + else:
> + data = infile.read()
> +
> + j = None
> + try:
> + j = json.loads(data)
> + except json.JSONDecodeError:
> + pass
> +
> + return j, data
> +
> +
> +def main():
> + parser = argparse.ArgumentParser(
> + description=('cloudinit-print-log - '
Not sure if we need to drop or modify this.
> + 'pretty print and sort cloudinit logs'),
> + prog='cloudinit-print-log')
> + parser.add_argument('--format', action='store',
> + dest='print_format',
> + default='%I (%n) %D @%Es +%ds',
> + help='specify formatting of output. ' +
> + formatting_help)
> + parser.add_argument('infile', nargs='?', type=argparse.FileType('r'),
> + help='Path to log to parse. Use - for stdin')
> +
> + opts = parser.parse_args(sys.argv[1:])
> + if not opts.infile:
> + parser.print_help()
> + sys.exit(1)
> +
> + j, _ = load_events(opts.infile)
> +
> + records = show_events(j, opts.print_format)
> + for record in records:
> + print('The total time elapsed since completing an event is printed'
> + ' after the "@" character.')
> + print('The time the event takes is printed after the "+" character.')
> + print()
> + print("\n".join(record))
> +
> +
> +if __name__ == '__main__':
> + main()
> diff --git a/cloudinit/cmd/main.py b/cloudinit/cmd/main.py
> index 139e03b..8e02627 100644
> --- a/cloudinit/cmd/main.py
> +++ b/cloudinit/cmd/main.py
> @@ -72,18 +74,18 @@ LOG = logging.getLogger()
> # and we still want to print exceptions...
> def print_exc(msg=''):
> if msg:
> - sys.stderr.write("%s\n" % (msg))
> + sys.stderr.write('%s\n' % (msg))
> sys.stderr.write('-' * 60)
> - sys.stderr.write("\n")
> + sys.stderr.write('\n')
> traceback.print_exc(file=sys.stderr)
> sys.stderr.write('-' * 60)
> - sys.stderr.write("\n")
> + sys.stderr.write('\n')
flag day for " -> ' ?
>
>
> def welcome(action, msg=None):
> if not msg:
> msg = welcome_format(action)
> - util.multi_log("%s\n" % (msg),
> + util.multi_log('%s\n' % (msg),
> console=False, stderr=True, log=LOG)
> return msg
>
> @@ -713,83 +715,74 @@ def main(sysv_args=None):
> default=False)
>
> parser.set_defaults(reporter=None)
> - subparsers = parser.add_subparsers()
> + subparsers = parser.add_subparsers(title='Subcommands', dest='subcommand')
> + subparsers.required = True
>
> # Each action and its sub-options (if any)
> - parser_init = subparsers.add_parser('init',
> - help=('initializes cloud-init and'
> - ' performs initial modules'))
> - parser_init.add_argument("--local", '-l', action='store_true',
> - help="start in local mode (default: %(default)s)",
> - default=False)
> - # This is used so that we can know which action is selected +
> - # the functor to use to run this subcommand
> + parser_analyze = subparsers.add_parser(
> + 'analyze', help='Analyze cloud-init logs and data')
> + # Construct analyze subcommand parser
> + parser_analyze = analyze_parser(parser_analyze)
> +
> + parser_init = subparsers.add_parser(
> + 'init', help='initializes cloud-init and performs initial modules')
> + parser_init.add_argument(
> + '--local', '-l', action='store_true', default=False,
> + help='start in local mode (default: %(default)s)')
> parser_init.set_defaults(action=('init', main_init))
>
> # These settings are used for the 'config' and 'final' stages
> - parser_mod = subparsers.add_parser('modules',
> - help=('activates modules using '
> - 'a given configuration key'))
> - parser_mod.add_argument("--mode", '-m', action='store',
> - help=("module configuration name "
> - "to use (default: %(default)s)"),
> + parser_mod = subparsers.add_parser(
There's a lot of style-refactoring here, but it's not consistent. It's really hard to pick out what's needed for analyze integration versus style changes.
> + 'modules', help='activates modules using a given configuration key')
> + parser_mod.add_argument('--mode', '-m', action='store',
> + help=('module configuration name '
Here you indent out to the open brace, but the init parser you indent one tab in, which differs from before: previously all of them were indented to the first brace.
> + 'to use (default: %(default)s)'),
> default='config',
> choices=('init', 'config', 'final'))
> parser_mod.set_defaults(action=('modules', main_modules))
>
> # These settings are used when you want to query information
> # stored in the cloud-init data objects/directories/files
> - parser_query = subparsers.add_parser('query',
> - help=('query information stored '
> - 'in cloud-init'))
> - parser_query.add_argument("--name", '-n', action="store",
> - help="item name to query on",
> - required=True,
> - choices=QUERY_DATA_TYPES)
> + parser_query = subparsers.add_parser(
> + 'query', help='query information stored in cloud-init')
> + parser_query.add_argument(
> + '--name', '-n', action='store', help='item name to query on',
> + required=True, choices=QUERY_DATA_TYPES)
Here it's one tab in, pulled in from the first brace.
> parser_query.set_defaults(action=('query', main_query))
>
> # This subcommand allows you to run a single module
> - parser_single = subparsers.add_parser('single',
> - help=('run a single module '))
> - parser_single.add_argument("--name", '-n', action="store",
> - help="module name to run",
> - required=True)
> - parser_single.add_argument("--frequency", action="store",
> - help=("frequency of the module"),
> - required=False,
> - choices=list(FREQ_SHORT_NAMES.keys()))
> - parser_single.add_argument("--report", action="store_true",
> - help="enable reporting",
> - required=False)
> - parser_single.add_argument("module_args", nargs="*",
> - metavar='argument',
> - help=('any additional arguments to'
> - ' pass to this module'))
> + parser_single = subparsers.add_parser(
> + 'single', help='run a single module')
That looks like it could even fit on one line.
> + parser_single.add_argument('--name', '-n', action='store', required=True,
> + help='module name to run')
These aren't one tab in, they're out to the first brace. If we're going to do all of this code motion, we should pick a style and be consistent, and preferably do it in its own MP.
> + parser_single.add_argument('--frequency', action='store',
> + choices=list(FREQ_SHORT_NAMES.keys()),
> + help='frequency of the module')
> + parser_single.add_argument('--report', action='store_true',
> + help='enable reporting')
> + parser_single.add_argument('module_args', nargs='*', metavar='argument',
> + help=('any additional arguments to pass to '
> + ' this module'))
> parser_single.set_defaults(action=('single', main_single))
>
> - parser_dhclient = subparsers.add_parser('dhclient-hook',
> - help=('run the dhclient hook'
> - 'to record network info'))
> - parser_dhclient.add_argument("net_action",
> - help=('action taken on the interface'))
> + parser_dhclient = subparsers.add_parser(
> + 'dhclient-hook', help='run the dhclient hook to record network info')
> + parser_dhclient.add_argument('net_action',
> + help='action taken on the interface')
> parser_dhclient.add_argument("net_interface",
> help=('the network interface being acted'
> ' upon'))
> parser_dhclient.set_defaults(action=('dhclient_hook', dhclient_hook))
>
> - parser_features = subparsers.add_parser('features',
> - help=('list defined features'))
> + parser_features = subparsers.add_parser(
> + 'features', help='list defined features')
> parser_features.set_defaults(action=('features', main_features))
>
> args = parser.parse_args(args=sysv_args)
>
> - try:
> - (name, functor) = args.action
> - except AttributeError:
> - parser.error('too few arguments')
> + (name, functor) = args.action
why do we get to drop the try block here?
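(If I'm reading the earlier hunk right, the new `subparsers.required = True` with `dest='subcommand'` is what makes the try/except unnecessary: argparse then rejects a missing subcommand on its own. A quick standalone check of that argparse behavior:)

```python
# With required=True on the subparsers object, argparse itself exits
# with an error when no subcommand is given, so callers no longer need
# a try/except AttributeError around args.action.
import argparse

parser = argparse.ArgumentParser(prog='demo')
subparsers = parser.add_subparsers(title='Subcommands', dest='subcommand')
subparsers.required = True
sub = subparsers.add_parser('init', help='demo subcommand')
sub.set_defaults(action=('init', None))

args = parser.parse_args(['init'])
print(args.subcommand)

try:
    parser.parse_args([])  # no subcommand -> argparse raises SystemExit
except SystemExit:
    print('missing subcommand rejected by argparse')
```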
>
> # Setup basic logging to start (until reinitialized)
> - # iff in debug mode...
is it true we're always configuring logging or did we just accidentally drop this comment?
> if args.debug:
> logging.setupBasicLogging()
>
--
https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/328819
Your team cloud-init commiters is requested to review the proposed merge of ~chad.smith/cloud-init:analyze into cloud-init:master.