← Back to team overview

cloud-init-dev team mailing list archive

Re: [Merge] ~chad.smith/cloud-init:unify-datasource-get-data into cloud-init:master

 

userdata_raw for an instance should *always* be bytes, which is *always* unserializble in json:

>>> json.dumps({'key': b'bytes value'})
TypeError: Object of type 'bytes' is not JSON serializable

heres a test that shows more actual content of user-data:
 http://paste.ubuntu.com/25594310/

So that means that unless something is wrong in the Datasource, the 'user-data' (and maybe vendor-data) blobs are going to be redacted with the Warning.

Is meta-data one of our "normalized" values ?

I kind of expected that our exported data looks like:
 {'instance-id': <string>,
  'hostname': <hostname>,
  'user-data'
  'vendor-data': 

  'base64-keys': ['user-data', 'vendor-data', 'ec2/blobs/blob1', 'ec2/more-blobs/0']
  '_datasource': {
    'ec2-specific-key1': <string>,
    'blobs': { 'blob1': <bytes> },
    'more-blobs': [b'blobby'],
  }
  
So we'd have generally available keys in the top (many of them taken out of 'meta-data') with well known names, and then '_datasource' or other name with non-guaranteed stuff under that key.

Maybe i'm thinknig to far ahead, but I'm afraid of putting something in /run/cloud-init/INSTANCE_JSON_FILE that then someone reads and we're stuck with the format of.

the 'base64-keys' thought above is that those would be paths in the data to keys whose value is binary data.  We should also make a distinction between a value whos contents raise error on .decode('utf-8') and a value that should generally be regarded as binary.  Ie, we shouldnt just "try and see" if we can decode the values because then that leads to oddness for the user where sometimes a given key path might be in base64-keys and other times not.


I don't like that the value of a key can be a string 'Warning:' ... that seems
hard for someone to identify the difference between a possible but strange
value and "didnt work".  We should never come to this case I do not think.
And if we do, raising exception is prbably ok.

  


Diff comments:

> diff --git a/cloudinit/sources/__init__.py b/cloudinit/sources/__init__.py
> index 9a43fbe..f3ea27a 100644
> --- a/cloudinit/sources/__init__.py
> +++ b/cloudinit/sources/__init__.py
> @@ -33,6 +33,9 @@ DEP_FILESYSTEM = "FILESYSTEM"
>  DEP_NETWORK = "NETWORK"
>  DS_PREFIX = 'DataSource'
>  
> +# Directory in which instance meta-data, user-data and vendor-data is written

directory?

> +INSTANCE_JSON_FILE = 'instance-data.json'
> +
>  LOG = logging.getLogger(__name__)
>  
>  
> @@ -78,6 +81,32 @@ class DataSource(object):
>      def __str__(self):
>          return type_utils.obj_name(self)
>  
> +    def get_data(self):
> +        """Datasources implement _get_data to setup metadata and userdata_raw.
> +
> +        Minimally, the datasource should return a boolean True on success.
> +        """
> +        return_value = self._get_data()
> +        json_file = os.path.join(self.paths.run_dir, INSTANCE_JSON_FILE)
> +        if return_value:
> +            instance_data = {
> +                'meta-data': self.metadata,
> +                'user-data': self.get_userdata_raw(),
> +                'vendor-data': self.get_vendordata_raw()}
> +            LOG.info('Persisting instance data JSON: %s', json_file)

log levels suck.  but currently INFO goes to the console (and stdout or error).
This seems a bit high.  Every time we do this we're going to mention that when it is really just "working as expected".

> +            try:
> +                content = util.json_dumps(instance_data)
> +            except TypeError as e:
> +                LOG.warning('Error persisting instance-data.json: %s', str(e))
> +                return return_value
> +            util.write_file(json_file, content, mode=0o600)

might as well use atomic_helper.write_file i thikn

> +        return return_value
> +
> +    def _get_data(self):
> +        raise NotImplementedError(
> +            'Subclasses of DataSource must implement _get_data which'
> +            ' sets self.metadata, vendordata_raw and userdata_raw.')
> +
>      def get_userdata(self, apply_filter=False):
>          if self.userdata is None:
>              self.userdata = self.ud_proc.process(self.get_userdata_raw())
> diff --git a/cloudinit/sources/tests/test_init.py b/cloudinit/sources/tests/test_init.py
> new file mode 100644
> index 0000000..35016f7
> --- /dev/null
> +++ b/cloudinit/sources/tests/test_init.py
> @@ -0,0 +1,133 @@
> +# This file is part of cloud-init. See LICENSE file for license information.
> +
> +import os
> +import stat
> +
> +from cloudinit.helpers import Paths
> +from cloudinit.sources import INSTANCE_JSON_FILE, DataSource
> +from cloudinit.tests.helpers import CiTestCase
> +from cloudinit.user_data import UserDataProcessor
> +from cloudinit import util
> +
> +
> +class DataSourceTestSubclassNet(DataSource):
> +
> +    def __init__(self, sys_cfg, distro, paths, custom_userdata=None):
> +        super(DataSourceTestSubclassNet, self).__init__(
> +            sys_cfg, distro, paths)
> +        self._custom_userdata = custom_userdata
> +
> +    def _get_data(self):
> +        self.metadata = {'DataSourceTestSubclassNet': 'was here'}
> +        if self._custom_userdata:
> +            self.userdata_raw = self._custom_userdata
> +        else:
> +            self.userdata_raw = 'userdata_raw'
> +        self.vendordata_raw = 'vendordata_raw'
> +        return True
> +
> +
> +class InvalidDataSourceTestSubclassNet(DataSource):
> +    pass
> +
> +
> +class TestDataSource(CiTestCase):
> +
> +    with_logs = True
> +
> +    def setUp(self):
> +        super(TestDataSource, self).setUp()
> +        self.sys_cfg = {'datasource': {'': {'key1': False}}}
> +        self.distro = 'distrotest'  # generally should be a Distro object
> +        self.paths = Paths({})
> +        self.datasource = DataSource(self.sys_cfg, self.distro, self.paths)
> +
> +    def test_datasource_init(self):
> +        """DataSource initializes metadata attributes, ds_cfg and ud_proc."""
> +        self.assertEqual(self.paths, self.datasource.paths)
> +        self.assertEqual(self.sys_cfg, self.datasource.sys_cfg)
> +        self.assertEqual(self.distro, self.datasource.distro)
> +        self.assertIsNone(self.datasource.userdata)
> +        self.assertEqual({}, self.datasource.metadata)
> +        self.assertIsNone(self.datasource.userdata_raw)
> +        self.assertIsNone(self.datasource.vendordata)
> +        self.assertIsNone(self.datasource.vendordata_raw)
> +        self.assertEqual({'key1': False}, self.datasource.ds_cfg)
> +        self.assertIsInstance(self.datasource.ud_proc, UserDataProcessor)
> +
> +    def test_datasource_init_strips_classname_for_ds_cfg(self):
> +        """Init strips DataSource prefix and Net suffix for ds_cfg."""
> +        sys_cfg = {'datasource': {'TestSubclass': {'key2': False}}}
> +        distro = 'distrotest'  # generally should be a Distro object
> +        paths = Paths({})
> +        datasource = DataSourceTestSubclassNet(sys_cfg, distro, paths)
> +        self.assertEqual({'key2': False}, datasource.ds_cfg)
> +
> +    def test_str_is_classname(self):
> +        """The string representation of the datasource is the classname."""
> +        self.assertEqual('DataSource', str(self.datasource))
> +        self.assertEqual(
> +            'DataSourceTestSubclassNet',
> +            str(DataSourceTestSubclassNet('', '', self.paths)))
> +
> +    def test__get_data_unimplemented(self):
> +        """Raise an error when _get_data is not implemented."""
> +        with self.assertRaises(NotImplementedError) as context_manager:
> +            self.datasource.get_data()
> +        self.assertIn(
> +            'Subclasses of DataSource must implement _get_data',
> +            str(context_manager.exception))
> +        datasource2 = InvalidDataSourceTestSubclassNet(
> +            self.sys_cfg, self.distro, self.paths)
> +        with self.assertRaises(NotImplementedError) as context_manager:
> +            datasource2.get_data()
> +        self.assertIn(
> +            'Subclasses of DataSource must implement _get_data',
> +            str(context_manager.exception))
> +
> +    def test_get_data_calls_subclass__get_data(self):
> +        """Datasource.get_data uses the subclass' version of _get_data."""
> +        tmp = self.tmp_dir()
> +        datasource = DataSourceTestSubclassNet(
> +            self.sys_cfg, self.distro, Paths({'run_dir': tmp}))
> +        self.assertTrue(datasource.get_data())
> +        self.assertEqual(
> +            {'DataSourceTestSubclassNet': 'was here'},
> +            datasource.metadata)
> +        self.assertEqual('userdata_raw', datasource.userdata_raw)
> +        self.assertEqual('vendordata_raw', datasource.vendordata_raw)
> +
> +    def test_get_data_write_json_instance_data(self):
> +        """get_data writes INSTANCE_JSON_FILE to run_dir as readonly root."""
> +        tmp = self.tmp_dir()
> +        datasource = DataSourceTestSubclassNet(
> +            self.sys_cfg, self.distro, Paths({'run_dir': tmp}))
> +        datasource.get_data()
> +        json_file = self.tmp_path(INSTANCE_JSON_FILE, tmp)
> +        content = util.load_file(json_file)
> +        expected = {
> +            'meta-data': {'DataSourceTestSubclassNet': 'was here'},

{'DataSourceTestSubclassNet': 'was here'} is copied a bunch.

> +            'user-data': 'userdata_raw',
> +            'vendor-data': 'vendordata_raw'}
> +        self.assertEqual(expected, util.load_json(content))
> +        file_stat = os.stat(json_file)
> +        self.assertEqual(0o600, stat.S_IMODE(file_stat.st_mode))
> +
> +    def test_get_data_handles_unserializable_content(self):
> +        """get_data warns unserializable content in INSTANCE_JSON_FILE."""
> +        tmp = self.tmp_dir()
> +        datasource = DataSourceTestSubclassNet(
> +            self.sys_cfg, self.distro, Paths({'run_dir': tmp}),
> +            custom_userdata={'key1': 'val1', 'key2': {'key2.1': self.paths}})
> +        self.assertTrue(datasource.get_data())
> +        json_file = self.tmp_path(INSTANCE_JSON_FILE, tmp)
> +        content = util.load_file(json_file)
> +        expected = {
> +            'meta-data': {'DataSourceTestSubclassNet': 'was here'},
> +            'user-data': {
> +                'key1': 'val1',
> +                'key2': {
> +                    'key2.1': "Warning: redacted unserializable type <class"
> +                              " 'cloudinit.helpers.Paths'>"}},
> +            'vendor-data': 'vendordata_raw'}
> +        self.assertEqual(expected, util.load_json(content))


-- 
https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/330115
Your team cloud-init commiters is requested to review the proposed merge of ~chad.smith/cloud-init:unify-datasource-get-data into cloud-init:master.


References