
savanna-all team mailing list archive

Re: Some thoughts on configuration

 

Hi Jon,

Yes, that's a good point. A naive interpretation would result in each role
type getting its own top level tab, when you'd really want some kind of
nesting or other categorization mechanism within each service. As for
opaque extra data, I think that's a fine idea for anything that isn't
required to be communicated to the user. In my specific example, that would
be functional (but not perfect) for role type, as I'd then put the role
type information in the config's descriptive text. I agree it's going to be
generally useful to have this kind of flexibility, and if necessary, it can
be formally defined in such a way as to survive persistence.

--phil


On 29 May 2013 13:21, Jon Maron <jmaron@xxxxxxxxxxxxxxx> wrote:

> If you look at some of the latest mockups, there does seem to be a link
> between the identifiers and the associated service editors (e.g. here they
> are displayed as tab headings:
> https://wiki.openstack.org/wiki/Savanna/UIMockups/NodeGroupTemplates).
>  At least my assumption is that the controller itself does not retain
> information about the available services, but rather relies on the config
> items returned by the plugin to identify the service mix.
>
> Having said that, though, I too have wondered, along the same lines,
> whether a plugin can simply embed some additional attributes in an
> extension of the standard Config object defined by the Savanna API.  For
> example, perhaps an additional attribute can be added by a plugin that
> identifies a specific file mapping for a given property?  From a UI
> perspective the base attributes will be utilized.  But the user inputs
> simply reference the associated config object returned from the
> get_configs() call, so I don't see why a plugin can't add specific
> information, opaque to the controller, that is only utilized by the plugin
> during the processing of the returned inputs.  I don't believe there is any
> plan on the part of Savanna to persist the config objects.
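>
> To make that concrete, something along these lines is what I have in mind.
> This is a pure sketch: the base Config fields are just my reading of the
> current API docs (stubbed out here rather than imported), and the
> file_mapping attribute is entirely hypothetical:
>
>     class Config(object):
>         # Stand-in for the Config object described in the plugin API docs;
>         # field names are my reading of those docs, not the real class.
>         def __init__(self, name, description, config_type, default_value,
>                      applicable_target, scope):
>             self.name = name
>             self.description = description
>             self.config_type = config_type
>             self.default_value = default_value
>             self.applicable_target = applicable_target
>             self.scope = scope
>
>     class FileMappedConfig(Config):
>         # Hypothetical plugin-side extension.  The controller never looks
>         # at file_mapping; only the plugin uses it when processing the user
>         # inputs returned against this config object.
>         def __init__(self, file_mapping, **kwargs):
>             super(FileMappedConfig, self).__init__(**kwargs)
>             self.file_mapping = file_mapping
>
>     replication = FileMappedConfig(
>         file_mapping=('hdfs-site.xml', 'dfs.replication'),
>         name='dfs_replication', description='HDFS Replication Factor',
>         config_type='int', default_value=3,
>         applicable_target='service:hdfs', scope='cluster')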
>
> -- Jon
>
> On May 29, 2013, at 4:03 PM, Philip Langdale <philipl@xxxxxxxxxxxx> wrote:
>
> Hi Sergey,
>
> Thanks.
>
> Looking at the docs more, is there actually any semantic significance for
> Savanna to the service identifiers used in the config objects and
> get_configs()? If these strings are opaque to Savanna - and it seems kind
> of like they are - then it wouldn't require any changes inside Savanna to
> support the additional scoping I described - "service:foo:bar" where bar is
> the role type. Maybe I'm missing a dependency (in the UI?) on that.
>
> Oh, and one thing I haven't seen discussed, but I'm assuming that config
> naming is plugin specific? I don't see anything explicit either way so I'm
> assuming that's considered opaque to Savanna.
>
> --phil
>
>
> On 29 May 2013 12:06, Sergey Lukjanov <slukjanov@xxxxxxxxxxxx> wrote:
>
>> Hi Phil,
>>
>> Very glad to see you participating in the Savanna architecture discussions.
>>
>> First of all, thank you for such a detailed description of the CM
>> configuration system and terminology and, most importantly, of their
>> mappings to our current vision. Our team will carefully consider this
>> information.
>>
>> As you rightly said, there is no time in the second phase to change core
>> architecture ideas such as the configuration system, but in the background
>> we can think about and discuss it, and potentially upgrade it in a future
>> phase.
>>
>>
>> Sincerely yours,
>> Sergey Lukjanov
>> Software Engineer
>> Mirantis Inc.
>> GTalk: me@xxxxxxxxxxx
>> Skype: lukjanovsv
>>
>> On May 29, 2013, at 22:33, Philip Langdale <philipl@xxxxxxxxxxxx> wrote:
>>
>> Hi all,
>>
>> As a quick introduction, I'm one of the engineers working on Cloudera
>> Manager and have been looking at how it would work in conjunction with
>> Savanna. I've been reading through the docs and the recent conversations on
>> configuration and scoping, and I'd like to talk a bit about how Cloudera
>> Manager handles configuration and how this maps to the Savanna API as I
>> currently understand it.
>>
>> CM Terminology:
>>
>> * Cluster: A logical cluster, which contains a set of hosts and the
>> services deployed on those hosts
>> * Service Type: A type of service (duh): "HDFS", "MAPREDUCE", etc
>> * Service Instance: A concrete instance of a service, running on a
>> cluster: "My first HDFS", etc
>> * Role Type: A particular type of role within a service: "NAMENODE",
>> "TASKTRACKER', etc
>> * Role Instance: A concrete instance of a role type, assigned to a
>> specific host/node: "NAMENODE-1 on host1.domain.com", etc. Only one
>> instance of a given role type can be assigned to a single host
>> * Process: The actual running process associated with a role instance. So
>> while a process only exists while it's running, the role instance always
>> exists.
>> * Role Group: A set of role instances, within a single service, of a
>> single role type, that share common configurations.
>> * Host: A host - not very profound.
>> * Host Template: A set of role groups. When a template is applied to a
>> host, for each role group, a role instance is created and assigned to that
>> host.
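>>
>> Put as a data-model sketch (strictly illustrative Python, not how CM is
>> actually implemented), the relationships are roughly:
>>
>>     class RoleInstance(object):
>>         def __init__(self, role_type, host):
>>             self.role_type = role_type   # e.g. 'NAMENODE'
>>             self.host = host             # at most one instance of a type per host
>>             self.overrides = {}          # instance-level config values
>>
>>     class RoleGroup(object):
>>         def __init__(self, role_type):
>>             self.role_type = role_type   # all members share this role type
>>             self.values = {}             # config values shared by the group
>>             self.members = []            # RoleInstances within one service
>>
>>     class ServiceInstance(object):
>>         def __init__(self, service_type, name):
>>             self.service_type = service_type   # e.g. 'HDFS'
>>             self.name = name                   # e.g. 'My first HDFS'
>>             self.role_groups = []
>>
>>     class Cluster(object):
>>         def __init__(self):
>>             self.hosts = []
>>             self.services = []           # ServiceInstances deployed on the hosts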
>>
>> When it comes to configuration, CM defines configs at the Service Type
>> and Role Type level. So a given service or role type has a fixed set of
>> possible configurations associated with it.
>>
>> For example:
>>
>> HDFS: Replication Factor (default 3)
>> Namenode: Listening Port (default 8020)
>> Datanode: Handler Count (default 3)
>>
>> and so on.
>>
>> When it comes time to set a configuration value, that value is associated
>> with an instance. Service type config values are always associated with a
>> service instance, but role type config values can be associated with either
>> a role group or a role instance - with the role instance value overriding
>> the role group value. In this way it's possible to define values that apply
>> to a whole group, but also specialize certain instances where necessary.
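>>
>> A rough way to picture the resolution order (illustrative Python, not
>> actual CM code; the attribute names are made up):
>>
>>     def resolve_role_config(name, role_instance, role_group, role_type_defaults):
>>         # A role instance value overrides the role group value, which in
>>         # turn overrides the role type's default.
>>         if name in role_instance.overrides:
>>             return role_instance.overrides[name]
>>         if name in role_group.values:
>>             return role_group.values[name]
>>         return role_type_defaults[name]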
>>
>> At the time a process is started, CM will generate the process' relevant
>> config files on the fly, based on a set of internal logic that map configs
>> to actual entries in config files (and/or where appropriate, environment
>> variables or command line arguments). In most cases, these are 1:1 but
>> sometimes the handling is more complicated. For example, when generating
>> the fs.default.name, we combine our knowledge of the hostname of the
>> Namenode with the user-specified listening port config. As these config
>> files are generated per-process, they will look different for different
>> role types - so a datanode's hdfs-site.xml looks different from a
>> namenode's hdfs-site.xml - and only contains the config entries that are
>> relevant to it. Configuration files are regenerated every time a role
>> instance is (re)started to ensure consistency.
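>>
>> The fs.default.name case, for example, is conceptually along these lines
>> (a toy sketch of the idea, not our actual generation code):
>>
>>     def core_site_entries(namenode_host, namenode_port):
>>         # Combine the known Namenode hostname with the user-specified
>>         # listening port to produce the derived entry.
>>         return {'fs.default.name': 'hdfs://%s:%d' % (namenode_host,
>>                                                      namenode_port)}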
>>
>> Some configuration is indirect - coming from dependency services, rather
>> than from the service itself. This is modelled through the use of
>> dependency configurations. So a mapreduce service instance has a config
>> that indicates which hdfs service instance it depends on, and in this way
>> it is able to discover the fs.default.name and other relevant
>> configuration.
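>>
>> In other words, something like this happens during config generation
>> (sketch only; the names and structures here are made up):
>>
>>     def mapreduce_client_entries(mapreduce_configs, hdfs_services):
>>         # The dependency config names the HDFS service instance we depend on...
>>         hdfs = hdfs_services[mapreduce_configs['hdfs_service']]
>>         # ...and from it we can pull derived values like fs.default.name.
>>         return {'fs.default.name': hdfs['fs.default.name']}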
>>
>> Finally, services can have a Gateway role type, which indicates a host
>> that does not run any processes for a service, but which can act as a
>> client for the service. When a host is assigned a gateway role instance, CM
>> will ensure that the system-wide config directories in /etc are correctly
>> populated to connect to the service. (Remembering that process config files
>> are private and per-process, they have no effect on the system-wide
>> configuration that client applications see.)
>>
>> Hosts also have a set of configurations associated with them. Values can
>> be defined at the 'all hosts' level or the individual host level.
>>
>> Now, with all that said, we can consider how these concepts map to the
>> configuration model described in the Provisioning Plugin API.
>>
>> The config object:
>>
>> Unsurprisingly, most of the fields here are directly mappable, with the
>> difficult ones being the applicable_target and the scope.
>>
>> Currently defined applicable targets are 'general' and 'service:<service
>> instance>'
>> Currently defined scopes are 'node' or 'cluster'
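>>
>> For reference, here is the HDFS Replication Factor from earlier written
>> out with those fields (plain data, values purely illustrative):
>>
>>     replication_factor = {
>>         'name': 'dfs_replication',
>>         'description': 'HDFS Replication Factor',
>>         'config_type': 'int',
>>         'default_value': 3,
>>         'applicable_target': 'service:hdfs1',   # case (3) below
>>         'scope': 'cluster',
>>     }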
>>
>> Let's now consider how these combinations map to the CM concepts and then
>> identify which CM concepts cannot be expressed.
>>
>> 1) applicable_target=general, scope=cluster
>>
>> This maps to an 'all hosts' configuration
>>
>> 2) applicable_target=general, scope=node
>>
>> This maps to a 'single host' configuration
>>
>> 3) applicable_target=service:instance, scope=cluster
>>
>> This maps to a service type configuration
>>
>> 4) applicable_target=service:instance, scope=node
>>
>> This doesn't exactly map to anything, unfortunately. Service type
>> configurations cannot be specialized to individual nodes, and the configs
>> that apply to an individual node are scoped at the role type level.
>>
>> So, we are left in a somewhat difficult situation where the majority of
>> our configurations don't actually map cleanly to anything. Now, we can
>> obviously do poor man's namespacing and prefix the config names that we expose
>> through the plugin (so listening port would be "namenode:listening_port"
>> for example). If we did this, we'd be able to map (4) to a role instance
>> level config (as there's only one role instance per type per host, we can
>> work out which instance a namespaced config applies to).
>>
>> Then what does it mean for (3) with a role type config? The only thing it
>> can mean is a config assigned to an implicit role group that covers all the
>> hosts of the given role type in the cluster.
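>>
>> In plugin terms, handling a namespaced user input would amount to
>> something like this (a sketch; the helper methods here are invented, not
>> anything in the current API):
>>
>>     def apply_user_input(service, node, name, value, scope):
>>         # 'namenode:listening_port' -> role type 'namenode', key 'listening_port'
>>         role_type, _, key = name.partition(':')
>>         if scope == 'node':
>>             # Case (4): only one role instance of a given type per host, so
>>             # the namespaced name identifies the instance unambiguously.
>>             service.role_instance_on(node, role_type).overrides[key] = value
>>         else:
>>             # Case (3): an implicit role group covering every instance of
>>             # this role type in the cluster.
>>             service.implicit_role_group(role_type).values[key] = value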
>>
>> This should be functional in the short term, but obviously we'd like to
>> more explicitly support these concepts to avoid relying on more fragile
>> mechanisms like namespacing.
>>
>> In an ideal world, we'd like to be able to have an
>> applicable_target=service:instance:roletype and a scope=node_group which
>> would allow us to directly express role instance configs on a node and
>> configs against role groups.
>>
>> So:
>>
>> 5) applicable_target=service:instance:roletype, scope=node
>>
>> This is a role instance config
>>
>> 6) applicable_target=service:instance:roletype, scope=node_group
>>
>> This is a role group config, roughly speaking. As CM role groups need not
>> be aligned across services, it implies a stricter model than CM allows, but
>> I think it's workable. Exposing role groups as a full capability would
>> probably be challenging, and I think anyone wanting to use this would want
>> to use the convert() api and provide a CM deployment descriptor.
>>
>> 7) applicable_target=service:instance:roletype, scope=cluster
>>
>> This would not be supported, as it doesn't map to any remaining concept.
>> Also, (4) would not be used for anything either.
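>>
>> Written out in the same illustrative form as before, (6) would look
>> roughly like:
>>
>>     # (6): a role group config -- the Datanode Handler Count applied to
>>     # every datanode in a given node group.
>>     handler_count = {
>>         'name': 'handler_count',
>>         'description': 'Datanode Handler Count',
>>         'config_type': 'int',
>>         'default_value': 3,
>>         'applicable_target': 'service:hdfs1:datanode',
>>         'scope': 'node_group',
>>     }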
>>
>> Does this seem like a reasonable thing to do - perhaps not for phase 2
>> given the current timing, but beyond that?
>>
>> Thanks,
>>
>> --phil
