savanna-all team mailing list archive
Re: Some thoughts on configuration
Looking at the docs more, is there actually any semantic significance for
Savanna to the service identifiers used in the config objects and
get_configs()? If these strings are opaque to Savanna - and it seems kind
of like they are - then it wouldn't require any changes inside Savanna to
support the additional scoping I described - "service:foo:bar" where bar is
the role type. Maybe I'm missing a dependency (in the UI?) on that.
Oh, and one thing I haven't seen discussed, but I'm assuming that config
naming is plugin-specific? I don't see anything explicit either way so I'm
assuming that's considered opaque to Savanna.
On 29 May 2013 12:06, Sergey Lukjanov <slukjanov@xxxxxxxxxxxx> wrote:
> Hi Phil,
> very glad to see you participating in Savanna architecture discussions.
> First of all, thank you for such a detailed description of the CM
> configuration system and terminology and, especially important, their
> mappings to our current vision. Our team will carefully consider this
> information.
> As you rightly said, there is no time in the second phase to change core
> architecture ideas such as the configuration system, but in the background
> we can think about it, discuss it, and potentially upgrade it in the
> future, for example, in the next phase.
> Sincerely yours,
> Sergey Lukjanov
> Software Engineer
> Mirantis Inc.
> GTalk: me@xxxxxxxxxxx
> Skype: lukjanovsv
> On May 29, 2013, at 22:33, Philip Langdale <philipl@xxxxxxxxxxxx> wrote:
> Hi all,
> As a quick introduction, I'm one of the engineers working on Cloudera
> Manager and have been looking at how it would work in conjunction with
> Savanna. I've been reading through the docs and the recent conversations on
> configuration and scoping, and I'd like to talk a bit about how Cloudera
> Manager handles configuration and how this maps to the Savanna API as I
> currently understand it.
> CM Terminology:
> * Cluster: A logical cluster, which contains a set of hosts and the
> services deployed on those hosts
> * Service Type: A type of service (duh): "HDFS", "MAPREDUCE", etc
> * Service Instance: A concrete instance of a service, running on a
> cluster: "My first HDFS", etc
> * Role Type: A particular type of role within a service: "NAMENODE",
> "TASKTRACKER", etc
> * Role Instance: A concrete instance of a role type, assigned to a
> specific host/node: "NAMENODE-1 on host1.domain.com", etc. Only one
> instance of a given role type can be assigned to a single host
> * Process: The actual running process associated with a role instance. So
> while a process only exists while it's running, the role instance always
> exists.
> * Role Group: A set of role instances, within a single service, of a
> single role type, that share common configurations.
> * Host: A host - not very profound.
> * Host Template: A set of role groups. When a template is applied to a
> host, a role instance is created for each role group and assigned to that
> host.
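The entity model described above can be sketched roughly as follows. This is an illustrative data-structure sketch only, not CM's actual API; all class and function names (`RoleGroup`, `RoleInstance`, `HostTemplate`, `apply_template`) are made up for the example.

```python
# Hypothetical sketch of the CM entities described above.
from dataclasses import dataclass, field

@dataclass
class RoleGroup:
    service: str            # owning service instance, e.g. "My first HDFS"
    role_type: str          # e.g. "DATANODE"
    configs: dict = field(default_factory=dict)

@dataclass
class RoleInstance:
    role_type: str
    host: str               # only one instance of a role type per host
    group: RoleGroup        # the role group this instance belongs to
    configs: dict = field(default_factory=dict)

@dataclass
class HostTemplate:
    role_groups: list       # applying the template instantiates each group

def apply_template(template, host):
    """For each role group in the template, create a role instance on `host`."""
    return [RoleInstance(g.role_type, host, g) for g in template.role_groups]
```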
> When it comes to configuration, CM defines configs at the Service Type and
> Role Type level. So a given service or role type has a fixed set of
> possible configurations associated with it.
> For example:
> HDFS: Replication Factor (default 3)
> Namenode: Listening Port (default 8020)
> Datanode: Handler Count (default 3)
> and so on.
> When it comes time to set a configuration value, that value is associated
> with an instance. Service type config values are always associated with a
> service instance, but role type config values can be associated with either
> a role group or a role instance - with the role instance value overriding
> the role group value. In this way it's possible to define values that apply
> to a whole group, but also specialize certain instances where necessary.
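The override rule just described - role instance values win over role group values, which win over the role type defaults - amounts to a simple lookup order. A minimal sketch, with a hypothetical function name:

```python
# Sketch of CM's value resolution: instance overrides group overrides default.
def resolve_config(name, defaults, group_configs, instance_configs):
    if name in instance_configs:      # most specific: the role instance
        return instance_configs[name]
    if name in group_configs:         # next: the role group
        return group_configs[name]
    return defaults[name]             # fall back to the role type default
```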
> At the time a process is started, CM will generate the process' relevant
> config files on the fly, based on a set of internal logic that map configs
> to actual entries in config files (and/or where appropriate, environment
> variables or command line arguments). In most cases, these are 1:1 but
> sometimes the handling is more complicated. For example, when generating
> the fs.default.name, we combine our knowledge of the hostname of the
> Namenode with the user-specified listening port config. As these config
> files are generated per-process, they will look different for different
> role types - so a datanode's hdfs-site.xml looks different from a
> namenode's hdfs-site.xml - and only contains the config entries that are
> relevant to it. Configuration files are regenerated every time a role
> instance is (re)started to ensure consistency.
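The fs.default.name example above can be sketched like this. This is illustrative only, not CM's actual generation logic; the function names are invented for the example, and the derived value simply combines the known NameNode hostname with the configured listening port:

```python
# Illustrative: derived configs are computed from other known facts,
# and each role type's generated file contains only the entries
# relevant to that role type.
def fs_default_name(namenode_host, listening_port=8020):
    # Combine the NameNode's hostname with its listening port config.
    return "hdfs://%s:%d" % (namenode_host, listening_port)

def generate_hdfs_site(role_type, namenode_host, port=8020):
    """Return only the hdfs-site entries relevant to this role type."""
    entries = {"fs.default.name": fs_default_name(namenode_host, port)}
    if role_type == "DATANODE":
        entries["dfs.datanode.handler.count"] = 3  # datanode-only entry
    return entries
```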
> Some configuration is indirect - coming from dependency services, rather
> than from the service itself. This is modelled through the use of
> dependency configurations. So a mapreduce service instance has a config
> that indicates which hdfs service instance it depends on, and in this way
> it is able to discover the fs.default.name and other relevant
> configuration.
> Finally, services can have a Gateway role type, which indicates a host
> that does not run any processes for a service, but which can act as a
> client for the service. When a host is assigned a gateway role instance, CM
> will ensure that the system-wide config directories in /etc are correctly
> populated to connect to the service. (Remembering that process config files
> are private and per-process, they have no effect on the system-wide
> configuration that client applications see)
> Hosts also have a set of configurations associated with them. Values can
> be defined at the 'all hosts' level or the individual host level.
> Now, with all that said, we can consider how these concepts map to the
> configuration model described in the Provisioning Plugin API.
> The config object:
> Unsurprisingly, most of the fields here are directly mappable, with the
> difficult ones being the applicable_target and the scope.
> Currently defined applicable targets are 'general' and 'service:instance'.
> Currently defined scopes are 'node' and 'cluster'.
> Let's now consider how these combinations map to the CM concepts and then
> identify which CM concepts cannot be expressed.
> 1) applicable_target=general, scope=cluster
> This maps to an 'all hosts' configuration
> 2) applicable_target=general, scope=node
> This maps to a 'single host' configuration
> 3) applicable_target=service:instance, scope=cluster
> This maps to a service type configuration
> 4) applicable_target=service:instance, scope=node
> This doesn't exactly map to anything, unfortunately. Service type
> configurations cannot be specialized to individual nodes, and the configs
> that apply to an individual node are scoped at the roletype level.
> So, we are left in a somewhat difficult situation where the majority of
> our configurations don't actually map cleanly to anything. Now, we can
> obviously do poor man's namespacing and prefix the config names that we
> expose through the plugin (so the listening port would be
> "namenode:listening_port", for example). If we did this, we'd be able to map
> (4) to a role instance level config (as there's only one role instance per
> type per host, we can work out which instance a namespaced config applies
> to).
> Then what does it mean for (3) with a role type config? The only thing it
> can mean is a config assigned to an implicit role group that covers all the
> hosts of the given role type in the cluster.
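The namespacing workaround sketched above could look like this on the plugin side. The function name and the convention (lowercase role type prefix, ':' separator, unprefixed names meaning service-level) are assumptions for the example, not an agreed format:

```python
# Sketch of "poor man's namespacing": encode the role type into the
# config name and split it back out inside the plugin.
def split_namespaced(name):
    """'namenode:listening_port' -> ('NAMENODE', 'listening_port').

    An unprefixed name applies at the service level, so the role
    type comes back as None.
    """
    if ":" in name:
        role, key = name.split(":", 1)
        return role.upper(), key
    return None, name
```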
> This should be functional in the short term, but obviously we'd like to
> more explicitly support these concepts to avoid relying on more fragile
> mechanisms like namespacing.
> In an ideal world, we'd like to be able to have an
> applicable_target=service:instance:roletype and a scope=node_group which
> would allow us to directly express role instance configs on a node and
> configs against role groups.
> 5) applicable_target=service:instance:roletype, scope=node
> This is a role instance config
> 6) applicable_target=service:instance:roletype, scope=node_group
> This is a role group config, roughly speaking. As CM role groups need not
> be aligned across services, it implies a stricter model than CM allows, but
> I think it's workable. Exposing role groups as a full capability would
> probably be challenging, and I think anyone wanting to use this would want
> to use the convert() api and provide a CM deployment descriptor.
> 7) applicable_target=service:instance:roletype, scope=cluster
> This would not be supported, as it doesn't map to any remaining concept.
> Also, (4) would not be used for anything either.
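To summarize the mapping worked out in cases (1)-(7), here is a hypothetical dispatch sketch; the function name and the label strings are invented for illustration, and `None` marks the combinations that would go unused or unsupported:

```python
# Hypothetical summary of the (applicable_target, scope) -> CM concept
# mapping from cases (1)-(7) above.
def cm_concept(applicable_target, scope):
    parts = applicable_target.split(":")
    if applicable_target == "general":
        # (1) all hosts, (2) a single host
        return {"cluster": "all-hosts config", "node": "single-host config"}[scope]
    if len(parts) == 2:   # service:instance
        # (3) service type config; (4) is unused under the extended model
        return "service type config" if scope == "cluster" else None
    if len(parts) == 3:   # service:instance:roletype
        # (5) role instance, (6) role group; (7) cluster is unsupported
        return {"node": "role instance config",
                "node_group": "role group config"}.get(scope)
    return None
```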
> Does this seem like a reasonable thing to do - perhaps not for phase 2
> given the current timing, but beyond that?
> Mailing list: https://launchpad.net/~savanna-all
> Post to : savanna-all@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~savanna-all
> More help : https://help.launchpad.net/ListHelp