savanna-all team mailing list archive

Re: Some questions regarding configuration

 

Jon,

You should familiarize yourself with the concept of templates (
https://blueprints.launchpad.net/savanna/+spec/hierarchical-templates).
They simplify the process of cluster configuration. In the general case,
the user chooses a template and starts a Hadoop cluster from it with one
click. All configuration parameters for the cluster will be contained in
the templates. Your concerns about the rapidly increasing number of node
types are valid, and your interface describing the interaction between
the plugin and Savanna is great.


I suggest the following structure for node templates:

{
   "flavor": "m1.tiny",
   "node_group": "slaves",
   "components": [
        {
            "name": "data node",
            "config": configuration_parameter_map
        },
        {
            "name": "task tracker",
            "config": configuration_parameter_map
        }
   ]
}
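For illustration, the proposed node template could be modelled in Python roughly as follows. This is a sketch under assumptions: the class names `ComponentSpec` and `NodeTemplate`, the `to_dict` helper and the `heap_size_mb` config key are hypothetical and not part of Savanna; only the field names (`flavor`, `node_group`, `components`, `name`, `config`) come from the structure above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ComponentSpec:
    """One Hadoop component on a node, with its configuration overrides."""
    name: str
    config: Dict[str, Any] = field(default_factory=dict)


@dataclass
class NodeTemplate:
    """Hypothetical model of the proposed node template structure."""
    flavor: str
    node_group: str
    components: List[ComponentSpec] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        # Serialize to the same shape as the proposed template structure.
        return {
            "flavor": self.flavor,
            "node_group": self.node_group,
            "components": [
                {"name": c.name, "config": dict(c.config)}
                for c in self.components
            ],
        }


# A slave template with a data node and a task tracker.
# "heap_size_mb" is an illustrative, hypothetical config key.
slave = NodeTemplate(
    flavor="m1.tiny",
    node_group="slaves",
    components=[
        ComponentSpec("data node", {"heap_size_mb": 1024}),
        ComponentSpec("task tracker"),
    ],
)
print(slave.to_dict())
```

Keeping the config map attached to each component, rather than to the node as a whole, is what would let per-component config items from the plugin map directly onto the template.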


The methods get_supported_node_groups(), get_supported_components(node_group)
and get_configs(component) should be implemented on the Hadoop provider
side. The method create_node_group(name, components[]) is not needed: its
purpose will be covered by changes to the templates mechanism, which will
live on the Savanna side.
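A minimal sketch of the provider-side interface, assuming a Python abstract base class. The class names `HadoopProviderPlugin` and `ToyPlugin`, the node group and component names, and the layout of config items are hypothetical illustrations; only the three method names come from the message above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class HadoopProviderPlugin(ABC):
    """Methods a Hadoop provider would implement (hypothetical base class)."""

    @abstractmethod
    def get_supported_node_groups(self) -> List[str]:
        """Return the node group names this plugin understands."""

    @abstractmethod
    def get_supported_components(self, node_group: str) -> List[str]:
        """Return the components that may be deployed in the given node group."""

    @abstractmethod
    def get_configs(self, component: str) -> List[Dict[str, object]]:
        """Return the configuration items exposed for a component."""


class ToyPlugin(HadoopProviderPlugin):
    # Hard-coded, illustrative data; a real plugin would derive this from
    # the Hadoop distribution it manages.
    _components = {
        "master": ["job tracker", "name node"],
        "slaves": ["data node", "task tracker"],
    }
    _configs = {
        "name node": [{"name": "heap_size_mb", "default": 1024}],
        "data node": [{"name": "heap_size_mb", "default": 512}],
    }

    def get_supported_node_groups(self) -> List[str]:
        return list(self._components)

    def get_supported_components(self, node_group: str) -> List[str]:
        return list(self._components.get(node_group, []))

    def get_configs(self, component: str) -> List[Dict[str, object]]:
        return list(self._configs.get(component, []))


plugin = ToyPlugin()
print(plugin.get_supported_node_groups())         # ['master', 'slaves']
print(plugin.get_supported_components("slaves"))  # ['data node', 'task tracker']
```

Under this split the plugin only reports what it supports; assembling node groups into templates stays with Savanna.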

We will update the documentation on Monday.

Alexander Kuznetsov.


On Wed, May 8, 2013 at 1:59 AM, Jon Maron <jmaron@xxxxxxxxxxxxxxx> wrote:

> Hi,
>
>  As we understand it, the current configuration approach has the following
> key APIs:
>
>  1)  Plugins return the set of node types via the
> get_supported_node_types() call.  The return value is a list of strings
> that describe the current set of node types supported by this plugin.
>  2)  Plugins return the set of configuration items they support via the
> get_configs() API call.  The configuration items currently appear to be
> mapped to components(?).
>  3)  The cluster descriptions used during validation (validate_cluster())
> and cluster launch (configure_cluster(), start_cluster()) take the
> following arguments:
> - cluster_description:
>  - cluster_name
>  - cluster_configs
>  - hadoop_version
>  - vm_groups
>
>  where vm_groups is a list of vm_group instances.  vm_group has the
> following attributes:
>  - node_type
>  - flavor
>  - configs
>  - count
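The cluster description arguments listed above can be sketched as plain Python data classes. This is an assumption-laden illustration: the class names, the `total_nodes` helper, the attribute types and the example values ("tt+dn", "1.1.2", flavor names) are hypothetical; only the attribute names (`cluster_name`, `cluster_configs`, `hadoop_version`, `vm_groups`, `node_type`, `flavor`, `configs`, `count`) come from the description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class VmGroup:
    node_type: str   # a node type string such as "jt+nn"
    flavor: str      # OpenStack flavor name
    configs: Dict[str, Any] = field(default_factory=dict)  # per-group overrides
    count: int = 1   # number of VMs in the group


@dataclass
class ClusterDescription:
    cluster_name: str
    cluster_configs: Dict[str, Any]
    hadoop_version: str
    vm_groups: List[VmGroup]

    def total_nodes(self) -> int:
        # Hypothetical helper: how many VMs a launch would create.
        return sum(group.count for group in self.vm_groups)


desc = ClusterDescription(
    cluster_name="demo",
    cluster_configs={},
    hadoop_version="1.1.2",
    vm_groups=[
        VmGroup(node_type="jt+nn", flavor="m1.large"),
        VmGroup(node_type="tt+dn", flavor="m1.tiny", count=3),
    ],
)
print(desc.total_nodes())  # 4
```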
>
>  So, the sequence of configuration-associated steps (and the issues/questions
> we see with each) during cluster provisioning is as follows (assuming no
> pre-existing node template):
>
> - user selects a cluster name
> - user selects a plugin
> - controller queries for supported hadoop versions from selected plugin
> (get_versions())
> - user selects a cluster template
>   - this appears to be a new concept - I can't find mentions of it
>     elsewhere.  Is that where the cluster level config items are to be
>     edited/created (alluded to in "cluster_configs" above)?
> - controller calls get_supported_node_types()
>   - There doesn't seem to be a provision for creating new node types.
>     Rather, the plugin returns the set of supported node types.  How do we
>     account for new services or tailored combinations of services?
>   - Which node types are displayed to the user?  Given the large number of
>     services available in a Hadoop deployment, the list of services and the
>     possible combinations can be rather large.
> - for a selected node type, the set of applicable config items is
>   displayed in a "create node template" dialog
>   - How are the proper config items selected?  The configs appear to
>     have a "component" attribute for each config item, but components are
>     currently encoded only in the node type description.  The current node
>     type is just a string (e.g. "jt+nn").  Discerning the set of components
>     by parsing the description seems error-prone and possibly confusing.
>     We believe there is a need for an actual structure:
>
>       node_type:
>         description
>         components[]
>         role
>
>     This would allow for:
>     - descriptions that are more apt for the given plugin (e.g. "master",
>       "master with monitoring")
>     - an ability to map the config items to the set of components available
>       from a given node type
>     - an ability to discern the role of a given node (currently the "mgmt"
>       vs "slave" vs "master" decision seems to be based on the controller's
>       ability to parse the set of components on a node?)
> - There is an implied ability to configure host/node level config
>   overrides based on the "configs" attribute of a vm_group.  At what point
>   are those entered?
>
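The structured node type proposed above could look roughly like the following sketch. `NodeType`, the `Role` values, the `component` key on config items and `configs_for` are hypothetical names chosen for illustration; the point is that config items can be matched by component name directly, without parsing a string such as "jt+nn".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"
    MGMT = "mgmt"


@dataclass(frozen=True)
class NodeType:
    description: str
    components: Tuple[str, ...]  # e.g. ("job tracker", "name node")
    role: Role                   # explicit, so nothing has to be inferred


def configs_for(node_type: NodeType,
                configs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only config items whose 'component' is deployed on this node type."""
    wanted = set(node_type.components)
    return [item for item in configs if item["component"] in wanted]


master = NodeType("master", ("job tracker", "name node"), Role.MASTER)
all_configs = [
    {"name": "heap_size_mb", "component": "name node"},
    {"name": "heap_size_mb", "component": "data node"},
    {"name": "map_slots", "component": "task tracker"},
]
print([c["component"] for c in configs_for(master, all_configs)])  # ['name node']
```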
> General Concerns:
>
> - node types - given the large number of services/components, creating a
> set of node types that handles all possible valid combinations seems
> daunting.  For example, assuming we have 5 components that can be validly
> deployed to a single node, we would have to define 5! node types to account
> for all possible deployment combinations, wouldn't we?
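The growth in node types can be checked by enumeration. A node type is an unordered set of components, so with n components there are 2^n - 1 non-empty combinations (31 for n = 5), while n! (120 for n = 5) counts orderings, which do not matter for placement. Either way the number grows quickly, so the concern stands. The sketch assumes, pessimistically, that every subset is a valid deployment; the component names are illustrative.

```python
from itertools import combinations
from math import factorial

# Illustrative component names; any five distinct components behave the same.
components = ["name node", "job tracker", "data node", "task tracker", "monitoring agent"]

# Every non-empty subset of components is a candidate node type
# (assuming every subset is a valid deployment).
node_types = [
    combo
    for size in range(1, len(components) + 1)
    for combo in combinations(components, size)
]

print(len(node_types))             # 31 == 2**5 - 1
print(factorial(len(components)))  # 120, which counts orderings instead
```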
>
> Perhaps it would be more appropriate to:
>
>  1)  Define a set of agreed upon node groups (e.g. "master", "slave",
> "monitored slave", etc) across all plugins (get_supported_node_groups())
>  2)  Allow plugins to return a set of components per role (e.g. for
> "master" return "job tracker", "name node" etc) (get_supported_components
> (node_group))
>  3)  Allow users to designate the set of components they want to
> associate with each node group (create_node_group(name, components[]))
>  4)  Query the plugin for the set of config items they make available per
> component. (get_configs(component))
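The four proposed steps can be sketched end to end as follows. All data here (`SUPPORTED`, `CONFIGS`, the group, component and config names) is hypothetical, and the functions are local stand-ins for a plugin and controller, not Savanna code; the sketch only shows the order of calls and how a user-designated node group could be checked against what the plugin supports.

```python
from typing import Dict, List

# Steps 1 and 2: data a plugin might report (hypothetical values).
SUPPORTED: Dict[str, List[str]] = {
    "master": ["job tracker", "name node"],
    "slave": ["data node", "task tracker"],
}
# Step 4: config items exposed per component (hypothetical values).
CONFIGS: Dict[str, List[str]] = {
    "name node": ["heap_size_mb"],
    "data node": ["heap_size_mb", "data_dirs"],
    "task tracker": ["map_slots"],
}


def get_supported_node_groups() -> List[str]:
    return list(SUPPORTED)


def get_supported_components(node_group: str) -> List[str]:
    return SUPPORTED[node_group]


def get_configs(component: str) -> List[str]:
    return CONFIGS.get(component, [])


def create_node_group(name: str, components: List[str]) -> Dict[str, object]:
    """Step 3: the user picks components; reject any the plugin does not offer."""
    if name not in get_supported_node_groups():
        raise ValueError(f"unknown node group: {name!r}")
    allowed = set(get_supported_components(name))
    unknown = [c for c in components if c not in allowed]
    if unknown:
        raise ValueError(f"unsupported components for {name!r}: {unknown}")
    # Step 4: gather the editable config items for the chosen components.
    return {
        "name": name,
        "components": list(components),
        "configs": {c: get_configs(c) for c in components},
    }


group = create_node_group("slave", ["data node"])
print(group["configs"])  # {'data node': ['heap_size_mb', 'data_dirs']}
```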
>
>   We look forward to your responses.
>
> -- Jon
>
>
> --
> Mailing list: https://launchpad.net/~savanna-all
> Post to     : savanna-all@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~savanna-all
> More help   : https://help.launchpad.net/ListHelp
>
>
