savanna-all team mailing list archive

Re: Some questions regarding configuration

To: Alexander Kuznetsov <akuznetsov@xxxxxxxxxxxx>
From: Jon Maron <jmaron@xxxxxxxxxxxxxxx>
Date: Wed, 8 May 2013 11:45:40 -0400
Cc: savanna-all@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CAA-P=7peCBkH9kWB+5r9SuAGj4v_tFXZGvJESxbMa9YQqt7h_A@mail.gmail.com>
Great!  We look forward to the documentation updates.  It would probably be beneficial if you could update the mockups to reflect these new concepts (e.g., I think a user should have a view that allows the mapping of node group to flavor, template, and number of instances).

We should also be able to provide our take on the advanced configuration option in the next day or so.

I have another question regarding cluster configuration:

When a user selects the node group, flavor, template and number of instances and clicks on "Launch Cluster", the controller first invokes the VM provisioning element to create the VM instances.  The information returned is provided in the vm_specs object.  However, there doesn't appear to be any indication of the node group association in the vm_specs.  In other words, there is nothing in the vm_specs or its associated attributes that indicates the node group (or node type), so the cluster provider cannot subsequently select the appropriate VM to provision with the correct node type.  Am I missing some correlation?  It seems to me that the server instances should possibly specify the node_group they were provisioned for?
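
To make the question concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration only: the VmSpec class, its hostname and flavor fields, and the group_by_node_group() helper are hypothetical and not taken from the current Savanna code. The only point is that a node_group attribute on each entry would let the provider find the right VMs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VmSpec:
    """Hypothetical shape of one vm_specs entry; field names are assumptions."""
    hostname: str
    flavor: str
    node_group: str  # proposed addition: the node group this VM was provisioned for


def group_by_node_group(vm_specs):
    """Index provisioned VMs by node group so a provider can pick the right ones."""
    grouped = defaultdict(list)
    for spec in vm_specs:
        grouped[spec.node_group].append(spec)
    return dict(grouped)


vm_specs = [
    VmSpec("vm-1", "m1.large", "master"),
    VmSpec("vm-2", "m1.tiny", "slaves"),
    VmSpec("vm-3", "m1.tiny", "slaves"),
]
by_group = group_by_node_group(vm_specs)
```

With something like this, the cluster provider could look up by_group["slaves"] and install the slave components on exactly those instances.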

-- Jon

On May 8, 2013, at 10:54 AM, Alexander Kuznetsov <akuznetsov@xxxxxxxxxxxx> wrote:

> Jon,
> 
> You should familiarize yourself with the concept of templates (https://blueprints.launchpad.net/savanna/+spec/hierarchical-templates). They simplify the process of cluster configuration. In the general case, the user chooses a template and starts a Hadoop cluster from it with one click. All configuration parameters for the cluster will be contained in the template. Your concerns about the rapidly growing number of node types are correct, and your interface describing the interaction between the plugin and Savanna is great.
> 
> 
> I can suggest the following structure for node templates:
> 
> {
>    flavor: m1.tiny,
>    node_group: slaves,
>    components: [
>         {
>             name: "data node",
>             config: configuration_parameter_map,
>         },
>         {
>             name: "task tracker",
>             config: configuration_parameter_map,
>         }
>    ]
> }
> 
> 
> The methods get_supported_node_groups(), get_supported_components(node_group) and get_configs(component) should be implemented on the Hadoop provider side. The method create_node_group(name, components[]) is not needed: it will be covered by changes in the templates mechanism and will live on the Savanna side.
> 
> We will update the documentation on Monday.
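
For reference, the node template structure suggested above could be written as a plain Python dictionary roughly as follows. This is only a sketch under assumptions: the validate_node_template() helper and the REQUIRED_KEYS set are hypothetical, and the empty dictionaries stand in for the configuration_parameter_map values, whose contents the thread does not specify.

```python
# Node template from the suggestion above, expressed as a Python dict.
node_template = {
    "flavor": "m1.tiny",
    "node_group": "slaves",
    "components": [
        {"name": "data node", "config": {}},     # config map left empty here
        {"name": "task tracker", "config": {}},  # config map left empty here
    ],
}

# Hypothetical: the top-level keys taken from the suggested structure.
REQUIRED_KEYS = {"flavor", "node_group", "components"}


def validate_node_template(template):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for key in sorted(REQUIRED_KEYS - template.keys()):
        problems.append("missing key: " + key)
    for index, component in enumerate(template.get("components", [])):
        if not component.get("name"):
            problems.append("component %d has no name" % index)
        if not isinstance(component.get("config"), dict):
            problems.append("component %d has no config map" % index)
    return problems
```

Calling validate_node_template(node_template) on the example returns an empty list, while a template missing node_group or components would be reported.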

> 
> Alexander Kuznetsov.
> 
> 
> On Wed, May 8, 2013 at 1:59 AM, Jon Maron <jmaron@xxxxxxxxxxxxxxx> wrote:
> Hi,
> 
>  As we understand it, the current configuration approach has the following key APIs:
> 
>  1)  Plugins return the set of node types via the get_supported_node_types() call.  The return value is a list of strings that describe the current set of node types supported by this plugin. 
>  2)  Plugins return the set of configuration items they support via the get_configs() API call.  The configuration items currently appear to be mapped to components(?).
>  3)  The cluster descriptions used during validation (validate_cluster()) and cluster launch (configure_cluster(), start_cluster()) take the following arguments:
> 	- cluster_description:
> 		- cluster_name
> 		- cluster_configs 
> 		- hadoop_version
> 		- vm_groups
> 
> 		where vm_groups is a list of vm_group instances.  vm_group has the following attributes:
> 			- node_type
> 			- flavor
> 			- configs
> 			- count
> 
>  So, the sequence of configuration-associated steps during cluster provisioning (and the issues/questions we see with each) is as follows, assuming no pre-existing node template:
> 
> 	- user selects a cluster name
> 	- user selects a plugin
> 	- controller queries for supported hadoop versions from selected plugin (get_versions())
> 	- user selects a cluster template
> 		- this appears to be a new concept - I can't find mentions of it elsewhere?  Is that where the cluster level config items are to be edited/created (alluded to in "cluster_configs" above)?
> 	- controller calls get_supported_node_types()
> 		- There doesn't seem to be a provision for creating new node types.  Rather, the plugin returns the set of supported node types.  How do we account for new services or tailored combinations of services?
> 		- which node types are displayed to the user?  Given the large number of services available in a hadoop deployment, the list of services and the possible combinations can be rather large
> 	- For a selected node type, the set of applicable config items is displayed in a "create node template" dialog
> 		- how are the proper config items selected?  The configs appear to have a "component" attribute for each config item, but components are currently only encoded into the node type description. The current node type is just a string (e.g. "jt+nn").  Discerning the set of components by parsing the description seems error-prone and possibly confusing.  We believe there is a need for an actual structure:
> 
> 		node_type:
> 			description
> 			components[]
> 			role
> 
> 		This would allow for:
> 			- descriptions that are more apt for the given plugin (e.g. "master", "master with monitoring")
> 			- an ability to map the config items to the set of components available from a given node type
> 			- an ability to discern the role of a given node (currently the "mgmt" vs "slave" vs "master" decision seems to be based on the controller's ability to parse the set of components on a node?)
> 
> 	- There is an implied ability to configure host/node level config overrides based on the "configs" attribute of a vm_group.  At what point are those entered?
> 	
> General Concerns:
> 
> 	- node types - given the large number of services/components, creating a set of node types that handles all possible valid combinations seems daunting.  For example, assuming we have 5 components that can each be validly deployed to a single node, we would have to define up to 2^5 - 1 = 31 node types (one per non-empty combination of components) to account for all possible deployment combinations, wouldn't we?
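
As a quick check on how fast this grows, the sketch below enumerates every non-empty combination of 5 components, treating each combination as one node type. The component list is an assumption for illustration ("monitoring" in particular is a made-up fifth entry), and it assumes every combination is valid, which is the worst case.

```python
from itertools import combinations

# Illustrative component names; "monitoring" is a hypothetical fifth component.
components = ["name node", "job tracker", "data node", "task tracker", "monitoring"]

# One node type per non-empty, unordered combination of components.
node_types = [
    combo
    for size in range(1, len(components) + 1)
    for combo in combinations(components, size)
]
count = len(node_types)  # 2**5 - 1 combinations for 5 components
```

Each additional component roughly doubles the count, so a fixed list of predefined node types does not scale.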
> 
> 	Perhaps it would be more appropriate to:
> 
> 		1)  Define a set of agreed upon node groups (e.g. "master", "slave", "monitored slave", etc) across all plugins (get_supported_node_groups())
> 		2)  Allow plugins to return a set of components per role (e.g. for "master" return "job tracker", "name node" etc) (get_supported_components(node_group))
> 		3)  Allow users to designate the set of components they want to associate to each node group (create_node_group(name, components[]))
> 		4)  Query the plugin for the set of config items they make available per component. (get_configs(component))
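
The four steps above can be sketched as a small interface. This is a rough illustration under assumptions, not a proposed implementation: the ProvisioningPlugin and ToyHadoopPlugin classes are hypothetical, the component-to-config mapping holds illustrative values only, and create_node_group() is written as a standalone function that checks the user's selection against what the plugin reports.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin interface using the method names proposed above."""

    @abstractmethod
    def get_supported_node_groups(self): ...

    @abstractmethod
    def get_supported_components(self, node_group): ...

    @abstractmethod
    def get_configs(self, component): ...


class ToyHadoopPlugin(ProvisioningPlugin):
    # Illustrative data only; a real plugin would derive this from its stack.
    _components = {
        "master": ["job tracker", "name node"],
        "slave": ["data node", "task tracker"],
    }
    _configs = {
        "job tracker": ["mapred.job.tracker"],
        "name node": ["dfs.name.dir"],
        "data node": ["dfs.data.dir"],
        "task tracker": ["mapred.local.dir"],
    }

    def get_supported_node_groups(self):
        return sorted(self._components)

    def get_supported_components(self, node_group):
        return list(self._components[node_group])

    def get_configs(self, component):
        return list(self._configs[component])


def create_node_group(plugin, name, components):
    """Build a node group from user-chosen components, rejecting unsupported ones."""
    supported = plugin.get_supported_components(name)
    unsupported = [c for c in components if c not in supported]
    if unsupported:
        raise ValueError("unsupported for %s: %s" % (name, unsupported))
    configs = {c: plugin.get_configs(c) for c in components}
    return {"name": name, "components": list(components), "configs": configs}


plugin = ToyHadoopPlugin()
group = create_node_group(plugin, "slave", ["data node"])
```

The design point is that the plugin only answers queries (node groups, components, config items), while the choice of which components go on which node group stays with the user.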
> 
>   We look forward to your responses.
> 
> -- Jon
> 
> 
> --
> Mailing list: https://launchpad.net/~savanna-all
> Post to     : savanna-all@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~savanna-all
> More help   : https://help.launchpad.net/ListHelp
> 
>
Follow ups

Re: Some questions regarding configuration
From: Alexander Kuznetsov, 2013-05-13
References

Some questions regarding configuration
From: Jon Maron, 2013-05-07
Re: Some questions regarding configuration
From: Alexander Kuznetsov, 2013-05-08