
savanna-all team mailing list archive

Some questions regarding configuration



 As we understand it, the current configuration approach has the following key APIs:

 1)  Plugins return the set of node types via the get_supported_node_types() call.  The return value is a list of strings that describe the current set of node types supported by this plugin. 
 2)  Plugins return the set of configuration items they support via the get_configs() API call.  The configuration items currently appear to be mapped to components(?).
 3)  The calls used during validation (validate_cluster()) and cluster launch (configure_cluster(), start_cluster()) take the following argument:
	- cluster_description:
		- cluster_name
		- cluster_configs 
		- hadoop_version
		- vm_groups

		where vm_groups is a list of vm_group instances.  vm_group has the following attributes:
			- node_type
			- flavor
			- configs
			- count
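
To make the shape of this argument concrete, the cluster description could be modeled roughly as below. This is only a sketch: the class names, the field types, and the sample values (flavor names, version string, the "tt+dn" node type) are assumptions for illustration and are not taken from the Savanna code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VmGroup:
    """One entry of vm_groups; attribute names follow the list above."""
    node_type: str                      # e.g. "jt+nn"
    flavor: str                         # assumed to be a flavor name
    count: int                          # number of VMs in this group
    configs: Dict[str, str] = field(default_factory=dict)  # assumed key/value overrides


@dataclass
class ClusterDescription:
    """Argument passed to validate_cluster(), configure_cluster(), start_cluster()."""
    cluster_name: str
    hadoop_version: str
    cluster_configs: Dict[str, str] = field(default_factory=dict)
    vm_groups: List[VmGroup] = field(default_factory=list)


# Illustrative values only.
cluster = ClusterDescription(
    cluster_name="demo",
    hadoop_version="1.1.2",
    vm_groups=[
        VmGroup(node_type="jt+nn", flavor="m1.large", count=1),
        VmGroup(node_type="tt+dn", flavor="m1.medium", count=3),
    ],
)

# Total number of VMs the description asks for.
total_nodes = sum(group.count for group in cluster.vm_groups)
```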

 So, the sequence of configuration-associated steps during cluster provisioning (with the issues/questions we see at each step, and assuming no pre-existing node template) is:

	- user selects a cluster name
	- user selects a plugin
	- controller queries for supported hadoop versions from selected plugin (get_versions())
	- user selects a cluster template
		- This appears to be a new concept; I can't find it mentioned elsewhere.  Is that where the cluster level config items are to be edited/created (alluded to in "cluster_configs" above)?
	- controller calls get_supported_node_types()
		- There doesn't seem to be a provision for creating new node types.  Rather, the plugin returns the set of supported node types.  How do we account for new services or tailored combinations of services?
		- Which node types are displayed to the user?  Given the large number of services available in a hadoop deployment, the list of services and the possible combinations can be rather large.
	- For a selected node type, the set of applicable config items is displayed in a "create node template" dialog
		- How are the proper config items selected?  The configs appear to have a "component" attribute for each config item, but components are currently only encoded into the node type description.  The current node type is just a string (e.g. "jt+nn").  Discerning the set of components by parsing the description seems error prone and possibly confusing.  We believe there is a need for an actual structure:


		This would allow for:
			- descriptions that are more apt for the given plugin (e.g. "master", "master with monitoring")
			- an ability to map the config items to the set of components available from a given node type
			- an ability to discern the role of a given node (currently the "mgmt" vs "slave" vs "master" decision seems to be based on the controller's ability to parse the set of components on a node?)

	- There is an implied ability to configure host/node level config overrides based on the "configs" attribute of a vm_group.  At what point are those entered?
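
One possible shape for the node type structure mentioned above is sketched here, purely as an illustration. The class names, the "role" field, the "tt" component, and the config item names are all hypothetical; only "jt" and "nn" come from the "jt+nn" example. The point is that components and role are explicit fields, so config items can be matched by component without parsing a string.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class NodeType:
    name: str                    # plugin-friendly label, e.g. "master"
    components: FrozenSet[str]   # explicit component set, no string parsing
    role: str                    # e.g. "master", "slave" or "mgmt"


@dataclass(frozen=True)
class ConfigItem:
    name: str                    # hypothetical config item name
    component: str               # component the item applies to


def configs_for(node_type: NodeType, all_configs: List[ConfigItem]) -> List[ConfigItem]:
    """Return the config items whose component is present on the node type."""
    return [c for c in all_configs if c.component in node_type.components]


master = NodeType(name="master", components=frozenset({"jt", "nn"}), role="master")

all_configs = [
    ConfigItem("heap_size", "jt"),
    ConfigItem("heap_size", "nn"),
    ConfigItem("task_slots", "tt"),   # not on the master, so filtered out
]

selected = configs_for(master, all_configs)
```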

General Concerns:

	- node types - given the large number of services/components, creating a set of node types that handles all possible valid combinations seems daunting.  For example, assuming we have 5 components that can be validly deployed to a single node, we would have to define up to 2^5 - 1 = 31 node types (one per non-empty subset of components) to account for all possible deployment combinations, wouldn't we?
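
	To get a feel for the size of this space, the count can be checked directly.  The sketch below assumes that order does not matter and that every non-empty subset of components is a valid node type, which gives 2^n - 1 combinations for n components; counting orderings of all five components instead would give 5! = 120.  The component names are hypothetical.

```python
from itertools import combinations
from math import factorial

# Hypothetical component names; only the count matters here.
components = ["jt", "nn", "tt", "dn", "monitoring"]

# Every non-empty, unordered selection of components is one candidate node type.
subsets = [
    combo
    for size in range(1, len(components) + 1)
    for combo in combinations(components, size)
]

subset_count = len(subsets)                   # 2**5 - 1
ordering_count = factorial(len(components))   # 5!, counts orderings instead
```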

	Perhaps it would be more appropriate to:

		1)  Define a set of agreed-upon node groups (e.g. "master", "slave", "monitored slave", etc.) across all plugins (get_supported_node_groups())
		2)  Allow plugins to return a set of components per node group (e.g. for "master" return "job tracker", "name node", etc.) (get_supported_components(node_group))
		3)  Allow users to designate the set of components they want to associate to each node group (create_node_group(name, components[]))
		4)  Query the plugin for the set of config items they make available per component. (get_configs(component))
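
	The four steps above could fit together roughly as follows.  This is a sketch under stated assumptions, not a proposed final interface: the class names, the extra plugin argument to create_node_group(), the return types, the "task tracker" and "data node" components, and the config item names are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class NodeGroupPlugin(ABC):
    """Hypothetical plugin interface covering steps 1, 2 and 4."""

    @abstractmethod
    def get_supported_node_groups(self) -> List[str]: ...

    @abstractmethod
    def get_supported_components(self, node_group: str) -> List[str]: ...

    @abstractmethod
    def get_configs(self, component: str) -> List[str]: ...


class ExamplePlugin(NodeGroupPlugin):
    # Illustrative data only.
    _COMPONENTS: Dict[str, List[str]] = {
        "master": ["job tracker", "name node"],
        "slave": ["task tracker", "data node"],
    }
    _CONFIGS: Dict[str, List[str]] = {
        "job tracker": ["jt_heap_size"],
        "name node": ["nn_heap_size"],
        "task tracker": ["tt_task_slots"],
        "data node": ["dn_data_dir"],
    }

    def get_supported_node_groups(self) -> List[str]:
        return list(self._COMPONENTS)

    def get_supported_components(self, node_group: str) -> List[str]:
        return list(self._COMPONENTS[node_group])

    def get_configs(self, component: str) -> List[str]:
        return list(self._CONFIGS[component])


def create_node_group(plugin: NodeGroupPlugin, name: str, components: List[str]) -> dict:
    """Step 3: bind user-chosen components to a node group, validating them first."""
    allowed = set(plugin.get_supported_components(name))
    unknown = [c for c in components if c not in allowed]
    if unknown:
        raise ValueError(f"components not supported for {name!r}: {unknown}")
    return {
        "name": name,
        "components": list(components),
        # Config items are looked up per component, as in step 4.
        "configs": {c: plugin.get_configs(c) for c in components},
    }


plugin = ExamplePlugin()
group = create_node_group(plugin, "master", ["job tracker", "name node"])
```

	With this split, the controller never has to enumerate every combination of components up front; it only validates the combination the user actually picks.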

  We look forward to your responses.

-- Jon
