
savanna-all team mailing list archive

Re: Hadoop Provider Integration

 

Erik,

As we agreed, please send us a more detailed proposal on advanced configuration.

>> Management Tool should likely be renamed 'Hadoop Provider' or the like.
To me, "Hadoop Provider" also sounds clearer.

>> We are not sure what 'Cluster Template' is. Can you please clarify?
A Cluster Template is essentially a list of cluster-wide parameters, such as
the HDFS Replication Factor. They are described in the following blueprint:
https://blueprints.launchpad.net/savanna/+spec/hierarchical-templates

>> It *should* be possible to run the Management Node on the same node as
the Master. The UI does not seem to allow for this?
>> Our main concern with this approach is that we will need to provide
a large number of combinations of templates and that it could become
unwieldy. In order for this simple configuration to stay simple, we need to
rationalize the template options. It is also for this reason that an
advanced configuration is necessary allowing the user complete freedom in
terms of how the cluster is provisioned and configured (see above note on
advanced configuration).

It seems this concern is explained in more detail in Jon's recent email,
so let's discuss it further in that thread.

>> Can we see a representation of the node template structure and how the
hierarchical templates 'work'? i.e. is this handled through some sort of
naming convention?

Our current thoughts on that are reflected in the blueprint I've referenced
above. It might not be well defined yet, as right now we are concentrating on
the Provisioning Mechanism.
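
To make the idea concrete, here is a minimal sketch of how hierarchical
templates could be flattened. It is based only on the description quoted
further down in this thread: a template is a list of key-value pairs, a
template can extend another one, and Savanna passes the plugin only the
resulting flat list of configs. The NodeTemplate class, the resolve()
function, and the parameter names are hypothetical and are not taken from the
blueprint or from the Savanna code.

```python
from typing import Dict, Optional


class NodeTemplate:
    """Hypothetical node template: named key-value pairs plus an optional parent."""

    def __init__(self, name: str, params: Dict[str, str],
                 parent: Optional["NodeTemplate"] = None) -> None:
        self.name = name
        self.params = dict(params)
        self.parent = parent


def resolve(template: NodeTemplate) -> Dict[str, str]:
    """Flatten a template chain into one dict; child values override parent values."""
    chain = []
    seen = set()
    node = template
    while node is not None:
        if id(node) in seen:
            raise ValueError("cycle in template hierarchy at %r" % node.name)
        seen.add(id(node))
        chain.append(node)
        node = node.parent

    flat: Dict[str, str] = {}
    for node in reversed(chain):  # apply the base first, the most specific last
        flat.update(node.params)
    return flat


# Illustrative data only; these parameter names are made up for the example.
base = NodeTemplate("base", {"heap_size": "1024", "data_dir": "/mnt/hadoop"})
worker = NodeTemplate("worker", {"heap_size": "2048"}, parent=base)

# The plugin would receive only this flat dict, never the hierarchy itself.
configs = resolve(worker)
```

Resolving on the Savanna side keeps the inheritance logic out of the plugin,
which only ever sees the flat dictionary.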

Thanks,

Dmitry


2013/5/6 Erik Bergenholtz <ebergenholtz@xxxxxxxxxxxxxxx>

> Team,
>
> Provided that the default cluster configuration mechanism stays simple it
> makes sense to provide this functionality as you have specified. However,
> we feel strongly that users will have very specific configurations that
> they will want to specify which will not easily be achieved through this
> mechanism. Therefore, allowing for an 'advanced' configuration where the
> underlying provider configuration facility is leveraged will be necessary.
> In other words, exposing the provider's proprietary configuration is
> required and, we believe based on several customer use cases, very
> important. This would effectively involve exposing two APIs for a given
> cluster: one to set the configuration and one to get it, i.e.:
>
> In the plugin SPI function, create_cluster(...) , named arguments could be
> used to set the cluster configuration.  There could be an argument for the
> map of properties and another for the provider specific configuration.
> This way the provider knows which form of configuration it is receiving for
> the create cluster call.
>
> get_cluster_template() - where the provider specific configuration is
> returned. This configuration can be persisted by the user, modified and
> used in future invocations of set_cluster_template.
>
> To be clear, a user would be able to either use the currently proposed
> configuration mechanism, or, the advanced configuration option.  Also, the
> user could potentially provide some information in the "simple
> configuration" and then generate a provider specific configuration that
> could be edited and supplied as an advanced configuration.  This generation
> of a plugin specific configuration based on a simple configuration would
> require an additional SPI call such as create_configuration(properties).
>
> Other comments in-line:
>
>
> On May 3, 2013, at 5:20 AM, Dmitry Mescheryakov <
> dmescheryakov@xxxxxxxxxxxx> wrote:
>
> Team,
>
> I've created mockups of the UI dialogs for launching a cluster and creating
> a node template:
>
> https://wiki.openstack.org/wiki/File:Savanna-create-cluster-mockup.png
>
> Management Tool should likely be renamed 'Hadoop Provider' or the like.
> We are not sure what 'Cluster Template' is. Can you please clarify?
> It *should* be possible to run the Management Node on the same node as the
> Master. The UI does not seem to allow for this?
> Our main concern with this approach is that we will need to provide a
> large number of combinations of templates and that it could become
> unwieldy. In order for this simple configuration to stay simple, we need to
> rationalize the template options. It is also for this reason that an
> advanced configuration is necessary allowing the user complete freedom in
> terms of how the cluster is provisioned and configured (see above note on
> advanced configuration).
>
> https://wiki.openstack.org/wiki/File:Savanna-node-template-mockup.png
> I think they should help you understand our view as a whole and clarify our
> Plugin API design reasoning. IMHO, the create cluster dialog maps pretty
> nicely to cluster_description object from the Plugin API. The main thing
> which is different in the API is that Node Templates are replaced with
> configs.
>
> Can we see a representation of the node template structure and how the
> hierarchical templates 'work'? i.e. is this handled through some sort of
> naming convention?
>
>
> Initially we introduced Node Templates to let the user specify some
> parameters only once, during template creation. When creating a cluster,
> the user doesn't need to specify each parameter but just selects one of the
> templates. To simplify Node Template usage and management, we want to
> organize them in a hierarchical structure, so that a template can extend
> another template. This approach will let the user specify most parameters in
> a base template and override only a few of them in each derived template.
>
> As you can see from the second mockup, a template is, in a nutshell, just a
> list of key-value pairs.
>
> We don't want to burden the plugin with processing Templates, deciding which
> template inherits from which, etc. We want Savanna to handle this and provide
> the plugin with only the resulting list of configs (key-value pairs), hence
> the transition from UI to API.
>
> Hope that clarifies,
>
> Dmitry
>
>
>
> 2013/5/2 Dmitry Mescheryakov <dmescheryakov@xxxxxxxxxxxx>
>
>> Erik, team,
>>
>> We've composed a document which more precisely describes our vision of the
>> Plugin API:
>>
>> https://wiki.openstack.org/w/images/1/19/Savanna-plugin-api.pdf
>>
>> The document mostly focuses on describing functions we've already
>> introduced. It incorporates almost none of your input yet; that work is
>> still to be done.
>>
>> Thanks,
>>
>> Dmitry
>>
>>
>>
>>
>> 2013/5/1 Dmitry Mescheryakov <dmescheryakov@xxxxxxxxxxxx>
>>
>>> Hey Erik, team,
>>>
>>> Thank you for the deeper dive into the plugin mechanism architecture. That
>>> is really a step forward.
>>>
>>> Let us discuss the general approach first. When we started the
>>> architecture design, we also first thought of an IoC concept, similar to
>>> what you suggest. But after some thinking, we found that it is better to
>>> split one big “create cluster” call into a number of smaller ones,
>>> retaining more control on the core side. The benefits of such an approach
>>> (and the disadvantages of the opposite one) are:
>>>
>>> * With a single “create cluster” call, each plugin is allowed to behave
>>> differently, which introduces a lot of undefined behavior, especially in
>>> error cases, because each plugin could handle errors differently.
>>> Handling such cases might also require deep knowledge of OpenStack from
>>> the plugin creator. Our goal is to simplify the plugin creation process by
>>> handling all the OpenStack-related logic inside the Savanna code, not in
>>> the plugin code.
>>>
>>> In case of separate calls for each step:
>>>
>>> * The plugin is separated into several consecutive parts/methods.
>>> Transitions between these methods might be persisted, which would increase
>>> the reliability of the workflow. In the future, it might allow running the
>>> plugin in a distributed environment.
>>>
>>> * Plugin methods that are separate and defined by the API allow code reuse.
>>> They also serve as documentation of the plugin's responsibilities.
>>>
>>> * It’ll allow timeout handling for each step on the core side.
>>>
>>>
>>> Methods for interacting directly with a VM/Server instance, like
>>> install(), open_file(), and interactive execute(), don’t seem to be
>>> relevant to the Plugin API. It’s better to keep such methods out of the API
>>> to keep it as simple as possible. They can just be a set of helper methods
>>> under a utils package.
>>>
>>> We suggest moving provider-specific details to separate blueprints.
>>> Examples are “HDP specific details” in create cluster flow and in add hosts
>>> flow.
>>>
>>> Thanks,
>>>
>>> Dmitry, on behalf of Mirantis team
>>>
>>>
>>>
>>> 2013/4/30 Erik Bergenholtz <ebergenholtz@xxxxxxxxxxxxxxx>
>>>
>>>> Dmitry - uploaded a new doc that looks a bit better:
>>>> https://wiki.openstack.org/w/images/5/5e/Savanna_Deployment_Engine_Architecture.pdf
>>>>
>>>> Erik
>>>>
>>>> On Apr 30, 2013, at 4:20 PM, Dmitry Mescheryakov <
>>>> dmescheryakov@xxxxxxxxxxxx> wrote:
>>>>
>>>> Hey Erik,
>>>>
>>>> Some tables in the "Savanna Deployment Engine Architecture" doc are
>>>> flattened out; see the attached screenshot for an example. Could you
>>>> reassemble the PDF with correct table sizes?
>>>>
>>>> Thanks,
>>>>
>>>> Dmitry
>>>>
>>>>
>>>> 2013/4/30 Erik Bergenholtz <ebergenholtz@xxxxxxxxxxxxxxx>
>>>>
>>>>> Team - yet another update. See
>>>>> https://wiki.openstack.org/w/images/9/97/Savanna_hadoop_host_group_mapping.pdf for
>>>>> a document illustrating how hadoop nodes get mapped to provisioned VMs.
>>>>> Erik
>>>>>
>>>>>
>>>>>
>>>>> On Apr 30, 2013, at 11:07 AM, Erik Bergenholtz <
>>>>> ebergenholtz@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Team,  John has updated the below referenced documents with better
>>>>> descriptions of the flows (same links apply).
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Erik
>>>>>
>>>>> On Apr 30, 2013, at 6:35 AM, Erik Bergenholtz <
>>>>> ebergenholtz@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Team,
>>>>>
>>>>> Below are a few documents intended to describe only a slightly
>>>>> modified approach to hadoop provider integration into Savanna (see existing
>>>>> blueprint in draft:
>>>>> https://blueprints.launchpad.net/savanna/+spec/pluggable-cluster-provisioning). These
>>>>> documents should not be considered a blueprint, but a vehicle for
>>>>> continuing discussion on the topic.
>>>>>
>>>>> To summarize there are two changes to note:
>>>>>
>>>>> 1. There is some IoC introduced into the design allowing hadoop plugin
>>>>> providers flexibility to integrate into Savanna while reducing the burden
>>>>> on the Savanna controller itself. This differs from the current approach of
>>>>> the controller invoking APIs on the provider at specific lifecycle points.
>>>>>
>>>>> 2. Attempts have been made to keep normalization of management APIs
>>>>> across providers at a minimum at the controller level. Our view is that
>>>>> existing Hadoop users are already familiar with their Hadoop distribution
>>>>> management API (CDH, Ambari, MapR etc.) and as such would want to leverage
>>>>> existing investments vs. learning a new management API specific to Savanna.
>>>>> This eases adoption and lowers the barrier to entry for Hadoop on
>>>>> OpenStack.
>>>>>
>>>>> Documents to review:
>>>>>
>>>>> Savanna Deployment Engine Architecture<https://wiki.openstack.org/w/images/5/5e/Savanna_Deployment_Engine_Architecture.pdf> -
>>>>> Puts forth the architecture of the deployment engine
>>>>> Savanna_add_hosts_flow<https://wiki.openstack.org/w/images/d/dc/Savanna_add_hosts_flow.pdf> -
>>>>> Describes sequence of steps executed in order to add a host to an existing
>>>>> Hadoop Cluster
>>>>> Savanna_create_cluster_flow<https://wiki.openstack.org/w/images/0/0f/Savanna_create_cluster_flow.pdf> -
>>>>> Describes the sequence of steps executed in order to create a new cluster
>>>>> Savanna_invoke_provider_rest_api_flow<https://wiki.openstack.org/w/images/a/a6/Savanna_invoke_provider_rest_api_flow.pdf> -
>>>>> Describes the sequence of steps executed in a REST request by the provider
>>>>> plugin on the controller.
>>>>>
>>>>> Please review at your earliest convenience and let us know your
>>>>> feedback.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jon Maron, John Speidel and Erik Bergenholtz
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Mailing list: https://launchpad.net/~savanna-all
>>>>> Post to     : savanna-all@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~savanna-all
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>>
>>>> <Screen Shot 2013-04-30 at 1.15.11 PM.png>
>>>>
>>>>
>>>>
>>>
>>
>
>
