savanna-all team mailing list archive

Re: advanced hadoop configuration

To: Jon Maron <jmaron@xxxxxxxxxxxxxxx>
From: Ruslan Kamaldinov <rkamaldinov@xxxxxxxxxxxx>
Date: Sat, 11 May 2013 16:46:01 +0400
Cc: "savanna-all@xxxxxxxxxxxxxxxxxxx" <savanna-all@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <C87EF458-00E1-4045-A1BF-701170BCA1B0@hortonworks.com>

> I don't believe that openstack currently has rack awareness?

This is one of the main goals of Savanna. It is targeted for phase 2. Quote
from the roadmap:

Hadoop cluster topology configuration parameters

- Data node placement control
- HDFS location
- Swift integration


While your approach targets all the Hadoop-related configs, it misses all
the OpenStack-related configuration. An advanced Hadoop cluster will require
advanced OpenStack configuration: Swift, Cinder, placement control, etc.

We **have to** give the user control over both worlds: Hadoop and OpenStack.
Giving control to the plugin means that the user will lose control over
OpenStack-related configuration.


Hierarchical node/cluster templates (see
https://blueprints.launchpad.net/savanna/+spec/hierarchical-templates) were
designed specifically to support both Hadoop and OpenStack advanced
configurations.

If you think that the current design misses something, or that something in
it prevents supporting the "Hadoop Blueprint Specification", let's discuss it.
It was designed to support such configurations and it **has to support them**.


Thanks,

Ruslan


On Sat, May 11, 2013 at 1:17 AM, Jon Maron <jmaron@xxxxxxxxxxxxxxx> wrote:

>
> On May 10, 2013, at 4:35 PM, Ruslan Kamaldinov <rkamaldinov@xxxxxxxxxxxx>
> wrote:
>
> Hi John,
>
> If the controller doesn't know anything about the services which will run on
> the VMs, then it will not be able to place them correctly. The whole cluster
> might end up on one physical machine (or rack).
>
>
> I don't believe that openstack currently has rack awareness?  In addition,
> the controller doesn't need actual service or hadoop information to make a
> determination about which physical machine to utilize (I think that would
> actually be a mistake and could limit the controller's ability to extend to
> other potential scenarios).  Rather, if we deem it necessary we could
> create some additional VM specific configuration it can utilize to
> appropriately provision the VMs, independent of the hadoop configuration.
>  We think it'd be a mistake to expect the controller in general to
> interpret hadoop specific information (standard or advanced).  The
> controller is simply providing services and managing the cluster creation
> workflow.  There should be a clear VM provisioning element that reads the
> VM specific configuration and provisions accordingly, and then the hadoop
> configuration (standard or advanced), along with the vm specs, should be
> passed to the plugin and allow it to proceed with service/component
> installations.
>
>
> That's why we need to pass a more detailed config to the controller, so it
> will be able to place VMs in the correct place. And we can't have this logic
> inside the plugin.
>
>
> I don't quite understand your concern.
>
> The controller is going to deal with the VM provisioning element and
> request it to create the VMs based on the information provided (number of
> VMs, flavors).  The VM information will then be relayed to the plugin
> within the vm_specs object.   Then, given a list of VMs and their
> characteristics, the plugin will be able to select the appropriate VMs to
> install the various hadoop services based on predicates available within
> the hadoop cluster configuration within the advanced configuration file.
> For example, for the name node the Hadoop configuration may include a minimum
> memory requirement.  The plugin will be able to iterate through the list of
> VMs and find one that has the appropriate amount of memory.  Once a VM is
> found that meets all the criteria listed for the given component, the
> installation can proceed.
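
The selection step described above can be sketched as follows. This is only an illustration under stated assumptions, not code from Savanna or from any plugin: the `VmSpec` fields, the shape of the predicates, and the function names are all hypothetical, and it assumes one VM per component to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class VmSpec:
    """Hypothetical stand-in for one entry of the vm_specs object."""
    name: str
    memory_mb: int
    vcpus: int


# A predicate is any function that accepts a VM and says whether it qualifies.
Predicate = Callable[[VmSpec], bool]


def select_vm(vms: List[VmSpec], predicates: List[Predicate],
              taken: Set[str]) -> Optional[VmSpec]:
    """Return the first unassigned VM that satisfies every predicate."""
    for vm in vms:
        if vm.name in taken:
            continue
        if all(predicate(vm) for predicate in predicates):
            return vm
    return None


def assign_components(vms: List[VmSpec],
                      requirements: Dict[str, List[Predicate]]) -> Dict[str, str]:
    """Map each component name to the name of a VM that meets its criteria."""
    assignment: Dict[str, str] = {}
    taken: Set[str] = set()
    for component, predicates in requirements.items():
        vm = select_vm(vms, predicates, taken)
        if vm is None:
            raise LookupError("no suitable VM for component %r" % component)
        assignment[component] = vm.name
        taken.add(vm.name)
    return assignment


if __name__ == "__main__":
    vms = [VmSpec("vm-1", 2048, 1), VmSpec("vm-2", 8192, 4)]
    requirements = {
        # Assumed example: the name node needs at least 8 GB of memory.
        "namenode": [lambda vm: vm.memory_mb >= 8192],
        "datanode": [lambda vm: vm.memory_mb >= 1024],
    }
    print(assign_components(vms, requirements))
```

Note that nothing in this sketch sees physical hosts or racks; it only matches on VM characteristics, which is the point under discussion in this thread.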
>
> It was indicated to us that the plugin will have the responsibility of
> installing services, not the controller.
>
>
>
> Here is what we can do:
> Add a REST API call:
> /v0.3/plugin/ambari/actions/convertAdvancedConfToSavannaConf
>
> The user will be able to use advanced configs, and Savanna will be able to
> process them, at least the part that is required to properly
> provision VMs.
>
>
> The advanced configuration is much richer than the standard approach's
> configuration items (it may include required packages, host selection
> predicates, etc).  One of the reasons it is proposed is to specifically
> handle cases that the savanna standard configuration simply can't handle
> (it's not the only reason - we've already expressed all the other drivers).
>  I don't believe a conversion would work.
>
>
>
> Thanks,
> Ruslan
>
>
> On Sat, May 11, 2013 at 12:24 AM, Jon Maron <jmaron@xxxxxxxxxxxxxxx> wrote:
>
>> John answered most of your questions below.  One more note inline…
>>
>> On May 10, 2013, at 4:06 PM, Ruslan Kamaldinov <rkamaldinov@xxxxxxxxxxxx>
>> wrote:
>>
>> Jon, John,
>>
>> Could you please shed more light on how these Advanced configs could be
>> processed by the Savanna controller?
>>
>> There is an example stack configuration in the "Hadoop Blueprint
>> Specification". It has a "cardinality" field; in our case it's the
>> number of VMs per specific service, data-node for example.
>>
>> Let's imagine a user passed such a config to Savanna and defined two VM
>> Groups (
>> https://wiki.openstack.org/wiki/File:Savanna_Create_Cluster_Mockup_-_Advanced_Tab.png
>> ).
>>
>> What happens then? How will the Savanna controller be able to create VMs with
>> service-specific properties? How will it be possible to use different
>> data node placement options?
>>
>> How will Savanna be able to store cluster information in templates (see
>> https://blueprints.launchpad.net/savanna/+spec/hierarchical-templates)?
>>
>>
>> The hierarchical templates will still play a role in the "standard"
>> processing.  But for advanced configuration the relevant configuration will
>> be in the provider specific configuration.  In addition, the mechanisms to
>> persist the provider configuration (advanced) will also exist.
>>
>>
>>
>> Thanks,
>> Ruslan
>>
>>
>> On Fri, May 10, 2013 at 6:56 PM, Jon Maron <jmaron@xxxxxxxxxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>>   We have uploaded some mockups that illustrate the "Advanced"
>>> configuration mechanism we've been proposing:
>>>
>>>
>>> http://wiki.openstack.org/wiki/File:Savanna_Create_Cluster_Mockup_-_Standard_Tab.png
>>>
>>>
>>> http://wiki.openstack.org/wiki/File:Savanna_Create_Cluster_Mockup_-_Advanced_Tab.png
>>>
>>>   The advanced mechanism essentially leverages existing APIs
>>> (configure_cluster(), create_cluster()) but the cluster description
>>> parameter passed to those methods includes the user selected configuration
>>> file that is specific to the Hadoop provider rather than the standard list
>>> of configuration items.
>>>
>>> -- Jon
>>>
>>> On May 8, 2013, at 6:40 PM, John Speidel <jspeidel@xxxxxxxxxxxxxxx>
>>> wrote:
>>>
>>>  Here are more details on the advanced Hadoop configuration that we
>>> discussed the other day.
>>>
>>>
>>> Savanna Advanced Hadoop Configuration
>>>
>>>
>>> In addition to the proposed “config items” based Hadoop configuration,
>>> it will be necessary to provide an advanced configuration mechanism in
>>> Savanna.  This mechanism should allow for very fine-grained and
>>> extensive configuration of a Hadoop cluster provisioned by Savanna.  It
>>> is expected that a user would likely use the simple node group based
>>> configuration for cases where little configuration is required and use the
>>> advanced configuration where more control is desired.  The advanced
>>> cluster configuration would be specific to a Hadoop plugin and its content
>>> opaque to the Savanna controller.
>>>
>>>
>>> For reference, here is a link to the Hadoop Blueprint Specification
>>> <https://issues.apache.org/jira/browse/AMBARI-1783> proposed by the Ambari
>>> project.
>>>
>>> Advanced Hadoop Configuration Use Cases
>>>
>>> ·      A user has an existing on-premise or non-virtualized cluster and
>>> wants to clone the cluster (topology/configuration, not data) in a
>>> virtualized environment using Savanna.
>>>
>>>
>>> In this case, the user will export a configuration for the existing
>>> cluster using provider/management product specific tooling.  This
>>> configuration can then be used to create a new cluster using Savanna.
>>>
>>>
>>> ·      A user wants to provision a new cluster in a virtualized
>>> environment using Savanna and needs very fine-grained control of the Hadoop
>>> cluster configuration.  This could include configuration of host
>>> level roles, configuration of a large number of properties across many
>>> optional services and potentially even Hadoop stack configuration related
>>> to packages and repository locations.
>>>
>>> Changes to UI Workflow
>>>
>>> To allow a user to specify an advanced configuration, some UI changes
>>> are necessary.
>>>
>>> The create cluster screen would need an “advanced Hadoop configuration”
>>> tab or button.  In the initial implementation, the advanced
>>> configuration screen would allow a user to specify the location of a plugin
>>> specific configuration file (select file dialog).  This configuration
>>> file would contain all necessary Hadoop related configuration.   In
>>> future releases, we may want a link to provider specific tooling, which
>>> could be used to create/edit provider configurations.
>>>
>>> The UI would still need to allow a user to specify VM details such as
>>> flavor, count, etc., but the user wouldn’t specify node groups or
>>> configuration for the VMs.  Instead, host/role mapping would be
>>> specified in the provider specific configuration file.
>>>
>>> Changes to Hadoop Plugin SPI
>>>
>>> The addition of “Advanced Hadoop Configuration” using plugin specific
>>> configuration will result in small changes to the proposed Hadoop Plugin
>>> SPI.
>>>
>>>
>>> cluster_description: The cluster description object would need to be
>>> updated to contain an advanced_configuration field in addition to
>>> cluster_configs.  In the case of a user providing an advanced
>>> configuration, it would be available in advanced_configuration and
>>> cluster_configs would be empty.
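
A minimal sketch of the cluster description change described above, assuming a plain Python data class. Only the field names `cluster_configs` and `advanced_configuration` come from the proposal; the class name, the field types, and the `describe_cluster` helper are hypothetical and are not part of any existing Savanna API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ClusterDescription:
    """Hypothetical shape of the cluster description passed to a plugin."""
    # Standard "config items" style configuration (assumed to be key/value pairs).
    cluster_configs: Dict[str, str] = field(default_factory=dict)
    # Provider-specific configuration, kept as an opaque string so that the
    # controller never has to interpret it.
    advanced_configuration: Optional[str] = None

    def is_advanced(self) -> bool:
        """True when the user supplied a provider-specific configuration."""
        return self.advanced_configuration is not None


def describe_cluster(cluster_configs: Optional[Dict[str, str]] = None,
                     advanced_configuration: Optional[str] = None) -> ClusterDescription:
    """Build a description; an advanced configuration leaves cluster_configs empty."""
    if advanced_configuration is not None:
        return ClusterDescription(cluster_configs={},
                                  advanced_configuration=advanced_configuration)
    return ClusterDescription(cluster_configs=dict(cluster_configs or {}))
```

A plugin receiving such an object could branch on `is_advanced()` and parse the opaque payload itself, while the controller only passes it through.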
>>>
>>>
>>> configure_cluster(..)
>>>
>>> Because the provider specific configuration is opaque to Savanna, it
>>> might be necessary for the plugin to return some cluster topology
>>> information from this method for rendering purposes.  The specifics of
>>> this information would be dependent on what cluster information is required
>>> by Savanna.
>>>
>>>
>>>
>>>  --
>>> Mailing list: https://launchpad.net/~savanna-all
>>> Post to     : savanna-all@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~savanna-all
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
References

advanced hadoop configuration
From: John Speidel, 2013-05-08
Re: advanced hadoop configuration
From: Jon Maron, 2013-05-10
Re: advanced hadoop configuration
From: Ruslan Kamaldinov, 2013-05-10
Re: advanced hadoop configuration
From: Jon Maron, 2013-05-10
Re: advanced hadoop configuration
From: Ruslan Kamaldinov, 2013-05-10
Re: advanced hadoop configuration
From: Jon Maron, 2013-05-10