savanna-all team mailing list archive
Message #00064
Re: advanced hadoop configuration
Folks,
We've adjusted our specs to incorporate the case when a cluster is created
from a provider-specific configuration file. The "Cluster Lifecycle for
Config File Mode" section covers this case briefly.
The spec is attached. I will upload it to the wiki once they fix
authentication.
Thanks,
Dmitry
2013/5/14 John Speidel <jspeidel@xxxxxxxxxxxxxxx>
> Thanks Ruslan, we are very happy that there is agreement to include
> advanced Hadoop configuration functionality in Savanna.
> Now we can focus on the technical aspects of implementing this
> functionality.
>
> I would like to take some time to clarify a possible misunderstanding in
> the advanced configuration approach previously described.
>
> The plugin-specific advanced Hadoop configuration would not dictate any
> specific VM configuration. In the advanced use case, the user would still
> need to specify VM-related information independent of the Hadoop
> configuration, including VM count, flavor of the VMs, etc. In the future
> we may need to allow additional VM-related information to be provided,
> such as rack, physical host, etc., but this would not be part of the
> Hadoop configuration. This information is used to provision all VMs. The
> module responsible for provisioning the VMs would not need any
> Hadoop-related information from the advanced configuration; the VMs would
> be provisioned with no knowledge of Hadoop services, roles, etc. The VMs
> would basically be Hadoop agnostic. The VM provisioning module would be
> responsible for interacting with OpenStack (Nova) to provision vanilla
> VMs based on a VM configuration containing instance count, VM images,
> flavors of the VMs, and potentially rack and other topology-related
> information. In the VM configuration it would also be possible to specify
> the image/flavor for all VMs.
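>
> For illustration, such a VM configuration might look roughly like the
> following sketch (the field names here are hypothetical, not a proposed
> schema; note that it carries no Hadoop services or roles):
>
> # Minimal sketch of a Hadoop-agnostic VM configuration (hypothetical
> # field names). The VM provisioning module would read only this.
> vm_config = {
>     "cluster_name": "demo-cluster",
>     "default_image": "precise-with-local-repos",
>     "default_flavor": "m1.medium",
>     "node_groups": [
>         {"count": 1, "flavor": "m1.large"},                     # a larger VM
>         {"count": 4, "flavor": "m1.medium", "rack": "rack-1"},  # topology hint
>     ],
> }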
>
> At the completion of the VM provisioning step, no Hadoop-related
> configuration or provisioning has occurred on any VM.
>
> After all of the VMs have been provisioned, the Hadoop plugin would be
> invoked and given the advanced Hadoop configuration as well as the VM
> cluster information. The VM information would contain specifics about
> each VM provisioned in the previous step: properties such as flavor,
> image, networking information, rack, physical host, etc. The Hadoop
> configuration would specify all Hadoop services as well as rules for
> mapping services/roles onto the set of VMs that have already been
> provisioned. These rules would use the properties provided in the VM
> configuration. For example, a configuration could dictate that all master
> services run on a single VM with a minimum of 2 GB of RAM and that all
> slave roles run on every other machine. The role-mapping rules would be
> in simple query form, such as MEMORY > 2G and DISK_SPACE > X. The rules
> would not dictate the number of hosts required for a given cluster. After
> the Hadoop provider determined which services/roles to place on each VM,
> it would be responsible for installing the Hadoop bits and configuring
> and starting all services on each VM. For HDP, this would be done using
> Ambari.
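>
> For illustration only (the rule syntax and evaluation below are a rough
> sketch under assumed names, not a proposed implementation), the
> query-form rules and their evaluation against a provisioned VM's
> properties could look roughly like this:
>
> # Hypothetical role-mapping rules in simple query form; the Hadoop plugin
> # would evaluate them against the properties of already-provisioned VMs.
> role_mapping_rules = {
>     "NAMENODE": "MEMORY > 2048",        # MB
>     "JOBTRACKER": "MEMORY > 2048",
>     "DATANODE": "DISK_SPACE > 100",     # GB
> }
>
> def matches(rule, vm_properties):
>     # Evaluate one query-form rule against a single VM's properties.
>     # (Illustrative only; a real implementation would parse the rules.)
>     return eval(rule, {"__builtins__": {}}, vm_properties)
>
> vm = {"MEMORY": 4096, "DISK_SPACE": 200}
> assert matches(role_mapping_rules["NAMENODE"], vm)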
>
> The important thing that I am trying to convey is that the VM
> configuration is distinct from the Hadoop configuration. The VM plugin
> would provision vanilla VMs (possibly using a Hadoop plugin image with
> local repos installed), and the Hadoop plugin would map services/roles to
> the VMs that have been provisioned, based on simple rules in the advanced
> configuration.
>
> If you feel that there is still a compelling reason that the controller
> would need information from the advanced Hadoop configuration to provision
> VMs, please provide specific details.
>
> Thanks,
> -John
>
> On 5/13/13 1:10 PM, Ruslan Kamaldinov wrote:
>
> Jon, John,
>
> We are concerned that the proposed architecture will not allow the user
> to configure Hadoop and OpenStack at the same time. It allows configuring
> Hadoop, but doesn't allow configuring OpenStack: flavor, Swift, etc. It
> also doesn't allow the user to specify a flavor per node, which is what
> we usually do when we deploy Hadoop on real hardware.
>
> We understand that advanced Hadoop configuration is an important feature
> for you, and we absolutely don't want to restrict this feature.
>
> So, here is how this problem could be avoided (a rough sketch of the flow
> follows below):
> - The user passes the advanced Hadoop config to Savanna
> - Savanna passes this config to the plugin via plugin.convert_advanced_config()
> - The plugin returns a template for the cluster which is understandable by
> Savanna. The template might contain the advanced config unmodified, e.g.
> as an inner JSON object inside the plugin-specific template. The template
> should also contain information about the number and types of nodes in
> the cluster
> - The user maps OpenStack-specific configuration onto this template: for
> example, disk mapping for HDFS, node placement, flavor of each node (or
> of node groups).
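>
> A rough sketch of what that plugin call could look like (the method name
> is taken from the flow above, but the class, field names, and return
> shape are assumptions, not a final API):
>
> # Illustrative only: the plugin converts a provider-specific advanced
> # config into a template Savanna understands, embedding the original
> # config unmodified and exposing node counts/types for OpenStack mapping.
> class ExamplePlugin(object):
>     def convert_advanced_config(self, advanced_config):
>         return {
>             "node_groups": [
>                 {"name": "master", "count": 1},
>                 {"name": "worker", "count": 4},
>             ],
>             # kept verbatim for the plugin to use later:
>             "plugin_specific": {"advanced_config": advanced_config},
>         }
>
> template = ExamplePlugin().convert_advanced_config({"services": ["HDFS"]})
> # The user would then attach OpenStack-specific settings (flavors, disk
> # mapping for HDFS, placement) to the returned template.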
>
> We also like our approach because it lets us reuse the same standard flow
> which is already designed. What do you think?
>
> I understand that the current blueprint for hierarchical templates is not
> complete and we definitely need to update it. We are working on this.
> Once the document is updated, we hope that advanced Hadoop configuration
> will fit into the hierarchical templates architecture.
>
>
> And we agree with your vision on separation of responsibilities in
> Savanna:
> - Savanna core manages OpenStack
> - The plugin manages Hadoop and the Hadoop management tool
>
>
> Thanks,
> Ruslan
>
> On Saturday, May 11, 2013 at 8:14 PM, Jon Maron wrote:
>
> It may also be helpful to see a representative sample of a
> configuration you envision passing to the controller.
>
> On May 11, 2013, at 11:59 AM, John Speidel <jspeidel@xxxxxxxxxxxxxxx>
> wrote:
>
> Ruslan,
>
> It would be helpful if you could describe how the controller would use
> the data that you mention (DN placement, HDFS, etc.) while provisioning
> VMs.
>
> Thanks,
> John
>
> On 5/11/13 10:09 AM, Jon Maron wrote:
>
>
> On May 11, 2013, at 8:46 AM, Ruslan Kamaldinov <rkamaldinov@xxxxxxxxxxxx>
> wrote:
>
> > I don't believe that openstack currently has rack awareness?
> This is one of the main goals of Savanna. It is targeted for phase 2.
> Quote from the roadmap:
> Hadoop cluster topology configuration parameters
> - Data node placement control
> - HDFS location
> - Swift integration
>
>
> While your approach covers all the Hadoop-related configs, it misses all
> the OpenStack-related configuration. An advanced Hadoop cluster will
> require advanced OpenStack configuration: Swift, Cinder, placement
> control, etc.
> We **have to** give the user control over both worlds: Hadoop and
> OpenStack. Giving control to the plugin means that the user will lose
> control over the OpenStack-related configuration.
>
>
> I do not disagree. I just feel that we should strive to structure the
> configurations in such a way that the VM configuration element of the
> controller doesn't need to process Hadoop configuration and the Hadoop
> plugin doesn't need to comprehend VM-related configuration. We are
> striving for a design that allows each component of the Savanna system to
> process its own configuration alone while having enough information about
> the system to make appropriate decisions. So I'd view the goal to be:
>
> 1) The controller assembles information, based on user input, that
> includes both VM cluster and Hadoop cluster information.
> 2) The VM cluster configuration is passed to a VM provisioning component.
> The output of that invocation is a VM cluster spec with server instances
> that provide information about their characteristics.
> 3) The controller passes the Hadoop cluster configuration (either
> standard or advanced) and the VM cluster spec to the Hadoop plugin.
> 4) The plugin leverages the configuration it is provided, and the set of
> VMs it is made aware of via the VM cluster spec, to execute the appropriate
> package installations, configuration file edits, etc., to set up the Hadoop
> cluster on the given VMs (a rough sketch of this flow follows below).
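>
> To make the flow concrete, here is a minimal sketch (function and
> attribute names are assumptions for illustration, not an actual Savanna
> API):
>
> # Hypothetical orchestration corresponding to steps 1-4 above.
> def create_cluster(user_input, vm_provisioner, hadoop_plugin):
>     # 1) split user input into VM cluster and Hadoop cluster information
>     vm_config = user_input["vm_cluster"]
>     hadoop_config = user_input["hadoop_cluster"]   # standard or advanced
>
>     # 2) provision Hadoop-agnostic VMs; get back a spec describing each
>     #    server instance and its characteristics
>     vm_cluster_spec = vm_provisioner.provision(vm_config)
>
>     # 3) and 4) hand both to the plugin, which installs packages, edits
>     #    configuration files, etc., on the given VMs
>     hadoop_plugin.install_and_configure(hadoop_config, vm_cluster_spec)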
>
> I think this allows for the cleanest separation of responsibilities and
> for the most effective and extensible design for Savanna. I think we
> should follow this approach to drive the structures we come up with to
> designate the cluster and Hadoop configurations.
>
>
> Hierarchical node/cluster templates (see
> https://blueprints.launchpad.net/savanna/+spec/hierarchical-templates)
> were designed specifically to support both Hadoop and OpenStack advanced
> configurations.
>
>
> We don't object to the template approach. It'll probably cover a great
> deal of the scenarios we may encounter. However, we've just been through
> enough similar efforts to realize that:
> 1) There are always edge cases that need the most flexible approach
> 2) Users like to use existing assets (e.g. Ambari blueprints they've
> already assembled in a non-OpenStack/VM environment). They will resent or
> resist having to learn a new management mechanism on top of the one they
> already understand and implement.
>
> If you think that the current design misses something, that something
> doesn't allow supporting the "Hadoop Blueprint Specification", let's
> discuss it. It was designed to support such configurations and it **has
> to support them**.
>
>
> Thanks,
> Ruslan
>
>
> On Sat, May 11, 2013 at 1:17 AM, Jon Maron <jmaron@xxxxxxxxxxxxxxx> wrote:
>
>
> On May 10, 2013, at 4:35 PM, Ruslan Kamaldinov <rkamaldinov@xxxxxxxxxxxx>
> wrote:
>
> Hi John,
>
> If the controller doesn't know anything about the services which will run
> on the VMs, then it will not be able to place them correctly. The whole
> cluster might end up on one physical machine (or rack).
>
>
> I don't believe that OpenStack currently has rack awareness? In
> addition, the controller doesn't need actual service or Hadoop information
> to make a determination about which physical machine to utilize (I think
> that would actually be a mistake and could limit the controller's ability
> to extend to other potential scenarios). Rather, if we deem it necessary
> we could create some additional VM-specific configuration it can utilize
> to appropriately provision the VMs, independent of the Hadoop
> configuration. We think it'd be a mistake to expect the controller in
> general to interpret Hadoop-specific information (standard or advanced).
> The controller is simply providing services and managing the cluster
> creation workflow. There should be a clear VM provisioning element that
> reads the VM-specific configuration and provisions accordingly; the Hadoop
> configuration (standard or advanced), along with the VM specs, should
> then be passed to the plugin, allowing it to proceed with
> service/component installations.
>
>
> That's why we need to pass a more detailed config to the controller, so
> that it can place the VMs correctly. And we can't have this logic inside
> the plugin.
>
>
> I don't quite understand your concern.
>
> The controller is going to deal with the VM provisioning element and
> request it to create the VMs based on the information provided (number of
> VMs, flavors). The VM information will then be relayed to the plugin
> within the vm_specs object. Then, given a list of VMs and their
> characteristics, the plugin will be able to select the appropriate VMs on
> which to install the various Hadoop services, based on predicates
> available in the Hadoop cluster configuration within the advanced
> configuration file. For example, for the name node the Hadoop
> configuration may include a minimum memory requirement. The plugin will
> be able to iterate through the list of VMs and find one that has the
> appropriate amount of memory. Once a VM is found that meets all the
> criteria listed for the given component, the installation can proceed.
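>
> For instance, a trivial sketch of that selection step (the names here
> are hypothetical):
>
> # Hypothetical: pick the first provisioned VM that meets the name node's
> # minimum-memory requirement from the vm_specs handed to the plugin.
> def select_vm(vm_specs, min_memory_mb):
>     for vm in vm_specs:
>         if vm["memory_mb"] >= min_memory_mb:
>             return vm
>     raise RuntimeError("no VM satisfies the requirement")
>
> vm_specs = [{"id": "vm-1", "memory_mb": 2048},
>             {"id": "vm-2", "memory_mb": 4096}]
> namenode_vm = select_vm(vm_specs, min_memory_mb=4096)   # -> vm-2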
>
> It was indicated to us that the p
>
Attachment:
InteroperabilitySpecs.pdf
Description: Adobe PDF document
References
- advanced hadoop configuration (John Speidel, 2013-05-08)
- Re: advanced hadoop configuration (Jon Maron, 2013-05-10)
- Re: advanced hadoop configuration (Ruslan Kamaldinov, 2013-05-10)
- Re: advanced hadoop configuration (Jon Maron, 2013-05-10)
- Re: advanced hadoop configuration (Ruslan Kamaldinov, 2013-05-10)
- Re: advanced hadoop configuration (Jon Maron, 2013-05-10)
- Re: advanced hadoop configuration (Ruslan Kamaldinov, 2013-05-11)
- Re: advanced hadoop configuration (John Speidel, 2013-05-11)
- Re: advanced hadoop configuration (Jon Maron, 2013-05-11)
- Re: advanced hadoop configuration (Ruslan Kamaldinov, 2013-05-13)
- Re: advanced hadoop configuration (John Speidel, 2013-05-13)