Jon, John,
We are concerned that the proposed architecture will not allow the user to
configure Hadoop and OpenStack at the same time. It allows the user to
configure Hadoop, but not OpenStack: flavor, Swift, etc. It also doesn't
allow the user to specify a flavor per node, which is what we usually do
when we deploy Hadoop on real hardware.
We understand that advanced Hadoop configuration is an important feature
for you, and we absolutely don't want to restrict it.
So, here is how this problem could be avoided:
- The user passes the advanced Hadoop config to Savanna
- Savanna passes this config to the plugin via plugin.convert_advanced_config()
- The plugin returns a template for the cluster which is understandable by
Savanna. The template might contain that advanced config unmodified; it
can be just an inner JSON object inside the plugin-specific template.
The template should also contain information about the number and types of
nodes in the cluster
- The user maps OpenStack-specific configuration onto this template. For
example, disk mapping for HDFS, node placement, and the flavor of each node
(or of node groups).
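To make the flow above concrete, here is a minimal sketch of what a plugin-produced template might look like. All field and function names are illustrative assumptions, not part of any actual Savanna API:

```python
# Hypothetical sketch of the template a plugin might return from
# convert_advanced_config(); the shape and field names are assumptions
# for illustration, not actual Savanna code.

def convert_advanced_config(advanced_hadoop_config):
    """Wrap an advanced Hadoop config in a Savanna-readable template."""
    return {
        # number and types of nodes, so Savanna can provision VMs
        "node_groups": [
            {"name": "master", "count": 1,
             "node_processes": ["namenode", "jobtracker"]},
            {"name": "worker", "count": 3,
             "node_processes": ["datanode", "tasktracker"]},
        ],
        # the advanced Hadoop config passes through unmodified, as an
        # opaque inner object that only the plugin needs to understand
        "plugin_config": advanced_hadoop_config,
    }

template = convert_advanced_config({"dfs.replication": 3})

# The user then layers OpenStack-specific settings (flavor, disk
# mapping, placement) onto each node group:
for group in template["node_groups"]:
    group["flavor_id"] = "m1.large"
```

The key point is that Savanna reads only the outer structure (node groups, counts), while the inner Hadoop config stays untouched for the plugin.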
We also like our approach because it lets us reuse the same standard flow
which is already designed. What do you think?
I understand that the current blueprint for hierarchical templates is not
complete and we definitely need to update it. We are working on this.
Once the document is updated, we hope that advanced Hadoop configuration
will fit into the hierarchical templates architecture.
And we agree with your vision of the separation of responsibilities in
Savanna:
- Savanna core manages OpenStack
- Plugin manages Hadoop and Hadoop management tool
Thanks,
Ruslan
On Saturday, May 11, 2013 at 8:14 PM, Jon Maron wrote:
It may also be helpful to see a representative sample of a
configuration you envision passing to the controller.
On May 11, 2013, at 11:59 AM, John Speidel <jspeidel@xxxxxxxxxxxxxxx> wrote:
Ruslan,
It would be helpful if you could describe how the controller would
use the data that you mention (DN placement, HDFS, etc.) while
provisioning VMs.
Thanks,
John
On 5/11/13 10:09 AM, Jon Maron wrote:
On May 11, 2013, at 8:46 AM, Ruslan Kamaldinov
<rkamaldinov@xxxxxxxxxxxx> wrote:
> I don't believe that openstack currently has rack awareness?
This is one of the main goals of Savanna. It is targeted for phase
2. Quote from the roadmap:
Hadoop cluster topology configuration parameters:
- Data node placement control
- HDFS location
- Swift integration
While your approach targets all the Hadoop-related configs, it
misses all the OpenStack-related configuration. An advanced Hadoop
cluster will require advanced OpenStack configuration: Swift,
Cinder, placement control, etc.
We **have to** give the user control over both worlds: Hadoop and
OpenStack. Giving control to the plugin means that the user will lose
control over OpenStack-related configuration.
I do not disagree. I just feel that we should strive to structure
the configuration in such a way that the VM configuration element of
the controller doesn't need to process Hadoop configuration and the
Hadoop plugin doesn't need to comprehend VM-related configuration.
We are striving for a design that allows each component of the
savanna system to process its own configuration alone while having
enough information about the system to make appropriate decisions.
So I'd view the goal to be:
1) The controller assembles information, based on user input, that has
both VM cluster and Hadoop cluster information.
2) The VM cluster configuration is passed to a VM provisioning
component. The output of that invocation is a VM cluster spec with
server instances that provide information about their characteristics.
3) The controller passes the Hadoop cluster configuration (either
standard or advanced) and the VM cluster spec to the Hadoop plugin.
4) The plugin leverages the configuration it is provided, and the
set of VMs it is made aware of via the VM cluster spec, to execute
the appropriate package installations, configuration file edits,
etc. to set up the Hadoop cluster on the given VMs.
I think this allows for the cleanest separation of responsibilities
and for the most effective and extensible design for savanna. I
think we should follow this approach to drive the structures we
come up with to designate the cluster and Hadoop configurations.
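The four-step separation described above can be sketched in a few lines of Python. Every name here (provision_vms, run_cluster_creation, the config keys) is an assumption made for illustration; this is not Savanna code:

```python
# Runnable sketch of the proposed flow: the controller splits user
# input into VM and Hadoop parts, provisions VMs, then hands the
# Hadoop config plus the VM spec to the plugin. All names are
# hypothetical, chosen only to mirror the four steps in the email.

def provision_vms(vm_cluster_config):
    # step 2: the VM provisioning component returns a VM cluster spec
    # describing each server instance's characteristics
    return [{"id": "vm-%d" % i,
             "flavor": vm_cluster_config["flavor"],
             "memory_mb": 8192}
            for i in range(vm_cluster_config["count"])]

def run_cluster_creation(user_input, hadoop_plugin):
    # step 1: controller assembles VM and Hadoop cluster information
    vm_config = user_input["vm_cluster"]
    hadoop_config = user_input["hadoop_cluster"]
    # step 2: provision VMs; output is the VM cluster spec
    vm_spec = provision_vms(vm_config)
    # steps 3 and 4: pass Hadoop config (standard or advanced) and the
    # VM spec to the plugin, which installs and configures Hadoop
    # without ever touching VM provisioning
    return hadoop_plugin(hadoop_config, vm_spec)

# usage: a trivial stand-in plugin that just records what it received
result = run_cluster_creation(
    {"vm_cluster": {"count": 2, "flavor": "m1.large"},
     "hadoop_cluster": {"dfs.replication": 2}},
    hadoop_plugin=lambda cfg, spec: (cfg, spec))
```

Note how neither side crosses the boundary: the provisioning step never sees Hadoop keys, and the plugin only ever sees the finished VM spec.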
Hierarchical node/cluster templates (see
https://blueprints.launchpad.net/savanna/+spec/hierarchical-templates)
were designed specifically to support both Hadoop and OpenStack
advanced configurations.
We don't object to the template approach. It'll probably cover a
great deal of the scenarios we may encounter. However, we've just
been through enough similar efforts to realize that:
1) There are always edge cases that need the most flexible approach.
2) Users like to use existing assets (e.g. Ambari blueprints
they've already assembled in a non-OpenStack/VM environment). They
will resent or resist having to learn a new management mechanism on
top of the one they already understand and implement.
If you think that the current design misses something, or that something
in it doesn't allow us to support the "Hadoop Blueprint Specification",
let's discuss it. It was designed to support such configurations and it
**has to support them**.
Thanks,
Ruslan
On Sat, May 11, 2013 at 1:17 AM, Jon Maron <jmaron@xxxxxxxxxxxxxxx> wrote:
On May 10, 2013, at 4:35 PM, Ruslan Kamaldinov
<rkamaldinov@xxxxxxxxxxxx> wrote:
Hi John,
If the controller doesn't know anything about the services which will
run on the VMs, then it will not be able to place them correctly.
The whole cluster might end up on one physical machine (or rack).
I don't believe that openstack currently has rack awareness? In
addition, the controller doesn't need actual service or hadoop
information to make a determination about which physical machine
to utilize (I think that would actually be a mistake and could
limit the controller's ability to extend to other potential
scenarios). Rather, if we deem it necessary we could create some
additional VM specific configuration it can utilize to
appropriately provision the VMs, independent of the hadoop
configuration. We think it'd be a mistake to expect the
controller in general to interpret hadoop specific information
(standard or advanced). The controller is simply providing
services and managing the cluster creation workflow. There
should be a clear VM provisioning element that reads the VM
specific configuration and provisions accordingly, and then the
hadoop configuration (standard or advanced), along with the vm
specs, should be passed to the plugin and allow it to proceed
with service/component installations.
That's why we need to pass a more detailed config to the
controller, so that it is able to place VMs in the correct places.
And we can't have this logic inside the plugin.
I don't quite understand your concern.
The controller is going to deal with the VM provisioning element
and request it to create the VMs based on the information
provided (number of VMs, flavors). The VM information will then
be relayed to the plugin within the vm_specs object. Then,
given a list of VMs and their characteristics, the plugin will be
able to select the appropriate VMs on which to install the various
hadoop services, based on predicates available within the hadoop
cluster configuration in the advanced configuration file. For
example, for the name node the hadoop configuration may include a
minimum memory requirement. The plugin will be able to iterate
through the list of VMs and find one that has the appropriate
amount of memory. Once a VM is found that meets all the criteria
listed for the given component, the installation can proceed.
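That predicate-based selection might look something like the following sketch. The function name, VM fields, and requirement keys are all hypothetical, invented here to illustrate the idea rather than taken from any real plugin:

```python
# Minimal sketch (assumed, not actual plugin code) of selecting a VM
# for a Hadoop component by matching predicates such as a minimum
# memory requirement taken from the hadoop cluster configuration.

def select_vm(vms, requirements):
    """Return the first VM that meets every listed requirement."""
    for vm in vms:
        if vm["memory_mb"] >= requirements.get("min_memory_mb", 0):
            return vm
    return None  # no VM satisfies the predicates

# usage: the name node requires at least 8 GB, so the plugin iterates
# through the VM spec list and picks the first VM with enough memory
vms = [{"id": "vm-1", "memory_mb": 4096},
       {"id": "vm-2", "memory_mb": 16384}]
chosen = select_vm(vms, {"min_memory_mb": 8192})
```

In this example only vm-2 has enough memory, so the name node installation would proceed on it.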
It was indicated to us that the p
--
Mailing list: https://launchpad.net/~savanna-all
Post to : savanna-all@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~savanna-all
More help : https://help.launchpad.net/ListHelp