
savanna-all team mailing list archive

Hadoop Provider Integration



The documents below describe a slightly modified approach to Hadoop provider integration in Savanna (see the existing draft blueprint: https://blueprints.launchpad.net/savanna/+spec/pluggable-cluster-provisioning). They should not be considered a blueprint, but rather a vehicle for continuing the discussion on the topic.

To summarize, there are two changes to note:

1. Some inversion of control (IoC) is introduced into the design, giving Hadoop plugin providers flexibility in how they integrate with Savanna while reducing the burden on the Savanna controller itself. This differs from the current approach, in which the controller invokes APIs on the provider at specific lifecycle points.

2. We have tried to keep normalization of management APIs across providers to a minimum at the controller level. Our view is that existing Hadoop users are already familiar with the management API of their Hadoop distribution (CDH, Ambari, MapR, etc.) and would rather leverage that existing investment than learn a new management API specific to Savanna. This eases adoption and lowers the barrier to entry for Hadoop on OpenStack.
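
To make the first point more concrete, here is a minimal Python sketch of one possible reading of the IoC idea. It is not taken from the attached documents: the names ControllerServices, HadoopProviderPlugin, ExampleProvider, launch_vm and create_cluster are all hypothetical. The assumption illustrated is that the controller hands the plugin a set of generic services and lets the plugin decide the order of the provisioning steps, rather than calling fixed lifecycle hooks on the provider.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class ControllerServices:
    """Hypothetical generic services the controller exposes to plugins."""

    def __init__(self) -> None:
        # Record of every service call, kept only for illustration.
        self.calls: List[Tuple[str, str]] = []

    def launch_vm(self, name: str) -> str:
        # Assumption: a real controller would provision an instance via OpenStack here.
        self.calls.append(("launch_vm", name))
        return f"vm-{name}"


class HadoopProviderPlugin(ABC):
    """Hypothetical plugin contract: the plugin drives the provisioning flow."""

    @abstractmethod
    def create_cluster(self, services: ControllerServices, spec: Dict) -> List[str]:
        ...


class ExampleProvider(HadoopProviderPlugin):
    """Stand-in for a distribution-specific provider."""

    def create_cluster(self, services: ControllerServices, spec: Dict) -> List[str]:
        # The plugin, not the controller, decides which steps run and in what order.
        return [
            services.launch_vm(f"{spec['name']}-{i}")
            for i in range(spec["node_count"])
        ]


class Controller:
    """Hypothetical controller that delegates the whole flow to the plugin."""

    def __init__(self, plugin: HadoopProviderPlugin) -> None:
        self.plugin = plugin
        self.services = ControllerServices()

    def create_cluster(self, spec: Dict) -> List[str]:
        # No provider-specific lifecycle hooks: just hand over the services.
        return self.plugin.create_cluster(self.services, spec)


controller = Controller(ExampleProvider())
hosts = controller.create_cluster({"name": "demo", "node_count": 2})
```

Under these assumptions the controller stays small and provider-agnostic, while each provider is free to sequence provisioning in whatever way suits its distribution.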

Documents to review:

Savanna Deployment Engine Architecture - Describes the architecture of the deployment engine
Savanna_add_hosts_flow - Describes the sequence of steps executed to add a host to an existing Hadoop cluster
Savanna_create_cluster_flow - Describes the sequence of steps executed to create a new cluster
Savanna_invoke_provider_rest_api_flow - Describes the sequence of steps executed in a REST request by the provider plugin on the controller.

Please review at your earliest convenience and let us know your feedback.


Jon Maron, John Speidel and Erik Bergenholtz
