bigdata-dev team mailing list archive
Message #00035
[Merge] lp:~bigdata-dev/charms/trusty/apache-hadoop-yarn-master/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-yarn-master/trunk
Cory Johns has proposed merging lp:~bigdata-dev/charms/trusty/apache-hadoop-yarn-master/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-yarn-master/trunk.
Requested reviews:
Juju Big Data Development (bigdata-dev)
For more details, see:
https://code.launchpad.net/~bigdata-dev/charms/trusty/apache-hadoop-yarn-master/readme/+merge/252620
New READMEs and minor relation cleanups
--
Your team Juju Big Data Development is requested to review the proposed merge of lp:~bigdata-dev/charms/trusty/apache-hadoop-yarn-master/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-yarn-master/trunk.
=== added file 'README.dev.md'
--- README.dev.md 1970-01-01 00:00:00 +0000
+++ README.dev.md 2015-03-11 17:17:08 +0000
@@ -0,0 +1,86 @@
+## Overview
+
+This charm provides computation and storage resources for an Apache Hadoop
+deployment, and is intended to be used only as a part of that deployment.
+This document describes how this charm connects to and interacts with the
+other components of the deployment.
+
+
+## Provided Relations
+
+### resourcemanager (mapred)
+
+This relation connects this charm to the apache-hadoop-client charm.
+The relation exchanges the following keys:
+
+* Sent to the client:
+
+ * `private-address`: Address of this unit, to provide the ResourceManager
+ * `ready`: A flag indicating that YARN is ready to execute map-reduce jobs
+
+* Received from the client:
+
+ *There are no keys received from the client*
+
+Ports will soon be added to both of these.
+
+
+## Required Relations
+
+### namenode (dfs)
+
+This relation connects this charm to the apache-hadoop-hdfs-master charm.
+The relation exchanges the following keys:
+
+* Sent to hdfs-master:
+
+ *There are no keys sent to hdfs-master*
+
+* Received from hdfs-master:
+
+ * `private-address`: Address of the HDFS master unit, to provide the NameNode
+ * `ready`: A flag indicating that HDFS is ready to store data
+
+Ports will soon be added to both of these.
+
+
+### nodemanager (mapred-slave)
+
+This relation connects this charm to the apache-hadoop-compute-slave charm.
+The relation exchanges the following keys:
+
+* Sent to compute-slave:
+
+ * `private-address`: Address of the YARN master unit, to provide the ResourceManager
+ * `ready`: A flag indicating that YARN is ready to register NodeManagers
+
+* Received from compute-slave:
+
+ * `private-address`: Address of the remote unit, to be registered as a NodeManager
+
+Ports will soon be added to both of these.
+
+
+## Manual Deployment
+
+The easiest way to deploy the core Apache Hadoop platform is to use one of
+the [apache-core-batch-processing-* bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle).
+However, to manually deploy the base Apache Hadoop platform without using one of the
+bundles, you can use the following:
+
+ juju deploy apache-hadoop-hdfs-master hdfs-master
+ juju deploy apache-hadoop-hdfs-secondary secondary-namenode
+ juju deploy apache-hadoop-yarn-master yarn-master
+ juju deploy apache-hadoop-compute-slave compute-slave -n3
+ juju deploy apache-hadoop-client client
+ juju add-relation yarn-master hdfs-master
+ juju add-relation secondary-namenode hdfs-master
+ juju add-relation compute-slave yarn-master
+ juju add-relation compute-slave hdfs-master
+ juju add-relation client yarn-master
+ juju add-relation client hdfs-master
+
+This will create a scalable deployment with separate nodes for each master,
+and a three unit compute slave (NodeManager and DataNode) cluster. The master
+charms also support co-locating using the `--to` option to `juju deploy` for
+more dense deployments.
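The relation sections in README.dev.md above all follow the same pattern: the remote side publishes `private-address` plus a `ready` flag, and the receiving side should act only once both are present. A minimal Python sketch of that check follows. It assumes the relation data has already been read into a plain dict of strings. The names `REQUIRED_KEYS` and `is_ready` are hypothetical and not part of the charm, and the set of values treated as "not ready" is an assumption, since the README does not say how the flag is encoded.

```python
# Hypothetical helper, not taken from the charm: decides whether a remote
# unit has published everything the README lists for a relation.
REQUIRED_KEYS = ("private-address", "ready")

# Assumption: the flag is sent as a string, and these spellings mean "not ready".
FALSE_VALUES = ("false", "0", "no")


def is_ready(relation_data):
    """Return True when every required key is set and `ready` is truthy."""
    # Missing or empty values mean the remote unit has not published yet.
    if any(not relation_data.get(key) for key in REQUIRED_KEYS):
        return False
    return str(relation_data["ready"]).strip().lower() not in FALSE_VALUES


if __name__ == "__main__":
    print(is_ready({"private-address": "10.0.0.5", "ready": "true"}))
    print(is_ready({"private-address": "10.0.0.5"}))
```

The `nodemanager` relation receives only `private-address` from compute-slave, so a check on that side would presumably use a shorter key list.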
=== modified file 'README.md'
--- README.md 2015-02-13 22:32:42 +0000
+++ README.md 2015-03-11 17:17:08 +0000
@@ -1,154 +1,75 @@
## Overview
-This charm is a component of the Apache Hadoop platform. It is intended
-to be deployed with the other components using the bundle:
-`bundle:~bigdata-charmers/apache-hadoop`
-
-**What is Apache Hadoop?**
-
The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using a simple programming model.
-It is designed to scale up from single servers to thousands of machines,
-each offering local computation and storage. Rather than rely on hardware
-to deliver high-avaiability, the library itself is designed to detect
-and handle failures at the application layer, so delivering a
-highly-availabile service on top of a cluster of computers, each of
-which may be prone to failures.
-
-Apache Hadoop 2.4.1 consists of significant improvements over the previous stable
-release (hadoop-1.x).
-
-Here is a short overview of the improvments to both HDFS and MapReduce.
-
- - **HDFS Federation**
- In order to scale the name service horizontally, federation uses multiple
- independent Namenodes/Namespaces. The Namenodes are federated, that is, the
- Namenodes are independent and don't require coordination with each other.
- The datanodes are used as common storage for blocks by all the Namenodes.
- Each datanode registers with all the Namenodes in the cluster. Datanodes
- send periodic heartbeats and block reports and handles commands from the
- Namenodes.
-
- More details are available in the HDFS Federation document:
- <http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/Federation.html>
-
- - **MapReduce NextGen aka YARN aka MRv2**
- The new architecture introduced in hadoop-0.23, divides the two major functions of the
- JobTracker: resource management and job life-cycle management into separate components.
- The new ResourceManager manages the global assignment of compute resources to
- applications and the per-application ApplicationMaster manages the application's
- scheduling and coordination.
- An application is either a single job in the sense of classic MapReduce jobs or a DAG of
- such jobs.
-
- The ResourceManager and per-machine NodeManager daemon, which manages the user
- processes on that machine, form the computation fabric.
-
- The per-application ApplicationMaster is, in effect, a framework specific
- library and is tasked with negotiating resources from the ResourceManager and
- working with the NodeManager(s) to execute and monitor the tasks.
-
- More details are available in the YARN document:
- <http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YARN.html>
+This charm deploys a node running the ResourceManager component of
+[Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/),
+which manages the computation resources and job execution for the platform.
## Usage
-This charm manages the YARN master node, otherwise known as the ResourceManager.
-It is intended to be used with `apache-hadoop-hdfs-master` and
-`apache-hadoop-compute-slave`.
-
-### Simple Usage: Single YARN / HDFS master deployment
-
-In this configuration, the YARN and HDFS master components run on the same
-machine. This is useful for lower-resource deployments::
-
- juju deploy apache-hadoop-hdfs-master hdfs-master
- juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode --to 1
- juju deploy apache-hadoop-yarn-master yarn-master --to 1
- juju deploy apache-hadoop-compute-slave compute-slave
- juju deploy apache-hadoop-client client
- juju add-relation yarn-master hdfs-master
- juju add-relation secondary-namenode hdfs-master
- juju add-relation compute-slave yarn-master
- juju add-relation compute-slave hdfs-master
- juju add-relation client yarn-master
- juju add-relation client hdfs-master
-
-Note that the machine number (`--to 1`) should match the machine number
-for the `hdfs-master` charm. If you previously deployed other services
-in your environment, you may need to adjust the machine number appropriately.
-
-
-### Scale Out Usage: Separate HDFS, YARN, and compute nodes
-
-In this configuration the HDFS and YARN deployments operate on
-different service units as separate services::
-
- juju deploy apache-hadoop-hdfs-master hdfs-master
- juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode
- juju deploy apache-hadoop-yarn-master yarn-master
- juju deploy apache-hadoop-compute-slave compute-slave -n 3
- juju deploy apache-hadoop-client client
- juju add-relation yarn-master hdfs-master
- juju add-relation secondary-namenode hdfs-master
- juju add-relation compute-slave yarn-master
- juju add-relation compute-slave hdfs-master
- juju add-relation client yarn-master
- juju add-relation client hdfs-master
-
-The `-n 3` option can be adjusted according to the number of compute nodes
-you need. You can also add additional compute nodes later::
-
- juju add-unit compute-slave -n 2
-
-
-### TO deploy a Hadoop service with elasticsearch service::
- # deploy ElasticSearch locally:
- **juju deploy elasticsearch elasticsearch**
- # elasticsearch-hadoop.jar file will be added to LIBJARS path
- # Recommanded to use hadoop -libjars option to included elk jar file
- **juju add-unit -n elasticsearch**
- # deploy hive service by any senarios mentioned above
- # associate Hive with elasticsearch
- **juju add-relation {hadoop master}:elasticsearch elasticsearch:client**
+This charm is intended to be deployed via one of the
+[bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle).
+For example:
+
+ juju quickstart u/bigdata-dev/apache-analytics-sql
+
+This will deploy the Apache Hadoop platform with Apache Hive available to
+perform SQL-like queries against your data.
+
+You can also manually load and run map-reduce jobs via the client:
+
+ juju scp my-job.jar client/0:
+ juju ssh client/0
+ hadoop jar my-job.jar
## Deploying in Network-Restricted Environments
-The Apache Hadoop charms support being deployed in environments with limited
-outgoing networking access. To do so, you will need to mirror the resources
-and packages required by the charms.
+The Apache Hadoop charms can be deployed in environments with limited network
+access. To deploy in this environment, you will need a local mirror to serve
+the packages and resources required by these charms.
+
### Mirroring Packages
-You will need to mirror the apt packages, such as using squid-deb-proxy.
-For instructions on setting this up, see the
+You can set up a local mirror for apt packages using squid-deb-proxy.
+For instructions on configuring juju to use this, see the
[Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html).
+
### Mirroring Resources
In addition to apt packages, the Apache Hadoop charms require a few binary
-resources, which are normally hosted on Launchpad. However, the `jujuresources`
-library makes it easy to create a mirror of these resources:
+resources, which are normally hosted on Launchpad. If access to Launchpad
+is not available, the `jujuresources` library makes it easy to create a mirror
+of these resources:
sudo pip install jujuresources
- juju resources fetch --all apache-hadoop-yarn-master/resources.yaml -d /tmp/resources
+ juju resources fetch --all apache-hadoop-compute-slave/resources.yaml -d /tmp/resources
juju resources serve -d /tmp/resources
This will fetch all of the resources needed by this charm and serve them via a
-simple HTTP server. You can then use the `resources_mirror` config option to
-point the charm to that server.
+simple HTTP server. You can then set the `resources_mirror` config option to
+have the charm use this server for retrieving resources.
-You can fetch the resources for all of the Apache Hadoop charms into a single
+You can fetch the resources for all of the Apache Hadoop charms
+(`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`,
+`apache-hadoop-hdfs-secondary`, `apache-hadoop-client`, etc.) into a single
directory and serve them all with a single `juju resources serve` instance.
## Contact Information
-amir sanjar <amir.sanjar@xxxxxxxxxxxxx>
+
+* Amir Sanjar <amir.sanjar@xxxxxxxxxxxxx>
+* Cory Johns <cory.johns@xxxxxxxxxxxxx>
+* Kevin Monroe <kevin.monroe@xxxxxxxxxxxxx>
+
## Hadoop
+
- [Apache Hadoop](http://hadoop.apache.org/) home page
- [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html)
- [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html)
=== modified file 'resources.yaml'
--- resources.yaml 2015-03-06 22:42:42 +0000
+++ resources.yaml 2015-03-11 17:17:08 +0000
@@ -8,8 +8,8 @@
six:
pypi: six
charmhelpers:
- pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150306222841-0ibvrfaungfdtkn8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz
- hash: 787fc1cc70fc89e653b08dd192ec702d844709e450ff67577b7c5e99b6bbf39b
+ pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150310214330-f2bk32gk92iinrx8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz
+ hash: a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c
hash_type: sha256
optional_resources:
hadoop-aarch64:
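The resources.yaml hunk above replaces both the charmhelpers download URL and its `hash`, with `hash_type: sha256`. A reviewer who wants to confirm that the new hash matches the new tarball could compare digests locally. The following is a minimal sketch using only the Python standard library. The function names and the local file path are hypothetical, and this is not how `jujuresources` itself performs verification.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with the `hash` value from resources.yaml."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical local copy of the tarball named in the diff.
    expected = "a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c"
    print(matches_expected("charmhelpers-0.2.2.tar.gz", expected))
```

Running the script requires that the tarball has already been downloaded into the current directory.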