bigdata-dev team mailing list archive
Message #00031
[Merge] lp:~bigdata-dev/charms/trusty/apache-hadoop-client/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-client/trunk
Cory Johns has proposed merging lp:~bigdata-dev/charms/trusty/apache-hadoop-client/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-client/trunk.
Requested reviews:
Juju Big Data Development (bigdata-dev)
For more details, see:
https://code.launchpad.net/~bigdata-dev/charms/trusty/apache-hadoop-client/readme/+merge/252615
New READMEs and some minor relation cleanups
--
Your team Juju Big Data Development is requested to review the proposed merge of lp:~bigdata-dev/charms/trusty/apache-hadoop-client/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-client/trunk.
=== added file 'README.dev.md'
--- README.dev.md 1970-01-01 00:00:00 +0000
+++ README.dev.md 2015-03-11 17:15:01 +0000
@@ -0,0 +1,131 @@
+## Overview
+
+This charm is intended to serve as a platform for Hadoop client software:
+software such as Apache Hive or Apache Pig that needs to interact with
+Hadoop as a client but is not otherwise concerned with the details of the
+particular distribution or deployment. This charm makes it easy to create
+charms for that client software by managing the Hadoop libraries and
+connections.
+
+
+## Creating Workload Subordinates
+
+To create a charm which communicates with Hadoop, you only need to implement
+a single relation interface: `hadoop-client`. Your `metadata.yaml` should
+contain:
+
+    requires:
+      hadoop-client:
+        interface: hadoop-client
+        scope: container
+
+This is a subordinate relation which deploys the new charm alongside the
+Apache Hadoop Client charm. The benefit of using this subordinate interface
+is that your charm only needs to handle the single relation: it does not need
+to install or manage the Apache Hadoop libraries, and it is decoupled from the
+distribution, making it easy to swap the client from one distribution (in
+this case, vanilla Apache Hadoop) to another.
+
+It is recommended that you use the framework pattern and the `HadoopClient`
+relation class defined in `charmhelpers.contrib.bigdata.relations`. For example:
+
+    from charmhelpers.core.ch_framework import Manager
+    from charmhelpers.contrib.bigdata.relations import HadoopClient
+
+    import callbacks  # your charm's own module of callback functions
+
+    Manager([
+        {
+            'requires': [
+                HadoopClient,
+            ],
+            'callbacks': [
+                callbacks.install_and_configure,
+            ],
+        },
+    ]).manage()
+
+Additionally, the `JAVA_HOME`, `HADOOP_HOME`, `HADOOP_CONF_DIR`, and other
+environment variables will be set via `/etc/environment`. This includes putting
+the Hadoop bin and sbin directories on the `PATH`. There are also helpers in
+`charmhelpers.contrib.bigdata.utils` to assist with using the environment file.
+For example, to run the `hdfs` command to create a directory as the `ubuntu` user:
+
+    from charmhelpers.contrib.bigdata.utils import run_as
+    run_as('ubuntu', 'hdfs', 'dfs', '-mkdir', '-p', '/home/ubuntu/foo')
+
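As a rough mental model of what such a helper does (this is an illustrative sketch, not the actual charmhelpers implementation), a `run_as`-style call can be thought of as sourcing the environment file and then running the command as the target user:

```python
import shlex

def build_run_as_cmd(user, *args, env_file='/etc/environment'):
    """Hypothetical sketch: build a command that sources the environment
    file and then runs `args` as `user` (the real helper may differ)."""
    quoted = ' '.join(shlex.quote(a) for a in args)
    return ['su', user, '-c', '. {} && {}'.format(env_file, quoted)]

# The equivalent of the run_as example above:
print(build_run_as_cmd('ubuntu', 'hdfs', 'dfs', '-mkdir', '-p', '/home/ubuntu/foo'))
```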
+As noted in the main README, scaling client nodes will also duplicate any
+associated subordinate workload charms.
+
+
+## Provided Relations
+
+### hadoop-client (hadoop-client)
+
+This relation connects this charm to the subordinate workload charms, as
+described above. The relation exchanges the following keys:
+
+* Sent to subordinate client:
+
+ * `hdfs-ready`: Flag indicating that HDFS is ready to store data
+
+* Received from subordinate client:
+
+ *There are no keys received from the subordinate client*
+
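A subordinate's callback will typically gate its configuration on that flag. The function and data shapes below are illustrative assumptions, not the actual charmhelpers API:

```python
def install_and_configure(relation_data):
    # `relation_data` stands in for whatever the subordinate's relation
    # class exposes; only the `hdfs-ready` key comes from the relation above.
    if relation_data.get('hdfs-ready') != 'true':
        return 'waiting'     # HDFS not yet ready to store data
    return 'configured'      # safe to start writing to HDFS

print(install_and_configure({'hdfs-ready': 'true'}))  # prints: configured
```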
+## Required Relations
+
+### namenode (dfs)
+
+This relation connects this charm to the apache-hadoop-hdfs-master charm.
+The relation exchanges the following keys:
+
+* Sent to hdfs-master:
+
+  *There are no keys sent to the hdfs-master*
+
+* Received from hdfs-master:
+
+ * `private-address`: Address of the HDFS master unit, to be used as the NameNode
+ * `ready`: A flag indicating that HDFS is ready to begin storing data
+
+Ports will soon be added to both of these.
+
+
+### resourcemanager (mapred)
+
+This relation connects this charm to the apache-hadoop-yarn-master charm.
+The relation exchanges the following keys:
+
+* Sent to yarn-master:
+
+  *There are no keys sent to the yarn-master*
+
+* Received from yarn-master:
+
+ * `private-address`: Address of the YARN master unit, to be used as the ResourceManager
+ * `ready`: A flag indicating that YARN is ready to perform work
+
+Ports will soon be added to both of these.
+
+
+## Manual Deployment
+
+The easiest way to deploy the core Apache Hadoop platform is to use one of
+the [apache-core-batch-processing-* bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle).
+However, to manually deploy the base Apache Hadoop platform without using one of the
+bundles, you can use the following:
+
+    juju deploy apache-hadoop-hdfs-master hdfs-master
+    juju deploy apache-hadoop-hdfs-secondary secondary-namenode
+    juju deploy apache-hadoop-yarn-master yarn-master
+    juju deploy apache-hadoop-compute-slave compute-slave -n3
+    juju deploy apache-hadoop-client client
+    juju add-relation yarn-master hdfs-master
+    juju add-relation secondary-namenode hdfs-master
+    juju add-relation compute-slave yarn-master
+    juju add-relation compute-slave hdfs-master
+    juju add-relation client yarn-master
+    juju add-relation client hdfs-master
+
+This will create a scalable deployment with separate nodes for each master,
+and a three-unit compute slave (NodeManager and DataNode) cluster. The master
+charms also support co-location using the `--to` option of `juju deploy` for
+denser deployments.
=== modified file 'README.md'
--- README.md 2015-02-13 22:32:12 +0000
+++ README.md 2015-03-11 17:15:01 +0000
@@ -1,117 +1,48 @@
## Overview
-This charm is a component of the Apache Hadoop platform. It is intended
-to be deployed with the other components using the bundle:
-`bundle:~bigdata-charmers/apache-hadoop`
-
-**What is Apache Hadoop?**
-
The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using a simple programming model.
-It is designed to scale up from single servers to thousands of machines,
-each offering local computation and storage. Rather than rely on hardware
-to deliver high-avaiability, the library itself is designed to detect
-and handle failures at the application layer, so delivering a
-highly-availabile service on top of a cluster of computers, each of
-which may be prone to failures.
-
-Apache Hadoop 2.4.1 consists of significant improvements over the previous stable
-release (hadoop-1.x).
-
-Here is a short overview of the improvments to both HDFS and MapReduce.
-
- - **HDFS Federation**
- In order to scale the name service horizontally, federation uses multiple
- independent Namenodes/Namespaces. The Namenodes are federated, that is, the
- Namenodes are independent and don't require coordination with each other.
- The datanodes are used as common storage for blocks by all the Namenodes.
- Each datanode registers with all the Namenodes in the cluster. Datanodes
- send periodic heartbeats and block reports and handles commands from the
- Namenodes.
-
- More details are available in the HDFS Federation document:
- <http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/Federation.html>
-
- - **MapReduce NextGen aka YARN aka MRv2**
- The new architecture introduced in hadoop-0.23, divides the two major functions of the
- JobTracker: resource management and job life-cycle management into separate components.
- The new ResourceManager manages the global assignment of compute resources to
- applications and the per-application ApplicationMaster manages the application‚
- scheduling and coordination.
- An application is either a single job in the sense of classic MapReduce jobs or a DAG of
- such jobs.
-
- The ResourceManager and per-machine NodeManager daemon, which manages the user
- processes on that machine, form the computation fabric.
-
- The per-application ApplicationMaster is, in effect, a framework specific
- library and is tasked with negotiating resources from the ResourceManager and
- working with the NodeManager(s) to execute and monitor the tasks.
-
- More details are available in the YARN document:
- <http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YARN.html>
+This charm deploys a client node running
+[Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/)
+from which workloads can be run, either manually or via workload
+charms.
## Usage
-This charm manages the hadoop client node. It is intended to be used with
-`apache-hadoop-hdfs-master` and `apache-hadoop-yarn-master`.
-
-### Simple Usage: Single YARN / HDFS master deployment
-
-In this configuration, the YARN and HDFS master components run on the same
-machine. This is useful for lower-resource deployments::
-
- juju deploy apache-hadoop-hdfs-master hdfs-master
- juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode --to 1
- juju deploy apache-hadoop-yarn-master yarn-master --to 1
- juju deploy apache-hadoop-compute-slave compute-slave
- juju deploy apache-hadoop-client client
- juju add-relation yarn-master hdfs-master
- juju add-relation secondary-namenode hdfs-master
- juju add-relation compute-slave yarn-master
- juju add-relation compute-slave hdfs-master
- juju add-relation client yarn-master
- juju add-relation client hdfs-master
-
-Note that the machine number (`--to 1`) should match the machine number
-for the `hdfs-master` charm. If you previously deployed other services
-in your environment, you may need to adjust the machine number appropriately.
-
-
-### Scale Out Usage: Separate HDFS, YARN, and compute nodes
-
-In this configuration the HDFS and YARN deployments operate on
-different service units as separate services::
-
- juju deploy apache-hadoop-hdfs-master hdfs-master
- juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode
- juju deploy apache-hadoop-yarn-master yarn-master
- juju deploy apache-hadoop-compute-slave compute-slave -n 3
- juju deploy apache-hadoop-client client
- juju add-relation yarn-master hdfs-master
- juju add-relation secondary-namenode hdfs-master
- juju add-relation compute-slave yarn-master
- juju add-relation compute-slave hdfs-master
- juju add-relation client yarn-master
- juju add-relation client hdfs-master
-
-The `-n 3` option can be adjusted according to the number of compute nodes
-you need. You can also add additional compute nodes later::
-
- juju add-unit compute-slave -n 2
-
-
-### To deploy a Hadoop service with elasticsearch service::
- # deploy ElasticSearch locally:
- **juju deploy elasticsearch elasticsearch**
- # elasticsearch-hadoop.jar file will be added to LIBJARS path
- # Recommanded to use hadoop -libjars option to included elk jar file
- **juju add-unit -n elasticsearch**
- # deploy hive service by any senarios mentioned above
- # associate Hive with elasticsearch
- **juju add-relation {hadoop master}:elasticsearch elasticsearch:client**
+This charm is intended to be deployed via one of the
+[bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle).
+For example:
+
+    juju quickstart u/bigdata-dev/apache-analytics-sql
+
+This will deploy the Apache Hadoop platform with a single client node
+which is running Apache Hive to perform SQL-like queries against your data.
+
+If you also wanted to be able to analyze your data using Apache Pig,
+you could deploy it on the same client:
+
+    juju deploy cs:~bigdata-dev/apache-pig pig
+    juju add-relation client pig
+
+Note that horizontally scaling client nodes with `juju add-unit` will also
+replicate all of the node's associated workload charms, since they are
+subordinates. While this can be useful to provide HA fail-over, if you
+actually intend to have separate client nodes for, e.g., Apache Hive and Apache
+Pig, you should instead deploy a separate instance of apache-hadoop-client:
+
+    juju deploy cs:~bigdata-dev/apache-hadoop-client pig-client
+    juju add-relation pig-client yarn-master
+    juju add-relation pig-client hdfs-master
+    juju deploy cs:~bigdata-dev/apache-pig pig
+    juju add-relation pig-client pig
+
+You can also manually load and run map-reduce jobs via the client:
+
+    juju scp my-job.jar client/0:
+    juju ssh client/0
+    hadoop jar my-job.jar
## Deploying in Network-Restricted Environments
@@ -120,12 +51,14 @@
access. To deploy in this environment, you will need a local mirror to serve
the packages and resources required by these charms.
+
### Mirroring Packages
You can setup a local mirror for apt packages using squid-deb-proxy.
For instructions on configuring juju to use this, see the
[Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html).
+
### Mirroring Resources
In addition to apt packages, the Apache Hadoop charms require a few binary
@@ -148,9 +81,14 @@
## Contact Information
-amir sanjar <amir.sanjar@xxxxxxxxxxxxx>
+
+* Amir Sanjar <amir.sanjar@xxxxxxxxxxxxx>
+* Cory Johns <cory.johns@xxxxxxxxxxxxx>
+* Kevin Monroe <kevin.monroe@xxxxxxxxxxxxx>
+
## Hadoop
+
- [Apache Hadoop](http://hadoop.apache.org/) home page
- [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html)
- [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html)
=== modified file 'hooks/common.py'
--- hooks/common.py 2015-03-06 22:40:25 +0000
+++ hooks/common.py 2015-03-11 17:15:01 +0000
@@ -56,7 +56,7 @@
{
'name': 'client',
'provides': [
- bigdata.relations.HadoopClient(hadoop),
+ bigdata.relations.HadoopClient(),
],
'requires': [
hadoop.is_installed,
=== modified file 'resources.yaml'
--- resources.yaml 2015-03-06 22:40:25 +0000
+++ resources.yaml 2015-03-11 17:15:01 +0000
@@ -8,8 +8,8 @@
six:
pypi: six
charmhelpers:
- pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150306222841-0ibvrfaungfdtkn8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz
- hash: 787fc1cc70fc89e653b08dd192ec702d844709e450ff67577b7c5e99b6bbf39b
+ pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150310214330-f2bk32gk92iinrx8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz
+ hash: a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c
hash_type: sha256
optional_resources:
hadoop-aarch64: