bigdata-dev team mailing list archive
Message #00487
[Merge] lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk into lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle
Kevin W Monroe has proposed merging lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk into lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle.
Requested reviews:
Kevin W Monroe (kwmonroe)
Related bugs:
Bug #1475634 in Juju Charms Collection: "bigdata solution: need an Apache Hadoop, Spark, and IPython notebook for Spark solution"
https://bugs.launchpad.net/charms/+bug/1475634
For more details, see:
https://code.launchpad.net/~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk/+merge/286952
updates from bigdata-dev:
- version lock charms in bundle.yaml
- update bundle tests
- fix README formatting
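The first item, version locking, pins each charm in bundle.yaml to an explicit charm-store revision (e.g. `cs:trusty/apache-spark-6` rather than `cs:trusty/apache-spark`) so deployments are reproducible. A minimal sketch of how that pinning can be sanity-checked, using the charm URLs from this bundle (the helper function is illustrative, not part of this merge):

```python
import re

# Charm URLs as pinned in this merge's bundle.yaml.
BUNDLE_CHARMS = [
    "cs:trusty/apache-hadoop-compute-slave-9",
    "cs:trusty/apache-hadoop-hdfs-master-9",
    "cs:trusty/apache-hadoop-plugin-10",
    "cs:trusty/apache-hadoop-hdfs-secondary-7",
    "cs:trusty/apache-spark-6",
    "cs:trusty/apache-hadoop-yarn-master-7",
    "cs:trusty/apache-spark-notebook-3",
]

# A pinned charm URL ends in "-<revision>".
PINNED = re.compile(r'-\d+$')

def is_pinned(charm_url):
    """True if the charm URL carries an explicit revision suffix."""
    return bool(PINNED.search(charm_url))

unpinned = [url for url in BUNDLE_CHARMS if not is_pinned(url)]
```

Run against an edited bundle.yaml, a check like this flags any charm URL that would otherwise float to the latest published revision on redeploy.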
--
Your team Juju Big Data Development is subscribed to branch lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk.
=== modified file 'README.md'
--- README.md 2015-07-09 15:09:34 +0000
+++ README.md 2016-02-23 21:01:00 +0000
@@ -16,81 +16,81 @@
- 1 Notebook (colocated on the Spark unit)
- ## Usage
- Deploy this bundle using juju-quickstart:
-
- juju quickstart u/bigdata-dev/apache-hadoop-spark-notebook
-
- See `juju quickstart --help` for deployment options, including machine
- constraints and how to deploy a locally modified version of the
- apache-hadoop-spark-notebook bundle.yaml.
-
-
- ## Testing the deployment
-
- ### Smoke test HDFS admin functionality
- Once the deployment is complete and the cluster is running, ssh to the HDFS
- Master unit:
-
- juju ssh hdfs-master/0
-
- As the `ubuntu` user, create a temporary directory on the Hadoop file system.
- The steps below verify HDFS functionality:
-
- hdfs dfs -mkdir -p /tmp/hdfs-test
- hdfs dfs -chmod -R 777 /tmp/hdfs-test
- hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
- hdfs dfs -rm -R /tmp/hdfs-test
- hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
- exit
-
- ### Smoke test YARN and MapReduce
- Run the `terasort.sh` script from the Spark unit to generate and sort data. The
- steps below verify that Spark is communicating with the cluster via the plugin
- and that YARN and MapReduce are working as expected:
-
- juju ssh spark/0
- ~/terasort.sh
- exit
-
- ### Smoke test HDFS functionality from user space
- From the Spark unit, delete the MapReduce output previously generated by the
- `terasort.sh` script:
-
- juju ssh spark/0
- hdfs dfs -rm -R /user/ubuntu/tera_demo_out
- exit
-
- ### Smoke test Spark
- SSH to the Spark unit and run the SparkPi demo as follows:
-
- juju ssh spark/0
- ~/sparkpi.sh
- exit
-
- ### Access the IPython Notebook web interface
- Access the notebook web interface at
- http://{spark_unit_ip_address}:8880. The ip address can be found by running
- `juju status spark/0 | grep public-address`.
-
-
- ## Scale Out Usage
- This bundle was designed to scale out. To increase the amount of Compute
- Slaves, you can add units to the compute-slave service. To add one unit:
-
- juju add-unit compute-slave
-
- Or you can add multiple units at once:
-
- juju add-unit -n4 compute-slave
-
-
- ## Contact Information
-
- - <bigdata-dev@xxxxxxxxxxxxxxxxxxx>
-
-
- ## Help
-
- - [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
- - [Juju community](https://jujucharms.com/community)
+## Usage
+Deploy this bundle using juju-quickstart:
+
+ juju quickstart apache-hadoop-spark-notebook
+
+See `juju quickstart --help` for deployment options, including machine
+constraints and how to deploy a locally modified version of the
+apache-hadoop-spark-notebook bundle.yaml.
+
+
+## Testing the deployment
+
+### Smoke test HDFS admin functionality
+Once the deployment is complete and the cluster is running, ssh to the HDFS
+Master unit:
+
+ juju ssh hdfs-master/0
+
+As the `ubuntu` user, create a temporary directory on the Hadoop file system.
+The steps below verify HDFS functionality:
+
+ hdfs dfs -mkdir -p /tmp/hdfs-test
+ hdfs dfs -chmod -R 777 /tmp/hdfs-test
+ hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
+ hdfs dfs -rm -R /tmp/hdfs-test
+ hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
+ exit
+
+### Smoke test YARN and MapReduce
+Run the `terasort.sh` script from the Spark unit to generate and sort data. The
+steps below verify that Spark is communicating with the cluster via the plugin
+and that YARN and MapReduce are working as expected:
+
+ juju ssh spark/0
+ ~/terasort.sh
+ exit
+
+### Smoke test HDFS functionality from user space
+From the Spark unit, delete the MapReduce output previously generated by the
+`terasort.sh` script:
+
+ juju ssh spark/0
+ hdfs dfs -rm -R /user/ubuntu/tera_demo_out
+ exit
+
+### Smoke test Spark
+SSH to the Spark unit and run the SparkPi demo as follows:
+
+ juju ssh spark/0
+ ~/sparkpi.sh
+ exit
+
+### Access the IPython Notebook web interface
+Access the notebook web interface at
+http://{spark_unit_ip_address}:8880. The ip address can be found by running
+`juju status spark/0 | grep public-address`.
+
+
+## Scale Out Usage
+This bundle was designed to scale out. To increase the amount of Compute
+Slaves, you can add units to the compute-slave service. To add one unit:
+
+ juju add-unit compute-slave
+
+Or you can add multiple units at once:
+
+ juju add-unit -n4 compute-slave
+
+
+## Contact Information
+
+- <bigdata-dev@xxxxxxxxxxxxxxxxxxx>
+
+
+## Help
+
+- [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
+- [Juju community](https://jujucharms.com/community)
=== modified file 'bundle.yaml'
--- bundle.yaml 2015-07-16 20:35:31 +0000
+++ bundle.yaml 2016-02-23 21:01:00 +0000
@@ -1,46 +1,46 @@
services:
compute-slave:
- charm: cs:trusty/apache-hadoop-compute-slave
+ charm: cs:trusty/apache-hadoop-compute-slave-9
num_units: 3
annotations:
gui-x: "300"
gui-y: "200"
- constraints: mem=3G
+ constraints: mem=7G
hdfs-master:
- charm: cs:trusty/apache-hadoop-hdfs-master
+ charm: cs:trusty/apache-hadoop-hdfs-master-9
num_units: 1
annotations:
gui-x: "600"
gui-y: "350"
constraints: mem=7G
plugin:
- charm: cs:trusty/apache-hadoop-plugin
+ charm: cs:trusty/apache-hadoop-plugin-10
annotations:
gui-x: "900"
gui-y: "200"
secondary-namenode:
- charm: cs:trusty/apache-hadoop-hdfs-secondary
+ charm: cs:trusty/apache-hadoop-hdfs-secondary-7
num_units: 1
annotations:
gui-x: "600"
gui-y: "600"
constraints: mem=7G
spark:
- charm: cs:trusty/apache-spark
+ charm: cs:trusty/apache-spark-6
num_units: 1
annotations:
gui-x: "1200"
gui-y: "200"
constraints: mem=3G
yarn-master:
- charm: cs:trusty/apache-hadoop-yarn-master
+ charm: cs:trusty/apache-hadoop-yarn-master-7
num_units: 1
annotations:
gui-x: "600"
gui-y: "100"
constraints: mem=7G
notebook:
- charm: cs:trusty/apache-spark-notebook
+ charm: cs:trusty/apache-spark-notebook-3
annotations:
gui-x: "1200"
gui-y: "450"
@@ -53,4 +53,4 @@
- [plugin, yarn-master]
- [plugin, hdfs-master]
- [spark, plugin]
- - [notebook, spark]
+ - [spark, notebook]
=== removed file 'tests/00-setup'
--- tests/00-setup 2015-07-16 20:35:31 +0000
+++ tests/00-setup 1970-01-01 00:00:00 +0000
@@ -1,8 +0,0 @@
-#!/bin/bash
-
-if ! dpkg -s amulet &> /dev/null; then
- echo Installing Amulet...
- sudo add-apt-repository -y ppa:juju/stable
- sudo apt-get update
- sudo apt-get -y install amulet
-fi
=== modified file 'tests/01-bundle.py'
--- tests/01-bundle.py 2015-07-16 20:35:31 +0000
+++ tests/01-bundle.py 2016-02-23 21:01:00 +0000
@@ -1,61 +1,32 @@
#!/usr/bin/env python3
import os
-import time
import unittest
import yaml
import amulet
-class Base(object):
- """
- Base class for tests for Apache Hadoop Bundle.
- """
+class TestBundle(unittest.TestCase):
bundle_file = os.path.join(os.path.dirname(__file__), '..', 'bundle.yaml')
- profile_name = None
@classmethod
- def deploy(cls):
- # classmethod inheritance doesn't work quite right with
- # setUpClass / tearDownClass, so subclasses have to manually call this
+ def setUpClass(cls):
cls.d = amulet.Deployment(series='trusty')
with open(cls.bundle_file) as f:
bun = f.read()
- profiles = yaml.safe_load(bun)
- # amulet always selects the first profile, so we have to fudge it here
- profile = {cls.profile_name: profiles[cls.profile_name]}
- cls.d.load(profile)
- cls.d.setup(timeout=9000)
- cls.d.sentry.wait()
- cls.hdfs = cls.d.sentry.unit['hdfs-master/0']
- cls.yarn = cls.d.sentry.unit['yarn-master/0']
- cls.slave = cls.d.sentry.unit['compute-slave/0']
- cls.secondary = cls.d.sentry.unit['secondary-namenode/0']
- cls.plugin = cls.d.sentry.unit['plugin/0']
- cls.client = cls.d.sentry.unit['client/0']
-
- @classmethod
- def reset_env(cls):
- # classmethod inheritance doesn't work quite right with
- # setUpClass / tearDownClass, so subclasses have to manually call this
- juju_env = amulet.helpers.default_environment()
- services = ['hdfs-master', 'yarn-master', 'compute-slave', 'secondary-namenode', 'plugin', 'client']
-
- def check_env_clear():
- state = amulet.waiter.state(juju_env=juju_env)
- for service in services:
- if state.get(service, {}) != {}:
- return False
- return True
-
- for service in services:
- cls.d.remove(service)
- with amulet.helpers.timeout(300):
- while not check_env_clear():
- time.sleep(5)
-
- def test_hadoop_components(self):
+ bundle = yaml.safe_load(bun)
+ cls.d.load(bundle)
+ cls.d.setup(timeout=1800)
+ cls.d.sentry.wait_for_messages({'notebook': 'Ready'}, timeout=1800)
+ cls.hdfs = cls.d.sentry['hdfs-master'][0]
+ cls.yarn = cls.d.sentry['yarn-master'][0]
+ cls.slave = cls.d.sentry['compute-slave'][0]
+ cls.secondary = cls.d.sentry['secondary-namenode'][0]
+ cls.spark = cls.d.sentry['spark'][0]
+ cls.notebook = cls.d.sentry['notebook'][0]
+
+ def test_components(self):
"""
Confirm that all of the required components are up and running.
"""
@@ -63,17 +34,48 @@
yarn, retcode = self.yarn.run("pgrep -a java")
slave, retcode = self.slave.run("pgrep -a java")
secondary, retcode = self.secondary.run("pgrep -a java")
- client, retcode = self.client.run("pgrep -a java")
+ spark, retcode = self.spark.run("pgrep -a java")
+ notebook, retcode = self.spark.run("pgrep -a python")
# .NameNode needs the . to differentiate it from SecondaryNameNode
assert '.NameNode' in hdfs, "NameNode not started"
+ assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
+ assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
+ assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
+ assert '.NameNode' not in spark, "NameNode should not be running on spark"
+
assert 'ResourceManager' in yarn, "ResourceManager not started"
+ assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
+ assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
+ assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
+ assert 'ResourceManager' not in spark, "ResourceManager should not be running on spark"
+
assert 'JobHistoryServer' in yarn, "JobHistoryServer not started"
+ assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
+ assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
+ assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
+ assert 'JobHistoryServer' not in spark, "JobHistoryServer should not be running on spark"
+
assert 'NodeManager' in slave, "NodeManager not started"
+ assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
+ assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
+ assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
+ assert 'NodeManager' not in spark, "NodeManager should not be running on spark"
+
assert 'DataNode' in slave, "DataServer not started"
+ assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
+ assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
+ assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
+ assert 'DataNode' not in spark, "DataNode should not be running on spark"
+
assert 'SecondaryNameNode' in secondary, "SecondaryNameNode not started"
+ assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
+ assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
+ assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
+ assert 'SecondaryNameNode' not in spark, "SecondaryNameNode should not be running on spark"
- return hdfs, yarn, slave, secondary, client # allow subclasses to do additional checks
+ assert 'spark' in spark, 'Spark should be running on spark'
+ assert 'notebook' in notebook, 'Notebook should be running on spark'
def test_hdfs_dir(self):
"""
@@ -84,11 +86,11 @@
NB: These are order-dependent, so must be done as part of a single test case.
"""
- output, retcode = self.client.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
+ output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
assert retcode == 0, "Created a user directory on hdfs FAILED:\n{}".format(output)
- output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
+ output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
assert retcode == 0, "Assigning an owner to hdfs directory FAILED:\n{}".format(output)
- output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
+ output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
assert retcode == 0, "seting directory permission on hdfs FAILED:\n{}".format(output)
def test_yarn_mapreduce_exe(self):
@@ -112,59 +114,15 @@
('cleanup', "su hdfs -c 'hdfs dfs -rm -r /user/ubuntu/teragenout'"),
]
for name, step in test_steps:
- output, retcode = self.client.run(step)
+ output, retcode = self.spark.run(step)
assert retcode == 0, "{} FAILED:\n{}".format(name, output)
-
-class TestScalable(unittest.TestCase, Base):
- profile_name = 'apache-core-batch-processing'
-
- @classmethod
- def setUpClass(cls):
- cls.deploy()
-
- @classmethod
- def tearDownClass(cls):
- cls.reset_env()
-
- def test_hadoop_components(self):
- """
- In addition to testing that the components are running where they
- are supposed to be, confirm that none of them are also running where
- they shouldn't be.
- """
- hdfs, yarn, slave, secondary, client = super(TestScalable, self).test_hadoop_components()
-
- # .NameNode needs the . to differentiate it from SecondaryNameNode
- assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
- assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
- assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
- assert '.NameNode' not in client, "NameNode should not be running on client"
-
- assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
- assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
- assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
- assert 'ResourceManager' not in client, "ResourceManager should not be running on client"
-
- assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
- assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
- assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
- assert 'JobHistoryServer' not in client, "JobHistoryServer should not be running on client"
-
- assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
- assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
- assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
- assert 'NodeManager' not in client, "NodeManager should not be running on client"
-
- assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
- assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
- assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
- assert 'DataNode' not in client, "DataNode should not be running on client"
-
- assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
- assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
- assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
- assert 'SecondaryNameNode' not in client, "SecondaryNameNode should not be running on client"
+ def test_spark(self):
+ output, retcode = self.spark.run("su ubuntu -c 'bash -lc /home/ubuntu/sparkpi.sh 2>&1'")
+ assert 'Pi is roughly' in output, 'SparkPI test failed: %s' % output
+
+ def test_notebook(self):
+ pass # requires javascript; how to test?
if __name__ == '__main__':
=== added file 'tests/tests.yaml'
--- tests/tests.yaml 1970-01-01 00:00:00 +0000
+++ tests/tests.yaml 2016-02-23 21:01:00 +0000
@@ -0,0 +1,3 @@
+reset: false
+packages:
+ - amulet
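For context, the new tests/tests.yaml takes over the job of the removed tests/00-setup script: it is read by the bundletester runner, where `reset: false` keeps the deployed bundle between test files (redeploying the Hadoop stack is slow) and `packages` lists apt packages to install before the tests run. An annotated copy (comments added here for illustration):

```yaml
reset: false   # do not tear down and redeploy between test files
packages:
  - amulet     # apt package required by tests/01-bundle.py
```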