
bigdata-dev team mailing list archive

[Merge] lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes into lp:~bigdata-dev/charms/trusty/hdp-hadoop/trunk

 

Kevin W Monroe has proposed merging lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes into lp:~bigdata-dev/charms/trusty/hdp-hadoop/trunk.

Requested reviews:
  Juju Big Data Development (bigdata-dev)

For more details, see:
https://code.launchpad.net/~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes/+merge/248978

We test whether services are running with 'is_jvm_service_active', which shells out to jps. We had been running jps as root on our units, but root can't see the names of JVM processes owned by other users, so there's nothing to parse to tell whether a service is really active. For example, here's jps output running as root on a yarn-hdfs master:

root@juju-canonistack-machine-20:~# jps
27783 Jps
22062 -- process information unavailable
23542 -- process information unavailable

That's less than helpful. So much so that we get relation failures, because the charms try to fire up services that are already running. With this MP, we run jps as the user that owns a given service (usually either hdfs or yarn).

This yields goodness:

ubuntu@juju-canonistack-machine-20:~$ sudo su - hdfs -c jps
22062 NameNode
27825 Jps

ubuntu@juju-canonistack-machine-20:~$ sudo su - yarn -c jps
23542 ResourceManager
27839 Jps
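
In code, the fix boils down to something like the following sketch. The real change is in is_jvm_service_active in hooks/bdutils.py below, which reads the usernames from the YARN_USER/HDFS_USER environment variables rather than hard-coding them as done here:

import shlex
import subprocess

# Map each Hadoop daemon to the unix user that owns it; the charm
# itself pulls these from YARN_USER/HDFS_USER in the hook environment.
PROCESS_USERS = {
    "JobHistoryServer": "yarn",
    "ResourceManager": "yarn",
    "NodeManager": "yarn",
    "DataNode": "hdfs",
    "NameNode": "hdfs",
}

def is_jvm_service_active(processname):
    # Run jps as the service owner so the JVM process name is
    # visible; run as root, jps only reports bare pids.
    username = PROCESS_USERS.get(processname, "hdfs")
    cmd = shlex.split("su {u} -c jps".format(u=username))
    out = subprocess.check_output(cmd)
    return processname in str(out)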

I'm not a big fan of having a dict with hard-coded strings, but the alternative is to pass a username in with every call to is_jvm_service_active. I'll go that route if the herd wants, but this way was less typing for me.
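
For comparison, that alternative would look roughly like this (hypothetical, not what this MP does): is_jvm_service_active grows a username parameter, and every call site passes the right user along:

import shlex
import subprocess

def is_jvm_service_active(processname, username):
    # Same su-based check, but the caller supplies the service user.
    cmd = shlex.split("su {u} -c jps".format(u=username))
    out = subprocess.check_output(cmd)
    return processname in str(out)

# Every caller then carries the user with it, e.g.:
#   if is_jvm_service_active("ResourceManager", os.environ["YARN_USER"]):
#       relation_set(resourceManagerReady=True)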
-- 
Your team Juju Big Data Development is requested to review the proposed merge of lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes into lp:~bigdata-dev/charms/trusty/hdp-hadoop/trunk.
=== modified file 'README.md'
--- README.md	2014-12-10 23:31:55 +0000
+++ README.md	2015-02-06 22:41:08 +0000
@@ -57,7 +57,7 @@
 service units as HDFS namenode and the HDFS datanodes also run YARN NodeManager::
     juju deploy hdp-hadoop yarn-hdfs-master
     juju deploy hdp-hadoop compute-node
-    juju add-unit -n 2 yarn-hdfs-master
+    juju add-unit -n 2 compute-node
     juju add-relation yarn-hdfs-master:namenode compute-node:datanode
     juju add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager
 

=== modified file 'hooks/bdutils.py'
--- hooks/bdutils.py	2014-12-26 13:50:41 +0000
+++ hooks/bdutils.py	2015-02-06 22:41:08 +0000
@@ -128,7 +128,16 @@
                 os.environ[ll[0]] = ll[1].strip().strip(';').strip("\"").strip()
                 
 def is_jvm_service_active(processname):
-    cmd=["jps"]
+    processusers = {
+        "JobHistoryServer": os.environ['YARN_USER'],
+        "ResourceManager": os.environ['YARN_USER'],
+        "NodeManager": os.environ['YARN_USER'],
+        "DataNode": os.environ['HDFS_USER'],
+        "NameNode": os.environ['HDFS_USER'],
+        }
+    # set user based on given process, defaulting to hdfs user
+    username = processusers.get(processname, os.environ['HDFS_USER'])
+    cmd = shlex.split("su {u} -c jps".format(u=username))
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
     out, err = p.communicate()
     if err == None and str(out).find(processname) != -1:

=== modified file 'hooks/hdp-hadoop-common.py'
--- hooks/hdp-hadoop-common.py	2014-12-26 13:50:41 +0000
+++ hooks/hdp-hadoop-common.py	2015-02-06 22:41:08 +0000
@@ -434,6 +434,7 @@
 @hooks.hook('resourcemanager-relation-joined')
 def resourcemanager_relation_joined():
     log ("==> resourcemanager-relation-joined","INFO")
+    setHadoopEnvVar()
     if is_jvm_service_active("ResourceManager"):
         relation_set(resourceManagerReady=True)
         relation_set(resourceManager_hostname=get_unit_hostname())
@@ -443,12 +444,12 @@
         sys.exit(0)
     shutil.copy(os.path.join(os.path.sep, os.environ['CHARM_DIR'],\
                              'files', 'scripts', "terasort.sh"), home)
-    setHadoopEnvVar()
     relation_set(resourceManager_ip=unit_get('private-address'))
     relation_set(resourceManager_hostname=get_unit_hostname())
     configureYarn(unit_get('private-address'))
     start_resourcemanager(os.environ["YARN_USER"])
-    start_jobhistory()
+    # TODO: (kwm) start_jh fails if historyserver is running. is it ok to restart_jh here?
+    restart_jobhistory()
     open_port(8025)
     open_port(8050)
     open_port(8020)
@@ -475,6 +476,9 @@
     # nodemanager requires data node daemon
     if not is_jvm_service_active("DataNode"):
         start_datanode(os.environ['HDFS_USER'])
+    # TODO: (kwm) start_nm fails if nm is running. is it ok to stop first?
+    if is_jvm_service_active("NodeManager"):
+        stop_nodemanager(os.environ["YARN_USER"])
     start_nodemanager(os.environ["YARN_USER"])
     open_port(8025)
     open_port(8030)
@@ -506,11 +510,11 @@
 def namenode_relation_joined():
     log("Configuring namenode - joined phase", "INFO")
 
+    setHadoopEnvVar()
     if is_jvm_service_active("NameNode"):
         relation_set(nameNodeReady=True)
         relation_set(namenode_hostname=get_unit_hostname())
         return
-    setHadoopEnvVar()
     setDirPermission(os.environ['DFS_NAME_DIR'], os.environ['HDFS_USER'], os.environ['HADOOP_GROUP'], 0755)
     relation_set(namenode_hostname=get_unit_hostname())
     configureHDFS(unit_get('private-address'))
@@ -523,7 +527,8 @@
     HDFS_command("dfs -mkdir -p /user/ubuntu")
     HDFS_command("dfs -chown ubuntu /user/ubuntu")
     HDFS_command("dfs -chmod -R 755 /user/ubuntu")
-    start_jobhistory()
+    # TODO: (kwm) start_jh fails if historyserver is running. is it ok to restart_jh here?
+    restart_jobhistory()
     open_port(8020)
     open_port(8010)
     open_port(50070)
@@ -550,6 +555,9 @@
     fileSetKV(hosts_path, nodename_ip+' ', nodename_hostname)
     configureHDFS(nodename_ip)
     setDirPermission(os.environ['DFS_DATA_DIR'], os.environ['HDFS_USER'], os.environ['HADOOP_GROUP'], 0750)
+    # TODO: (kwm) start_dn fails if dn is running. is it ok to stop first?
+    if is_jvm_service_active("DataNode"):
+        stop_datanode(os.environ["HDFS_USER"])
     start_datanode(os.environ["HDFS_USER"])
     if not is_jvm_service_active("DataNode"):
         log("error ==> DataNode failed to start")

