bigdata-dev team mailing list archive
-
bigdata-dev team
-
Mailing list archive
-
Message #00029
[Bug 1414080] Re: Race condition on relation when bundling hdp-hadoop
Hi Sam, this was fixed with a commit late last month:
http://bazaar.launchpad.net/~bigdata-charmers/charms/trusty/hdp-
hadoop/trunk/revision/30?remember=32
This is the infamous jps problem that plagued big data charms since the
openjdk 7u75 update. In your case, namenode-relation-joined *should*
have detected that namenode was running and returned before the call to
start_namenode():
http://bazaar.launchpad.net/~bigdata-charmers/charms/trusty/hdp-
hadoop/trunk/view/head:/hooks/hdp-hadoop-common.py#L438
However, prior to the above commit, jps running as root did not detect a
running NameNode process, so it fell through to start_namenode() and
caused this bug. We no longer rely on jps, but rather pgrep to detect
running processes, so this should be working correctly now. The latest
hdp-hadoop (currently REVNO32) contains this fix.
** Changed in: hdp-hadoop (Juju Charms Collection)
Status: New => Fix Released
** Changed in: hdp-pig (Juju Charms Collection)
Status: New => Fix Released
--
You received this bug notification because you are a member of Juju Big
Data Development, which is a bug assignee.
https://bugs.launchpad.net/bugs/1414080
Title:
Race condition on relation when bundling hdp-hadoop
Status in hdp-hadoop package in Juju Charms Collection:
Fix Released
Status in hdp-pig package in Juju Charms Collection:
Fix Released
Bug description:
I build a demo to run a Machine Learning workshop based on a blog post
made by Hortonworks.
There are 2 sides in the demo:
* Bundle: https://github.com/SaMnCo/bundle-flight-delay-demo
* Charm: https://github.com/SaMnCo/charm-flight-delay-demo
The bundle comprises:
* YARN Master
* 4x compute nodes
* PIG colocated on YARN
* ipython-notebook colocated on YARN
When I deploy manually using the 00-deploy file provided in the
bundle, everything goes well. However, when trying to deploy the
bundle, it fails at relation creation.
In the attached juju log collected at deployment, line 11347, we see
Hadoop crashing. Then 11402, the crash expands. 11418, we discover the
resource manager is not ready.
I can reproduce the same behavior with a simpler bundle comprising:
* YARN Master
* 4x compute nodes
* PIG colocated on YARN
So it seems to really be related to PIG/YARN/Compute nodes relation.
I can also reproduce the same behavior from juju-deployer, which fails
with :
2015-01-23 16:51:21 [INFO] deployer.import: Adding relations...
2015-01-23 16:51:23 [INFO] deployer.import: Adding relation c00-yarn-master:namenode <-> c02-compute-node:datanode
2015-01-23 16:51:24 [INFO] deployer.import: Adding relation c00-yarn-master:resourcemanager <-> c02-compute-node:nodemanager
2015-01-23 16:51:25 [INFO] deployer.import: Adding relation c04-hdp-pig:namenode <-> c00-yarn-master:namenode
2015-01-23 16:51:25 [INFO] deployer.import: Adding relation c04-hdp-pig:resourcemanager <-> c00-yarn-master:resourcemanager
2015-01-23 16:51:26 [INFO] deployer.import: Adding relation c03-flight-delay-demo:notebook <-> c01-ipython-notebook:notebook
2015-01-23 16:51:26 [DEBUG] deployer.import: Waiting for relation convergence 60s
2015-01-23 16:52:29 [ERROR] deployer.env: The following units had errors:
unit: c00-yarn-master/0: machine: 1 agent-state: error details: hook failed: "namenode-relation-joined"
2015-01-23 16:52:29 [INFO] deployer.cli: Deployment stopped. run time: 583.23
Let me know if you need anything else!
Thanks!
To manage notifications about this bug go to:
https://bugs.launchpad.net/charms/+source/hdp-hadoop/+bug/1414080/+subscriptions
Follow ups