trafodion-firefighters team mailing list archive
Message #00335
Re: check test failures
Hi Steve,
Did the Hadoop distro in safemode thing ever get worked out? That is what was causing the setfacl commands to fail: the previous mkdir command failed because the distro was in safemode.
***INFO: Setting HDFS ACLs for snapshot scan support
mkdir: Cannot create directory /apps/hbase/data/archive. Name node is in safe mode.
chown: Unknown command
Did you mean -chown? This command begins with a dash.
setfacl: `/apps/hbase/data/archive': No such file or directory
***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command failed
***ERROR: traf_hortonworks_mods98 exited with error.
***ERROR: Please check log files.
***ERROR: Exiting....
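One way to make this failure easier to read is to check safemode before touching the directory and to stop at the first failed step, so the setfacl error never masks the real cause. The following is only a rough sketch, not what traf_hortonworks_mods98 actually does: the function names, the use of `-mkdir -p`, and the injectable `run` parameter are assumptions made for illustration. The `hdfs dfsadmin -safemode get` command is the standard way to query the NameNode state.

```python
import subprocess


def safemode_is_on(report: str) -> bool:
    """Interpret the text printed by `hdfs dfsadmin -safemode get`."""
    return "safe mode is on" in report.lower()


def run_hdfs(args):
    """Run one hdfs command and return (exit_code, combined_output)."""
    proc = subprocess.run(["hdfs", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def set_archive_acls(path, run=run_hdfs):
    """Create `path` and set its ACL mask.

    Returns None on success, or a message naming the first problem found.
    `run` is injectable (hypothetical hook) so the logic can be tried
    without a cluster.
    """
    _, report = run(["dfsadmin", "-safemode", "get"])
    if safemode_is_on(report):
        return "NameNode is in safe mode; skipped mkdir and setfacl"

    steps = [
        ["dfs", "-mkdir", "-p", path],  # assumption: -p is acceptable here
        ["dfs", "-setfacl", "-R", "-m", "mask::rwx", path],
    ]
    for step in steps:
        code, output = run(step)
        if code != 0:
            # Stop here so later steps do not fail with a misleading error.
            return "failed: hdfs " + " ".join(step) + ": " + output.strip()
    return None
```

Calling `set_archive_acls("/apps/hbase/data/archive")` on a cluster in safemode would report the safemode condition directly instead of the later "No such file or directory" from setfacl.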
The distro being in safemode can also prevent the DTMs from being ready which will cause sqstart to "hang" forever.
This also brings up a longtime issue in the design of starting Trafodion: we wait forever for the DTMs to be ready. We do that because we have no idea how long recovery will take. However, there are many other reasons the DTMs may not become ready when recovery is not even running. Perhaps we could change the design to distinguish between waiting for DTM recovery to complete and the DTMs not becoming ready for some other reason (and time out on the latter). Not that that fixes this particular hang, but it would help.
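To make the idea concrete, here is a minimal sketch of a wait loop that keeps waiting while recovery is in progress but gives up when the DTMs stay not-ready for some other reason. Everything in it is hypothetical: sqstart is not written this way, and the `get_state` callable, the three state names, and the timeout values are assumptions, since nothing here says how a DTM would report that it is recovering.

```python
import time


def wait_for_dtms(get_state, other_timeout_s=300.0, poll_s=5.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Wait for the DTMs, timing out only when recovery is not running.

    `get_state` is a hypothetical callable returning "ready",
    "recovering", or any other string meaning "not ready for some
    other reason". Returns "ready" or "timeout".
    """
    stalled_since = None  # when the current non-recovery stall began
    while True:
        state = get_state()
        if state == "ready":
            return "ready"
        if state == "recovering":
            # Recovery can take arbitrarily long, so reset the stall timer.
            stalled_since = None
        else:
            now = clock()
            if stalled_since is None:
                stalled_since = now
            elif now - stalled_since >= other_timeout_s:
                return "timeout"
        sleep(poll_s)
```

The stall timer resets whenever recovery is observed, so a long recovery never trips the timeout, while a condition such as the NameNode sitting in safemode would surface as "timeout" rather than an indefinite hang.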
--Marvin
From: Varnau, Steve (Trafodion)
Sent: Monday, March 02, 2015 11:01 AM
To: Bouaziz, Khaled; Subbiah, Suresh; trafodion-firefighters@xxxxxxxxxxxxxxxxxxx
Cc: Anderson, Marvin
Subject: RE: check test failures
Oh yes, I think Khaled is right for the error that complains about the setfacl command failing. I believe Marvin has a fix in review right now.
I am more worried about the other one (timeout starting Trafodion). It was seen a couple of times Friday, and it is not obvious why it hangs.
-Steve
From: Bouaziz, Khaled
Sent: Monday, March 02, 2015 06:45
To: Subbiah, Suresh; trafodion-firefighters@xxxxxxxxxxxxxxxxxxx<mailto:trafodion-firefighters@xxxxxxxxxxxxxxxxxxx>
Cc: Anderson, Marvin; Varnau, Steve (Trafodion)
Subject: RE: check test failures
This issue is probably related to the one in the attached email.
From: Trafodion-firefighters [mailto:trafodion-firefighters-bounces+khaled.bouaziz=hp.com@xxxxxxxxxxxxxxxxxxx] On Behalf Of Subbiah, Suresh
Sent: Monday, March 02, 2015 8:31 AM
To: trafodion-firefighters@xxxxxxxxxxxxxxxxxxx<mailto:trafodion-firefighters@xxxxxxxxxxxxxxxxxxx>
Subject: [Trafodion-firefighters] FW: check test failures
Hi FFs,
Any suggestions on how to resolve this problem?
Thanks
Suresh
From: Varnau, Steve (Trafodion)
Sent: Saturday, February 28, 2015 11:08 PM
To: Subbiah, Suresh; Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes; Cooper, Joanie
Subject: RE: check test failures
Hi Suresh,
Joanie hit this on Friday too. There seems to be an intermittent failure introduced recently, which causes a hang in sqstart. Maybe firefighters need to tackle it.
-Steve
From: Subbiah, Suresh
Sent: Saturday, February 28, 2015 17:53
To: Varnau, Steve (Trafodion); Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes
Subject: check test failures
Hi Steve, Chris
For my checkin https://review.trafodion.org/#/c/1169/ I am getting check test failures in either the phoenix or the core-seabase suites.
As far as I can tell the issue is not with any particular test failing, but with something in the setup stage not working as expected.
Is there anything I can do to avoid these errors? Copying Selva since I think his tests may be running into similar problems. I did not check though.
Thanks
Suresh
Current Run
For the current run, the first error I see for the failing phoenix test is:
2015-03-01 01:27:01 ***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command failed
2015-03-01 01:27:01 ***ERROR: traf_hortonworks_mods98 exited with error.
2015-03-01 01:27:01 ***ERROR: Please check log files.
2015-03-01 01:27:01 ***ERROR: Exiting....
Previous run (core Seabase)
***INFO: End of DCS install.
***INFO: starting Trafodion instance
***INFO: End of DCS install.
***INFO: starting Trafodion instance
Build timed out (after 200 minutes). Marking the build as failed.
Two runs before (once in phoenix and then in core Seabase)
***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command failed
***ERROR: traf_hortonworks_mods98 exited with error.
***ERROR: Please check log files.
***ERROR: Exiting....