trafodion-firefighters team mailing list archive
Message #00392
Re: Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test Failures
Arvind,
I checked the ZooKeeper logs on a couple of other machines that ran that test job in the last week or so, and it looks like you are right: at the time the tests either hung or started getting errors, the "Too many connections" errors started showing up in the ZooKeeper logs.
So, is 60 too low a limit, or are we opening too many connections to ZooKeeper?
At first look, I could not find the equivalent value on the HW distro.
-Steve
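To check whether the 60 comes from an explicit setting or a built-in default, one option is to read the effective maxClientCnxns out of the zoo.cfg that each distro actually uses. This is a minimal sketch under two assumptions: zoo.cfg is treated as plain key=value lines, and an unset maxClientCnxns falls back to 60, which as far as I know is the per-IP default in ZooKeeper 3.4.x and matches the "max is 60" in the warnings. The function name effective_max_client_cnxns is hypothetical, and the parser ignores Java-properties corner cases such as ':' separators and line continuations.

```python
# Assumed per-client-IP default when maxClientCnxns is absent (ZooKeeper 3.4.x);
# it matches the "max is 60" reported in the server warnings.
DEFAULT_MAX_CLIENT_CNXNS = 60


def effective_max_client_cnxns(cfg_text: str) -> int:
    """Return the maxClientCnxns value implied by zoo.cfg-style text.

    Lines are treated as simple key=value pairs; blank lines and lines
    starting with '#' are skipped. A result of 0 means no per-IP limit.
    """
    value = DEFAULT_MAX_CLIENT_CNXNS
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "maxClientCnxns":
            # Last occurrence wins, as when a properties file is loaded.
            value = int(val.strip())
    return value
```

For example, effective_max_client_cnxns(open("/etc/zookeeper/conf/zoo.cfg").read()), where the path is only a typical location and may differ on the CM and HW installs. Setting the value to 0 removes the per-IP limit entirely, which may hide a leak rather than fix it.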
From: Narain, Arvind
Sent: Friday, April 17, 2015 14:49
To: Varnau, Steve (Trafodion); Johnson, Stacey
Cc: trafodion-firefighters@xxxxxxxxxxxxxxxxxxx
Subject: RE: Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test Failures
I meant my knowledge of this parameter and of the error we are seeing.
From: Narain, Arvind
Sent: Friday, April 17, 2015 2:31 PM
To: Varnau, Steve (Trafodion); Johnson, Stacey
Cc: trafodion-firefighters@xxxxxxxxxxxxxxxxxxx
Subject: RE: Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test Failures
Thanks, Steve. This certainly helps.
We do see many of the following messages, which indicate that we are either leaking connections or opening more connections than before. Maybe authorization needs more?
We could raise the limit on concurrent connections per client IP via maxClientCnxns. Did this change recently, or was it already set this way in the earlier distribution? Please check with someone; my knowledge on this is limited.
2015-04-17 12:38:00,632 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /172.16.0.76 - max is 60
2015-04-17 12:38:02,727 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /172.16.0.76 - max is 60
2015-04-17 12:38:04,033 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /172.16.0.76 - max is 60
2015-04-17 12:38:05,662 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /172.16.0.76 - max is 60
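To tell a leak apart from legitimately higher demand, it may help to sample how many connections each client address holds while a test runs. A minimal sketch, assuming the server answers the cons four-letter command (enabled by default on ZooKeeper 3.4.x; newer releases restrict it via 4lw.commands.whitelist) and that each connection line in the reply starts with /address:port[. The helpers fetch_cons and count_connections_by_ip are hypothetical names.

```python
import re
import socket
from collections import Counter

# A connection line in "cons" output looks like " /172.16.0.76:51892[1](queued=0,...)";
# capture the client address in front of the ":port[" part.
_CONN_LINE = re.compile(r"^\s*/(.+?):\d+\[")


def count_connections_by_ip(cons_output: str) -> Counter:
    """Count open connections per client address in 'cons' output."""
    counts = Counter()
    for line in cons_output.splitlines():
        m = _CONN_LINE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def fetch_cons(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'cons' four-letter command and return the server's full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"cons")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket once it has replied
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For instance, count_connections_by_ip(fetch_cons("slave-cm53.trafodion.org")).most_common(5), run from a host other than 172.16.0.76 so that the query itself is not rejected. A per-IP count that keeps climbing across samples points toward a leak; one that levels off just under the limit suggests the workload really does need more connections.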
Now that we have a ZooKeeper log, I would be interested to see what other jobs failed on this system in the past few days and whether we can get more information from this log.
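One way to get more out of the uploaded log is to tally the "Too many connections" warnings by client address and by minute, which shows which host hit the limit and when. A sketch that assumes every warning has exactly the NIOServerCnxnFactory format quoted in this thread; summarize_rejections is a hypothetical name.

```python
import re
from collections import Counter

# Matches warnings such as:
# "2015-04-17 12:38:00,632 WARN org.apache.zookeeper.server.NIOServerCnxnFactory:
#  Too many connections from /172.16.0.76 - max is 60"
_TOO_MANY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} WARN .*"
    r"Too many connections from /(\S+) - max is (\d+)"
)


def summarize_rejections(lines):
    """Return (per_ip, per_minute) Counters of rejected connection attempts."""
    per_ip, per_minute = Counter(), Counter()
    for line in lines:
        m = _TOO_MANY.match(line)
        if m:
            minute, ip, _limit = m.groups()
            per_ip[ip] += 1
            per_minute[minute] += 1
    return per_ip, per_minute
```

For example, with open("zookeeper.log", errors="replace") as f: per_ip, per_minute = summarize_rejections(f), where the file name is illustrative. Sorting per_minute and lining it up with the HBase ConnectionLoss errors would show whether the rejections start before them.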
From: Varnau, Steve (Trafodion)
Sent: Friday, April 17, 2015 1:30 PM
To: Narain, Arvind; Johnson, Stacey
Cc: trafodion-firefighters@xxxxxxxxxxxxxxxxxxx
Subject: RE: Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test Failures
Unfortunately, we are not archiving ZooKeeper logs for each job. But I've gone to the machine that ran that particular job and uploaded the (rather large) ZooKeeper log to http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16/ for you. You'll have to sift through it to find the right times.
-Steve
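To sift through a large log for a given window, a small filter on the leading timestamp is enough. This sketch assumes lines begin with a timestamp like 2015-04-17 12:38:00,632, as in the excerpts in this thread, and keeps untimestamped continuation lines (such as stack traces) with the entry they follow; lines_between is a hypothetical helper.

```python
from datetime import datetime
from typing import Iterable, Iterator

# Log lines in this thread start with e.g. "2015-04-17 12:38:00,632".
_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
_TS_LEN = len("2015-04-17 12:38:00,632")


def lines_between(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[str]:
    """Yield lines whose timestamp satisfies start <= timestamp <= end.

    Lines without a leading timestamp are yielded only when the most
    recent timestamped line fell inside the window.
    """
    inside = False
    for line in lines:
        try:
            ts = datetime.strptime(line[:_TS_LEN], _TS_FORMAT)
        except ValueError:
            # No parsable timestamp: treat as a continuation of the previous entry.
            if inside:
                yield line
            continue
        inside = start <= ts <= end
        if inside:
            yield line
```

For example, lines_between(open("zookeeper.log", errors="replace"), datetime(2015, 4, 17, 12, 38), datetime(2015, 4, 17, 14, 21)) would cover the span from the first "Too many connections" warnings through the end of the reported HBase errors; the file name is illustrative.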
From: Narain, Arvind
Sent: Friday, April 17, 2015 12:31
To: Johnson, Stacey; Varnau, Steve (Trafodion)
Cc: trafodion-firefighters@xxxxxxxxxxxxxxxxxxx
Subject: RE: Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test Failures
Regarding:
- phoenix_part2_T4-cm5.3 http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16 : FAILURE in 3h 22m 30s
Are there any ZooKeeper logs that would help identify this issue? We are getting the following errors when accessing HBase [12:40 through 14:21]:
org.trafodion.jdbc.t4.HPT4Exception: *** ERROR[1398] Error 0 occured while accessing the hbase subsystem. Fix that error and make sure hbase is up and running. Error Details: . [2015-04-17 12:40:06]
http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16/traf_run/logs/trafodion.dtm.log
2015-04-17 12:40:40,199 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2015-04-17 12:40:40,236 ERROR zookeeper.ZooKeeperWatcher: hconnection-0x7db75f15, quorum=slave-cm53.trafodion.org:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
==
http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16/traf_run/logs/trafodion.hdfs.log
2015-04-17 12:38:19,321 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2015-04-17 12:38:19,325 ERROR zookeeper.ZooKeeperWatcher: catalogtracker-on-hconnection-0x28ab34f2, quorum=slave-cm53.trafodion.org:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
From: Trafodion-firefighters [mailto:trafodion-firefighters-bounces+arvind.narain=hp.com@xxxxxxxxxxxxxxxxxxx] On Behalf Of Johnson, Stacey
Sent: Friday, April 17, 2015 9:31 AM
To: trafodion-firefighters@xxxxxxxxxxxxxxxxxxx
Subject: [Trafodion-firefighters] Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test Failures
Build failed.
- traf-pub-release-ahw2.2 http://logs.trafodion.org/daily/traf-pub-release-ahw2.2/39af503 : SUCCESS in 43m 05s
- traf-pub-debug-ahw2.2 http://logs.trafodion.org/daily/traf-pub-debug-ahw2.2/248d564 : SUCCESS in 34m 38s
- build-product-release http://logs.trafodion.org/daily/build-product-release/c040e47 : SUCCESS in 19m 16s
- build-product-debug http://logs.trafodion.org/daily/build-product-debug/54d57bd : SUCCESS in 17m 45s
- core-regress-core-cm5.3 http://logs.trafodion.org/daily/core-regress-core-cm5.3/73d3fde : SUCCESS in 2h 47m 15s
- core-regress-core-ahw2.2 http://logs.trafodion.org/daily/core-regress-core-ahw2.2/02746ac : SUCCESS in 2h 15m 53s
- core-regress-charsets-cm5.3 http://logs.trafodion.org/daily/core-regress-charsets-cm5.3/3560c5b : SUCCESS in 1h 28m 46s
- core-regress-charsets-ahw2.2 http://logs.trafodion.org/daily/core-regress-charsets-ahw2.2/278d998 : SUCCESS in 1h 40m 33s
- core-regress-qat-cm5.3 http://logs.trafodion.org/daily/core-regress-qat-cm5.3/c8953c9 : SUCCESS in 1h 24m 23s
- core-regress-qat-ahw2.2 http://logs.trafodion.org/daily/core-regress-qat-ahw2.2/87209ea : SUCCESS in 1h 33m 37s
- core-regress-udr-cm5.3 http://logs.trafodion.org/daily/core-regress-udr-cm5.3/9bbaa0f : SUCCESS in 1h 13m 12s
- core-regress-udr-ahw2.2 http://logs.trafodion.org/daily/core-regress-udr-ahw2.2/7ace4ab : SUCCESS in 1h 26m 53s
- core-regress-catman1-cm5.3 http://logs.trafodion.org/daily/core-regress-catman1-cm5.3/cec89c0 : SUCCESS in 2h 22m 36s
- core-regress-catman1-ahw2.2 http://logs.trafodion.org/daily/core-regress-catman1-ahw2.2/cd118c3 : SUCCESS in 2h 41m 54s
- core-regress-compGeneral-cm5.3 http://logs.trafodion.org/daily/core-regress-compGeneral-cm5.3/0870564 : SUCCESS in 2h 47m 59s
- core-regress-compGeneral-ahw2.2 http://logs.trafodion.org/daily/core-regress-compGeneral-ahw2.2/24671af : FAILURE in 31m 51s
- core-regress-executor-cm5.3 http://logs.trafodion.org/daily/core-regress-executor-cm5.3/9ae374f : FAILURE in 4h 01m 46s
- core-regress-executor-ahw2.2 http://logs.trafodion.org/daily/core-regress-executor-ahw2.2/df5362a : SUCCESS in 2h 18m 01s
- core-regress-fullstack2-cm5.3 http://logs.trafodion.org/daily/core-regress-fullstack2-cm5.3/c0c71d5 : FAILURE in 4h 00m 22s
- core-regress-fullstack2-ahw2.2 http://logs.trafodion.org/daily/core-regress-fullstack2-ahw2.2/e689523 : SUCCESS in 1h 06m 40s
- core-regress-hive-cm5.3 http://logs.trafodion.org/daily/core-regress-hive-cm5.3/316bb4c : FAILURE in 1h 49m 07s
- core-regress-hive-ahw2.2 http://logs.trafodion.org/daily/core-regress-hive-ahw2.2/d3d9a3f : FAILURE in 2h 00m 04s
- core-regress-seabase-cm5.3 http://logs.trafodion.org/daily/core-regress-seabase-cm5.3/7cb286d : FAILURE in 4h 01m 31s
- core-regress-seabase-ahw2.2 http://logs.trafodion.org/daily/core-regress-seabase-ahw2.2/60e71d0 : SUCCESS in 2h 10m 15s
- phoenix_part1_T4-cm5.3 http://logs.trafodion.org/daily/phoenix_part1_T4-cm5.3/76e5e7d : SUCCESS in 2h 21m 42s
- phoenix_part2_T4-cm5.3 http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16 : FAILURE in 3h 22m 30s
- phoenix_part1_T4-ahw2.2 http://logs.trafodion.org/daily/phoenix_part1_T4-ahw2.2/594c172 : SUCCESS in 2h 11m 42s
- phoenix_part2_T4-ahw2.2 http://logs.trafodion.org/daily/phoenix_part2_T4-ahw2.2/4a65e23 : SUCCESS in 2h 18m 44s
- phoenix_part1_T2-cm5.3 http://logs.trafodion.org/daily/phoenix_part1_T2-cm5.3/a303691 : FAILURE in 1h 04m 35s (non-voting)
- phoenix_part2_T2-cm5.3 http://logs.trafodion.org/daily/phoenix_part2_T2-cm5.3/2aa431e : FAILURE in 51m 10s (non-voting)
- phoenix_part1_T2-ahw2.2 http://logs.trafodion.org/daily/phoenix_part1_T2-ahw2.2/c2c0bd2 : FAILURE in 1h 03m 42s (non-voting)
- phoenix_part2_T2-ahw2.2 http://logs.trafodion.org/daily/phoenix_part2_T2-ahw2.2/d655e16 : FAILURE in 59m 35s (non-voting)
- pyodbc_test-cm5.3 http://logs.trafodion.org/daily/pyodbc_test-cm5.3/01d6eeb : SUCCESS in 1h 13m 26s
- pyodbc_test-ahw2.2 http://logs.trafodion.org/daily/pyodbc_test-ahw2.2/b1a66ec : SUCCESS in 1h 13m 36s
- jdbc_test-cm5.3 http://logs.trafodion.org/daily/jdbc_test-cm5.3/fef27f8 : FAILURE in 1h 31m 41s
- jdbc_test-ahw2.2 http://logs.trafodion.org/daily/jdbc_test-ahw2.2/56f24a3 : SUCCESS in 1h 16m 37s