
monasca team mailing list archive

[monasca] Apache Storm / Monasca Thresh unable to create new native thread

 

Hello everyone,


Unfortunately, I've run out of ideas, so I'm posting this here in the hope that someone has a suggestion.

In our production-like SUSE Cloud 7 environment (running on bare metal, on fairly powerful machines), we've observed that Apache Storm/Monasca Thresh fails with the following error:
java.lang.OutOfMemoryError: unable to create new native thread


Oddly, the problem does not seem to occur in either our virtualized CentOS environment or our virtualized SUSE (mkcloud) environment.

I've checked all the limits I was aware of (such as max processes and file descriptors per user). I've also run a simple test, which showed that it is possible to create over 12,000 threads in Java, while Storm uses only around 150.
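
For reference, a thread-creation test of this kind can be sketched roughly as follows. This is a minimal illustration, not the exact test that was run: the class name ThreadLimitProbe, the method countStartableThreads, and the default cap of 1000 are made up for this example. It starts idle daemon threads up to a given cap and reports how many could be started before Thread.start() failed with OutOfMemoryError.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, named for illustration only.
public class ThreadLimitProbe {

    /**
     * Tries to start up to maxThreads idle threads and returns how many
     * were started before the JVM refused to create another one.
     */
    public static int countStartableThreads(int maxThreads) {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> started = new ArrayList<>();
        try {
            for (int i = 0; i < maxThreads; i++) {
                Thread t = new Thread(() -> {
                    try {
                        // Keep the thread alive until the probe is finished.
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.setDaemon(true);
                // Thread.start() throws OutOfMemoryError when no native
                // thread can be created.
                t.start();
                started.add(t);
            }
        } catch (OutOfMemoryError e) {
            // Limit reached; fall through and report the count so far.
        } finally {
            // Let all parked threads exit.
            release.countDown();
        }
        return started.size();
    }

    public static void main(String[] args) {
        // The cap is an arbitrary default; pass a larger value to probe further.
        int cap = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        System.out.println("Started " + countStartableThreads(cap)
                + " of " + cap + " requested threads");
    }
}
```

Resource limits are inherited from the launching process, so a result obtained from an interactive shell may not apply to the Storm worker unless the probe is started as the same user and in the same way as the worker.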

Restarting all the services and the machine itself didn't help.


Some details:

Java version:

openjdk version "1.8.0_121"
OpenJDK Runtime Environment (IcedTea 3.3.0) (suse-20.1-x86_64)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)


Apache-Storm v1.0.2

Monasca Thresh v2.1.0


Please also see the attached log file.

Any help is welcome.


BR,

Jakub
==> /var/log/storm/workers-artifacts/thresh-cluster-1-1491307025/6702/worker.log <==
2017-04-04 21:14:47.833 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], end registering consumer thresh-metric_71 in ZK
2017-04-04 21:14:47.833 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], end registering consumer thresh-event_52 in ZK
2017-04-04 21:14:47.833 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], end registering consumer thresh-event_54 in ZK
2017-04-04 21:14:47.841 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], starting watcher executor thread for consumer thresh-metric_71
2017-04-04 21:14:47.841 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], starting watcher executor thread for consumer thresh-event_54
2017-04-04 21:14:47.841 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], starting watcher executor thread for consumer thresh-event_52
2017-04-04 21:14:47.865 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], begin rebalancing consumer thresh-metric_71 try #0
2017-04-04 21:14:47.865 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], begin rebalancing consumer thresh-event_54 try #0
2017-04-04 21:14:47.865 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], begin rebalancing consumer thresh-event_52 try #0
2017-04-04 21:14:47.946 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] Stopping leader finder thread
2017-04-04 21:14:47.947 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] Stopping all fetchers
2017-04-04 21:14:47.947 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] All connections stopped
2017-04-04 21:14:47.948 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], Cleared all relevant queues for this fetcher
2017-04-04 21:14:47.949 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], Cleared the data chunks in all the consumer message iterators
2017-04-04 21:14:47.950 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], Committing all offsets after clearing the fetcher queues
2017-04-04 21:14:47.950 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], Releasing partition ownership
2017-04-04 21:14:47.950 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] Stopping leader finder thread
2017-04-04 21:14:47.950 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] Stopping all fetchers
2017-04-04 21:14:47.950 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] All connections stopped
2017-04-04 21:14:47.950 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], Cleared all relevant queues for this fetcher
2017-04-04 21:14:47.950 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], Cleared the data chunks in all the consumer message iterators
2017-04-04 21:14:47.950 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], Committing all offsets after clearing the fetcher queues
2017-04-04 21:14:47.951 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], Releasing partition ownership
2017-04-04 21:14:47.953 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] Stopping leader finder thread
2017-04-04 21:14:47.953 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] Stopping all fetchers
2017-04-04 21:14:47.953 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] All connections stopped
2017-04-04 21:14:47.953 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], Cleared all relevant queues for this fetcher
2017-04-04 21:14:47.953 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], Cleared the data chunks in all the consumer message iterators
2017-04-04 21:14:47.953 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], Committing all offsets after clearing the fetcher queues
2017-04-04 21:14:47.954 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], Releasing partition ownership
2017-04-04 21:14:47.979 k.c.RangeAssignor [INFO] Consumer thresh-metric_71 rebalancing the following partitions: ArrayBuffer(0) for topic metrics with consumers: List(thresh-metric_70-0, thresh-metric_71-0)
2017-04-04 21:14:47.979 k.c.RangeAssignor [INFO] Consumer thresh-event_52 rebalancing the following partitions: ArrayBuffer(0) for topic events with consumers: List(thresh-event_52-0, thresh-event_53-0, thresh-event_54-0)
2017-04-04 21:14:47.980 k.c.RangeAssignor [INFO] Consumer thresh-event_54 rebalancing the following partitions: ArrayBuffer(0) for topic events with consumers: List(thresh-event_52-0, thresh-event_53-0, thresh-event_54-0)
2017-04-04 21:14:47.980 k.c.RangeAssignor [WARN] No broker partitions consumed by consumer thread thresh-metric_71-0 for topic metrics
2017-04-04 21:14:47.980 k.c.RangeAssignor [WARN] No broker partitions consumed by consumer thread thresh-event_54-0 for topic events
2017-04-04 21:14:47.981 k.c.RangeAssignor [INFO] thresh-event_52-0 attempting to claim partition 0
2017-04-04 21:14:47.993 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], Consumer thresh-metric_71 selected partitions : 
2017-04-04 21:14:47.993 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], Consumer thresh-event_54 selected partitions : 
2017-04-04 21:14:47.996 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], thresh-event_52-0 successfully owned partition 0 for topic events
2017-04-04 21:14:47.995 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], exception during rebalance 
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method) ~[?:1.8.0_121]
	at java.lang.Thread.start(Thread.java:714) ~[?:1.8.0_121]
	at kafka.consumer.ConsumerFetcherManager.startConnections(ConsumerFetcherManager.scala:125) ~[stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.updateFetcher(ZookeeperConsumerConnector.scala:764) ~[stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:697) ~[stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:608) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:602) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:598) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:905) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:240) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:85) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:97) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at monasca.thresh.infrastructure.thresholding.KafkaSpout.activate(KafkaSpout.java:74) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at org.apache.storm.daemon.executor$fn__6411$fn__6426$fn__6457.invoke(executor.clj:643) [storm-core-1.0.2.jar:1.0.2]
	at org.apache.storm.util$async_loop$fn__555.invoke(util.clj:484) [storm-core-1.0.2.jar:1.0.2]
	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
2017-04-04 21:14:47.995 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], exception during rebalance 
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method) ~[?:1.8.0_121]
	at java.lang.Thread.start(Thread.java:714) ~[?:1.8.0_121]
	at kafka.consumer.ConsumerFetcherManager.startConnections(ConsumerFetcherManager.scala:125) ~[stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.updateFetcher(ZookeeperConsumerConnector.scala:764) ~[stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:697) ~[stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:608) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:602) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:598) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:905) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:240) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:85) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:97) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at monasca.thresh.infrastructure.thresholding.KafkaSpout.activate(KafkaSpout.java:74) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at org.apache.storm.daemon.executor$fn__6411$fn__6426$fn__6457.invoke(executor.clj:643) [storm-core-1.0.2.jar:1.0.2]
	at org.apache.storm.util$async_loop$fn__555.invoke(util.clj:484) [storm-core-1.0.2.jar:1.0.2]
	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
2017-04-04 21:14:47.999 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], end rebalancing consumer thresh-metric_71 try #0
2017-04-04 21:14:47.999 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], end rebalancing consumer thresh-event_54 try #0
2017-04-04 21:14:47.999 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_54], Rebalancing attempt failed. Clearing the cache before the next rebalancing operation is triggered
2017-04-04 21:14:47.999 k.c.ZookeeperConsumerConnector [INFO] [thresh-metric_71], Rebalancing attempt failed. Clearing the cache before the next rebalancing operation is triggered
2017-04-04 21:14:48.000 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] Stopping leader finder thread
2017-04-04 21:14:48.000 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] Stopping leader finder thread
2017-04-04 21:14:48.000 k.c.ConsumerFetcherManager$LeaderFinderThread [INFO] [thresh-metric_71-leader-finder-thread], Shutting down
2017-04-04 21:14:48.000 k.c.ConsumerFetcherManager$LeaderFinderThread [INFO] [thresh-event_54-leader-finder-thread], Shutting down
2017-04-04 21:14:48.001 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], Consumer thresh-event_52 selected partitions : events:0: fetched offset = 47: consumed offset = 47
2017-04-04 21:14:48.001 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], exception during rebalance 
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method) ~[?:1.8.0_121]
	at java.lang.Thread.start(Thread.java:714) ~[?:1.8.0_121]
	at kafka.consumer.ConsumerFetcherManager.startConnections(ConsumerFetcherManager.scala:125) ~[stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.updateFetcher(ZookeeperConsumerConnector.scala:764) ~[stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:697) ~[stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:608) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:602) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:598) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:905) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:240) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:85) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:97) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at monasca.thresh.infrastructure.thresholding.KafkaSpout.activate(KafkaSpout.java:74) [stormjar.jar:monasca-thresh-2.1.0-SNAPSHOT-2017-01-31T11:46:34-73196d]
	at org.apache.storm.daemon.executor$fn__6411$fn__6426$fn__6457.invoke(executor.clj:643) [storm-core-1.0.2.jar:1.0.2]
	at org.apache.storm.util$async_loop$fn__555.invoke(util.clj:484) [storm-core-1.0.2.jar:1.0.2]
	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
2017-04-04 21:14:48.001 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], end rebalancing consumer thresh-event_52 try #0
2017-04-04 21:14:48.002 k.c.ZookeeperConsumerConnector [INFO] [thresh-event_52], Rebalancing attempt failed. Clearing the cache before the next rebalancing operation is triggered
2017-04-04 21:14:48.002 k.c.ConsumerFetcherManager [INFO] [ConsumerFetcherManager-1491308027740] Stopping leader finder thread
2017-04-04 21:14:48.002 k.c.ConsumerFetcherManager$LeaderFinderThread [INFO] [thresh-event_52-leader-finder-thread], Shutting down