monasca team mailing list archive

Re: alarm transition events are missing in kafka queue - mysql alarm state is updated properly

 

Hi Yuan,
   I have been unable to reproduce the problem you have seen. I set up the threshold engine to run with only 3 partitions on alarm-state-transitions and triggered multiple alarm transitions. All of them were correctly written to the database.

   That is the behavior I expected, because the Threshold Engine does not hard-code the number of partitions; instead, the Java Kafka client gets the number of partitions from the Kafka server and uses that for the writes.

   Can you give me the full log message that you saw? And give me the output of:


/ # kafka-topics.sh --topic alarm-state-transitions --describe --zookeeper <your-zookeeper-server>


This is the output for my test setup:


Topic:alarm-state-transitions PartitionCount:3 ReplicationFactor:1 Configs:
Topic: alarm-state-transitions Partition: 0 Leader: 1001 Replicas: 1001 Isr: 1001
Topic: alarm-state-transitions Partition: 1 Leader: 1001 Replicas: 1001 Isr: 1001
Topic: alarm-state-transitions Partition: 2 Leader: 1001 Replicas: 1001 Isr: 1001
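
To compare the partition count the broker reports with what a producer assumes, the describe output above can be checked mechanically. The following is a minimal sketch, assuming the output has the same layout as the sample in this message; the function name parse_describe and the SAMPLE constant are illustrative only and are not part of Kafka or Monasca.

```python
import re

def parse_describe(output):
    """Return (partition_count, sorted partition ids) from describe-style text."""
    count = None
    partitions = []
    for line in output.splitlines():
        line = line.strip()
        # Summary line: "Topic:<name> PartitionCount:<n> ..."
        m = re.search(r"PartitionCount:\s*(\d+)", line)
        if m:
            count = int(m.group(1))
            continue
        # Per-partition line: "Topic: <name> Partition: <id> Leader: ..."
        m = re.search(r"\bPartition:\s*(\d+)", line)
        if m:
            partitions.append(int(m.group(1)))
    return count, sorted(partitions)

# Sample copied from the test setup output in this message.
SAMPLE = """\
Topic:alarm-state-transitions PartitionCount:3 ReplicationFactor:1 Configs:
Topic: alarm-state-transitions Partition: 0 Leader: 1001 Replicas: 1001 Isr: 1001
Topic: alarm-state-transitions Partition: 1 Leader: 1001 Replicas: 1001 Isr: 1001
Topic: alarm-state-transitions Partition: 2 Leader: 1001 Replicas: 1001 Isr: 1001
"""

count, ids = parse_describe(SAMPLE)
print(count, ids)
```

A count lower than the highest partition number a producer tries to write to would be consistent with the failure discussed in this thread.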


Craig Bryant HPE

From: "Yuan.Pen@xxxxxxxxxxxxx" <Yuan.Pen@xxxxxxxxxxxxx>
Date: Tuesday, January 30, 2018 at 4:28 PM
To: Microsoft Office User <craig.bryant@xxxxxxx>, "Hochmuth, Roland M" <roland.hochmuth@xxxxxxx>
Cc: "bradley.klein@xxxxxxxxxxx" <bradley.klein@xxxxxxxxxxx>, "monasca@xxxxxxxxxxxxxxxxxxx" <monasca@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Monasca] alarm transition events are missing in kafka queue - mysql alarm state is updated properly

Hi Craig,
The alarm was indeed dropped because of an error writing to Kafka. Once we enabled debug mode in Storm, we saw the error. Thanks for patching it so that next time we can see what happened in the log file. By the way, the error is “Invalid partition given with record”. Because the threshold engine writes alarms to the alarm-state-transitions topic with a specific partition number in [0-7], the write fails if the topic is configured with fewer than 8 partitions, which is the case for us. This may happen to anybody, because the threshold engine hard-codes the number of partitions for the alarm-state-transitions topic, which may not match the actual number when the topic is created by a separate process. I suggest making the number of partitions in the Threshold Engine code configurable so that it can be kept consistent with the partitions parameter used when creating the alarm-state-transitions topic.
Thanks a lot for looking at the matter.
Yuan
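
The failure mode described in this message can be illustrated with a small sketch. It is not the Threshold Engine's code, and it does not use Kafka's own partitioner; the function names, the CRC32 hash, and the two constants are assumptions made for illustration. It shows that partition numbers chosen against an assumed count of 8 are rejected on a topic with 3 partitions, while numbers derived from the real count always fall in range.

```python
import zlib

def pick_partition(key, partition_count):
    """Map a key onto one of the partitions that actually exist."""
    if partition_count < 1:
        raise ValueError("topic has no partitions")
    # CRC32 is a stand-in hash for this sketch, not Kafka's partitioner.
    return zlib.crc32(key.encode("utf-8")) % partition_count

def check_partition(partition, partition_count):
    """Reject an explicit partition number the topic does not have."""
    if not 0 <= partition < partition_count:
        raise ValueError(
            f"invalid partition {partition}: topic has {partition_count} partitions"
        )
    return partition

ASSUMED_COUNT = 8  # hypothetical fixed count a producer might assume
ACTUAL_COUNT = 3   # what the topic was really created with

# Collect every assumed partition number the 3-partition topic rejects.
rejected = []
for p in range(ASSUMED_COUNT):
    try:
        check_partition(p, ACTUAL_COUNT)
    except ValueError:
        rejected.append(p)
print(rejected)
```

Deriving the count from topic metadata at run time, or making it configurable as suggested above, keeps the chosen partition within the range the topic actually has.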

From: Bryant, Craig W (HP Cloud Service) [mailto:craig.bryant@xxxxxxx]
Sent: Thursday, January 18, 2018 12:43 PM
To: Pen, Yuan; Hochmuth, Roland M
Cc: bradley.klein@xxxxxxxxxxx; monasca@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Monasca] alarm transition events are missing in kafka queue - mysql alarm state is updated properly

Hi Yuan,
I have not seen an issue like this before, and it has never been reported to me either. Unfortunately, the code logs an exception on send to Kafka only at debug level, so it won’t show up in the standard configuration. I will submit a patch to change that, but you will have to upgrade to get that change. I’m sorry, but I have no suggestions for making this more reliable other than ensuring your Kafka is running in a high-availability configuration.

Craig Bryant
HPE
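
The logging behavior described in this message can be shown with a generic sketch using Python's standard logging module. This is not the Threshold Engine's code (which runs on the JVM); the helper send_with_logging, the stand-in failing_send, and the logger name are hypothetical. It shows that a send failure logged at debug level leaves no trace under an INFO-level configuration, while the same failure logged at error level is recorded.

```python
import io
import logging

def send_with_logging(send, logger, level):
    """Call send(); on failure, log the exception at the given level."""
    try:
        send()
        return True
    except Exception as exc:
        logger.log(level, "failed to send alarm state transition: %s", exc)
        return False

def failing_send():
    # Stand-in for a producer call that fails.
    raise RuntimeError("Invalid partition given with record")

stream = io.StringIO()
logger = logging.getLogger("threshold-sketch")
logger.setLevel(logging.INFO)   # a non-debug configuration
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

send_with_logging(failing_send, logger, logging.DEBUG)  # filtered out at INFO
debug_visible = stream.getvalue()
send_with_logging(failing_send, logger, logging.ERROR)  # written to the log
error_visible = stream.getvalue()
print(repr(debug_visible), repr(error_visible))
```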

From: Monasca <monasca-bounces+craig.bryant=hpe.com@xxxxxxxxxxxxxxxxxxx> on behalf of "Yuan.Pen@xxxxxxxxxxxxx" <Yuan.Pen@xxxxxxxxxxxxx>
Date: Friday, January 12, 2018 at 11:36 AM
To: "Hochmuth, Roland M" <roland.hochmuth@xxxxxxx>
Cc: "bradley.klein@xxxxxxxxxxx" <bradley.klein@xxxxxxxxxxx>, "monasca@xxxxxxxxxxxxxxxxxxx" <monasca@xxxxxxxxxxxxxxxxxxx>
Subject: [Monasca] alarm transition events are missing in kafka queue - mysql alarm state is updated properly

Hi Roland,
This is Yuan Pen from Deutsche Telekom. I am sending this email to the Monasca community to ask for help with the Monasca threshold engine. We have found that sometimes, when alarm state transitions happen, the threshold engine updates the MySQL alarm state properly but fails to put the state transition events in the Kafka queue (alarm-state-transitions). Does this ring a bell for anyone in the community? If this is a real problem, is there anything we can do to make sure the events in the transition queue and the state in MySQL stay in sync? Any comments or help are greatly appreciated.
Best regards,

Yuan Pen

571-594-6155

