launchpad-dev team mailing list archive

Re: anyone else finding ec2land things disappearing w/out warning?

To: Robert Collins <robertc@xxxxxxxxxxxxxxxxx>
From: Jonathan Lange <jml@xxxxxxxxxxxxx>
Date: Tue, 26 Oct 2010 08:12:52 -0400
Cc: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>, Brad Crittenden <brad.crittenden@xxxxxxxxxxxxx>
In-reply-to: <AANLkTi=1in95NxN0-WD9RhhGZw=DHV7WhXfj1yDmuma4@mail.gmail.com>
Sender: jonathan.lange@xxxxxxxxx

On Tue, Oct 26, 2010 at 6:46 AM, Robert Collins
<robertc@xxxxxxxxxxxxxxxxx> wrote:
> On Tue, Oct 26, 2010 at 4:24 AM, Maris Fogels
> <maris.fogels@xxxxxxxxxxxxx> wrote:
>>
>> I do not think an email alert will catch hung testrunners because an email
>> implementation will probably not send granular enough messages about what the
>> runner is doing.  Instead, I would consider installing a beacon in ec2 test that
>> sends HTTP POSTs to a central CGI script.  The beacon would report the start,
>> stop, report, and shutdown events for each run.  Auditing the logs would catch
>> hung, disappeared, or otherwise AWOL runners.  (BTW, web.py is awesome for
>> building such small web apps, and it is already on devpad for this purpose. Hint
>> hint ;)
>>
>> It is really difficult to gather facts about a randomly occurring error in a
>> randomly run process initiated by 30 developers on a globally distributed team.
>> I really think that automated data gathering makes sense as the next step.
>
> I agree that email at the end won't help with debugging silent fails.
>
> Perhaps:
>  - capture stdout and stderr to two files on disk

stdout & stderr are already combined into a single file that has all of
the output ("full_log" in the language of remote.py). It is sent out
when the tests unexpectedly fail to run.
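
For anyone who has not looked at that part of the code, here is a rough
sketch of the general technique of merging both streams into one log. It
is not the code in remote.py; run_with_full_log and its arguments are
hypothetical names made up for illustration.

```python
import subprocess


def run_with_full_log(command, log_path):
    """Run command and write its stdout and stderr, merged, to log_path.

    Hypothetical helper for illustration only; not taken from remote.py.
    Returns the exit code of the command.
    """
    with open(log_path, "wb") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # send stderr down the stdout pipe
        )
        for line in process.stdout:
            log.write(line)
            # Flush after every line so the file on disk stays current
            # even if the runner later hangs or the instance goes away.
            log.flush()
        process.stdout.close()
        return process.wait()
```

Flushing per line only helps with buffering on the logging side; the
child process can still buffer its own output before it reaches the pipe.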

>  - change the @ script to shutdown to:
>   - combine the stdout and stderr files and send the combined file to
> a central place
>

Yes.
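
As a sketch only, assuming email is the transport and the combined log is
a file on disk, building the message could look something like this.
build_log_email, the addresses, and the attachment name are all
hypothetical; nothing here is taken from remote.py.

```python
from email.message import EmailMessage


def build_log_email(log_path, sender, recipient, subject):
    """Return an EmailMessage with the combined log attached.

    Hypothetical helper for illustration only. The log is attached as
    raw bytes so that undecodable test output cannot break the send.
    """
    with open(log_path, "rb") as log_file:
        log_bytes = log_file.read()

    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content("Combined stdout/stderr log attached.")
    message.add_attachment(
        log_bytes,
        maintype="application",
        subtype="octet-stream",
        filename="full_log.txt",  # assumed name, chosen for this sketch
    )
    return message
```

Actually sending it would be something along the lines of
smtplib.SMTP(host).send_message(message), where the host depends on how
the instance is set up.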

> Email would be a fine transport medium this way. We don't need little
> chunks, we just need something before the end.
>
> Fixing the buffering of the progress (there's a bug) would make this
> very granular.
>

I had thought I fixed that.

And, because I can't remember saying this recently, if you do fix a
bug in remote.py, please update the tests. Since it's such a pain to
debug, it would be nice to be able to rely on the tests.

jml
