ubuntu-phone team mailing list archive

Re: Catching CPU run-aways on Touch

 

On Wed, Sep 04, 2013 at 10:22:59AM -0500, Ted Gould wrote:
> It seems to me for all of these long running services the "manager" of
> them is Upstart.  It restarts them if they crash or do other stupid
> things, and it knows whether they're running.  This seems roughly like
> respawn limits[1], where they're per-task and can be configured to
> create different results.

> Also, it seems that this should work within those limits, we should try
> to restart the service to see if it solves the problem.  But keep it on
> a shorter leash for the second time around.

> To give people something to attack more specifically, I'll say this.  We
> should add a line to Upstart job configs that looks like this:

>         cpu limit [CPU Percentage] [seconds]

> Then we can have a small upstart-bridge-like process that watches
> upstart for started, stopped and added jobs to ensure that they're on
> the naughty/nice list and that they behave within those limits.

Upstart already supports setting kernel ulimits for jobs via the "limit"
stanza; through ulimits you can already set "max CPU time for the life of
the process" and "max memory per process".  You can also set the realtime
priority of a process.  You can't set a max *percentage* of CPU usage for
the job, or max memory usage for the set of processes spawned by the job;
both of these capabilities will arrive with cgroup support.
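
To make the "max CPU for the life of the process" semantics concrete, here is a minimal sketch of the same kernel mechanism (RLIMIT_CPU) applied from Python rather than from an Upstart job. The helper name run_with_cpu_limit is made up for illustration; it assumes a Unix-like system where the resource module is available. Note that the limit is a total of CPU seconds consumed, not a percentage.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with a cap on total CPU time; return its exit status.

    Hypothetical helper for illustration. The kernel sends SIGXCPU when the
    soft limit is reached and SIGKILL when the hard limit is reached.
    """
    def apply_limit():
        # Runs in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))

    return subprocess.run(argv, preexec_fn=apply_limit).returncode


if __name__ == "__main__":
    # A busy loop that would otherwise spin forever.
    rc = run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
    # subprocess reports death-by-signal as a negative return code.
    print("killed by", signal.Signals(-rc).name if rc < 0 else "nothing", rc)
```

A process that sleeps most of the time never trips this limit, while a long-lived service doing legitimate work eventually will, which is why it is a poor proxy for a percentage cap.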

However, in all of the above cases we're talking about *limiting* the CPU
usage, not *measuring* it.  If the desired semantics are to measure the
process's CPU usage and report on it / *optionally* kill the process, I
don't think that's a reasonable fit for upstart.  It makes sense for upstart
to apply cgroups to processes upon request, allowing the kernel to limit the
amount of CPU the job gets access to... but then by definition no such
process is ever a "runaway" because it's kept on a leash, so you don't
actually get any useful information this way about which processes are buggy
and should be fixed.  If we care about identifying and fixing misbehaving
processes, rather than just limiting the damage, that should be handled
outside of upstart.
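
As a rough sketch of what "measuring" rather than "limiting" could look like in a separate watcher, the following samples a process's accumulated CPU time from /proc twice and reports the percentage used over the interval. This assumes Linux's /proc/<pid>/stat layout; the function names are hypothetical and nothing here is part of Upstart.

```python
import os
import time

# Kernel clock ticks per second, used to convert utime/stime to seconds.
CLK_TCK = os.sysconf("SC_CLK_TCK")


def cpu_seconds_from_stat(stat_line):
    """Return user+system CPU seconds from one /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces or parentheses, so
    # split on the last ')' and index the remaining fields from there.
    fields = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return (utime + stime) / CLK_TCK


def cpu_percent(pid, interval=1.0):
    """Sample a process twice and return its CPU usage over the interval."""
    def sample():
        with open(f"/proc/{pid}/stat") as f:
            return cpu_seconds_from_stat(f.read())

    before = sample()
    time.sleep(interval)
    return 100.0 * (sample() - before) / interval


if __name__ == "__main__":
    print(f"own CPU usage: {cpu_percent(os.getpid(), 0.5):.1f}%")
```

A watcher built this way only observes and reports; whether to kill or restart a job that stays above some threshold is a separate policy decision.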

-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@xxxxxxxxxx                                     vorlon@xxxxxxxxxx
