credativ team mailing list archive


[Bug 677257] Re: Scheduler won't reschedule a task if it takes too long

To: credativ@xxxxxxxxxxxxxxxxxxx
From: "Ghislain Nebra \(INCB\)" <677257@xxxxxxxxxxxxxxxxxx>
Date: Mon, 19 Dec 2011 09:25:40 -0000
Reply-to: Bug 677257 <677257@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

I would also add that in the _poolJobs function, the SQL queries use PostgreSQL's "now()", whereas the comparison in Python uses "DateTime.now()".
Only one source of "now" should be used: the Python one.

Here is my suggestion:
cr.execute("select * from ir_cron where numbercall<>0 and active and nextcall<=now() order by priority")
should be
cr.execute("select * from ir_cron where numbercall<>0 and active and nextcall<='" + now.strftime('%Y-%m-%d %H:%M:%S') + "' order by priority")

This is very important if your PostgreSQL database is not on the same
machine as the OpenERP server, because even a small clock difference
between the two can stop cron jobs from running.
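A minimal sketch of the same idea, assuming a DB-API cursor: instead of splicing the formatted timestamp into the SQL string, the Python-side "now" can be passed as a query parameter, so the database compares nextcall against the server's clock rather than its own. Python's built-in sqlite3 module stands in for PostgreSQL here (sqlite3 uses `?` placeholders where psycopg2 uses `%s`); the trimmed-down ir_cron table and the due_jobs helper are hypothetical.

```python
import sqlite3
from datetime import datetime

# Hypothetical, trimmed-down ir_cron table; sqlite3 stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "create table ir_cron (id integer primary key, numbercall integer,"
    " active integer, nextcall text, priority integer)"
)
cur.executemany(
    "insert into ir_cron values (?, ?, ?, ?, ?)",
    [
        (1, -1, 1, "2011-12-19 01:00:00", 5),  # due exactly at "now"
        (2, -1, 1, "2011-12-19 00:59:00", 1),  # already overdue
        (3, -1, 1, "2011-12-19 01:01:00", 1),  # still in the future
        (4, 0, 1, "2011-12-19 00:00:00", 1),   # no calls left
    ],
)

def due_jobs(cursor, now):
    """Return ids of due jobs, compared against a Python-side timestamp."""
    cursor.execute(
        "select id from ir_cron where numbercall<>0 and active"
        " and nextcall<=? order by priority, id",  # id only makes the order stable
        (now.strftime("%Y-%m-%d %H:%M:%S"),),
    )
    return [row[0] for row in cursor.fetchall()]

now = datetime(2011, 12, 19, 1, 0, 0)  # the single source of "now"
print(due_jobs(cur, now))  # [2, 1]
```

Passing the value as a parameter also leaves quoting to the driver instead of string concatenation.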

Moreover, in this function, "while nextcall < now and numbercall:"
should be "while nextcall <= now and numbercall:": if the cron job is
planned for 1:00, the timer will wake up at 1:00, and you want your
cron job to be executed then!
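The boundary case can be checked in isolation. This is a hypothetical reduction of the loop to its comparison and interval step (the count_runs name, the one-hour interval and the date are assumptions, and every iteration is assumed to run the job): with `<`, a job whose nextcall equals now is selected by the `nextcall<=now()` query but never executed by the loop.

```python
import operator
from datetime import datetime, timedelta

def count_runs(nextcall, now, interval, inclusive):
    """Count loop iterations until the loop condition fails.

    Simplified: unlimited calls, and every iteration runs the job.
    """
    keep_going = operator.le if inclusive else operator.lt
    runs = 0
    while keep_going(nextcall, now):
        runs += 1              # the job's callback would run here
        nextcall += interval   # move on to the following slot
    return runs, nextcall

now = datetime(2011, 12, 19, 1, 0)      # timer wakes up at 1:00
planned = datetime(2011, 12, 19, 1, 0)  # job planned for 1:00
hour = timedelta(hours=1)

print(count_runs(planned, now, hour, inclusive=False))  # 0 runs, nextcall stays at 1:00
print(count_runs(planned, now, hour, inclusive=True))   # 1 run, nextcall moves to 2:00
```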

--
You received this bug notification because you are a member of OpenERP
Framework Experts, which is subscribed to OpenERP Server.
https://bugs.launchpad.net/bugs/677257

Title:
Scheduler won't reschedule a task if it takes too long

Status in OpenERP Server:
Fix Released
Status in OpenERP Server 5.0 series:
Fix Released

Bug description:
We ran into this problem because we were running the mrp scheduler
every two minutes and it started to take longer than a minute to run.
Suddenly, it would just stop being scheduled.

It looks like this is what happens in the ir_cron._poolJobs() method:

1. Get the current time and hold it in the "now" variable.
2. Find all active jobs whose next call time has passed.
3. Run each job.
4. Increment the next call time by the job's interval until it passes the "now" variable, which may be a few minutes in the past if the jobs took a while to execute.
5. Update the job's next call time in the database.
6. Find all active jobs whose next call time is in the future and schedule the first one.
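The six steps above can be modelled with a simulated clock to see where a job gets lost. This is a simplified sketch, not the OpenERP code: the job dicts, the pool_jobs name, the arbitrary date and the fixed run duration are hypothetical, and step 6 uses the clock at the end of the pass, as the database's now() would.

```python
from datetime import datetime, timedelta

def pool_jobs(jobs, start, duration):
    """Simulate one scheduler pass following steps 1-6.

    jobs: list of dicts with 'nextcall' (datetime) and 'interval' (timedelta).
    start: wall-clock time when the pass begins.
    duration: assumed constant run time of each job.
    Returns the wake-up time chosen in step 6, or None if no job qualifies.
    """
    now = start                              # step 1: snapshot "now"
    clock = start                            # real time keeps moving
    for job in jobs:
        if job["nextcall"] <= now:           # step 2: job is due
            clock += duration                # step 3: run it, time passes
            while job["nextcall"] < now:     # step 4: advance past the snapshot
                job["nextcall"] += job["interval"]
            # step 5 would write job["nextcall"] back to the database here
    # step 6: only next calls that are still in the future by the real clock
    future = [job["nextcall"] for job in jobs if job["nextcall"] >= clock]
    return min(future) if future else None

# The 10:09 restart scenario: mrp was due at 10:05, runs every 5 min for 2 min.
mrp = {"nextcall": datetime(2011, 1, 1, 10, 5), "interval": timedelta(minutes=5)}
print(pool_jobs([mrp], datetime(2011, 1, 1, 10, 9), timedelta(minutes=2)))  # None
print(mrp["nextcall"])  # 2011-01-01 10:10:00
```

The stored next call (10:10) is already behind the clock (10:11) when step 6 runs, so nothing wakes the scheduler up for mrp again.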

If a job ends up scheduled for a time after the "now" variable but
before step 6 executes, it will no longer be executed.

Here's a scenario where that could happen. The mrp scheduler is
scheduled to run every five minutes and it takes two minutes to run.

10:00 mrp starts
10:02 mrp finishes, scheduled for 10:05
10:03 admin shuts down the server for maintenance
10:09 admin starts up the server and connects to the database. mrp starts
10:11 mrp finishes, scheduled for 10:10 (10:05 plus 5 minutes, the first slot after 10:09). mrp is no longer in the list of tasks in memory.

Once this happens, I think there is a two-in-five chance that the mrp
task will run once at startup and not be scheduled after that, since
the first slot after a restart falls inside the two-minute run in two
cases out of five.
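The two-in-five estimate can be checked by brute force under the scenario's assumptions (five-minute interval, two-minute run, restart at a uniformly random moment, overdue job run immediately). The lost_fraction helper below is a hypothetical illustration: it counts restart offsets at one-second resolution where the next slot is already behind the clock when the run ends.

```python
from fractions import Fraction

def lost_fraction(interval_s, run_s):
    """Fraction of restart offsets (1 s resolution) after which the task is lost.

    Slots are at multiples of interval_s. After a restart at offset t, step 4
    moves nextcall to the first slot >= t; the run ends at t + run_s, and
    step 6 drops the task if that slot is already earlier than the run's end.
    """
    lost = 0
    for t in range(interval_s):                  # restart offset within one period
        slot = -(-t // interval_s) * interval_s  # first slot >= t (ceiling division)
        if slot < t + run_s:                     # slot went by while the job ran
            lost += 1
    return Fraction(lost, interval_s)

print(lost_fraction(5 * 60, 2 * 60))  # 2/5
```

In general the fraction comes out as run time divided by interval, which matches the reporter's estimate for two minutes out of five.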

Why does step six above need to check that the next time is in the
future? Shouldn't it just schedule the minimum time for all active
tasks?
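A minimal sketch of what the question proposes for step 6, assuming the wake-up delay is expressed in seconds: take the earliest next call of all active jobs without filtering on the future, and wake up immediately when that time has already passed. The next_wakeup_delay name and the one-hour idle fallback are assumptions for illustration.

```python
from datetime import datetime, timedelta

def next_wakeup_delay(nextcalls, clock, idle=timedelta(hours=1)):
    """Seconds to sleep before the next scheduler pass.

    nextcalls: next call times of all active jobs, overdue ones included.
    clock: current time. idle: assumed fallback when no job is active.
    """
    if not nextcalls:
        return idle.total_seconds()
    earliest = min(nextcalls)
    # An overdue job is not dropped: the scheduler simply wakes up right away.
    return max((earliest - clock).total_seconds(), 0.0)

clock = datetime(2011, 1, 1, 10, 11)
print(next_wakeup_delay([datetime(2011, 1, 1, 10, 10)], clock))  # 0.0
print(next_wakeup_delay([datetime(2011, 1, 1, 10, 15)], clock))  # 240.0
```

With this rule the 10:10 mrp call from the scenario is picked up on the very next pass instead of being forgotten.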

To manage notifications about this bug go to:
https://bugs.launchpad.net/openobject-server/+bug/677257/+subscriptions

Follow ups

Re: [Bug 677257] Re: Scheduler won't reschedule a task if it takes too long
From: Olivier Dony (OpenERP), 2011-12-19