
openerp-india team mailing list archive

[Bug 1129967] [NEW] Cron infinite loop with separate OpenERP and PostgreSQL servers

 

Public bug reported:

We have been seeing an issue since migrating from a single server running both OpenERP and PostgreSQL to two separate servers, the first hosting OpenERP and the second PostgreSQL. The cron worker (openerp-cron-worker) goes into a loop and rapidly logs:
openerp.addons.base.ir.ir_cron: Starting job ...

After a few seconds or minutes the job is actually started correctly,
and then the worker gets back into this loop again. This always seems to
occur on jobs which run every minute, and as a side effect it prevents
any other jobs from running at all.

After some intensive debugging it appears that the issue is down to the
clocks being out of sync between the two servers, and the problem gets
worse as the clocks drift further apart.

The simple solution... use ntpd! But I am wondering whether there are
other cases where this problem would show up, or other parts of the
system which would also be affected by it.

It turns out the problem occurs for the following reason:
* In '_acquire_job' an SQL query is used to find all cron jobs which need to run (uses the time from the PostgreSQL server)
* In '_process_job' datetime.now() is used to get the current time (uses the time from the OpenERP server)
* It then reaches the loop that compares now with nextcall and immediately drops out without running the job, because the nextcall<now test is evaluated against a different clock than the one the SQL query used.
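
A minimal sketch of this mechanism, not the actual ir_cron code. It assumes the PostgreSQL clock is ahead of the OpenERP clock, and that the job body only runs while nextcall < now on the OpenERP side. The function names, the one-second poll step and the strict comparisons are hypothetical stand-ins for illustration:

```python
from datetime import datetime, timedelta


def db_says_due(nextcall, db_now):
    # Simplified stand-in for the SQL filter in _acquire_job (PostgreSQL clock).
    return nextcall < db_now


def app_runs_job(nextcall, app_now):
    # Simplified stand-in for the loop test in _process_job (OpenERP clock).
    return nextcall < app_now


def wasted_polls(skew, poll=timedelta(seconds=1)):
    """Count polls where the job is acquired ("Starting job ...") but not run.

    skew is how far the PostgreSQL clock is ahead of the OpenERP clock.
    """
    nextcall = datetime(2013, 2, 19, 12, 0, 0)  # arbitrary example timestamp
    app_now = nextcall - skew  # moment the PostgreSQL clock reaches nextcall
    wasted = 0
    while True:
        db_now = app_now + skew
        if db_says_due(nextcall, db_now):
            if app_runs_job(nextcall, app_now):
                return wasted  # both clocks agree at last, the job really runs
            wasted += 1  # acquired, but nextcall is left untouched
        app_now += poll


if __name__ == "__main__":
    for seconds in (0, 5, 30):
        print(seconds, wasted_polls(timedelta(seconds=seconds)))
```

Under these assumptions the number of wasted acquisitions grows with the skew, which would match the observation that the problem gets worse as the clocks drift apart. The real worker retries far faster than once per second, hence the flood of log lines.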

This was encountered on version 6.1 r4334. I checked 7.0 as well: although it has the same problem when comparing dates, it does manage to run the other cron jobs, so the scheduling seems fairer.
I think that to make 6.1 fairer, the "return True" in ir_cron.py line 363 could be dedented by one level. Currently it is part of the "for job in cr.dictfetchall()" loop; shouldn't it be outside the loop?
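
To illustrate the indentation point in isolation, here is a stripped-down sketch. It is not the real ir_cron.py code: the function names and job labels are made up, and locking and database access are left out. It only shows how the position of "return True" changes how many fetched jobs get handled per call:

```python
def process_return_inside(jobs, run):
    # "return True" is part of the for loop: the function exits after
    # handling the first job, so later jobs are never reached in this call.
    for job in jobs:
        run(job)
        return True
    return False


def process_return_outside(jobs, run):
    # "return True" dedented by one level: every fetched job gets its turn
    # before the function returns.
    for job in jobs:
        run(job)
    return True


if __name__ == "__main__":
    jobs = ["every-minute job", "nightly report", "mail queue"]  # made-up labels

    handled_inside = []
    process_return_inside(jobs, handled_inside.append)
    print(handled_inside)

    handled_outside = []
    process_return_outside(jobs, handled_outside.append)
    print(handled_outside)
```

If the first job fetched is the one stuck in the loop described above, the first variant would keep coming back to it and starve the rest, which is the unfairness I am seeing on 6.1.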

Any thoughts on how a date mismatch between Postgres and OpenERP should
be handled?

Thanks,
Craig

** Affects: openobject-server
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of OpenERP
Indian Team, which is subscribed to OpenERP Server.
https://bugs.launchpad.net/bugs/1129967

Title:
  Cron infinite loop with separate OpenERP and PostgreSQL servers

Status in OpenERP Server:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/openobject-server/+bug/1129967/+subscriptions

