openerp-india team mailing list archive
Message #23677
[Bug 1129967] [NEW] Cron infinite loop with separate OpenERP and PostgreSQL servers
Public bug reported:
We have been seeing an issue since migrating from a single server running both OpenERP and PostgreSQL to two separate servers, one hosting OpenERP and the other PostgreSQL. The cron worker (openerp-cron-worker) goes into a loop and rapidly logs:
openerp.addons.base.ir.ir_cron: Starting job ...
After a few seconds or minutes the job actually starts correctly, and
then the worker falls back into the same loop. This always seems to
occur on jobs which run every minute and, as a side effect, prevents any
other jobs from running at all.
After some intensive debugging it appears that the issue is down to the
clocks being out of sync between the two servers, and as the clocks
drift further apart the problem does get worse.
The simple solution... use ntpd! But I am wondering whether there are
any other cases where this problem would become apparent, or any other
parts of the system which would also be affected by it.
It turns out the problem occurs for the following reason:
* In '_acquire_job' an SQL query is used to find all cron jobs which need to run (this uses the time from the PostgreSQL server)
* In '_process_job' datetime.now() is used to get the current time (this uses the time from the OpenERP server)
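* It then enters a loop that compares now with nextcall and exits immediately without running the job, because the nextcall<now test is evaluated against the OpenERP clock, which disagrees with the PostgreSQL clock that selected the job.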
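
To make the interaction between the two clocks concrete, here is a minimal sketch. It is not the actual ir_cron code: the function names are hypothetical, and it assumes the SQL filter behaves like "nextcall <= database time" and the loop guard like "nextcall < application time".

```python
from datetime import datetime, timedelta


def classify_pass(nextcall, db_now, app_now):
    """Classify one scheduler pass for a single job.

    Assumptions (not taken from the real ir_cron.py):
      * the job is acquired when nextcall <= db_now (PostgreSQL clock)
      * the job is executed when nextcall < app_now (OpenERP clock)
    """
    acquired = nextcall <= db_now
    if not acquired:
        return "idle"  # nothing is due according to the database
    if nextcall < app_now:
        return "run"   # both clocks agree that the job is due
    return "spin"      # acquired and logged, but never executed or rescheduled


def spin_window(db_now, app_now):
    """How long a due job keeps being acquired without running.

    This is zero unless the database clock is ahead of the application clock.
    """
    return max(db_now - app_now, timedelta(0))
```

Under these assumptions, a job is re-acquired on every pass for as long as the application clock lags behind nextcall, which would match the rapid "Starting job" log lines and the observation that more drift makes things worse.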
This was encountered on version 6.1 r4334. I checked for the same problem in 7.0: although it has the same issue with comparing dates, it does manage to run the other cron jobs as well, so the scheduling seems fairer.
I think that, to make 6.1 fairer, the "return True" at ir_cron.py line 363 could be moved out by one indentation level. It is currently part of the "for job in cr.dictfetchall()" loop; shouldn't it be outside the loop?
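
The effect of where the return sits can be illustrated with a small sketch. Both functions and the 'process' callback are hypothetical stand-ins, not the code from ir_cron.py, and the second variant returns whether anything ran rather than an unconditional True.

```python
def run_first_due_job(due_jobs, process):
    """Early-return variant: stop after the first due job.

    If the first job never gets rescheduled, every later job in the
    list is starved, because each pass ends before reaching it.
    """
    for job in due_jobs:
        process(job)
        return True  # inside the loop: only one job per pass
    return False


def run_all_due_jobs(due_jobs, process):
    """Variant with the return outside the loop: every due job is visited."""
    ran_any = False
    for job in due_jobs:
        process(job)
        ran_any = True
    return ran_any  # outside the loop: the pass covers the whole list
```

For example, with due_jobs = ["stuck", "other"], the first variant only ever hands "stuck" to process, while the second hands over both.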
Any thoughts on how a date mismatch between Postgres and OpenERP should
be handled?
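
One possible direction, offered only as a sketch: use a single clock for both selecting and rescheduling a job. The function below is hypothetical and not part of OpenERP. It assumes the caller passes in the same reference time that selected the job (for instance a timestamp read from PostgreSQL with something like SELECT now() AT TIME ZONE 'UTC'), and it uses <= so that a selected job is always advanced at least once.

```python
from datetime import datetime, timedelta


def advance_nextcall(nextcall, reference_now, interval, numbercall=-1):
    """Count due runs and compute the next call time from one clock.

    reference_now: the timestamp that was used to decide the job is due
                   (assumed to come from the database, not datetime.now()).
    interval:      a positive timedelta between runs.
    numbercall:    remaining runs; a negative value means unlimited.

    Returns (runs, new_nextcall, remaining_numbercall).
    """
    runs = 0
    while nextcall <= reference_now and numbercall != 0:
        if numbercall > 0:
            numbercall -= 1
        runs += 1             # the job callback would be invoked here
        nextcall += interval  # always move the schedule forward
    return runs, nextcall, numbercall
```

Because the comparison never touches the application server's clock, a skew between the two machines can no longer leave a job selected but not rescheduled. Whether this fits the rest of the scheduler is an open question.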
Thanks,
Craig
** Affects: openobject-server
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of OpenERP
Indian Team, which is subscribed to OpenERP Server.
https://bugs.launchpad.net/bugs/1129967
Title:
Cron infinite loop with separate OpenERP and PostgreSQL servers
Status in OpenERP Server:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/openobject-server/+bug/1129967/+subscriptions