openerp-india team mailing list archive

[Bug 1129967] Re: [6.1] Cron loop with separate OpenERP and PostgreSQL servers with out-of-sync clocks

To: openerp-india@xxxxxxxxxxxxxxxxxxx
From: "Olivier Dony (OpenERP)" <1129967@xxxxxxxxxxxxxxxxxx>
Date: Tue, 19 Feb 2013 13:34:10 -0000
Reply-to: Bug 1129967 <1129967@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Hi,

Your servers were drifting away progressively from each other in time,
and this is bound to cause issues in the long term, so it's best to
notice it quickly, one way or another. You really needed to synchronize
their clocks.

That said, it's clearly not an intended feature, and we could imagine
removing that "return" line to make the scheduling fair even in that
case. For the record, in 7.0 this was fixed at the same time as bug
1086396, at revision 4744
revid:odo@xxxxxxxxxxx-20121222011355-te4osnedofct0p1u, but this patch
cannot be backported to 6.1 as-is, as it would violate the stable
policy.

** Changed in: openobject-server
   Importance: Undecided => Wishlist

** Changed in: openobject-server
       Status: New => Opinion

** Changed in: openobject-server
     Assignee: (unassigned) => OpenERP's Framework R&D (openerp-dev-framework)

-- 
You received this bug notification because you are a member of OpenERP
Indian Team, which is subscribed to OpenERP Server.
https://bugs.launchpad.net/bugs/1129967

Title:
  [6.1] Cron loop with separate OpenERP and PostgreSQL servers with
  out-of-sync clocks

Status in OpenERP Server:
  Opinion

Bug description:
  We have been seeing an issue after migrating from a single server running both OpenERP and PostgreSQL to two separate servers, the first hosting OpenERP and the second PostgreSQL. The cron worker (openerp-cron-worker) will go into a loop and rapidly log:
  openerp.addons.base.ir.ir_cron: Starting job ...

  After a few seconds or minutes the job is actually started correctly,
  and then it gets back into this loop again. This always seems to occur
  on jobs which run every minute, and as a side effect it prevents any
  other jobs from running at all.

  After some intensive debugging it appears that the issue is down to
  the clocks being out of sync between the two servers, and as the
  clocks drift further apart the problem does get worse.

  The simple solution... use ntpd! But I am wondering whether there are
  any other cases where this problem would become apparent, or any other
  parts of the system which would also be impacted by it.

  It turns out the problem occurs for the following reason:
  * In '_acquire_job' an SQL query is used to find all cron jobs which need to run (uses the time from the PostgreSQL server).
  * In '_process_job' it will use datetime.now() to get the current time (uses the time from the OpenERP server).
  * It will then go into a loop to compare now and the nextcall times and immediately abort because in this case nextcall < now.
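
  The mismatch between the two clocks can be illustrated with a small
  simulation. This is only a sketch, not the OpenERP code: the function
  names are hypothetical, and it assumes that the job is run only while
  nextcall < now holds on the OpenERP server's clock, and that the OpenERP
  server's clock is behind the PostgreSQL server's. The report does not
  state the direction of the drift.

```python
from datetime import datetime, timedelta


def is_due_per_db(nextcall, db_now):
    # Hypothetical stand-in for the SQL filter in _acquire_job:
    # the job is due according to the PostgreSQL server's clock.
    return nextcall <= db_now


def runs_per_app(nextcall, app_now):
    # Hypothetical stand-in for the check in _process_job (assumed condition):
    # the job body runs only if it is due according to the OpenERP server's clock.
    return nextcall < app_now


def simulate(skew, polls=5):
    """Count polls where a job is picked up but not actually run."""
    db_now = datetime(2013, 2, 19, 13, 0, 0)
    app_now = db_now - skew              # assumption: OpenERP clock is behind
    nextcall = db_now - timedelta(seconds=1)
    picked_but_skipped = 0
    for _ in range(polls):
        if is_due_per_db(nextcall, db_now) and not runs_per_app(nextcall, app_now):
            # nextcall is never advanced, so the same job is picked again
            picked_but_skipped += 1
    return picked_but_skipped
```

  With a 30-second skew every poll picks the job up without running it;
  with synchronized clocks none do.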

  This was encountered on version 6.1 r4334. I checked for the same problem in 7.0: although it has the same issue when comparing dates, it does manage to run the other cron jobs as well, so the scheduling seems fairer.
  I think to make 6.1 fairer, in ir_cron.py line 363, the "return True" could be moved out by one indentation level. Currently it is part of the "for job in cr.dictfetchall()" loop; shouldn't it be outside the loop?
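
  The effect of that indentation can be shown in isolation. This is a
  minimal sketch, not the code from ir_cron.py: the function names and the
  try_run callback are hypothetical, and the real loop body does more than
  a single call.

```python
def process_first_only(jobs, try_run):
    # "return True" sits inside the loop, so only the first due job
    # is ever attempted on each pass.
    for job in jobs:
        try_run(job)
        return True
    return False


def process_all(jobs, try_run):
    # "return True" moved out one level: every due job gets an attempt,
    # even if the first one keeps being picked up without running.
    for job in jobs:
        try_run(job)
    return True
```

  If the first job in the result set is the one stuck on the clock
  mismatch, the first variant never reaches the jobs behind it, which
  would match the starvation described above.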

  Any thoughts on how a date mismatch between Postgres and OpenERP
  should be handled?

  Thanks,
  Craig

To manage notifications about this bug go to:
https://bugs.launchpad.net/openobject-server/+bug/1129967/+subscriptions
