maas-devel team mailing list archive
Message #00339
Re: Clock skew and OAuth
On Wed, 27 Jun 2012, Julian Edwards wrote:
> On Wednesday 27 June 2012 10:14:26 you wrote:
> > On 2012-06-27 07:59, Julian Edwards wrote:
> > > https://bugs.launchpad.net/maas/+bug/978127
> > >
> > > Scott, is there a quick backportable fix that we can do for this? Perhaps
> > > send the MAAS server's time at boot somehow, before trying to access the
> > > metadata service (via user data?) and then have cloud-init set the clock?
> > >
> > > It's causing a lot of pain for quite a few people.
Agreed.
> > Would it be possible to make maas depend on an ntp server, have the dhcp
> > config refer the nodes to it, and install & run ntpdate on the node
> > early on?
>
> That's one other sort of thing I had in mind, provided the maas server is the
> ntp server since there may not be any other onward network available from the
> node (yet).
>
> > It's a few extra moving parts but it avoids issues like ntp servers that
> > might be out of the nodes' reach, or re-inventing the protocol. On the
> > downside, I have no idea how hard it might be to install ntpdate on a
> > node in this state.
>
> The clock only has to be roughly in sync, not perfectly. This is why we don't
> need to re-invent ntp, we could just throw a clock setting in the cloud-init
> code which pulls the time out of the user data.
I think a reasonable and SRU-able solution is below. Note that in order
to deliver this, we have to deliver updated ephemeral images (which was
always expected; I'm just pointing out that this fix comes in a ~600M
download).
The way this works right now is the following:
A. ephemeral instance is booted with a 'url=' parameter on the kernel
command line something like this:
url=http://maasserver/cblr/svc/op/ks/system/node-XXXX
B. as described at [1], cloud-init pulls that un-authed url, and stores it
as local configuration. Currently the payload looks like this:
   #cloud-config
   datasource:
     MAAS:
       metadata_url: http://mass-host.localdomain/source
       consumer_key: Xh234sdkljf
       token_key: kjfhgb3n
       token_secret: 24uysdfx1w4
C. cloud-init then continues on and uses that maas datasource as if it
were locally configured to do so. It pulls user-data from
the derivative url, and then executes it.
D. The user-data provided is read from
/etc/maas/commissioning-user-data [2]. cloud-init executes this code
which makes api calls back to the configured maas server in 'B' to
post commissioning status.
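As an illustration of step 'A', here is a minimal sketch of pulling the 'url=' value out of a kernel command line. The function name find_url_param is hypothetical and this is not cloud-init's actual code; it only assumes the command line is a whitespace-separated list of key=value tokens, as in /proc/cmdline.

```python
from typing import Optional


def find_url_param(cmdline: str) -> Optional[str]:
    """Return the value of the first 'url=' token, or None if absent."""
    for token in cmdline.split():
        # partition() splits on the first '=' only, so any '=' inside
        # the URL itself (e.g. a query string) is preserved.
        key, sep, value = token.partition("=")
        if key == "url" and sep:
            return value
    return None


if __name__ == "__main__":
    example = "root=/dev/ram0 url=http://maasserver/cblr/svc/op/ks/system/node-XXXX ro"
    print(find_url_param(example))
```

On a real node the input would come from reading /proc/cmdline rather than a literal string.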
The issue that we see in bug 978127 is that the http requests done in 'C'
fail because of an out-of-sync clock on the ephemeral node.
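To make the failure mode concrete: OAuth 1.0 requests carry an oauth_timestamp, and a server may reject requests whose timestamp is too far from its own clock. The sketch below shows that kind of check. The function name and the 300 second tolerance are assumptions for illustration, not values taken from MAAS.

```python
def timestamp_acceptable(request_ts: int, server_now: int,
                         tolerance: int = 300) -> bool:
    """Accept a request only if its timestamp is within `tolerance`
    seconds of the server's clock (tolerance is an assumed value)."""
    return abs(server_now - request_ts) <= tolerance


if __name__ == "__main__":
    server_now = 1340806409   # 2012-06-27 14:13:29 UTC
    node_reset = 946684800    # 2000-01-01 00:00:00 UTC, e.g. a reset clock
    print(timestamp_acceptable(node_reset, server_now))      # False
    print(timestamp_acceptable(server_now - 9, server_now))  # True
```

A node whose clock is years off fails this check on every signed request, which matches the symptom in the bug; a clock that is only roughly right passes.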
The solution that I suggest is:
i.) modify 'B' above to include a 'time_sync_url' field under 'MAAS'
ii.) Before cloud-init does oauthed requests for user-data in 'C'
above, it will first do an un-authed request to the value of
'time_sync_url' which will return data like:
Wed, 27 Jun 2012 10:13:29 -0400
iii.) cloud-init will then set the system clock (not the hardware clock)
to the given date. The subsequent oauth requests will succeed as
they'll have a reasonable system clock at that point.
iv.) if possible make cloud-init log the failure in 'C' above more
obviously on the console. I believe this is less than
straightforward unfortunately due to the console switching around
that is done on boot.
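Steps ii and iii could be sketched roughly as follows. This is not an existing cloud-init or MAAS implementation; all function names are hypothetical, and it assumes the 'time_sync_url' response body is a single RFC 2822 style date string like the one shown in step ii. Setting the clock needs root and a Unix system, so that call is kept in its own function.

```python
import time
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def render_time_sync(now: datetime) -> str:
    """Server side: format a timezone-aware datetime, e.g.
    'Wed, 27 Jun 2012 10:13:29 -0400'."""
    return format_datetime(now)


def parse_time_sync(body: str) -> float:
    """Client side: convert the response body to seconds since the epoch."""
    return parsedate_to_datetime(body.strip()).timestamp()


def fetch_time_sync(url: str, timeout: float = 10.0) -> float:
    """Do the un-authed request to the (assumed) time_sync_url."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_time_sync(resp.read().decode("ascii"))


def set_system_clock(epoch: float) -> None:
    """Set the system clock only; the hardware clock is left untouched.
    Requires root privileges on a Unix system."""
    time.clock_settime(time.CLOCK_REALTIME, epoch)


if __name__ == "__main__":
    # Round trip without touching the network or the clock.
    body = render_time_sync(datetime.now(timezone.utc))
    print(body, "->", parse_time_sync(body))
```

In cloud-init the flow would be roughly set_system_clock(fetch_time_sync(url)) before the first oauthed request, with a failure of the time request logged and otherwise ignored so that a missing 'time_sync_url' does not block boot.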
Note, I considered skipping the 'time_sync_url' and simply providing a
'time_sync' value directly, like:
time_sync: Wed, 27 Jun 2012 10:13:29 -0400
If that is seen as desirable it could probably be accommodated. The thing
that I do not like about it is that it writes that data to a local config
file, and obviously the current time stamp very quickly becomes incorrect.
Hiding it behind a url that has dynamic and correct content removes that.
Note 2: We could/should just put this in the bug.
--
[1] http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/doc/kernel-cmdline.txt
[2] http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/view/head:/etc/maas/commissioning-user-data