Message #12205
Re: [metering] high-level design proposal
On Thu, May 24, 2012 at 12:36 AM, Nick Barcet <nick.barcet@xxxxxxxxxxxxx> wrote:
> On 05/22/2012 07:15 PM, Doug Hellmann wrote:
> >
> >
> > On Tue, May 22, 2012 at 1:25 PM, Nick Barcet <nick.barcet@xxxxxxxxxxxxx> wrote:
> >
> > On 05/22/2012 03:26 PM, Doug Hellmann wrote:
> > > -> In addition to a signature, I think we would need a sequence number
> > > to be embedded by the agent for each message sent, so that loss of
> > > messages, or forgery of messages, can be detected by the collector and
> > > further audit process.
> > >
> > >
> > > OK. We have a message id, but I assumed those would be used to eliminate
> > > duplicates so this sounds like something different or new. It implies
> > > that the agent knows its own id (not hard) and keeps up with a sequence
> > > counter (more difficult, though not impossible). Did you have something
> > > in mind for how to implement that?
> >
> > Actually, this was my intent in the original blueprint when I specified
> > the "message_id" field, then a couple of lines below: "a process may verify
> > that messages were not lost". On the implementation side, I was
> > thinking that each agent would maintain its own sequence count, as a
> > global instance count would be pricier. In my mind, non-repudiation was
> > built from the message_signature + message_id, which should be unique for
> > each agent.
> >
> >
> > OK. That brings a couple of more specific questions to mind:
> >
> > Does the agent save its sequence counter through a restart? How and
> > where? What about an upgrade?
>
> Seems easily stored locally.
>
> > What would the down-stream consumer of the data do if it discovered
> > there was a missing event? Who should do that detection work?
>
> Not sure we need to worry about the auditing process yet; we just need to
> make sure that we provide the necessary information to do proper
> auditing. In principle, an audit process could then trigger an alert
> for further investigation of the issue.
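For illustration only, here is a rough sketch of the per-agent sequence counter discussed above: the agent keeps its count in a local file so it survives a restart, and an audit step looks for missing numbers. The class name, the function name, and the JSON file layout are assumptions made for this sketch, not anything from the blueprint.

```python
import json
import os


class SequenceCounter:
    """Per-agent counter saved to a local file so it survives restarts."""

    def __init__(self, path):
        self.path = path
        self.value = 0
        if os.path.exists(path):
            with open(path) as f:
                self.value = json.load(f)["seq"]

    def next(self):
        self.value += 1
        # Write to a temporary file and rename it, so a crash in the
        # middle of the write cannot leave a half-written counter file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"seq": self.value}, f)
        os.replace(tmp, self.path)
        return self.value


def find_gaps(seen):
    """Return sequence numbers missing between the lowest and highest seen."""
    if not seen:
        return []
    present = set(seen)
    return [n for n in range(min(present), max(present) + 1)
            if n not in present]
```

A collector or audit job could call find_gaps() on the sequence numbers received from one agent and raise an alert when the result is not empty.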
OK. As James (I think) pointed out in another message and I verified
yesterday, the libvirt data is cumulative, so for those counters at least
we don't need to worry about missing an event. On the other hand, that fact
complicates the API a bit because the "sum" for disk I/O is not the
addition of all of the events in the database but the volume value from the
most recent event. We may need per-counter logic behind the API to perform
the correct query if some of the counters are cumulative and some are
incremental.
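As a minimal sketch of what that per-counter logic might look like (the counter names, the type table, and the (timestamp, volume) event shape are all hypothetical, chosen only to show the idea):

```python
CUMULATIVE = "cumulative"
INCREMENTAL = "incremental"

# Hypothetical counter names and types, for illustration only.
COUNTER_TYPES = {
    "disk_io": CUMULATIVE,
    "instance_hours": INCREMENTAL,
}


def total(counter_name, events):
    """Aggregate (timestamp, volume) events according to the counter type."""
    if not events:
        return 0
    if COUNTER_TYPES[counter_name] == CUMULATIVE:
        # Each event already carries the running total, so the volume of
        # the most recent event is the answer.
        return max(events, key=lambda e: e[0])[1]
    # Incremental events each carry a delta, so they are added together.
    return sum(volume for _, volume in events)
```

For example, total("disk_io", [(1, 100), (2, 250), (3, 450)]) gives 450, while the same events under an incremental counter would add up to 800. This sketch ignores the case where a cumulative counter is reset at the source.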
Doug