openstack team mailing list archive

Re: Proposed OpenStack Service Requirements

To: Todd Willey <todd@xxxxxxxxxxxx>
From: Eric Day <eday@xxxxxxxxxxxx>
Date: Mon, 14 Feb 2011 10:44:41 -0800
Cc: openstack@xxxxxxxxxxxxxxxxxxx
In-reply-to: <AANLkTikgowBUSxwsjJO3mDtC-Z-rZU9_=zQuL_-fJpPY@mail.gmail.com>
User-agent: Mutt/1.5.20 (2009-06-14)
On Sun, Feb 13, 2011 at 01:37:29PM -0500, Todd Willey wrote:
> On Wed, Feb 9, 2011 at 1:38 PM, Eric Day <eday@xxxxxxxxxxxx> wrote:
> > * Firehose - So far we've not discussed this too much, but I
> >  think when we did there was agreement that we need it. As more
> >  services come into the picture, we want the ability to combine and
> >  multiplex our logs, events, billing information, etc. so we can
> >  report per account, per service, and so forth. For example, as
> >  a user, I want to be able to see the logs or billing events with
> >  all entries from all my services (or filter by service), but as a
> >  sysadmin I may want to view per service, or per zone. We may have
> >  registered handlers to grab certain events for PuSH notifications
> >  too. To maintain maximum flexibility across deployments we need to
> >  keep the interface generic; the payload can be a JSON object or some
> >  more efficient serialized message (this can be pluggable). The only
> >  required fields are probably:
> >
> >  <timestamp> <service> <account_id> <blob>
> >
> >  Where <blob> is a list of key/value pairs that handlers can
> >  perform routing and processing on. For a logging event, blob
> >  may be "priority=ERROR, message=oops!" or "priority=information,
> >  message=instance X launched". We can keep things really simple and
> >  flexible, relying on a set of documented common attributes that
> >  common event producers, routers, and handlers can key in on.
> 
> Regarding the Firehose, I'd suggest we look over
> http://wiki.openstack.org/AuditLogging to see what we want to change.
> That page doesn't talk about centralization or aggregation, so those
> still need to be thought out.

Thanks, I wasn't aware of this blueprint. As you mention, it's more
about nova-specific logging than aggregation. While many ideas
can be used for other services, I'm thinking more about the layer
above it. The AuditLogging ideas help provide requirements though.

> I like the idea of using JSON and
> including some tools to work with it.  I'd suggest that we'd add a bit
> more info into the log messages, so that we have more fields to filter
> by.  This should include the logger name, level, deep context, and the
> "extra" kwarg of the call to the log function (which we pack with the
> environment when handling exceptions).  All of these fields are
> available to each call to a logging method (though context and extra
> may be None).  I don't see any reason for making messages look stupid
> (header + blob) when we're going to be parsing them as JSON anyway and
> can build better filters without having to parse out the "blob" field.

When I mentioned "blob", what I really meant was optional or
service-specific fields. I want to be clear about which fields are
required and which are optional across all services, so we shouldn't
require anything Nova-specific. There can certainly be conditional
requirements, such as 'if service==nova, fields X,Y,Z are also required'.
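
A minimal sketch of that required/optional split, assuming events are
plain dicts. The three common fields come from the earlier proposal;
the per-service extras for nova ("logger", "level") are only
placeholders for "fields X,Y,Z", and missing_fields is a hypothetical
helper, not an existing API.

```python
# Fields every service must supply (from the proposed envelope).
REQUIRED_FIELDS = ("timestamp", "service", "account_id")

# Assumed conditional requirements: extra fields required per service.
# The nova entries are placeholders, not an agreed list.
SERVICE_REQUIRED = {"nova": ("logger", "level")}


def missing_fields(event):
    """Return the names of required fields that are absent from event."""
    required = list(REQUIRED_FIELDS)
    required += SERVICE_REQUIRED.get(event.get("service"), ())
    return [name for name in required if name not in event]


event = {
    "timestamp": "2011-02-13T17:50:11Z",
    "service": "nova",
    "account_id": "acct-1",  # hypothetical account identifier
    "level": "DEBUG",
}
print(missing_fields(event))  # prints ['logger']
```

Anything not listed in either table stays optional, so a service can
attach whatever extra keys its own handlers understand.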

> A resulting message may look like this:
> 
> { "timestamp": "2011-02-13T17:50:11Z",
>   "service": "nova-compute",
>   "logger": "nova.virt.libvirt_conn",
>   "level": "DEBUG",
>   "message": "instance i-00000001: rebooted",
>   "context": { "request": "XXXXXX", "user": "u", "project": "p",
> "admin": "0", "elevated": "1" },
>   "extra": "" }

Looks good, but perhaps we should split "service" into two, such as
service=nova and nova-service=compute. I suppose we could also do
prefix matching for the top-level service, but I want to keep routing
of messages very simple.
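
A rough sketch of what the split buys us, assuming dict events and a
hypothetical routing table keyed on the top-level service. The helper
name split_service and the worker names are made up for illustration.

```python
def split_service(event):
    """Return a copy of event with a 'nova-compute' style name split in two."""
    top, _, component = event["service"].partition("-")
    result = dict(event, service=top)
    if component:
        # e.g. service=nova plus nova-service=compute
        result[top + "-service"] = component
    return result


# Hypothetical routing table: one exact-match lookup, no prefix matching.
HANDLERS = {"nova": "nova-worker", "swift": "swift-worker"}

event = split_service({"service": "nova-compute", "message": "rebooted"})
print(event["service"], event["nova-service"], HANDLERS[event["service"]])
# prints: nova compute nova-worker
```

With the split in place, a router only needs a dictionary lookup on
"service" and never has to parse the name.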

> I've changed context to note the original admin state vs. the current
> state if elevated, which I think is a good idea, but only tangential
> to this conversation.

As long as context is project agnostic. If not, I would also
include the non-project-specific context (account ID like I mentioned
above). This may overlap with context, but context looks nova-specific
right now with request. We want a common ID that a deployment could
use to match up between swift, nova, and so on to allow for easy
routing and aggregation per account. I guess we first need to resolve
how accounts/users/projects/whatever will look and interact with a
common auth service. :)

> I also vote for an "ACCOUNTING" log level that functions similarly to
> how we added an "AUDIT" level that only carries information relevant
> to billing.

Perhaps the top-level envelope should require a 'type', which can
be log/audit/accounting/...

> As far as aggregation of logs, I think the best thing would be to have a
> sweeper running on each machine that is generating logs, and at a
> specified interval (hourly?) move all messages generated for that
> interval into swift.  We can make a container for each interval, and
> populate it with files named by hostname.  Then you can just grab all
> the files out of a container and process them however you want (maybe
> even combining them, decorating the json with internal account
> identifiers, and adding the combined log file back into the swift
> container).

Instead of a sweeper, what I was proposing was real-time pushing of
messages into the firehose service (essentially a queue). Workers
can then pull from this queue to route, aggregate,
and do as they wish. For example one worker listening for nova-events
could collect messages and every hour push to a swift container as you
suggest. We want the ability to access these events real-time, allowing
both internal and external consumers to tap into this as they see fit.

For example, we can expose PuSH, HTTP long-poll, or batched short-poll
interfaces for the firehose and users can subscribe with their account
ID, optionally filtering by service, type, etc. Imagine having a
real-time 'tail -f' tool showing messages for your entire public
cloud account. :)
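
To make the queue-plus-subscribers idea concrete, here is a small
in-memory sketch. The Firehose class, its method names, and the field
values are all hypothetical; a real service would sit behind a network
queue and expose PuSH or polling interfaces rather than callbacks.

```python
import queue


class Firehose:
    """In-memory stand-in for the firehose queue (illustration only)."""

    def __init__(self):
        self._events = queue.Queue()
        self._subscriptions = []

    def publish(self, event):
        """Producers push events as soon as they happen."""
        self._events.put(event)

    def subscribe(self, callback, **filters):
        """Register callback for events whose fields equal every filter."""
        self._subscriptions.append((filters, callback))

    def dispatch_pending(self):
        """Drain the queue, handing each event to matching subscribers."""
        while True:
            try:
                event = self._events.get_nowait()
            except queue.Empty:
                return
            for filters, callback in self._subscriptions:
                if all(event.get(k) == v for k, v in filters.items()):
                    callback(event)


firehose = Firehose()
seen = []
# Subscribe by account ID, optionally narrowing by service.
firehose.subscribe(seen.append, account_id="acct-1", service="nova")

firehose.publish({"account_id": "acct-1", "service": "nova", "type": "log"})
firehose.publish({"account_id": "acct-2", "service": "nova", "type": "log"})
firehose.publish({"account_id": "acct-1", "service": "swift", "type": "log"})
firehose.dispatch_pending()
print(len(seen))  # prints 1
```

An hourly archiver that writes to swift would just be another
subscriber that buffers what it receives and flushes on a timer.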

> I think we still need to have the ability to log to syslog as well.
> We should probably just keep the same formatter we have for stderr and
> --syslog, and have a json formatter for the handler installed by the
> --logfile flag.  It is easy enough to just add a new handler with a
> different formatter in a way that doesn't break what we have already.

Agreed, and this is where logging plugins come into play for each
service. You should be able to enable both syslog and a firehose
plugin in nova, so every message goes to both.
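
As a sketch of "every message goes to both", the standard Python
logging module already allows several handlers on one logger, each
with its own formatter. The JSONFormatter class below is hypothetical,
and in-memory streams stand in for syslog and the firehose plugin so
the example is self-contained; the standard library's
logging.handlers.SysLogHandler could take the place of the plain one.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JSONFormatter(logging.Formatter):
    """Hypothetical formatter emitting one JSON object per log record."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        stamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return json.dumps({
            "timestamp": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "service": self.service,
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        })


# In-memory streams stand in for syslog and the firehose plugin.
plain_out, json_out = io.StringIO(), io.StringIO()

plain = logging.StreamHandler(plain_out)
plain.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

structured = logging.StreamHandler(json_out)
structured.setFormatter(JSONFormatter("nova-compute"))

log = logging.getLogger("nova.virt.libvirt_conn")
log.setLevel(logging.DEBUG)
log.propagate = False  # keep the example's output out of the root logger
log.addHandler(plain)
log.addHandler(structured)

log.debug("instance %s: rebooted", "i-00000001")
print(plain_out.getvalue(), end="")
print(json_out.getvalue(), end="")
```

Adding the second handler leaves the existing plain-text output
untouched, which matches the point about not breaking what we have.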

> At least that is how I think of tackling that problem, but feedback is
> always appreciated (especially since we haven't really talked about
> log aggregation yet).  I'm willing to step up and implement lots of
> these features, since I've already got a pretty good handle on the
> logging.

I'm really interested in working on the aggregation services that
nova and others can leverage, so let's continue to get the API/message
format defined and the appropriate consumer interfaces exposed.

-Eric
References

Proposed OpenStack Service Requirements
From: Eric Day, 2011-02-09