← Back to team overview

openstack team mailing list archive

Re: [Metering] Agent configuration mechanism

 

On Tue, Jun 5, 2012 at 12:59 PM, Nick Barcet <nick.barcet@xxxxxxxxxxxxx>wrote:

> On 06/05/2012 04:44 PM, Doug Hellmann wrote:
> > On Tue, Jun 5, 2012 at 10:41 AM, Doug Hellmann
> > <doug.hellmann@xxxxxxxxxxxxx <mailto:doug.hellmann@xxxxxxxxxxxxx>>
> wrote:
> >     On Tue, Jun 5, 2012 at 9:56 AM, Nick Barcet
> >     <nick.barcet@xxxxxxxxxxxxx <mailto:nick.barcet@xxxxxxxxxxxxx>>
> wrote:
> >
> >         Following up on our last meeting, here is a proposal for
> centrally
> >         hosting configuration of agents in ceilometer.
> >
> >         The main idea is that all agents of a given type should be
> sending
> >         similarly formatted information in order for the information to
> be
> >         usable, hence the need to ensure that configuration info is
> >         centrally
> >         stored and retrieved.  This would rule out, in my mind, the idea
> >         that we
> >         could use the global flags object, as distribution of the
> >         configuration
> >         file is left to the cloud implementor and does not lend for easy
> and
> >         synchronized updates of agent config.
> >
> >         Configuration format and content is left to the agent's
> >         implementation,
> >         but it is assumed that each meter covered by an agent can be :
> >          * enabled or disabled
> >          * set to send information at a specified interval.
> >
> >
> >     Right now we only have one interval for all polling. Do you think we
> >     need to add support for polling different values at different
> >     intervals? Do we need other per-agent settings, or are all of the
> >     settings the same for all agents? (I had assumed the latter would be
> >     all we needed.)
>
> I would have thought that we may want to support different intervals per
> meters, based on the billing rules that one may want to offer.  For
> example, I may want to bill compute by the hour but floating IPs by the
> day, hence have a different reporting interval for each.
>

I was planning to aggregate the values for items being billed over the
longer time frames, but we can make the polling interval configurable. It
will take some work, because of the way the scheduled tasks are configured
in the service and manager (right now we just schedule one method to run,
and it invokes each pollster).

How important is it to include this in Folsom?


>
> >         1/ Configuration is stored for each agent in the database as
> follow
> >
> +-------------------------------------------------------------------+
> >         | Field     | Type     | Note
> >             |
> >
> +-------------------------------------------------------------------+
> >         | AgentType | String   | Unique agent type
> >            |
> >         | ConfVers  | Integer  | Version of the configuration
> >             |
> >         | Config    | Text     | JSON Configuration info (defined by
> >         agent) |
> >
> +-----------+----------+--------------------------------------------+
> >
> >         2/ Config is retreived via the messaging queue upon boot once a
> day
> >         (this should be defined in the global flags object) to check if
> the
> >         config has changed.
> >
> >
> >     Updating the config once a day is not going to be enough in an
> >     environment with a lot of compute nodes.
> >
> >
> > Two thoughts merged into one sentence there. Need more caffeine.
> >
> > What I was trying to say, was that updating the config once a day might
> > not be enough and in environments with a lot of compute nodes going
> > around to manually restart the services each time the config changes
> > will be a pain. See below for more discussion of pushing config settings
> > out.
>
> Agreed, and that's why I proposed that the interval for confguration
> refresh should be set in the Global object flag (this is something that
> can be shared among all the agents).
>
> >
> >
> >         Request sent by the agent upon boot and :
> >
> >            'reply_to': 'get_config_data',
> >            'correlation_id': xxxxx
> >            'version': '1.0',
> >            'args': {'data': {
> >                       'AgentType': agent.type,
> >                       'CurrentVersion': agent.version,
> >                       'ConfigDefault': agent.default,
> >                       },
> >                    },
> >
> >
> >     Is this a standard OpenStack RPC call?
>
> Not sure about that, but if it can be, it would be easier :)
>

Yeah, I think a regular RPC call would be the easiest implementation. So we
still need to specify the arguments to that call, but we don't have to
worry about how the messages travel back and forth.


>
> >         Where ConfigDefault are the "sane" default proposed by the agent
> >         authors.
> >
> >
> >     Why is the agent proposing default settings?
>
> So that the first agent of a given type can populate its info with sane
> defaults that can then be edited later on?
>

If the agent plugins are installed on the server where the collector is
located, the collector can ask them for defaults.


>
> >         If no config record is found the collector creates the record,
> sets
> >         ConfVers to 1 and sends back a normal reply.
> >
> >         Reply sent by the collector:
> >            'correlation_id': xxxxx
> >            'version': '1.0',
> >
> >
> >     Do we need minor versions for the config settings, or are those
> >     simple sequence numbers to track which settings are the "most
> current"?
>
> Simple sequence was what I was thinking about.
>

Wouldn't it be simpler if the configuration settings were pushed to the
agent as an idempotent operation?


>
> >            'args': {'data': {
> >                       'Result': result.code,
> >                       'ConfVers': ConfVers,
> >                       'Config': Config,
> >                       },
> >                    },
> >            }
> >
> >         Result is set as follow:
> >            200  -> Config was retrieved successfully
> >            201  -> Config was created based on received default (Config
> >         is empty)
> >            304  -> Config version is identical to CurrentVersion (Config
> >         is empty)
> >
> >
> >     Why does the agent need to know the difference between those?
> >     Shouldn't it simply use the settings it is given?
>
> To avoid processing update code if the update is not needed?
>

That optimization doesn't need to be built into the protocol, though. The
only way to get that right is for the central server to have a
representation of the state of the configuration of each agent. It is
simpler for the agent to ask the collector, "what should my configuration
be?" and then handle the changes locally.

The simplest implementation will be to just throw away all of the pollsters
and instantiate new ones when the configuration changes. It isn't expensive
to construct those objects, and doing it this way should be easier to
implement than trying to adjust settings (especially the schedule).


>
> >         This leaves open the question of having some UI to change the
> >         config,
> >         but I thing we can live with manual updating of the records for
> >         the time
> >         being.
> >
> >
> >     Since we're using the service and RPC frameworks from nova
> >     elsewhere, we have the option of issuing commands to all of the
> >     agents from a central server. That would let us, for example, use a
> >     cast() call to push a new configuration out to all of the agents at
> >     once, on demand (from a command line program, for example).
>
> Sounds nifty.  Let's amend.
>
> >     I don't see the need for storing the configuration in the database.
> >     It seems just as easy to have a configuration file on the central
> >     server. The collector could read the file each time it is asked for
> >     the agent configuration, and the command line program that pushes
> >     config changes out could do the same.
>
> Over engineering on my side, maybe.  You are right that the database is
> NOT needed and we can do with a simple file, but then the collector
> becomes state-full and HA considerations will start kicking in if we
> want to have 2 collectors running in //.  If the DB is shared, the issue
> is pushed to the DB, which will, hopefully, be redundant by nature.
>

That's a reasonable point. I assumed the collector configuration is going
to need to be shared among those nodes already. How does that work in other
OpenStack components?


>
> >     Have you given any thought to distributing the secret value used for
> >     signing incoming messages? A central configuration authority does
> >     not give us a secure way to deliver secrets like that. If anyone
> >     with access to the message queue can retrieve the key by sending RPC
> >     requests, we might as well not sign the messages.
>
> Actually, the private key used to generate a signature should be unique
> to each host, if we want them to have any value at all, therefore
> distributing a common signature should NOT be part of this, or we would
> fall under the notion of a shared secret, which is, IMHO, not any better
> than having a global password.
>
> I would recommend that, for the time being, we just generate a random
> key pair per host the first time the agent is run, allowing for someone
> with further requirement to eventually populate this value by another
> mean.
>
> In any case, if we want to effectively check the signature, the public
> key does need to be accessible by the collector to check it and have yet
> to define a way to do so...  Proposals welcome, but again, while I think
> we should lay the ground for a great security experience, we certainly
> don't need to solve it all in v1.
>

The current implementation uses hmac message signatures, which use a shared
secret instead of public/private key pairs. We can have a separate secret
for each agent, but we still need the collector(s) to have them all. I
thought the point of signing the messages was to prevent an arbitrary agent
from signing on to the message queue and sending bogus data. Do we need to
be doing more than hmac for security?


>
> Nick
>
>

Follow ups

References