openstack team mailing list archive

Re: [Metering] schema and counter definitions

To: Nick Barcet <nick.barcet@xxxxxxxxxxxxx>
From: Doug Hellmann <doug.hellmann@xxxxxxxxxxxxx>
Date: Tue, 1 May 2012 11:52:57 -0400
Cc: openstack@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CADb+p3TCUtJX9b+Lvsjm98akmVMZ44eho6tVfQ8fdfCq8UxhFA@mail.gmail.com>
On Tue, May 1, 2012 at 11:49 AM, Doug Hellmann
<doug.hellmann@xxxxxxxxxxxxx> wrote:

>
>
> On Tue, May 1, 2012 at 10:38 AM, Nick Barcet <nick.barcet@xxxxxxxxxxxxx> wrote:
>
>> On 05/01/2012 02:23 AM, Loic Dachary wrote:
>> > On 04/30/2012 11:39 PM, Doug Hellmann wrote:
>> >>
>> >>
>> >> On Mon, Apr 30, 2012 at 3:43 PM, Loic Dachary <loic@xxxxxxxxxxxx> wrote:
>> >>
>> >>     On 04/30/2012 08:03 PM, Doug Hellmann wrote:
>> >>>
>> >>>
>> >>>     On Mon, Apr 30, 2012 at 11:43 AM, Loic Dachary <loic@xxxxxxxxxxxx> wrote:
>> >>>
>> >>>         On 04/30/2012 03:49 PM, Doug Hellmann wrote:
>> >>>>
>> >>>>
>> >>>>         On Mon, Apr 30, 2012 at 6:46 AM, Loic Dachary <loic@xxxxxxxxxxxx> wrote:
>> >>>>
>> >>>>             On 04/30/2012 12:15 PM, Loic Dachary wrote:
>> >>>>             > We could start a discussion from the content of the
>> >>>>             following sections:
>> >>>>             >
>> >>>>             > http://wiki.openstack.org/EfficientMetering#Counters
>> >>>>             I think the rationale of the counter aggregation needs
>> >>>>             to be explained. My understanding is that the metering
>> >>>>             system will be able to deliver the following
>> >>>>             information: 10 floating IPv4 addresses were allocated
>> >>>>             to the tenant during three months and were leased from
>> >>>>             provider NNN. From this, the billing system could add a
>> >>>>             line to the invoice : 10 IPv4, $N each = $10xN because
>> >>>>             it has been configured to invoice each IPv4 leased from
>> >>>>             provider NNN for $N.
>> >>>>
>> >>>>             It is not the purpose of the metering system to display
>> >>>>             each IPv4 used, therefore it only exposes the aggregated
>> >>>>             information. The counters define how the information
>> >>>>             should be aggregated. If the idea was to expose each
>> >>>>             resource usage individually, defining counters would be
>> >>>>             meaningless as they would duplicate the activity log
>> >>>>             from each OpenStack component.
>> >>>>
>> >>>>             What do you think ?
>> >>>>
>> >>>>
>> >>>>         At DreamHost we are going to want to show each individual
>> >>>>         resource (the IPv4 address, the instance, etc.) along with
>> >>>>         the charge information. Having the metering system aggregate
>> >>>>         that data will make it difficult/impossible to present the
>> >>>>         bill summary and detail views that we want. It would be much
>> >>>>         more useful for us if it tracked the usage details for each
>> >>>>         resource, and let us aggregate the data ourselves.
>> >>>>
>> >>>>         If other vendors want to show the data differently, perhaps
>> >>>>         we should provide separate APIs for retrieving the detailed
>> >>>>         and aggregate data.
>> >>>>
>> >>>>         Doug
>> >>>>
>> >>>         Hi,
>> >>>
>> >>>         For the record, here is the unfinished conversation we had on
>> >>>         IRC:
>> >>>
>> >>>         (04:29:06 PM) dhellmann: dachary, did you see my reply about
>> >>>         counter definitions on the list today?
>> >>>         (04:39:05 PM) dachary: It means some counters must not be
>> >>>         aggregated. Only the amount associated with it is but there
>> >>>         is one counter per IP.
>> >>>         (04:55:01 PM) dachary: dhellmann: what about this: the id of
>> >>>         the resource controls the aggregation of all counters: if it
>> >>>         is missing, all resources of the same kind and their measures
>> >>>         are aggregated. Otherwise only the measures are aggregated.
>> >>>
>> >>>         http://wiki.openstack.org/EfficientMetering?action=diff&rev2=40&rev1=39
>> >>>         (04:55:58 PM) dachary: it makes me a little uncomfortable to
>> >>>         define such an "ad-hoc" grouping
>> >>>         (04:56:53 PM) dachary: i.e. you actuall control the
>> >>>         aggregation by choosing which value to put in the id column
>> >>>         (04:58:43 PM) dachary: s/actuall/actually/
>> >>>         (05:05:38 PM) ***dachary reading
>> >>>         http://www.ogf.org/documents/GFD.98.pdf
>> >>>         (05:05:54 PM) dachary: I feel like we're trying to resolve a
>> >>>         non problem here
>> >>>         (05:08:42 PM) dachary: values need to be aggregated. The raw
>> >>>         input is a full description of the resource and a value (
>> >>>         gauge ). The question is how to control the aggregation in a
>> >>>         reasonably flexible way.
>> >>>         (05:11:34 PM) dachary: The definition of a counter could
>> >>>         probably be described as : the id of a resource and code to
>> >>>         fill each column associated with it.
>> >>>
>> >>>         I tried to append the following, but the wiki kept failing.
>> >>>
>> >>>         Propose that the counters are defined by a function instead
>> >>>         of being fixed. That helps address the issue of
>> >>>         aggregating the bandwidth associated with a given IP into a
>> >>>         single counter.
>> >>>
>> >>>         Alternate idea :
>> >>>          * a counter is defined by
>> >>>           * a name ( o1, n2, etc. ) that uniquely identifies the
>> >>>         nature of the measure ( outbound internet transit, amount of
>> >>>         RAM, etc. )
>> >>>           * the component in which it can be found ( nova, swift etc.)
>> >>>          * and by columns, each one is set with the result of
>> >>>         aggregate(find(record),record) where
>> >>>           * find() looks for the existing column as found by
>> >>>         selecting with the unique key ( maybe the name and the
>> >>>         resource id )
>> >>>           * record is a detailed description of the metering event to
>> >>>         be aggregated (
>> >>>         http://wiki.openstack.org/SystemUsageData#compute.instance.exists:
>> >>>         )
>> >>>           * the aggregate() function returns the updated row. By
>> >>>         default it just += the counter value with the old row
>> >>>         returned by find()
>> >>>
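The aggregate(find(record), record) proposal above can be read roughly as follows. This is only a minimal sketch: the in-memory dict, the choice of (counter name, resource id) as the unique key, and the record field names are assumptions made for illustration and do not come from the wiki page.

```python
# Hypothetical in-memory store: (counter name, resource id) -> aggregated row.
rows = {}


def find(record):
    """Return the existing row for this record's unique key, or None."""
    return rows.get((record["name"], record["resource_id"]))


def aggregate(row, record):
    """Default aggregation: add the record's value to the old row's value."""
    if row is None:
        # First event seen for this key, so start a fresh row.
        row = {"name": record["name"],
               "resource_id": record["resource_id"],
               "value": 0}
    row["value"] += record["value"]
    return row


def update(record):
    """Store the result of aggregate(find(record), record)."""
    row = aggregate(find(record), record)
    rows[(row["name"], row["resource_id"])] = row
    return row


# Two measures of counter "o1" for the same IP collapse into a single row,
# while a different IP gets its own row.
update({"name": "o1", "resource_id": "ip-1", "value": 100})
update({"name": "o1", "resource_id": "ip-1", "value": 50})
update({"name": "o1", "resource_id": "ip-2", "value": 7})
```

A counter that needs something other than a running sum would supply its own aggregate() function in place of the default.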
>> >>>
>> >>>     Would we want aggregation to occur within the database where we
>> >>>     are collecting events, or should that move somewhere else?
>> >>     I assume the events collected by the metering agents will all be
>> >>     archived for auditing (or re-building the database)
>> >>
>> >>     http://wiki.openstack.org/EfficientMetering?action=diff&rev2=45&rev1=44
>> >>
>> >>     Therefore the aggregation should occur when the database is
>> >>     updated to account for a new event.
>> >>
>> >>     Does this make sense ? I may have misunderstood part of your
>> >>     question.
>> >>
>> >>
>> >> I guess what I don't understand is why the aggregated data is written
>> >> back to the metering database at all. If it's in the same database, it
>> >> seems like it should be in a different "table" (or equivalent) so the
>> >> original data is left alone.
>> > In my view the events are not stored in a database, they are merely
>> > appended to a log file. The database is built from the events with
>> > aggregated data. I now understand that you (and Joshua Harlow) think
>> > it's better to not aggregate the data and let the billing system do this
>> > job.
>>
>> My intent when writing the blueprint was that each event would be
>> recorded atomically in the database, as it is the only way to control
>> that we have not missed any. Aggregation should be done at the external
>> API level if the request is to get the sum of a given counter.
>>
>
> That matches what I was thinking. The "log file" that Loic mentioned would
> in fact be a database that can handle a lot of writes. We could use some
> sort of simple file format, but since we're going to have to read and parse
> the log anyway, we might as well use a tool that makes that easy.
>
> Aggregation could happen either in a metering API based on the query, or
> an external app could retrieve a large dataset and manage the aggregation
> itself.
>
>
>> What I missed in the blueprint, and what seems to be becoming clear now,
>> is that an event needs to be able to carry the "object-reference" for
>> which it was collected; judging by the messages in this thread, this
>> seems highly necessary. A metering event would essentially be defined
>> by (who, what, which) instead of a simple (who, what).  As a consequence
>> we would need to extend the DB schema to add this [which/object
>> reference], and make sure that we carry it as well when we work on
>> the message API format definition.
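One possible shape for a (who, what, which) event table, with each event stored as its own row and aggregation done only at query time, is sketched below using Python's sqlite3 module. The table name, the column names (tenant_id for "who", counter for "what", resource_id for "which"), and the sample values are assumptions for illustration, not the blueprint's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event (
        tenant_id   TEXT NOT NULL,  -- "who"   (assumed column name)
        counter     TEXT NOT NULL,  -- "what"  (assumed column name)
        resource_id TEXT NOT NULL,  -- "which" (assumed column name)
        value       REAL NOT NULL,
        ts          TEXT NOT NULL
    )
    """
)

# Each event is recorded as its own row; nothing is aggregated on write.
events = [
    ("tenant-a", "o1", "ip-1", 100.0, "2012-05-01T10:00:00"),
    ("tenant-a", "o1", "ip-1", 50.0, "2012-05-01T11:00:00"),
    ("tenant-a", "o1", "ip-2", 7.0, "2012-05-01T11:00:00"),
]
with conn:
    conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?)", events)


def total(tenant_id, counter):
    """Sum of one counter for a tenant, computed at query time."""
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM event "
        "WHERE tenant_id = ? AND counter = ?",
        (tenant_id, counter),
    ).fetchone()
    return row[0]


def per_resource(tenant_id, counter):
    """The same counter broken down by object reference."""
    return conn.execute(
        "SELECT resource_id, SUM(value) FROM event "
        "WHERE tenant_id = ? AND counter = ? "
        "GROUP BY resource_id ORDER BY resource_id",
        (tenant_id, counter),
    ).fetchall()
```

Because the raw rows are left untouched, the same data can serve both a summary view (total) and a per-resource detail view (per_resource).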
>>
>> How does this sound?
>>
>
> I think so. A lot of these sorts of issues can probably be fixed by being
> careful about how we define the measurements. For example, I may want to be
> able to show a customer the network bandwidth used per server, not just per
> network. If we measure the bandwidth consumed by each VIF, the aggregation
> code can take care of summarizing by network (because we know where the VIF
> is) and/or server (because we know which server has the VIF).
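As a rough sketch of that idea, assuming each per-VIF measurement carries the ids of its server and network (the Sample type, its field names, and the sample values are all hypothetical), the same records can be summarized either way:

```python
from collections import defaultdict, namedtuple

# Hypothetical per-VIF bandwidth measurement.
Sample = namedtuple("Sample", ["vif_id", "server_id", "network_id", "bytes"])

samples = [
    Sample("vif-1", "server-a", "net-1", 1000),
    Sample("vif-2", "server-a", "net-2", 250),
    Sample("vif-3", "server-b", "net-1", 400),
]


def summarize(samples, field):
    """Total the bytes of all samples, grouped by the named field."""
    totals = defaultdict(int)
    for sample in samples:
        totals[getattr(sample, field)] += sample.bytes
    return dict(totals)


by_server = summarize(samples, "server_id")
by_network = summarize(samples, "network_id")
```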
>
> We may need to record more detail than a simple "which," though, because
> it may be possible to change some information relevant for calculating the
> billing rate later. For example, a tenant can resize an instance, which
> would usually cause a change in the billing rate. Some of the relationships
> might change, too (Is it possible to move a VIF between networks?).
>
> At first I thought this might require separate table definitions per
> resource type (instance, network, etc.) but re-reading the table of
> counters in EfficientMetering I guess this is handled by measuring things
> like CPU, RAM, and block storage as separate counters? So a single event
> for creating a new instance might result in several records being written
> to the database, with the "which" set to the instance identifier. The data
> could then be presented as a unified "resource usage" report for that
> server.
>
> I think that works, but it may make the job of calculating the bill
> harder. We are planning to follow the model of specifying rates per size,
> so we would have to figure out which combination of CPU, RAM, and root
> volume storage matches up with a given size to determine the rate.
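A minimal sketch of that matching step, assuming the per-counter records for one instance can be collected together. The counter names (vcpus, ram_mb, root_gb), the size catalogue, and the rates are invented for illustration.

```python
# Hypothetical size catalogue:
# (vcpus, ram_mb, root_gb) -> (size name, rate in cents per hour).
SIZES = {
    (1, 512, 10): ("small", 2),
    (2, 2048, 20): ("medium", 8),
    (4, 8192, 40): ("large", 30),
}


def size_for(records):
    """Match the per-counter records of one instance to a named size.

    Returns (name, cents_per_hour), or None when no size matches.
    """
    values = {r["counter"]: r["value"] for r in records}
    key = (values.get("vcpus"), values.get("ram_mb"), values.get("root_gb"))
    return SIZES.get(key)


# Three records written for one "instance created" event.
records = [
    {"resource_id": "instance-1", "counter": "vcpus", "value": 2},
    {"resource_id": "instance-1", "counter": "ram_mb", "value": 2048},
    {"resource_id": "instance-1", "counter": "root_gb", "value": 20},
]
```

A combination that matches no catalogue entry returns None, which a billing system would need to handle explicitly.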
>
> Another piece I've been thinking about is handling boundary conditions
> when resource create and delete events don't both fall inside a billing
> cycle (or within the granularity of the metering system). That shouldn't be
> part of logging the events, necessarily, but it could be a reusable
> component that feeds into producing the aggregated data (either through the
> API, or as a way of processing the results returned by the API).
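One way such a reusable component could work is to clip a resource's lifetime to the billing window before computing usage. The function name and its arguments below are assumptions for illustration only.

```python
from datetime import datetime


def billable_seconds(created, deleted, window_start, window_end):
    """Seconds a resource existed inside [window_start, window_end).

    `deleted` may be None when the resource still exists.
    """
    start = max(created, window_start)
    end = window_end if deleted is None else min(deleted, window_end)
    # A lifetime entirely outside the window yields a negative span; clamp it.
    return max((end - start).total_seconds(), 0.0)


may_start = datetime(2012, 5, 1)
may_end = datetime(2012, 6, 1)

# Created before the window opens, deleted one hour into it.
straddling = billable_seconds(
    datetime(2012, 4, 30, 23, 0), datetime(2012, 5, 1, 1, 0),
    may_start, may_end)

# Created inside the window and never deleted.
still_running = billable_seconds(
    datetime(2012, 5, 10), None, may_start, may_end)

# Created and deleted before the window opens.
outside = billable_seconds(
    datetime(2012, 4, 1), datetime(2012, 4, 2), may_start, may_end)
```

The same function applies whether the window is a billing cycle or a metering granularity interval.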
>
>> >> Maybe it's time to start focusing these discussions on user stories?
>> >>
>> > I agree. Would you like to go first ?
>>
>
> These are "things that might happen" use cases rather than "user stories,"
> but let's see where they take us:
>
> 1. User creates an instance, waits some period of time, then terminates it.
>  - Vary the period of time so that the events both fall within one
> metering granularity window, span an entire window, or start in one
> window and end in another.
>  - The same variations for "billing cycle" instead of "metering
> granularity window."
> 2. User creates an instance, waits some period of time, then resizes it.
>  - Vary the period of time as above.
>  - Do we need variations for resizing up and down?
> 3. User creates an instance but it fails to create properly (provider
> issue).
> 4. User creates an instance but it fails to boot after creation (bad
> image).
> 5. User creates volume storage, adds it to an existing instance, waits a
> period of time, then deletes the volume.
>  - Vary the period of time as above.
> 6. User creates volume storage, adds it to an existing instance, waits a
> period of time, then terminates the instance (I'm not sure what happens to
> the volume in that case, maybe it still exists?)
>
> A provider-related story might be:
>
> 1. As a provider, I can query the metering API to determine the activity
> for a tenant within a given period of time.
>
> Although that's pretty vague. :-)
>

I thought of another provider story:

2. As a provider, I can install a metering plugin to start collecting data
about events not handled by the core metering app.