openstack team mailing list archive

Re: [Metering] schema and counter definitions

 

On 05/01/2012 04:38 PM, Nick Barcet wrote:
> On 05/01/2012 02:23 AM, Loic Dachary wrote:
>> On 04/30/2012 11:39 PM, Doug Hellmann wrote:
>>>
>>> On Mon, Apr 30, 2012 at 3:43 PM, Loic Dachary <loic@xxxxxxxxxxxx
>>> <mailto:loic@xxxxxxxxxxxx>> wrote:
>>>
>>>     On 04/30/2012 08:03 PM, Doug Hellmann wrote:
>>>>
>>>>     On Mon, Apr 30, 2012 at 11:43 AM, Loic Dachary <loic@xxxxxxxxxxxx
>>>>     <mailto:loic@xxxxxxxxxxxx>> wrote:
>>>>
>>>>         On 04/30/2012 03:49 PM, Doug Hellmann wrote:
>>>>>
>>>>>         On Mon, Apr 30, 2012 at 6:46 AM, Loic Dachary
>>>>>         <loic@xxxxxxxxxxxx <mailto:loic@xxxxxxxxxxxx>> wrote:
>>>>>
>>>>>             On 04/30/2012 12:15 PM, Loic Dachary wrote:
>>>>>             > We could start a discussion from the content of the
>>>>>             following sections:
>>>>>             >
>>>>>             > http://wiki.openstack.org/EfficientMetering#Counters
>>>>>             I think the rationale of the counter aggregation needs
>>>>>             to be explained. My understanding is that the metering
>>>>>             system will be able to deliver the following
>>>>>             information: 10 floating IPv4 addresses were allocated
>>>>>             to the tenant during three months and were leased from
>>>>>             provider NNN. From this, the billing system could add a
>>>>>             line to the invoice : 10 IPv4, $N each = $10xN because
>>>>>             it has been configured to invoice each IPv4 leased from
>>>>>             provider NNN for $N.
>>>>>
>>>>>             It is not the purpose of the metering system to display
>>>>>             each IPv4 used, therefore it only exposes the aggregated
>>>>>             information. The counters define how the information
>>>>>             should be aggregated. If the idea was to expose each
>>>>>             resource usage individually, defining counters would be
>>>>>             meaningless as they would duplicate the activity log
>>>>>             from each OpenStack component.
>>>>>
>>>>>             What do you think ?
>>>>>
>>>>>
>>>>>         At DreamHost we are going to want to show each individual
>>>>>         resource (the IPv4 address, the instance, etc.) along with
>>>>>         the charge information. Having the metering system aggregate
>>>>>         that data will make it difficult/impossible to present the
>>>>>         bill summary and detail views that we want. It would be much
>>>>>         more useful for us if it tracked the usage details for each
>>>>>         resource, and let us aggregate the data ourselves.
>>>>>
>>>>>         If other vendors want to show the data differently, perhaps
>>>>>         we should provide separate APIs for retrieving the detailed
>>>>>         and aggregate data.
>>>>>
>>>>>         Doug
>>>>>
>>>>         Hi,
>>>>
>>>>         For the record, here is the unfinished conversation we had on IRC
>>>>
>>>>         (04:29:06 PM) dhellmann: dachary, did you see my reply about
>>>>         counter definitions on the list today?
>>>>         (04:39:05 PM) dachary: It means some counters must not be
>>>>         aggregated. Only the amount associated with it is but there
>>>>         is one counter per IP.
>>>>         (04:55:01 PM) dachary: dhellmann: what about this: the id of
>>>>         the resource controls the aggregation of all counters: if it
>>>>         is missing, all resources of the same kind and their measures
>>>>         are aggregated. Otherwise only the measures are aggregated.
>>>>         http://wiki.openstack.org/EfficientMetering?action=diff&rev2=40&rev1=39
>>>>         (04:55:58 PM) dachary: it makes me a little uncomfortable to
>>>>         define such an "ad-hoc" grouping
>>>>         (04:56:53 PM) dachary: i.e. you actuall control the
>>>>         aggregation by choosing which value to put in the id column
>>>>         (04:58:43 PM) dachary: s/actuall/actually/
>>>>         (05:05:38 PM) ***dachary reading
>>>>         http://www.ogf.org/documents/GFD.98.pdf
>>>>         (05:05:54 PM) dachary: I feel like we're trying to resolve a
>>>>         non problem here
>>>>         (05:08:42 PM) dachary: values need to be aggregated. The raw
>>>>         input is a full description of the resource and a value (
>>>>         gauge ). The question is how to control the aggregation in a
>>>>         reasonably flexible way.
>>>>         (05:11:34 PM) dachary: The definition of a counter could
>>>>         probably be described as : the id of a resource and code to
>>>>         fill each column associated with it.
>>>>
>>>>         I tried to append the following, but the wiki kept failing.
>>>>
>>>>         Propose that the counters are defined by a function instead
>>>>         of being fixed. That helps address the issue of
>>>>         aggregating the bandwidth associated with a given IP into a
>>>>         single counter.
>>>>
>>>>         Alternate idea :
>>>>          * a counter is defined by
>>>>           * a name ( o1, n2, etc. ) that uniquely identifies the
>>>>         nature of the measure ( outbound internet transit, amount of
>>>>         RAM, etc. )
>>>>           * the component in which it can be found ( nova, swift etc.)
>>>>          * and by columns, each one is set with the result of
>>>>         aggregate(find(record),record) where
>>>>           * find() looks for the existing column as found by
>>>>         selecting with the unique key ( maybe the name and the
>>>>         resource id )
>>>>           * record is a detailed description of the metering event to
>>>>         be aggregated (
>>>>         http://wiki.openstack.org/SystemUsageData#compute.instance.exists:
>>>>         )
>>>>           * the aggregate() function returns the updated row. By
>>>>         default it just += the counter value with the old row
>>>>         returned by find()
>>>>
>>>>
>>>>     Would we want aggregation to occur within the database where we
>>>>     are collecting events, or should that move somewhere else?
>>>     I assume the events collected by the metering agents will all be
>>>     archived for auditing (or re-building the database)
>>>     http://wiki.openstack.org/EfficientMetering?action=diff&rev2=45&rev1=44
>>>
>>>     Therefore the aggregation should occur when the database is
>>>     updated to account for a new event.
>>>
>>>     Does this make sense ? I may have misunderstood part of your question.
>>>
>>>
>>> I guess what I don't understand is why the aggregated data is written
>>> back to the metering database at all. If it's in the same database, it
>>> seems like it should be in a different "table" (or equivalent) so the
>>> original data is left alone.
>> In my view the events are not stored in a database, they are merely
>> appended to a log file. The database is built from the events with
>> aggregated data. I now understand that you (and Joshua Harlow) think
>> it's better to not aggregate the data and let the billing system do this
>> job.
> My intent when writing the blueprint was that each event would be
> recorded atomically in the database, as it is the only way to control
> that we have not missed any. Aggregation, should be done at the external
> API level if the request is to get the sum of a given counter.
>
> What I missed in the blueprint and seems to be appearing clearly now, is
> that an event need to be able to carry the "object-reference" for which
> it was collected, and this would seem highly necessary looking at the
> messages in this thread. A metering event would essentially be defined
> by (who, what, which) instead of a simple (who, what).  As a consequence
> we would need to extend the DB schema to add this [which/object
> reference], and make sure that we carry it as well when we will work on
> the message API format definition.
>
> How does this sound?
Hi,

I agree, and I think it makes the blueprint simpler while addressing the concerns expressed in this thread. The database will have to store many more events, so we will have to be careful to make sure it scales. I translated your suggestion into the blueprint:

http://wiki.openstack.org/EfficientMetering?action=diff&rev2=46&rev1=45

Feel free to fix the blueprint if I misrepresented it.
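
To make the (who, what, which) idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not the blueprint's schema: the names MeteringEvent, tenant_id, counter, resource_id, volume, total and per_resource are all hypothetical. Each event is stored as-is, and aggregation happens only when a query asks for it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class MeteringEvent:
    """One raw metering event (hypothetical field names)."""
    tenant_id: str    # "who": the tenant the usage is attributed to
    counter: str      # "what": the nature of the measure
    resource_id: str  # "which": the object the measure was collected for
    volume: float     # the measured amount carried by this event


def total(events: Iterable[MeteringEvent], tenant_id: str, counter: str,
          resource_id: Optional[str] = None) -> float:
    """Sum a counter for a tenant at query time.

    If resource_id is None, all resources of that kind are summed;
    otherwise only the events of that one resource are.
    """
    return sum(
        e.volume for e in events
        if e.tenant_id == tenant_id
        and e.counter == counter
        and (resource_id is None or e.resource_id == resource_id)
    )


def per_resource(events: Iterable[MeteringEvent], tenant_id: str,
                 counter: str) -> Dict[str, float]:
    """Detail view: one sum per resource, for a bill with line items."""
    sums: Dict[str, float] = defaultdict(float)
    for e in events:
        if e.tenant_id == tenant_id and e.counter == counter:
            sums[e.resource_id] += e.volume
    return dict(sums)


# Example data (made up): outbound transit measured per floating IP.
events = [
    MeteringEvent("tenant-a", "outbound_transit", "ip-1", 100.0),
    MeteringEvent("tenant-a", "outbound_transit", "ip-2", 50.0),
    MeteringEvent("tenant-a", "outbound_transit", "ip-1", 25.0),
    MeteringEvent("tenant-b", "outbound_transit", "ip-3", 10.0),
]
summary = total(events, "tenant-a", "outbound_transit")
detail = per_resource(events, "tenant-a", "outbound_transit")
```

Because the raw events are never overwritten, the same data can serve both the summary and the detail views that Doug described, at the cost of storing one row per event.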

Cheers
>
> Nick
>
>>> Maybe it's time to start focusing these discussions on user stories?
>>>
>> I agree. Would you like to go first ?
>>
>> Cheers
>>
>> -- 
>> Loïc Dachary         Chief Research Officer
>> // eNovance labs   http://labs.enovance.com
>> // ? loic@xxxxxxxxxxxx  ? +33 1 49 70 99 82
>>
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp


-- 
Loïc Dachary         Chief Research Officer
// eNovance labs   http://labs.enovance.com
// ? loic@xxxxxxxxxxxx  ? +33 1 49 70 99 82

