← Back to team overview

openstack team mailing list archive

[metering] resource metadata changes and billing

 

tl;dr: Ceilometer should ignore resource metadata when computing sums or
maximum values for counters through the API.

One of the things we discussed early during the design meetings was the
need to track metadata along with resources so providers could use the
metadata to determine the rate to charge for using the resource (for
example, flavor or availability group of an instance). While working on the
mongodb driver, I've been thinking about how that requirement changes what
the API we defined needs to do. At first I thought we would need to try to
return multiple values from the sum and max calls so the values could be
associated with the resource metadata and a given time range. I've decided
that implementing that inside the ceilometer API will be very complex, and
unlikely to produce the correct result. I want to work through my reasoning
here in case someone else can find a fault in it or propose a solution I
wasn't able to find.

First, the scenario: A user boots an instance, lets it run for some time
(period 1), then changes the metadata in some way that does not result in
the instance being recreated but does result in something the provider
would decide introduces a different charge structure. For example, the
amount of RAM allocated to the instance might be increased. After running
with the new settings for a period of time (period 2), the user changes
them back to their original value and the instance continues to run (period
3).

The specific change to the metadata doesn't matter (RAM is just an
example), except that the metadata change should not require an instance to
be recreated because when that happens the user actually gets a new
instance (at least I believe they do based on feedback during an early
meeting, please correct me if I'm wrong). Getting a new instance simplifies
things immensely, since the new instance is a completely new resource and
so we can ignore those cases for the rest of this discussion.

Another important point in the way the scenario is constructed is that the
metadata values go from value A to B and then back to their original value
A. That means any signature we calculate for the metadata will be the same
during periods 1 and 3.

Now we would like for v1/USERS/<USER_ID>/RESOURCES/<instance_id>/cpu/VOLUME
to return the amount of CPU time used by the instance. However, if that
time is billed at a different rate depending on the "size" of the server
then the RAM change will cause a difference in billing rates. We therefore
need to return a sequence of 3-tuples containing the total for the counter,
the resource metadata, and the time range during which the metadata was in
effect. There are two problems implementing this in a generic way in the
API server.

First, it turns out to be surprisingly difficult to write an efficient
query to compute that time range in the case described because it is not
easy to recognize the ranges for period 1 and period 3 as being interrupted
by period 2 (finding min or max for a value is easy, but finding the
endpoint of period 1 is not because the signature for the metadata is the
same for period 1 and 3).

Second, while calculating ranges is difficult in itself, what is even more
difficult is recognizing *important* changes in the metadata that actually
imply a change in the billing rate. The logic for that is up to the
deployer and their billing rules.

There are ways to compute the ranges by using multiple queries [1], and we
could create some sort of way for the query to specify which fields in the
metadata for a given type of resource are important. Both calculations
would be expensive to apply in the API server, though, and I think they can
be solved more efficiently on the client-side. If the client grabs all (or
a large portion) of the data and processes it sequentially, it is easy to
test the metadata fields to find changes and treat that condition as the
boundary between the time ranges.

My conclusion from all of this (over-)thinking is that the ceilometer API
should assume the simple case and ignore the metadata changes when
computing the sum or maximum value for a counter over a range of time. More
complex processing will be left up to the caller, who can ask for raw
metering data in manageable chunks and process them outside of the API. I
could be persuaded to do something more complicated if the problems
described above can be solved in a relatively simple way, but even then I
think we should push that to the v2 API.

Thoughts?

There-and-back-again-ly,
Doug


[1] Find the min time for a (resource, metadata) pair; find min time for
any different (resource, metadata) pair greater than the first time; find
max for original pair with timestamp less than second min; use the max as
starting point to determine the next range; loop.

Follow ups