openstack team mailing list archive

Re: Instrumentation Monitoring Next Step - quick meet up

To: Sandy Walsh <sandy.walsh@xxxxxxxxxxxxx>, "'doug.hellmann@xxxxxxxxxxxxx'" <doug.hellmann@xxxxxxxxxxxxx>
From: Annie Cheng <anniec@xxxxxxxxxxxxx>
Date: Mon, 29 Oct 2012 19:09:59 -0700
Accept-language: en-US
Cc: "'openstack@xxxxxxxxxxxxxxxxxxx'" <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <60A3427EF882A54BA0A1971AE6EF03887708D5C7@ORD1EXD02.RACKSPACE.CORP>
Thread-index: Ac22Q68MQ+kCBHjnSW6JsxC1ABmYPA==
Thread-topic: [Openstack] Instrumentation Monitoring Next Step - quick meet up
User-agent: Microsoft-MacOutlook/14.14.0.111121
Meeting logs: http://wiki.openstack.org/InstrumentationMetricsMonitoring10292012
Also added as part of blueprint under 'meeting logs'
http://wiki.openstack.org/InstrumentationMetricsMonitoring

Thanks!

Annie

From: Sandy Walsh <sandy.walsh@xxxxxxxxxxxxx>
Date: Mon, 29 Oct 2012 16:05:08 -0700
To: Annie Cheng <anniec@xxxxxxxxxxxxx>, "'doug.hellmann@xxxxxxxxxxxxx'" <doug.hellmann@xxxxxxxxxxxxx>
Cc: "'openstack@xxxxxxxxxxxxxxxxxxx'" <openstack@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Openstack] Instrumentation Monitoring Next Step - quick meet up

Log

<harlowja> #startmeeting instrumentation
<amotoki> We can find the irc log at http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/.
<-- john5223 (john@nat/rackspace/x-xqgvxzmtkhhjcfce) has quit (Quit: Leaving)
<harlowja> bot no workie :(
<-- sasharatkovic (4281e024@gateway/web/freenode/ip.66.129.224.36) has quit (Quit: Page closed)
<-- edgarmagana (ad24c407@gateway/web/freenode/ip.173.36.196.7) has quit (Quit: Page closed)
<asalkeld> hi
<harlowja> howdy
<nijaba> o/
<eglynn> o/
<-- mlavalle (~miguel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) has left #openstack-meeting
<jeffreyb> sandywalsh: hey
<sandywalsh> o/ I made it :)
<jeffreyb> cool
<anniec> yay sandy!
<timjr> :)
<jeffreyb> waiting for angus et al
<-- pamor (ad24c407@gateway/web/freenode/ip.173.36.196.7) has quit (Quit: Page closed)
<asalkeld> I am here
<dhellmann> o/
<jeffreyb> did you guys have a chance to look at the diagram?
<jeffreyb> fire away :-)
<-- nati_ueno_ (~nati_ueno@2001:418:200:95:11d8:f78e:935e:426b) has quit (Remote host closed the connection)
<asalkeld> yip, good start
<dhellmann> jeffreyb: diagram? (I guess that's a "no")
<-- markmcclain (~Adium@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<mailto:~Adium@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>) has quit (Quit: Leaving.)
<eglynn> http://wiki.openstack.org/InstrumentationMetricsMonitoring?action=AttachFile&do=view&target=InstrumentationMonitoringSketch.png
<jeffreyb> on the spec from last week
<timjr> http://wiki.openstack.org/InstrumentationMetricsMonitoring?action=AttachFile&do=view&target=InstrumentationMonitoringSketch.png
<timjr> #link http://wiki.openstack.org/InstrumentationMetricsMonitoring?action=AttachFile&do=view&target=InstrumentationMonitoringSketch.png
<timjr> is that meetbot even here?
<-- zyluo (~zyluo@180.174.52.223) has quit (Quit: Leaving)
<-- salv-orlando (~salv-orla@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) has left #openstack-meeting
<asalkeld> registry ~= config?
<jeffreyb> yeah, kinda tough to walk thru on irc, but captures our thinking
<harlowja> sure, approx ==
<harlowja> similar to logging config for python
<harlowja> 'similar'
<jeffreyb> think of it as a metric root
<-- sachint_ (4834601d@gateway/web/freenode/ip.72.52.96.29) has quit (Quit: Page closed)
<eglynn> I guess I had a question on whether the 'metric driver' plus handlers kinda subsumes weher ceilometer sits today?
<eglynn> s/weher/where/
<-- jaypipes (~jpipes@xxxxxxxxxxxxxxxxxxx) has quit (Quit: Leaving)
<dhellmann> timjr: I think the previous meeting folks said the bot was down
<jeffreyb> not sure…thinking there are bits in common
<asalkeld> I think everything including ceilometer-agent should use the same lib
<jeffreyb> so could perhaps drive ceilometer too
<sandywalsh> hmm, diagram is a little confusing to me
<-- roampune (ad24c409@gateway/web/freenode/ip.173.36.196.9) has left #openstack-meeting
<jeffreyb> in what way?
<jeffreyb> you are welcome to enhance. i put the sources on the wiki.
<asalkeld> so registry.new_data(...)
<jeffreyb> the core part is having the measurement bits that can then be used to drive metrics flow to file|datagram|ceilometer?
<asalkeld> goes to find the handler
<asalkeld> yea
<asalkeld> so one question is the difference in required fields
--> mattray (~Opscode@pdpc/supporter/21for7/mrayzenoss) has joined #openstack-meeting
<asalkeld> so metering needs more info
<asalkeld> and trace potentially much less
<-- ijw11 (~ijw@nat/cisco/x-lhdnscxcyhfjkjyz) has left #openstack-meeting
<jeffreyb> well, we were thinking about having scoping rules for what's active as well as metric levels (like log levels: billing, monitor, profile)
<sandywalsh> not really clear on what the metric layer is meant to convey, how that works with the monkeypatching/decorator layer and what the handlers/drivers are (as compared to notifier drivers)
<harlowja>  /reload
<harlowja> oops
<harlowja> wrong place
<eglynn> or another approach would be for ceilometer to provide the different handlers/emitters/publishers, i.e. a common infrastructure for routing/transforming these data
<jeffreyb> the metric core could be used via m.patch or via decorators
<eglynn> (regardless of the source)
<sandywalsh> basically that whole middle block is confusing
<-- ejkern (63a2946d@gateway/web/freenode/ip.99.162.148.109) has quit (Ping timeout: 245 seconds)
<jeffreyb> sandy: what's the proposal from you?
<-- harlowja (~harlowja@nat/yahoo/x-npcxbpakmszfmdce) has quit (Quit: I'm popping this joint!)
<jeffreyb> others; is it also confusing to you?
--> harlowja (~harlowja@nat/yahoo/x-fwjzzopkarfcwbxt) has joined #openstack-meeting
<-- vbannai (~vinay@209.181.200.146) has quit (Quit: Ex-Chat)
<jeffreyb> i can perhaps try to create an annotated guide to the diagram
<timjr> I think the notion is that you can use the metrics layer by monkeypatching if you want, or with a decorator, or with explicit function calls
<dhellmann> it's not clear if the boxes inside nova-* are "logical" groupings or just where code lives at runtime
<sandywalsh> well, we're talking about two different things: events (for billing and monitoring) and instrumentation (for performance)
<jeffreyb> yes, so that is meant to convey a lib utilized in various daemons
<dhellmann> sandywalsh: they are two different things, but we're trying to explore whether we can share code for handling the data at different levels
<eglynn> sandywalsh: sure but still some commonality I suspect
<jeffreyb> i didn't have time to put in the interface boundaries
<dhellmann> for example, the "metric driver" box and all of the drivers could be shared with the ceilometer agent, so that both the agent and instrumented services could send data to the same places using the same code
<sandywalsh> well, the motivation for instrumentation is very different than the motivation for monitoring/billing
<jeffreyb> yeah, i think some are definitely common/shared
<timjr> we were thinking that you could view metrics as a superset of billing data... so we introduced the notion of levels --  meter, monitor, profile being like error, info, trace in logging
<dhellmann> or even send data to different places with the same code
<jeffreyb> there is quite an overlap between monitoring and instrumentation
<jeffreyb> however, you might do different things with the data in the end
<dhellmann> right
<harlowja> right
<asalkeld> agree
<eglynn> sandywalsh: true that, but some of the mechanics of getting these data from A to B still common, or?
<sandywalsh> instrumentation is a much higher sampling rate than monitoring
<timjr> the point in the code where you emit the data should have no clue about where it's going
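The logging analogy in this part of the discussion can be sketched in a few lines. This is only an illustration under assumptions, not the design from the wiki diagram: `MetricLevel`, `Registry`, and `ListHandler` are hypothetical names, and only `new_data` echoes the call asalkeld mentions. The emitting code states a name, a value, and a level; which handlers receive the sample is decided elsewhere.

```python
from enum import IntEnum


class MetricLevel(IntEnum):
    """Hypothetical levels, ordered like logging levels (trace < info < error)."""
    PROFILE = 10
    MONITOR = 20
    METER = 30


class ListHandler:
    """Toy handler that keeps samples in memory. A real handler might write
    to a file, a datagram socket, or ceilometer."""

    def __init__(self, min_level):
        self.min_level = min_level
        self.samples = []

    def handle(self, name, value, level):
        # Like a logging handler, ignore anything below the configured level.
        if level >= self.min_level:
            self.samples.append((name, value))


class Registry:
    """Hypothetical metric root: fans each sample out to every handler."""

    def __init__(self):
        self._handlers = []

    def add_handler(self, handler):
        self._handlers.append(handler)

    def new_data(self, name, value, level):
        # The caller has no idea where the sample ends up.
        for handler in self._handlers:
            handler.handle(name, value, level)


registry = Registry()
billing = ListHandler(MetricLevel.METER)      # only billing-grade samples
profiling = ListHandler(MetricLevel.PROFILE)  # everything
registry.add_handler(billing)
registry.add_handler(profiling)

registry.new_data("instance.create", 1, MetricLevel.METER)
registry.new_data("db.query.seconds", 0.012, MetricLevel.PROFILE)
```

With this wiring, `billing.samples` holds only the meter-level sample while `profiling.samples` holds both, which is the "turn on what you need" behaviour the participants compare to log levels.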
<nijaba> we did an excellent table of requirements per use cases at last summit, fwiw
<sandywalsh> and instrumentation can afford to drop some data
<jeffreyb> it is a much higher sampling rate
<harlowja> sandywalsh: isn't that more about what u do with the data, not how its produced?
<jeffreyb> but monitoring can afford to drop data too
<asalkeld> sandywalsh, agree but the notifier can do that
<dhellmann> sandywalsh: and that would be configured in the publisher
<eglynn> sandywalsh: that's a good point, on the droppability
<eglynn> sandywalsh: only metering really requires completeness
<sandywalsh> well, not really
<sandywalsh> monitoring needs it just as much
--> tgall_foo (~tgall@70.35.96.184) has joined #openstack-meeting
<harlowja> so thats making sure the publishing mechanism for 'billing' is using the NonDroppableHandler
<-- tgall_foo (~tgall@70.35.96.184) has quit (Changing host)
--> tgall_foo (~tgall@linaro/tgall-foo) has joined #openstack-meeting
<timjr> so, you would have a different handler for metric-level instrumentation, which uses, e.g., UDP, and can drop some
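The droppable versus non-droppable split can be illustrated without any real transport. In the sketch below a bounded in-memory buffer stands in for a lossy channel such as UDP; `NonDroppableHandler` borrows the name harlowja uses, but its behaviour here, like `DroppableHandler` and the sample names, is an assumption made for illustration.

```python
from collections import deque


class DroppableHandler:
    """Keeps only the most recent `capacity` samples; older ones are lost."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)
        self.dropped = 0

    def handle(self, sample):
        # A full deque with maxlen silently discards its oldest entry on
        # append, so count the loss before it happens.
        if len(self._buffer) == self._buffer.maxlen:
            self.dropped += 1
        self._buffer.append(sample)

    def pending(self):
        return list(self._buffer)


class NonDroppableHandler:
    """Keeps every sample until a consumer takes it (unbounded here)."""

    def __init__(self):
        self._buffer = []

    def handle(self, sample):
        self._buffer.append(sample)

    def pending(self):
        return list(self._buffer)


profile = DroppableHandler(capacity=3)
billing = NonDroppableHandler()
for i in range(5):
    profile.handle(("rpc.latency", i))
    billing.handle(("instance.hours", i))
```

After five samples the droppable handler retains the three newest and reports two drops, while the billing handler still holds all five. A production non-droppable path would also need durable storage and acknowledgement, which this sketch leaves out.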
<jeffreyb> instrumentation is typically going to be used post-event, so it isn't even clear to me that you want it to be propagated out to dashboards/tools in real-time
<sandywalsh> we need to have a valid picture of state ... (for orchestration, etc)
<eglynn> for monitoring, once the data gets old, value drops off rapidly
<jeffreyb> eglynn: very much so
<eglynn> (might as well drop it on the floor if queues backed up etc.)
<jeffreyb> you don't want to lose monitoring data for sure, but if you have to you do
<timjr> uhh, no
<timjr> you want it most when your queues are backing up
<timjr> it's for debugging that kind of thing
<sandywalsh> it depends ... monitoring has equal importance to billing, we need all the events to get a complete picture
<eglynn> you want the most recent
<jeffreyb> yes, but you'd rather deliver high priority c&c messages
<eglynn> the hour-old stuff is already old news
<sandywalsh> instrumentation is trending data
<eglynn> (can maybe sample that)
<timjr> sandywalsh: could you elaborate on that?  trending?
<-> sachint__ is now known as sthakkar
<eglynn> in the twitter sense?
<eglynn> (of trending ...)
<sandywalsh> instrumentation is things like: number of calls / second, average execution time, etc
<timjr> sure
<timjr> so, if the call latency to rabbitmq shoots up, then we know where to look for trouble
<sandywalsh> monitoring / billing requires a sole consumer and guarantee of hand off
<timjr> definitely
<timjr> but they can use the same emission hooks in the code
<jeffreyb> delivery sla is separate from measuring
<timjr> they just need different handlers (in the logging analogy)
<asalkeld> yea
<-> sthakkar is now known as sachinthakkar
<eglynn> billing yes, but if you have to choose between more and less recent monitoring data, then the older stuff gets sampled or dropped
<sandywalsh> timjr: possibly, my concern is that instrumentation code could be anywhere in the code (depending on the developer)
<eglynn> (in extremis ...)
<timjr> sure
-*- dhellmann thinks we're quibbling over implementation details a little early
<harlowja> :)
<jeffreyb> sandy: yes, that's what we hope for — instrumentation everywhere
<timjr> so, if the level is "meter" (or "billing", if you will), you gotta be careful where it is
<timjr> cuz somebody could end up paying if you move it!
<sandywalsh> yes, monitoring / billing require clear anchor points
<anniec> very similar to logging concept .. where there are different levels
<timjr> it's important for that kind of call to be explicit in the code, IMHO
<anniec> you turn on what you need
<sandywalsh> hmm
<sandywalsh> not sure about that point anniec
<jeffreyb> so what are the q's we are trying to answer? 1) is the source of measurement the same for billing|monitoring|instrumentation
<jeffreyb> 2) the how?
<dhellmann> are *any* sources of measurement the same?
<jeffreyb> i don't think we are agreed on #1
<sandywalsh> jeffreyb: I'm trying to get on the same page for terminology / requirements so we can talk about implementation
<timjr> dhellmann: sure.  I'm more on the fence over whether logging and metrics should be the same.  Signs point to "no"
<dhellmann> and if the sources aren't the same, is there any benefit in sharing the delivery code?
<-- sachinthakkar (4834601d@gateway/web/freenode/ip.72.52.96.29) has quit (Quit: Page closed)
<jeffreyb> doug: seems like there is definitely overlap with monitoring for some billing stuff
<sandywalsh> the diagram hints heavily at implementation
<eglynn> 1) the source can be different, but there can be common infrastructure for "publication"
<sandywalsh> I think billing and monitoring are the same
--> sthakkar (4834601d@gateway/web/freenode/ip.72.52.96.29) has joined #openstack-meeting
<sandywalsh> I think instrumentation is a different animal
<jeffreyb> sandywalsh: you are free to diagram yourself!
<timjr> well, we are threatening to implement it...
<harlowja> lol
<eglynn> sandywalsh: disagree, timeliness versus completeness
<sandywalsh> jeffreyb: I will
<jeffreyb> the point here is we should all present some views and see where we can agree or agree to disagree
<jeffreyb> sandywalsh: great
<asalkeld> well I think if the code is in one spot it is easier for devs to see how to do either
<sandywalsh> (reading the scroll back, hard to keep up)
<jeffreyb> sandywalsh: i put the vdx and graffle in the wiki if you want to re-use any of those bits
<sandywalsh> jeffreyb: that's ok, I'll do a wiki page, but thanks
<nijaba> jeffreyb: link?
<jeffreyb> nijaba: http://wiki.openstack.org/InstrumentationMetricsMonitoring
<sandywalsh> as I've stated before, I've got concerns about putting instrumentation hooks in trunk
<sandywalsh> (permanent)
<sandywalsh> since the needs are so diverse
<jeffreyb> sandywalsh: what if we can make it non-intrusive/cheap?
<harlowja> just a perspective, facebook, yahoo, google code, instrumentation is in trunk
<jeffreyb> sandywalsh: is your concern performance or code cleanliness?
<timjr> sandywalsh: there's no harm in extra instrumentation -- you just don't configure a handler if you don't want it
<sandywalsh> jeffreyb: I'm all ears, the tach approach I think is best
<timjr> it's like debug-level logging
<sandywalsh> timjr: not really true, there are costs
<timjr> you just turn it off if you're ignoring it
<asalkeld> sandywalsh, not many other companies tolerate monkey patching
<timjr> function calls are not free, it's true
<sandywalsh> driver / library/ loaders / config
<dhellmann> timjr: even less expensive, since the decorator can evaluate the "on/off" flag once at startup
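dhellmann's point about evaluating the flag once can be sketched as follows. This is a hypothetical decorator, not code from nova or ceilometer: `timed`, its `enabled` argument, and the `sink` list are assumptions. When instrumentation is off, the decorator hands back the original function object, so there is no wrapper call at run time at all.

```python
import functools
import time


def timed(enabled, sink):
    """Return a decorator that records call durations into `sink`.

    `enabled` is read once, when the function is decorated (typically at
    import or startup), not on every call.
    """
    def decorator(func):
        if not enabled:
            # Disabled: return the function untouched, no per-call cost.
            return func

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exception; the exception itself
                # is not caught and propagates unchanged.
                sink.append((func.__name__, time.perf_counter() - start))
        return wrapper
    return decorator


samples = []


def plain(x):
    return x * 2


untouched = timed(False, samples)(plain)  # same object as `plain`
measured = timed(True, samples)(plain)    # wrapped version
result = measured(21)
```

Using `try`/`finally` instead of `except` keeps the wrapper out of the exception path, which is one way to limit the decorator and exception interaction raised later in the discussion.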
<jeffreyb> sandywalsh: but is tach for sort of one-off profiling or continuous use?
<sandywalsh> jeffreyb: both, we use it permanently and for one-offs
--> metral_ (~metral@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) has joined #openstack-meeting
<harlowja> dhellmann: really, nice
<jeffreyb> sandywalsh: so would you plan to monkey patch more?
<sandywalsh> jeffreyb: for instrumentation, yes
<sandywalsh> the decorators in trunk are already causing problems
<timjr> there's a place for that -- you can be much more pervasive if you monkey patch stuff
<timjr> nobody would want instrumentation calls on every second line of code
<jeffreyb> sandywalsh: what about real-time monitoring of things like resources/pools/etc? is that instrumentation?
<dhellmann> having monkeypatching as an option will provide good flexibility, but I don't think that should preclude wiring in instrumentation
<sandywalsh> the big issue with decorators is how they interact with exceptions
<timjr> but I would not want important events of measurements to be implicit
--> markwash (~markw@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) has joined #openstack-meeting
<timjr> events /or/ measurements, even
<sandywalsh> events I totally agree should be hard coded
<harlowja> sandywalsh: agreed, the decorators right now are sorta hacky orchestration, hacky eventlet exception catching...
<nijaba> timjr: +1
<dhellmann> sandywalsh: what does a decorator do that tach doesn't do?
<jeffreyb> dhellmann: i was thinking the base metrics/measurement thingies could be used in both ways: decorators and monkey patching — then transmission conduit is shared
<sandywalsh> harlowja: +1
<dhellmann> jeffreyb: agreed
<sandywalsh> dhellmann: tach doesn't affect trunk and it's always the outer wrapper
<dhellmann> sandywalsh: does being the outer wrapper help with exceptions?
<jeffreyb> sandywalsh: but won't a bunch of us end up patching outside of trunk for the same sorts of needs?
<sandywalsh> so, let's consider a developer that wants to instrument a part of nova (let's say networking)
<-- metral (~metral@50.57.17.244) has quit (Read error: Operation timed out)
<-> metral_ is now known as metral
<sandywalsh> they either have to create a disposable branch or submit to trunk
<jeffreyb> seems like over time we will keep wanting to go deeper and deeper on what we measure all the time so that we can more easily characterize system behavior
<sandywalsh> neither of which is really attractive
<jeffreyb> and analyze run time faults
<sandywalsh> jeffreyb: yes
<harlowja> why is submission to trunk bad :-/
<eglynn> would such a patch be carried long-term?
<sandywalsh> for instrumentation it would mean putting hooks everywhere
<eglynn> (or just as long as it takes to track down the bottleneck)
<sandywalsh> for monitoring/billing it's fine
<jeffreyb> i would expect the instrumentation would grow over the long term in trunk
<timjr> we should have performance-level instrumentation on trunk, if possible, because we should have a CI gate that looks for performance regressions, eventually
<sandywalsh> jeffreyb: that's the problem
<asalkeld> sandywalsh, where it adds generic value
<eglynn> i.e. is much ultra-fine-grained profiling inherently disposable?
<sandywalsh> timjr: disagree
-*- timjr raises an eyebrow
<jeffreyb> sandywalsh: i see that as a good thing
<timjr> sandywalsh: I don't see your POV, I guess... could you elaborate?
<jeffreyb> the challenge for us is to make it so that the impact is negligible at run-time
<jeffreyb> and so folks can switch it off who don't want or need it
<sandywalsh> we can still gate CI efforts via monitoring events or MP'ed installs
<jeffreyb> the monkey patch approach is fine for some situations but that would get a bit hard across large sets of fns
<timjr> sandywalsh: granted, that would work as well
<sandywalsh> and each party can instrument what is important to them, not what "is decreed" to be the important spots
<timjr> but, as developer, I want to know which events my code changes might affect
<timjr> so it is better if they are present on the code I'm changing
<jeffreyb> sandywalsh: well, that's where we were wanting to put in some scoping controls and level controls
<asalkeld> well how about we support in-code trace and monkey patching
<timjr> we definitely are going to want to decree some important spots :)
<sandywalsh> jeffreyb: it all sounds terribly heavy weight to me for something that is so transient
<eglynn> so can we distinguish between instrumentation that has a long-term use (coarse-grained timings, fault counts etc.) and tactical stuff that is only of interest to solve a particular problem?
<asalkeld> and can let ptl's decide where they want the trace
<jeffreyb> we think this level of data is gold
<dhellmann> yes, I don't think we're going to get everyone to agree to one or the other for MP, so this feels like a rabbit-warren of a discussion
<anniec> asalkeld: +1
<sandywalsh> there is some low-hanging instrumentation fruit, like we do today at the rpc layer
<asalkeld> the question: should we make such a lib
<dhellmann> if we agree that there is *something* instrumenting for monitoring, what can we do with the results and how much can we share them?
<timjr> rabbit-warren?  are we... breeding?
<asalkeld> and what should it look like
<sandywalsh> but simply because they exist I still don't believe that code belongs in trunk
<jeffreyb> asalkeld: i think the answer is yes
<dhellmann> timjr: we're lost in the dark? :-)
<timjr> oh
<timjr> and fuzzy :)
<harlowja> and cuddly
<sandywalsh> dhellmann: yes, once we grab the data, we can use common code to process it
-*- timjr scoots a little further from harlowja
-*- harlowja scoots closer
<sandywalsh> I'm arguing for this low-level distinction
<dhellmann> sandywalsh: cool. let's talk about that, then
<timjr> so, let's take a concrete example
<jeffreyb> i am guessing there are particular tastes in processing that may not be common
<timjr> if I want to know database latency, for example
<jeffreyb> e.g. we might collect loads of data and give it to a researcher on a hadoop grid to dream up interesting things
<timjr> is there any reason not to put a line of code that says "this is the extent that defines database latency"?
<sandywalsh> timjr: there are many ways to skin that cat without having to affect trunk
<timjr> if that information was in a config file or a patch, it would be less robust in the face of later code changes
<sandywalsh> so, for example, consider how cells work
<timjr> isn't that a ... federation layer?
<sandywalsh> cells have a new derivation for compute.api that redirects calls to other (child) cells
<timjr> I'm not sure there's good consensus on the value of federation vs. monolithic scaling
<sandywalsh> in the normal situation, it all works as a single cell, but by changing the --foo_driver flag, it works with cells
<sandywalsh> we can do the same for --db_api and other larger subsystems
<sandywalsh> this doesn't need to be a permanent part of trunk
<jeffreyb> just seems like not everything can or should be measured/gated by 1) patching, 2) queue rpc
<sandywalsh> but instead can be an "extra" part of ceilometer
<timjr> it doesn't need to be, but I'd certainly prefer to see a pretty good complement of metrics coming from trunk
<jeffreyb> if instrumentation data is lower priority, then it shouldn't go through the same channels of deliver as billing and control messages
<harlowja> a library like others in java that might be a good talking point as well, https://github.com/johnewart/ruby-metrics, the concepts there seem useful to others in other languages, as a library u could use it in your monkey patchers, u could use it in ceilometer, and so on, something like that for openstack/python would seem like the right way to go, as to how much gets into trunk, or how much doesn't, that can be up to the code
<harlowja> reviewers and others, dhellmann should ceilometer be the point of that library?
<sandywalsh> timjr: you can still do that by having a config file that hits the major / agreed-upon points
<sandywalsh> timjr: it doesn't need to clutter up trunk
<jeffreyb> harlowja: rules violation - too much text
<harlowja> :)
<timjr> sandywalsh: but how would you connect the config file to the code?
<sandywalsh> timjr: tach, today, has all the hooks for nova rpc and some other areas
<sandywalsh> (for instrumentation, not monitoring/billing)
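One way a config entry could be connected to code without touching trunk is to name the target by dotted path and wrap it at load time. The sketch below is a generic monkey-patching illustration and makes no claim about how tach is actually implemented; `patch`, `sink`, and the choice of `json.dumps` as a stand-in target are all assumptions.

```python
import functools
import importlib
import json
import time


def patch(dotted_path, sink):
    """Replace `module.attr` with a timing wrapper; return the original."""
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    original = getattr(module, attr)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            sink.append((dotted_path, time.perf_counter() - start))

    # The wrapper becomes the outermost layer around the original callable.
    setattr(module, attr, wrapper)
    return original


calls = []
# In practice the path would come from a config file rather than a literal.
original_dumps = patch("json.dumps", calls)
encoded = json.dumps({"a": 1})
json.dumps = original_dumps  # undo the patch
```

A limitation worth noting: code that bound the name earlier, for example with `from json import dumps`, keeps calling the unpatched function. That fragility is close to timjr's concern that a config-driven hook is less robust to later code changes than an explicit call.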
<harlowja> sandywalsh: how about we separate out the cluttering up trunk with doing it or not, that seems to be the code reviewers accepting it or not, but is the common concept that something in ceilometer or a library should be created to aid in whatever the final result is?
<dhellmann> harlowja: ceilometer was accepted as an incubated project for measuring things in an openstack cloud. I think that means yes, the lib should be part of the project. That's not to say where the code actually lives long term.
<jeffreyb> sandywalsh: what about monitoring something like measure eventlet resources e.g. ?
<sandywalsh> dhellmann: I don't really care where it lives (ceilometer or another project), but it doesn't have to live in nova trunk
<jeffreyb> dhellman: do the things i had put on the spec/etherpad fit with the types of things you had in mind related to ceilometer?
<harlowja> sure
<dhellmann> harlowja: maybe some of it goes into oslo, or we release it as a stand-alone lib ourselves but managed by the ceilometer project
<harlowja> ya, that might be useful
<-- tgall_foo (~tgall@linaro/tgall-foo) has quit (Quit: This computer has gone to sleep)
<sandywalsh> jeffreyb: still works fine
<dhellmann> jeffreyb: could you post that link again? I looked at it, but it's been a few days
<jeffreyb> sandywalsh: so would that be put into irc?
<jeffreyb> http://wiki.openstack.org/InstrumentationMetricsMonitoring — see measurements section
<jeffreyb> er, sandy, i meant rpc
<jeffreyb> sorry
<sandywalsh> :)
<dhellmann> jeffreyb: oh, I thought there was an etherpad
<sandywalsh> I was wondering :)
<jeffreyb> dhellman: i copied most of the bits out of the etherpad to there
<sandywalsh> jeffreyb: it doesn't have to, but my inflight project was using that approach
<dhellmann> jeffreyb: sneaky
<sandywalsh> jeffreyb: I'm always up for suggestions on better ways
<jeffreyb> sandywalsh: aren't you worried about the transport implications?
<sandywalsh> jeffreyb: it was using the same eventlet backdoor to count greenthreads
<sandywalsh> jeffreyb: that's the whole point, to get a realistic measurement
<jeffreyb> ah yes, you mentioned that but i didn't have a chance to look at it
<dhellmann> jeffreyb: I hadn't thought of a comprehensive list. The design goal I've always taken with ceilometer was make it extensible so we don't have to think of everything to measure ourselves.
<sandywalsh> and it's low bandwidth/frequency
<dhellmann> this looks like a good list of things to be measuring
<sandywalsh> so ... care for another topic?
--> henrynash (~henrynash@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) has joined #openstack-meeting
<harlowja> a library in or out of ceilometer, perhaps like https://github.com/johnewart/ruby-metrics (or the design we put up which is similar), then that gets used by ceilometer code in nova/elsewhere (thus unifying that into using the library created), then if that library gets too big, or has different stuff that is conflicting with ceilometer, it gets split off into some 'metrics' library that can either be used in monkey-patching, or ca
<harlowja> accepted by code reviewers into nova code (or other code) as the project reviewers say yes/no to annotations/... metrics code additions...
<sandywalsh> the workers that consume from the queue?
<-- zhuadl (~chatzilla@114.246.88.37) has quit (Ping timeout: 252 seconds)
<jeffreyb> dhellman: when there is something to be measured in process, was the idea to send it to ceilometer agent?
<jeffreyb> dhellman: is there a core set of measurement objects?
<dhellmann> jeffreyb: not necessarily, unless it's something you want to bill for (API calls?)
<sandywalsh> harlowja: I think that sounds reasonable
<jeffreyb> dhellman: i see. definitely most of those are not of that nature.
<sandywalsh> harlowja: (except for the decorators for instrumentation :)
<harlowja> sure, sure, impl detail
<sandywalsh> :)
<dhellmann> jeffreyb: the Counter class is probably the closest we come, but we don't have a separate class representing the measurement of each meter. We do have a class that *produces* the measurement, but they are all represented by a common object at this point.
<dhellmann> jeffreyb: yes, that's clear
<dhellmann> jeffreyb: and it makes sense for that to be the case
<sandywalsh> so ... the ceilometer worker ... it seems like overkill ... can we go with something lighter weight like the StackTach worker?
<-- sthakkar (4834601d@gateway/web/freenode/ip.72.52.96.29) has quit (Quit: Page closed)
<sandywalsh> and it's using the nova rpc code in the wrong way (imho)
<sandywalsh> since events aren't rpc events
<jeffreyb> dhellman: so could the counter stuff be folded in with the extra metrics gauges?
<dhellmann> sandywalsh: the rpc issues are well documented
<sandywalsh>  s/rpc methods
<dhellmann> sandywalsh: StackTach uses YAGI, right?
<sandywalsh> dhellmann: no, it has its own lightweight worker
<-- mattray (~Opscode@pdpc/supporter/21for7/mrayzenoss) has quit (Quit: Leaving.)
<sandywalsh> I'd like to see a YAGI-like thing being the common layer though
<dhellmann> sandywalsh: ah, ok
<dhellmann> the issue with YAGI is it doesn't (AFAICT) address duplicate events, which we *definitely* don't want for billing
<harlowja> impl detail, the ceilometer worker is a metric/billing 'sink' right, not the only metric/billing 'sink' i would hope
<dhellmann> harlowja: yes "a" not "the"
<sandywalsh> dhellmann: we're using YAGI for billing today, no problems
<dhellmann> sandywalsh: what happens if a worker dies while processing some events? how do you avoid reprocessing them when it restarts?
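A common general answer to the redelivery question is to make the consumer idempotent by remembering a unique id per event. The sketch below shows only that idea and does not describe what YAGI, StackTach, or ceilometer do; `DedupingWorker`, the `message_id` field name, and the sample event are assumptions for illustration.

```python
class DedupingWorker:
    """Processes each event at most once, keyed on a unique message id."""

    def __init__(self, process):
        self._process = process
        # In-memory only. To survive a worker restart this set would have
        # to live in durable storage, updated together with the side effect.
        self._seen = set()

    def consume(self, event):
        message_id = event["message_id"]
        if message_id in self._seen:
            return False  # duplicate delivery, skip it
        self._process(event)
        # Mark as seen only after processing succeeds, so a crash during
        # processing leads to a retry rather than a lost event.
        self._seen.add(message_id)
        return True


billed = []
worker = DedupingWorker(billed.append)
event = {"message_id": "abc-1", "event_type": "compute.instance.exists"}
first = worker.consume(event)
second = worker.consume(event)  # simulated redelivery after a failure
```

Here the second delivery is ignored and `billed` contains the event once. The ordering of "process, then record" trades possible reprocessing for no loss, so the processing step itself should also be safe to repeat.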
<timjr> I think our perspective is we don't care about the transport as long as it's pluggable
<jeffreyb> timjr: yes
<dhellmann> timjr: +1
<sandywalsh> harlowja: well, that's the tricky part. Honestly I think the rabbit_queue_list flag is bad ... these queues are huge and the events big.
<sandywalsh> we want to publish notifications less not more
<nijaba> we need a reliable transport
<harlowja> some need :)
<sandywalsh> nijaba: certainly
<jeffreyb> uh, reliable transport for some, but not for all uses
<nijaba> harlowja: exactly
<sandywalsh> for monitoring/billing it has to be reliable
<timjr> I would say billing wants a reliable transport, and fine-granularity performance info can get by with UDP
<sandywalsh> for instrumentation ... meh
<nijaba> yes, the "we" was about the metering bit
<sandywalsh> agreed
<nijaba> so transport has to be pluggable, period....
<sandywalsh> so, I think it would be nice to see a lean-mean worker as a common piece of code
<timjr> nod
<harlowja> lean-mean, nice
<nijaba> sandywalsh: certainly the point of this
<jeffreyb> sandywalsh: are you talking about for dequeuing of messages or sending or both?
<sandywalsh> we've gone through a million variations on this over the last few months. Oddly enough there are only a few combinations that work reliably at scale
<anniec> hi all, we have 10 min left for the meeting.. do people feel we have good understanding of direction we want to go to?  or should we call a G+ Hangout for next meeting to explain more?
<sandywalsh> jeffreyb: I'm mostly concerned about consuming the events from rabbit and "doing something with them"
<jeffreyb> anniec: seriously?
<sandywalsh> haha :)
<harlowja> sandywalsh: what variations have u hit, just out of curiosity, MQ stuff? others?
<timjr> um, well, I'm going to try to come up with some concrete examples of the client code and config
<anniec> originally, Angus wanted the meeting to see who can do what ..
<timjr> so we can hash that out in the next meeting or whatever
<jeffreyb> sandywalsh: i feel it is a bad idea to put instrumentation events in rabbit
<sandywalsh> carrot, kombu, various amqp libraries under the hood. Frequent memory/locking issues
<harlowja> sandywalsh: thats just a different selection of where to write and where the agents read from, no?
<anniec> so just want to bring back to the original intent
<harlowja> sandywalsh: don't use amqp?
<harlowja> :)
<sandywalsh> jeffreyb: instrumentation should *not* go in rabbit
<anniec> if there are still open questions that blocks us from moving forward, we should find a way to move forward
<timjr> I think we got consensus on pluggable transport
<sandywalsh> I'll work on a proposal wiki page if that would help?
<asalkeld> sure
<timjr> there is some dissent around explicit code vs. monkeypatching
<sandywalsh> you all can bend/spindle/mutilate as desired :)
<timjr> I will not that the two are not mutually exclusive
<timjr> note, even
<harlowja> let it be though, explicit code vs monkeypatching, whatever the reviewers that accept the code prefer right?
<asalkeld> I think we are more in the agreement than not
<sandywalsh> timjr: that's my big bugaboo, but again, it depends on the purpose of it
<harlowja> ya for agreement, high five
<timjr> I don't think we would rule out monkeypatching
<jeffreyb> timjr: no, or decorators either
<timjr> it's a nice escape hatch for monitoring that people don't want to clean up and contribute back for whatever reason
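To illustrate that explicit instrumentation, decorators, and monkeypatching are not mutually exclusive, here is a small sketch in which one timing wrapper serves both styles. Everything here (`METRICS`, `timed`, `pick_host`, `ImageService`) is hypothetical and invented for the example; it is not code from nova or any project discussed above.

```python
import functools
import time

# Hypothetical in-process sink; a real setup would hand these to a transport.
METRICS = []


def timed(name):
    """Return a decorator that records how long each call takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the wrapped call raises.
                METRICS.append((name, time.monotonic() - start))
        return wrapper
    return decorator


# Explicit style: the instrumentation is visible in the source.
@timed("scheduler.pick_host")
def pick_host(hosts):
    return sorted(hosts)[0]


class ImageService:
    """Pretend third-party class whose source we do not want to edit."""

    def fetch(self, image_id):
        return "image-%s" % image_id


# Monkeypatching style: wrap the method at startup, source untouched.
ImageService.fetch = timed("image.fetch")(ImageService.fetch)
```

The explicit form is reviewable alongside the code it measures, while the patched form is the "escape hatch" for monitoring that is not going to be contributed back.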
<sandywalsh> harlowja: I wish it was that easy ... oftentimes code gets approved without the right consideration
<anniec> ok .. so tim is going to come up with example of client code and config for next discussion
<anniec> who else is doing what?
<harlowja> sandywalsh: agreed, lol
<anniec> sorry .. i am a manager type .. at the end of meeting, needs to have a Action Item list :P
<sandywalsh> I would suggest you look at the StackTach worker
<sandywalsh> I just updated the library on Friday
<jeffreyb> anniec: banned
<harlowja> anniec: nice try, haha
<sandywalsh> and Stacky is up
<timjr> sandywalsh: I'll be sure to give stacktach a good read before the next meeting
<timjr> I regret not having done so prior to this one
<sandywalsh> and I have a video of both that I'm waiting on approach to sent the ML
<sandywalsh> an install guide and howto
<harlowja> video, lol
<harlowja> niceeee
<sandywalsh> s/approach/approval/
<harlowja> hot stacktach action
<timjr> lol
<sandywalsh> yep ... funky screencast
<asalkeld> can we find a place to put library code?
<harlowja> ceilometer subdir
<harlowja> ?
<eglynn> ceilo?
<nijaba> sure
<sandywalsh> stacktach/stacky is living in github/rackspace currently
<-- dolphm (~dolphm@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) has quit (Remote host closed the connection)
<asalkeld> mmm
<harlowja> mmmm == good :-p
<asalkeld> so how does say nova depend on this
<harlowja> that is the question :)
<asalkeld> wouldn't we need something like python-ceilometer
<harlowja> but maybe too early to decide that (when that happens rip out as library?)
<timjr> the logging client lib can probably be generic enough to be a pip for all to use
<timjr> the calls into it will have to be added to nova's source code
<dhellmann> maybe we need a new repo managed by the ceilometer project but allowing us to package the library separately for consumption by other projects
<timjr> (along with the other components)
<harlowja> new stackforge 'metric' project?
<sandywalsh> well, I think that's the next bun-fight ... how to package all these pieces for menu-like selection of components
<jeffreyb> let's mock up some code for sharing, maybe location isn't quite so important just now?
<nijaba> dhellmann: +1
<asalkeld> dhellmann, sounds good
<sandywalsh> dhellmann: yep
<harlowja> jeffreyb: +1
<jeffreyb> dhellmann: sounds good
<eglynn> sounds like agreement
<asalkeld> woot
<sandywalsh> "I want the ceilometer worker + stacktach + the RAX billing module"
<dhellmann> jeffreyb: indeed, if we create a github repo with the library we can always move it under openstack later
<sandywalsh> how to get that ^
<harlowja> magic :)
<sandywalsh> heh
<harlowja> dhellmann: so a new stackforge repo?
<timjr> I think anvil could easily be configured to pull those components
<jeffreyb> who else is using stacktach so far or had a chance to explore it?
<harlowja> timjr: ya anvil
<harlowja> woot
<harlowja> :-p
<sandywalsh> let's agree on the cut-points first before we start slicing/dicing the project I think
<sandywalsh> jeffreyb: we've been using it internally ... i only just made v2 public
<dhellmann> harlowja: we'll be moving ceilometer off of stackforge to "openstack" soon
<sandywalsh> jeffreyb: but I know there's a bunch of folks that were using the old v1
<harlowja> kk, so a folder in there? but thats not a sep repo
<timjr> ok, well, that was fun :)  my first openstack meeting, actually
<sandywalsh> :)
<timjr> next time, I hope the bot's around
<dhellmann> harlowja: we said we would just create a new github repo for now. it's easy to move it later.
<nijaba> dhellmann: just learned that this would have to wait for a planned maintenance period: gerrit needs to be restarted
<harlowja> wfm
<harlowja> restart the gerrits
<harlowja> dhellmann: github repo on stackforge or elsewhere?
<sandywalsh> someone want to copy-paste this session?
<dhellmann> nijaba: do you have a schedule for that?
<nijaba> dhellmann: not yet.  "soon"
<dhellmann> nijaba: ok
<dhellmann> harlowja: on something that one of us can control without talking to the infra guys
<harlowja> k
<harlowja> harlowja: i can make one in the yahoo org
<-- gyee (~gyee@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<mailto:~gyee@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>) has quit (Quit: Leaving)
<sandywalsh> someone want to copy-paste this session? So we can put on the wiki?
<timjr> nothing around to log it, eh?
<anniec> i can copy and paste and email it
<anniec> i can't find the log today
<dhellmann> anniec: maybe put the file online somewhere and email the link?
<jeffreyb> attach to the wiki
<sandywalsh> #stop-meeting
<sandywalsh> #stopmeeting
<sandywalsh> #startmeeting
<dhellmann> sandywalsh: the bot is broken
<sandywalsh> yep
<anniec> wait .. i don't have full log
<sandywalsh> sure is
<anniec> i only have it up to 3:24
<anniec> my buffer is not big enough for you talkies :P
<anniec> jeffreyb is trying to copy .. but it may crash