graphite-dev team mailing list archive

[Question #187466]: Upgrade from 0.9.8 to 0.9.9

 

New question #187466 on Graphite:
https://answers.launchpad.net/graphite/+question/187466

I am having a number of problems trying to upgrade from 0.9.8 to 0.9.9.

I am using diamond to send my metrics to graphite, and that part of the setup hasn't changed. Metrics were flowing correctly prior to the upgrade.

I have it configured like so...
1. Each diamond server sends to a central relay in each datacenter.
2. The relay sends the metrics along to a cache server in the main datacenter.

When I start the relay I can see diamond connecting (pickle receiver port 2014):

Starting carbon-relay (instance a)
11/02/2012 00:39:50 :: [console] Log opened.
11/02/2012 00:39:50 :: [console] twistd 11.0.0 (/usr/local/rnt/bin/python 2.6.4) starting up.
11/02/2012 00:39:50 :: [console] reactor class: twisted.internet.epollreactor.EPollReactor.
11/02/2012 00:39:50 :: [console] twisted.internet.protocol.ServerFactory starting on 2013
11/02/2012 00:39:50 :: [console] Starting factory <twisted.internet.protocol.ServerFactory instance at 0x906912c>
11/02/2012 00:39:50 :: [console] twisted.internet.protocol.ServerFactory starting on 2014
11/02/2012 00:39:50 :: [console] Starting factory <twisted.internet.protocol.ServerFactory instance at 0x9069a8c>
11/02/2012 00:39:50 :: [console] Starting factory CarbonClientFactory(10.60.31.84:2004:None)
11/02/2012 00:39:50 :: [clients] CarbonClientFactory(10.60.31.84:2004:None)::startedConnecting (10.60.31.84:2004)
11/02/2012 00:39:50 :: [clients] CarbonClientProtocol(10.60.31.84:2004:None)::connectionMade
11/02/2012 00:39:53 :: [listener] MetricPickleReceiver connection with 10.60.31.4:58176 established
11/02/2012 00:40:06 :: [listener] MetricPickleReceiver connection with 10.60.35.10:33553 established
11/02/2012 00:40:09 :: [listener] MetricPickleReceiver connection with 10.60.36.11:50424 established
11/02/2012 00:40:10 :: [listener] MetricPickleReceiver connection with 10.60.35.33:48027 established
11/02/2012 00:40:35 :: [listener] MetricPickleReceiver connection with 10.60.31.84:39403 established
11/02/2012 00:40:36 :: [listener] MetricPickleReceiver connection with 10.60.35.59:41129 established
11/02/2012 00:40:50 :: [console] Unhandled error in Deferred:
11/02/2012 00:40:50 :: [console] Unhandled Error
Traceback (most recent call last):
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/base.py", line 1162, in run
    self.mainLoop()
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/base.py", line 1171, in mainLoop
    self.runUntilCurrent()
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/base.py", line 793, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/task.py", line 194, in __call__
    d = defer.maybeDeferred(self.f, *self.a, **self.kw)
--- <exception caught here> ---
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/rnt/lib/python2.6/site-packages/carbon/instrumentation.py", line 104, in recordMetrics
    record('metricsReceived', myStats.get('metricsReceived', 0))
exceptions.UnboundLocalError: local variable 'record' referenced before assignment

I also get the error above, which looks like it comes from the section of code that reports internal cache metrics, although it appears to happen only once.
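For reference, Python raises UnboundLocalError when a function assigns a local name only on some code paths and then uses it on a path where no assignment ran. The sketch below illustrates that pattern only. The function name record_metrics, the program names, and the branch layout are hypothetical, and I have not confirmed that carbon's recordMetrics is actually structured this way.

```python
def record_metrics(program, stats):
    """Hypothetical stand-in for a periodic stats reporter (not carbon's code)."""
    recorded = []

    def cache_record(name, value):
        recorded.append(("cache", name, value))

    def aggregator_record(name, value):
        recorded.append(("aggregator", name, value))

    # 'record' is bound only for the program names handled here.
    if program == "carbon-cache":
        record = cache_record
    elif program == "carbon-aggregator":
        record = aggregator_record

    # For any other program name (for example a relay), 'record' was never
    # assigned, so this call raises UnboundLocalError.
    record("metricsReceived", stats.get("metricsReceived", 0))
    return recorded
```

If the real code has this shape, the usual fix is to give every program type its own branch, or to bind a default before the conditional.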

So it appears that diamond and the relay are talking to each other correctly, and it looks like the relay and the cache are connected as well.

But I never see any updates on the cache server.
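One way to test the path by hand would be to push a single metric into the relay's pickle receiver and watch whether it reaches the cache. The sketch below assumes the pickle framing described in the Graphite documentation: a pickled list of (path, (timestamp, value)) tuples preceded by a 4-byte big-endian length. The helper names, the example host, and the test metric path are all hypothetical.

```python
import pickle
import socket
import struct
import time


def frame_metrics(metrics):
    """Serialize [(path, (timestamp, value)), ...] with a length prefix."""
    payload = pickle.dumps(metrics, protocol=2)
    header = struct.pack("!L", len(payload))  # 4-byte big-endian length
    return header + payload


def unframe_metrics(message):
    """Inverse of frame_metrics, useful for checking what would be sent."""
    (length,) = struct.unpack("!L", message[:4])
    return pickle.loads(message[4:4 + length])


def send_test_metric(host, port, path, value):
    """Send one datapoint, stamped with the current time, to a pickle receiver."""
    message = frame_metrics([(path, (int(time.time()), value))])
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(message)
    finally:
        sock.close()
```

For example, send_test_metric("relay.example.com", 2014, "test.upgrade.check", 1.0) would target the pickle receiver port from my setup. The hostname is a placeholder.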


Incidentally, the queries below should be for "Infrastructure.servers.HC.aghc01.tcp.TCPPureAcks" etc., but it looks like the first part of the metric name is getting cut off.

==> /var/log/graphite/query.log <==
11/02/2012 00:41:50 :: [127.0.0.1:57739] cache query for "astructure.servers.HC.aghc01.tcp.TCPPureAcks" returned 0 values
11/02/2012 00:41:50 :: [127.0.0.1:57739] cache query for "astructure.servers.HC.aghc01.loadavg.1minute" returned 0 values

Also, the console log on the cache server only ever shows updates for the 13 internal carbon metrics.
==> /var/log/graphite/console.log <==
11/02/2012 00:50:02 :: Sorted 13 cache queues in 0.000044 seconds

If I remove the carbon directory where the whisper files live, it gets recreated and I see the tree for all of my old metrics in the web UI. However, the whisper files never get updated, and consequently neither do the graphs.
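To confirm which whisper files are not being written, a small script can compare each file's modification time against a threshold. This is only a sketch: the function name and the threshold are hypothetical, and it assumes whisper files use the .wsp extension under whatever storage directory the installation is configured with.

```python
import os
import time


def find_stale_whisper_files(root, max_age_seconds, now=None):
    """Return sorted (path, age_in_seconds) for .wsp files older than the limit."""
    if now is None:
        now = time.time()
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue  # ignore anything that is not a whisper file
            path = os.path.join(dirpath, name)
            age = now - os.path.getmtime(path)
            if age > max_age_seconds:
                stale.append((path, age))
    return sorted(stale)
```

Running it with a threshold a few times larger than the metric interval should list every file the cache has stopped updating.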

I'm not sure where it went wrong, but it is definitely off the rails.

One other thing: I've been rolling my own RPMs. I completely removed all of the old 0.9.8 code from the source and put only the 0.9.9 source in the new RPMs. I then removed all of the installed RPMs for graphite-web, carbon, and whisper and installed the new ones. Any idea where I should start looking?

Thanks!

Cody 



