
graphite-dev team mailing list archive

[Question #254080]: Carbon cache query exceptions.

 

New question #254080 on Graphite:
https://answers.launchpad.net/graphite/+question/254080

We have a 3-server Graphite cluster, version 0.9.12, and on one of the servers I see a steady stream of entries like the following in the exception log:

Thu Sep 04 19:28:36 2014 :: ('127.0.0.1', 7002)
None
Thu Sep 04 19:28:36 2014 :: ('127.0.0.1', 7202)
None

Looking through the code, it appears this means that the function "recv_exactly" in render/datalib.py is not getting the expected amount of data from the carbon cache. Aside from the fact that the message could be a little clearer (hint hint...), I am at a loss to explain what is happening.
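To make the "clearer message" point concrete, here is a minimal sketch of a recv_exactly-style helper that reports how many bytes arrived before the peer closed the connection. This is not the code in render/datalib.py: the function body, the exception type, and the message wording are all assumptions, and the sketch targets Python 3 rather than the Python 2 that Graphite 0.9.12 runs on.

```python
import socket


def recv_exactly(conn, num_bytes):
    """Read exactly num_bytes from conn, or raise a descriptive error.

    Hypothetical helper for illustration; not Graphite's implementation.
    """
    chunks = []
    remaining = num_bytes
    while remaining > 0:
        data = conn.recv(remaining)
        if not data:
            # An empty read means the peer closed the connection early.
            raise ConnectionError(
                "connection closed after %d of %d expected bytes"
                % (num_bytes - remaining, num_bytes)
            )
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


def demo_short_read():
    """Simulate a peer that sends 3 bytes and closes while 8 are expected."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"abc")
        sender.close()
        try:
            recv_exactly(receiver, 8)
        except ConnectionError as exc:
            return str(exc)
        return None
    finally:
        receiver.close()


message = demo_short_read()
```

Logging a message like this next to the host and port would show at a glance that the cache closed the connection mid-response, instead of a bare "None".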

Connections are being made to the cache query port, I can telnet to it, and Graphite appears to be working correctly. We DO have a lot of traffic on the web servers (they are behind a load balancer, so the loads are mostly equal). We also occasionally see the last few data points missing from some graph lines.
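The telnet check can be scripted so that it runs repeatedly against each cache query port. The sketch below only verifies that a TCP connection can be opened; it does not send a cache query, so a successful result says nothing about whether a full response would come back. The helper name and the timeout value are assumptions; the two ports are the ones that appear in the exception log above.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports):
    """Map each port to whether a TCP connection succeeded."""
    return {port: can_connect(host, port) for port in ports}


if __name__ == "__main__":
    # Ports taken from the exception log entries.
    for port, ok in check_ports("127.0.0.1", [7002, 7202]).items():
        print(port, "reachable" if ok else "NOT reachable")
```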

The other servers have identical configurations, both carbon and webapp, and we don't see these messages on the other two.  Our architecture is as follows:

Each server has:
2 relays (Level 1) configured to send via consistent hashing to 2 other relays on all 3 servers.
2 relays (Level 2) configured to send via consistent hashing to the carbon caches on the local host
4 caches.

This way the consistent hash ring that the web app uses (for accessing caches on the local host) will match the ring that carbon-relay uses.  
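The reasoning above can be illustrated with a generic consistent hash ring: two rings built from the same node list route every metric to the same node, which is why the webapp's list of local caches needs to match the Level 2 relay's destinations. This is a simplified sketch, not carbon's actual ring implementation; the class, the hashing details, the replica count, and the cache instance names are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring (illustrative, not carbon's code)."""

    def __init__(self, nodes, replicas=100):
        # Place several virtual points per node on the ring.
        self._ring = sorted(
            (self._position("%s:%d" % (node, i)), node)
            for node in nodes
            for i in range(replicas)
        )
        self._positions = [position for position, _ in self._ring]

    @staticmethod
    def _position(key):
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest[:8], 16)

    def get_node(self, key):
        # First ring point at or after the key's position, wrapping around.
        index = bisect.bisect_left(self._positions, self._position(key))
        return self._ring[index % len(self._ring)][1]


# Hypothetical names for the 4 local cache instances.
caches = ["cache-a", "cache-b", "cache-c", "cache-d"]
relay_ring = HashRing(caches)
webapp_ring = HashRing(caches)

metrics = ["servers.web%02d.cpu.user" % n for n in range(50)]
rings_agree = all(
    relay_ring.get_node(m) == webapp_ring.get_node(m) for m in metrics
)
```

If the two node lists differed (for example, a missing or renamed instance on one side), some metrics would be queried on a cache that never received them.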

So my question is twofold:

* What causes these errors?
* How do I fix them?

Thanks,
Steve Keller

