graphite-dev team mailing list archive

[Question #201494]: Not all metrics are saved on EC2 installation

New question #201494 on Graphite:
https://answers.launchpad.net/graphite/+question/201494

I am currently evaluating how Graphite performs when handling 100k metrics per minute. I've created two identical setups, one on a local VM and one on a medium instance in EC2. I also wrote a script that posts the metric "systemN.loadavg_1min {rand} {now}" for N from 1 to 50k with a random value, sleeping 0.0006 s after each post (60 / 0.0006 = 100,000 posts per minute).
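The load script itself is not included in the question. Below is a minimal Python 3 sketch of such a script. It assumes carbon's plaintext protocol (one `path value timestamp` line per metric) on the default line-receiver port 2003 on localhost. The function names, host, and port are illustrative assumptions, not the poster's actual script.

```python
import random
import socket
import time

CARBON_HOST = "127.0.0.1"  # assumption: carbon-cache runs on the same box
CARBON_PORT = 2003         # assumption: default plaintext line-receiver port
NUM_SYSTEMS = 50000        # N ranges over 1..50k, as in the question
SLEEP_SECONDS = 0.0006     # 60 / 0.0006 = 100,000 sends per minute, ignoring send time


def metric_line(n, value, timestamp):
    """Format one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    return "system%d.loadavg_1min %s %d\n" % (n, value, timestamp)


def metric_lines(num_systems, now=None):
    """Yield one line per system with a random value and the current timestamp."""
    for n in range(1, num_systems + 1):
        ts = int(now if now is not None else time.time())
        yield metric_line(n, round(random.random(), 3), ts)


def run(host=CARBON_HOST, port=CARBON_PORT):
    """Send the metrics forever over one TCP connection, pausing after each line."""
    with socket.create_connection((host, port)) as sock:
        while True:
            for line in metric_lines(NUM_SYSTEMS):
                sock.sendall(line.encode("ascii"))
                time.sleep(SLEEP_SECONDS)
```

Calling `run()` starts the load. In practice `time.sleep(0.0006)` may sleep longer than requested, so the achieved rate can fall below 100k per minute.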

After a while, I counted the number of directories in the storage location on the local VM:

me@ubuntu:~/graphite-dev$ ls /opt/graphite/storage/whisper/ | wc
50000   50000  588889

and on EC2 (where the whisper directory is symlinked to /mnt):

ubuntu@ip-x-x-x-x:/opt/graphite$ ls /mnt/whisper/ | wc
31998   31998  372865

The count stays at 31998 and does not grow. The strangest thing is that when I delete /mnt/whisper completely, recreate it, and restart the script, the directory count stops at 31998 again.
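`ls | wc` counts every entry in the top-level directory. The minimal Python 3 sketch below counts only subdirectories, plus the `.wsp` files beneath them. This separates a metric whose directory is missing from one whose directory exists but has no data file. The function names are hypothetical.

```python
import os


def count_subdirectories(path):
    """Count the immediate subdirectories of `path` (not recursive)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir())


def count_metric_files(path, suffix=".wsp"):
    """Count whisper data files anywhere below `path`."""
    total = 0
    for _root, _dirs, files in os.walk(path):
        total += sum(1 for name in files if name.endswith(suffix))
    return total
```

For example, `count_subdirectories("/mnt/whisper")` should match the `ls | wc` line count as long as the directory holds only the systemN subdirectories.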

console.log contains entries like this:

26/06/2012 12:27:37 :: Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 167, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/opt/graphite/lib/carbon/writer.py", line 158, in writeForever
    writeCachedDataPoints()
  File "/opt/graphite/lib/carbon/writer.py", line 118, in writeCachedDataPoints
    whisper.create(dbFilePath, archiveConfig, xFilesFactor, aggregationMethod, settings.WHISPER_SPARSE_CREATE)
  File "/usr/local/lib/python2.7/dist-packages/whisper.py", line 327, in create
    fh = open(path,'wb')
exceptions.IOError: [Errno 2] No such file or directory: '/opt/graphite/storage/whisper/system31851/loadavg_1min.wsp'
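The last frame is `whisper.create` calling `open(path, 'wb')`. That call fails with errno 2 (ENOENT) when the parent directory (here `system31851/`) does not exist, because `open` never creates missing directories. The minimal Python 3 sketch below creates the parent first, so any failure in that step raises its own error with its own errno instead of the later ENOENT. The helper names are hypothetical, and this is not carbon's writer code.

```python
import errno
import os


def create_metric_file(path):
    """Create an empty file at `path`, making its parent directory first.

    Hypothetical helper for illustration only. If os.makedirs fails, its
    OSError (carrying the real errno) propagates instead of a later ENOENT.
    """
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb"):
        pass
    return path


def open_without_parent(path):
    """Mimic the traceback: return the errno if open() fails, else None."""
    try:
        open(path, "wb").close()
    except OSError as exc:
        return exc.errno
    return None
```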

The permissions are evidently fine, since most of the directories are created; only some are not. The box has 1 CPU and 4 GB of memory, and the /mnt filesystem has more than 300 GB of free space.

I have set MAX_CACHE_SIZE to 100000 to force carbon to write the data to disk sooner, and MAX_UPDATES_PER_SECOND and MAX_CREATES_PER_SECOND are set to "inf". However, disk utilization is not high:

ubuntu@ip-x-x-x-x:/opt/graphite$ iostat -dxk 10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.32    0.48    0.55     5.89     5.98    23.12     0.01    6.35    8.14    4.77   2.45   0.25
xvdb              0.00   304.39    4.97   60.72    38.24  1460.47    45.63    10.81  164.52    5.54  177.54   1.26   8.28
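The cache settings named in the question are carbon.conf options, set under its [cache] section. The minimal Python 3 sketch below assumes a fragment using the values and option names exactly as written in the question. It parses the fragment with the standard-library configparser and converts "inf" to `float('inf')`. The helper name is hypothetical, and this is not carbon's own settings loader.

```python
import configparser

# Fragment mirroring the values and option names as written in the question.
CARBON_CONF = """
[cache]
MAX_CACHE_SIZE = 100000
MAX_UPDATES_PER_SECOND = inf
MAX_CREATES_PER_SECOND = inf
"""


def read_cache_limits(text):
    """Return the three cache limits as floats; 'inf' becomes float('inf')."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    limits = {}
    for key in ("MAX_CACHE_SIZE", "MAX_UPDATES_PER_SECOND", "MAX_CREATES_PER_SECOND"):
        # configparser lowercases option names, so lookups are case-insensitive.
        limits[key] = float(parser.get("cache", key))
    return limits
```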

Since the log reports an "Unhandled Error", my guess is that Python threads are dying and part of the metrics is lost with them.

How can I fix that?
