
fuel-dev team mailing list archive

[Ceilometer] Experience sharing

 

Hi colleagues,

The purpose of this letter is to describe my Ceilometer tests and to share
my experience with its deployment.

First of all, let me describe what I wanted to achieve. There were a lot of
discussions at the summit about the performance of Ceilometer's collector:
the community was not sure that it is worth improving the performance of
fetching data from Ceilometer's DB _before_ fixing some issues in data
collection. I decided to check how Ceilometer collects data and to see those
'issues' in real life. For this purpose I used the following env (copied
from my letter to the community):

Lab description:
3 controllers
187 computes
HA: Galera for MySQL
memcached is on, RabbitMQ in HA mode

Ceilometer processes are running as follows:
controller 1: ceilometer-api, ceilometer-agent-central,
ceilometer-collector, ceilometer-agent-compute
controllers 2 and 3: ceilometer-collector
all computes: ceilometer-agent-compute

My first step was simply to create 200 instances and poll them every 5 sec.
Over 2 days about 9 000 000 entries were collected in the DB. The lab worked
reasonably well, but 200 instances is a very light load for a lab of this
size.
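
As a rough back-of-the-envelope check, the figures above can be turned into
an average write rate. This is only a sketch: it assumes the "about
9 000 000" entries and "2 days" are exact, and the function names are
hypothetical, not part of Ceilometer.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_second(total_rows, days):
    """Average number of rows written to the DB per second."""
    return total_rows / (days * SECONDS_PER_DAY)


def rows_per_instance_per_poll(total_rows, days, instances, poll_interval_s):
    """Average number of rows stored per instance per polling cycle."""
    polls = days * SECONDS_PER_DAY / poll_interval_s
    return total_rows / (instances * polls)


# Figures from the first test: ~9 000 000 rows, 2 days, 200 instances, 5 sec.
rate = rows_per_second(9_000_000, 2)
per_poll = rows_per_instance_per_poll(9_000_000, 2, 200, 5)

print(f"average write rate: {rate:.1f} rows/sec")
print(f"rows per instance per poll: {per_poll:.2f}")
```

So even this light load means on the order of 50 inserts per second,
sustained, into the same MySQL cluster that the other services use.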

The second step was to try Rally. It served two goals: to measure
Ceilometer's influence on Nova (please read the letter to the end for
details) and to load the lab and check how many entries would be stored in
the DB.
    The Nova test in a few words: the boot+delete instance scenario becomes
about 2 times slower if Ceilometer is up and monitoring 200 instances. But
Julien provided good comments about deployment strategies and I think it's
ok to repeat them here:
"If you store them [metrics] in the same MySQL DB that's used by Nova for
example, it's likely that's the problem is the load Ceilometer puts on the
MySQL cluster slows Nova down."
And my summary:
"So if we want to use Ceilometer + MySQL in production, we need to use
separate controllers dedicated to Ceilometer's MySQL. Each controller may
run its own collector, which will write data into its "local" MySQL. Only
one instance of the central agent may be started (WIP
https://wiki.openstack.org/wiki/Ceilometer/blueprints/tasks-distribution)"

How was the lab feeling after the Rally test? Bad. During the test about
2000 instances were created and deleted, and besides this 200 instances were
up permanently. So after two days:
mysql> select count(*) from meter;
+----------+
| count(*) |
+----------+
| 18177507 |
+----------+
As a result, one of the MySQL servers failed and sync between the servers
failed too. I managed to recover only by creating a new MySQL cluster
(reconfiguring the old one).

I think that the load simulated by Rally is actually not very high. E.g. if
we consider the Savanna EDP use case, then the same load will be produced by
the following scenario: "20 users each run 33 Hadoop jobs during two days"
(a cluster with 3 nodes is created and deleted for each job).
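
The comparison with the Rally run can be checked with simple arithmetic.
This sketch assumes that each of the 20 users runs 33 jobs and that every
job gets its own 3-node cluster, which is my reading of the scenario. The
function name is hypothetical.

```python
def instances_created(users, jobs_per_user, nodes_per_cluster):
    """Total instances booted (and later deleted) over the whole scenario."""
    return users * jobs_per_user * nodes_per_cluster


# Savanna EDP scenario: 20 users, 33 Hadoop jobs each, 3-node clusters.
savanna_instances = instances_created(20, 33, 3)

# The Rally test created and deleted about 2000 instances.
rally_instances = 2000

print(f"Savanna scenario: {savanna_instances} instances")
print(f"share of the Rally load: {savanna_instances / rally_instances:.0%}")
```

Under these assumptions the scenario boots 1980 instances in two days, which
is about the same as the 2000 instances from the Rally test.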

So to summarize: if we want to use MySQL+Ceilometer in production, we should
install it on separate controllers and use a separate MySQL for it. One more
note: I'm talking only about data collection here; data fetching (something
like "show ceilometer statistics for 2 days") works very slowly on MySQL and
a little bit better on MongoDB.

If you have any questions or suggestions about further tests, you are
welcome! For now we have decided just to run 1000 instances and check
Ceilometer's behavior.

Thanks for your attention,
Nadya
