
fuel-dev team mailing list archive

Re: [Ceilometer] Experience sharing

 

Nadya,

Thanks a lot for sharing the results.

Roman S.,

What is your perspective on the things described below? I'm worried about
Ceilometer + MySQL support in Fuel 4.0. As you can see, the DB fell apart
in a matter of days. We can't really afford similar behavior in
customers' environments. So, I'm interested in the following:

   1. Given that Ceilometer generates so much data, will MongoDB really
   help to handle it better, or is it only a matter of time until it breaks
   like MySQL?
   2. Is there a cleanup mechanism in Ceilometer to clear historical data?
   We don't want the database to grow indefinitely.
   3. Are we pointing Ceilometer to the same Galera instance that we use for
   OpenStack, or do we deploy a separate MySQL for it in Fuel?
   4. What are our recommended/default settings in Fuel for Ceilometer? Is
   it stats collection every 5 seconds, or something different? How safe are
   our defaults?
   5. Do we allow users to tweak essential Ceilometer settings through the
   Fuel UI or manually?
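
Question 2 is about retention. Purely as an illustration of what a time-based (TTL) purge of historical samples involves, here is a minimal Python sketch against a simplified, hypothetical `meter` table in an in-memory SQLite database. The table layout, the `purge_expired` and `demo_db` helpers, and the two-day TTL are assumptions made for the example; this is not Ceilometer's schema or its own cleanup mechanism.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_expired(conn: sqlite3.Connection, ttl: timedelta, now: datetime) -> int:
    """Delete samples older than `ttl` relative to `now`; return the number deleted."""
    cutoff = (now - ttl).isoformat()
    # ISO-8601 strings written in one fixed format sort chronologically as text.
    cur = conn.execute("DELETE FROM meter WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo_db(now: datetime, hours: int) -> sqlite3.Connection:
    """Build a hypothetical, simplified `meter` table with one sample per hour."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE meter (id INTEGER PRIMARY KEY, timestamp TEXT, volume REAL)")
    rows = [((now - timedelta(hours=h)).isoformat(), float(h)) for h in range(hours)]
    conn.executemany("INSERT INTO meter (timestamp, volume) VALUES (?, ?)", rows)
    conn.commit()
    return conn


now = datetime(2013, 12, 10, 12, 0, 0)
conn = demo_db(now, hours=72)
deleted = purge_expired(conn, ttl=timedelta(days=2), now=now)
remaining = conn.execute("SELECT COUNT(*) FROM meter").fetchone()[0]
print(deleted, remaining)  # 23 49: samples older than 48 hours are gone
```

On a table with tens of millions of rows, one large DELETE like this is itself heavy write load, so a real purge would likely need to run in batches and off-peak. That is a general observation, not something measured in these tests.
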

Thanks,
Roman

On Tuesday, December 10, 2013, Nadya Privalova wrote:

> Hi colleagues,
>
> The purpose of this letter is to describe my Ceilometer tests and my
> general experience deploying it.
>
> First of all, let me describe what I wanted to achieve. There were a lot of
> discussions at the summit about Ceilometer's collector performance: the
> community was not sure it was worth starting to improve the performance of
> fetching data from Ceilometer's DB _before_ fixing some issues in data
> collection. I decided to check how Ceilometer collects data and see those
> 'issues' in real life. For this purpose I used the following environment
> (copied from my letter to the community):
>
> Lab description:
> 3 controllers
> 187 computes
> HA: Galera for MySQL
> memcached is on, RabbitMQ in HA mode
>
> Ceilometer processes are running as follows:
> controller 1: ceilometer-api, ceilometer-agent-central,
> ceilometer-collector, ceilometer-agent-compute
> controllers 2 and 3: ceilometer-collector
> all computes: ceilometer-agent-compute
>
> My first step was simply to create 200 instances and monitor them every 5
> seconds. Over 2 days, about 9,000,000 entries were collected in the DB. The
> lab worked reasonably well, but 200 instances is a very light load for a lab
> of this size.
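
These numbers support a quick back-of-the-envelope check. A minimal Python sketch, assuming every one of the 200 instances was polled at a fixed 5-second interval for the full two days (the letter gives only a rounded total and does not say how many meters each poll records); the helper names are illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_instance(days: float, interval_s: int) -> float:
    """Number of polling cycles one instance sees over `days` at a fixed interval."""
    return days * SECONDS_PER_DAY / interval_s


def rows_per_instance_poll(rows: int, instances: int, days: float, interval_s: int) -> float:
    """Average DB rows written per instance per polling cycle."""
    return rows / (instances * polls_per_instance(days, interval_s))


def projected_rows(instances: int, days: float, interval_s: int, per_poll: float) -> float:
    """Rows expected if growth stays linear in instance count (an assumption)."""
    return instances * polls_per_instance(days, interval_s) * per_poll


ratio = rows_per_instance_poll(9_000_000, instances=200, days=2, interval_s=5)
print(f"{polls_per_instance(2, 5):.0f} polls per instance, {ratio:.2f} rows per poll")
# 34560 polls per instance, 1.30 rows per poll
print(f"1000 instances, 2 days: ~{projected_rows(1000, 2, 5, ratio):,.0f} rows")
# 1000 instances, 2 days: ~45,000,000 rows
```

About 1.3 rows per instance per cycle is only an average over everything that landed in the table. The 1000-instance projection simply scales that ratio linearly, which is an assumption; it ignores the extra rows from instance churn that the Rally run added.
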
>
> The second step was to try Rally. Two goals were achieved here: measuring
> Ceilometer's influence on Nova (please read the letter to the end for
> details) and loading the lab to check how many entries would be stored in
> the DB.
>     The Nova test in a few words: the boot+delete instance scenario becomes
> about 2 times slower when Ceilometer is up and monitoring 200 instances. But
> Julien provided good comments about deployment strategies, and I think it's
> worth repeating them here:
> "If you store them [metrics] in the same MySQL DB that's used by Nova, for
> example, it's likely that the problem is the load Ceilometer puts on the
> MySQL cluster, which slows Nova down."
> And my summary:
> "So if we want to use Ceilometer + MySQL in production, we need to use
> separate controllers that run only Ceilometer's MySQL. Each controller may
> run its own collector, which writes data into its "local" MySQL. And only
> one instance of the central-agent may be started (WIP
> https://wiki.openstack.org/wiki/Ceilometer/blueprints/tasks-distribution)"
>
> How was the lab feeling after the Rally test? Bad. During the test about
> 2000 instances were created and deleted, and on top of that 200 instances
> were up permanently. After two days:
> mysql> select count(*) from meter;
> +----------+
> | count(*) |
> +----------+
> | 18177507 |
> +----------+
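
To put the count in perspective, here is a short Python sketch converting it into an average write rate. It assumes the 18,177,507 rows accumulated evenly over exactly two days; real writes are likely to be burstier, clustered around each polling cycle.

```python
SECONDS_PER_DAY = 86_400


def average_rate(rows: int, days: float) -> tuple[float, float]:
    """Return (rows per day, rows per second), assuming writes are spread evenly."""
    per_day = rows / days
    return per_day, per_day / SECONDS_PER_DAY


per_day, per_sec = average_rate(18_177_507, days=2)
print(f"{per_day:,.0f} rows/day, {per_sec:.1f} rows/s")  # 9,088,754 rows/day, 105.2 rows/s
```

By the same arithmetic, the first run's roughly 9,000,000 rows over two days average about 52 rows/s, so the Rally run roughly doubled the sustained write load on MySQL.
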
> One of the MySQL servers failed, and synchronization between the servers
> failed too. I managed to recover only by creating a new MySQL cluster
> (reconfiguring the old cluster).
>
> I think that the load simulated by Rally is actually not very high. For
> example, in the Savanna EDP use case the same load would be produced by the
> following scenario: "20 users run 33 Hadoop jobs over two days" (a cluster
> with 3 nodes is created and deleted).
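
The arithmetic behind this comparison, as a tiny Python sketch. It assumes one reading of the scenario that the letter does not spell out: each of the 20 users runs 33 jobs, and each job creates and deletes its own 3-node cluster. Under that reading the churn lands close to the ~2000 instances Rally created and deleted. The helper name is illustrative.

```python
def edp_instances(users: int, jobs_per_user: int, nodes_per_cluster: int) -> int:
    """Instances created and deleted if every job gets its own transient cluster."""
    return users * jobs_per_user * nodes_per_cluster


total = edp_instances(users=20, jobs_per_user=33, nodes_per_cluster=3)
per_hour = total / 48  # spread evenly over two days
print(total, per_hour)  # 1980 41.25
```
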
>
> To summarize: if we want to use MySQL + Ceilometer in production, we
> should install Ceilometer on separate controllers and use a separate MySQL
> for it. One more note: I'm talking only about data collection here; data
> fetching (something like "show Ceilometer statistics for 2 days") is very
> slow on MySQL and a little better on MongoDB.
>
> If you have any questions or suggestions about further tests, you are
> welcome to share them! For now, we have decided simply to run 1000 instances
> and check Ceilometer's behavior.
>
> Thanks for your attention,
> Nadya
>
>
