fuel-dev team mailing list archive
Message #00152
Re: [Ceilometer] Experience sharing
On 12/12/2013 08:51 AM, Roman Alekseenkov wrote:
Nadya,
Thanks a lot for sharing the results.
Roman S.,
What is your perspective on the things described below? I'm worried
about Ceilometer + MySQL support in Fuel 4.0. As you can see, the DB
fell apart in a matter of one day. We can't really afford similar
behavior in customers' environments. So, I'm interested in the following:
1. Given that Ceilometer generates so much data, will MongoDB really
help to handle it better, or is it just a matter of time until it
breaks like MySQL?
2. Is there a cleanup mechanism in Ceilometer to clear historical
data? We don't want the database to grow indefinitely.
3. Are we pointing Ceilometer to the same Galera instance as we use
for OpenStack, or do we deploy a different MySQL for it in Fuel?
I'd like to add here: 4) Would it make sense to use the Elasticsearch
backend driver prototype for Ceilometer? See
https://blueprints.launchpad.net/ceilometer/+spec/elasticsearch-driver
Here are some more considerations on using ES as a NoSQL store:
https://www.found.no/foundation/elasticsearch-as-nosql/
I believe this is a good topic for deeper research.
1. What are our recommended/default settings in Fuel for Ceilometer?
Is it stats collection every 5 seconds, or something different?
How safe are our defaults?
2. Do we allow the user to tweak essential Ceilometer settings through
the Fuel UI or manually?
Thanks,
Roman
On Tuesday, December 10, 2013, Nadya Privalova wrote:
Hi colleagues,
The purpose of this letter is to describe my Ceilometer tests and
my experience with its deployment.
First of all, let me describe what I wanted to achieve. There were
a lot of discussions at the summit about the performance of
Ceilometer's collector: the community was not sure that it is worth
improving the performance of fetching data from Ceilometer's DB
_before_ fixing some issues in data collection. I decided to
check how Ceilometer collects data and to see those 'issues' in real
life. For this purpose I used the following env (copied from my
letter to the community):
Lab description:
3 controllers
187 computes
HA: Galera for MySQL
memcached is on, RabbitMQ in HA mode
Ceilometer processes are running as follows:
controller 1: ceilometer-api, ceilometer-agent-central,
ceilometer-collector, ceilometer-agent-compute
controllers 2 and 3: ceilometer-collector
all computes: ceilometer-agent-compute
My first step was simply to create 200 instances and poll them every
5 sec. Over 2 days, about 9 000 000 entries were collected in the DB.
The lab worked rather OK. But 200 instances is a very light load for
a lab of this size.
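To put that figure in perspective, here is a rough back-of-the-envelope sketch of the average insert rate. It assumes the entries arrived evenly over the two days, which the test did not measure; the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rows_per_second(total_rows, days):
    """Average insert rate, assuming rows arrive evenly over the period."""
    return total_rows / (days * SECONDS_PER_DAY)


# About 9 000 000 entries collected over 2 days (figures from the first step).
rate = average_rows_per_second(9_000_000, 2)
print(f"{rate:.1f} rows/s")  # 52.1 rows/s
```

So even the "light" load corresponds to roughly 50 inserts per second, sustained around the clock.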
The second step was to try Rally. It served two goals: to
measure Ceilometer's influence on Nova (please read the letter
to the end for details about this), and to load the lab and check
how many entries would be stored in the DB.
The Nova test in a few words: the boot+delete instance scenario
becomes about 2 times slower if Ceilometer is up and monitoring 200
instances. But Julien provided good comments about deployment
strategies, and I think it's OK to repeat them here:
"If you store them [metrics] in the same MySQL DB that's used by
Nova for example, it's likely that's the problem is the load
Ceilometer puts on the MySQL cluster slows Nova
down."
And my summary:
"So if we want to use Ceilometer + MySQL in production we need to
use separate controllers with Ceilometer's MySQL only. And each
controller may run its own collector which will write data into
the "local" MySQL. And only one instance of central-agent may be
started (WIP
https://wiki.openstack.org/wiki/Ceilometer/blueprints/tasks-distribution)"
How was the lab feeling after the Rally test? Bad. During the test,
about 2000 instances were created and deleted, and besides this 200
instances were up permanently. So after two days:
mysql> select count(*) from meter;
+----------+
| count(*) |
+----------+
| 18177507 |
+----------+
As a result, one of the MySQL servers failed, and sync between the
servers failed too. I could recover only by creating a new MySQL
cluster (old cluster reconfiguration).
I think that the load simulated by Rally is actually not very
high. E.g. if we consider the Savanna EDP use case, then the same
load will be achieved by the following scenario: "20 users run 33
hadoop-jobs during two days" (a cluster with 3 nodes will be created
and deleted).
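The arithmetic behind this comparison can be sketched as follows. It assumes that each of the 20 users runs 33 jobs and that every job creates and then deletes its own 3-node cluster; the function and parameter names are illustrative only.

```python
def instances_created(users, jobs_per_user, nodes_per_cluster):
    """Instances booted (and later deleted) if every job gets its own cluster."""
    return users * jobs_per_user * nodes_per_cluster


# Assumption: 33 jobs per user, one 3-node cluster per job.
total = instances_created(users=20, jobs_per_user=33, nodes_per_cluster=3)
print(total)  # 1980
```

Under that reading, the scenario boots and deletes 1980 instances, which is close to the roughly 2000 instances created and deleted during the Rally test.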
So to summarize: if we want to use MySQL+Ceilometer in production
we should install it on separate controllers and use a separate
MySQL for it. And one more note: I'm talking only about data
collection now; data fetching (something like "show ceilometer
statistics for 2 days") works very slowly on MySQL and a little bit
better on MongoDB.
If you have any questions or suggestions about further tests,
you are welcome to share them! For now we have decided just to run
1000 instances and check Ceilometer's behavior.
Thanks for your attention,
Nadya
--
Best regards,
Bogdan Dobrelya,
Researcher TechLead, Mirantis, Inc.
+38 (066) 051 07 53
Skype bogdando_at_yahoo.com
38, Lenina ave.
Kharkov, Ukraine
www.mirantis.com
www.mirantis.ru
bdobrelia@xxxxxxxxxxxx