
fuel-dev team mailing list archive

Re: [Ceilometer] Experience sharing

 

On 12/12/2013 08:51 AM, Roman Alekseenkov wrote:
Nadya,

Thanks a lot for sharing the results.

Roman S.,

What is your perspective on the things described below? I'm worried about Ceilometer + MySQL support in Fuel 4.0. As you can see, the DB fell apart in a matter of one day. We can't really afford similar behavior in customers' environments. So, I'm interested in the following:

 1. Given that Ceilometer generates so much data, will MongoDB really
    help to handle it better, or is it just a matter of time until it
    breaks like MySQL?
 2. Is there a cleanup mechanism in Ceilometer to clear historical
    data? We don't want the database to grow indefinitely.
 3. Are we pointing Ceilometer to the same Galera instance as we use
    for OpenStack, or do we deploy a different MySQL for it in Fuel?

I'd like to add here: 4) Would it make sense to use the Elasticsearch backend driver prototype for Ceilometer, https://blueprints.launchpad.net/ceilometer/+spec/elasticsearch-driver? Here are some more considerations about using ES as a NoSQL store: https://www.found.no/foundation/elasticsearch-as-nosql/
I believe this is a good topic for deeper research.

 1. What are our recommended/default settings in Fuel for Ceilometer?
    Is it stats collection every 5 seconds, or something different?
    How safe are our defaults?
 2. Do we allow the user to tweak essential Ceilometer settings through
    the Fuel UI or manually?

Thanks,
Roman

On Tuesday, December 10, 2013, Nadya Privalova wrote:

    Hi colleagues,

    The purpose of this letter is to describe my Ceilometer tests and
    my experience deploying it.

    First of all, let me describe what I wanted to achieve. There was
    a lot of discussion at the summit about the performance of
    Ceilometer's collector: the community was not sure it was worth
    improving the performance of fetching data from Ceilometer's DB
    _before_ fixing some issues in data collection. I decided to
    check how Ceilometer collects data and see those 'issues' in real
    life. For this purpose I used the following env (copied from my
    letter to the community):

    Lab description:
    3 controllers
    187 computes
    HA: Galera for MySQL
    memcached is on, RabbitMQ in HA mode

    Ceilometer processes are running as follows:
    controller 1: ceilometer-api, ceilometer-agent-central,
    ceilometer-collector, ceilometer-agent-compute
    controllers 2 and 3: ceilometer-collector
    all computes: ceilometer-agent-compute

    My first step was simply to create 200 instances and monitor them
    every 5 sec. Over 2 days, about 9 000 000 entries were collected in
    the DB. The lab worked reasonably well. But 200 instances is a very
    light load for a lab of this size.
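
A rough back-of-the-envelope check of what the figures above imply as an average write rate. This is only a sketch: it assumes the load was spread evenly over the two days and that every instance was polled at the 5-second interval for the whole run, neither of which the message confirms.

```python
# Average ingest rate implied by the first test run.
# Input figures are taken from the message; the even spread over
# two days is an assumption made for illustration.
SECONDS_PER_DAY = 24 * 60 * 60

entries = 9_000_000      # approximate rows reported after the first run
days = 2
instances = 200
poll_interval_s = 5

rows_per_second = entries / (days * SECONDS_PER_DAY)
polls_per_instance = days * SECONDS_PER_DAY / poll_interval_s
rows_per_instance_per_poll = entries / (instances * polls_per_instance)

print(f"{rows_per_second:.1f} rows/s on average")
print(f"{rows_per_instance_per_poll:.2f} rows per instance per poll")
```

Under these assumptions the collectors were writing on the order of 50 rows per second, or a little over one row per instance per polling cycle.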

    The second step was to try Rally. It served two goals: to
    measure Ceilometer's influence on Nova (please read the letter
    to the end for details about this) and to load the lab and check
    how many entries would be stored in the DB.
        The Nova test in a few words: the boot+delete instance scenario
    becomes about 2 times slower if Ceilometer is up and monitoring 200
    instances. But Julien provided good comments about deployment
    strategies, and I think it's ok to repeat them here:
    "If you store them [metrics] in the same MySQL DB that's used by
    Nova for example, it's likely that's the problem is the load
    Ceilometer puts on the MySQL cluster slows Nova
    down."
    And my summary:
    "So if we want to use Ceilometer + MySQL in production we need to
    use separate controllers with Ceilometer's MySQL only. And each
    controller may run its own collector which will write data into
    "local" MySQL. And only one instance of central-agent may be
    started (WIP
    https://wiki.openstack.org/wiki/Ceilometer/blueprints/tasks-distribution)"

    How was the lab feeling after the Rally test? Bad. During the test
    about 2000 instances were created and deleted, and besides this 200
    instances were up permanently. So after two days:
    mysql> select count(*) from meter;
    +----------+
    | count(*) |
    +----------+
    | 18177507 |
    +----------+
    As a result, one of the MySQL servers failed, and sync between the
    servers failed too. I could only recover by creating a new MySQL
    cluster (old cluster reconfiguration).

    I think that the load simulated by Rally is actually not very high.
    E.g. if we consider the Savanna EDP use case, then the same
    load will be produced by the following scenario: "20 users run 33
    hadoop-jobs over two days" (a cluster with 3 nodes will be created
    and deleted).
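
The comparison with the Rally run can be checked with simple arithmetic. The sketch below assumes that each of the 20 users runs 33 jobs and that every job gets its own 3-node cluster; the message does not spell this out, but that reading gives a figure close to the roughly 2000 instances Rally created and deleted.

```python
# Instances created and deleted in the Savanna EDP scenario.
# Assumption (not stated explicitly in the message): 33 jobs per user,
# one fresh 3-node cluster per job.
users = 20
jobs_per_user = 33
nodes_per_cluster = 3

instances_created = users * jobs_per_user * nodes_per_cluster
rally_instances = 2000   # approximate figure from the Rally test

print(instances_created)                      # 1980
print(instances_created / rally_instances)    # 0.99
```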

    So to summarize: if we want to use MySQL+Ceilometer in production
    we should install it on separate controllers and use a separate
    MySQL for it. And one more note: I'm talking only about data
    collection now; data fetching (something like "show ceilometer
    statistics for 2 days") works very slowly on MySQL and a little bit
    better on MongoDB.

    If you have any questions or suggestions about further tests,
    you are welcome to share them! For now we have decided just to run
    1000 instances and check Ceilometer's behavior.

    Thanks for your attention,
    Nadya





--
Best regards,
Bogdan Dobrelya,
Researcher TechLead, Mirantis, Inc.
+38 (066) 051 07 53
Skype bogdando_at_yahoo.com
38, Lenina ave.
Kharkov, Ukraine
www.mirantis.com
www.mirantis.ru
bdobrelia@xxxxxxxxxxxx

