savanna-all team mailing list archive

Cluster scaling discussion

To: savanna-all@xxxxxxxxxxxxxxxxxxx
From: Nadezhda Privalova <nprivalova@xxxxxxxxxxxx>
Date: Fri, 28 Jun 2013 16:37:02 +0400

Hi all,

Here are some of our thoughts and ideas about the cluster scaling feature in
Savanna. Everything below concerns only the datanode and tasktracker
processes; we are not planning to support any other processes. If you
disagree with this, please share your thoughts.


We are currently considering two scenarios:

I. The user scales the cluster's existing node groups. This is a rather simple
feature because we can just copy all configs from the existing instances of
the node group. From Hadoop's perspective there is only one additional step
besides "start tasktracker/datanode": the cluster needs to be rebalanced
before the tasktracker starts.
    In this scenario the user obviously also needs the ability to delete
instances from a node group, and this is where I'm concerned. Datanode
decommissioning takes a long time. So if the user wants to delete one instance
from a "datanode" node group (a node group that has only the 'datanode'
process) and add one instance to a "tasktracker" node group, the required time
may be unacceptable. I therefore suppose that datanode decommissioning should
be a separate process, not part of cluster scaling. What do you think?
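The step order for scenario I can be sketched roughly as below. This is only an illustration, not Savanna code: the function name and step strings are made up, and it assumes the rebalance is run after the datanode (if any) is started and before the tasktracker (if any) is started.

```python
def scale_up_steps(node_processes):
    """Return the ordered steps for one new instance in an existing node group.

    Hypothetical helper: the step names are illustrative labels only.
    """
    steps = ["copy configs from an existing instance of the node group"]
    if "datanode" in node_processes:
        steps.append("start datanode")
    # Assumption: rebalance always sits before the tasktracker start,
    # as described in the message.
    steps.append("rebalance cluster")
    if "tasktracker" in node_processes:
        steps.append("start tasktracker")
    return steps


if __name__ == "__main__":
    for step in scale_up_steps(["datanode", "tasktracker"]):
        print(step)
```

Datanode decommissioning is deliberately absent from this list, in line with the suggestion to keep it a separate process.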

II. The user adds a new node group to the cluster. Here Savanna repeats the
flow from cluster creation: we cannot copy the configs and need to create all
the *.xml config files for Hadoop.
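As an illustration of what creating a config file for a new node group involves, here is a minimal sketch that renders a dict of settings in the usual Hadoop `<configuration>/<property>` layout. The function name is hypothetical and the property in the usage example is only a sample; how Savanna actually generates these files is not described in this message.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(properties):
    """Render a {name: value} dict as Hadoop-style configuration XML.

    Hypothetical helper for illustration; not taken from Savanna.
    """
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample property chosen only for the example.
    print(render_hadoop_config({"dfs.replication": 3}))
```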

As for the REST API, we propose the following request:


{
    "resize_node_groups": [
        {
            "name": "storage",
            "count": 10
        },
        {
            "name": "worker",
            "count": -1        <-----deletion
        }
    ],
    "add_node_groups": [
        {
            "node_group_tmpl_id": "520ee6a2-c8f5-4c9b-86c4-fd273860ff8e",
            "name": "new-node-group-name",
            "node_processes": ["datanode", "jobtracker"],
            "flavor_id": 42
        }
    ]
}

So we propose to handle all of this in a single PUT call on the cluster.
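To make the proposal concrete, a client could assemble the single PUT request roughly as follows. This is a sketch under assumptions: the cluster URL is a placeholder (the message does not give the endpoint), the helper name is made up, and the "<-----deletion" note in the example is an annotation rather than part of the JSON that is sent.

```python
import json
import urllib.request


def build_scaling_request(cluster_url, resize=None, add=None):
    """Build (but do not send) one PUT request carrying both operations.

    cluster_url is a hypothetical endpoint; resize maps node group
    name -> count, add is a list of new node group descriptions.
    """
    body = {
        "resize_node_groups": [
            {"name": name, "count": count}
            for name, count in (resize or {}).items()
        ],
        "add_node_groups": list(add or []),
    }
    return urllib.request.Request(
        cluster_url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_scaling_request(
        "http://savanna.example/clusters/CLUSTER_ID",  # placeholder URL
        resize={"storage": 10, "worker": -1},
        add=[{
            "node_group_tmpl_id": "520ee6a2-c8f5-4c9b-86c4-fd273860ff8e",
            "name": "new-node-group-name",
            "node_processes": ["datanode", "jobtracker"],
            "flavor_id": 42,
        }],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```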

Please share your thoughts about it, because we're planning to
implement scaling in the current phase.

Best regards,
Nadya

Follow ups

Cluster scaling discussion
From: Matthew Farrellee, 2013-07-12
Re: Cluster scaling discussion
From: Himanshu Bari, 2013-06-28