nova-orchestration team mailing list archive

Re: Dec 1st 2011 IRC Meeting

To: Sandy Walsh <sandy.walsh@xxxxxxxxxxxxx>, "nova-orchestration@xxxxxxxxxxxxxxxxxxx" <nova-orchestration@xxxxxxxxxxxxxxxxxxx>
From: "Dugger, Donald D" <donald.d.dugger@xxxxxxxxx>
Date: Sat, 3 Dec 2011 22:24:57 -0800
Accept-language: en-US
In-reply-to: <60A3427EF882A54BA0A1971AE6EF03881D659ACE@ORD1EXD02.RACKSPACE.CORP>
Thread-index: AcywaFWH41vxIFTDQjCFEVRttkoROQB4+alA
Thread-topic: Dec 1st 2011 IRC Meeting

As promised at Thursday's meeting, I've attached a writeup of the Proof of Concept scheduler I created.  It offers the same functionality as the Abstract Scheduler but implements it through two new tables in the database.  I implemented this scheduler against the Cactus release; it is fully functional, but we have not done any performance analysis on it.

Would be very interested in any questions/comments/epithets anyone has about this idea.

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786


-----Original Message-----
From: nova-orchestration-bounces+donald.d.dugger=intel.com@xxxxxxxxxxxxxxxxxxx [mailto:nova-orchestration-bounces+donald.d.dugger=intel.com@xxxxxxxxxxxxxxxxxxx] On Behalf Of Sandy Walsh
Sent: Thursday, December 01, 2011 1:35 PM
To: nova-orchestration@xxxxxxxxxxxxxxxxxxx
Subject: [Nova-orchestration] Dec 1st 2011 IRC Meeting

http://eavesdrop.openstack.org/meetings/openstack-meeting/2011/openstack-meeting.2011-12-01-20.01.html

details:
http://eavesdrop.openstack.org/meetings/openstack-meeting/2011/openstack-meeting.2011-12-01-20.01.log.html

-- 
Mailing list: https://launchpad.net/~nova-orchestration
Post to     : nova-orchestration@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~nova-orchestration
More help   : https://help.launchpad.net/ListHelp

=== Policies & Constraints based Scheduler for OpenStack ===

== Overview ==

The basic architecture for this scheduler is described in the blueprint
at `http://wiki.openstack.org/PC_scheduler'.  This is a detailed description
of how the PC Scheduler is implemented and how to utilize it.

The basic idea is that the PC Scheduler will utilize policies and constraints
that are described in the SQL database table `policies'.  Policies and
constraints are just metrics that are evaluated for different hosts to see
which host is the best one to instantiate a job.  The difference between
a policy and a constraint is the way the metric is evaluated:

constraint - the metric is compared to a specific value and the constraint
  is only met if the comparison is true.  For example, a constraint could
  be that available memory must be greater than 4G.  With this constraint,
  a host that only has 2G of free memory would not be a candidate to
  host a job.

policy - the metric indicates how well the host can support the job.
  For example, a policy could be that the host with the most amount of
  free memory is where the job should be instantiated.

Constraints work on the blackball principle: if any one constraint is
not met then the job is not schedulable on that host, no matter what the
other constraints show.

Policies on the other hand are additive.  All of the policy metrics are
added up and the result is the goodness value for running the job on that
host.  The host with the largest goodness value is the host that will be
selected to run the job.

Note that it is very important to be careful about setting up constraints.
It is very easy to set up a set of constraints that can't be met by any
host which means that new jobs cannot be scheduled.  Obviously this is a
situation to be avoided at all costs.

== Policies table ==

The schema for the `policies' table is as follows:

--
-- Table structure for table `policies`
--

CREATE TABLE `policies` (
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `deleted_at` datetime DEFAULT NULL,
  `deleted` tinyint(1) DEFAULT NULL,
  `disabled` tinyint(4) DEFAULT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL DEFAULT '',
  `term` varchar(255) NOT NULL DEFAULT '',
  `op` varchar(15) DEFAULT NULL,
  `value` int(11) DEFAULT '0',
  `weight` int(11) DEFAULT '0',
  PRIMARY KEY (`id`)
);

Note that the columns `created_at', `updated_at', `deleted_at', `deleted',
`disabled' and `id' are required by the OpenStack database infrastructure
and are not utilized by the PC Scheduler.

The column `name' is used to identify the policies & constraints for
a particular class of jobs.  These classes are referred to in OpenStack
as `flavors'.  There can be multiple different classes of policies in
this table, each identified by a specific `name'.

`term' identifies a specific metric that will be used as part of a policy
or constraint.  The value of the metric identified by `term' is extracted
from the SQL database table `metrics', described later in this document.

`op' identifies the comparison operation to be done when comparing
`term' to the constraint value contained in the column `value'.  The
valid ops that can be used for constraints are:

	>   - greater than
	>=  - greater than or equal
	<   - less than
	<=  - less than or equal
	==  - equal
	!=  - not equal

If `op' contains `best' then `term' is part of a policy and there is
no comparison; the scheduler will just find the host which has the
best (i.e. numerically highest) value for this metric.
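
The operator strings above map naturally onto comparison functions.  The
following is a minimal Python sketch of that mapping, not the code from the
actual plugin: the names `CONSTRAINT_OPS', `constraint_met' and
`host_passes' are hypothetical, and treating a missing metric as a failed
constraint is an assumption the writeup does not state.

```python
import operator

# Maps the strings allowed in the `op' column to comparison functions.
CONSTRAINT_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def constraint_met(metric_value, op, value):
    """Return True if `metric_value op value' holds for a constraint row."""
    if op == "best":
        raise ValueError("`best' marks a policy term, not a constraint")
    return CONSTRAINT_OPS[op](metric_value, value)


def host_passes(host_metrics, constraints):
    """Return True only if the host meets every constraint.

    host_metrics: dict mapping term -> integer value for one host
    constraints:  list of (term, op, value) tuples
    Assumption: a metric missing for the host counts as a failed constraint.
    """
    return all(
        term in host_metrics and constraint_met(host_metrics[term], op, value)
        for term, op, value in constraints
    )
```

Because `all' stops at the first failure, one unmet constraint is enough to
reject the host regardless of the others.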

`weight' applies only to `term's that are policies and gives the relative
importance (e.g. a percentage) of a term.  When there are multiple
`term's that make up a policy the value for each `term' is multiplied
by its weight and then all the `term's in the policy are added up to
obtain the goodness value for a host.  The host with the greatest
goodness value will be the one selected to instantiate the job.
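
The weighted sum can be sketched in a few lines of Python.  This is an
illustration only: `goodness' and `best_host' are hypothetical names, the
weight is applied as a raw multiplier (dividing by 100 would not change the
ranking), and a metric missing for a host is assumed to contribute 0.

```python
def goodness(host_metrics, policy_terms):
    """Sum of metric value * weight over all policy terms for one host.

    host_metrics: dict mapping term -> integer value for one host
    policy_terms: list of (term, weight) tuples, i.e. rows whose op is `best'
    Assumption: a metric missing for the host contributes 0.
    """
    return sum(host_metrics.get(term, 0) * weight
               for term, weight in policy_terms)


def best_host(metrics_by_host, policy_terms):
    """Return the host name with the largest goodness value, or None."""
    if not metrics_by_host:
        return None
    return max(metrics_by_host,
               key=lambda host: goodness(metrics_by_host[host], policy_terms))
```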

An example should help to clarify how this works.  Imagine a `policies'
table with the following values:

mysql> select name,term,op,value,weight from policies;
+---------+---------+------+-------+--------+
| name    | term    | op   | value | weight |
+---------+---------+------+-------+--------+
| default | loadavg | best |     0 |    100 |
| compute | loadavg | best |     0 |    100 |
| compute | mem     | >    |   400 |      0 |
| compute | cpu     | <=   |    50 |      0 |
| compute | disk    | >    | 50000 |      0 |
+---------+---------+------+-------+--------+
5 rows in set (0.00 sec)

This table defines 2 different scheduling policies, `default' and
`compute'.  The `default' policy has no constraints and the goodness
value is calculated from one metric, `loadavg', which is given 100%
weight.

The `compute' policy uses the same metric to compute the goodness
value but it also has 3 constraints:

  mem > 400       Free memory must be greater than 400M
  cpu <= 50       CPU utilization must be less than or equal to 50%
  disk > 50000    Disk free space must be greater than 50G

Note that this table does not specify the units that the `value's
represent.  The units are a convention that must be maintained between
the `policies' table and the `metrics' table.  The actual values for
the metrics are contained in the `metrics' table, and whoever populates
the `policies' table is responsible for knowing what those units are
and putting a value expressed in the same units in the `value' column.

== Metrics table ==

The schema for the `metrics' table is as follows:

--
-- Table structure for table `metrics`
--
CREATE TABLE `metrics` (
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `deleted_at` datetime DEFAULT NULL,
  `deleted` tinyint(1) DEFAULT NULL,
  `disabled` tinyint(4) DEFAULT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `host` varchar(255) NOT NULL DEFAULT '',
  `metric` varchar(255) NOT NULL DEFAULT '',
  `value` int(11) DEFAULT '0',
  PRIMARY KEY (`id`)
);

Again the columns `created_at', `updated_at', `deleted_at', `deleted',
`disabled' and `id' are required by the OpenStack database infrastructure
and are not utilized by the PC Scheduler.

`host' identifies the specific host machine for a `metric'.

`metric' identifies a measurable item about that host, such as cpu utilization,
free memory, free disk space, etc.

`value' is the specific value for that metric.  This value is just an
integer representing units that are appropriate for the metric.  As
an example, free memory could be in units of kilo-bytes, mega-bytes or
giga-bytes.  It doesn't make any difference which units are used as long
as the associated values placed in the `policies' table represent the same
units.

As an example, this is what a `metrics' table could look like:

mysql> select host,metric,value from metrics;
+---------+---------+-------+
| host    | metric  | value |
+---------+---------+-------+
| ptah    | mem     |   400 |
| ptah    | cpu     |    50 |
| ptah    | disk    | 50000 |
| astarte | mem     |  1000 |
| astarte | cpu     |    10 |
| astarte | disk    | 70000 |
| astarte | loadavg |    40 |
| ptah    | loadavg |    94 |
+---------+---------+-------+
8 rows in set (0.00 sec)

In this table `mem' is in units of MBytes, `cpu' is percentage busy and
`disk' is in units of MBytes.  It is a requirement that any constraints in
the `policies' table take these units into account.
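
To see how the two example tables interact, the sketch below loads them
into an in-memory SQLite database and selects a host for each policy name.
It is a rough illustration, not the PC Scheduler itself: the function
`pick_host' is hypothetical, SQLite stands in for MySQL, and only the
columns the scheduler actually reads are created.

```python
import operator
import sqlite3

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def pick_host(conn, name):
    """Return the host with the best goodness among those meeting all constraints."""
    rules = conn.execute(
        "SELECT term, op, value, weight FROM policies WHERE name = ?",
        (name,)).fetchall()
    hosts = {}
    for host, metric, value in conn.execute(
            "SELECT host, metric, value FROM metrics"):
        hosts.setdefault(host, {})[metric] = value

    best, best_score = None, None
    for host in sorted(hosts):
        m = hosts[host]
        # Constraints: any single failure rejects the host.
        if not all(t in m and OPS[op](m[t], v)
                   for t, op, v, _w in rules if op != "best"):
            continue
        # Policies: weighted sum of the `best' terms.
        score = sum(m.get(t, 0) * w for t, op, _v, w in rules if op == "best")
        if best_score is None or score > best_score:
            best, best_score = host, score
    return best


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (name TEXT, term TEXT, op TEXT,"
             " value INTEGER, weight INTEGER)")
conn.execute("CREATE TABLE metrics (host TEXT, metric TEXT, value INTEGER)")
conn.executemany("INSERT INTO policies VALUES (?, ?, ?, ?, ?)", [
    ("default", "loadavg", "best", 0, 100),
    ("compute", "loadavg", "best", 0, 100),
    ("compute", "mem", ">", 400, 0),
    ("compute", "cpu", "<=", 50, 0),
    ("compute", "disk", ">", 50000, 0),
])
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", [
    ("ptah", "mem", 400), ("ptah", "cpu", 50), ("ptah", "disk", 50000),
    ("astarte", "mem", 1000), ("astarte", "cpu", 10),
    ("astarte", "disk", 70000), ("astarte", "loadavg", 40),
    ("ptah", "loadavg", 94),
])

print(pick_host(conn, "default"))  # ptah
print(pick_host(conn, "compute"))  # astarte
```

Under `default' the host `ptah' wins because its `loadavg' metric (94) is
higher than that of `astarte' (40).  Under `compute' `ptah' is rejected
because its `mem' and `disk' values only equal the limits, while the
constraints require strictly greater values, so `astarte' is selected.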

Note that the entries in the `metrics' table are updated by a plugin for
the compute nodes that runs once a minute.  This plugin can be
customized to provide any metrics desired, with the corresponding entries
in the `policies' table set to utilize the entries in the `metrics' table.

== PC Scheduler plugin ==

The Policies and Constraints Scheduler is implemented as a plugin to the
OpenStack scheduler.  The plugin is located in the file:

	nova/scheduler/pc.py

in the standard OpenStack installation path.  To enable the PC Scheduler
the user adds the following parameter to the `/etc/nova/nova.conf' file:

	--scheduler_driver=nova.scheduler.pc.PCScheduler

== Metrics plugin ==

Updating the contents of the `metrics' table is accomplished by a plugin
that is added to the compute subsystem of OpenStack.  A default plugin is
located in the file:

	nova/compute/loadavg.py

The default `loadavg' plugin is called as if there were a line in the
`/etc/nova/nova.conf' of the form:

	--metrics_update=nova.compute.loadavg.MetricsUpdate

A new plugin must define the class `MetricsUpdate' and provide a method for
the class called `update'.  The `update' method will be called every minute
with a single parameter, `host', which is a string containing the name of
the host machine.

The update method can set as many metrics in the `metrics' table as it
desires.  Updates to the metrics table are accomplished by calling the
routine `metrics_update_host' which takes 3 parameters:

	host    - the name of the host
	metric  - the metric to be placed into the table
	value   - the value of the metric

== loadavg - default metric plugin ==

The default plugin `loadavg' provides a policy metric that is based upon
the system load average.  The straight load average is not appropriate
for scheduling purposes since it is strictly a measure of the number of
active processes in the system and doesn't take into account the available
CPUs to process those processes.

The `loadavg' plugin takes the system load average and normalizes it to a
value between 0 and 100 to represent the activity on the system, where 100
is a totally idle system and 0 is a system with all CPUs fully active (i.e.
no spare CPU cycles to handle a new job).

The plugin determines how many CPUs are available in the system and, if the
loadavg is greater than the number of CPUs, sets the metric to 0 (the host is
fully active).  Otherwise the loadavg is subtracted from the number of CPUs
(indicating the CPU capacity available for new jobs) and the result is
scaled to a value between 0 and 100.
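
The normalization can be sketched as follows.  This is one plausible
reading of the description rather than the plugin's actual code: the
function names are hypothetical, the scaling is assumed to be linear in the
fraction of idle CPUs, and the 1-minute load average is assumed since the
writeup does not say which of the three averages is used.

```python
import os


def normalize_loadavg(loadavg, ncpus):
    """Scale a load average to 0..100, where 100 is idle and 0 is saturated."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    if loadavg >= ncpus:
        # No spare CPU capacity: the host is fully active.
        return 0
    # Spare capacity (ncpus - loadavg) as a percentage of all CPUs.
    return int((ncpus - loadavg) * 100 / ncpus)


def current_loadavg_metric():
    """Compute the metric for the local machine (Unix only)."""
    load1, _load5, _load15 = os.getloadavg()  # assumption: 1-minute average
    return normalize_loadavg(load1, os.cpu_count() or 1)
```

For example, a 4-CPU host with a load average of 1.0 gets a metric of 75,
and the same host with a load average of 5.0 gets 0.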
