
openstack team mailing list archive

Re: Pondering multi-tenant needs in nova.

 

On 2/7/11 2:49 PM, Eric Day wrote:
Thanks for explaining things further, Jay.

I agree if we want external systems poking into Nova for audit/billing
queries, then yes, this gets inefficient. My assumption is that
Nova-specific DBs only contain operational data required for production,
and that Nova would push billing/audit events to some external system that
can collect, aggregate, and answer those queries efficiently. Trying to
design a common data store that fits both use cases of provisioning
instances/networks/volumes along with handling queries for
billing/audit would be difficult (as we are seeing). Pushing
billing/audit data to another system gives us the flexibility to
choose the most suitable data store and querying abilities for each
use case without making sacrifices for the other.

Yes, this is the model proposed with the system-usage blueprint. Nova publishes usage data via a pub/sub interface, and a separate billing/audit system subscribes to those events and builds its datastore as it sees fit.
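
A minimal in-process sketch of that publish/subscribe split, in Python. The `UsageEvent` shape, `UsageBus`, and `BillingSubscriber` are hypothetical names, not part of the system-usage blueprint, and a real deployment would use a message queue rather than a list of callbacks. Nova only publishes; the billing side decides how to store and aggregate.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical event shape; the blueprint's real fields may differ.
    account_id: str
    resource: str
    hours: float


class UsageBus:
    """In-process stand-in for a real pub/sub transport (e.g. a message queue)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[UsageEvent], None]] = []

    def subscribe(self, handler: Callable[[UsageEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: UsageEvent) -> None:
        # Nova's side: emit the event; it never answers billing queries itself.
        for handler in self._subscribers:
            handler(event)


class BillingSubscriber:
    """External billing/audit system that keeps its own aggregate store."""

    def __init__(self) -> None:
        self.hours_by_account: DefaultDict[str, float] = defaultdict(float)

    def __call__(self, event: UsageEvent) -> None:
        self.hours_by_account[event.account_id] += event.hours


bus = UsageBus()
billing = BillingSubscriber()
bus.subscribe(billing)
bus.publish(UsageEvent("customerB-projectX", "instance-1", 2.0))
bus.publish(UsageEvent("customerB-projectX", "instance-2", 1.5))
print(dict(billing.hours_by_account))  # {'customerB-projectX': 3.5}
```

Because the subscriber owns its store, it can pick whatever database and indexes suit billing queries without touching Nova's schema.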


If the group consensus is to keep billing/usage data inside Nova
data stores, I agree we'll need a richer mapping of entities inside
of Nova that matches the external accounting system. This also leads
to a duplication of data with the relationships between entities,
since the organizational database (LDAP, ...) will contain this,
as well as Nova (and possibly each project), and we'll need to make
sure these stay in sync. For example, if the user<->project mapping
is used across projects, we'll need hooks so LDAP updates propagate
to all projects that duplicate this structure in a custom way.

If we want to go this route (billing data inside Nova), I've been
barking up the wrong tree for quite a while. :)  My vote would still
be pushing events to an OpenStack firehose and let billing/audit tap
into that, but of course I'll defer to the group decision.

-Eric

On Mon, Feb 07, 2011 at 03:22:06PM -0500, Jay Pipes wrote:
On Mon, Feb 7, 2011 at 2:46 PM, Eric Day<eday@xxxxxxxxxxxx>  wrote:
Perhaps we can be a bit more explicit about what the performance
issues will be. From your original email, you listed a few queries
with an example "X-Y-Z" string for the account ID. My answer to
those concerns is to let an account plugin that recognizes the format
parse that string, and to structure the SQL/LDAP/... backend so that
queries for those IDs are efficient.
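
A sketch of such a plugin, assuming a hypothetical 'provider-reseller-customer-project' format and a hypothetical `DashSeparatedAccountPlugin` class. The point is that the plugin, not Nova, knows the format and can store each component in an indexed form, so lookups become exact matches instead of string patterns.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ParsedAccount:
    provider: str
    reseller: str
    customer: str
    project: str


class DashSeparatedAccountPlugin:
    """Hypothetical plugin for IDs shaped like 'provider-reseller-customer-project'.

    Components are indexed separately (a dict keyed on the customer path here,
    standing in for indexed SQL columns or an LDAP subtree), so lookups are
    exact-match and never need LIKE patterns on the raw string.
    """

    def __init__(self) -> None:
        self._projects: Dict[Tuple[str, str, str], List[str]] = {}

    @staticmethod
    def parse(account_id: str) -> ParsedAccount:
        parts = account_id.split("-")
        if len(parts) != 4:
            raise ValueError(f"unrecognised account id: {account_id!r}")
        return ParsedAccount(*parts)

    def register(self, account_id: str) -> None:
        acct = self.parse(account_id)
        key = (acct.provider, acct.reseller, acct.customer)
        self._projects.setdefault(key, []).append(acct.project)

    def projects_for_customer(self, provider: str, reseller: str,
                              customer: str) -> List[str]:
        return list(self._projects.get((provider, reseller, customer), []))


plugin = DashSeparatedAccountPlugin()
plugin.register("rackspace-resellerA-customerB-projectX")
plugin.register("rackspace-resellerA-customerB-projectY")
print(plugin.projects_for_customer("rackspace", "resellerA", "customerB"))
# ['projectX', 'projectY']
```

A deployment with a different ID layout would ship a different plugin; Nova itself would only ever see the parsed result.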
You are thinking about it in the wrong "direction".  You're talking
about Nova asking an auth plugin for authorization information. I'm
talking about something entirely the opposite direction: plugins
asking *Nova* for information attached to a group of entities. If we
go down this proposed route, then Nova will have no way to efficiently
query its data stores for the kinds of questions plugins will ask.

For example, assume this scenario:

An organization uses the following structure:

Rackspace ->  Reseller A ->  Customer B ->  Project X

If a billing plugin wants to find all usage records for Customer B, it needs to:

* Query its internal data store for all projects at Customer B
* For each project in list, issue a request to Nova's (as yet
undefined) usage/audit API. Something like this would be executed by
the plugin:

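# get_list_of_projects_for_customer() and nova.client.do_request() are
# placeholders: the plugin's own data store and Nova's (as yet undefined)
# usage/audit API.
account_id = "rackspace-resellerA-customerB"
customer_id = account_id.split('-')[-1]
projects = get_list_of_projects_for_customer(customer_id)
records = []
for project in projects:
    account_string = account_id + '-' + project
    # One API request, and one query against Nova's data store, per project
    records.append(nova.client.do_request(
        'GET', '/usage/by-account-id/%s' % account_string))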

This is, of course, perfectly fine, but I think you can agree with me
that this is far from efficient.  Assume Customer B has 100 projects
(not unreasonable for a reseller-type organization). That is 100
requests to Nova's usage/audit API. And 100 SQL queries against the
Nova data store(s).

What I'm saying is that by adding a flexible org schema to Nova's data
store would allow the whole thing to be done in a single request to
the Nova usage/audit API:

GET /usage/by-account-id/customerB?include_children=1

With the include_children=1 param signalling to Nova to do a nested
sets query against the organization schema.

1 query instead of 100.

I hope that explains the performance difference adequately.
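
The single-query version can be sketched with Python's built-in `sqlite3` module. The `org` and `usage_records` tables, the `lft`/`rgt` bounds and the sample hours are assumptions for illustration, not Nova's schema; the sketch only shows that a nested-sets join returns Customer B's whole subtree in one statement, which is what an `include_children=1` request could map to.

```python
import sqlite3

# Hypothetical schema: org rows carry nested-set bounds, usage rows point at
# the org node (project, customer, ...) that incurred them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE org (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    lft  INTEGER NOT NULL,
    rgt  INTEGER NOT NULL
);
CREATE INDEX org_lft ON org (lft);
CREATE TABLE usage_records (
    org_id INTEGER NOT NULL REFERENCES org (id),
    hours  REAL NOT NULL
);
""")
conn.executemany(
    "INSERT INTO org (id, name, lft, rgt) VALUES (?, ?, ?, ?)",
    [
        (1, "rackspace", 1, 14),
        (2, "resellerA", 2, 13),
        (3, "customerB", 3, 8),
        (4, "projectX", 4, 5),
        (5, "projectY", 6, 7),
        (6, "customerC", 9, 12),
        (7, "projectZ", 10, 11),
    ],
)
conn.executemany(
    "INSERT INTO usage_records (org_id, hours) VALUES (?, ?)",
    [(4, 2.0), (5, 3.0), (7, 5.0)],
)

# Every node whose lft falls inside the parent's [lft, rgt] range is in its
# subtree, so one join replaces one request per project.
SUBTREE_USAGE_SQL = """
SELECT COALESCE(SUM(u.hours), 0.0)
FROM org AS parent
JOIN org AS child ON child.lft BETWEEN parent.lft AND parent.rgt
JOIN usage_records AS u ON u.org_id = child.id
WHERE parent.name = ?
"""


def subtree_usage(name: str) -> float:
    return conn.execute(SUBTREE_USAGE_SQL, (name,)).fetchone()[0]


print(subtree_usage("customerB"))  # 5.0
print(subtree_usage("resellerA"))  # 10.0
```

Compared with the per-project loop, this is one round trip and one statement no matter how many projects Customer B has.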

Now, that said, punting this to the external plugin system and
adapting the existing projects table to support multiple networks is
the simplest and shortest solution. I was just pointing out that this
solution, while simple, can lead to inefficiencies in the long run.

I also know that you are pushing hard for a fully-distributed data
store for Nova, and that distributed data storage would at least
mitigate some of the inefficiency concerns (because the distributed
data store would force the 100 queries anyway...).

But, I wanted to air my concerns about efficient querying of this kind
of data up front.

Cheers,

jay

Let's use the term 'entity' for 'account'. An entity could also be
a project or some other org node.

If we step back and look at what Nova needs from such an API, it's
really just "authenticate with ID X and token Y", and if successful you
get back a list of entities that the requested ID is part of (perhaps
an 'account' and 'project', but those are deployment-specific terms).

For simple deployments with no account structure, an ID may just be
a direct entity reference, and the returned entity list contains a
single entity for the ID.

The account plugin layer will convert the given ID to the list of
entities by parsing the string and performing an efficient lookup
specific to the format/deployment. The existence of a returned
entity may be sufficient for authz purposes alone, or we may attach
project-specific key/value pairs or a comma-separated list value that
can store more fine-grained permissions per entity (CRUD flags). I'm
thinking this would be kept opaque in the backend account system
(generic metadata per project), and interpreted when loaded in the
nova auth/account plugins to give efficient matching operations.
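
The plugin contract described above (authenticate an ID and token, get back a list of entities, decode opaque per-entity metadata into CRUD flags) might look roughly like this. `Entity`, `decode_perms`, `InMemoryAccountPlugin`, and the comma-separated flag format are all hypothetical; a real plugin would read from LDAP, SQL, or whatever the deployment uses.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

ALL_PERMS: FrozenSet[str] = frozenset({"c", "r", "u", "d"})


@dataclass(frozen=True)
class Entity:
    name: str
    perms: FrozenSet[str]  # fine-grained CRUD flags decoded by the plugin


def decode_perms(raw: str) -> FrozenSet[str]:
    """Interpret opaque backend metadata such as 'c,r,u' as a set of flags."""
    return frozenset(flag.strip() for flag in raw.split(",") if flag.strip())


class InMemoryAccountPlugin:
    """Hypothetical account plugin backed by plain dicts."""

    def __init__(self, tokens: Dict[str, str],
                 memberships: Dict[str, Dict[str, str]]) -> None:
        self._tokens = tokens            # id -> secret token
        self._memberships = memberships  # id -> {entity name: raw metadata}

    def authenticate(self, ident: str, token: str) -> Optional[List[Entity]]:
        expected = self._tokens.get(ident)
        # Constant-time comparison; unknown IDs and bad tokens both fail.
        if expected is None or not hmac.compare_digest(expected.encode(), token.encode()):
            return None
        raw = self._memberships.get(ident)
        if not raw:
            # Simple deployment: the ID is a direct entity reference.
            # Granting it full rights here is an assumption of this sketch.
            return [Entity(ident, ALL_PERMS)]
        return [Entity(name, decode_perms(meta)) for name, meta in raw.items()]


plugin = InMemoryAccountPlugin(
    {"alice": "t0k", "bob": "b0b"},
    {"alice": {"projectX": "c,r,u,d", "customerB": "r"}},
)
ents = plugin.authenticate("alice", "t0k")
print(sorted(e.name for e in ents))  # ['customerB', 'projectX']
```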

The nova.context object would then have a list of entities that can
be used for all authorization requests. For example, if an instance
belongs to entity A and B (where A may be a user and B may be a project
in this deployment), then operations are verified against the list
of context entities to ensure one of them allows it. This is pretty
much how it works today, just replacing 'user' and 'project' columns
on resources with an arbitrary mapping of entities per resource.
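
The context-based check could be sketched as follows. `RequestContext` is a hypothetical stand-in for `nova.context`, and each resource maps to an arbitrary set of owning entities instead of fixed user/project columns. Operations are single-letter CRUD flags ('r' for read, 'd' for delete), which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Entity:
    name: str
    perms: FrozenSet[str]  # e.g. {"r"} for read-only


@dataclass
class RequestContext:
    """Hypothetical stand-in for nova.context: the caller's entity list."""
    entities: List[Entity]

    def can(self, op: str, owners: FrozenSet[str]) -> bool:
        # Allowed if at least one context entity owns the resource
        # and grants the requested operation.
        return any(e.name in owners and op in e.perms for e in self.entities)


# Each resource maps to an arbitrary set of owning entities (here a user
# and a project) instead of dedicated 'user' and 'project' columns.
resource_owners: Dict[str, FrozenSet[str]] = {
    "instance-1": frozenset({"userA", "projectB"}),
}

ctx = RequestContext([Entity("projectB", frozenset({"r"}))])
print(ctx.can("r", resource_owners["instance-1"]))  # True
print(ctx.can("d", resource_owners["instance-1"]))  # False
```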

What are some other specific queries that Nova or other projects need
to make that you feel would be inefficient or would result in hacks?

-Eric

On Mon, Feb 07, 2011 at 12:57:09PM -0500, Jay Pipes wrote:
I give up trying to explain why I think this will lead to suboptimal
performance and hacks all over the external ecosystem.

If the decision is to punt and let the external "auth systems" be
responsible for understanding the relationships between accounts and
users in Nova, so be it.

I'll just predict the performance issues up front and call it a day.

Cheers,
jay

On Mon, Feb 7, 2011 at 12:46 PM, Eric Day<eday@xxxxxxxxxxxx>  wrote:
On Mon, Feb 07, 2011 at 08:50:28AM -0500, Jay Pipes wrote:
No, I think you've missed my point. Comments inline...
Actually, I think I did get all your points, we're just not connecting
somewhere. :)

On Mon, Feb 7, 2011 at 12:35 AM, Eric Day<eday@xxxxxxxxxxxx>  wrote:
I disagree with your disagreement. :)

When we have string-based IDs like this, it doesn't need to translate
directly into a varchar column for operations. First, auth data may not
be stored as SQL at all for some systems and could be broken out into
key/value pairs with some indexed. It could also translate directly
into a LDAP hierarchy which can be tuned to be very efficient. For
SQL-based auth storage, this could remain pluggable according to
how the organization creates the string. For example, the string may
be broken out into parts for the auth lookup and mapped to various
columns/tables to search/join together efficiently.
I wasn't talking at all about auth. Accounts != auth. Accounts are a
way to group users and groups of users in order to assign arbitrary
attributes to that group of things (think: billing attributes, policy
attributes (like preference of networking topology, quality of
service, etc)).
I think we need to push accounts, authz, authc, access
control, ... into the auth/accounts API (nova.auth). Actually,
openstack-common/auth, but that's another email. :)

We can structure the account objects returned by this API to
provide all the information we need, and then allow limited update
functionality on them. You want to be using your org's auth-system API
to manage that data, not Nova's API. The openstack accounts API just
peeks into it getting all the information it needs.

Having an opaque account identifier in Nova means that Nova is
essentially "giving up" on trying to have an efficient, standardized
query interface for accounts. If this is what is wanted, so be it, but
I think I adequately pointed out the efficiency problems of that
approach.  More below...
Yes and no. Nova should not have this, an account management system
should. Nova (and other projects) needs an API into this that returns
objects that work for our various use cases. For example, at the authc
step, you may get back a list of entity objects that represent all
the users/groups/whatever it maps to (or possibly a single aggregate
object), and you may have some generic, limited operations on those
objects that can write back to the pluggable store. Think of this
mostly as a read-only API.

The examples below
where you have 'X-Y-Z' format are assuming a certain structure/layout,
Ah, and this is where you would be incorrect. The SQL structure I
specified can accommodate virtually any structure, not just X-Y-Z, and
Ok, when you were doing prefix/suffix matching searches, that was
assuming something, but if those SQL queries were meant to be behind
an auth plugin, then n/m. :)

My main point is this could be a big base64 string, a comma-separated
entity list, a number, a snowman, anything.

that's why I was proposing it.  Having a string-based all-in-one key
actually forces Nova to expose an API that doesn't know how to query
for information properly. Like I said, if the decision is for ALL
account "stuff" to be handled externally, then Nova shouldn't even
have an API to get information via account, since it doesn't know what
an account actually is. If the "external only" is the decision, then
That's what I am proposing, nova-core shouldn't concern itself with
stats, billing, reporting, and advanced user management. I agree with
Swift's approach.

you might as well just add a field to the projects table called
"account_tag", make it a TEXT field, and have it output as-is in the
data retrieval APIs. The one-network-per-project is a wholly separate
issue that can be addressed by reworking the networking code to allow
"projects" to have multiple networks, but, as I said in the original
post, I think the whole concept of a project in Nova right now is too
restrictive and could be replaced by the model I showed.

but I think we should treat the string as completely opaque outside the
auth plugins and let the auth plugins perform optimized translation
and lookup into auth and access control objects that are used in the
rest of the code for the various projects (much like it is today).
Again, this has nothing to do with auth at all.
I was assuming auth/accounts were wrapped up in the same API. So call
it openstack accounts API, which also manages auth*.

The root of the issue is the projects table in Nova. It only works for
the most basic organizational structure. If it could be
adapted/replaced by a model that can represent a much wider range of
organizational structures, that would be my ideal solution.
Yes, it needs to go, and I think the structure you provide is along
the lines of what we need to do with account objects in code. Just turn
your proposed SQL tables into Python classes and put a well-defined API
around it so it can be backed by anything and we'll be in agreement. We
don't want to tie ourselves to SQL, as it should not be first class
(LDAP, NoSQL, ...).
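
Turning the tables into Python classes behind a backend-neutral API could start like this sketch. `OrgStore` and `InMemoryOrgStore` are hypothetical names; the generic `descendants()` walk is only a fallback that a SQL backend might override with a nested-sets query, or an LDAP backend with a subtree search.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class OrgStore(ABC):
    """Hypothetical backend-neutral accounts API (SQL, LDAP, NoSQL, ...)."""

    @abstractmethod
    def parent(self, node: str) -> Optional[str]:
        """Return the immediate parent, or None for a root."""

    @abstractmethod
    def children(self, node: str) -> List[str]:
        """Return the immediate children."""

    def descendants(self, node: str) -> List[str]:
        # Generic depth-first fallback built on children(); backends can
        # override it with something cheaper (nested sets, LDAP subtree).
        result: List[str] = []
        stack = list(reversed(self.children(node)))
        while stack:
            current = stack.pop()
            result.append(current)
            stack.extend(reversed(self.children(current)))
        return result


class InMemoryOrgStore(OrgStore):
    """Toy backend holding parent pointers in a dict."""

    def __init__(self, parents: Dict[str, Optional[str]]) -> None:
        self._parents = parents

    def parent(self, node: str) -> Optional[str]:
        return self._parents.get(node)

    def children(self, node: str) -> List[str]:
        return [n for n, p in self._parents.items() if p == node]


store = InMemoryOrgStore({
    "rackspace": None,
    "resellerA": "rackspace",
    "customerB": "resellerA",
    "projectX": "customerB",
    "projectY": "customerB",
})
print(store.descendants("resellerA"))  # ['customerB', 'projectX', 'projectY']
```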

-Eric

-jay

-Eric

On Sun, Feb 06, 2011 at 09:57:56AM -0500, Jay Pipes wrote:
Strongly disagree, but nicely, of course :)

I'll disagree by showing you an example of why not having a queryable
org model is problematic:

Let's say we go ahead and do what Glen suggests and have a string
account ID that is then attached to the user in a one-to-many
relationship.

In SQL (MySQL variant below), this is represented like so:

# Our existing users table:
CREATE TABLE users (
   id VARCHAR(255) NOT NULL PRIMARY KEY,
   access_key VARCHAR(255) NOT NULL,
   secret_key VARCHAR(255) NOT NULL,
   is_admin TINYINT NOT NULL DEFAULT 0
);

# Proposed accounts table, with string based tag-like account identifier:
CREATE TABLE accounts (
   id VARCHAR(255) NOT NULL PRIMARY KEY,
   user_id VARCHAR(255) NOT NULL,
   FOREIGN KEY fk_users (user_id) REFERENCES users (id)
);

Now let's say that we store account IDs like this: enterprise-dept-milestone.

How would we get all accounts in Enterprise X? Easily and efficiently:

SELECT id FROM accounts WHERE id LIKE "X-%"

How would we get all accounts in Enterprise X and Dept Y? Again, it
would be easy and efficient:

SELECT id FROM accounts WHERE id LIKE "X-Y-%"

But, what happens if multiple departments can work on the same
milestone (a common requirement)?

How do we query for all accounts in Enterprise X and Milestone Z?

The SQL would be horrific, and with millions of records it would bog the
reporting system down (trust me):

SELECT id FROM accounts WHERE id LIKE "X-%-Z"

The above query can use an index only for its leading "X-" prefix, so it
must examine every account in Enterprise X; for a large enterprise that
approaches a scan of the entire accounts table. An organization like
Rackspace would theoretically have millions of account records
(# customers + (# customers x # customer "projects") + (# resellers x
# reseller customers) + (# reseller customers x # reseller customer
"projects")).

The "simpler" query of getting all accounts working on a milestone is
even worse, because the leading wildcard rules out any index use and
forces a full table scan:

SELECT id FROM accounts WHERE id LIKE "%-Z"

The above query also has the side-effect of introducing subtle bugs
when, and this will happen because of Murphy's law, accounts called
"Rackspace-Accounting" and "Rackspace-IT-Accounting" are created.
Now a query for the "Accounting" milestone returns the IT department's
"Accounting" milestone and, erroneously, the account for the Accounting
department itself.

While it may seem nice and easy to put string-based, loose tags into
the system, this decision is extremely difficult to reverse when made,
and it leads to inefficiencies in the querying of the system and
subtle query bugs as noted above.
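
The false match is easy to reproduce. This sketch uses Python's built-in `sqlite3` module and made-up account IDs; SQLite's `LIKE` is not MySQL's, but the `%` wildcard behaves the same way in both.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO accounts (id) VALUES (?)",
    [("Rackspace-Accounting",),     # enterprise-dept (no milestone)
     ("Rackspace-IT-Accounting",),  # enterprise-dept-milestone
     ("Rackspace-IT-Launch",)],
)


def ids(pattern: str) -> List[str]:
    rows = conn.execute(
        "SELECT id FROM accounts WHERE id LIKE ? ORDER BY id", (pattern,))
    return [r[0] for r in rows]


# Prefix patterns behave as intended...
print(ids("Rackspace-IT-%"))  # ['Rackspace-IT-Accounting', 'Rackspace-IT-Launch']
# ...but "every account on the Accounting milestone" also matches the
# Accounting department, because '%' cannot tell segments apart.
print(ids("%-Accounting"))    # ['Rackspace-Accounting', 'Rackspace-IT-Accounting']
```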

A more robust way of structuring the schema is like so, again in the
MySQL SQL variant:

# Our existing users table:
CREATE TABLE users (
   id VARCHAR(255) NOT NULL PRIMARY KEY,
   access_key VARCHAR(255) NOT NULL,
   secret_key VARCHAR(255) NOT NULL,
   is_admin TINYINT NOT NULL DEFAULT 0
);

# Organizations are collections of users that can contain other organizations
CREATE TABLE organization (
   id INT NOT NULL PRIMARY KEY,
   user_id VARCHAR(255) NOT NULL,
   parent INT NULL, # Adjacency list model enables efficient child and parent lookups
   `left` INT NULL, # left and right enable the nested sets model that enables
   `right` INT NULL, # equally efficient lookups of more complex relationships
   # (LEFT and RIGHT are reserved words in MySQL, hence the backticks)
   FOREIGN KEY fk_users (user_id) REFERENCES users (id)
);

The above structure can accommodate both simple (get my immediate
parent or immediate children) queries and complex queries (get ALL my
children, aggregate querying across the entire tree or subtrees) and
do so efficiently. The query API interface that we expose via Nova
(that would be consumed by some reporting/audit/management tools)
would therefore not be a serious drain on the databases storing Nova
data.
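
To show both halves of that schema at work, here is a sketch using Python's built-in `sqlite3` module on a simplified `organization` table (name, parent, lft, rgt; the `user_id` column is dropped, and `lft`/`rgt` stand in for `left`/`right`). The `nested_set_bounds` helper, which numbers nodes with a depth-first walk, and the sample tree are illustrative assumptions.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Sample tree stored as parent pointers (the adjacency list).
PARENTS: Dict[str, Optional[str]] = {
    "Rackspace": None,
    "ResellerA": "Rackspace",
    "CustomerB": "ResellerA",
    "ProjectX": "CustomerB",
    "ProjectY": "CustomerB",
}


def nested_set_bounds(parents: Dict[str, Optional[str]]) -> Dict[str, Tuple[int, int]]:
    """Assign (lft, rgt) with a depth-first walk: lft on entry, rgt on exit."""
    children: Dict[Optional[str], List[str]] = {}
    for node, parent in parents.items():
        children.setdefault(parent, []).append(node)
    bounds: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        counter += 1
        lft = counter
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (lft, counter)

    for root in children.get(None, []):
        visit(root)
    return bounds


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organization "
             "(name TEXT PRIMARY KEY, parent TEXT, lft INTEGER, rgt INTEGER)")
bounds = nested_set_bounds(PARENTS)
conn.executemany(
    "INSERT INTO organization (name, parent, lft, rgt) VALUES (?, ?, ?, ?)",
    [(name, parent, *bounds[name]) for name, parent in PARENTS.items()],
)


def immediate_children(name: str) -> List[str]:
    # Adjacency list: a plain equality lookup on the parent column.
    rows = conn.execute(
        "SELECT name FROM organization WHERE parent = ? ORDER BY lft", (name,))
    return [r[0] for r in rows]


def all_descendants(name: str) -> List[str]:
    # Nested sets: descendants sit strictly inside the ancestor's (lft, rgt).
    rows = conn.execute(
        "SELECT c.name FROM organization AS p "
        "JOIN organization AS c ON c.lft > p.lft AND c.rgt < p.rgt "
        "WHERE p.name = ? ORDER BY c.lft", (name,))
    return [r[0] for r in rows]


print(immediate_children("Rackspace"))  # ['ResellerA']
print(all_descendants("ResellerA"))     # ['CustomerB', 'ProjectX', 'ProjectY']
```

Reads stay cheap, but inserting or moving a node means renumbering the lft/rgt values of every node to its right, which is the usual price of the nested sets model.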

More information on the adjacency list and nested sets models is
available here:

http://en.wikipedia.org/wiki/Nested_set_model
http://en.wikipedia.org/wiki/Adjacency_list_model

I'd highly recommend this solution as opposed to the seemingly simple
tag-based solution that leads to gross querying inefficiencies and
subtle bugs.

Just my two cents.

-jay

On Thu, Feb 3, 2011 at 7:38 PM, John Purrier<john@xxxxxxxxxxxxx>  wrote:
I think Glen is on the right track here. Having the account_ID be a string
with no connotation for Nova provides two benefits: 1) deployments can create
the arbitrary organizational models that fit their particular DC, physical,
and logical structures, and 2) the Nova code is simpler as the hierarchical
concepts do not have any manifestations in the code.

An additional benefit is an easier mapping to the particular identity and
authorization system that a deployment chooses to use.

John

From: openstack-bounces+john=openstack.org@xxxxxxxxxxxxxxxxxxx
[mailto:openstack-bounces+john=openstack.org@xxxxxxxxxxxxxxxxxxx] On Behalf
Of Glen Campbell
Sent: Thursday, February 03, 2011 2:42 PM
To: Devin Carlen; Monsyne Dragon
Cc: openstack@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Openstack] Pondering multi-tenant needs in nova.

I think that this could be done in the current proposal. Specifically, the
account_id is an arbitrary string that is generated externally to Nova. You
could, for example, easily encode an organizational hierarchy in it; an
account_id could be:

enterprise-org-project-milestone

From Nova's point of view, it makes no difference, so long as that string is
associated with a usage event and regurgitated when reported. The cloud
administrator can interpret it however they choose. For simple organizations,
it could be identical to the project_id, or even just blank. The project_id
holds the network information, and the account_id tracks the usage and other
notifications.

There's no good reason for Nova to have to model an organization internally;
it certainly wouldn't match all the possible org structures available.
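
A sketch of that division of labor, with hypothetical names (`UsageRecord`, `nova_report`, `admin_rollup`) and the example 'enterprise-org-project-milestone' format: Nova stores and echoes account_id verbatim, and only the administrator's tooling splits it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UsageRecord:
    project_id: str        # still owns the network information
    account_id: str        # opaque to Nova; may equal project_id or be blank
    instance_hours: float


def nova_report(records: List[UsageRecord]) -> List[Dict[str, object]]:
    # Nova's side: regurgitate account_id exactly as received.
    return [{"account_id": r.account_id, "instance_hours": r.instance_hours}
            for r in records]


def admin_rollup(report: List[Dict[str, object]], depth: int) -> Dict[str, float]:
    """Administrator's side: group on the first `depth` dash-separated
    fields of 'enterprise-org-project-milestone' (this sketch's assumption)."""
    totals: Dict[str, float] = {}
    for row in report:
        key = "-".join(str(row["account_id"]).split("-")[:depth])
        totals[key] = totals.get(key, 0.0) + float(row["instance_hours"])
    return totals


records = [
    UsageRecord("proj1", "acme-eng-web-m1", 2.0),
    UsageRecord("proj1", "acme-eng-db-m1", 1.0),
    UsageRecord("proj2", "acme-ops-web-m2", 4.0),
]
print(admin_rollup(nova_report(records), 2))  # {'acme-eng': 3.0, 'acme-ops': 4.0}
```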

From: Devin Carlen<devin.carlen@xxxxxxxxx>
Date: Thu, 3 Feb 2011 12:02:38 -0800
To: Monsyne Dragon<mdragon@xxxxxxxxxxxxx>
Cc:<openstack@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Openstack] Pondering multi-tenant needs in nova.

We were just talking about this the other day. We definitely need some kind
of further hierarchy. I think a typical kind of use case for multi-tenant
could be something like:

Enterprise contains Organizations
Organizations contain Organizations and Projects
Projects contain Instances, etc.

In this structure enterprise is just a top-level organization. If we
structure it this way it would make metering and billing pretty simple.
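
One way to picture why this makes metering simple: with the hierarchy as nested objects, the total for any node is a recursive sum. The `Organization` and `Project` classes and the per-instance hours are hypothetical, purely to illustrate the structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Project:
    name: str
    instance_hours: Dict[str, float] = field(default_factory=dict)

    def total(self) -> float:
        return sum(self.instance_hours.values())


@dataclass
class Organization:
    """An enterprise is simply a top-level Organization."""
    name: str
    organizations: List["Organization"] = field(default_factory=list)
    projects: List[Project] = field(default_factory=list)

    def total(self) -> float:
        # Metering rolls up through projects and nested organizations.
        return (sum(p.total() for p in self.projects)
                + sum(o.total() for o in self.organizations))


enterprise = Organization(
    "Enterprise",
    organizations=[
        Organization(
            "Engineering",
            projects=[Project("web", {"i-1": 10.0, "i-2": 5.0})],
            organizations=[Organization("QA", projects=[Project("ci", {"i-3": 2.5})])],
        )
    ],
    projects=[Project("shared", {"i-4": 1.0})],
)
print(enterprise.total())  # 18.5
```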

On Feb 2, 2011, at 5:37 PM, Monsyne Dragon wrote:



I am sorting out some possible implementations for the
multi-tenant-accounting blueprint, and the related system-usage-records bp,
and I just wanted to run this by anyone interested in such matters.

Basically, for multitenant purposes we need to introduce the concept of an
'account' in nova, representing a customer, that acts as a label for a
group of resources (instances, etc.) and as a unit of access control (i.e.
customer A cannot mess with customer B's stuff).

There was some confusion on how best to implement this, in relation to
nova's project concept.  Projects are kind of like what we want an account
to be, but there are some associations (like one project per network) which
are not valid for our flat networking setup.  I am kind of straw-polling on
which is better here:

The options are:
1) Create a new 'account' concept in nova, with an account basically being
a subgroup of a project (providers would use a single, default project, with
additional projects added if needed for separate brands, or resellers, etc.),
add in access control per account as well as per project, make sure the
APIs/auth specify the account appropriately, and have some way for a default
account to be used (per project) so accounts don't get in the way for
non-multitenant users.

2) Have account == nova's "project", and change the network
associations, etc. so projects can support our model (as well as current
models). Support for associating accounts (projects) together for
resellers, etc. would either be delegated outside of nova or added later
(it's not a current requirement).

In either case, accounts would be identified by name, which would be an
opaque string that an outside system/person would assign and could structure
to their needs (i.e. for associating accounts with common prefixes, etc.).

--
    -Monsyne Dragon
    work:         210-312-4190
    mobile:       210-441-0965
    google voice: 210-338-0336



Confidentiality Notice: This e-mail message (including any attached or
embedded documents) is intended for the exclusive and confidential use of the
individual or entity to which this message is addressed, and unless otherwise
expressly indicated, is confidential and privileged information of Rackspace.
Any dissemination, distribution or copying of the enclosed material is prohibited.
If you receive this transmission in error, please notify us immediately by e-mail
at abuse@xxxxxxxxxxxxx, and delete the original message.
Your cooperation is appreciated.


_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


