nova team mailing list archive

Re: ORM Refactor

To: Jesse Andrews <anotherjesse@xxxxxxxxx>
From: Eric Day <eday@xxxxxxxxxxxx>
Date: Fri, 10 Sep 2010 11:06:45 -0700
Cc: nova@xxxxxxxxxxxxxxxxxxx
In-reply-to: <AANLkTikv4pR8S5aGgSZzLYsHRV-jL0jCQaru9gUZbia-@mail.gmail.com>
User-agent: Mutt/1.5.20 (2009-06-14)
I'd like to say that dropping Redis support is fine with me for
now. It has its own issues for this type of application, and I
didn't really see it as a long-term permanent data store for Nova. I'm
fine with the SQLAlchemy switch, and we can always plug in new KV
stores if SQL doesn't work out. Even for large cloud installations,
we're still dealing with a modest amount of data, something that is
pretty easy to handle with most SQL databases.

We've also talked about distributing the data into many small stores
and providing aggregate layers for this to handle both security and
scalability concerns, which is another reason to drop Redis. This
model would probably be best implemented as a number of small SQLite
databases on each compute/network/... worker that push relevant
information up to the scheduler and API layers. We're closer to
getting to this model with the ORM/SQLAlchemy branch.
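
A rough sketch of the model Eric describes: one small SQLite database per worker, with an aggregate layer above them that only sees pushed summaries. The table, column, and function names are hypothetical, and in a real deployment the workers would push their summaries over the network rather than being read in-process.

```python
import sqlite3


def make_worker_db(host, instances):
    """Hypothetical per-worker store: each compute worker owns its own SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (name TEXT, vcpus INTEGER)")
    conn.executemany("INSERT INTO instances VALUES (?, ?)", instances)
    conn.commit()
    return host, conn


def summarize(worker):
    """Build the small summary a worker would push upward (not every row)."""
    host, conn = worker
    count, vcpus = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(vcpus), 0) FROM instances"
    ).fetchone()
    return {"host": host, "instances": count, "vcpus_used": vcpus}


def aggregate(workers):
    """Aggregate layer (e.g. for a scheduler): least-loaded host first."""
    return sorted((summarize(w) for w in workers), key=lambda s: s["vcpus_used"])
```

The scheduler only ever sees the per-host summaries, which is the security and scalability property the paragraph above is after.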

I agree with merging the ORM branch soon to avoid merge conflicts. We
can still iterate and clean things up as needed, but I think this is
a step in the right direction.

-Eric

On Fri, Sep 10, 2010 at 10:57:45AM -0700, Jesse Andrews wrote:
> To be clear, we aren't giving up on Redis or NoSQL.
> 
> Both the intermediate layer for the data API and the choice of not using
> the SQLAlchemy models outside of the data API help contain the "battle"
> between SQL/NoSQL to the data access layer.
> 
> The objects passed back from the data API are treated as
> dictionaries, not SQLAlchemy objects.
> 
> Jesse
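
Jesse's point about handing callers dictionaries rather than ORM objects can be illustrated roughly as follows. This minimal sketch uses Python's standard `sqlite3` module instead of SQLAlchemy; the `instances` table and the `instance_get` name are hypothetical, not Nova's actual data API.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical instances table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows support lookup by column name
    conn.execute(
        "CREATE TABLE instances (id INTEGER PRIMARY KEY, hostname TEXT, state TEXT)"
    )
    conn.execute(
        "INSERT INTO instances (hostname, state) VALUES (?, ?)", ("vm-1", "running")
    )
    conn.commit()
    return conn


def instance_get(conn, instance_id):
    """Hypothetical data-API call: callers get a plain dict, never a
    backend-specific row or ORM object, so the store behind it can change."""
    row = conn.execute(
        "SELECT id, hostname, state FROM instances WHERE id = ?", (instance_id,)
    ).fetchone()
    if row is None:
        raise KeyError(instance_id)
    return dict(row)  # detach the result from the database layer
```

Because nothing outside the data layer ever touches a row or model object, a SQL or key-value backend can sit behind the same function.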
> 
> On Fri, Sep 10, 2010 at 10:51 AM, Justin Santa Barbara
> <justin@xxxxxxxxxxxx> wrote:
> > I did some early work on abstracting out the data store. There were
> > several problems with the Redis implementation:
> >
> > * It seemed clear to me that we were effectively re-implementing a
> >   relational database on top of Redis. For example, there were
> >   secondary indexes that needed to be maintained by hand.
> > * Several operations that needed to be done atomically were not being
> >   done atomically, so the code was not technically correct and data
> >   integrity was suspect. (I suppose this is a sub-point of the first
> >   point.)
> > * The Redis code was very vulnerable to the 1+N select problem: when
> >   selecting a group of objects, we would do one select to get the list
> >   of IDs, and then a further select to get each object by ID.
> > * The schema-less nature made me very uncomfortable. I felt that as the
> >   project grew this would become unsustainable and a huge source of
> >   bugs, particularly in version migrations.
> > * It seemed that reporting against Redis would be difficult. Some
> >   unfortunate developer would therefore have to code up reports against
> >   Redis, instead of just being able to run SQL queries or point
> >   something like Excel or Crystal Reports at it
> >   (http://blog.koehntopp.de/uploads/mapreduce.png).
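
The 1+N select problem described in the list above can be illustrated with a toy key-value store that counts round trips. The `FakeKV` class and its key scheme are hypothetical stand-ins, not the original Redis code; the SQL side uses the standard-library `sqlite3` module.

```python
import sqlite3


class FakeKV:
    """Hypothetical key-value store that counts round trips to the server."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)


def load_kv(kv, project):
    # One request for the ID list (a hand-maintained secondary index)...
    ids = kv.get(f"project:{project}:instances") or []
    # ...then one more request per ID: 1 + N round trips in total.
    return [kv.get(f"instance:{i}") for i in ids]


def load_sql(conn, project):
    # A single query returns every matching row in one round trip.
    return conn.execute(
        "SELECT id, hostname FROM instances WHERE project = ? ORDER BY id",
        (project,),
    ).fetchall()
```

With three instances in one project, `load_kv` makes four round trips where `load_sql` makes one, and the gap grows linearly with the number of objects.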
> >
> > It seems to me that the only user to have deployed Redis in production
> > so far (NASA) has decided it's unsuitable; that Redis is technically
> > not fit for (our) purpose for the reasons above; and that private
> > (enterprise) clouds will prefer traditional databases with which they
> > are comfortable. So it seems the only potential use case for Redis is
> > public clouds (Rackspace), for reasons of scalability.
> >
> > My real hope was that we would be able to have both Redis and SQL
> > implementations, and we'd show that not only did Redis have all these
> > problems, but we didn't get anything in return: it would be both slower
> > (because of 1+N) and less scalable (because of the need to keep all the
> > keys in memory); we'd then deprecate Redis. However, we need to stay
> > focused on Nova and not on proving a SQL/NoSQL point. If we know what
> > the outcome will be, let's just go with the right choice and not expend
> > effort on what is likely to be a technical dead-end. If someone wants
> > to write a Redis back-end so that it can be benchmarked and deprecated,
> > that's great; otherwise I think we should merge the patch and forget
> > about NoSQL.
> >
> > If we let Redis get into V1, then we're stuck supporting it, and we'll
> > have to solve all the above problems. I would prefer that development
> > effort be focused on building IaaS, not a relational DB on top of a
> > key-value store.
> >
> > Justin
> >
> >
> >
> > On Fri, Sep 10, 2010 at 10:11 AM, Rick Clark <rick@xxxxxxxxxxxxx> wrote:
> >>
> >> Thanks, Jay.
> >>
> >> This covers my feelings pretty well. I am also concerned that it is a
> >> 180-degree turn three weeks before feature freeze. I like the
> >> abstraction, but I would like us to keep the support for Redis. I think
> >> SQL is critical for the enterprise and private clouds, but at
> >> Rackspace's scale, especially with regards to globalization, I think we
> >> are going to need some kind of keystore.
> >>
> >> My feeling is that we put this in Austin +1 (the release after Austin)
> >> and add support for other datastores. That will also give us time to
> >> write up a blueprint and have an in-depth discussion about it at the
> >> summit in November.
> >>
> >> I have added this to the agenda for the next release meeting on Sept 14.
> >>
> >> Rick
> >>
> >> On 09/10/2010 11:56 AM, Jay Pipes wrote:
> >> > Hi Vish,
> >> >
> >> > Such a large patch has taken me quite some time to digest.  There is a
> >> > larger discussion on large patches without any specifications, but
> >> > I'll leave that for a later time! :)
> >> >
> >> > I am torn on this one, mostly because I spent a bunch of time
> >> > attempting to do the datastore refactoring myself (as did Justin Santa
> >> > Barbara), and thus I know the dragons that live in this layer of the
> >> > code :)
> >> >
> >> > One of the things that both Justin and I had tried was to keep an
> >> > abstraction layer that would allow both NoSQL as well as SQL data
> >> > stores to be used.  Unfortunately, it seems that this patch removes
> >> > the ability to use Redis, among other NoSQL stores.  I think this is a
> >> > mistake, and although I like much of the code in this patch, I was
> >> > hoping that SQLAlchemy could be hidden behind an abstraction layer
> >> > that would play nicely with the non-relational data stores.
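
One way to picture the abstraction layer Jay and Justin tried to keep is a single storage interface with interchangeable backends. This is a hypothetical sketch using Python's `abc` module: a dict stands in for a key-value store such as Redis and an in-memory `sqlite3` database stands in for a SQL store; neither class is Nova code.

```python
import abc
import sqlite3


class InstanceStore(abc.ABC):
    """Hypothetical storage interface that the rest of the code depends on."""

    @abc.abstractmethod
    def put(self, instance_id, hostname): ...

    @abc.abstractmethod
    def get(self, instance_id): ...


class KVInstanceStore(InstanceStore):
    """Key-value backend (a dict standing in for something like Redis)."""

    def __init__(self):
        self._data = {}

    def put(self, instance_id, hostname):
        self._data[f"instance:{instance_id}"] = {"id": instance_id, "hostname": hostname}

    def get(self, instance_id):
        return dict(self._data[f"instance:{instance_id}"])  # KeyError if missing


class SQLInstanceStore(InstanceStore):
    """Relational backend using an in-memory SQLite database."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, hostname TEXT)")

    def put(self, instance_id, hostname):
        self._conn.execute(
            "INSERT OR REPLACE INTO instances VALUES (?, ?)", (instance_id, hostname)
        )

    def get(self, instance_id):
        row = self._conn.execute(
            "SELECT id, hostname FROM instances WHERE id = ?", (instance_id,)
        ).fetchone()
        if row is None:
            raise KeyError(instance_id)
        return {"id": row[0], "hostname": row[1]}
```

Single-object get and put are the easy part; the problems Justin lists (queries over many objects, secondary indexes, atomic multi-key updates) are exactly what this toy interface leaves out.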
> >> >
> >> > As this patch stands, we take a 180 degree turn away from NoSQL data
> >> > stores and back into the relatively comfortable norms of the SQL
> >> > databases.  While there's nothing particularly wrong with SQL
> >> > databases (as you know, I'm a fan of many of them ;) ), I think that
> >> > keeping non-relational data store capabilities is pretty critical.
> >> >
> >> > I had an email discussion with SQLAlchemy's Michael Bayer about
> >> > SQLAlchemy's future with NoSQL data stores. Although there is an
> >> > issue in the SQLAlchemy trac system about this (see
> >> > http://www.sqlalchemy.org/trac/ticket/1518), this module is unlikely
> >> > to see the light of day in the next year or two.
> >> >
> >> > So...what to do?  There are at least four options I can see:
> >> >
> >> > 1) Go forward with this patch and add NoSQL stores back at some later
> >> > time by ourselves
> >> > 2) Go forward with this patch and wait until SQLAlchemy properly
> >> > supports key value stores
> >> > 3) Delay this patch until after the Austin release and have a larger
> >> > discussion about it here and at the next summit
> >> > 4) Go back to the drawing board and try again with a less ambitious
> >> > set of patches that incrementally changes the way the data stores
> >> > work.
> >> >
> >> > I'm personally on the fence.  I'd prefer to at least delay the patch
> >> > until after Austin, but I understand there are now at least 4 branches
> >> > that depend on this one, which makes things, well, a bit difficult.
> >> >
> >> > -jay
> >> >
> >> > On Tue, Aug 31, 2010 at 8:46 PM, Vishvananda Ishaya
> >> > <vishvananda@xxxxxxxxx> wrote:
> >> >> I've proposed a merge of the orm refactor branch that a large part of
> >> >> the nasa/anso team has been working on. I'm hoping everyone can pick
> >> >> it apart and we end up with a really clean system that everyone likes.
> >> >> I've copied the description of the change and issues below. If the
> >> >> mailing list debates get too complicated, we should just organize a
> >> >> time to discuss it in IRC.
> >> >>
> >> >> Proposing merge to get feedback on orm refactoring. I am very
> >> >> interested in feedback to all of these changes.
> >> >>
> >> >> This is a huge set of changes that touches almost all of the files.
> >> >> I'm sure I have broken quite a bit, but better to take the plunge now
> >> >> than to postpone this until later. The idea is to allow for pluggable
> >> >> backends throughout the code.
> >> >>
> >> >> Brief Overview
> >> >> For compute/volume/network, there are multiple classes:
> >> >> service - responsible for rpc
> >> >>   this currently uses the existing cast and call in rpc.py and a
> >> >>   little bit of magic to call public methods on the manager class.
> >> >>   each service also reports its state into the database every 10
> >> >>   seconds
> >> >> manager - responsible for managing respective object classes
> >> >>   all the business logic for the classes goes here
> >> >> db (db_driver) - responsible for abstracting database access
> >> >> driver (domain_driver) - responsible for executing actual shell
> >> >>   commands and implementation
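
The "little bit of magic" in the service base class can be pictured as a generic dispatcher that looks up a public method on the manager by name. This is a hypothetical sketch, not the code from the branch: the message shape (a `method` name plus an `args` dict) and the `VolumeManager` class are assumptions for illustration.

```python
class Service:
    """Hypothetical base service: routes incoming RPC messages to its manager."""

    def __init__(self, manager):
        self.manager = manager

    def dispatch(self, message):
        method_name = message["method"]
        # Refuse underscore names so only the manager's public API is reachable.
        if method_name.startswith("_"):
            raise AttributeError(f"{method_name} is not a public method")
        method = getattr(self.manager, method_name)
        return method(**message.get("args", {}))


class VolumeManager:
    """Hypothetical manager: all the business logic lives here."""

    def __init__(self):
        self.volumes = {}

    def create_volume(self, volume_id, size_gb):
        self.volumes[volume_id] = size_gb
        return volume_id
```

For example, `Service(VolumeManager()).dispatch({"method": "create_volume", "args": {"volume_id": "vol-1", "size_gb": 10}})` returns `"vol-1"`. The alternative raised under the known issues, explicitly wrapping each method exposed over rpc, trades this generality for an explicit allowlist.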
> >> >>
> >> >> Compute hasn't been fully cleaned up, but to get an idea of how it
> >> >> works, take a look at volume and network.
> >> >>
> >> >> Known issues/Things to be done:
> >> >>
> >> >> * nova-api accesses db objects directly
> >> >>   It seems cleaner to have only the managers dealing with their
> >> >>   respective objects. This would mean code for 'run_instances' would
> >> >>   move into the manager class and it would do the initial setup and
> >> >>   cast out to the remote service.
> >> >>
> >> >> * db code uses flat methods to define its interface
> >> >>   In my mind this is a little prettier as an abstract base class, but
> >> >>   driver loading code can load a module or a class. It works, so I'm
> >> >>   not sure it needs to be changed but feel free to debate it.
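
Driver loading that accepts either a module of flat functions or a class can be sketched with the standard `importlib` module. The `import_object` helper is hypothetical, not Nova's actual utility, and it is exercised here with standard-library targets so it runs anywhere.

```python
import importlib


def import_object(path):
    """Load a module, or an attribute (class/function) of a module, by dotted path.

    The whole path is tried as a module first; if that fails, the last
    component is treated as an attribute of the parent module.
    """
    try:
        return importlib.import_module(path)
    except ImportError:
        module_path, _, attr = path.rpartition(".")
        if not module_path:
            raise  # a bare name that is not importable: nothing else to try
        return getattr(importlib.import_module(module_path), attr)
```

`import_object("json")` returns a module whose flat functions act as the interface, while `import_object("collections.OrderedDict")` returns a class, so callers do not need to know which style a driver uses. One caveat of this simple version: an `ImportError` raised inside an existing module is treated as "not a module".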
> >> >>
> >> >> * Service classes have no code in them
> >> >>   Not sure if this is a problem for people, but the magic of calling
> >> >>   the manager's methods is done in the base class. We could remove
> >> >>   the magic from the base class and explicitly wrap methods that we
> >> >>   want to make available via rpc if this seems nasty.
> >> >>
> >> >> * AuthManager Projects/Users/Roles are not integrated into this system.
> >> >>   In order for everything to live happily in the backend, we need some
> >> >>   type of adaptor for LDAP.
> >> >>
> >> >> * Context is not passed properly across rabbit
> >> >>   Context should probably be changed to a simple dictionary so that it
> >> >>   can be passed properly through the queue.
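
Turning the context into a plain dictionary, as suggested above, makes it trivially serializable for the message queue. A minimal sketch: the `RequestContext` fields and the `_context` message key are hypothetical, and JSON stands in for whatever encoding the rpc layer actually uses.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RequestContext:
    """Hypothetical request context; the fields are illustrative only."""
    user_id: str
    project_id: str
    is_admin: bool = False

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, values):
        return cls(**values)


def pack_message(context, method, args):
    """Embed the context as a plain dict so it survives JSON encoding."""
    return json.dumps({"_context": context.to_dict(), "method": method, "args": args})


def unpack_message(raw):
    """Rebuild the context object on the receiving side."""
    msg = json.loads(raw)
    return RequestContext.from_dict(msg.pop("_context")), msg
```

The receiving service gets back an equal `RequestContext` plus the remaining `method`/`args` payload, with no custom object crossing the queue.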
> >> >>
> >> >> * No authorization checks on access to objects
> >> >>   We need to decide on which layer auth checks should happen.
> >> >>
> >> >> * Some of the methods in ComputeManager need to be moved into other
> >> >>   layers/managers
> >> >> * Compute driver layer should be abstracted more cleanly
> >> >> * Flat networking is untested and may need to be reworked
> >> >> * Some of the api commands are not working yet
> >> >> * Nova Swift Authentication needs to be refactored (Todd is working on
> >> >>   this)
> >> >>
> >> >> _______________________________________________
> >> >> Mailing list: https://launchpad.net/~nova
> >> >> Post to     : nova@xxxxxxxxxxxxxxxxxxx
> >> >> Unsubscribe : https://launchpad.net/~nova
> >> >> More help   : https://help.launchpad.net/ListHelp
> >> >>
> >> >>
> >> >
References

ORM Refactor
From: Vishvananda Ishaya, 2010-09-01
Re: ORM Refactor
From: Jay Pipes, 2010-09-10
Re: ORM Refactor
From: Rick Clark, 2010-09-10
Re: ORM Refactor
From: Justin Santa Barbara, 2010-09-10
Re: ORM Refactor
From: Jesse Andrews, 2010-09-10