nova team mailing list archive

Re: ORM Refactor


I did some work on abstracting out the data store early on.  There
were several problems with the Redis implementation:

   - It seemed clear to me that we were effectively re-implementing a
   relational database on top of Redis.  For example, there were secondary
   indexes that needed to be maintained by hand.
   - Several operations that needed to be done atomically were not being
   done atomically, so the code was not technically correct and data integrity
   was suspect.  (I suppose this is a sub-point of the first point)
   - The Redis code was very vulnerable to the 1+N select problem - when
   selecting a group of objects, we would do one select to get the list of IDs,
   and then a further select to get each object by ID.
   - The schema-less nature made me very uncomfortable; I felt that as the
   project grew this would become unsustainable and a huge source of bugs,
   particularly in version migrations.
   - It seemed that reporting against Redis would be difficult.  Some
   unfortunate developer would therefore have to code up reports against Redis,
   instead of just being able to run SQL queries or point something like Excel
   or Crystal Reports at it (http://blog.koehntopp.de/uploads/mapreduce.png)
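
To make the first and third points concrete, here is a minimal Python sketch.
It is illustrative only: KeyValueStore is a hypothetical in-memory stand-in (a
plain dict, not a real Redis client), the key names and the instances table
are made up for the example, and sqlite3 stands in for whatever SQL database
would actually be used.  It counts store reads to put the 1+N pattern next to
a single indexed SQL query.

```python
import sqlite3


class KeyValueStore:
    """Hypothetical stand-in for a key-value store (a dict, not real Redis)."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0  # counts every get/set as one trip to the store

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def set(self, key, value):
        self.round_trips += 1
        self.data[key] = value


def kv_add_instance(store, instance_id, project):
    # Two separate writes: the object, then the hand-maintained secondary
    # index.  Nothing makes the pair atomic, so a failure between them
    # leaves an object that the index does not know about.
    store.set("instance:%s" % instance_id,
              {"id": instance_id, "project": project})
    index_key = "project:%s:instances" % project
    ids = store.get(index_key) or []
    store.set(index_key, ids + [instance_id])


def kv_instances_for_project(store, project):
    # 1+N: one read for the list of IDs, then one read per object.
    ids = store.get("project:%s:instances" % project) or []
    return [store.get("instance:%s" % i) for i in ids]


def sql_instances_for_project(conn, project):
    # One query; the database maintains the index on "project" itself.
    rows = conn.execute(
        "SELECT id, project FROM instances WHERE project = ? ORDER BY id",
        (project,),
    )
    return [{"id": r[0], "project": r[1]} for r in rows]


store = KeyValueStore()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id TEXT PRIMARY KEY, project TEXT)")
conn.execute("CREATE INDEX idx_instances_project ON instances (project)")

for n in range(5):
    kv_add_instance(store, "i-%d" % n, "demo")
    conn.execute("INSERT INTO instances VALUES (?, ?)", ("i-%d" % n, "demo"))

store.round_trips = 0  # only count the reads done by the lookup below
kv_result = kv_instances_for_project(store, "demo")
kv_reads = store.round_trips
sql_result = sql_instances_for_project(conn, "demo")

print("key-value reads:", kv_reads, "for", len(kv_result), "instances")
print("same rows from one SQL query:", kv_result == sql_result)
```

The dict hides network latency, so this shows only the shape of the access
pattern, not a benchmark.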

It seems to me that the only user to have deployed Redis in production so far
(NASA) has decided it's unsuitable; that technically Redis is
not-fit-for-(our)-purpose for the reasons above; and that private (enterprise)
clouds will prefer using traditional databases with which they are
comfortable.  So it seems the only potential use case for Redis is public
clouds (Rackspace), for reasons of scalability.

My real hope was that we would be able to have both Redis and SQL
implementations, and we'd show that not only did Redis have all these
problems, but we didn't get anything in return: it would be both slower
(because of 1+N) and less scalable (because of the need to keep all the keys
in memory); we'd then deprecate Redis.  However, we need to stay focused on
Nova, not on proving a SQL/NoSQL point - if we know what the outcome will
be, let's just go with the right choice and not expend effort on what is
likely to be a technical dead-end.  If someone wants to write a Redis
back-end so that it can be benchmarked and deprecated, that's great;
otherwise I think we should merge the patch and forget about NoSQL.

If we let Redis get into V1, then we're stuck supporting it, and we'll have
to solve all the above problems.  I would prefer that development effort be
focused on building IaaS, not a relational DB on top of a key-value store.


On Fri, Sep 10, 2010 at 10:11 AM, Rick Clark <rick@xxxxxxxxxxxxx> wrote:

> Thanks, Jay.
> This covers my feelings pretty much as well.  I am concerned as well
> that it is a 180 degree turn 3 weeks before feature freeze. I like the
> abstraction, but I would like us to keep the support for redis.  I think
> SQL is critical for the enterprise and private clouds, but at
> Rackspace's scale, especially with regards to globalization, I think we
> are going to need some kind of keystore.
> My feeling is that we put this in Austin +1 and add support for other
> datastores.  That will also give us time to write up a blueprint and
> have an in depth discussion about it at the summit in November.
> I have added this to the agenda for the next release meeting on Sept 14.
> Rick
> On 09/10/2010 11:56 AM, Jay Pipes wrote:
> > Hi Vish,
> >
> > Such a large patch has taken me quite some time to digest.  There is a
> > larger discussion on large patches without any specifications, but
> > I'll leave that for a later time! :)
> >
> > I am torn on this one, mostly because I spent a bunch of time
> > attempting to do the datastore refactoring myself (as did Justin Santa
> > Barbara), and thus I know the dragons that live in this layer of the
> > code :)
> >
> > One of the things that both Justin and I had tried was to keep an
> > abstraction layer that would allow both NoSQL as well as SQL data
> > stores to be used.  Unfortunately, it seems that this patch removes
> > the ability to use Redis, among other NoSQL stores.  I think this is a
> > mistake, and although I like much of the code in this patch, I was
> > hoping that SQLAlchemy could be hidden behind an abstraction layer
> > that would play nicely with the non-relational data stores.
> >
> > As this patch stands, we take a 180 degree turn away from NoSQL data
> > stores and back into the relatively comfortable norms of the SQL
> > databases.  While there's nothing particularly wrong with SQL
> > databases (as you know, I'm a fan of many of them ;) ), I think that
> > keeping non-relational data store capabilities is pretty critical.
> >
> > I had an email discussion with SQLAlchemy's Michael Bayer about
> > SQLAlchemy's future with NoSQL data stores.  Although there is an
> > issue in the SQLAlchemy trac system about this (see here:
> > http://www.sqlalchemy.org/trac/ticket/1518), this module is unlikely
> > to see the light of day in the next year or two.
> >
> > So...what to do?  There are at least four options I can see:
> >
> > 1) Go forward with this patch and add NoSQL stores back at some later
> > time by ourselves
> > 2) Go forward with this patch and wait until SQLAlchemy properly
> > supports key value stores
> > 3) Delay this patch until after the Austin release and have a larger
> > discussion about it here and at the next summit
> > 4) Go back to the drawing board and try again with a less ambitious
> > set of patches that incrementally changes the way the data stores
> > work.
> >
> > I'm personally on the fence.  I'd prefer to at least delay the patch
> > until after Austin, but I understand there are now at least 4 branches
> > that depend on this one, which makes things, well, a bit difficult.
> >
> > -jay
> >
> > On Tue, Aug 31, 2010 at 8:46 PM, Vishvananda Ishaya
> > <vishvananda@xxxxxxxxx> wrote:
> >> I've proposed a merge of the orm refactor branch that a large part of the
> >> nasa/anso team has been working on.  I'm hoping everyone can pick it apart
> >> and we end up with a really clean system that everyone likes.  I've copied
> >> the description of the change and issues below.  If the mailing list debates
> >> get too complicated, we should just organize a time to discuss it in IRC.
> >>
> >> Proposing merge to get feedback on orm refactoring. I am very interested in
> >> feedback to all of these changes.
> >>
> >> This is a huge set of changes, that touches almost all of the files. I'm
> >> sure I have broken quite a bit, but better to take the plunge now than to
> >> postpone this until later. The idea is to allow for pluggable backends
> >> throughout the code.
> >>
> >> Brief Overview
> >> For compute/volume/network, there are multiple classes
> >> service - responsible for rpc
> >>   this currently uses the existing cast and call in rpc.py and a little bit
> >>   of magic to call public methods on the manager class.
> >>   each service also reports its state into the database every 10 seconds
> >> manager - responsible for managing respective object classes
> >>   all the business logic for the classes go here
> >> db (db_driver) - responsible for abstracting database access
> >> driver (domain_driver) - responsible for executing actual shell commands and
> >>   implementation
> >>
> >> Compute hasn't been fully cleaned up, but to get an idea of how it works,
> >> take a look at volume and network
> >>
> >> Known issues/Things to be done:
> >>
> >> * nova-api accesses db objects directly
> >>   It seems cleaner to have only the managers dealing with their respective
> >>   objects. This would mean code for 'run_instances' would move into the
> >>   manager class and it would do the initial setup and cast out to the
> >>   remote service
> >>
> >> * db code uses flat methods to define its interface
> >>   In my mind this is a little prettier as an abstract base class, but driver
> >>   loading code can load a module or a class. It works, so I'm not sure it
> >>   needs to be changed but feel free to debate it.
> >>
> >> * Service classes have no code in them
> >>   Not sure if this is a problem for people, but the magic of calling the
> >>   manager's methods is done in the base class. We could remove the magic
> >>   from the base class and explicitly wrap methods that we want to make
> >>   available via rpc if this seems nasty.
> >>
> >> * AuthManager Projects/Users/Roles are not integrated into this system.
> >>   In order for everything to live happily in the backend, we need some type
> >>   of adaptor for LDAP
> >>
> >> * Context is not passed properly across rabbit
> >>   Context should probably be changed to a simple dictionary so that it can
> >>   be passed properly through the queue
> >>
> >> * No authorization checks on access to objects
> >>   We need to decide on which layer auth checks should happen.
> >>
> >> * Some of the methods in ComputeManager need to be moved into other
> >> layers/managers
> >> * Compute driver layer should be abstracted more cleanly
> >> * Flat networking is untested and may need to be reworked
> >> * Some of the api commands are not working yet
> >> * Nova Swift Authentication needs to be refactored (Todd is working on this)
> >>
> >> _______________________________________________
> >> Mailing list: https://launchpad.net/~nova
> >> Post to     : nova@xxxxxxxxxxxxxxxxxxx
> >> Unsubscribe : https://launchpad.net/~nova
> >> More help   : https://help.launchpad.net/ListHelp
> >>
> >>
> >
