
drizzle-discuss team mailing list archive

Re: File Deletion issues for Storage Engines

 

On Dec 16, 2009, at 5:15 PM, Jay Pipes wrote:

> Jobin,
>
> You are 100% correct in your views below. What we are working towards is completely removing the notion that a table or a schema in the system is related to a file. The interface we've been discussing is partly done and partly futuristic, to use your words. The end goal is to remove the file-to-object connection that currently exists in parts of the API.

Yes, exactly. And I guess this means asking the engine for a list of tables in a schema, and the kernel should no longer scan directories to find tables.

Basically only the engines will know how and where the data is stored.

> In fact, if you look at the Cursor and plugin::StorageEngine classes, you can see comments from Brian, Stewart and myself saying just that :)
>
> -jay

Jobin Augustine wrote:
Hi Brian,
This is the concern of a layman DBA.. the kernel still knows there are files and file extensions.

As you said:
>> We can't tear out the underbelly of the code before we have proper tests though.

Shall I understand that proper tests are the missing piece, and that the kernel won't care about files in the future? And that the new storage APIs under discussion are futuristic?
Let me explain why I am concerned. Let us assume that in the future somebody is fed up with filesystem overhead and decides to write a storage plugin that writes directly to a raw partition / storage LUN. There won't be a file or a filesystem. If the kernel expects files, the storage engine needs to pretend that there are files.. pseudo files.. (a big lie..). The same will be true for future cloud storage as well.. there won't be a file.. just a service.

I saw Oracle hitting its head against the wall when they decided to come up with ASM. Oracle's core had the assumption that "*files are the only way to store data*".. (please recollect controlfile entries... v$datafile). Since ASM sits over a set of raw partitions, the only option was to tell the database the lie "there are files..". Then started lies over lies... like "files are there, but you can't copy them to a filesystem". To correct this, they kept adding functionality, and finally in 11g ASM is yet another filesystem.. it got its own daemon (which does not make any sense otherwise for a single-instance database) running to confuse the core.

For marketing they had to brainwash DBAs.. (including me).

It's my humble concern..

Thank you,
Jobin.
On Wed, Dec 16, 2009 at 4:19 AM, Brian Aker <brian@xxxxxxxxxxx> wrote:
Hi!

On Dec 13, 2009, at 4:08 AM, Toru Maesaka wrote:
> I'm not entirely confident but in the current storage API, there is
> StorageEngine::doGetTableNames() where you must provide/set the names
> of tables via the provided reference to a std::set<string> object.

You are starting to see how the system will work for dropDatabase() :)

I wrote some of the code for this a while ago. MySQL would sometimes orphan files at shutdown that needed to be cleaned up. Its method is to just do an unlink on them. The problem is that InnoDB, or really any engine that kept state, would then lose the tables in its own system.

For Drizzle we look at each file, find the owner and call delete table (or, for the orphaned files, we delete them). The idea has been to morph this code into the following:
1) Do a CachedLookup for any registered .dfe file (aka... all engines register their extension if they have one). This way we don't do multiple nested loops over the directory.

2) We start the database drop.

3) We call dropTable() for each table (notice... not doDropTable()... we want to go through the entire process).

4) We then call finishDropSchema() for anyone who needs to do any final cleanup.
So the process looks like this:

dropSchema {
  getTableNames() <-- sort by engine
  doStartDropSchema() {}
  while (...)
  {
    dropTable()
  }
  doFinishDropSchema() {}
}
The above allows for transactional DDL to occur during the drop for a particular engine (possibly all, but we won't go that far just yet). The interface call dropSchema() is one for the system, not for a particular engine. The do...() methods are all private (protected?)... aka no one outside of the path can call them. Engines can ignore the begin of a drop schema if they want, or they can do whatever they need... like, say, batch the entire operation.
Where the getTableNames() call is, we will eventually add a call after it to clean up any views. AKA... walk the dependency graph and apply. Triggers, as discussed previously, are owned by engines, so we don't need to do anything there.
Cheers,
  -Brian
_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : drizzle-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp



--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com
