Re: File Deletion issues for Storage Engines

Jobin,

You are 100% correct in your views below. What we are working towards is completely removing the notion that a table or a schema in the system is related to a file. The interface we've been discussing is partly done and partly futuristic, to use your words. The end goal is to remove the file-to-object connection that currently exists in parts of the API.

In fact, if you look at the Cursor and plugin::StorageEngine classes, you can see comments from Brian, Stewart and myself saying just that :)

-jay

Jobin Augustine wrote:
Hi Brian,
This is the concern of a layman DBA: the kernel still knows there are files and file extensions.
As you said:
>> We can't tear out the underbelly of the code before we have proper tests though.
Shall I understand that proper tests are the missing piece and that the kernel won't care about files in the future? And that the new storage APIs under discussion are futuristic?

Let me explain why I am concerned. Let us assume that in the future somebody gets fed up with filesystem overhead and decides to write a storage plugin that writes directly to a raw partition or storage LUN. There won't be a file or a filesystem. If the kernel expects files, the storage engine needs to pretend that there are files: pseudo-files (a big lie). The same will be true for future cloud storage as well: there won't be a file, just a service.

I saw Oracle hitting its head against the wall when they decided to come up with ASM. Oracle's core had the assumption that "*files are the only way to store data*" (please recollect the controlfile entries and v$datafile). Since ASM sits over a set of raw partitions, the only option was to tell the database a lie: "there are files". Then started lies over lies, like "the files are there, but you can't copy them to a filesystem". To correct this they kept adding functionality, and finally in 11g ASM is yet another filesystem: it got its own daemon (which otherwise makes no sense for a single-instance database) running to confuse the core.
For marketing they had to brainwash DBAs (including me).

It's my humble concern.

Thank you,
Jobin.

On Wed, Dec 16, 2009 at 4:19 AM, Brian Aker <brian@xxxxxxxxxxx> wrote:

    Hi!

    On Dec 13, 2009, at 4:08 AM, Toru Maesaka wrote:

     > I'm not entirely confident, but in the current storage API there
     > is StorageEngine::doGetTableNames(), where you must provide/set
     > the names of the tables in the provided reference to a
     > std::set<string> object.

    You are starting to see how the system will work for dropDatabase() :)
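
    For illustration, here is a minimal sketch of what an engine-side
    implementation of that hook might look like. The class name, the
    internal catalog, and the exact signature are assumptions based on
    Toru's description, not the actual Drizzle API:

        #include <set>
        #include <string>

        // Hypothetical engine: it tracks the tables it owns in an
        // in-memory catalog and copies their names into the set the
        // kernel hands it.
        class ExampleEngine
        {
          std::set<std::string> my_tables;  // assumed internal catalog

        public:
          // Assumed shape of the hook: fill the caller's set with the
          // names of every table this engine owns in the given schema.
          void doGetTableNames(const std::string &schema_name,
                               std::set<std::string> &set_of_names)
          {
            (void) schema_name;  // a real engine would filter by schema
            set_of_names.insert(my_tables.begin(), my_tables.end());
          }
        };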

    I wrote some of the code for this a while ago. MySQL would sometimes
    orphan files at shutdown that needed to be cleaned up. Its method
    is to just do an unlink on them. The problem is that InnoDB, or
    really any engine that keeps state, would then lose the tables in
    its own system.

    For Drizzle we look at each file, find the owner, and call delete
    table (orphaned files we simply delete). The idea has been
    to morph this code into the following:

    1) Do a CachedLookup for any registered .dfe file (aka... all
    engines register their extension if they have one). This way we
    don't do multiple nested loops over the directory.

    2) We start the schema drop.

    3) We call dropTable() (notice... not doDropTable()... we want to go
    through the entire process).

    4) We call doFinishDropSchema() for anyone who needs to do any
    final cleanup.

    So the process looks like this:

    dropSchema {

        getTableNames()  <-- sort by engine

        doStartDropSchema() {}

        while (...)
        {
            dropTable()
        }

        doFinishDropSchema() {}
    }

    The above allows for transactional DDL to occur during the drop for
    a particular engine (possibly all, but we won't go that far just
    yet). The interface call dropSchema() is one for the system, and not
    for a particular engine. The do...() methods are all
    private/(protected?)... aka no one outside of the path can call
    them. Engines can ignore the begin of a drop schema if they want, or
    they can do whatever they need... like, say, batch the entire
    operation.
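
    As a minimal sketch of that public/protected split (the names come
    from the pseudocode above; the exact Drizzle signatures may well
    differ):

        #include <string>

        class StorageEngine
        {
        public:
          virtual ~StorageEngine() {}

          // System-facing entry point; engines cannot override this,
          // so the begin/finish hooks are only reachable through it.
          void dropSchema(const std::string &schema_name)
          {
            doStartDropSchema(schema_name);
            // ... the kernel loops over the schema's tables here,
            // calling dropTable() for each one ...
            doFinishDropSchema(schema_name);
          }

        protected:
          // Engine hooks with no-op defaults: an engine can ignore the
          // begin notification, or use it to batch the whole drop.
          virtual void doStartDropSchema(const std::string &) {}
          virtual void doFinishDropSchema(const std::string &) {}
        };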

    Where the getTableNames() call is, we will eventually add a call
    after it to clean up any views. AKA... walk the dependency graph and
    apply. Triggers, as discussed previously, are owned by engines, so
    we don't need to do anything there.

    Cheers,
      -Brian







