drizzle-discuss team mailing list archive

Re: Improving the Engine API

 

Hi!

On Dec 8, 2009, at 2:24 AM, Paul McCullagh wrote:

> OK, I think we agree on this, but to sum up, here are the best arguments for providing both methods (create/copy/rename/drop method & the engine handles the entire operation itself):
> 
> - The create/copy/rename/drop method must be used to alter the table engine.
> - It saves engine developers time because they only have to implement the basic operations to support ALTER TABLE.
> - Engines can concentrate on optimizing certain operations (e.g. add index) without having to implement the entire ALTER TABLE.
> 
> No matter which method the engine chooses, the server should still just replicate the GPB statement data (as you suggest in https://lists.launchpad.net/drizzle-discuss/msg05305.html).
> 
> This basically means that the statement is replicated, and not the operations.

Right, which is the case today.

> 
>> You would have the Statement type as listed in the transaction.proto:
>> 
>> message Statement
>> {
>> enum Type
>> {
>>   ROLLBACK = 0; /* A ROLLBACK indicator */
>>   INSERT = 1; /* An INSERT statement */
>>   DELETE = 2; /* A DELETE statement */
>>   UPDATE = 3; /* An UPDATE statement */
>>   TRUNCATE_TABLE = 4; /* A TRUNCATE TABLE statement */
>>   CREATE_SCHEMA = 5; /* A CREATE SCHEMA statement */
>>   ALTER_SCHEMA = 6; /* An ALTER SCHEMA statement */
>>   DROP_SCHEMA = 7; /* A DROP SCHEMA statement */
>>   CREATE_TABLE = 8; /* A CREATE TABLE statement */
>>   ALTER_TABLE = 9; /* An ALTER TABLE statement */
>>   DROP_TABLE = 10; /* A DROP TABLE statement */
>>   SET_VARIABLE = 98; /* A SET statement */
>>   RAW_SQL = 99; /* A raw SQL statement */
>> }
>> ...
>> }
>> 
> 
> There is no SELECT in the list, but maybe this is correct. I am just thinking aloud...
> 
> We have 3 types of statements:
> DML update statements:    INSERT, UPDATE, DELETE
> DML read-only statements: SELECT.
> DDL statements:           CREATE TABLE, ALTER TABLE, etc.
> 

This right here is the heart of the difference. Statement types are not really what you want. You want to know the sort of action, i.e. read/write/reformat...

	-Brian



> And assuming we have 2 sets of calls:
> - beginTransaction, commitTransaction/rollbackTransaction
> - startStatement, endStatement
> 
> We could say, all types of statements require a beginTransaction() and a startStatement() (and the corresponding endStatement() and commitTransaction/rollbackTransaction()).
> 
> But I don't think this is absolutely correct:
> 
> * DML update statements require both beginTransaction() and a startStatement().
> * DML read-only statements only require a beginTransaction() call, because a SELECT cannot be rolled back and so does not need a statement-level transaction.
> * And DDL statements only require a startStatement() because it is up to the engine to decide if this can be done within a transaction or not.
> 
> For example, if beginTransaction() is called before startStatement(), then engines that do not handle DDL in transactions should return an error. In addition, if an engine does atomic DDL, then it can use startStatement() to begin a transaction.
> 
> With these calls the engine will have most of the information it needs.
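
A minimal sketch of the call pairing described above, in C++. Only the call names (beginTransaction, commitTransaction, rollbackTransaction, startStatement, endStatement) come from this thread. The Engine interface, the StatementKind enum, the NonTransactionalDdlEngine class and the runStatement() driver are hypothetical and do not reflect Drizzle's actual API. As a simplification, each statement is run in its own transaction.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// The three statement groups from the list above (the enum itself is hypothetical).
enum class StatementKind { DmlUpdate, DmlReadOnly, Ddl };

// Hypothetical engine interface; only the call names come from this thread.
class Engine {
public:
  virtual ~Engine() = default;
  virtual void beginTransaction() = 0;
  virtual void commitTransaction() = 0;
  virtual void rollbackTransaction() = 0;
  virtual void startStatement(StatementKind kind) = 0;
  virtual void endStatement() = 0;
};

// An engine that cannot run DDL inside a transaction and reports it as an error.
class NonTransactionalDdlEngine : public Engine {
public:
  void beginTransaction() override {
    in_transaction_ = true;
    calls.push_back("beginTransaction");
  }
  void commitTransaction() override {
    in_transaction_ = false;
    calls.push_back("commitTransaction");
  }
  void rollbackTransaction() override {
    in_transaction_ = false;
    calls.push_back("rollbackTransaction");
  }
  void startStatement(StatementKind kind) override {
    if (kind == StatementKind::Ddl && in_transaction_) {
      throw std::runtime_error("DDL inside a transaction is not supported");
    }
    statement_open_ = true;
    calls.push_back("startStatement");
  }
  void endStatement() override {
    assert(statement_open_ && "endStatement() without startStatement()");
    statement_open_ = false;
    calls.push_back("endStatement");
  }

  std::vector<std::string> calls;  // recorded so the call order can be inspected

private:
  bool in_transaction_ = false;
  bool statement_open_ = false;
};

// Kernel-side driver: issue only the calls each statement group requires.
void runStatement(Engine& engine, StatementKind kind) {
  switch (kind) {
    case StatementKind::DmlUpdate:  // needs both levels
      engine.beginTransaction();
      engine.startStatement(kind);
      engine.endStatement();
      engine.commitTransaction();
      break;
    case StatementKind::DmlReadOnly:  // no statement-level transaction
      engine.beginTransaction();
      engine.commitTransaction();
      break;
    case StatementKind::Ddl:  // the engine decides whether to open a transaction
      engine.startStatement(kind);
      engine.endStatement();
      break;
  }
}
```

With this shape, an engine that does atomic DDL could open its own transaction inside startStatement() instead of throwing.
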
> 
> There is some additional information which should be provided when a cursor is used:
> 
> For example, PBXT needs to know:
> 
> - which columns will be accessed (an optimization so that not all need to be loaded),
> - whether rows retrieved will be updated or deleted,
> - if the rows need to be locked (as in SELECT FOR UPDATE).
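
The three pieces of cursor information listed above could be carried in a small struct. This is only a sketch: CursorHints, makeHints(), columnsToLoad() and needsRowLock() are hypothetical names, not part of Drizzle or PBXT. The rule "lock if the rows will be modified" is one possible engine policy, assumed here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hints the kernel could hand to an engine when a cursor is opened.
struct CursorHints {
  std::vector<bool> column_used;  // column_used[i] is true if column i will be accessed
  bool will_modify_rows = false;  // rows retrieved will be updated or deleted
  bool lock_rows = false;         // rows must be locked, as in SELECT FOR UPDATE
};

// Kernel side: build hints for a table with column_count columns.
CursorHints makeHints(std::size_t column_count, const std::vector<std::size_t>& used) {
  CursorHints hints;
  hints.column_used.assign(column_count, false);
  for (std::size_t index : used) {
    assert(index < column_count);
    hints.column_used[index] = true;
  }
  return hints;
}

// Engine side: the column indexes worth loading, so unused columns can be skipped.
std::vector<std::size_t> columnsToLoad(const CursorHints& hints) {
  std::vector<std::size_t> wanted;
  for (std::size_t i = 0; i < hints.column_used.size(); ++i) {
    if (hints.column_used[i]) {
      wanted.push_back(i);
    }
  }
  return wanted;
}

// Engine side (assumed policy): lock rows when asked to, or when they will be written.
bool needsRowLock(const CursorHints& hints) {
  return hints.lock_rows || hints.will_modify_rows;
}
```
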
> 
>> Toru, what's your opinion?
>> 
>> -jay
>> 
>>> And this is how the engine would handle "ADD INDEX", or "ENCRYPT TABLE":
>>> startStatement("ENCRYPT TABLE", "t1") --> return: use custom method
>>> doTableOperation("ENCRYPT TABLE", "t1")
>>> endStatement()
>>> The engine can write table operations to its transaction log, and in this way it could ensure that the entire ALTER TABLE statement is atomic.
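
The choice between the engine's custom method and the generic create/copy/rename/drop path could be dispatched roughly as follows. This is a sketch under assumptions: the call names startStatement, doTableOperation and endStatement follow the example above, but the signatures, the AlterMethod enum, IndexOnlyEngine and the alterTable() driver are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// What an engine answers when asked how a table operation should be carried out.
enum class AlterMethod { CopyTable, Custom };

// Hypothetical engine interface; the call names follow the example above.
class Engine {
public:
  virtual ~Engine() = default;
  virtual AlterMethod startStatement(const std::string& operation,
                                     const std::string& table) = 0;
  virtual void doTableOperation(const std::string& operation,
                                const std::string& table) = 0;
  virtual void endStatement() = 0;
};

// An engine that optimizes ADD INDEX itself and leaves everything else to the kernel.
class IndexOnlyEngine : public Engine {
public:
  AlterMethod startStatement(const std::string& operation, const std::string&) override {
    return operation == "ADD INDEX" ? AlterMethod::Custom : AlterMethod::CopyTable;
  }
  void doTableOperation(const std::string& operation, const std::string&) override {
    assert(operation == "ADD INDEX");  // the only operation this engine claimed
    (void)operation;                   // keeps NDEBUG builds free of unused warnings
  }
  void endStatement() override {}
};

// Kernel side: returns the steps taken so the chosen path can be inspected.
std::vector<std::string> alterTable(Engine& engine, const std::string& operation,
                                    const std::string& table) {
  std::vector<std::string> steps;
  if (engine.startStatement(operation, table) == AlterMethod::Custom) {
    engine.doTableOperation(operation, table);
    steps.push_back("engine: " + operation);
  } else {
    // Generic fallback: build a new table, copy the rows, swap names, drop the old one.
    for (const char* step : {"create", "copy", "rename", "drop"}) {
      steps.push_back(std::string("kernel: ") + step);
    }
  }
  engine.endStatement();
  return steps;
}
```

An engine only has to implement the basic operations to get a working ALTER TABLE, and can claim individual operations later as it optimizes them.
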
>>> On Dec 7, 2009, at 4:10 PM, Jay Pipes wrote:
>>>> Paul McCullagh wrote:
>>>>> Hi Toru,
>>>>> On Dec 7, 2009, at 3:31 AM, Toru Maesaka wrote:
>>>>>> Great to hear another use-case where knowing a statement type in
>>>>>> advance is useful :)
>>>>> Yes, generally I need to know the following:
>>>>> - If I have an update-type statement (i.e. whether the statement modifies rows).
>>>>> - Whether I need a table lock (examples: ALTER TABLE, TRUNCATE, CHECK).
>>>> 
>>>> But, Paul, doesn't this depend on the engine itself?  I mean, some
>>>> engines can do (some types of) ALTER TABLE without taking a table lock.
>>>> So, is this request really for whether the kernel thinks a table-level
>>>> lock is necessary, or is it really just for a descriptor of the
>>>> statement type?
>>>> 
>>>> And, if it really does just boil down to the statement type, then how do
>>>> we deal with the reality that Brian speaks about -- that statement type
>>>> will be pluggable, and how do we deal with future statement types for
>>>> pluggable engines?
>>>> 
>>>> Is a reasonable solution to pass to engines a sort of "statement
>>>> traits"?  So, instead of passing ALTER_TABLE, CREATE_TABLE, UPDATE,
>>>> DELETE, etc, we instead pass a std::bitset<> (or uint64_t for C folks)
>>>> containing traits of the statement such as:
>>>> 
>>>> MODIFIES_DATA
>>>> MODIFIES_DEFINITION
>>>> etc, etc
>>>> 
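
A minimal sketch of the "statement traits" idea using std::bitset. MODIFIES_DATA and MODIFIES_DEFINITION are the trait names from the list above. The bit positions, the StatementTraits alias, and the helpers makeTraits(), needsWriteTransaction() and mustReloadDefinition() are hypothetical. The point illustrated is that the engine inspects traits only, so a new pluggable statement type needs no engine change as long as it declares its traits.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Bit positions for the traits named above (the numbering is an assumption).
constexpr std::size_t MODIFIES_DATA = 0;
constexpr std::size_t MODIFIES_DEFINITION = 1;
constexpr std::size_t TRAIT_COUNT = 2;

using StatementTraits = std::bitset<TRAIT_COUNT>;

// Kernel side: each statement type, built-in or plugged in, declares its traits.
StatementTraits makeTraits(std::initializer_list<std::size_t> bits) {
  StatementTraits traits;
  for (std::size_t bit : bits) {
    assert(bit < TRAIT_COUNT);
    traits.set(bit);
  }
  return traits;
}

// Engine side: decisions depend on traits, never on the statement type itself.
bool needsWriteTransaction(const StatementTraits& traits) {
  return traits.test(MODIFIES_DATA);
}

bool mustReloadDefinition(const StatementTraits& traits) {
  return traits.test(MODIFIES_DEFINITION);
}
```

For example, an UPDATE would be declared as makeTraits({MODIFIES_DATA}) and an ALTER TABLE as makeTraits({MODIFIES_DEFINITION}), while a SELECT would pass an empty set.
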
>>>> And then to deal with transaction locking concerns, just add a method to Cursor:
>>>> 
>>>> void Cursor::setTransactionIsolationLevel(enum enum_tx_isolation);
>>>> 
>>>> Cheers!
>>>> 
>>>> Jay
>>>> 
>>>>> - If we have a SELECT FOR UPDATE.
>>>>>>> I was talking to Toru about this, and another possibility is that we have statements declare a needed "lock type" that any plugin could then query. I outlined the solution for Toru, but I don't know if he has written the patch yet :)
>>>>>> 
>>>>>> I've taken notes from our discussion the other day. I'm planning on
>>>>>> working on it when I finish testing through my current progress of
>>>>>> BlitzDB.
>>>>> Great! :)
>>>>>> For now, I'm happy with Jay's advice of using
>>>>>> current_session().
>>>>>> 
>>>>>> Cheers,
>>>>>> Toru
>>>>>> 
>>>>>> On Sat, Dec 5, 2009 at 5:59 AM, Brian Aker <brian@xxxxxxxxxxx> wrote:
>>>>>>> Hi!
>>>>>>> 
>>>>>>> On Dec 4, 2009, at 3:12 AM, Paul McCullagh wrote:
>>>>>>> 
>>>>>>>> If we have a startStatement() call, then it could be used in place of beginAlter(), assuming we can determine the statement type, and the tables involved.
>>>>>>> 
>>>>>>> The problem with relying on statement type is that at some point statement type will be pluggable... which means you would constantly need to update your engine for new statements.
>>>>>>> 
>>>>>>> Yuck!
>>>>>>> 
>>>>>>> I was talking to Toru about this, and another possibility is that we have statements declare a needed "lock type" that any plugin could then query. I outlined the solution for Toru, but I don't know if he has written the patch yet :)
>>>>>>> 
>>>>>>>> 
>>>>>>>> Then, when a handle is returned to the pool it is deleted, instead of adding it back to the pool.
>>>>>>> 
>>>>>>> BTW very soon engines will own their Cursor objects and will be free to reuse them.
>>>>>>> 
>>>>>>>> The locking thread waits until all handles are returned and deleted before it can proceed. The lock on the pool then prevents a new table handle from being created while the locking thread is busy.
>>>>>>>> Either way, it would be good if Drizzle closes all handlers/cursors before a table is deleted or renamed.
>>>>>>> 
>>>>>>> I would say that long term this will be optional, based on what the engine requires.
>>>>>>> 
>>>>>>>> OK, this makes things a lot simpler! Indeed, if we don't need to support LOCK TABLE then external_lock() can be removed altogether.
>>>>>>> 
>>>>>>> Have you tried removing external_lock() right now to see if any issues pop up?
>>>>>>> 
>>>>>>> Cheers,
>>>>>>>     -Brian
>>>>> -- 
>>>>> Paul McCullagh
>>>>> PrimeBase Technologies
>>>>> www.primebase.org
>>>>> www.blobstreaming.org
>>>>> pbxt.blogspot.com
>>>> 
>>>> 
>>> -- 
>>> Paul McCullagh
>>> PrimeBase Technologies
>>> www.primebase.org
>>> www.blobstreaming.org
>>> pbxt.blogspot.com
>> 
> 
> 
> 
> --
> Paul McCullagh
> PrimeBase Technologies
> www.primebase.org
> www.blobstreaming.org
> pbxt.blogspot.com
> 
> 
> 



