← Back to team overview

drizzle-discuss team mailing list archive

Re: Improving the Engine API

 

Lots of comments and code... please read through! :)

Paul McCullagh wrote:
Hi Jay,

On Dec 7, 2009, at 8:54 PM, Jay Pipes wrote:

Paul McCullagh wrote:
Hi Jay,
Yes, how ALTER TABLE, TRUNCATE, CHECK, etc is handled does depend (almost entirely) on the engine. I actually think that, for the moment anyway, the statement type will be sufficient. Maybe I have misunderstood what you mean by pluggable statements, but lets assume some engine adds a custom statement: "ENCRYPT TABLE". My engine will not know what type of lock to acquire for this statement, but on the other hand, it will also not know how to execute the statement. So it must reject the statement outright. Which means an engine needs a list of statements it supports, and everything else must be rejected. So the engine needs to know the statement type. I think it is important that we distinguish between statements and operations. In general, MySQL implements the ALTER TABLE using 3 operations: create-table, rename-table and delete-table.

Actually, it uses at least 4 :)

create-new-table-definition
copy-table-data
rename-table
delete-old-table

But...the above *operations* is only how *MySQL/Drizzle kernel* does an ALTER TABLE. This is *not* how InnoDB does an ALTER TABLE. It actually constructs a stored procedure (InnoDB internal stored procedure, not a MySQL stored procedure) and executes that stored procedure...

Hmmm, interesting.

OK, I think we agree on this, but to sum up, here are the best arguments for providing both methods (create/copy/rename/drop method & the engine handles the entire operation itself):

- The create/copy/rename/drop method must be used to alter the table engine. - It saves engine developers time because they only have to implement the basic operations to support ALTER TABLE. - Engines can concentrate on optimizing certain operations (e.g. add index) without having to implement the entire ALTER TABLE.

Yes, no arguments from me! :)

No matter what method the engine chooses. The server should still just replicate the GPB statement data (as you suggest in https://lists.launchpad.net/drizzle-discuss/msg05305.html).

This basically means that the statement is replicated, and not the operations.

Hmmm, kind of. Right now, replication is row-based with regards to DML which modifies row data (INSERT, UPDATE, DELETE). This may actually be better for you, though, as you would have key information about the statement available in a GPB message -- and you wouldn't have to grok the Lex...

For lots of examples of querying the Statement and Transaction GPB messages, see the statement_transform.cc file, where I've included lots of comments and (hopefully) clear code demonstrating working with all the message classes:

http://bazaar.launchpad.net/~drizzle-developers/drizzle/development/annotate/head%3A/drizzled/message/statement_transform.cc

The cool thing is: if my engine can execute these 3 operations, then it can perform all ALTER TABLE statements, although the implementation is not optimal.

Right, because in that case the kernel is deciding what the *operations* should be instead of the engine, which knows its own shortcuts. This is why an earlier email of mine suggested that for ALTER TABLE, instead of constructing a GPB message of operations/instructions to pass to the engine, we instead just pass a before and an after image of the table definition and leave it up to the engine to do its thing:

https://lists.launchpad.net/drizzle-discuss/msg05305.html

Yes, excellent, it make great sense to use exactly the same GPB that is prepared for the replication!

Yep.  One source of message definitions makes lots of sense to me. :)

What if there was the following API? Would this API provide your engine with all the information that you would need?

Assume message::Statement is the same/similar to the message::Statement defined in /drizzled/message/transaction.proto:

http://bazaar.launchpad.net/~drizzle-developers/drizzle/development/annotate/head%3A/drizzled/message/transaction.proto

class plugin::StorageEngine ...
{
public:
/**
* returned from startStatement().  This tells the Drizzle
* kernel what the engine's intent is for executing this particular
* statement or type of statement
*/
enum StatementExecutionIntent
{
 DEFAULT= 0,  /* "Normal" or default execution.  Defer to the kernel...
 INTERNAL= 1, /* Engine will handle all locking and execution */
 UNKNOWN_STATEMENT= 99 /* Engine has no clue what the statement is */
};

/**
* Indicate that the Session is starting a new SQL Statement, and
* provide the engine with information about the statement.  This
* call would come shortly after the parser identifies the type of
* statement to be executed.
*
* @param[in] The Session which is executing this statement
* @param[in] GPB Message of type message::Statement
*/
enum StatementExecutionIntent
StorageEngine::startStatement(Session &session,
                             message::Statement &statement);

Yes.

One thing to note here is that if 2 (or more) engines are involved in the statement, then all get the startStatement() call.

So when ALTER TABLE is used to change the engine, both engines get the startStatement() call. Of course, both engines should recognize the fact that they cannot handle the statement internally (because before and after images have different engines) and return DEFAULT.

Yep, ++

/**
* Let the engine execute the statement itself.
*
* @note
*
* Instead of returning a simple bool, we could have a StatementResult
* enum or something like that...
*/
bool passthruStatement(Session &session,
                      message::Statement &statement);

// in drizzle_parse(), a message::Statement would be constructed
// which contains something like this:

message::Statement *statement= new message::Statement();
statement->set_type(message::Statement::ALTER_TABLE);

message::AlterTableStatement *alter=
 statement->mutable_alter_table_statement();

message::Table *before_table= alter->mutable_before();
message::Table *after_table= alter->mutable_after();

// parser then fills in the before and after image of the
// message::Table definition...
//
// At this point, we notify the engine that we wish to start
// a new SQL statement for the ALTER TABLE t1 ...

plugin::StorageEngine *engine= getStorageEngineForTable("t1");
Session &session= getCurrentSession();

enum StorageEngine::StatementExecutionIntent intent=
 engine->startStatement(session, *statement);

// If the engine is happy to let Drizzle deal with the statement,
// follow a "normal" kernel path for this statement...
//
// If the engine wants to handle this statement, let it...
//
if (likely(intent == StorageEngine::DEFAULT))
{
 // do the "normal", unoptimized set of operations
}
elseif (intent == StorageEngine::INTERNAL)
{
 // Let the engine do it internally
 if (engine->passthruStatement(session, *statement) == false)
 {
    // Ask engine for errors to report to the user...
 }
}

Does the above look workable to you?

Yes, absolutely. That would work for me :)

You would have the Statement type as listed in the transaction.proto:

message Statement
{
 enum Type
 {
   ROLLBACK = 0; /* A ROLLBACK indicator */
   INSERT = 1; /* An INSERT statement */
   DELETE = 2; /* A DELETE statement */
   UPDATE = 3; /* An UPDATE statement */
   TRUNCATE_TABLE = 4; /* A TRUNCATE TABLE statement */
   CREATE_SCHEMA = 5; /* A CREATE SCHEMA statement */
   ALTER_SCHEMA = 6; /* An ALTER SCHEMA statement */
   DROP_SCHEMA = 7; /* A DROP SCHEMA statement */
   CREATE_TABLE = 8; /* A CREATE TABLE statement */
   ALTER_TABLE = 9; /* An ALTER TABLE statement */
   DROP_TABLE = 10; /* A DROP TABLE statement */
   SET_VARIABLE = 98; /* A SET statement */
   RAW_SQL = 99; /* A raw SQL statement */
 }
...
}


There is no SELECT in the list, but maybe this is correct. I am just thinking allowed...

Only because that is for replication and obviously replication doesn't replicate SELECT statements. Of course, we could add support for a message class which described a SELECT, though I would try to steer clear of the interface available for both MRR and ICP and stick to GPB messages similar to the existing Statement message classes...

We have 3 types of statements:
DML update statements:    INSERT, UPDATE, DELETE
DML read-only statements: SELECT.
DDL statements:           CREATE TABLE, ALTER TABLE, etc.

And assuming we have 2 sets of calls:
- beginTransaction, commitTransaction/rollbackTransaction
- startStatement, endStatement

We could say, all types of statements require a beginTransaction() and a startStatement() (and the corresponding endStatement() and commitTransaction/rollbackTransaction()).

But I don't think this is absolutely correct:

* DML update statements require both beginTransaction() and a startStatement().

Yes.

* DML read-only statements only require a beginTransaction() call because a SELECT does not need a statement level transaction (because they cannot be rolled back).

Yep.

* And DDL statements only require a startStatement() because it is up to the engine to decide if this can be done within a transaction or not.

Correct, which means that we must have a way for the StorageEngine to notify the kernel on startStatement() that the engine cannot include a particular Statement in a transaction (in other words, it is saying Drizzle MUST implicitly COMMIT the active transaction before asking the engine to execute the statement (INTERNAL) or executing the statement inside the kernel (NORMAL)).

To accomplish this communication, what about changing the StatementExecutionIntent enum to a std::bitset<> (or uint64_t for C folks):

static const int NUM_INTENT_BITS= 4;
typdef std::bitset<NUM_INTENT_BITS> StatementExecutionIntent;

/* "Normal" or default execution.  Defer to the kernel... */
static const StatementExecutionIntent INTENT_KERNEL(1 << 1);
/* Engine will handle all locking and execution */
static const StatementExecutionIntent INTENT_INTERNAL(1 << 2);
/* Engine has no clue what the statement is */
static const StatementExecutionIntent INTENT_UNKNOWN_STATEMENT(1 << 3);
/* Kernel is required to commit active transaction before
   executing this statement, as the engine cannot include
   this statement in a transaction. */
static const StatementExecutionIntent INTENT_MUST_COMMIT_BEFORE(1 << 4);

Given the above, let's say someone executes an ALTER TABLE statement and PBXT has the ability to optimally execute the ALTER TABLE internally, but must rely on the Drizzle kernel to commit open transactions before this happens.

The code would go like this:

const StatementExecutionIntent
PBXTEngine::startStatement(Session &session,
                           message::Statement &statement)
{
  message::Statement::Type type= statement.type();
  switch (type)
  {
  ...
  case message::Statement::ALTER_TABLE:
    return (INTENT_INTERNAL & INTENT_MUST_COMMIT_BEFORE);
  ...
  }
}

// in drizzle_parse(), a message::Statement would be constructed
// which contains something like this:

message::Statement *statement= new message::Statement();
statement->set_type(message::Statement::ALTER_TABLE);

message::AlterTableStatement *alter=
  statement->mutable_alter_table_statement();

message::Table *before_table= alter->mutable_before();
message::Table *after_table= alter->mutable_after();

// parser then fills in the before and after image of the
// message::Table definition...
//
// At this point, we notify the engine that we wish to start
// a new SQL statement for the ALTER TABLE t1 ...

plugin::StorageEngine *engine= getStorageEngineForTable("t1");
Session &session= getCurrentSession();

StatementExecutionIntent intent=
  engine->startStatement(session, *statement);

// The engine may have indicated to the kernel that the kernel
// should commit any active transactions before asking the engine
// to execute the statement...

if (unlikely(intent.test(INTENT_MUST_COMMIT_BEFORE))
{
  session->endActiveTransaction();
}

// If the engine is happy to let Drizzle deal with the statement,
// follow a "normal" kernel path for this statement...
//
// If the engine wants to handle this statement, let it...
//
if (likely(intent.test(INTENT_KERNEL))
{
  // do the "normal", unoptimized set of operations within
  // the kernel...
}
elseif (intent.test(INTENT_INTERNAL)
{
  // Let the engine do it internally
  if (engine->passthruStatement(session, *statement) == false)
  {
     // Ask engine for errors to report to the user...
  }
}

Whatcha say? :)

-jay



Follow ups

References