← Back to team overview

drizzle-discuss team mailing list archive

Re: Improving the Engine API

 

Hi Jay,

On Dec 8, 2009, at 3:48 PM, Jay Pipes wrote:

Lots of comments and code... please read through! :)

Will do, after all, we are making progress on this! :)


No matter what method the engine chooses. The server should still just replicate the GPB statement data (as you suggest in https://lists.launchpad.net/drizzle-discuss/msg05305.html) . This basically means that the statement is replicated, and not the operations.

Hmmm, kind of. Right now, replication is row-based with regards to DML which modifies row data (INSERT, UPDATE, DELETE). This may actually be better for you, though, as you would have key information about the statement available in a GPB message -- and you wouldn't have to grok the Lex...

Oh, yes, of course. I am very glad about this for a few reasons:

- PBXT's does not support statement based replication (MVCC not serializable) - Row based replication is more reliable, with the tendency to converge rather than to diverge
- RBR works with repeatable read which allows more concurrency.

There is no SELECT in the list, but maybe this is correct. I am just thinking allowed...

Only because that is for replication and obviously replication doesn't replicate SELECT statements. Of course, we could add support for a message class which described a SELECT, though I would try to steer clear of the interface available for both MRR and ICP and stick to GPB messages similar to the existing Statement message classes...

Yes, I forgot about these cases. But I agree we should continue to use the GPB based interface for this stuff.


* And DDL statements only require a startStatement() because it is up to the engine to decide if this can be done within a transaction or not.

Correct, which means that we must have a way for the StorageEngine to notify the kernel on startStatement() that the engine cannot include a particular Statement in a transaction (in other words, it is saying Drizzle MUST implicitly COMMIT the active transaction before asking the engine to execute the statement (INTERNAL) or executing the statement inside the kernel (NORMAL)).

To accomplish this communication, what about changing the StatementExecutionIntent enum to a std::bitset<> (or uint64_t for C folks):

static const int NUM_INTENT_BITS= 4;
typdef std::bitset<NUM_INTENT_BITS> StatementExecutionIntent;

/* "Normal" or default execution.  Defer to the kernel... */
static const StatementExecutionIntent INTENT_KERNEL(1 << 1);
/* Engine will handle all locking and execution */
static const StatementExecutionIntent INTENT_INTERNAL(1 << 2);
/* Engine has no clue what the statement is */
static const StatementExecutionIntent INTENT_UNKNOWN_STATEMENT(1 << 3);
/* Kernel is required to commit active transaction before
  executing this statement, as the engine cannot include
  this statement in a transaction. */
static const StatementExecutionIntent INTENT_MUST_COMMIT_BEFORE(1 << 4);

Given the above, let's say someone executes an ALTER TABLE statement and PBXT has the ability to optimally execute the ALTER TABLE internally, but must rely on the Drizzle kernel to commit open transactions before this happens.

The code would go like this:

const StatementExecutionIntent
PBXTEngine::startStatement(Session &session,
                          message::Statement &statement)
{
 message::Statement::Type type= statement.type();
 switch (type)
 {
 ...
 case message::Statement::ALTER_TABLE:
   return (INTENT_INTERNAL & INTENT_MUST_COMMIT_BEFORE);
 ...
 }
}

// in drizzle_parse(), a message::Statement would be constructed
// which contains something like this:

message::Statement *statement= new message::Statement();
statement->set_type(message::Statement::ALTER_TABLE);

message::AlterTableStatement *alter=
 statement->mutable_alter_table_statement();

message::Table *before_table= alter->mutable_before();
message::Table *after_table= alter->mutable_after();

// parser then fills in the before and after image of the
// message::Table definition...
//
// At this point, we notify the engine that we wish to start
// a new SQL statement for the ALTER TABLE t1 ...

plugin::StorageEngine *engine= getStorageEngineForTable("t1");
Session &session= getCurrentSession();

StatementExecutionIntent intent=
 engine->startStatement(session, *statement);

// The engine may have indicated to the kernel that the kernel
// should commit any active transactions before asking the engine
// to execute the statement...

if (unlikely(intent.test(INTENT_MUST_COMMIT_BEFORE))
{
 session->endActiveTransaction();
}

// If the engine is happy to let Drizzle deal with the statement,
// follow a "normal" kernel path for this statement...
//
// If the engine wants to handle this statement, let it...
//
if (likely(intent.test(INTENT_KERNEL))
{
 // do the "normal", unoptimized set of operations within
 // the kernel...
}
elseif (intent.test(INTENT_INTERNAL)
{
 // Let the engine do it internally
 if (engine->passthruStatement(session, *statement) == false)
 {
    // Ask engine for errors to report to the user...
 }
}

Whatcha say? :)

Absolutely, it will work. And maybe it is the best solution.

I would have suggested that Drizzle assumes that DDL works with transactions, and that the engine handles the situation by doing one of the following:

- Execute the DDL normally, because it supports DDL in transactions,
- Return an error because it does not handle DLL in a transaction, or
- Silently commit the transaction, and begin it again at the end of the DML statement, without Drizzle even knowing about it.

But, I guess the problem is replication. If Drizzle is not aware of a transaction end, then it would be replicated as such, and we may end up with a different result on the slave.

The problem of DDL in a transaction only occurs when auto-commit is disabled, or an explicit BEGIN is used, so lets look at a quick example:

BEGIN;
INSERT t1 VALUES (1, 1);
INSERT t1 VALUES (2, 2);
ALTER TABLE t1 ADD COLUMN c3 INT;
INSERT t1 VALUES (3, 3, 3);
INSERT t1 VALUES (4, 4, 4);
COMMIT;

A silent COMMIT on DML will lead to the following:

BEGIN;
INSERT t1 VALUES (1, 1);
INSERT t1 VALUES (2, 2);
COMMIT;
ALTER TABLE t1 ADD COLUMN INT c3;
BEGIN;
INSERT t1 VALUES (3, 3, 3);
INSERT t1 VALUES (4, 4, 4);
COMMIT;

While I am no fan of a silent COMMIT, it may be the best solution, because at least this sequence of statements will be compatible with engines that support DDL in transactions and those that don't (much like MyISAM happily ignores BEGIN TRANSACTION).

The alternative would be to return an error. This would prevent the surprise affect that I get when the server crashes and I discover my transaction was not atomic after all.

I guess this is a question for the DBA's on the list ... input please! :)

Best regards,

Paul

--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com






Follow ups

References