← Back to team overview

drizzle-discuss team mailing list archive

Re: Improving the Engine API (was Re: New PBXT Drizzle-specific storage engine...)


I want to add the example in this post to my summary wiki page, but I want to both make sure I fully understand what you are saying, and add a couple suggestions. Please see my comments below.

Paul McCullagh wrote:
Hi Toru,

On Dec 8, 2009, at 6:37 PM, Toru Maesaka wrote:

Hi Paul,

- If I have a update type statement (i.e. whether the statement modifies
- Whether I need a table lock (examples: ALTER TABLE, TRUNCATE, CHECK).
- If we have a SELECT FOR UPDATE.

Agreed! Especially for the third point. For me this is what I need:

- Whether the statement will change the table state (so, updates in general)
- Whether the entire table needs to be locked.
- Whether the statement only performs READ operations.

The third point I don't think needs to be explicitly defined but it
would be relieving for the engine to know (or be guaranteed) that the
table state will not be changed.

Whether the storage engine should obey the statement characteristics
or not is up to the engine developer I guess... Nonetheless it would
be brilliant as a "hint" for everyone I think.

The question here is whether the engine should lock the tables in the startStatement() call, or when the cursor is used?

Lets look at an UPDATE statement:
UPDATE t1, t2 SET t1.c1=50 WHERE t1.id = t2.id and t2.c3='abc';
In this statement, t1 is being read and updated, and t2 is just being read. Both tables are being scanned (lets assume there are no indexes).

Drizzle does not support multi-table UPDATEs in a single statement. For Drizzle, the above statement would have to be written like so:

UPDATE t1 SET t1.c1=50
WHERE t1.id IN (
  SELECT t2.id FROM t2 WHERE t2.c3='abc'

Here is some pseudo code for the execution of this statement:


a = engine->getCursor("t1", WILL_UPDATE)
b = engine->getCursor("t2", READ_ONLY)

Actually, the above is not correct. Because the update to t1 is dependent on rows in a consistent snapshot of t2, you would actually have to do (again, in pseudocode):

b= engine->getCursor("t2", REPEATABLE_READ);
a= engine->getCursor("t1", WILL_UPDATE);


OK, so the above rnd_init()s on the two cursors are necessary only because Cursor::rnd_init() is the legacy Handler API for initializing a MyISAM table scan.

But, let's take the MyISAM/legacy gloves off, shall we? Forget about rnd_init(), index_init(), blah, blah, blah. Let's imagine that the *cursor* that is returned from StorageEngine::getCursor() corresponds to a specific cursor subclass. Imagine with me, please, the following potential new class hierarchy for Cursor:

 * Base class for a current set of records.
class Cursor
   * Constructor.  Initializes the Cursor for
   * reading.
   * @param[in] The session using this cursor.
  Cursor(Session &session);
   * Read the first row from the cursor into
   * a supplied byte array.
   * @param[out] Bytes to copy row into
  bool readFirst(uint8_t *row_out);
   * Read the next row the cursor into a
   * supplied byte array
  bool readNext(uint8_t *row_out);
   * Read row into a supplied byte array
   * which corresponds to the supplied key
   * @param[out] Bytes to copy row into
   * @param[in] Key to find
   * @retval
   *   true if key was found
   *   false if not
  bool readAtKey(uint8_t *row_out, KeyInfo &key);
   * Updates the current row in the Cursor
   * @retval
   *   true if update succeeded
   *   false otherwise
  bool updateRecord(uint8_t *new_record);

 * Cursor which represents a consistent snapshot
 * of records in the repeatable read transaction
 * isolation level.
 * Records read via this class are guaranteed to be
 * read in a repeatable manner: in other words, this
 * Cursor is a set of records that represent the state
 * of a table at a specific point in time (for example,
 * the start of a transaction)
class ConsistentSnapshotCursor :public Cursor {};

There could, of course, be additional Cursor subclasses for specific read-scan needs...

For instance, the Cursor returned from a call to the following:

b= engine->getCursor("t2", REPEATABLE_READ);

would be a subclass ConsistentSnapshotCursor.

the kernel could do something like the following to update the correct records:

uint8_t *record;

KeyInfo primary_key_value= get_key_info("t2", "c3", "abc"); // pseudo...

bool result= true;
while (b->readAtKey(record, primary_key_value))
  result&= b->updateRecord();

if (result == false)
  // Ask engine for errors...



There would be no need to call rnd_end(), or release(). The Cursor's destructor would clean this up.


gpb_stat_info is GPB based information about the statement. In the case of an UPDATE, I think this would just contain the statement type, and maybe a list of the tables.

Or simply the UpdateStatement that the transaction.proto file already defines, which has all the information one would need to understand the tables involved in the update...

So where should the table lock be taken. We have a number of possibilities:

* startStatement():
So far we have identified the following uses for startStatement():
  - Starts the statement level transaction
  - Where the engine decides how it will handle a DDL statement
If startStatement() is also to be used to lock tables for DML, then the GPB info, must include a list of tables in indicate which table will be updated.

* getCursor()
This would be my choice for the point at which the table would be locked. As I have indicated in my code above, Drizzle should indicate how the cursor will be used. Locking the table here would mean we do not need a list of tables for DML statements in startStatement().

This would be my choice, too, and if you follow my "futuristic" code above to its natural conclusion, you could see that the Cursor subclasses could handle locking differently depending on their needs...

* rnd_init()
The latest point at which the table could be locked.
This is probably not a good point to lock the table because rnd_init() and rnd_end() may be called multiple times in the statement, which would lead to the table being locked and unlocked during the statement execution.





What do you think?

Best regards,



On Mon, Dec 7, 2009 at 11:00 PM, Paul McCullagh
<paul.mccullagh@xxxxxxxxxxxxx> wrote:
Hi Toru,

On Dec 7, 2009, at 3:31 AM, Toru Maesaka wrote:

Great to hear another use-case where knowing a statement type in
advance is useful :)

Yes, generally I need to know the following:

- If I have a update type statement (i.e. whether the statement modifies
- Whether I need a table lock (examples: ALTER TABLE, TRUNCATE, CHECK).
- If we have a SELECT FOR UPDATE.

I was talking to Toru about this, and another possibility is that we have statements declare a needed "lock type" that any plugin could then query. I outlined the solution for Toru, but I don't know if he has written the patch
yet :)

I've taken notes from our discussion the other day. I'm planning on
working on it when I finish testing through my current progress of

Great! :)

For now, I'm happy with Jay's advise of using


On Sat, Dec 5, 2009 at 5:59 AM, Brian Aker <brian@xxxxxxxxxxx> wrote:


On Dec 4, 2009, at 3:12 AM, Paul McCullagh wrote:

If we have a startStatement() call, then it could be used in place of
beginAlter(), assuming we can determine the statement type, and the tables

The problem with relying on statement type is that at some point
statement type will be pluggable... which means you would constantly need to
update your engine for new statements.


I was talking to Toru about this, and another possibility is that we have statements declare a needed "lock type" that any plugin could then query. I outlined the solution for Toru, but I don't know if he has written the patch
yet :)

Then, when a handle is returned to the pool it is deleted, instead of
adding it back to the pool.

BTW very soon engines will own their Cursor objects and will be free to
reuse them.

The locking thread waits until all handles are returned and deleted
before it can proceed. The lock on the pool then prevents a new table handle
from being created while the locking thread is busy.
Either way, it would be good if Drizzle closes all handlers/cursors
before a table is deleted or renamed.

I would say that long term this will be optional, based on what the
engine requires.

OK, this make things a lot simpler! Indeed, if we don't need to support
LOCK TABLE then external_lock() can be removed altogether.

Tried removing the external_lock() right now and seeing if any issues pop


Paul McCullagh
PrimeBase Technologies

Paul McCullagh
PrimeBase Technologies

Follow ups