drizzle-discuss team mailing list archive

Re: Improving the Engine API (was Re: New PBXT Drizzle-specific storage engine...)

 

Hi Toru,

On Dec 8, 2009, at 6:37 PM, Toru Maesaka wrote:

Hi Paul,

- If I have an update-type statement (i.e. whether the statement modifies
rows).
- Whether I need a table lock (examples: ALTER TABLE, TRUNCATE, CHECK).
- If we have a SELECT FOR UPDATE.

Agreed! Especially for the third point. For me this is what I need:

- Whether the statement will change the table state (so, updates in general)
- Whether the entire table needs to be locked.
- Whether the statement only performs READ operations.

The third point doesn't need to be explicitly defined, I think, but it
would be reassuring for the engine to know (or be guaranteed) that the
table state will not be changed.

Whether the storage engine should obey the statement characteristics
or not is up to the engine developer I guess... Nonetheless it would
be brilliant as a "hint" for everyone I think.

The question here is whether the engine should lock the tables in the startStatement() call, or when the cursor is used?

Let's look at an UPDATE statement:
UPDATE t1, t2 SET t1.c1=50 WHERE t1.id = t2.id and t2.c3='abc';
In this statement, t1 is being read and updated, and t2 is just being read. Both tables are being scanned (let's assume there are no indexes).

Here is some pseudo code for the execution of this statement:

engine->beginTransaction()
engine->startStatement(gpb_stat_info)

a = engine->getCursor("t1", WILL_UPDATE)
b = engine->getCursor("t2", READ_ONLY)
a->rnd_init()
b->rnd_init()
....
a->update_row()
...
a->rnd_end()
b->rnd_end()

a->release()
b->release()

engine->endStatement()
engine->commitTransaction()

gpb_stat_info is GPB-based (Google Protocol Buffers) information about the statement. In the case of an UPDATE, I think this would just contain the statement type, and maybe a list of the tables.

So where should the table lock be taken? We have a number of possibilities:

* startStatement():
So far we have identified the following uses for startStatement():
  - Starts the statement level transaction
  - Where the engine decides how it will handle a DDL statement
If startStatement() is also to be used to lock tables for DML, then the GPB info must include a list of tables that indicates which tables will be updated.

* getCursor()
This would be my choice for the point at which the table would be locked. As I have indicated in my code above, Drizzle should indicate how the cursor will be used. Locking the table here would mean we do not need a list of tables for DML statements in startStatement().

* rnd_init()
The latest point at which the table could be locked.
This is probably not a good point to lock the table because rnd_init() and rnd_end() may be called multiple times in the statement, which would lead to the table being locked and unlocked during the statement execution.
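
To make the getCursor() option concrete, here is a minimal C++ sketch of locking at cursor creation. Every name in it (Engine, Cursor, CursorUse, lockTable, unlockTable, locksTakenAcrossRescans) is hypothetical and only mirrors the pseudo code above; it is not the actual Drizzle engine API, and constructing a Cursor stands in for the getCursor() call. The lock is taken once when the cursor is handed out and dropped on release(), so repeated rnd_init()/rnd_end() pairs never touch it.

```cpp
#include <cassert>
#include <map>
#include <string>

// All names are hypothetical; this is not the Drizzle engine API.
enum class CursorUse { READ_ONLY, WILL_UPDATE };

class Engine {
public:
  // Called when a cursor is handed out. A real engine would pick a
  // shared or exclusive lock based on 'use'; here we only count holders.
  void lockTable(const std::string &table, CursorUse use) {
    ++holders_[table];
    if (use == CursorUse::WILL_UPDATE) ++writers_[table];
    ++locksTaken_;
  }
  // Called when a cursor is released.
  void unlockTable(const std::string &table, CursorUse use) {
    assert(holders_[table] > 0);
    --holders_[table];
    if (use == CursorUse::WILL_UPDATE) --writers_[table];
  }
  bool isLocked(const std::string &table) const {
    auto it = holders_.find(table);
    return it != holders_.end() && it->second > 0;
  }
  int locksTaken() const { return locksTaken_; }

private:
  std::map<std::string, int> holders_;  // cursors open per table
  std::map<std::string, int> writers_;  // of those, cursors that will update
  int locksTaken_ = 0;                  // total lock acquisitions
};

class Cursor {
public:
  // Stands in for getCursor(): the table lock is taken here, once.
  Cursor(Engine &engine, const std::string &table, CursorUse use)
      : engine_(engine), table_(table), use_(use) {
    engine_.lockTable(table_, use_);
  }
  void rnd_init() {}  // starts a scan; takes no lock
  void rnd_end() {}   // ends a scan; releases no lock
  void release() {    // the only place the lock is dropped
    if (!released_) {
      engine_.unlockTable(table_, use_);
      released_ = true;
    }
  }

private:
  Engine &engine_;
  std::string table_;
  CursorUse use_;
  bool released_ = false;
};

// Scans both tables three times and returns how many locks were taken.
int locksTakenAcrossRescans() {
  Engine engine;
  Cursor a(engine, "t1", CursorUse::WILL_UPDATE);
  Cursor b(engine, "t2", CursorUse::READ_ONLY);
  for (int pass = 0; pass < 3; ++pass) {
    a.rnd_init();
    b.rnd_init();
    a.rnd_end();
    b.rnd_end();
    // Both tables stay locked between scans.
    assert(engine.isLocked("t1") && engine.isLocked("t2"));
  }
  a.release();
  b.release();
  assert(!engine.isLocked("t1") && !engine.isLocked("t2"));
  return engine.locksTaken();
}
```

With this arrangement the number of lock acquisitions depends only on the number of cursors, not on how many scans each cursor performs, and startStatement() needs no table list for DML.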

What do you think?

Best regards,

Paul


Cheers,
Toru


On Mon, Dec 7, 2009 at 11:00 PM, Paul McCullagh
<paul.mccullagh@xxxxxxxxxxxxx> wrote:
Hi Toru,

On Dec 7, 2009, at 3:31 AM, Toru Maesaka wrote:

Great to hear another use-case where knowing a statement type in
advance is useful :)

Yes, generally I need to know the following:

- If I have an update-type statement (i.e. whether the statement modifies
rows).
- Whether I need a table lock (examples: ALTER TABLE, TRUNCATE, CHECK).
- If we have a SELECT FOR UPDATE.

I was talking to Toru about this, and another possibility is that we have statements declare a needed "lock type" that any plugin could then query. I outlined the solution for Toru, but I don't know if he has written the patch yet :)

I've taken notes from our discussion the other day. I'm planning on
working on it when I finish testing through my current progress of
BlitzDB.

Great! :)

For now, I'm happy with Jay's advice to use
current_session().

Cheers,
Toru

On Sat, Dec 5, 2009 at 5:59 AM, Brian Aker <brian@xxxxxxxxxxx> wrote:

Hi!

On Dec 4, 2009, at 3:12 AM, Paul McCullagh wrote:

If we have a startStatement() call, then it could be used in place of beginAlter(), assuming we can determine the statement type, and the tables
involved.

The problem with relying on statement type is that at some point
statement type will be pluggable... which means you would constantly need to
update your engine for new statements.

Yuck!

I was talking to Toru about this, and another possibility is that we have statements declare a needed "lock type" that any plugin could then query. I outlined the solution for Toru, but I don't know if he has written the patch yet :)
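
One way to picture the "lock type" idea is the C++ sketch below. It is an illustration under assumptions, not the patch mentioned above: LockType, Statement, neededLockType(), engineAction() and the concrete statement structs are all hypothetical names. The point is that the engine switches on the declared lock type only, so a statement type added later by a plugin needs no engine change.

```cpp
#include <cassert>
#include <string>

// Hypothetical names; a sketch of the idea, not Drizzle's plugin API.
enum class LockType { NONE, READ, WRITE, TABLE_EXCLUSIVE };

// Every statement, built in or plugged in, declares the lock it needs.
struct Statement {
  virtual ~Statement() = default;
  virtual LockType neededLockType() const = 0;
};

struct SelectStatement : Statement {
  LockType neededLockType() const override { return LockType::READ; }
};

struct UpdateStatement : Statement {
  LockType neededLockType() const override { return LockType::WRITE; }
};

struct TruncateStatement : Statement {
  LockType neededLockType() const override { return LockType::TABLE_EXCLUSIVE; }
};

// A statement type the engine has never heard of.
struct PluggedInStatement : Statement {
  LockType neededLockType() const override { return LockType::TABLE_EXCLUSIVE; }
};

// The engine queries the lock type and never inspects the statement type.
std::string engineAction(const Statement &stmt) {
  switch (stmt.neededLockType()) {
    case LockType::READ:            return "shared read lock";
    case LockType::WRITE:           return "row write lock";
    case LockType::TABLE_EXCLUSIVE: return "exclusive table lock";
    case LockType::NONE:            break;
  }
  return "no lock";
}
```

Here engineAction(PluggedInStatement()) is handled by the same switch as the built-in statements, which is what avoids updating the engine for each new statement.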


Then, when a handle is returned to the pool it is deleted, instead of
adding it back to the pool.

BTW very soon engines will own their Cursor objects and will be free to
reuse them.

The locking thread waits until all handles are returned and deleted before it can proceed. The lock on the pool then prevents a new table handle
from being created while the locking thread is busy.
Either way, it would be good if Drizzle closes all handlers/cursors
before a table is deleted or renamed.

I would say that long term this will be optional, based on what the
engine requires.

OK, this makes things a lot simpler! Indeed, if we don't need to support
LOCK TABLE then external_lock() can be removed altogether.

Have you tried removing external_lock() right now to see if any issues pop
up?

Cheers,
      -Brian



--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com












