drizzle-discuss team mailing list archive

Re: Stewart Smith: The Drizzle (and MySQL) Key tuple format


  Hi Stewart, I read your post about the key tuple format, but I still do
not quite understand when and how those index_foo functions will be
called. I've looked through the csv and archive engines, as well as the
mysql filesystem and awss3 engines. None of them supports indexes. I tried
to figure it out by reading the source code of the innodb engine, but I
still do not quite understand it, so I have no idea how to implement
indexes in a storage engine. Can you give me some hints?
  Also, I have not read the transaction-related source code in depth. I am
not sure whether I can implement a filesystem or cloud-based
TransactionalStorageEngine in a three-month GSoC. What's your opinion?


2010/4/2 Planet Drizzle <emailer@xxxxxxxxxxxxxxxxx>
> Stewart Smith: The Drizzle (and MySQL) Key tuple format
> Here’s something that’s not really documented anywhere (unless you count ha_innodb.cc as a source of server documentation). You may have some idea about the MySQL/Drizzle row buffer format. This is passed around the storage engine interface: in for write_row and update_row and out for the various scan and index read methods.
> If you want to see the docs for it that exist in the code, check out store_key_val_for_row in ha_innodb.cc.
> However, there is another format that is passed to your engine (and that your engine is expected to understand) and for lack of a better name, I’m going to call it the key tuple format. The first place you’ll probably see this is when implementing the index_read function for a Cursor (or handler in MySQL speak).
> You get two things: a pointer to the buffer and the length of the buffer. Since a key can be made up of multiple parts, some of which can be NULL and some of which can be of variable length, this buffer is not (usually) a simple value. If you are starting out in your engine development, you can use this buffer blindly as a single value for non-nullable indexes with only 1 column.
> The basic format is this:
> The buffer is in-order of the index. First column in the index is first in the buffer, second second etc.
> The buffer must be zero-filled. The server kernel will use memcmp to compare two key values.
> If the column is NULLable, then the first byte is set to 1 if the column is null. Else, 0 means not-null.
> From ha_innodb.cc (for BLOBs, which I haven’t put in embedded_innodb yet): If the column is of a BLOB type (it must be a column prefix field in this case), then we put the length of the data in the field to the next 2 bytes, in the little-endian format. If the field is SQL NULL, then these 2 bytes are set to 0. Note that the length of data in the field is <= column prefix length.
> For fixed length fields (such as int), the next max field length bytes are for that field.
> For VARCHAR, there is always a 2 byte (in little endian) length. This is different to the row format, which may have 1 or 2 bytes. In the key tuple format it is ALWAYS two bytes.
> I’ll discuss the use of this for rnd_pos() and position() in a later post…
> This blog post (but not the whole blog) is published under the Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).
> URL: http://www.flamingspork.com/blog/2010/04/02/the-drizzle-and-mysql-key-tuple-format/
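
To make the layout described in the quoted post more concrete, here is a minimal C++ sketch that packs and unpacks a key tuple for a hypothetical two-column index (a INT NULL, b VARCHAR(8) NOT NULL). It is not taken from Drizzle or MySQL. The names Key, pack_key and unpack_key are made up for illustration. It also assumes that the INT is stored as 4 little-endian bytes, that a NULL column still occupies its full width, and that the VARCHAR part always occupies 2 + max length bytes with the unused tail left as zeros. The post only states the null byte, the 2-byte little-endian length, and the zero-filled buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical index definition: (a INT NULL, b VARCHAR(8) NOT NULL).
constexpr std::size_t kVarcharMax = 8;
// 1 null byte + 4 int bytes + 2 length bytes + 8 data bytes (assumed layout).
constexpr std::size_t kKeyLen = 1 + 4 + 2 + kVarcharMax;

struct Key {
  std::optional<std::int32_t> a;  // empty optional means SQL NULL
  std::string b;                  // at most kVarcharMax bytes
};

// Build a key tuple buffer in index column order.
std::vector<unsigned char> pack_key(const Key& k) {
  assert(k.b.size() <= kVarcharMax);
  std::vector<unsigned char> buf(kKeyLen, 0);  // zero-filled, so memcmp is stable
  std::size_t pos = 0;

  // Nullable column: first byte is 1 for NULL, 0 for not-null.
  buf[pos++] = k.a.has_value() ? 0 : 1;
  if (k.a.has_value()) {
    const std::uint32_t v = static_cast<std::uint32_t>(*k.a);
    for (int i = 0; i < 4; ++i) {
      buf[pos + i] = static_cast<unsigned char>((v >> (8 * i)) & 0xff);
    }
  }
  pos += 4;  // fixed-length field always takes its full width

  // VARCHAR: always a 2-byte little-endian length in the key tuple.
  const std::uint16_t len = static_cast<std::uint16_t>(k.b.size());
  buf[pos++] = static_cast<unsigned char>(len & 0xff);
  buf[pos++] = static_cast<unsigned char>(len >> 8);
  for (std::size_t i = 0; i < k.b.size(); ++i) {
    buf[pos + i] = static_cast<unsigned char>(k.b[i]);
  }
  return buf;
}

// Decode the pointer-plus-length pair an index_read-style call would receive.
Key unpack_key(const unsigned char* buf, std::size_t buf_len) {
  assert(buf_len == kKeyLen);
  Key k;
  std::size_t pos = 0;

  const bool is_null = buf[pos++] != 0;
  if (!is_null) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
      v |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
    }
    k.a = static_cast<std::int32_t>(v);
  }
  pos += 4;

  const std::size_t len = static_cast<std::size_t>(buf[pos]) |
                          (static_cast<std::size_t>(buf[pos + 1]) << 8);
  assert(len <= kVarcharMax);
  pos += 2;
  k.b.assign(reinterpret_cast<const char*>(buf + pos), len);
  return k;
}
```

An engine would typically run something like unpack_key on the buffer it is handed, then look the decoded values up in its own index structure. For a non-nullable single-column index the buffer can be used directly as the value, as the post notes.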
