
pbxt-discuss team mailing list archive

Re: goal 0 of embedded pbxt reached || (was: Re: PBXT: Embedded Database (library))

 

Hi Martin,

On Feb 11, 2010, at 5:12 PM, Martin Scholl wrote:

As this is my first posting, I'd like to say "hello" to you all.

On Thu, Feb 11, 2010 at 4:19 PM, Paul McCullagh <paul.mccullagh@xxxxxxxxxxxxx > wrote:
[snip]
I guess the next step would be to define an interface, and begin the implementation.
Besides the missing bloody THD- and codeset / charset-related stuff... :-) Even more, you will notice a lot of assert(0)s in the current version... To be clear: the current state of the code is solely to "make it compile" instead of "be maintainable" or even "be clean". Please keep this in mind when reading my "code".

Understood, but it is a good start...



This raises the question of whether to use the MySQL handler interface, or to go in and replace ha_pbxt.cc altogether.
I'd propose to skip ha_pbxt.cc altogether and stick with a dedicated (and probably simplified) embedded API. ha_pbxt.cc is not part of the current build-set anyway. :-)

Yes, OK.

One of the things I like the most about libraries like Tokyo Cabinet is their straightforward API. I would love to see embedded PBXT be easy like this as well.

Absolutely agree. As few API calls as possible, and they should be easy to understand.

Embedded InnoDB's API might be a good start and reference for an API sketch: http://www.innodb.com/doc/embedded_innodb-1.0/

Yes, I read that through again. Most of it could be taken over 1 to 1.

IMHO what is open and where I would really appreciate your feedback / comments:
- what language should the API be in? C or C++?

Well, C has the advantage that it is easy to put a C++ wrapper around if you want to, the other way around is tricky. So unless there is a good reason, I would recommend a C API.
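
To illustrate the point about wrapping, here is a minimal sketch of what a C entry point with an opaque handle could look like. The names emb_db, emb_db_open and emb_db_close are hypothetical and not part of PBXT; the body is a stub that only allocates and frees the handle.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef __cplusplus
extern "C" {            /* lets a C++ wrapper link against the C symbols */
#endif

typedef struct emb_db emb_db;           /* opaque to callers (hypothetical) */

emb_db *emb_db_open(const char *path);  /* NULL on failure */
int     emb_db_close(emb_db *db);       /* 0 on success, -1 on error */

#ifdef __cplusplus
}
#endif

/* Private part: callers never see the layout, so it can change freely. */
struct emb_db { int is_open; };

emb_db *emb_db_open(const char *path)
{
    if (path == NULL) return NULL;
    emb_db *db = malloc(sizeof *db);
    if (db != NULL) db->is_open = 1;    /* a real library would start the engine here */
    return db;
}

int emb_db_close(emb_db *db)
{
    if (db == NULL) return -1;
    assert(db->is_open);                /* a live handle is always open in this sketch */
    free(db);
    return 0;
}
```

A C++ wrapper would then just be a small class that calls emb_db_open in its constructor and emb_db_close in its destructor.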

- In which format shall we store the table/db definitions? protobuf maybe? AFAIR Drizzle does so, so we could borrow some code there...

protobuf may be overkill for the initial implementation.

How are you planning to do CREATE TABLE? The InnoDB API does it by building a create table structure with various API calls.

By submitting a CREATE TABLE statement as text, you can save a lot of API routines.

PBXT already has a parser for CREATE (and ALTER) TABLE statements. So, you could accept the text and feed the parser.

Then, you could actually store the table definition as a CREATE TABLE statement. When the table is loaded you just invoke the parser. The CREATE TABLE text could be stored in a separate file, like the .frm file, for each table.
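
The per-table file idea could be sketched roughly as follows. This is only an assumption of how it might look: the function names and the ".def" file name are made up, and the call into PBXT's CREATE TABLE parser is left out because its entry point is not named here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write the CREATE TABLE text to its own file. Returns 0 on success, -1 on error. */
int emb_save_table_def(const char *path, const char *sql)
{
    assert(path != NULL && sql != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t len = strlen(sql);
    int failed = fwrite(sql, 1, len, f) != len;
    if (fclose(f) != 0) failed = 1;
    return failed ? -1 : 0;
}

/* Read the text back when the table is loaded. Caller frees; NULL on error.
 * The returned string is what would be handed to the parser. */
char *emb_load_table_def(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    char *buf = NULL;
    long len = -1;
    if (fseek(f, 0, SEEK_END) == 0 && (len = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)len + 1);
        if (buf != NULL) {
            if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
                free(buf);
                buf = NULL;
            } else {
                buf[len] = '\0';        /* terminate so it can be parsed as text */
            }
        }
    }
    fclose(f);
    return buf;
}
```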

Alternatively the text could be stored in the header of the .xtd file, where I already store the foreign key information (the foreign key information is actually stored as SQL text).

However, this may be going too far with the integration of the embedded code and PBXT itself.

Basically, what would be cool is if the embedded wrapper code controls the following:

1. The types of data stored.
   - We can start with a few very basic types.
2. The format of a record in RAM.
   - This is the same format that PBXT uses on disk, as long as the records are fixed length.
   - For variable length records it uses a simple serialization method (as I mentioned before).
3. The format of index records.
   - With an interface to get and set data in a row, the engine does not need to know the actual format.
4. The comparison of data types.
   - The wrapper provides routines to compare data types.
   - These are mostly methods which are part of the data dictionary in RAM.
5. The format of the data dictionary on disk, and in RAM.
   - The wrapper reads and writes this data.

This will give us great flexibility to add data types and other complexities later.
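
As a rough sketch of this division of work (all names here are hypothetical, nothing below exists in PBXT): the wrapper could describe each data type with a small struct carrying a comparison routine, and the engine would compare fields only through that routine, without knowing the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Type descriptor owned by the wrapper (hypothetical). */
typedef struct emb_type {
    const char *name;
    size_t      size;                                  /* fixed length in bytes */
    int       (*compare)(const void *a, const void *b); /* <0, 0, >0 */
} emb_type;

/* Wrapper-side comparison for one basic type. */
static int cmp_int32(const void *a, const void *b)
{
    int32_t x, y;
    memcpy(&x, a, sizeof x);    /* fields inside a record may be unaligned */
    memcpy(&y, b, sizeof y);
    return (x > y) - (x < y);
}

const emb_type EMB_INT32 = { "int32", sizeof(int32_t), cmp_int32 };

/* Engine-side: compare the field at 'offset' in two fixed-length records.
 * The engine only forwards the bytes; it never interprets them itself. */
int emb_compare_field(const emb_type *type,
                      const unsigned char *rec_a,
                      const unsigned char *rec_b,
                      size_t offset)
{
    assert(type != NULL && type->compare != NULL);
    return type->compare(rec_a + offset, rec_b + offset);
}
```

Adding a new data type later would then mean adding another descriptor on the wrapper side, with no change to the engine.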

It is also pretty much the division of work between MySQL code and PBXT today. However, the division is not so clear in the code.

- else, should table serialization / deserialization be pluggable or even be purely programmatic? I am fine with this, too, as it is an _embedded_ library and I'd guess most people will control pbxt programmatically anyway

Although I spoke mainly about the textual interface above, I am really flexible on this. I think both solutions have their advantages.

Use whichever is best and easiest for you at the moment, which may be simply writing your own stuff! :)

- library naming: are you fine with libembpbxt?

Yup, that sounds good.


If you use the handler interface, then you will have to continue to simulate MySQL, which may not suit the API (you will have to call the handler functions in the same order that MySQL does).

If you replace ha_pbxt, then you will nevertheless have to include some of its functionality in your code. For example, you should take over the init and shutdown code.

What you need to keep is the "cursor" type paradigm.

What I mean is, to do an index or table scan you do the following:

- open a cursor for a table
 * which means grab an XTOpenTable from the table pool
- call init
 * Initialize the scan.
- call search and next in a loop
- call exit
 * Free resources
- close the cursor
 * which means return the open table to the pool

All such actions need to be enclosed in a:

- begin transaction
...
- commit/rollback transaction

The transaction is per thread, and all relevant information is stored in the XTThread structure.
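
The sequence above could be sketched like this. It is a toy model under stated assumptions, not PBXT code: emb_thread stands in for XTThread, emb_table is an in-memory array of ints with an in_use flag in place of the real table pool, only "next" is shown (no "search"), and all emb_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Per-thread state; the transaction lives here (stand-in for XTThread). */
typedef struct { int in_txn; } emb_thread;

/* Toy table: fixed array of int rows; in_use models the table pool. */
typedef struct { const int *rows; size_t count; int in_use; } emb_table;

typedef struct { emb_table *tab; size_t pos; int scanning; } emb_cursor;

void emb_begin(emb_thread *t)  { assert(!t->in_txn); t->in_txn = 1; }
void emb_commit(emb_thread *t) { assert(t->in_txn);  t->in_txn = 0; }

/* Open the cursor: take the table out of the "pool". */
void emb_cursor_open(emb_cursor *c, emb_table *tab)
{
    assert(!tab->in_use);
    tab->in_use = 1;
    c->tab = tab;
    c->pos = 0;
    c->scanning = 0;
}

/* Init: position before the first row. */
void emb_scan_init(emb_cursor *c) { c->pos = 0; c->scanning = 1; }

/* Next: returns 1 and stores the row, or 0 at end of table. */
int emb_scan_next(emb_cursor *c, int *row)
{
    assert(c->scanning);
    if (c->pos >= c->tab->count) return 0;
    *row = c->tab->rows[c->pos++];
    return 1;
}

/* Exit: end the scan (a real engine would free scan resources here). */
void emb_scan_exit(emb_cursor *c) { c->scanning = 0; }

/* Close the cursor: return the table to the "pool". */
void emb_cursor_close(emb_cursor *c) { c->tab->in_use = 0; c->tab = NULL; }

/* A full table scan in the order listed above, inside one transaction. */
long emb_sum_rows(emb_thread *t, emb_table *tab)
{
    emb_cursor c;
    int row;
    long sum = 0;

    emb_begin(t);
    emb_cursor_open(&c, tab);
    emb_scan_init(&c);
    while (emb_scan_next(&c, &row))
        sum += row;
    emb_scan_exit(&c);
    emb_cursor_close(&c);
    emb_commit(t);
    return sum;
}
```
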
Ok, a lot of open questions are answered by this. Thank you, Paul!


[snip]

Martin

P.S.: I will set up a TODO file to make it easier to track embedded PBXT's progress

OK, great.

--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com





