
pbxt-discuss team mailing list archive

Re: Row buffers and Objects

 

On Thu, 27 May 2010 12:36:46 +0200, Paul McCullagh <paul.mccullagh@xxxxxxxxxxxxx> wrote:
> On May 26, 2010, at 5:26 AM, Stewart Smith wrote:
> > class Tuple
> > {
> > public:
> >   Tuple(const Tuple &t)
> >   { for (int i= 0; i < t.nr_columns(); i++) { set_column(i, t.column(i)); } }
> > };
> 
> Currently, I can do this with one memcpy, in the cases when PBXT is  
> using a fixed length record structure.
> 
> But, I guess this is a small price to pay for proper encapsulation.

You could probably continue to do so, but the above would be the generic
solution.
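
To make the trade-off concrete, here is a minimal sketch of both paths: the generic column-by-column copy and the single memcpy for a fixed-length layout. Everything in it is an assumption for illustration: Value is reduced to a 64-bit integer, and FixedTuple, copy_from() and copy_from_fixed() are hypothetical names, not PBXT or Drizzle code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the thread's Value type: one 64-bit integer.
using Value = std::int64_t;

class Tuple {
public:
    virtual ~Tuple() = default;
    virtual std::size_t nr_columns() const = 0;
    virtual Value column(std::size_t i) const = 0;
    virtual void set_column(std::size_t i, Value v) = 0;

    // Generic path: copies column by column between any two Tuple types.
    // A method rather than a constructor, because virtual calls made from a
    // base-class constructor would not reach the derived set_column().
    void copy_from(const Tuple &src) {
        assert(src.nr_columns() == nr_columns());
        for (std::size_t i = 0; i < src.nr_columns(); ++i)
            set_column(i, src.column(i));
    }
};

// Hypothetical fixed-length record: each column occupies sizeof(Value) bytes.
class FixedTuple : public Tuple {
public:
    explicit FixedTuple(std::size_t ncols) : buf_(ncols * sizeof(Value), 0) {}

    std::size_t nr_columns() const override { return buf_.size() / sizeof(Value); }

    Value column(std::size_t i) const override {
        Value v;
        std::memcpy(&v, &buf_[i * sizeof(Value)], sizeof(Value));
        return v;
    }

    void set_column(std::size_t i, Value v) override {
        std::memcpy(&buf_[i * sizeof(Value)], &v, sizeof(Value));
    }

    // Fast path: same layout on both sides, so one memcpy moves the whole row.
    void copy_from_fixed(const FixedTuple &src) {
        assert(src.buf_.size() == buf_.size());
        if (!buf_.empty())
            std::memcpy(buf_.data(), src.buf_.data(), buf_.size());
    }

private:
    std::vector<unsigned char> buf_;
};
```

An engine could keep the fast path for the case where it knows both sides share its own layout, and fall back to copy_from() for anything else.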

> > class OverTheWireTuple : public Tuple
> >
> > class PBXTTuple : public Tuple
> > {
> >        int set_column(int colnr, Value v)
> >        {
> >          /* convert value into pbxt format and store in this PBXTTuple */
> >        }
> > };
> 
> Besides column(i) which returns a Value, I would need the following:
> 
> u_char *column_ptr(i)
> size_t column_size(i)
> bool is_column_null(i)
> 
> and also:
> 
> int set_column_data(int colnr, u_char *data, size_t len)
> int set_column_null(int colnr)
> 
> Then I can just copy the data if I don't care about the contents. This  
> would enable the engine to just pack (and later unpack) the data into  
> a buffer for storage on disk, without having to understand what the
> data represents.

Yep. Pretty much what I also envisioned.
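
A rough sketch of how an engine might use such raw accessors to pack and unpack a row without interpreting it. The accessor names follow Paul's list; the RawTuple class, the pack()/unpack() helpers and the on-disk layout (one NULL flag byte, then a 4-byte little-endian length, then the bytes) are assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using u_char = unsigned char;

// Hypothetical tuple that only stores opaque bytes per column.
class RawTuple {
public:
    explicit RawTuple(std::size_t ncols) : cols_(ncols) {}  // all columns start NULL

    std::size_t nr_columns() const { return cols_.size(); }
    const u_char *column_ptr(std::size_t i) const { return cols_[i] ? cols_[i]->data() : nullptr; }
    std::size_t column_size(std::size_t i) const { return cols_[i] ? cols_[i]->size() : 0; }
    bool is_column_null(std::size_t i) const { return !cols_[i].has_value(); }

    int set_column_data(std::size_t colnr, const u_char *data, std::size_t len) {
        cols_[colnr] = std::vector<u_char>(data, data + len);
        return 0;
    }
    int set_column_null(std::size_t colnr) {
        cols_[colnr].reset();
        return 0;
    }

private:
    std::vector<std::optional<std::vector<u_char>>> cols_;
};

// Pack every column as: NULL flag, then (if not NULL) length and raw bytes.
std::vector<u_char> pack(const RawTuple &t) {
    std::vector<u_char> out;
    for (std::size_t i = 0; i < t.nr_columns(); ++i) {
        out.push_back(t.is_column_null(i) ? 1 : 0);
        if (t.is_column_null(i)) continue;
        const std::uint32_t len = static_cast<std::uint32_t>(t.column_size(i));
        for (int shift = 0; shift < 32; shift += 8)
            out.push_back(static_cast<u_char>(len >> shift));
        out.insert(out.end(), t.column_ptr(i), t.column_ptr(i) + len);
    }
    return out;
}

// Reverse of pack(); the column count comes from the table definition.
RawTuple unpack(const std::vector<u_char> &buf, std::size_t ncols) {
    RawTuple t(ncols);
    std::size_t pos = 0;
    for (std::size_t i = 0; i < ncols; ++i) {
        assert(pos < buf.size());
        if (buf[pos++] == 1) { t.set_column_null(i); continue; }
        assert(pos + 4 <= buf.size());
        std::uint32_t len = 0;
        for (int b = 0; b < 4; ++b)
            len |= static_cast<std::uint32_t>(buf[pos++]) << (8 * b);
        assert(pos + len <= buf.size());
        t.set_column_data(i, buf.data() + pos, len);
        pos += len;
    }
    return t;
}
```

Nothing in pack() or unpack() looks at what the bytes mean, which is the point: interpretation stays with the server.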

> (although some thought needs to be given to the endian problem -  
> currently byte order of data in the MySQL row buffer is in a processor  
> independent format, and can therefore be stored on disk without  
> further conversion).

Hrrm... probably just the same as today... Although I certainly would
not bet on the endian independence currently working properly in all
cases (in MySQL or Drizzle).

I think there are ways to do it properly without too much hassle though.
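
One common way to get a processor-independent format is to write each byte explicitly instead of copying the in-memory integer. The sketch below does that for a 32-bit value; the function names are hypothetical and little-endian is picked only for illustration, not as a statement about what MySQL or Drizzle do.

```cpp
#include <cassert>
#include <cstdint>

// Store v in a fixed (little-endian) byte order, whatever the host order is.
inline void store_u32_le(unsigned char *dst, std::uint32_t v) {
    assert(dst != nullptr);
    dst[0] = static_cast<unsigned char>(v);
    dst[1] = static_cast<unsigned char>(v >> 8);
    dst[2] = static_cast<unsigned char>(v >> 16);
    dst[3] = static_cast<unsigned char>(v >> 24);
}

// Read the value back; shifts work on values, so host byte order never matters.
inline std::uint32_t load_u32_le(const unsigned char *src) {
    assert(src != nullptr);
    return static_cast<std::uint32_t>(src[0]) |
           (static_cast<std::uint32_t>(src[1]) << 8) |
           (static_cast<std::uint32_t>(src[2]) << 16) |
           (static_cast<std::uint32_t>(src[3]) << 24);
}
```

Data written this way can go to disk as-is and be read back on a machine with a different byte order.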

> All I then need is for Drizzle to provide comparison routines for each  
> column.
> 
> In this way the engine does not need to know anything about the data,  
> and the interpretation of the data is always in sync with the server.

yep.

> These routines do not need to be methods on the Tuple. They can be
> methods on the TableShare, for example. This makes most sense because
> comparing data depends on information stored in the data dictionary,
> which includes the data type and the collation sequences for strings.
> 
> So on TableShare we could have a method:
> 
> int compare_column(int colnr, Tuple &ta, Tuple &tb)
> 
> and also:
> 
> int compare_column(int colnr, u_char *data_a, size_t len_a,
>                    u_char *data_b, size_t len_b)
> 
> This would be sufficient for engines to build and compare index key  
> items (which are basically just Tuples with a mapping to the columns).

I'd like the query execution parts of Drizzle to also end up using the
same interface... and this can probably work... but I'm hand-waving a
little there :)
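
As a sketch of the byte-level compare_column() Paul describes: the comparison is dispatched on per-column type information held by the share, so the engine only ever passes opaque bytes. TableShareSketch, ColumnType and the two column kinds are assumptions for this example (this is not the real Drizzle TableShare), and the plain byte comparison merely stands in for a real collation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

using u_char = unsigned char;

// Hypothetical column kinds; a real dictionary would carry types and collations.
enum class ColumnType { Int32, Binary };

class TableShareSketch {
public:
    explicit TableShareSketch(std::vector<ColumnType> types) : types_(std::move(types)) {}

    // Returns <0, 0 or >0, like memcmp, based on the column's declared type.
    int compare_column(std::size_t colnr,
                       const u_char *data_a, std::size_t len_a,
                       const u_char *data_b, std::size_t len_b) const {
        switch (types_.at(colnr)) {
        case ColumnType::Int32: {
            assert(len_a == 4 && len_b == 4);
            const std::int32_t a = decode_i32(data_a);
            const std::int32_t b = decode_i32(data_b);
            return (a > b) - (a < b);
        }
        case ColumnType::Binary: {
            // Placeholder for a collation: bytewise order, shorter prefix first.
            const std::size_t n = std::min(len_a, len_b);
            const int c = (n == 0) ? 0 : std::memcmp(data_a, data_b, n);
            if (c != 0) return c;
            return (len_a > len_b) - (len_a < len_b);
        }
        }
        return 0;  // not reached for valid ColumnType values
    }

private:
    // Assumes a 4-byte little-endian signed integer as the stored format.
    static std::int32_t decode_i32(const u_char *p) {
        const std::uint32_t u = static_cast<std::uint32_t>(p[0]) |
                                (static_cast<std::uint32_t>(p[1]) << 8) |
                                (static_cast<std::uint32_t>(p[2]) << 16) |
                                (static_cast<std::uint32_t>(p[3]) << 24);
        std::int32_t v;
        std::memcpy(&v, &u, sizeof v);
        return v;
    }

    std::vector<ColumnType> types_;
};
```

Note that a raw memcmp would order the bytes of -1 after the bytes of 2, which is why the engine cannot sort index keys correctly without the server's type information.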

> > int PBXTCursor::doInsertRecord(Tuple &tuple)
> > {
> >        PBXTTuple pbxt_tuple(tuple);
> >        pbxt_write_row(pbxt_tuple);
> > }
> 
> This is good. And it fits into what Brian is suggesting: turning the  
> record array into something like this:
> 
> Tuple record[2]

Personally, I'd like this to end up just being a memory pool instead, as
a lot of the time you don't actually need 2 records and it's just a
waste of memory (especially for large tables).

but yeah, as an intermediate step, it's probably what will happen.
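
A very small sketch of the memory pool idea: record buffers are only allocated when somebody asks for one, and returned buffers are recycled. RecordPool and its methods are hypothetical names for illustration, not an existing Drizzle interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pool of equally sized record buffers.
class RecordPool {
public:
    explicit RecordPool(std::size_t record_size) : record_size_(record_size) {}

    // Hand out a recycled buffer if one is free, otherwise allocate a new one.
    std::vector<unsigned char> acquire() {
        if (free_.empty()) {
            ++allocated_;
            return std::vector<unsigned char>(record_size_);
        }
        std::vector<unsigned char> buf = std::move(free_.back());
        free_.pop_back();
        return buf;
    }

    // Give a buffer back; its old contents are left in place, not cleared.
    void release(std::vector<unsigned char> buf) {
        assert(buf.size() == record_size_);
        free_.push_back(std::move(buf));
    }

    // Number of buffers ever allocated by this pool.
    std::size_t allocated() const { return allocated_; }

private:
    std::size_t record_size_;
    std::size_t allocated_ = 0;
    std::vector<std::vector<unsigned char>> free_;
};
```

A statement that only ever needs one row at a time would then pay for one buffer instead of two.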

> > Where the upper layer has read off the wire a tuple, constructed an
> > OverTheWireTuple, which then gets handed to PBXT. Because PBXT doesn't
> > want it in that format, it converts it to its format.
> >
> > For reading a row, PBXT hands back a PBXTTuple, so for WHERE conditions
> > and the like the upper layer just checks the value in the
> > PBXTTuple. Only if the Row is going back to the user over the wire
> > does it need to be converted into an OverTheWireTuple.
> >
> > A temp only engine could just use the OverTheWire format and *never* do
> > a conversion.
> 
> Although this may present a problem with regard to the scope of  
> validity of Tuples returned by the engine.
> 
> The best for the moment would be that the Tuple is valid until the  
> next call to the engine on the Cursor that returned the Tuple.

agreed.

I also wouldn't mind a mechanism that could implement either:
a) copying of the tuple
b) retain/release

so that if the upper layer did need it for longer it could ask the
engine to keep it around and you'd either get a copy (by default) or if
the engine is clever, just a reference to the same bit of memory.
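
Roughly what I have in mind, as a sketch only. Here std::shared_ptr plays the role of retain/release, Row is reduced to a vector of ints, and the Cursor, CopyingCursor and SharingCursor classes are hypothetical names for this example rather than the real Cursor interface.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Row = std::vector<int>;  // hypothetical stand-in for a Tuple

class Cursor {
public:
    virtual ~Cursor() = default;
    // The returned row is only valid until the next call on this cursor.
    virtual const Row &next() = 0;
    // Ask the engine to keep the current row alive beyond the next call.
    virtual std::shared_ptr<const Row> retain() = 0;
};

// Default behaviour: one reused buffer, so retain() has to copy.
class CopyingCursor : public Cursor {
public:
    const Row &next() override {
        buf_.clear();
        buf_.push_back(++counter_);
        return buf_;
    }
    std::shared_ptr<const Row> retain() override {
        assert(counter_ > 0);  // retain() before next() is a caller error
        return std::make_shared<Row>(buf_);
    }

private:
    Row buf_;
    int counter_ = 0;
};

// "Clever" engine: rows are reference counted, so retain() shares memory.
class SharingCursor : public Cursor {
public:
    const Row &next() override {
        row_ = std::make_shared<Row>(Row{++counter_});
        return *row_;
    }
    std::shared_ptr<const Row> retain() override {
        assert(row_ != nullptr);  // retain() before next() is a caller error
        return row_;
    }

private:
    std::shared_ptr<Row> row_;
    int counter_ = 0;
};
```

Either way the upper layer sees the same contract: the reference from next() dies on the next call, and whatever retain() returned stays valid until it is dropped.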

> In your blog you discuss a nasty little exception to this rule. This  
> would need to be corrected, so that the scoping rules are simple for  
> the engine.

Yes. OMG is that exception sucky.

One great thing about the InnoDB code is that it does tend to slap you
whenever you do something wrong. With the MySQL inherited code... it,
well... I don't have to tell you about the wonderful corner cases :)

-- 
Stewart Smith


