← Back to team overview

pbxt-discuss team mailing list archive

Re: AVG_ROW_LENGTH computation

 

Hi Brian,

On Apr 15, 2010, at 7:25 AM, Brian Evans wrote:

I'm trying to understand the computation listed on page 27 of the
downloaded slides from the MySQL Conference 2010.

Previously, it was recommended in a lp bug that "Average comp. rec. len"
should be used to save space and set as the AVG_ROW_LENGTH.

I get the part about "Byte size +1 of all columns".

How do you define the "Size of the Data log reference"?

The size of the data log reference is fixed at 12 bytes.

So you just have to add 12 to your calculation. PBXT will then add 14 to your calculation to get the actual fixed component size.

14 bytes is the size of the record header.

This is because PBXT assumes that the value you set for AVG_ROW_LENGTH is the size of the data packet in in the handle data record (i.e. the fixed component).

To make sure AVG_ROW_LENGTH data package is always stored in the fixed component, you have to add the size of the data log reference, because this is automatically added to the record header, when the record does not fit into the fixed component.

Take this example; How can I obtain the optimal AVG_ROW_LENGTH to store
this data?

So according to the data that you already have in this table the average row length is 124. So without further calculation, you can use this as the AVG_ROW_LENGTH as you have done.

This means that now 50% of your rows are fully contained in the fixed component of the record. This means that 50% of the time, PBXT will have to read from the data log which may be OK for you. To further optimize this you may have to look at the 90% percentile and so on. For example if you can make that 200, and thereby include 90% of the rows, that would be a good trade-off for the cost of 76 bytes.

One thing you can be sure of, using AVG_ROW_LENGTH=124 should give you fairly compact storage of your data.

Now further optimizations would involve you column access patterns. AVG_ROW_LENGTH=124 already includes all indexed columns, which is good for PBXT itself. But, there is no guarantee (for example) EntryDate is in the fixed component.

This is because:

Number of fixed columns = 3

So PBXT is saying only the first 3 columns guaranteed to be in the fixed component. So EntryDate may be in the Data log 50% of the time. This would not be good if the application is hitting this column all the time. However, if the application is doing SELECT * all the time, then there is nothing you can do to optimize it.

But if not, then you can easily make sure that EntryDate is in the fixed component of the record. There are 2 methods:

1. Increase AVG_ROW_LENGTH until it includes the first 6 columns.

2. Move EntryDate up in the column order (and maybe increase AVG_ROW_LENGTH if necessary).

So, (1) is probably not an option in this case because column 5 is TEXT data type which has a maximum length of 65000. And the max AVG_ROW_LENGTH is about 16K.

So the only possibility is (2).

So there is a general rule that can be followed here for PBXT tables in general. Move the BLOB fields to the end of the table schema, and the smaller, and fixed rows to the front of the table definition. Then you can get the most rows in the fixed component for as "cheaply" as possible.

By cheaply I mean that fixed component real estate is expensive, because it is cached by PBXT. It is therefore also very fast access. Even in table scans.

But, if the App is doing SELECT * than all this does not help you, because the data log record need to be loaded regardless.

But how many times have developers been told never to do SELECT * ... :)

Best regards,

Paul

CREATE TABLE `audit` (
 `ID` int(11) unsigned NOT NULL AUTO_INCREMENT,
 `RefID` int(11) unsigned NOT NULL,
 `AuditCatagory` varchar(64) NOT NULL,
 `UserID` varchar(64) NOT NULL,
 `Descript` text NOT NULL,
 `EntryDate` date NOT NULL,
 `EntryTime` time NOT NULL,
 `Comments` text NOT NULL,
 `Details` text NOT NULL,
 PRIMARY KEY (`ID`),
 KEY `db_index` (`RefID`) USING BTREE,
 KEY `db_index_c` (`AuditCatagory`) USING BTREE
) ENGINE=PBXT AVG_ROW_LENGTH=124

CHECK TABLE: audit
Record buffer size      = 176
Fixed length rec. len.  = 174
Handle data record size = 138
Min/max header size     = 14/26
Min/avg/max record size = 23/1407/196762
Avg row len set for tab = 124
Rows fixed length       = NO
Maximum fixed size      = 65536
Minimum variable size   = 320
Minimum auto-increment  = 2485698
Number of columns       = 9
Number of fixed columns = 3
Columns req. for index  = 3
Rec len req. for index  = 4
Columns req. for blobs  = 0
Number of blob columns  = 0
Number of indices       = 3
Extendend data length   = 71393428
Minumum comp. rec. len. = 31
Average comp. rec. len. = 124
Maximum comp. rec. len. = 2316
Free record count       = 2
Deleted record count    = 0
Allocated record count  = 2211916


_______________________________________________
Mailing list: https://launchpad.net/~pbxt-discuss
Post to     : pbxt-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~pbxt-discuss
More help   : https://help.launchpad.net/ListHelp



--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com






References