← Back to team overview

maria-discuss team mailing list archive

Calculating Maximum Bytes Per Record

 

*Hello, *
*
*
*I have recently been doing some capacity planning exercises for a new
database we are developing in MariaDB.  One of the exercises I completed was
to create a database with all tables loaded with maximum width records at
100,000, 500,000, and 1,000,000 records.  I noticed that when I did this it
seems that the larger the tables grow, the less bytes per record are
required.  I am curious what causes this to happen?  These are all InnoDB
tables saved to their own tablespaces.*





*  Max Bytes Per Record



Record Count Record Count Record Count
100,000 500,000 1,000,000  acct_activation 241 176 164  acct_deactivated 419
336 327  disk_usage 136 75 63  keywords 713 646 633  pods 77,511 56,698
53,126  pods_joined 178 101 96  pod_activity 461 369 357  pod_files 78,224
57,220 53,615  sessions 168 92 80  users 1,426 1,351 1,338  usr_profile
78,392 57,343 53,730 *


*I also created a query (using Information_Schema.Columns) to calculate
maximum possible bytes per record / table (SQL attached below).  I ran the
query and compared the estimates to the actual values produced above.  The
query is close, but I would like to make it a little more accurate if
possible. *


  *TABLE_SCHEMA* *TABLE_NAME* *PK_BYTES_TOT* *FIELD_BYTES_TOT* *
IX_FIELD_BYTES_TOT* *IX_FIELD_COUNT* *IX_OVERHEAD_BYTES* *IX_BYTES_TOT* *
TABLE_BYTES_TOT*  pods acct_activation 0 73 33 1 29 62 *189*  pods
acct_deactivated 8 269 0 0 0 0 *388*  pods devices 1 273 0 0 0 0 *384*  pods
disk_usage 8 32 0 0 0 0 *56*  pods keywords 0 264 264 2 58 322 *820*  pods
pods 8 65,821 16 1 29 45 *92,224*  pods pods_joined 0 16 16 2 58 74 *126*
pods pod_activity 0 275 9 1 29 38 *438*  pods pod_files 8 66,376 56 2 58 114
*93,097*  pods sessions 0 10 10 2 58 68 *109*  pods users 8 1,112 41 1 29 70
*1,666*  pods usr_profile 8 66,628 0 0 0 0 *93,290*

*Does anyone have any suggestions for my calculations in the code below to
make it more accurate?*
*
*
*Basic Assumptions *
*
*

   - Calculated field widths are defined by data type in the CASE logic
   below.
   - Primary key bytes = The sum of all defined PK calculated field widths
   - Index Bytes = Primary key bytes +  Index calculated field width + 29
   bytes overhead (for each index defined)
   - 1 byte is added to each field that is Nullable.
   - All bytes defined above are multiplied by 1.4 for BTREE overhead.*  *

*
*
*Current SQL to calculate MAX record bytes by table:*
*
*
SELECT B.TABLE_SCHEMA
     , B.TABLE_NAME
     , SUM(PK_BYTES) AS PK_BYTES_TOT
     , SUM(FIELD_BYTE_SPACE) AS FIELD_BYTES_TOT
     , SUM(IX_BYTES) AS IX_FIELD_BYTES_TOT
     , SUM(CASE WHEN IX_BYTES > 0 THEN 1 ELSE 0 END) AS IX_FIELD_COUNT
     , SUM(CASE WHEN IX_BYTES > 0 THEN 1 ELSE 0 END) * 29 AS
IX_OVERHEAD_BYTES
     , (SUM(CASE WHEN IX_BYTES > 0 THEN 1 ELSE 0 END) * 29) + SUM(IX_BYTES)
AS IX_BYTES_TOT
     , (SUM(PK_BYTES) + SUM(FIELD_BYTE_SPACE) +
       (SUM(CASE WHEN IX_BYTES > 0 THEN 1 ELSE 0 END) * 29) + SUM(IX_BYTES))
* 1.4 AS TABLE_BYTES_TOT
FROM
(
SELECT A.*
     , CASE WHEN COLUMN_KEY = 'PRI'THEN FIELD_BYTE_SPACE ELSE 0 END AS
PK_BYTES
     , CASE WHEN A.COLUMN_KEY <> 'PRI'
             AND A.COLUMN_KEY <> '' THEN (PK_BYTE_SPACE + FIELD_BYTE_SPACE)
ELSE 0 END AS IX_BYTES
FROM (
SELECT PK_SP.TABLE_SCHEMA
     , PK_SP.TABLE_NAME
     , PK_SP.COLUMN_NAME
     , DATA_TYPE
     , CHARACTER_MAXIMUM_LENGTH
     , NUMERIC_PRECISION
     , IS_NULLABLE
     , COLUMN_KEY
     , CHARACTER_SET_NAME
     , CHARACTER_OCTET_LENGTH
     , (CASE -- CHARACTER FIELDS
            WHEN DATA_TYPE = 'varchar'             THEN
CHARACTER_MAXIMUM_LENGTH + 1
            WHEN DATA_TYPE = 'char'                THEN
CHARACTER_MAXIMUM_LENGTH
            WHEN DATA_TYPE = 'tinyblob'
              OR DATA_TYPE = 'tinytext'            THEN
CHARACTER_MAXIMUM_LENGTH + 1
            WHEN DATA_TYPE = 'blob'
              OR DATA_TYPE = 'text'                THEN
CHARACTER_MAXIMUM_LENGTH + 2
            WHEN DATA_TYPE = 'mediumblob'
              OR DATA_TYPE = 'mediumtext'          THEN
CHARACTER_MAXIMUM_LENGTH + 3
            WHEN DATA_TYPE = 'largeblob'
              OR DATA_TYPE = 'largetext'           THEN
CHARACTER_MAXIMUM_LENGTH + 4
            -- NUMERIC FIELDS
            WHEN DATA_TYPE = 'tinyint'             THEN 1
            WHEN DATA_TYPE = 'smallint'            THEN 2
            WHEN DATA_TYPE = 'mediumint'           THEN 3
            WHEN DATA_TYPE = 'int'
              OR DATA_TYPE = 'integer'             THEN 4
            WHEN DATA_TYPE = 'bigint'              THEN 8
            WHEN DATA_TYPE = 'float'
             AND (NUMERIC_PRECISION <= 24
              OR NUMERIC_PRECISION IS NULL)        THEN 4
            WHEN DATA_TYPE = 'float'
             AND NUMERIC_PRECISION > 24            THEN 8
            WHEN DATA_TYPE = 'bit'                 THEN (NUMERIC_PRECISION +
7) / 8
            WHEN DATA_TYPE = 'double'
              OR DATA_TYPE = 'numeric'             THEN
              (FLOOR(NUMERIC_PRECISION/9)*4) + ROUND((NUMERIC_PRECISION-
FLOOR(NUMERIC_PRECISION/9)*9)*.5,0)
            -- DATETIME FIELDS
            WHEN DATA_TYPE = 'date'
              OR DATA_TYPE = 'time'                THEN 3
            WHEN DATA_TYPE = 'datetime'            THEN 8
            WHEN DATA_TYPE = 'timestamp'           THEN 4
            WHEN DATA_TYPE = 'year'                THEN 1
       ELSE 999999999999999 END) +
       (CASE WHEN IS_NULLABLE = 'YES' THEN 1 ELSE 0 END) AS FIELD_BYTE_SPACE
     , CASE WHEN PK_BYTE_SPACE IS NULL THEN 0 ELSE PK_BYTE_SPACE END AS
PK_BYTE_SPACE
  FROM information_schema.columns AS PK_SP
  LEFT OUTER JOIN
       (SELECT TABLE_SCHEMA
             , TABLE_NAME
             , SUM((CASE -- CHARACTER FIELDS
            WHEN DATA_TYPE = 'varchar'             THEN
CHARACTER_MAXIMUM_LENGTH + 1
            WHEN DATA_TYPE = 'char'                THEN
CHARACTER_MAXIMUM_LENGTH
            WHEN DATA_TYPE = 'tinyblob'
              OR DATA_TYPE = 'tinytext'            THEN
CHARACTER_MAXIMUM_LENGTH + 1
            WHEN DATA_TYPE = 'blob'
              OR DATA_TYPE = 'text'                THEN
CHARACTER_MAXIMUM_LENGTH + 2
            WHEN DATA_TYPE = 'mediumblob'
              OR DATA_TYPE = 'mediumtext'          THEN
CHARACTER_MAXIMUM_LENGTH + 3
            WHEN DATA_TYPE = 'largeblob'
              OR DATA_TYPE = 'largetext'           THEN
CHARACTER_MAXIMUM_LENGTH + 4
            -- NUMERIC FIELDS
            WHEN DATA_TYPE = 'tinyint'             THEN 1
            WHEN DATA_TYPE = 'smallint'            THEN 2
            WHEN DATA_TYPE = 'mediumint'           THEN 3
            WHEN DATA_TYPE = 'int'
              OR DATA_TYPE = 'integer'             THEN 4
            WHEN DATA_TYPE = 'bigint'              THEN 8
            WHEN DATA_TYPE = 'float'
             AND (NUMERIC_PRECISION <= 24
              OR NUMERIC_PRECISION IS NULL)        THEN 4
            WHEN DATA_TYPE = 'float'
             AND NUMERIC_PRECISION > 24            THEN 8
            WHEN DATA_TYPE = 'bit'                 THEN (NUMERIC_PRECISION +
7) / 8
            WHEN DATA_TYPE = 'double'
              OR DATA_TYPE = 'numeric'             THEN
              (FLOOR(NUMERIC_PRECISION/9)*4) + ROUND((NUMERIC_PRECISION-
FLOOR(NUMERIC_PRECISION/9)*9)*.5,0)
            -- DATETIME FIELDS
            WHEN DATA_TYPE = 'date'
              OR DATA_TYPE = 'time'                THEN 3
            WHEN DATA_TYPE = 'datetime'            THEN 8
            WHEN DATA_TYPE = 'timestamp'           THEN 4
            WHEN DATA_TYPE = 'year'                THEN 1
       ELSE 999999999999999 END) +
       (CASE WHEN IS_NULLABLE = 'YES' THEN 1 ELSE 0 END)) AS PK_BYTE_SPACE
        FROM information_schema.columns COL_SP
        WHERE COLUMN_KEY = 'PRI'
          AND TABLE_SCHEMA = 'PODS'
        GROUP BY TABLE_SCHEMA
               , TABLE_NAME) AS IX_SP
     ON PK_SP.TABLE_SCHEMA = IX_SP.TABLE_SCHEMA
    AND PK_SP.TABLE_NAME = IX_SP.TABLE_NAME
  WHERE PK_SP.TABLE_SCHEMA = 'PODS') AS A
  ) AS B
  GROUP BY B.TABLE_SCHEMA
         , B.TABLE_NAME
*
*