maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #12757
Re: Request for Discussion: Structure for JSON histogram objects
Just to clarify: we are looking at two options currently.
Please find their description below.
== Current mysql.column_stats ==
Currently, MariaDB defines mysql.column_stats as:
CREATE TABLE column_stats (
db_name varchar(64) COLLATE utf8mb3_bin NOT NULL,
table_name varchar(64) COLLATE utf8mb3_bin NOT NULL,
column_name varchar(64) COLLATE utf8mb3_bin NOT NULL,
min_value varbinary(255) DEFAULT NULL,
max_value varbinary(255) DEFAULT NULL,
nulls_ratio decimal(12,4) DEFAULT NULL,
avg_length decimal(12,4) DEFAULT NULL,
avg_frequency decimal(12,4) DEFAULT NULL,
hist_size tinyint(3) unsigned DEFAULT NULL,
hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB')
COLLATE utf8mb3_bin DEFAULT NULL,
histogram varbinary(255) DEFAULT NULL,
PRIMARY KEY (db_name,table_name,column_name)
);
It seems there is some redundancy here: the histogram column is of variable
length, but there is also hist_size column which also stores length of the
histogram (?).
== Variant #1: Add JSON support to the current scheme
* Keep all the same fields, but add another histogram type: JSON (more on the
* name choice in the next email)
* Change the definition of the 'histogram' column so it is not limited to 255
bytes.
CREATE TABLE column_stats (
...
-- the same columns as before
...
hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB', 'JSON')
COLLATE utf8mb3_bin DEFAULT NULL,
histogram blob DEFAULT NULL,
PRIMARY KEY (db_name,table_name,column_name)
);
Advantages:
* The definition is close to the current definition
* The data from the older version can be kept as-is: no migration or conversion
is necessary
* The format is backward-compatible: old server binary can read the records
stored in the new format (we'll still need to verify this)
Disadvantages:
- depending on value of hist_type, histogram column holds either JSON or binary
data, which is not a good data modeling practice.
- JSON histogram holds real bucket bounds. The column's min_value and max_value
will be stored twice: First, in min_value and max_value columns, and then
in the first/last bound in the histogram.
On the other hand, separate min_value and max_value columns allow storing
min/max statistics without collecting the histogram.
== Variant #2: Change to JSON-only ==
New mysql.column_stats will look like this:
CREATE TABLE column_stats (
db_name varchar(64) COLLATE utf8mb3_bin NOT NULL,
table_name varchar(64) COLLATE utf8mb3_bin NOT NULL,
column_name varchar(64) COLLATE utf8mb3_bin NOT NULL,
-- perhaps some columns like avg_length, avg_frequency
histogram JSON, -- All histograms are in JSON
PRIMARY KEY (db_name,table_name,column_name)
);
The histogram will be always stored in JSON. Old histograms will
be converted to the new format.
Advantages
- straightforward data modeling.
- no "baggage" (Do we need old-format histograms, long term?)
Disadvantages
- This is not compatible with the old data format.
- If we chose to support legacy histograms, we'll need to convert them into
JSON (This should be fairly easy to with DECODE_HISTOGRAM function is useful for that)
Any thoughts on this anyone?
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
Follow ups
References