maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #12162
Re: MDEV-17399 Add support for JSON_TABLE.
Hi Alexey,
Please find my review input below.
There is one big issue and a number of smaller ones.
> commit 654fdfee33e3eafe3b7f25d7e213717c22ea1e18
> Author: Alexey Botchkov <holyfoot@xxxxxxxxxxxx>
> Date: Mon Mar 30 01:00:28 2020 +0400
>
> MDEV-17399 Add support for JSON_TABLE.
>
> Syntax for JSON_TABLE added.
> The ha_json_table handler added. Executing the JSON_TABLE we
> create the temporary table of the ha_json_table, add dependencies
> of other tables and sometimes conditions to WHERE.
>
== The big issue ==
I think some design choices of this patch are questionable:
The temporary table has a unique key. What is it for? The key is defined over
the field holding the JSON text. What if the JSON text gets bigger than
MAX_KEY_LEN?
Then, there is an Item_func_eq(json_text, temp_table_field), which is set to
be "always equal" with set_always_equal(). This looks like a hack.
EXPLAIN shows that JSON_TABLE uses a kind of ref access which I think is
counter-intuitive.
What is the goal of all this?
The JSON_TABLE table is "laterally dependent" on the tables that are referred
from its arguments.
For the optimizer, this means:
1. The JSON_TABLE table must be put after its dependencies (this is similar to
outer joins)
2. The JSON_TABLE can only be read when there are "current records" available
for each of the dependencies (an example when they are not available: join
buffering. There are many records in the buffer, which one we should produce
matches for? For outer joins, join buffering code has a complex system to
track this).
The above "kludges" make an attempt to trick the optimizer into meeting
requrirements #1 and #2 but I think this is approach is hack-ish and has
holes.
It would be much better if the optimizer was explicitly aware that a table is
"laterally dependent" on other tables.
"table dependencies" are already there, so the property #1 is already taken
care of.
We still need to take care of #2 which means disable join buffering in such
cases. I think this should be easy to do.
== Small bits from trying out the patch ==
=== FOR ORDINALITY doesn't seem to work ===
select *
from
json_table('[{"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]',
'$[*]' COLUMNS(a for ordinality)
) as tt;
+------+
| a |
+------+
| |
| |
+------+
2 rows in set (0.001 sec)
=== NULL ON ERROR doesn't seem to work ===
select *
from
json_table('[{"a": 1, "b": [11,111]}, {"a": "bbbb", "b": [22,222]}]',
'$[*]' COLUMNS( a DECIMAL(6,3) PATH '$.a' NULL ON ERROR)) as tt;
+-------+
| a |
+-------+
| 1.000 |
| 0.000 |
+-------+
I would expect the second row to have NULL, not 0.
=== Lack of test coverage ===
Other parts, like NESTED PATH, also have no test coverage at all. I think this
must be addressed before this is done.
=== No error on Invalid Json input ===
If I pass invalid JSON, I get no warning or error or anything:
MariaDB [test]> select * from json_table('[{"a": 1, invalid-json', '$[*]' COLUMNS( a INT PATH '$.a')) as tt;
+------+
| a |
+------+
| 1 |
+------+
1 row in set (0.001 sec)
=== Recursive ON-rule in the grammar ===
The json_opt_on_empty_or_error rule in sql_yacc.yy is recursive and causes the
following to be accepted:
select *
from
json_table(
'[{"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]',
'$[*]'
COLUMNS(a INT PATH '$.a' NULL ON EMPTY NULL ON EMPTY NULL ON EMPTY)
) as tt;
Is this intentional?
=== No error on invalid LATERAL dependency ===
create table t1 (item_name varchar(32), item_props varchar(1024));
insert into t1 values ('Laptop', '{"color": "black", "price": 1000}');
insert into t1 values ('Jeans', '{"color": "blue", "price": 50}');
MariaDB [test]> select * from t1 right join json_table(t1.item_props,'$' columns( color varchar(100) path '$.color')) as T on 1;
Empty set (0.000 sec)
The above query cannot be executed as left join execution requires that T is
computed before 1, but t is dependent on t1. We dont get an error for this
though.
MySQL produces this:
ERROR 3668 (HY000): INNER or LEFT JOIN must be used for LATERAL references made by 'T'
with left join, there's a demo of how the trickery with the optimizer was
successful (and I think one can construct other examples of this):
MariaDB [test]> select * from t1 left join json_table(t1.item_props,'$' columns( color varchar(100) path '$.color')) as T on 1;
+-----------+-----------------------------------+-------+
| item_name | item_props | color |
+-----------+-----------------------------------+-------+
| Laptop | {"color": "black", "price": 1000} | blue |
| Jeans | {"color": "blue", "price": 50} | blue |
+-----------+-----------------------------------+-------+
2 rows in set (0.002 sec)
MariaDB [test]> explain select * from t1 left join json_table(t1.item_props,'$' columns( color varchar(100) path '$.color')) as T on 1;
+------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 2 | |
| 1 | SIMPLE | T | ALL | NULL | NULL | NULL | NULL | 40 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
2 rows in set (0.001 sec)
=== AS is required with table alias ===
sql_yacc.yy has this:
> +table_function:
> + JSON_TABLE_SYM '(' expr ',' TEXT_STRING_sys
> + { ...
> + }
> + json_table_columns_clause ')' AS ident_table_alias
Is the 'AS' really required? MySQL-8 doesn't seem to require it.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Follow ups