← Back to team overview

maria-developers team mailing list archive

Re: MDEV-17399 Add support for JSON_TABLE.


Hi Alexey,

Please find my review input below.
There is one big issue and a number of smaller ones.

> commit 654fdfee33e3eafe3b7f25d7e213717c22ea1e18
> Author: Alexey Botchkov <holyfoot@xxxxxxxxxxxx>
> Date:   Mon Mar 30 01:00:28 2020 +0400
>     MDEV-17399 Add support for JSON_TABLE.
>     Syntax for JSON_TABLE added.
>     The ha_json_table handler added. Executing the JSON_TABLE we
>     create the temporary table of the ha_json_table, add dependencies
>     of other tables and sometimes conditions to WHERE.

== The big issue ==

I think some design choices of this patch are questionable:

The temporary table has a unique key. What is it for? The key is defined over
the field holding the JSON text. What if the JSON text gets bigger than

Then, there is an Item_func_eq(json_text, temp_table_field), which is set to
be "always equal" with set_always_equal(). This looks like a hack.

EXPLAIN shows that JSON_TABLE uses a kind of ref access which I think is

What is the goal of all this? 
The JSON_TABLE table is "laterally dependent" on the tables that are referred
from its arguments.

For the optimizer, this means:
1. The JSON_TABLE table must be put after its dependencies (this is similar to
   outer joins)
2. The JSON_TABLE can only be read when there are "current records" available 
  for each of the dependencies (an example when they are not available: join
  buffering. There are many records in the buffer, which one we should produce 
  matches for? For outer joins, join buffering code has a complex system to
  track this).

The above "kludges" make an attempt to trick the optimizer into meeting
requrirements #1 and #2 but I think this is approach is hack-ish and has 
It would be much better if the optimizer was explicitly aware that a table is
"laterally dependent" on other tables.

"table dependencies" are already there, so the property #1 is already taken 
care of.
We still need to take care of #2 which means disable join buffering in such
cases. I think this should be easy to do.

== Small bits from trying out the patch ==
=== FOR ORDINALITY doesn't seem to work ===

select * 
json_table('[{"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]', 
           '$[*]' COLUMNS(a for ordinality)
          ) as tt;
| a    |
|      |
|      |
2 rows in set (0.001 sec)

=== NULL ON ERROR doesn't seem to work ===

select * 
  json_table('[{"a": 1, "b": [11,111]}, {"a": "bbbb", "b": [22,222]}]', 
             '$[*]' COLUMNS( a DECIMAL(6,3) PATH '$.a' NULL ON ERROR)) as tt;
| a     |
| 1.000 |
| 0.000 |

I would expect the second row to have NULL, not 0.

=== Lack of test coverage ===
Other parts, like NESTED PATH, also have no test coverage at all. I think this
must be addressed before this is done.

=== No error on Invalid Json input ===
If I pass invalid JSON, I get no warning or error or anything:

MariaDB [test]> select * from json_table('[{"a": 1, invalid-json', '$[*]' COLUMNS( a INT PATH '$.a')) as tt;
| a    |
|    1 |
1 row in set (0.001 sec)

=== Recursive ON-rule in the grammar ===
The json_opt_on_empty_or_error rule in sql_yacc.yy is recursive and causes the 
following to be accepted:
select * 
  '[{"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]', 
) as tt;

Is this intentional? 

=== No error on invalid LATERAL dependency ===

create table t1 (item_name varchar(32), item_props varchar(1024));
insert into t1 values ('Laptop', '{"color": "black", "price": 1000}');
insert into t1 values ('Jeans',  '{"color": "blue", "price": 50}');

MariaDB [test]> select * from t1 right join json_table(t1.item_props,'$' columns( color varchar(100) path '$.color')) as T on 1;
Empty set (0.000 sec)

The above query cannot be executed as left join execution requires that T is
computed before 1, but t is dependent on t1.  We dont get an error for this

MySQL produces this:

ERROR 3668 (HY000): INNER or LEFT JOIN must be used for LATERAL references made by 'T'

with left join, there's a demo of how the trickery with the optimizer was
successful (and I think one can construct other examples of this):

MariaDB [test]> select * from t1 left join json_table(t1.item_props,'$' columns( color varchar(100) path '$.color')) as T on 1;
| item_name | item_props                        | color |
| Laptop    | {"color": "black", "price": 1000} | blue  |
| Jeans     | {"color": "blue", "price": 50}    | blue  |
2 rows in set (0.002 sec)

MariaDB [test]> explain select * from t1 left join json_table(t1.item_props,'$' columns( color varchar(100) path '$.color')) as T on 1;
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 2    |                                                 |
|    1 | SIMPLE      | T     | ALL  | NULL          | NULL | NULL    | NULL | 40   | Using where; Using join buffer (flat, BNL join) |
2 rows in set (0.001 sec)

=== AS is required with table alias ===

sql_yacc.yy has this:

> +table_function:
> +          JSON_TABLE_SYM '(' expr ',' TEXT_STRING_sys
> +          { ...
> +          }
> +          json_table_columns_clause ')' AS ident_table_alias

Is the 'AS' really required?  MySQL-8 doesn't seem to require it.

Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog

Follow ups