maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #12249
Review for: MDEV-17399 Add support for JSON_TABLE, part #5
Hi Alexey,
On Fri, May 15, 2020 at 04:26:02PM +0400, Alexey Botchkov wrote:
> See the next iteration here
> https://github.com/MariaDB/server/commit/692cb566096d61b240ec26e84fc7d3c7d13f024c
> So the ha_json_table;:position() was implemented.
> The size of the reference depends on the nested path depth - each level
> adds some 9 bytes to the initial 5.
> That can be decreased but i decided to keep it simple initially and i doubt
> we're going to have really deep
> nesting in realistic scenarios.
This is ok.
I'm not sure if the server has a limit on the rowid size. Perhaps there is, and
in that case we just need to limit the depth we allow.
> And i'd like to ask you for some testing
> queries here. Or just the model how
> to produce queries that will be using the ::position() extensively.
I've provided a testcase for the position() call in my previous email.
https://gist.github.com/spetrunia/a905d51731c58f5439bd9f70c64cdc43
As for rnd_pos(), one case that I'm aware of is when the query does a
filesort, and the row being sorted either includes a blob, or has the total
length exceeding certain limit.
I've tried constructing an example with blobs, and it fails with an assertion
before the execution reaches rnd_pos() calls:
select *
from
json_table('[{"color": "blue", "price": 50},
{"color": "red", "price": 100},
{"color": "rojo", "price": 10.0},
{"color": "blanco", "price": 11.0}]',
'$[*]' columns( color varchar(100) path '$.color',
price text path '$.price',
seq for ordinality
)
) as T
order by color desc;
fails an assertion:
mysqld: /home/psergey/dev-git/10.5-json/sql/field.cc:8309: virtual int Field_blob::store(const char*, size_t, CHARSET_INFO*): Assertion `marked_for_write_or_computed()' failed.
Once that is fixed, this should use position() and rnd_pos() calls.
This will likely expose more issues with rnd_pos(), see my comments to the
Json_table_nested_path::set_position below.
== A problem with VIEWs ==
mysql> select *
-> from
-> json_table(
-> '[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]',
-> '$[*]' columns( color varchar(100) path '$.color')) as `T_A` where T_A.color<>'azul' ;
+-------+
| color |
+-------+
| red |
| black |
+-------+
2 rows in set (6.34 sec)
Good so far. Now, let's try creating a VIEW from this:
create view v1 as
select *
from
json_table(
'[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]',
'$[*]' columns( color varchar(100) path '$.color')) as `T_A` where T_A.color<>'azul' ;
select * from v1;
ERROR 4041 (HY000): Unexpected end of JSON path in argument 2 to function 'JSON_TABLE'
Examining the .frm file, I see something that looks like garbage data:
query=select `T_A`.`color` AS `color` from JSON_TABLE(\'[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]\', \'\'\0<D8>7^A<9C><FF>^?\0\0\0\0\0\0\0\0\0\0\0\0^A<A5><A5><A5><A5><A5>^H<D0>kWUU\0\0[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]\0$[*]\'\' COLUMNS (`color` PATH \'^A<9C><FF>^?\0\0\0\0\0\0^A\0\0\0\0\0\0\0<A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5>\0\0\0\0<A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5>
<A5><A5><A5><A5><A5><A5> D^A<9C><FF>^?\0\0H:^A<9C><FF>^?\0\0<A5><A5><A5><A5><A5><A5><A5><A5>100\0<8F><8F><8F><8F>$.color\')) T_A where `T_A`.`color` <> \'azul\'
EXPLAIN EXTENDED ...; SHOW WARNINGS; - also print something odd.
== EXPLAIN [FORMAT=JSON] ==
I think EXPLAIN output should provide an indication that a table function is
used.
MySQL does it like so:
<TODO>
== Unneeded recursive rules in grammar ==
I've already complained about this:
the grammar has recursive ON EMPTY, ON ERROR rules, which cause the following
to be accepted (note the two ON EMPTY clauses) :
select *
from
json_table('[{"color": "blue", "price": 50},{"color": "red"}]',
'$[*]' columns(
price varchar(255) path '$.price'
default 'xyz' on empty
default 'abc' on empty
)
) as T;
> commit 692cb566096d61b240ec26e84fc7d3c7d13f024c
> Author: Alexey Botchkov <holyfoot@xxxxxxxxxxxx>
> Date: Fri May 15 15:25:28 2020 +0400
>
> MDEV-17399 Add support for JSON_TABLE.
>
> Syntax for JSON_TABLE added.
> The ha_json_table handler added. Executing the JSON_TABLE we
> create the temporary table of the ha_json_table, add dependencies
> of other tables and sometimes conditions to WHERE.
> and sometimes conditions to WHERE.
I think this is not true anymore?
...
> diff --git a/include/my_base.h b/include/my_base.h
> index 7efa5eb9673..89ef3e8e7c1 100644
> --- a/include/my_base.h
> +++ b/include/my_base.h
> @@ -523,6 +523,13 @@ enum ha_base_keytype {
> #define HA_ERR_TABLESPACE_MISSING 194 /* Missing Tablespace */
> #define HA_ERR_SEQUENCE_INVALID_DATA 195
> #define HA_ERR_SEQUENCE_RUN_OUT 196
> +
> +/*
> + Share the error code to not increment the HA_ERR_LAST for now,
> + as it disturbs some storage engine's tests.
> + Probably should be fixed later.
> +*/
> +#define HA_ERR_INVALID_JSON HA_ERR_TABLE_IN_FK_CHECK
We definitely can't have this in the final patch.
We need to either:
A. Use an engine-specific error code. Check out MyRocks and
ha_rocksdb::get_error_message() for an example of how this is done
B. Indeed introduce a generic "Invalid JSON" error code.
I'm hesitant about B, let's discuss this with other developers.
...
> diff --git a/sql/table_function.cc b/sql/table_function.cc
> new file mode 100644
> index 00000000000..71f2378ce7d
> --- /dev/null
> +++ b/sql/table_function.cc
...
> +
> +class ha_json_table: public handler
> +{
Please add comments about these
> +protected:
> + Table_function_json_table *m_jt;
> + String m_tmps;
> + String *m_js;
> + uchar *m_cur_pos;
> +public:
> + ha_json_table(TABLE_SHARE *share_arg, Table_function_json_table *jt):
> + handler(&table_function_hton.m_hton, share_arg), m_jt(jt)
> + {
> + /*
> + set the mark_trx_read_write_done to avoid the
> + handler::mark_trx_read_write_internal() call.
> + It relies on &ha_thd()->ha_data[ht->slot].ha_info[0] to be set.
> + But we don't set the ha_data for the ha_json_table, and
> + that call makes no sence for ha_json_table.
> + */
> + mark_trx_read_write_done= 1;
> + ref_length= (jt->m_depth+1) * (4 + 1) +
> + jt->m_depth * (sizeof(Json_table_nested_path *));
> + }
> + ~ha_json_table() {}
> + handler *clone(const char *name, MEM_ROOT *mem_root) { return NULL; }
> + const char *index_type(uint inx) { return "NONE"; }
> + /* Rows also use a fixed-size format */
> + enum row_type get_row_type() const { return ROW_TYPE_FIXED; }
> + ulonglong table_flags() const
> + {
> + return (HA_FAST_KEY_READ | /*HA_NO_BLOBS |*/ HA_NULL_IN_KEY |
> + HA_CAN_SQL_HANDLER |
> + HA_REC_NOT_IN_SEQ | HA_NO_TRANSACTIONS |
> + HA_HAS_RECORDS | HA_CAN_HASH_KEYS);
> + }
> + ulong index_flags(uint inx, uint part, bool all_parts) const
> + {
> + return HA_ONLY_WHOLE_INDEX | HA_KEY_SCAN_NOT_ROR;
> + }
> + uint max_supported_keys() const { return 1; }
> + uint max_supported_key_part_length() const { return MAX_KEY_LENGTH; }
> + double scan_time() { return 1000000.0; }
> + double read_time(uint index, uint ranges, ha_rows rows) { return 0.0; }
The above two functions are never called.
Please
* add a comment saying that.
* add a DBUG_ASSERT() into them to back the point made by the comment :-)
> + int open(const char *name, int mode, uint test_if_locked);
> + int close(void) { return 0; }
> + int rnd_init(bool scan);
> + int rnd_next(uchar *buf);
> + int rnd_pos(uchar * buf, uchar *pos);
> + void position(const uchar *record);
> + int can_continue_handler_scan() { return 1; }
> + int info(uint);
> + int extra(enum ha_extra_function operation);
> + THR_LOCK_DATA **store_lock(THD *thd, THR_LOCK_DATA **to,
> + enum thr_lock_type lock_type)
> + { return NULL; }
> + int create(const char *name, TABLE *form, HA_CREATE_INFO *create_info)
> + { return 1; }
> +private:
> + void update_key_stats();
> +};
> +
...
> +void Json_table_nested_path::set_position(const char *j_start, const uchar *pos)
> +{
* This needs a comment.
* I don't see where the value of m_ordinality_counter is restored?
* The same about m_null - its value is not restored either?
> + if (m_nested)
> + {
> + memcpy(&m_cur_nested, pos, sizeof(m_cur_nested));
> + pos+= sizeof(m_cur_nested);
> + }
> +
> + m_engine.s.c_str= (uchar *) j_start + sint4korr(pos);
> + m_engine.state= (int) pos[4];
> + if (m_cur_nested)
> + m_cur_nested->set_position(j_start, pos);
> +}
> +
...
> +int ha_json_table::info(uint)
> +{
> + /* We don't want 0 or 1 in stats.records. */
> + stats.records= 4;
> + return 0;
Does this value matter? As far as I understand it doesn't, as the optimizer
will use the estimates obtained from Table_function_json_table::get_estimates.
Please add a comment about this.
> +}
> +
...
Please document this function. What's last_column? Why does print() method get
it and return it?
> +int Json_table_nested_path::print(THD *thd, TABLE_LIST *sql_table, String *str,
> + List_iterator_fast<Json_table_column> &it,
> + Json_table_column **last_column)
> +{
> + Json_table_nested_path *c_path= this;
...
> diff --git a/sql/table_function.h b/sql/table_function.h
> new file mode 100644
> index 00000000000..49d650cb7b2
> --- /dev/null
> +++ b/sql/table_function.h
> @@ -0,0 +1,168 @@
...
> +#include <json_lib.h>
> +
> +/*
> + The Json_table_nested_path represents the 'current nesting' level
> + for a set of JSON_TABLE columns.
> + Each column (Json_table_column instance) is linked with corresponding
> + 'nested path' object and gets it's piece of JSON to parse during the computation
> + phase.
> + The root 'nested_path' is always present as a part of Table_function_json_table,
> + then other 'nested_paths' can be created and linked into a tree structure when new
> + 'NESTED PATH' is met. The nested 'nested_paths' are linked with 'm_nested', the same-level
> + 'nested_paths' are linked with 'm_next_nested'.
> + So for instance
> + JSON_TABLE( '...', '$[*]'
> + COLUMNS( a INT PATH '$.a' ,
> + NESTED PATH '$.b[*]' COLUMNS (b INT PATH '$',
> + NESTED PATH '$.c[*]' COLUMNS(x INT PATH '$')),
> + NESTED PATH '$.n[*]' COLUMNS (z INT PAHT '$'))
> + results in 4 'nested_path' created:
> + root nested_b nested_c nested_n
> + m_path '$[*]' '$.b[*]' '$.c[*]' '$.n[*]
> + m_nested &nested_b &nested_c NULL NULL
> + n_next_nested NULL &nested_n NULL NULL
> +
> +and 4 columns created:
> + a b x z
> + m_nest &root &nested_b &nested_c &nested_n
> +*/
> +
> +
> +class Json_table_column;
> +
> +class Json_table_nested_path : public Sql_alloc
> +{
> +public:
> + bool m_null;
> + json_path_t m_path;
> + json_engine_t m_engine;
> + json_path_t m_cur_path;
> +
> + /* Counts the rows produced. Value is set to the FOR ORDINALITY coluns */
> + longlong m_ordinality_counter;
> +
> + Json_table_nested_path *m_parent;
> + Json_table_nested_path *m_nested, *m_next_nested;
> + Json_table_nested_path **m_nested_hook;
> + Json_table_nested_path *m_cur_nested;
Please add documentation about the above members. I see the diagram above, but
I think text descriptions are also needed for each member. What's m_nested_hook?
> + Json_table_nested_path(Json_table_nested_path *parent_nest):
> + m_parent(parent_nest), m_nested(0), m_next_nested(0),
> + m_nested_hook(&m_nested) {}
> + int set_path(THD *thd, const LEX_CSTRING &path);
> + void scan_start(CHARSET_INFO *i_cs, const uchar *str, const uchar *end);
> + int scan_next();
> + int print(THD *thd, TABLE_LIST *sql_table, String *str,
> + List_iterator_fast<Json_table_column> &it,
> + Json_table_column **last_column);
> + void get_current_position(const char *j_start, uchar *pos) const;
> + void set_position(const char *j_start, const uchar *pos);
> +};
...
> +class Table_function_json_table : public Sql_alloc
Please document the class and the data members.
> +{
> +public:
> + Item *m_json;
> + Json_table_nested_path m_nested_path;
> + List<Json_table_column> m_columns;
> + table_map m_dep_tables;
> + uint m_depth, m_cur_depth;
> +
> + Table_function_json_table(Item *json): m_json(json), m_nested_path(0),
> + m_depth(0), m_cur_depth(0) {}
> +
> + /*
> + Used in sql_yacc.yy.
> + Represents the current NESTED PATH level being parsed.
> + */
> + Json_table_nested_path *m_sql_nest;
> + void add_nested(Json_table_nested_path *np);
> + void leave_nested();
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
References