← Back to team overview

maria-developers team mailing list archive

Review for: MDEV-17399 Add support for JSON_TABLE, part #5

 

Hi Alexey,

On Fri, May 15, 2020 at 04:26:02PM +0400, Alexey Botchkov wrote:
> See the next iteration here
> https://github.com/MariaDB/server/commit/692cb566096d61b240ec26e84fc7d3c7d13f024c

> So the ha_json_table;:position() was implemented.
> The size of the reference depends on the nested path depth - each level
> adds some 9 bytes to the initial 5.
> That can be decreased but i decided to keep it simple initially and i doubt
> we're going to have really deep
> nesting in realistic scenarios. 

This is ok. 
I'm not sure if the server has a limit on the rowid size. Perhaps there is, and
in that case we just need to limit the depth we allow.

> And i'd like to ask you for some testing
> queries here. Or just the model how
> to produce queries that will be using the ::position() extensively.

I've provided a testcase for the position() call in my previous email.
https://gist.github.com/spetrunia/a905d51731c58f5439bd9f70c64cdc43

As for rnd_pos(), one case that I'm aware of is when the query does a
filesort, and the row being sorted either includes a blob, or has the total
length exceeding certain limit.

I've tried constructing an example with blobs, and it fails with an assertion
before the execution reaches rnd_pos() calls:

select * 
from 
  json_table('[{"color": "blue", "price": 50},
               {"color": "red", "price": 100},
               {"color": "rojo", "price": 10.0},
               {"color": "blanco", "price": 11.0}]',
             '$[*]' columns( color varchar(100) path '$.color',
                          price text path '$.price',
                          seq for ordinality
                        )
             ) as T
order by color desc;

fails an assertion: 
  mysqld: /home/psergey/dev-git/10.5-json/sql/field.cc:8309: virtual int Field_blob::store(const char*, size_t, CHARSET_INFO*): Assertion `marked_for_write_or_computed()' failed.

Once that is fixed, this should use position() and rnd_pos() calls.
This will likely expose more issues with rnd_pos(), see my comments to the 
Json_table_nested_path::set_position below.

== A problem with VIEWs ==

mysql> select * 
    -> from 
    ->   json_table(
    ->   '[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]',
    ->   '$[*]' columns( color varchar(100) path '$.color')) as `T_A` where T_A.color<>'azul' ;
+-------+
| color |
+-------+
| red   |
| black |
+-------+
2 rows in set (6.34 sec)

Good so far. Now, let's try creating a VIEW from this:

create view v1 as
select * 
from 
  json_table(
  '[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]',
  '$[*]' columns( color varchar(100) path '$.color')) as `T_A` where T_A.color<>'azul' ;

select * from v1;
ERROR 4041 (HY000): Unexpected end of JSON path in argument 2 to function 'JSON_TABLE'

Examining the .frm file, I see something that looks like garbage data:

query=select `T_A`.`color` AS `color` from JSON_TABLE(\'[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]\', \'\'\0<D8>7^A<9C><FF>^?\0\0\0\0\0\0\0\0\0\0\0\0^A<A5><A5><A5><A5><A5>^H<D0>kWUU\0\0[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]\0$[*]\'\' COLUMNS (`color` PATH \'^A<9C><FF>^?\0\0\0\0\0\0^A\0\0\0\0\0\0\0<A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5>\0\0\0\0<A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5>
<A5><A5><A5><A5><A5><A5> D^A<9C><FF>^?\0\0H:^A<9C><FF>^?\0\0<A5><A5><A5><A5><A5><A5><A5><A5>100\0<8F><8F><8F><8F>$.color\')) T_A where `T_A`.`color` <> \'azul\'

EXPLAIN EXTENDED ...; SHOW WARNINGS; - also print something odd.

== EXPLAIN [FORMAT=JSON] ==

I think EXPLAIN output should provide an indication that a table function is
used.

MySQL does it like so:
<TODO>

== Unneeded recursive rules in grammar ==

I've already complained about this: 

the grammar has recursive ON EMPTY, ON ERROR rules, which cause the following
to be accepted (note the two ON EMPTY clauses) :

select *  
from
  json_table('[{"color": "blue", "price": 50},{"color": "red"}]',
             '$[*]' columns( 
                       price varchar(255) path '$.price' 
                          default 'xyz' on empty 
                          default 'abc' on empty
                    )
    ) as T; 
 
> commit 692cb566096d61b240ec26e84fc7d3c7d13f024c
> Author: Alexey Botchkov <holyfoot@xxxxxxxxxxxx>
> Date:   Fri May 15 15:25:28 2020 +0400
> 
>     MDEV-17399 Add support for JSON_TABLE.
>     
>     Syntax for JSON_TABLE added.
>     The ha_json_table handler added. Executing the JSON_TABLE we
>     create the temporary table of the ha_json_table, add dependencies
>     of other tables and sometimes conditions to WHERE.

> and sometimes conditions to WHERE.

I think this is not true anymore? 

... 
> diff --git a/include/my_base.h b/include/my_base.h
> index 7efa5eb9673..89ef3e8e7c1 100644
> --- a/include/my_base.h
> +++ b/include/my_base.h
> @@ -523,6 +523,13 @@ enum ha_base_keytype {
>  #define HA_ERR_TABLESPACE_MISSING 194  /* Missing Tablespace */
>  #define HA_ERR_SEQUENCE_INVALID_DATA 195
>  #define HA_ERR_SEQUENCE_RUN_OUT   196
> +
> +/*
> +  Share the error code to not increment the HA_ERR_LAST for now,
> +  as it disturbs some storage engine's tests.
> +  Probably should be fixed later.
> +*/
> +#define HA_ERR_INVALID_JSON       HA_ERR_TABLE_IN_FK_CHECK

We definitely can't have this in the final patch.

We need to either:
A. Use an engine-specific error code. Check out MyRocks and
   ha_rocksdb::get_error_message() for an example of how this is done
B. Indeed introduce a generic "Invalid JSON" error code.

I'm hesitant about B, let's discuss this with other developers.

...
> diff --git a/sql/table_function.cc b/sql/table_function.cc
> new file mode 100644
> index 00000000000..71f2378ce7d
> --- /dev/null
> +++ b/sql/table_function.cc
...
> +
> +class ha_json_table: public handler
> +{

Please add comments about these 
> +protected:
> +  Table_function_json_table *m_jt;
> +  String m_tmps;
> +  String *m_js;
> +  uchar *m_cur_pos;
> +public:
> +  ha_json_table(TABLE_SHARE *share_arg, Table_function_json_table *jt):
> +    handler(&table_function_hton.m_hton, share_arg), m_jt(jt)
> +  {
> +    /*
> +      set the mark_trx_read_write_done to avoid the
> +      handler::mark_trx_read_write_internal() call.
> +      It relies on &ha_thd()->ha_data[ht->slot].ha_info[0] to be set.
> +      But we don't set the ha_data for the ha_json_table, and
> +      that call makes no sence for ha_json_table.
> +   */
> +    mark_trx_read_write_done= 1;
> +    ref_length= (jt->m_depth+1) * (4 + 1) +
> +                jt->m_depth * (sizeof(Json_table_nested_path *));
> +  }
> +  ~ha_json_table() {}
> +  handler *clone(const char *name, MEM_ROOT *mem_root) { return NULL; }
> +  const char *index_type(uint inx) { return "NONE"; }
> +  /* Rows also use a fixed-size format */
> +  enum row_type get_row_type() const { return ROW_TYPE_FIXED; }
> +  ulonglong table_flags() const
> +  {
> +    return (HA_FAST_KEY_READ | /*HA_NO_BLOBS |*/ HA_NULL_IN_KEY |
> +            HA_CAN_SQL_HANDLER |
> +            HA_REC_NOT_IN_SEQ | HA_NO_TRANSACTIONS |
> +            HA_HAS_RECORDS | HA_CAN_HASH_KEYS);
> +  }
> +  ulong index_flags(uint inx, uint part, bool all_parts) const
> +  {
> +    return HA_ONLY_WHOLE_INDEX | HA_KEY_SCAN_NOT_ROR;
> +  }
> +  uint max_supported_keys() const { return 1; }
> +  uint max_supported_key_part_length() const { return MAX_KEY_LENGTH; }
> +  double scan_time() { return 1000000.0; }
> +  double read_time(uint index, uint ranges, ha_rows rows) { return 0.0; }

The above two functions are never called.
Please
* add a comment saying that.
* add a DBUG_ASSERT() into them to back the point made by the comment :-)

> +  int open(const char *name, int mode, uint test_if_locked);
> +  int close(void) { return 0; }
> +  int rnd_init(bool scan);
> +  int rnd_next(uchar *buf);
> +  int rnd_pos(uchar * buf, uchar *pos);
> +  void position(const uchar *record);
> +  int can_continue_handler_scan() { return 1; }
> +  int info(uint);
> +  int extra(enum ha_extra_function operation);
> +  THR_LOCK_DATA **store_lock(THD *thd, THR_LOCK_DATA **to,
> +			     enum thr_lock_type lock_type)
> +    { return NULL; }
> +  int create(const char *name, TABLE *form, HA_CREATE_INFO *create_info)
> +    { return 1; }
> +private:
> +  void update_key_stats();
> +};
> +
...

> +void Json_table_nested_path::set_position(const char *j_start, const uchar *pos)
> +{

* This needs a comment.
* I don't see where the value of m_ordinality_counter is restored?
* The same about m_null - its value is not restored either?

> +  if (m_nested)
> +  {
> +    memcpy(&m_cur_nested, pos, sizeof(m_cur_nested));
> +    pos+= sizeof(m_cur_nested);
> +  }
> +
> +  m_engine.s.c_str= (uchar *) j_start + sint4korr(pos);
> +  m_engine.state= (int) pos[4];
> +  if (m_cur_nested)
> +    m_cur_nested->set_position(j_start, pos);
> +}
> +
...

> +int ha_json_table::info(uint)
> +{
> +  /* We don't want 0 or 1 in stats.records. */
> +  stats.records= 4;
> +  return 0;

Does this value matter? As far as I understand it doesn't, as the optimizer
will use the estimates obtained from Table_function_json_table::get_estimates.
Please add a comment about this.

> +}
> +
...

Please document this function. What's last_column? Why does print() method get
it and return it?

> +int Json_table_nested_path::print(THD *thd, TABLE_LIST *sql_table, String *str,
> +                                  List_iterator_fast<Json_table_column> &it,
> +                                  Json_table_column **last_column)
> +{
> +  Json_table_nested_path *c_path= this;
...

> diff --git a/sql/table_function.h b/sql/table_function.h
> new file mode 100644
> index 00000000000..49d650cb7b2
> --- /dev/null
> +++ b/sql/table_function.h
> @@ -0,0 +1,168 @@
...
> +#include <json_lib.h>
> +
> +/*
> +  The Json_table_nested_path represents the 'current nesting' level
> +  for a set of JSON_TABLE columns.
> +  Each column (Json_table_column instance) is linked with corresponding
> +  'nested path' object and gets it's piece of JSON to parse during the computation
> +  phase.
> +  The root 'nested_path' is always present as a part of Table_function_json_table,
> +  then other 'nested_paths' can be created and linked into a tree structure when new
> +  'NESTED PATH' is met. The nested 'nested_paths' are linked with 'm_nested', the same-level
> +  'nested_paths' are linked with 'm_next_nested'.
> +  So for instance
> +    JSON_TABLE( '...', '$[*]'
> +       COLUMNS( a INT PATH '$.a' ,
> +          NESTED PATH '$.b[*]' COLUMNS (b INT PATH '$',
> +                                        NESTED PATH '$.c[*]' COLUMNS(x INT PATH '$')),
> +          NESTED PATH '$.n[*]' COLUMNS (z INT PAHT '$'))
> +  results in 4 'nested_path' created:
> +                 root          nested_b       nested_c     nested_n
> +  m_path           '$[*]'         '$.b[*]'        '$.c[*]'     '$.n[*]
> +  m_nested          &nested_b     &nested_c       NULL         NULL
> +  n_next_nested     NULL          &nested_n       NULL         NULL
> +
> +and 4 columns created:
> +              a          b            x            z
> +  m_nest    &root      &nested_b    &nested_c    &nested_n
> +*/
> +
> +
> +class Json_table_column;
> +
> +class Json_table_nested_path : public Sql_alloc
> +{
> +public:
> +  bool m_null;
> +  json_path_t m_path;
> +  json_engine_t m_engine;
> +  json_path_t m_cur_path;
> +
> +  /* Counts the rows produced.  Value is set to the FOR ORDINALITY coluns */
> +  longlong m_ordinality_counter;
> +
> +  Json_table_nested_path *m_parent;
> +  Json_table_nested_path *m_nested, *m_next_nested;
> +  Json_table_nested_path **m_nested_hook;
> +  Json_table_nested_path *m_cur_nested;

Please add documentation about the above members. I see the diagram above, but
I think text descriptions are also needed for each member.  What's m_nested_hook?

> +  Json_table_nested_path(Json_table_nested_path *parent_nest):
> +    m_parent(parent_nest), m_nested(0), m_next_nested(0),
> +    m_nested_hook(&m_nested) {}
> +  int set_path(THD *thd, const LEX_CSTRING &path);
> +  void scan_start(CHARSET_INFO *i_cs, const uchar *str, const uchar *end);
> +  int scan_next();
> +  int print(THD *thd, TABLE_LIST *sql_table, String *str,
> +            List_iterator_fast<Json_table_column> &it,
> +            Json_table_column **last_column);
> +  void get_current_position(const char *j_start, uchar *pos) const;
> +  void set_position(const char *j_start, const uchar *pos);
> +};
...


> +class Table_function_json_table : public Sql_alloc

Please document the class and the data members.

> +{
> +public:
> +  Item *m_json;
> +  Json_table_nested_path m_nested_path;
> +  List<Json_table_column> m_columns;
> +  table_map m_dep_tables;
> +  uint m_depth, m_cur_depth;
> +
> +  Table_function_json_table(Item *json): m_json(json), m_nested_path(0),
> +    m_depth(0), m_cur_depth(0) {}
> +
> +  /*
> +    Used in sql_yacc.yy.
> +    Represents the current NESTED PATH level being parsed.
> +  */
> +  Json_table_nested_path *m_sql_nest;
> +  void add_nested(Json_table_nested_path *np);
> +  void leave_nested();

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog




References