maria-developers team mailing list archive

Thread
Date
Re: Review for: MDEV-17399 Add support for JSON_TABLE, part #4

To: Alexey Botchkov <holyfoot@xxxxxxxxxxx>
From: Sergey Petrunia <sergey@xxxxxxxxxxx>
Date: Sun, 10 May 2020 12:15:20 +0300
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20200419154951.GC12311@pslp4>
User-agent: Mutt/1.9.4 (2018-02-28)
Hi Alexey,

More review input, based on the latest code:

== It doesn't compile ==

I get the following compiler error and indeed looks the compiler complains
about a real problem there:

/home/psergey/dev-git2/10.5-hf-review/sql/item_jsonfunc.cc:340:15: error: ‘code’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
       my_error(code, MYF(0), JSON_DEPTH_LIMIT, n_param, fname, position);
       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

== Rowid support ==

ha_json_table has these functions:

  int rnd_pos(uchar * buf, uchar *pos) { return 1; }
  void position(const uchar *record) {}

That is, saving the rowid does nothing, and attempting to read a row by rowid
returns an error.

It is mandatory for a storage engine to produce distinct rowids for different
rows. One user of this property is Duplicate Weedout optimization. 
Here's a testcase:
https://gist.github.com/spetrunia/a905d51731c58f5439bd9f70c64cdc43

I think it is also mandatory to implement rnd_pos() and be able to read the row
by its rowid. 
One known user of this feature is "filesort over blob columns", but I've
experienced another error when trying to construct a testcase (see the next
section).

How do we implement position and rnd_pos().

=== Solution #1 ===

Let the rowid contain the FOR ORDINALITY number for each Json_nested_path.
This number will be different for each row and also it will identify the row.

I assume that implementing rnd_pos() will be hard, though. We no longer have
pointer to values we've returned for row number N. Should we walk the document
again from the start? This seems inefficient.

Summary: complex and/or inefficient

=== Solution #2 ===

when we have produced the output row, let's write it into a temporary table
(a HEAP or Aria table that is), and return something that identifies the row
we've written.

Ideally that should be the rowid of the row we just wrote, but AFAIU the
storage engine API doesn't allow one to easily get this.
We'll need to either implement such call, or roll our own rowids in form 
of a hidden auto-increment column.

Summary: also there's some work to do, but at least this has an advantage
of being potentially useful to other table functions.

== Text columns not supported? ==

I've accidentally discovered that JSON_TABLE cannot have text columns (this is
not obvious from the code).

select * 
from 
  json_table('[{"color": "blue", "price": 50},
               {"color": "red", "price": 100}]',
             '$[*]' columns( color varchar(100) path '$.color',
                          price text path '$.price'
                        )
             ) as T;

Any idea why? (MySQL supports them btw)

== Multiple nested paths produce -1 as FOR ORDINALITY column ==

set @js=
'
[
  {"color": "blue", "sizes": [1,2,3,4],  "prices" : [10,20]},
  {"color": "red",  "sizes": [10,11,12,13,14],  "prices" : [100,200,300]}
]';

select * 
from 
  json_table(@js,
             '$[*]' columns(
                          color varchar(4) path '$.color',
                          seq0 for ordinality,
                          nested path '$.sizes[*]' 
                            columns (seq1 for ordinality,
                                     size int path '$'),
                          nested path '$.prices[*]' 
                            columns (seq2 for ordinality,
                                     price int path '$')
                        )
             ) as T;
+-------+------+------+------+------------+-------+
| color | seq0 | seq1 | size | seq2       | price |
+-------+------+------+------+------------+-------+
| blue  |    1 |    1 |    1 | 2147483647 |  NULL |
| blue  |    1 |    2 |    2 | 2147483647 |  NULL |
| blue  |    1 |    3 |    3 | 2147483647 |  NULL |
| blue  |    1 |    4 |    4 | 2147483647 |  NULL |
| blue  |    1 |    4 | NULL |          1 |    10 |
| blue  |    1 |    4 | NULL |          2 |    20 |
| red   |    2 |    1 |   10 |          2 |  NULL |
| red   |    2 |    2 |   11 |          2 |  NULL |
| red   |    2 |    3 |   12 |          2 |  NULL |
| red   |    2 |    4 |   13 |          2 |  NULL |
| red   |    2 |    5 |   14 |          2 |  NULL |
| red   |    2 |    5 | NULL |          1 |   100 |
| red   |    2 |    5 | NULL |          2 |   200 |
| red   |    2 |    5 | NULL |          3 |   300 |
+-------+------+------+------+------------+-------+
14 rows in set (0.000 sec)

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
References

Review for: MDEV-17399 Add support for JSON_TABLE, part #3
From: Sergey Petrunia, 2020-04-19