maria-developers team mailing list archive

Thread
Date

Re: 734aee3cd3c: MDEV-27018 IF and COALESCE lose "json" property

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Alexander Barkov <bar@xxxxxxxxxxx>
Date: Tue, 28 Dec 2021 23:29:32 +0400
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <YctTKu9wKMJqsXv6@pweza>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

Hello Sergei,

Thanks for your review!

Please find comments below:


On 12/28/21 10:10 PM, Sergei Golubchik wrote:

Hi, Alexander!

On Dec 28, Alexander Barkov wrote:

revision-id: 734aee3cd3c (mariadb-10.5.13-33-g734aee3cd3c)
parent(s): 2776635cb98
author: Alexander Barkov
committer: Alexander Barkov
timestamp: 2021-12-27 17:54:31 +0400
message:

MDEV-27018 IF and COALESCE lose "json" property

Hybrid functions (IF, COALESCE, etc) did not preserve the JSON property
from their arguments. The same problem was repeatable for single row subselects.

The problem happened because the method Item::is_json_type() was inconsistently
implemented across the Item hierarchy. For example, Item_hybrid_func
and Item_singlerow_subselect did not override is_json_type().


Okay. The approach is fine, but see some questions below:

diff --git a/sql/field.cc b/sql/field.cc
index 2c768527ced..8fa3bbd538c 100644
--- a/sql/field.cc
+++ b/sql/field.cc
@@ -7277,6 +7277,19 @@ bool Field_longstr::send(Protocol *protocol)
  }

+const Type_handler *Field_string::type_handler() const

+{
+  if (is_var_string())
+    return &type_handler_var_string;
+  /*
+    This is a temporary solution and will be fixed soon (in 10.9?).
+    Type_handler_string_json will provide its own Field_string_json.


why?



I'd like to split JSON off eventually:

- To turn JSON into a real pluggable data type,
  so it can be compiled as a dynamic plugin.

- For performance purposes, to avoid this constraint tests
  for "normal" CHAR/VARCHAR/TEXT fields.

- We'll probably have more FORMATs in the future.
  Adding all format tests inside "normal"
  Field_string/Field_varchar/Field_blob
  is not a good idea.

+  */
+  if (Type_handler_json_common::has_json_valid_constraint(this))


could you please use names that don't depend on how exactly we detect
json? especially if the goal is to move to FORMAT JSON syntax


For now this name reflects what it actually does.

When we split JSON off and fix the code to have JSON put extra
data type information into FRM, this test will be gone from
Field_xxx.

But it will move to the code opening FRM, to be able open
old tables where JSON is detected only by a CHECK
constraint. And there it will reflect what it actually
does again.

I propose to keep it as is.

+   return &type_handler_string_json;
+  return &type_handler_string;
+}
+
  	/* Copy a string and fill with space */

int Field_string::store(const char *from, size_t length,CHARSET_INFO *cs)

diff --git a/sql/sql_type.cc b/sql/sql_type.cc
index c1801c1ae3e..3b2753d80a6 100644
--- a/sql/sql_type.cc
+++ b/sql/sql_type.cc
@@ -1755,19 +1755,110 @@ const Type_handler *Type_handler_typelib::cast_to_int_type_handler() const

/***************************************************************************/+class Recursive_type_pair_iterator

+{
+  const Type_handler *m_a;
+  const Type_handler *m_b;
+  uint m_switched_to_base_count;
+public:
+  Recursive_type_pair_iterator(const Type_handler *a,
+                               const Type_handler *b,
+                               uint switched_to_base_count= 0)
+   :m_a(a), m_b(b), m_switched_to_base_count(switched_to_base_count)
+  { }
+  const Type_handler *a() const { return m_a; }
+  const Type_handler *b() const { return m_b; }
+  Recursive_type_pair_iterator base() const
+  {
+    Recursive_type_pair_iterator res(m_a->type_handler_base(),
+                                     m_b->type_handler_base());
+    res.m_switched_to_base_count= (res.m_a != NULL) + (res.m_b != NULL);
+    if (res.m_a == NULL)
+      res.m_a= m_a;
+    if (res.m_b == NULL)
+      res.m_b= m_b;
+    return res;
+  }
+  bool done() const
+  {
+    switch (m_switched_to_base_count)
+    {
+    case 2:
+      /*
+        Expression:
+          COALESCE(TYPE1, TYPE2)
+
+        where both type handlers derive from some other handlers, e.g.
+          VARCHAR -> TYPE1
+          TEXT    -> TYPE2
+
+        Continue aggregation as:
+          SELECT COALESCE(VARCHAR, TEXT)
+
+        Note, we now have only one level of data type inheritance,
+          e.g. VARCHAR -> VARCHAR/JSON
+
+        This code branch will change when we have longer inheritance chains:
+          A0 -> A1 -> A2 -> A3
+          B0 -> B1 -> B2
+
+        Functions like COALESE(A2, B2) will need to do full overload resolution:
+        - iterate through all pairs from the below matrix
+        - find all existing candidates
+        - resolve ambiguities in case multiple candidates, etc.


and I don't see how you're going to do that. you need to try
m_a+m_b.base(), m_a.base()+m_b, and m_a.base()+m_b.base().
And that's not what you're doing.

I suggest to add an assert that m_switched_to_base_count < 2
and stop trying to solve complex cases until we have at least one.


Yes, we'll need to do all these combinations eventually.
But I'm not trying to solve a more complex case
than it's really necessary now:


If the SQL query is:

  SELECT varchar_json_expr + text_json_expr FROM t1;

it works as follows:

- It tries VARCHAR/JSON+TEXT/JSON in Type_collection_json,
  nothing is found

- Then it tries directly VARCHAR+TEXT,
  which is m_a.base()+m_b.base(), and this
  combination is found in Type_collection_std,
  so it effectively performs:

    SELECT varchar_expr + text_expr FROM t1;

  In this case m_switched_to_base_count==2.
  The assert m_switched_to_base_count < 2 won't be correct.

  For now it does not test these pairs:
     m_a + m_b.base(), i.e. VARCHAR/JSON + TEXT
     m_a.base() + m_b, i.e. VARCHAR + TEXT/JSON
  I agree, we can start testing these pairs when
  we have a real more complex case.

  I believe the code is fine for now.

+
+            A0 A1 A2 A3
+         B0  .  .  .  .
+         B1  .  .  .  .
+         B2  .  .  .  .
+      */
+      return false;
+
+    case 1:
+      /*
+         Expression:
+           COALESCE(TYPE1, TEXT)
+
+         If only one handler derives from some other handler:
+           VARCHAR-> TYPE1
+
+         Continue aggregation as:
+           SELECT COALESCE(VARCHAR, TEXT)
+      */
+      return false;
+
+    case 0:
+    default:
+      /*
+        Non of the two handlers are derived from other handlers.
+        Nothing left to try.
+      */
+      return true;
+    }
+  }
+};

diff --git a/sql/sql_type_json.h b/sql/sql_type_json.h
index 6c4ee8cb2eb..4a394809a06 100644
--- a/sql/sql_type_json.h
+++ b/sql/sql_type_json.h
@@ -21,18 +21,145 @@

...

+  bool Item_append_extended_type_info(Send_field_extended_metadata *to,
+                                      const Item *item) const
+  {
+    return set_format_name(to); // Send "format=json" in the protocol
+  }
+
+  bool Item_hybrid_func_fix_attributes(THD *thd,
+                                       const char *name,
+                                       Type_handler_hybrid_field_type *hybrid,
+                                       Type_all_attributes *attr,
+                                       Item **items, uint nitems)
+                                       const override
+  {
+    if (BASE::Item_hybrid_func_fix_attributes(thd, name, hybrid, attr,
+                                              items, nitems))
+      return true;
+    /*
+      The above call can change the type handler on "hybrid", e.g.
+      choose a proper BLOB type handler according to the calculated max_length.
+      Convert general purpose string type handler to its JSON counterpart.
+      This makes hybrid functions preserve JSON data types, e.g.:
+        COALESCE(json_expr1, json_expr2) -> JSON
+    */
+    hybrid->set_handler(json_type_handler_from_generic(hybrid->type_handler()));


I don't get it. It's supposed to "fix attributes", like signed/unsigned,
max length, etc. The correct type handler should've been set already,
otherwise this type handler method wouldn't even be called.



That true, it looks confusing. I also would prefer it did not change the

data type after fixing attributes. But unfortunately, the handler canchange between BLOB variants in the end.


For "normal" BLOBs it works as follows

- First, the upper level code detects that the result is going to be
  some BLOB. But it does not know yet the exact BLOB variant,
  because it does not know max_length.
  So it chooses the largest one - Type_handler_long_blob.

- Then Type_handler_blob_common::Item_hybrid_func_fix_attributes
  is called, which calculates max_length, and then switches to a
  proper BLOB variant in the end, the smallest possible:


bool Type_handler_blob_common::
       Item_hybrid_func_fix_attributes(THD *thd,
                                       const LEX_CSTRING &func_name,

Type_handler_hybrid_field_type*handler,

                                       Type_all_attributes *func,
                                       Item **items, uint nitems) const
{
  if (func->aggregate_attributes_string(func_name, items, nitems))
    return true;
  handler->set_handler(blob_type_handler(func->max_length));
  return false;
}


For JSON I just call the inherited Item_hybrid_func_fix_attributes,
and then translate the optionally changed "normal" BLOB variant to
the corresponding BLOB/JSON variant.


Possibly, we'll eventually get rid of separate four handlers:

- Type_handler_tiny_blob
- Type_handler_blob
- Type_handler_medium_blob
- Type_handler_long_blob

and have just one handing all BLOB variants
(and just one handling all BLOB/JSON variants).

I seems it can be less confusing and more natural.

+    return false;
+  }
  };

+

+class Type_handler_string_json:
+  public Type_handler_general_purpose_string_to_json<Type_handler_string,
+                                                     type_handler_string>
+{ };
diff --git a/sql/sql_type_json.cc b/sql/sql_type_json.cc
index a804366ec03..a84f720bcc9 100644
--- a/sql/sql_type_json.cc
+++ b/sql/sql_type_json.cc
@@ -20,17 +20,111 @@
  #include "sql_class.h"

-Type_handler_json_longtext type_handler_json_longtext;

+Named_type_handler<Type_handler_string_json>
+  type_handler_string_json("char/json");


do you mean these names are not shown anywhere?


They are shown in error messages, e.g.:

Illegal parameter data types inet6 and longblob/json for operation '+'

+Named_type_handler<Type_handler_varchar_json>
+  type_handler_varchar_json("varchar/json");
+
+Named_type_handler<Type_handler_tiny_blob_json>
+  type_handler_tiny_blob_json("tinyblob/json");
+
+Named_type_handler<Type_handler_blob_json>
+  type_handler_blob_json("blob/json");
+
+Named_type_handler<Type_handler_medium_blob_json>
+  type_handler_medium_blob_json("mediumblob/json");
+
+Named_type_handler<Type_handler_long_blob_json>
+  type_handler_long_blob_json("longblob/json");

...

+/*
+  This method ressembles what Field_blob::type_handler()


resembles
(also below)


Thanks. Will fix.

+  does for general purpose BLOB type handlers.
+*/
+const Type_handler *
+Type_handler_json_common::json_blob_type_handler_by_length_bytes(uint len)
+{
+  switch (len) {
+  case 1: return &type_handler_tiny_blob_json;
+  case 2: return &type_handler_blob_json;
+  case 3: return &type_handler_medium_blob_json;
+  }
+  return &type_handler_long_blob_json;
+}


Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx

Follow ups

Re: 734aee3cd3c: MDEV-27018 IF and COALESCE lose "json" property
From: Sergei Golubchik, 2022-01-03

References

Re: 734aee3cd3c: MDEV-27018 IF and COALESCE lose "json" property
From: Sergei Golubchik, 2021-12-28