maria-developers team mailing list archive

Thread
Date
Review for: MDEV-27036: Add asserts to detect duplicate JSON keys

To: Sergei Krivonos <sergei.krivonos@xxxxxxxxxxx>
From: Sergey Petrunia <sergey@xxxxxxxxxxx>
Date: Wed, 24 Nov 2021 18:44:28 +0300
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.9.4 (2018-02-28)
Hi Sergei,

Please find my input below.

> diff --git a/sql/sql_select.cc b/sql/sql_select.cc
> index 0dfe95e81b0..fa07608177a 100644
> --- a/sql/sql_select.cc
> +++ b/sql/sql_select.cc
> @@ -7910,28 +7910,27 @@ best_access_path(JOIN      *join,
>                  can make an adjustment is a special case of the criteria used
>                  in ReuseRangeEstimateForRef-3.
>                */
> -              if (table->opt_range_keys.is_set(key) &&
> +              auto use_range_estimates= table->opt_range_keys.is_set(key) &&
>                    (const_part &

What is the reason for this change? Please restore the old code.

>                     (((key_part_map)1 << table->opt_range[key].key_parts)-1)) ==
>                    (((key_part_map)1 << table->opt_range[key].key_parts)-1) &&
>                    table->opt_range[key].ranges == 1 &&
> -                  records > (double) table->opt_range[key].rows)
> +                  records > (double) table->opt_range[key].rows;
> +              if (use_range_estimates)
>                {
>                  records= (double) table->opt_range[key].rows;
>                  trace_access_idx.add("used_range_estimates", "clipped down");
>                }
>                else
>                {
> +                trace_access_idx.add("used_range_estimates",false);
>                  if (table->opt_range_keys.is_set(key))
>                  {
> -                  trace_access_idx.add("used_range_estimates",false)
> -                                  .add("cause",
> -                                       "not better than ref estimates");
> +                  cause= "not better than ref estimates";

As we have already discussed:  Do not assign this to "cause". To avoid
duplicate names, change the name above from "cause" to e.g. "reason".

>                  }
>                  else
>                  {
> -                  trace_access_idx.add("used_range_estimates", false)
> -                                  .add("cause", "not available");
> +                  cause= "not available";
>                  }

The same as above.

>                }
>              }
> @@ -8217,9 +8215,10 @@ best_access_path(JOIN      *join,
>      best_uses_jbuf= TRUE;
>      best_filter= 0;
>      best_type= JT_HASH;
> +    Json_writer_object trace_access_hash(thd);
>      trace_access_hash.add("type", "hash");
>      trace_access_hash.add("index", "hj-key");
> -    trace_access_hash.add("cost", rnd_records);
> +    trace_access_hash.add("est_records_cost", rnd_records);

Please use "rnd_records" instead of "est_records_cost".

>    trace_access_hash.add("cost", best);
>    trace_access_hash.add("chosen", true);
>  }

> commit 0e8b82dbcbd8ae96e0758114c87e48061d603c5f
> Author: Sergei Krivonos <sergei.krivonos@xxxxxxxxxxx>
> Date:   Fri Nov 12 03:36:10 2021 +0200
>
>     MDEV-27036: add assert to detect duplicated JSON keys
>
> diff --git a/sql/my_json_writer.cc b/sql/my_json_writer.cc
> index 9470ba57855..501f0b72318 100644
> --- a/sql/my_json_writer.cc
> +++ b/sql/my_json_writer.cc
> @@ -39,6 +41,9 @@ inline void Json_writer::on_start_object()
>  #if !defined(NDEBUG) || defined(JSON_WRITER_UNIT_TEST)
>    if(!fmt_helper.is_making_writer_calls())
>    {
> +    if(got_name != named_item_expected()){
> +      std::cout <<"got_name["<<got_name<<"] != named_item_expected["<<named_item_expected()<<']'<<std::endl;
> +    }

First: as I've already mentioned, please don't write anything to to stderr or
stdout directly. Use the sql_print_error() function.

Second: the message is not informative. The above will print something like:

got_name[1] != named_item_expected[0]

which is not at all informative. What's "got_name[1]" ? 

Please change the printout to be something like:

Json_writer: attempt to write invalid JSON. Last data in JSON: %s

here %s should print last 256 chars in the JSON document.


>      VALIDITY_ASSERT(got_name == named_item_expected());
>      named_items_expectation.push_back(true);
>    }

> @@ -140,7 +150,19 @@ Json_writer& Json_writer::add_member(const char *name, size_t len)
>    }
>  #if !defined(NDEBUG) || defined(JSON_WRITER_UNIT_TEST)
>    if (!fmt_helper.is_making_writer_calls())
> +  {
> +    VALIDITY_ASSERT(!got_name);
>      got_name= true;
> +    VALIDITY_ASSERT(named_items.size());
> +    auto& named_items_keys= named_items.top();
> +    auto emplaced= named_items_keys.emplace(name, len);
> +    auto is_uniq_key= emplaced.second;
> +    if(!is_uniq_key)
> +    {
> +      std::cerr << "Duplicated key: " << *emplaced.first << std::endl;
> +      VALIDITY_ASSERT(is_uniq_key);

Same input as above: use sql_print_error and make the error message helpful.

> +    }
> +  }
>  #endif
>    return *this;
>  }


> commit 57e8f68955954ff96104c8356733bfa73feae849 (HEAD -> bb-10.8-MDEV-27036, origin/bb-10.8-MDEV-27036)
> Author: Sergei Krivonos <sergei.krivonos@xxxxxxxxxxx>
> Date:   Mon Nov 15 05:57:25 2021 +0200
>
>     MDEV-27036: unittest JSON object member name collision
>
> diff --git a/unittest/sql/my_json_writer-t.cc b/unittest/sql/my_json_writer-t.cc
> index 6398a589c33..7d20802527d 100644
> --- a/unittest/sql/my_json_writer-t.cc
> +++ b/unittest/sql/my_json_writer-t.cc
> @@ -119,7 +119,14 @@ int main(int args, char **argv)
>      ok(w.invalid_json, "JSON array end of object");
>    }
>  
> -
> +  {
> +    Json_writer w;
> +    w.start_object();
> +    w.add_member("name").add_ll(1);
> +    w.add_member("name").add_ll(2);
> +    w.end_object();
> +    ok(w.invalid_json, "JSON object member name collision");
> +  }

Please add more unit test coverage. 
  
  Check that this is accepted:

   {
      "foo": { 
         "foo": {}
      }
    }
  
  Check that this is not accepted:

   {
      "foo": { "bar": {} },  
      "foo": "something else"
   }

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net