maria-developers team mailing list archive

Re: GSOC21: MDEV-16375 & MDEV-23143

Hi Songlin!

It's great that you are excited about this project! Here are my thoughts on
your proposal and what I think you should focus on:

JSON_NORMALIZE seems simple at first, but I believe there are a lot of
corner cases. To get a proper specification for this function, can you
look at other databases to see whether they implement something similar?
Also have a look at the pandas Python library: can you learn from their
experience?

Normalizing JSON can have some tricky cases, such as:
a. How are arrays sorted if the values inside them are a mix of objects,
arrays, literals, and numbers?
b. How do you define a sorting criterion between two JSON objects in this
case?
c. JSON is represented as text, yet it can store floating-point values.
How do you plan to compare doubles, and how would those values be sorted?
For example: 1 vs 1.0 vs 1.00, or 1000 vs 1e3?
d. What's the priority of null values: are they first or last?
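
To make points a to d concrete, here is a minimal Python sketch of one
possible total order over mixed JSON values. Everything in it is an
assumption for discussion, not a proposed specification: the type ranking
(null first, then booleans, numbers, strings, arrays, objects), the names
TYPE_RANK, parse and sort_key, and the choice to compare numbers by value
through Decimal so that 1, 1.0 and 1.00 are treated as equal.

```python
import json
from decimal import Decimal

# Hypothetical ranking between JSON types; null sorts first here.
TYPE_RANK = {type(None): 0, bool: 1, Decimal: 2, str: 3, list: 4, dict: 5}


def parse(text):
    """Parse JSON text, keeping every number as an exact Decimal."""
    return json.loads(text, parse_float=Decimal, parse_int=Decimal)


def sort_key(value):
    """Return a tuple that totally orders any value produced by parse()."""
    rank = TYPE_RANK[type(value)]
    if value is None:
        return (rank,)
    if isinstance(value, list):
        # Arrays compare element by element, using the same ordering.
        return (rank, [sort_key(v) for v in value])
    if isinstance(value, dict):
        # Objects compare by their key-sorted (key, value) pairs.
        pairs = [(k, sort_key(v)) for k, v in value.items()]
        return (rank, sorted(pairs))
    # Booleans, numbers and strings compare by value within their own type.
    return (rank, value)


mixed = parse('[{"a": 1}, [2], "x", 1e3, null, true, 1.0]')
ordered = sorted(mixed, key=sort_key)
```

With this ranking, ordered starts with null and ends with the object, and
sort_key gives the same result for 1000 and 1e3. A different ranking, or
comparing numbers as text, would be just as easy to express, which is why
the proposal should state the rules explicitly.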

The way we should handle this project is via TDD (Test Driven Development).
You would first write your test cases, covering as many corner cases as
possible, then implement the code such that it passes all the tests.

I suggest you add to your proposal some examples of how you define
JSON_NORMALIZE and JSON_EQUALS to behave, so that we can see you have
thought about points a, b, c, d from above.

As for JSON_EQUALS, assuming JSON_NORMALIZE is done correctly, it may work
as a simple strcmp between two normalized JSON documents, but I am not 100%
confident at this point; you would have to prove it :)
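
As a rough illustration of that idea, the Python sketch below normalizes
two documents and compares the resulting strings. It is not MariaDB code,
and the helper names normalize, _emit and json_equals are made up for this
example. It assumes that object keys are sorted, whitespace is removed,
numbers are reduced to one canonical spelling via Decimal, and arrays keep
their original order (whether arrays should be sorted is still an open
question).

```python
import json
from decimal import Decimal


def _emit(value):
    """Serialize a parsed JSON value into a canonical, whitespace-free string."""
    if isinstance(value, dict):
        # Keys are unique, so sorting by key alone is deterministic.
        items = sorted(value.items(), key=lambda kv: kv[0])
        return "{" + ",".join(json.dumps(k) + ":" + _emit(v) for k, v in items) + "}"
    if isinstance(value, list):
        # Assumption: array order is significant, so it is preserved.
        return "[" + ",".join(_emit(v) for v in value) + "]"
    if isinstance(value, Decimal):
        # normalize() strips trailing zeros, so 1.0 and 1.00 both become 1.
        # It rounds to the Decimal context precision (28 digits by default).
        return str(value.normalize())
    # Strings, true, false and null use the standard JSON spelling.
    return json.dumps(value)


def normalize(text):
    """Hypothetical stand-in for JSON_NORMALIZE."""
    return _emit(json.loads(text, parse_float=Decimal, parse_int=Decimal))


def json_equals(a, b):
    """Hypothetical stand-in for JSON_EQUALS: compare the normalized strings."""
    return normalize(a) == normalize(b)


print(normalize('{"b": 1.0, "a": [1000, 1e3]}'))  # prints {"a":[1E+3,1E+3],"b":1}
print(json_equals('{"x": 1, "y": null}', ' {"y":null,"x":1.00} '))  # prints True
```

The string comparison is only as good as the normalization: if 1000 and 1e3
were emitted with different spellings, two equal documents would compare as
different. That is the part that needs test cases and a proof.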

Vicențiu

On Tue, 30 Mar 2021 at 09:00, Hollow Man <hollowman@xxxxxxxxxxxx> wrote:

> Hi community!
>
> I've shared my proposal at
> https://drive.google.com/file/d/1sv0qbqt9W-ob3GqxygWwRGurpRS1lCiv/view
> and hope to get some feedback from the community.
>
> Songlin
>
> ------------------ Original ------------------
> *From: * "Hollow Man"<hollowman@xxxxxxxxxxxx>;
> *Date: * Thu, Mar 11, 2021 00:17 AM
> *To: * "maria-developers"<maria-developers@xxxxxxxxxxxxxxxxxxx>;
> *Subject: * GSOC21: MDEV-16375 & MDEV-23143
>
> Hi MariaDB community!
>
>    Glad to be here! My GitHub account is @HollowMan6. Though I'm new to
> the MariaDB community, I'm interested in MDEV-16375 & MDEV-23143 (a
> function to normalize a JSON value, and the missing JSON_EQUALS function)
> as this year's GSOC project. Here are my first thoughts on these issues:
>
>    I have checked part of the codebase and I think the two issues can be
> merged into one. First we can create a function named JSON_NORMALIZE to
> normalize the JSON, which parses the input JSON document, recursively
> sorts the keys (for objects) / sorts the numbers (for arrays), removes
> the spaces, and then returns the JSON document string.
>
>    Then we create a function named JSON_EQUALS, which compares two JSON
> documents for equality by first normalizing each of them separately with
> JSON_NORMALIZE and then comparing the two results exactly, as binary
> strings.
>
>    I have taken some inspiration from Item_func_json_keys and
> json_scan_start for parsing JSON documents, and I think it's possible to
> sort the keys of objects using std::map from the STL.
>
>    That's all for my ideas so far. Please correct me if I have made any
> mistakes. I'm going to keep working on these ideas.
>
> Cheers!
>
> Hollow Man