← Back to team overview

maria-developers team mailing list archive

Re: [Spatial] On current implementation approach

 

On 24 September 2013 04:17, Roberto Spadim <roberto@xxxxxxxxxxxxx> wrote:
> 2013/9/23 Mateusz Loskot <mateusz@xxxxxxxxxx>
>> On 23 September 2013 22:10, Alexey Botchkov <holyfoot@xxxxxxxxxxxx> wrote:
>> >>
>> >> 1. Is it possible to implement MariaDB extensions like Spatial (custom
>> >> type + set of functions) without such a tight coupling with the
>> >> internal implementation of the type system (without messing Field
>> >> class with geometry types directly, etc.)?
>> >
>> >
>> > Yes, it is possible. The core algorithms are separated from the Field
>> > structure and any other database internals.
>> > They are placed in sql/gcalc_slicescan.cc and sql/gcalc_tools.cc files.
>>
>> Yes, but my question is not really about location of computational
>> geometry
>> bits, but about the data management: SQL data type for geometry objects,
>> input/output routines.
>>
>> Due to my lack of experience with MariaDB/MySQL UDF, I simply assumed that
>> if:
>> 1. Field is the only place that defines GEOMETRY type (and there is no
>> CREATE TYPE support)
>
>
> create type probably will be a 10.1 feature:
> https://mariadb.atlassian.net/browse/MDEV-4912
> and maybe you will not have a spatial key optimization in the first version
> of this feature

This sounds very promising.

> in my opnion if you start a new udf today with gis, you should use the WKB +
> a second lib (geos is very good) to handle spatial data
> geos can use the WKB with a fast "unserialize": GEOSGeomFromWKB_buf


No, I don't want to go through WKB (Fast? Perhaps, not fast enough, for me :-)),
that is the whole point.
This is waste of time and will never perform as good as going through
structured type straight away.

(FYI, I've been part of PostGIS dev team, I've been also member of
GEOS dev team,
so it's fair to say I'm a bit experienced with spatial database
extensions, WKB, coding, encoding).

>> 2. UDF prototypes will use of GEOMETRY in their prototypes to declare
>> input/output parameters
>> then I couldn't understand how it is possible to remove geometry
>> definitions from Field
>> and other internal definitions.
>>
>> But, I've just found this project [1] with extra spatial UDFs, so I
>> think I understand the UDF
>> protocol regarding I/O arguments would not require explicit GEOMETRY type
>
> yes, you don't have a GEOMTRY_TYPE for arg_type[] at udf
> check your example at your mysql-spatial-udf git project:
> [...]

Yes, as I mentioned, I have read through the code of that project, so
now it's clear to me
what it currently takes to pass geometry into/form spatial functions
in MariaDB/MySQL.

>> making it possible to move Spatial Extensions completely out of
>> built-ins (trunk/sql/ files).
>>
>> [1] https://github.com/krandalf75/MySQL-Spatial-UDF
>
>
>
> mariadb 10.0 have plans about OPENGIS:
> https://mariadb.com/kb/en/plans-for-10x/#opengis-compliance

Thanks for this pointer, I'll be watching it.

> but i didn't found JIRA report about it, or another worklog or something
> similar (must check if it's in lauchpad bug track or another lauchpad
> branch)

I tried to find something on that topic myself too.

> and i don't know if mariadb will use GEOS... but from what i know, geos is
> the best opengis lib today, why not use it at mariadb?! =)

There are alternatives.

>> >> 2. Is it possible to implement Spatial using User-Defined Functions
>> >> (UDF) defined in shared binary?
>> >
>> >
>> > The spatial functions/operations can be implemented with UDF, but
>> > that makes query optimization and using Spatial keys problemmatic.
>>
>> So, for real use case, the idea I brainstormed above would not make sense.
>> Unless, there is workaround for those problems you mean.
>
>
> well i don't know what problemmatic means at high/low level, but i think
> it's something like this at sql layer:
>
> WHERE udf_function(x)
> in theory this udf_function() could be optimized with rtree index Y...
> but it will do a table scan...

Yes, of course, that's the sensible approach.

> optimizer don't know how to use index with udf functions yet :(

Ok, I think I understand it now.

> a workaround about index should be done at application side, could be
> something like:
> WHERE udf_function(x) and other_builtin_function_that_use_index(xxxx)
> with this "other_builtin_function_that_use_index" function (envelop funciont
> for example) , you could use the spatial index and optimize the query... but
> it's not the "best" solution at server side, but the only i can think as a
> udf developer :)
> [...]

Right.

>> >> 3. What is the reason behind using Well-Known-Binary (WKB) stream of
>> >> bytes to transport geometry values into/from functions? Is it due to
>> >> limitations of MariaDB type system where String is  the only universal
>> >> carrier for complex data? This concern is related to necessity of
>> >> encoding/decoding WKB when chaining spatial function calls, and
>> >> possibilities to avoid it.
>> >
>> >
>> > The reason was mostly historical. It was sufficient for the first
>> > implementations of the Geometry field types and somewhat convenient as
>> > we don't need to perform conversions
>> > when we need to import/export features in their WKB representation.
>> > But yes, that format is inefficient and difficult to handle properly. I
>> > plan to get rid of it internally - only support importing-exporting it.
>
>
> hum... what could be a better format? i think that at disk level (storage)
> WKB is a small format, not?
> [...]

It's not about size, but WKB may lack of some metadata required (i.e. SRID)
That's why in PostGIS we have, Extended WKB used internally

http://postgis.net/docs/manual-2.0/using_postgis_dbmanagement.html#EWKB_EWKT

>> IOW, there are only 3 types available (integer, real, string), so
>> String is the only one
>>
>> usable to pass geometry objects around, regardless of actual encoding
>> format,
>> WKB, WKT, any other binary stream...
>
>
> when using GEOMETRIC data type at sql functions, it's always ~WKB
> from your example at git, internally WKB start at position 4, and SRID is
> the first 4 bytes, check:
> [...]

Then, this is not WKB format (AFAIU OGC 06-103r3 specification).
There is no SRID embedded in OGC WKB and throughout the entire discussion,
I have assumed WKB means OGC WKB.

Certainly, using custom extended form of OGC WKB solves many problems
(see PostGIS manual referred earlier).

>> It means, that if I want to pass geometry to my_foo UDF:
>>
>> MSUDF_API char*
>> my_foo(UDF_INIT *initid,UDF_ARGS *args, char *buf, unsigned long
>> *length, char *is_null, char *error);
>>
>> the only option available is to make geometry into a kind of stream of
>> bytes
>> and passed as one of args item.
>> So, a kind of serialising/deserialising is in fact unavoidable.
>
>
> unserialize/serialize only if you use different string format than
> mariadb/mysql source code =) like geos...

Not really, encoding/decoding is necessary whenever you need
to access structured geometry objects, means, for every geometry algorithm.

>> Is my understanding correct?
>
> well i think yes :)
>
> did you checked that dev.mysql.com have a poll about
> "What are the top 3 GIS related features that you are most interested in for
> your new and existing MySQL projects?"

http://dev.mysql.com/tech-resources/quickpolls/index.php?pollname=top3-gis-features

Interesting indeed.


Best regards,
-- 
Mateusz  Loskot, http://mateusz.loskot.net
"Participation in this whole process is a form of torture" ~~ Szalony


Follow ups

References