← Back to team overview

maria-discuss team mailing list archive

Re: MariaDB Server 10.3 notes

 

On 13.10.2016 17:26, Andrew Hutchings wrote:

Whilst I agree it would be a nice feature and something I've heard for years, the implementation would likely be more complex than it seems.

For starters you have to consider the possibility of a procedure crashing and taking the whole daemon with it. You would likely need to fork a small worker process pool and have some kind of shared memory or socket communications for safety. In addition the implementation would have to be extremely careful not to add any potential security hole due to a zero-day in PHP or some bad input filtering for example.

I'm not saying it is impossible, but it will likely be a lot of work to get right and the APIs would need to be carefully thought out.

So the question becomes: is it worth spending time developing this over another feature? Or is it something that could be better implemented safely in another layer, such as in a database proxy?

Kind Regards
Andrew

I think crashing is much easier if one runs native C UDFs, or any kind of natively compiled plugins.

For external languages I think an important aspect is often overlooked. Namely, how to would access data. You can create a new JDBC connection, but it is likely not what you want to do. You just want to run a SELECT from the same session you're in, just like you do in SQL stored procedures. Exactly this requires a large amount of work in the connectors. It is not just running a JVM inside the server. There is no API for data access from the running session inside the server, not even for C.


On 13/10/16 07:04, Federico Razzoli wrote:
Hi all,

So basically everyone would love, love, love to have external languages for stored procedures, but no one is working on it... so bad. Please consider something: 1- Some features could be implemented as stored procedures, it's much easier. This has been done in the past (Flexviews, Securich...) but SQL is too limited. 2- I am sure that a lot of people would implement procedures libraries if they could use something like JavaScript or PHP. If we could use Python, stuff like NumPy and SciPy could be used. 3- SQL limitations could be lifted (namespaces for global objects, arrays, argv, cursors based on a prepared statement...) but will you ever do it? Probably not. But if we have external languages, who cares about procedural SQL limitations.

I believe that next releases features are selected based on their cost (not only that of course). But please consider points 2 and 3, and try to estimate the cost of features that the community can develop for you, plus the cost of features that won't be really needed anymore if external languages are available (arrays in SQL).

Cheers
Federico




--------------------------------------------
Mer 12/10/16, Sergei Golubchik <serg@xxxxxxxxxxx> ha scritto:

 Oggetto: Re: [Maria-discuss] MariaDB Server 10.3 notes
 A: "Justin Swanhart" <greenlion@xxxxxxxxx>
 Cc: "Maria Discuss" <maria-discuss@xxxxxxxxxxxxxxxxxxx>
 Data: Mercoledì 12 ottobre 2016, 14:15

 Hi, Justin!

 Very good questions,
 thanks!
 Some answers below:

 On Oct 12, Justin Swanhart
 wrote:
 >
 > > *
 InnoDB: InnoDB native partitioning - so MySQL 8 InnoDB? But
 Monty
 > > says there's next to no
 changes in InnoDB 8...  Instant add column.
 > > New InnoDB deadlock detection (8.0).
 New INFORMATION_SCHEMA table
 > >
 (8.0). Dedicated tablespace for temporary tables (in 5.7 and
 merged,
 > > check). Lock wait policy
 (contribution)
 >
 >
 Monty is *notorious* for low balling estimates.  His famous
 phrase is "it
 > is trivial".
 Everybody knows that if Monty says it is trivial, you can
 add
 > 10x the work to get it done, or
 more.

 Yes, I tend to agree
 with that. But Monty estimates are not that far off
 when applied to *Monty*. They're often too
 low when applied to others.

 https://www.google.de/search?q=10x+developer

 > includes transportable
 tablespaces for partitioned tables, and will likely
 > support native partitioning by the time 8
 rolls.  Native partitioning also
 >
 entails implementing the changes to the SE.  See:
 > http://mysqlserverteam.com/innodb-native-partitioning-early-access/

 InnoDB native partitioning is
 in 5.7.6:
 https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-6.html

 Is it better than upper-layer
 partitioning? Why?

 >
 > * more for window functions (user defined window
 functions, MDEV-10855)
 > What other
 databases have user definable window functions?  Will the
 syntax
 > be similar?  I can't really
 think of a reason to need custom window
 >
 functions personally, as the existing set is very
 comprehensive.  What kind
 > of custom
 window function do you think people would need to write?
 In most
 > implementations any aggregate
 functions can be used in a window frame
 >
 context, thus any aggregate SQL functions should satisfy
 requirements for
 > user defined window
 funcs.

 Yes, that was a
 confusing description.
 What it really means
 - we will have aggregate SQL functions (MDEV-7773)
 and preferrably, it should be possible to make
 them window-aware.

 Using an
 aggregate function as a window function, means applying it
 to
 every possible position of a sliding
 frame, and it has O(N*n) complexity
 (where N
 is total number of rows, n is the number of rows in the
 frame).
 If the aggregate function can remove
 rows from a group, it will only be O(N).

 > > * socket authentication
 > Can you explain this.  Is there an
 MDEV?

 That's using
 unix_socket authentication by default for the root user.
 It improves security and maintainability, but
 is not really new - Debian
 is already doing
 that for months. This change could be confusing and
 break exising user scripts, so it needs to be
 done with care.

 > > *
 X Protocol/Document store - only if people have time or
 money will
 > > it be done
 > If you want people to pay for it, perhaps
 you should implement it in
 > MaxScale
 instead of the server :)

 Interesting idea :)

 But the MariaDB Meetup is a community event
 organized by the MariaDB
 Foundation. This
 particular session was about MariaDB Server planning.
 MariaDB Foundation has nothing to do with
 MaxScale, so we could not have
 planned
 anything for it.

 > >*
 Peter asks if there is any plans to support other languages
 like V8?
 > PLEASE PLEASE PLEASE PLEASE
 IMPLEMENT WL-820.  External stored procs and
 > table functions.  Been sitting there for
 the taking for a long time.
 > Antony has
 tried to get you to get it into the server, but alas, it
 has
 > never happened, and though there
 continues to be wide requests for this
 >
 feature from the community, they fall on deaf ears.

 Frankly speaking, I'd love
 it. But this feature never got enough
 priority, not in MySQL times nor in MariaDB.
 Unfortunately.

 Perhaps
 we'll be able to sneak it into 10.3, but no promises
 here.

 > > *
 Compressed binary log (from Tencent)
 >
 Compressing the binary log saves on space on the master, but
 it makes
 > seeking into binary logs much
 more difficult, and searching backwards
 >
 through them becomes much more difficult (which has
 implications for query
 >
 'rewind')

 Tencent
 compresses individual events (think
 Compressed_query_log_event
 type). So seeking
 works as before, events are not decompressed on
 reading, they are sent compressed to slaves,
 etc. But the compression
 ratio is worse, of
 course.

 > > * Fix the
 XA transaction bug ( MySQL has fixed it already ) -
 MDEV-7974
 > Please actually complete XA
 support, don't just half fix it like Oracle
 > did.  Add full support for XA SUSPEND and
 XA RESUME and allow more than one
 >
 thread to participate in a distributed transaction in the
 server.

 Right. Still,
 MDEV-7974 is an important step, we cannot have proper XA
 without it. XA SUSPEND and XA RESUME should be
 the next one, I agree.

 >
 > * Indexes on expressions (this is part of virtual
 columns, will it not
 > > go into 10.2?
 Check with Serg)
 > Indexes on expressions
 requires parser support.
 > create index
 expr_idx on some_table(a + b);
 > select *
 from some_table where a+b > 30 and a+b <= 50 -- (uses
 range over
 > expr_idx)

 Right, but the parser support
 is the least of my worries. I don't want
 to low ball estmates :) but the *parser* can be
 fixed in a few hours.
 The most tricky part
 will be to fix the optimizer.

 > > * Flashback DDL MDEV-10571 (flashback
 DML will come in 10.2). It only
 > >
 works with row based replication. Talk about what to name
 it. There's
 > > already a MySQL
 time machine on github. Many like the name Rewind (but
 > > not Monty). Let's do a poll on
 the mailing list
 > Okay, Flashback query
 is VERY complicated. A flashback query is a
 > materialized view.  There are two ways
 generally to achieve flashback:
 > a)
 materialize the query as is, and instead of rolling the
 query forward
 > incrementally, you roll
 it backwards.  Flexviews achieves this by computing
 > the query as it is now, then computing the
 delta from a prior point in time
 > until
 the transaction in which that computation happened. Then the
 delta is
 > played back against the query,
 but the MONUS of each operation is applied
 > which changes insertions to deletions and
 vice versa.  This still presents
 > a
 problem for OUTER JOIN.  No documented asynchronous refresh
 algorithms
 > exist for outer join that
 I'm aware of, that would work with the concept of
 > reading row history from serialized
 changes.  Flexviews uses the "rolling
 > join propagation" algorithm, which
 only works with inner joins.

 Interesting... I'll continue reading on
 that, thanks.
 But see below

 > b) provide a
 point-in-time snapshot of every table used in the query at
 the
 > desired point in time, and run the
 query over those tables (this is a
 >
 synchronous mechanism which supports OUTER JOIN).  The
 problem here is how
 > to provide such a
 table.  The most straight-forward way is to copy the
 > table, and then do the backwards replay
 for each (ideally in parallel) but
 > this
 is obviously undesirable if the query is "select * from
 some_big_table
 > join another_big_table
 on (...)" because you have to fully copy each table
 > before you can undo changes.  Otherwise
 you need to have the SE display old
 >
 versions, just like it does for MVCC, but it has to display
 versions from
 > binary logs, not undo
 logs, and this requires a lot of SE and engine
 > changes!

 This is about correct. There are different
 applications for "flashback"
 with
 different use cases and different implementation
 trade-offs.

 One is, for
 example, to see historical sales numbers, for different
 months or years. That is, basically, for some
 kind of data analytics.
 Lots of SELECT
 queries, ad-hoc queries, at different historical time
 points.

 Another one is "damn, I've made a typo
 in a WHERE clause and updated too
 much". In this case one doesn't need
 joins or materialized views, one
 needs to
 see the data before the erroneous statement. This is not
 used
 often.

 For the second use case one can afford to copy
 tables and
 backward-replay the binary log.
 For the first use case one would need a
 completely different approach, I agree. May be
 something like what
 Flexviews does.

 > If you are going to have
 generic flashback, you might as well commit to
 > incrementally refreshable materialized
 views.

 Right, when
 we'll have materialized views, than we could think
 about incrementally updating them using rolling
 join propagation.

 > >
 * Additional GIS functions to stay compatible. Also it would
 be good
 > > to have a standalone GIS
 library (Georg suggests; wlad isn't too happy
 > > with the suggestion). Georg suggests
 that calculations should use the
 > >
 reference systems (Unflatten the world - MariaDB Server GIS
 the world
 > > looks flat)
 >
 > GIS functions should
 use gdal, just like postgresql does.  Then you can
 > also add support for rasters.  Here is an
 example of the awesomeness of
 >
 postgresql and postgis.  It uses SQL aggregate functions,
 table functions,
 > CTE, GIS raster
 functions, sequences, etc:
> https://github.com/greenlion/osmvox/blob/master/postgresql/combined_schema.sql

 May be. On the other hand,
 we're more precise that postgis, because our
 implementation uses fixed-point math, not
 doubles.
 And let's not forget that MySQL
 uses Boost::Geometry.

 >
 > * Query rewriting - MDEV-5561
 > I
 would like you to provide a SQL->DOM function call
 instead of just
 > providing a DOM to
 plugins.  This function could be exposed as a regular
 > item function as well so that anything in
 the server can parse SQL. I
 > suggest you
 implement my SQL "shim" interface and just provide
 some way to
 > get a easy to iterate over
 parsed data structure from the SQL, such as a
 > nested JSON array of objects.  A nested
 array is generated by
 > PHP-SQL-Parser
 and would provide a good template for such a JSON object.
 > https://github.com/greenlion/PHP-SQL-Parser

 We actually have an MDEV for
 that :)

 > > * With
 MyRocks coming, should we drop TokuDB (and maybe even
 deprecate
 > > in 10.2?) - bugs that
 MariaDB Corporation reports to Percona don't
 > > seem to get fixed. Peter says that
 bugs are fixed for customers... and
 >
 > there is ongoing development to make it better
 > It is certainly disheartening that Percona
 isn't responsive to MariaDB
 > bugs,
 but I'm sure you understand that it is hard for a
 competitor to fix
 > bugs for another
 competitor.  MariaDB maintains a forked version of
 > TokuDB.  It isn't fair to expect the
 upstream vendor to fix bugs that your
 >
 customers are paying you to support, does it?  Perhaps you
 should pay a
 > percentage of your support
 fees for TokuDB issues to Percona, or come to
 > some other support agreement.  Perhaps
 YOU should make the bug fixes and
 >
 submit them to Percona.

 And
 we do, look for bug reports I've reported on TokuDB to
 Percona -
 with patches :)

 But, really, Percona *is*
 fixing TokuDB bugs, it's just that they did a
 bit of refactoring after getting TokuDB and it
 took time for it to
 stabilize. And MariaDB
 Server has a vanilla TokuDB with almost no
 changes, we have no plans to fork it.

 > > * ORDER BY LIMIT
 optimizer bugs (MDEV-8306)
 > Oh god, this
 is a downward spiral every time somebody touches it.
 > Fixing it correctly requires rewriting the
 parser, something that
 > Oracle is
 undertaking.

 This
 didn't have much to do with the parser, and we've
 refactored that
 part of the parser in 10.2,
 so now this has nothing to do with the
 parser, it's purely the optimizer issue.

 > What about other new
 MySQL 8 features?  Are you getting rid of the .FRM
 > nightmare?  Are you going to support SET
 PERSIST?  etc?

 FRM was
 made purely optional in 10.0 - every storage engine decides
 for
 itself whether it uses FRMs or not. May
 be InnoDB will use FRMs in 10.3,
 may be it
 will not. MyISAM most probably will continue use them.

 SET PERSIST - and this is my
 personal opinion - this needs to be thought
 over *very* carefully. There have been quite a
 few security
 vulnerabilities with my.cnf
 stored in the datadir, and that's why since
 2005 the server no longer reads my.cnf in the
 datadir. But mysql_safe
 still does - and
 there have been new security vulnerabilities *last
 month*, caused by my.cnf in the datadir. And
 finally both MySQL (in
 5.7?) and MariaDB (in
 10.2) stopped reading my.cnf in the datadir for
 real.  So, having my security@xxxxxxxxxxx
 hat on, I look at SET PERSIST
 with a lot of
 suspicion, because it's nothing else than another
 attempt
 of storing server configuration
 information in the datadir and have it
 writable by the server itself.

 Regards,
 Sergei
 Chief Architect
 MariaDB
 and security@xxxxxxxxxxx

 _______________________________________________
 Mailing list: https://launchpad.net/~maria-discuss
 Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
 Unsubscribe : https://launchpad.net/~maria-discuss
 More help   : https://help.launchpad.net/ListHelp


_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp





Follow ups

References