maria-developers team mailing list archive

Re: regex enhancements

 

Hi, Line!

On Feb 29, Line Bie Pedersen wrote:
> > > I have some questions related to this project, foremost are of course
> > > what kind of enhancement are you looking for? Are you looking for an
> > > update of existing code? I looked at the existing code and it seems to
> > > be the old regex library written by Henry Spencer. I did not look into
> > > it in detail, so I have no clear idea of the changes needed to bring
> > > it up to speed. Or are you looking to upgrade with an existing
> > > library? Or perhaps a rewrite suited to your needs? I can't see if you
> > > use the regex library for anything but deciding simple acceptance, but
> > > this would probably be a big factor in deciding. I would be happy to
> > > look into all three suggestions, or even a fourth, if you prefer
> > > something else.
> >
> > There're two answers to that.
> >
> > The first one - on the level of requirements, I am very much looking
> > for multi-byte support in the regex library. And a REGEX_REPLACE()
> > function would be nice too - users ask for it quite often.
> 
> How will the REGEX_REPLACE() function work?

Similar to any other regex replace. Something like

   REGEX_REPLACE(orig_str, regex_str, replace_str, [flags])

for example

   REGEX_REPLACE('3.1415926', '(1).', '\\1_', 'g')

which would return 3.1_1_926
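
To make the intended semantics concrete, here is a rough sketch in Python.
It is only a model of the behaviour, not a proposed implementation: the
function name regex_replace and the reading of the 'g' flag as "replace
every match, otherwise only the first" are my assumptions, and Python's re
syntax is not identical to Henry Spencer's. The SQL literal '\\1_' reaches
the function as \1_, hence the raw string below.

```python
import re


def regex_replace(orig_str, regex_str, replace_str, flags=""):
    """Hypothetical model of the proposed REGEX_REPLACE(), not an existing API."""
    # Assumption: 'g' means "replace every match"; without it only the
    # first match is replaced. re.sub() treats count=0 as "all matches".
    count = 0 if "g" in flags else 1
    return re.sub(regex_str, replace_str, orig_str, count=count)


print(regex_replace("3.1415926", "(1).", r"\1_", "g"))  # 3.1_1_926
print(regex_replace("3.1415926", "(1).", r"\1_"))       # 3.1_15926
```

With 'g' both "14" and "15" are rewritten; without it only "14" is.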

> > The second answer - I thought that the simple way of getting
> > multi-byte support, would be to remove Henry Spencer library, and
> > put some modern regex implementation instead. That could be
> > relatively easy to do for any undergraduate student with little
> > experience - this task was one of ideas we had for Google Summer of
> > Code.
> 
> Agree, this is the simple solution. Did you already have a library
> picked out? Are there any requirements to license and language? Would
> it be useful if you could limit the amount of memory the library can
> use?

No, I didn't. I would ask a student to compile a list of libraries and
suggest which one we should use. If I were doing it myself, I'd start
by looking at PCRE, but I haven't "picked" it, because I haven't really
considered the alternatives yet.

GPL will not do.  LGPL is fine, BSD, MIT - fine too.
Language - C or C++.
And it should allow concurrent multi-threaded use.

Yes, a memory limit would be a very useful feature. Otherwise a
low-privileged user may consume an arbitrary amount of memory and DoS
the system.

> > Now, with your experience, you may prefer to do something else - not
> > replace the library, but rewrite it, or extend, whatever.  I'm fine
> > either way. It's only important that the result works with
> > multi-byte character sets (ideally - our character set code), and
> > that it can support Henry Spencer regex syntax.
> >
> What I would really like to do, is rewrite the thing! However, I do
> not have an unlimited amount of time for this project. I'll look into
> Henry Spencer's old library and let you know what I decide to do.

Great! Thank you very much!

Regards,
Sergei


