
maria-developers team mailing list archive

Re: [Gsoc] Regex enhancements Project

 

Hi Sudheera and Sergei,

> In case I missed some libraries, I guess you will point me to them so
> I can study them too. Considering the requirements, I didn't see Asian
> multi-byte support implemented anywhere; what would we do about that?

Do you know "oniguruma"?
http://www.geocities.jp/kosako3/oniguruma/
http://en.wikipedia.org/wiki/Oniguruma

Oniguruma is a regular expression library that supports multi-byte
character sets such as Big5, EUC-KR and Shift_JIS. Oniguruma is used by
"mregexp", a regex UDF for MySQL with multi-byte support, so I think
its source will make it easy to see how to use the library.
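
To illustrate why multi-byte awareness matters, here is a minimal Python
sketch. It is not taken from Oniguruma or mregexp, and the helper names
are hypothetical. It assumes a charset-unaware engine scans the raw
encoded bytes: in Shift_JIS the character 表 is encoded as 0x95 0x5C, and
0x5C is also the ASCII backslash, so a byte-level search for a backslash
reports a match in the middle of a character.

```python
import re


def byte_level_match(pattern: bytes, text: str, encoding: str) -> bool:
    """Search the raw encoded bytes, as a charset-unaware engine would."""
    return re.search(pattern, text.encode(encoding)) is not None


def char_level_match(pattern: str, text: str) -> bool:
    """Search decoded characters, as a multi-byte-aware engine would."""
    return re.search(pattern, text) is not None


text = "表示"  # contains no backslash character

# Byte-level: the trail byte 0x5C of 表 is mistaken for a backslash.
false_hit = byte_level_match(rb"\\", text, "shift_jis")

# Character-level: no backslash is found, which is the correct answer.
true_result = char_level_match(r"\\", text)

print(false_hit, true_result)
```

A library that knows the character set walks the string character by
character, so it never inspects a trail byte on its own.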

Thanks,
Kentoku




2013/4/20 Sudheera Palihakkara <catchsudheera@xxxxxxxxx>

> Hello Sir,
>
> I've been working on this project for the past couple of days. I found
> that there are a few good regex libraries suitable for this task.
> Considering the requirements, I think PCRE, ICU regex and RGX would do
> the job. ICU regex doesn't have recursion, although its code is well
> documented and easy to understand. Currently I think PCRE is the best
> option we have.
>
> In case I missed some libraries, I guess you will point me to them so I
> can study them too. Considering the requirements, I didn't see Asian
> multi-byte support implemented anywhere; what would we do about that?
>
> On the google-melange page, under the application template, there is a
> field called "Project description". What should I include there? I mean,
> do you expect a full description of the project including figures, or
> just a brief one like those on the project ideas page?
>
> Thank you.
>
>
> On Fri, Apr 19, 2013 at 3:46 PM, Sergei Golubchik <serg@xxxxxxxxxxxx> wrote:
>
>> Hi, Sudheera!
>>
>> On Apr 19, Sudheera Palihakkara wrote:
>> > Hi,
>> > I went through other threads on this topic. In one thread you
>> > mentioned choosing a suitable regex library.
>> >
>> > *(Preliminary research - only about choosing a regex library to use in
>> > MariaDB. You should be able to explain why we should use this library
>> > and not some other one.)*
>> >
>> > What do you mean by "choosing"? Don't we have to enhance the existing
>> > regex library? Or choose from existing, already implemented libraries
>> > which are free to use? Sorry if it's a stupid question, but I'm
>> > confused. :O
>>
>> Enhancing our old regex library to support all modern features and
>> multiple charsets is complex and bug-prone work.
>>
>> I don't see why we should bother doing it, when there are plenty of
>> regex libraries available.
>>
>> There's PHP's mb_regex, there's PCRE, and many others too. We'd better
>> just pick the one that works best for MariaDB, and use it in place of
>> Henry Spencer's library.
>>
>> Regards,
>> Sergei
>>
>> P.S. Please, don't reply to me only, use reply-to-all, so that your
>> mails appear on the mailing list.
>>
>
>
>
> --
> *Sudheera Palihakkara.*
> Undergraduate
> Department of *Computer Science and Engineering,
> *Faculty of Engineering,
> *University of Moratuwa*,
> Sri Lanka.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp
>
>
