maria-developers team mailing list archive

Re: [Gsoc] Regex enhancements Project

 

Hi Kentoku,

Thank you, I will certainly study Oniguruma! :)


On Sat, Apr 20, 2013 at 11:26 PM, kentoku <kentokushiba@xxxxxxxxx> wrote:

> Hi Sudheera and Sergei,
>
> > In case I missed some libraries, I hope you will point them out so that I
> > can study them too. Considering the requirements, I didn't see Asian
> > multi-byte support implemented anywhere. What should we do about that?
>
> Do you know "oniguruma"?
> http://www.geocities.jp/kosako3/oniguruma/
> http://en.wikipedia.org/wiki/Oniguruma
>
> Oniguruma is a regular expression library that supports multi-byte
> character sets such as Big5, EUC-KR and Shift_JIS. Oniguruma is used by
> "mregexp", a regex UDF for MySQL with multi-byte support, so I think you
> can easily see how to use it.
>
> Thanks,
> Kentoku
>
>
>
>
> 2013/4/20 Sudheera Palihakkara <catchsudheera@xxxxxxxxx>
>
>> Hello Sir,
>>
>> I've been working on this project for the past couple of days. I found
>> that there are a few good regex libraries suitable for this task. Considering
>> the requirements, I think PCRE, ICU regex and RGX would do the job. ICU
>> regex doesn't support recursion, although its code is well documented and
>> easy to understand. Currently I think PCRE is the best option we have.
>>
>> In case I missed some libraries, I hope you will point them out so that I
>> can study them too. Considering the requirements, I didn't see Asian
>> multi-byte support implemented anywhere. What should we do about that?
>>
>> On the google-melange page, under the application template, there is a
>> field called "Project description". What should I include there? I mean, do
>> you expect a full description of the project including figures, or just a
>> brief one like those on the project ideas page?
>>
>> Thank you.
>>
>>
>> On Fri, Apr 19, 2013 at 3:46 PM, Sergei Golubchik <serg@xxxxxxxxxxxx> wrote:
>>
>>> Hi, Sudheera!
>>>
>>> On Apr 19, Sudheera Palihakkara wrote:
>>> > Hi,
>>> > I went through other threads on this topic. In one thread you mentioned
>>> > to choose a suitable regex library.
>>> >
>>> > *( Preliminary research - only about choosing a regex library to use in
>>> > MariaDB. You should be able to explain why we should use this library
>>> > and not some other one.)*
>>> >
>>> > What do you mean by "choosing"? Don't we have to enhance the existing
>>> > regex library? Or choose from existing, already implemented libraries
>>> > which are free to use? Sorry if it's a stupid question, but I'm confused. :O
>>>
>>> Enhancing our old regex library to support all modern features and
>>> multiple charsets is complex and bug-prone work.
>>>
>>> I don't see why we should bother doing it, when there are plenty of
>>> regex libraries available.
>>>
>>> There's PHP's mb_regex, there's PCRE, and many others too. We'd better
>>> just pick the one that works best for MariaDB, and use it in place of
>>> Henry Spencer's library.
>>>
>>> Regards,
>>> Sergei
>>>
>>> P.S. Please, don't reply to me only, use reply-to-all, so that your
>>> mails appear on the mailing list.
>>>
>>
>>
>>
>> --
>> *Sudheera Palihakkara.*
>> Undergraduate
>> Department of *Computer Science and Engineering*,
>> Faculty of Engineering,
>> *University of Moratuwa*,
>> Sri Lanka.
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~maria-developers
>> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~maria-developers
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>


-- 
*Sudheera Palihakkara.*
Undergraduate
Department of *Computer Science and Engineering*,
Faculty of Engineering,
*University of Moratuwa*,
Sri Lanka.
