openlp-core team mailing list archive

Re: [Merge] lp:~orangeshirt/openlp/bibles into lp:openlp

To: mp+95805@xxxxxxxxxxxxxxxxxx
From: Meinert Jordan <meinertjordan@xxxxxxxxxx>
Date: Sun, 04 Mar 2012 23:38:17 -0000
In-reply-to: <20120304215817.24317.57883.launchpad@ackee.canonical.com>
Reply-to: mp+95805@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

I find your code difficult to understand (or maybe it is just too late and I should go to bed).
For example: what are you doing in line 167?
match.group(u'book') matches this part of the pattern (see the method comment): ...(?!\s)(?P<book>[\d]*[^\d]+)(?<!\s)...
That means it spans from the first non-whitespace character to the last non-whitespace character of the book name. It may contain digits, but only at the beginning.
Therefore, if I read it correctly, your regexp makes no sense:
167	+            regex_book = re.compile(u'^[1-4]?[\. ]{0,2}%s' % book.lower(),

I guess what you want to do is something like this. Change line 232:
- u'^\s*(?!\s)(?P<book>[\d]*[^\d]+)(?<!\s)\s*'
+ u'^\s*(?!\s)(?P<book>(?P<booknum>[\d]*)\s*(?P<bookbase>[^\d]+))(?<!\s)\s*'
Then you can build your new regexp:
regex_book = re.compile(u'^%s\.? ?%s' % (match.group(u'booknum'), match.group(u'bookbase')), re.UNICODE | re.IGNORECASE)
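To make the idea concrete, here is a minimal sketch, not the actual OpenLP code. BOOK_PATTERN is a simplified stand-in that covers only the book part of the reference pattern, and build_book_regex is a hypothetical helper name. It also runs both groups through re.escape, which the line above does not do, so that a dot in the book name is taken literally.

```python
import re

# Assumption: simplified stand-in for the reference pattern (book part only).
BOOK_PATTERN = re.compile(
    r'^\s*(?!\s)(?P<book>(?P<booknum>\d*)\s*(?P<bookbase>[^\d]+))(?<!\s)\s*',
    re.UNICODE)


def build_book_regex(text):
    """Hypothetical helper: build a regex from the number and base name."""
    match = BOOK_PATTERN.match(text)
    if match is None:
        return None
    # Escape both parts so reserved characters are matched literally.
    booknum = re.escape(match.group('booknum'))
    bookbase = re.escape(match.group('bookbase'))
    return re.compile(r'^%s\.? ?%s' % (booknum, bookbase),
                      re.UNICODE | re.IGNORECASE)
```

With this sketch, build_book_regex(u'1 John') accepts "1. john" and "1John" but rejects "2 John".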

Do I see it right that you add these strings to the completer? In that case: why do you want to make any soft decisions? The problem is that you will produce behaviour which is not transparent to the user. The book name recognition I wrote is so general because internationalized names differ a lot. If you attach a meaning to the dot because we use it in German, that might be counterproductive for other languages. With your current algorithm, users could type regular expressions and the code would interpret them.

My suggestion:
Make a hard decision.
If you want to do it the best way possible:
Take the match.group(u'book') string and escape all reserved characters:
# escape reserved characters
for character in u'\\.^$*+?{}[]()':
    bookname = bookname.replace(character, u'\\' + character)
Then you can replace every run of whitespace with an arbitrary amount of whitespace and make a case-insensitive match:
re.compile(u'\s*%s\s*' % u'\s*'.join(bookname.split()), re.UNICODE | re.IGNORECASE)
Use regex.match() and not regex.search(). Otherwise John would be found in the Epistle of John as well. (And users will enter two-letter shortcuts and find something completely different.)
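
Put together, the steps above could look like the following sketch. compile_book_regex is a hypothetical name, and re.escape is swapped in for the manual escaping loop; it is applied to each word separately so that the \s* joiners stay unescaped.

```python
import re


def compile_book_regex(bookname):
    """Hypothetical helper: literal, case-insensitive book name matcher."""
    # Escape each word on its own, then allow any amount of whitespace
    # (including none) between the words and around the whole name.
    escaped_words = [re.escape(word) for word in bookname.split()]
    return re.compile(r'\s*%s\s*' % r'\s*'.join(escaped_words),
                      re.UNICODE | re.IGNORECASE)
```

For example, compile_book_regex(u'John').match(u'Epistle of John') returns None because match() is anchored at the start of the string, while search() on the same input would find "John" in the middle.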
-- 
https://code.launchpad.net/~orangeshirt/openlp/bibles/+merge/95805
Your team OpenLP Core is subscribed to branch lp:openlp.

Follow ups

Re: [Merge] lp:~orangeshirt/openlp/bibles into lp:openlp
From: Armin Köhler, 2012-03-05

References

[Merge] lp:~orangeshirt/openlp/bibles into lp:openlp
From: Armin Köhler, 2012-03-04