
qpdfview team mailing list archive

Re: Request for review of DjVu method "findText"

 

Hello Razi,

On 23.01.2015 at 23:37, Seyyed Razi Alavizadeh wrote:
> Hello Adam,
> 
> Sorry for my late reply.

No, thanks for looking into it!

> I tested search with trunk 1834:
> If I enable "Whole Words" and I search for "a space" then results
> contains something like for example "a KB-space", I'm not sure if it is
> expected or not (?) but it's because "wordBegins&&wordEnds" is true when
> partial matching for "space".

Yes, the thing is that after your last change, we began to accept any
non-letter (instead of only whitespace) as a word boundary, which I
found pretty useful. Of course, this does not match what DjVuLibre
considers words, so we have two (DjVu) words, "a" and "KB-space", and
the second one matches with whole-words matching because of the hyphen.

Personally, I think that is actually useful and we can live with the
slightly larger number of false positives. Otherwise, we would have to
go back to matching our words against DjVuLibre's words.
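
To make the boundary rule concrete, here is a minimal sketch in Python.
It is not the actual qpdfview code: the function names are made up for
illustration, and I am assuming that a search word matches a DjVu word
if it occurs in it with a non-letter (or the end of the word) on both
sides, which is what "wordBegins&&wordEnds" checks.

```python
def is_whole_word_match(djvu_word: str, needle: str) -> bool:
    """Check whether needle occurs in djvu_word bounded by non-letters.

    Hypothetical helper: mirrors the "wordBegins && wordEnds" idea, where
    any non-letter (or the end of the DjVu word) counts as a boundary.
    """
    if not needle:
        return False
    start = djvu_word.find(needle)
    while start != -1:
        end = start + len(needle)
        word_begins = start == 0 or not djvu_word[start - 1].isalpha()
        word_ends = end == len(djvu_word) or not djvu_word[end].isalpha()
        if word_begins and word_ends:
            return True
        start = djvu_word.find(needle, start + 1)
    return False


def find_phrase(djvu_words: list[str], phrase: str) -> list[int]:
    """Return the indices of DjVu words at which the phrase starts to match."""
    needles = phrase.split()
    if not needles:
        return []
    hits = []
    for i in range(len(djvu_words) - len(needles) + 1):
        window = djvu_words[i:i + len(needles)]
        if all(is_whole_word_match(w, n) for w, n in zip(window, needles)):
            hits.append(i)
    return hits
```

With the DjVu words "a", "KB-space", "is", "a", "subspace", searching
for "a space" hits at the first word only: "space" is bounded by the
hyphen inside "KB-space", but not inside "subspace".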

> Another issue is highlighting search results within tree-view when
> searching "several words" with "Whole Words" disabled.
> For example the search phrase is "a space" one of result is "a subspace"
> that is not highlighted.

Yes, that seems to be because "emphasizeText" does not split the search
text into individual words before highlighting. I see a few options
here:

1) Make the "emphasizeText" function exactly as complicated as DjVu's
"findText" function, at the risk of it falling out of sync with the
other plugins.

2) Make the "findText" function's matching less fuzzy, finding only
proper substrings (with consecutive whitespace being counted as one
symbol), and stop matching something like "a space" with "a KB-space".

3) Query not only the surroundingText but also the text to be
highlighted, using "Page::text" with the actual result rectangle, so
that it can then be highlighted as a proper substring.

4) Ignore the differences in highlighting the more outlying cases.
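
For comparison, the stricter matching of option 2) could look roughly
like the following Python sketch. The names are hypothetical and this
is only meant to show the rule "proper substring, with runs of
whitespace collapsed to one symbol", not how "findText" would actually
be written.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def strict_find(page_text: str, phrase: str) -> bool:
    """Option 2) style matching: phrase must be a proper substring."""
    needle = normalize_whitespace(phrase)
    if not needle:
        return False
    return needle in normalize_whitespace(page_text)
```

Under this rule, "a space" is still found in "a \n space" but no longer
in "a KB-space" or "a subspace".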

Personally, I dislike option 2) and feel that 3) is the cleanest way of
going about this (also w.r.t. other plugins), but I am unsure about the
performance implications. 1) may be a practical compromise. What do you
think?
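
As a rough illustration of option 3), here is a Python sketch. It
assumes that the plugin hands back the text under the result rectangle
(the "matched" argument, standing in for what "Page::text" would
return) and that highlighting means wrapping that substring in <b>
tags. Both the function name and the markup are assumptions made for
the example.

```python
def emphasize_substring(surrounding: str, matched: str) -> str:
    """Highlight the first occurrence of matched inside surrounding.

    matched is assumed to be the text extracted for the result rectangle,
    so no knowledge of the plugin's matching rules is needed here.
    """
    if not matched:
        return surrounding
    index = surrounding.find(matched)
    if index == -1:
        # Nothing to highlight; leave the surrounding text untouched.
        return surrounding
    end = index + len(matched)
    return surrounding[:index] + "<b>" + matched + "</b>" + surrounding[end:]
```

The point is that the highlighting stays a plain substring operation
for every plugin, at the cost of one additional text query per result.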

> Best Regards,
> Razi.
> 

Best regards, Adam.

> 
> 2015-01-11 18:32 GMT+03:30 Adam Reichold <adam.reichold@xxxxxxxxxxx
> <mailto:adam.reichold@xxxxxxxxxxx>>:
> 
> Hello Razi,
> 
> I am currently working on getting the whole-words search function of
> Poppler exposed in the Qt frontends. I did already add the necessary
> UI elements to qpdfview and started work on the DjVu backend.
> 
> As you know, the DjVu search function, i.e. the "findText" method, has
> always been pretty "hand waving". To make it fulfil the requirements
> made by the UI at least to some degree, I extended it to respect the
> whole words flag and I also tried to restore the matching of search
> texts that span several words (which was gone after the last change to
> make it match words within words at non-letters).
> 
> It is now pretty complicated again (doing a word-by-word match in
> which the search words may be substrings of DjVu words) and I would be
> grateful if you could review it as well when you find some spare time.
> Thanks!
> 
> Best regards, Adam.
> 
> -- 
> Alavizadeh, Sayed Razi
> My Blog: http://pozh.org <http://pozh.org/>
> Saaghar (نرم‌افزار شعر): http://saaghar.pozh.org/
> Saaghar Fan Page: http://www.facebook.com/saaghar.p
> Saaghar Mailing List: http://groups.google.com/group/saaghar
> 
