← Back to team overview

qpdfview team mailing list archive

Re: Request for review of DjVu method "findText"

 

Hello again,

Am 24.01.2015 um 00:20 schrieb Adam Reichold:
> Hello Razi,
> 
> Am 23.01.2015 um 23:37 schrieb Seyyed Razi Alavizadeh:
>> Hello Adam,
>>
>> Sorry for my late reply.
> 
> No, thanks for looking into it!
> 
>> I tested search with trunk 1834:
>> If I enable "Whole Words" and I search for "a space" then results
>> contains something like for example "a KB-space", I'm not sure if it is
>> expected or not (?) but it's because "wordBegins&&wordEnds" is true when
>> partial matching for "space".
> 
> Yes, the thing is that after your last change, we began to accept
> non-letter (instead of whitespace) as word boundaries which I found
> pretty useful. Of course this does not match what DjVuLibre thinks are
> words and so we have two (DjVu) words "a" and "KB-space" on which the
> second one fits when using whole words matching because of the hyphen.
> 
> Personally, I think that is actually useful and we can live with the
> slightly larger number of false positives. Otherwise we'd have to revert
> back to match our words with DjVuLibre's words.
> 
>> Another issue is highlighting search results within tree-view when
>> searching "several words" with "Whole Words" disabled.
>> For example the search phrase is "a space" one of result is "a subspace"
>> that is not highlighted.
> 
> Yes, that seems because "emphasizeText" does not split the text to
> search for into individual words before highlighting. I see a few
> options here:
> 
> 1) Make the "emphasizeText" function exactly as complicated as DjVu's
> "findText" function with the possibility of breaking synchronicity with
> other plugins.
> 
> 2) Make the "findText" function's matching less fuzzy only finding
> proper substrings (with consecutive whitespace being counted as one
> symbol) and stop matching something like "a space" with "a KB-space".
> 
> 3) Not only query the surroundingText but also the text to be
> highlighted using "Page::text" with the actual result rectangle which
> can then be highlighted a proper substring.
> 
> 4) Ignore the differences in highlighting the more outlying cases.
> 
> Personally, I dislike option 2) and feel like 3) is the cleanest way of
> going about this (also w.r.t. to other plugins) but I am unsure about
> the performance implications. 1) may be a practical compromise. What do
> you think?

Attached, you also find what I would consider a clean approach to
separating the plug-in internal matching and the text highlighting for
you to have a look at and consider performance-wise.

Best regards, Adam.

>> Best Regards,
>> Razi.
>>
> 
> Best regards, Adam.
> 
>>
>> 2015-01-11 18:32 GMT+03:30 Adam Reichold <adam.reichold@xxxxxxxxxxx
>> <mailto:adam.reichold@xxxxxxxxxxx>>:
>>
>> Hello Razi,
>>
>> I am currently working on getting the whole-words search function of
>> Poppler exposed in the Qt frontends. I did already add the necessary
>> UI elements to qpdfview and started work on the DjVu backend.
>>
>> As you know, the DjVu search function, i.e. the "findText" method, has
>> always been pretty "hand waving". To make it fulfil the requirements
>> made by the UI at least to some degree, I extended it to respect the
>> whole words flag and I also tried to restore the matching of search
>> texts that span several words (which was gone after the last change to
>> make it match words within words at non-letters).
>>
>> It is now pretty complicated again (doing a word-by-word possibly with
>> words being substrings of DjVu words match) and I would be grateful if
>> you could review it as well when you find some spare time to do so.
>> Thanks!
>>
>> Best regards, Adam.
>>
>>     --
>>     Mailing list: https://launchpad.net/~qpdfview
>>     Post to     : qpdfview@xxxxxxxxxxxxxxxxxxx
>>     <mailto:qpdfview@xxxxxxxxxxxxxxxxxxx>
>>     Unsubscribe : https://launchpad.net/~qpdfview
>>     More help   : https://help.launchpad.net/ListHelp
>>
>>
>>
>>
>> -- 
>> Alavizadeh, Sayed Razi
>> My Blog: http://pozh.org <http://pozh.org/>
>> Saaghar (نرم‌افزار شعر): http://saaghar.pozh.org/
>> Saaghar Fan Page: http://www.facebook.com/saaghar.p
>> Saaghar Mailing List: http://groups.google.com/group/saaghar
>>
> 
> 
> 
=== modified file 'sources/documentview.cpp'
--- sources/documentview.cpp	2015-01-11 13:17:34 +0000
+++ sources/documentview.cpp	2015-01-24 11:07:39 +0000
@@ -820,11 +820,11 @@
     return m_searchTask->wholeWords();
 }
 
-QString DocumentView::surroundingText(int page, const QRectF& rect) const
+QPair< QString, QString > DocumentView::surroundingText(int page, const QRectF& rect) const
 {
     if(page < 1 || page > m_pages.size() || rect.isEmpty())
     {
-        return QString();
+        return qMakePair(QString(), QString());
     }
 
     // Fetch at most half of a line as centered on the given rectangle as possible.
@@ -834,7 +834,10 @@
 
     const QRectF surroundingRect(x, rect.top(), width, rect.height());
 
-    return m_pages.at(page - 1)->text(surroundingRect).simplified();
+    const QString& matchedText = m_pages.at(page - 1)->text(rect).simplified();
+    const QString& surroundingText = m_pages.at(page - 1)->text(surroundingRect).simplified();
+
+    return qMakePair(matchedText, surroundingText);
 }
 
 void DocumentView::show()

=== modified file 'sources/documentview.h'
--- sources/documentview.h	2015-01-11 13:17:34 +0000
+++ sources/documentview.h	2015-01-24 11:05:05 +0000
@@ -134,7 +134,7 @@
     bool searchMatchCase() const;
     bool searchWholeWords() const;
 
-    QString surroundingText(int page, const QRectF& rect) const;
+    QPair< QString, QString > surroundingText(int page, const QRectF& rect) const;
 
 signals:
     void documentChanged();

=== modified file 'sources/miscellaneous.cpp'
--- sources/miscellaneous.cpp	2015-01-17 14:08:33 +0000
+++ sources/miscellaneous.cpp	2015-01-24 11:18:38 +0000
@@ -56,7 +56,7 @@
     return true;
 }
 
-void emphasizeText(const QString& text, bool matchCase, bool wholeWords, const QString& surroundingText, QTextLayout& textLayout)
+void emphasizeText(const QString& matchedText, const QString& surroundingText, QTextLayout& textLayout)
 {
     QFont font = textLayout.font();
     font.setWeight(QFont::Light);
@@ -65,30 +65,14 @@
 
     QList< QTextLayout::FormatRange > additionalFormats;
 
-    const Qt::CaseSensitivity sensitivity = matchCase ? Qt::CaseSensitive : Qt::CaseInsensitive;
-    int index = 0;
-
-    while((index = surroundingText.indexOf(text, index, sensitivity)) != -1)
+    for(int index = 0; (index = surroundingText.indexOf(matchedText, index)) != -1; index += matchedText.length())
     {
-        const int nextIndex = index + text.length();
-
-        const bool wordBegins = index == 0 || !surroundingText.at(index - 1).isLetterOrNumber();
-        const bool wordEnds = nextIndex == surroundingText.length() || !surroundingText.at(nextIndex).isLetterOrNumber();
-
-        if(!wholeWords || (wordBegins && wordEnds))
-        {
-            QTextLayout::FormatRange formatRange;
-            formatRange.start = index;
-            formatRange.length = text.length();
-            formatRange.format.setFontWeight(QFont::Bold);
-
-            additionalFormats.append(formatRange);
-        }
-
-        if((index = nextIndex) >= surroundingText.length())
-        {
-            break;
-        }
+        QTextLayout::FormatRange formatRange;
+        formatRange.start = index;
+        formatRange.length = matchedText.length();
+        formatRange.format.setFontWeight(QFont::Bold);
+
+        additionalFormats.append(formatRange);
     }
 
     textLayout.setAdditionalFormats(additionalFormats);
@@ -647,15 +631,12 @@
         return;
     }
 
-    const QString text = index.data(SearchModel::TextRole).toString();
+    const QString matchedText = index.data(SearchModel::MatchedTextRole).toString();
     const QString surroundingText = index.data(SearchModel::SurroundingTextRole).toString();
 
-    if(!text.isEmpty() && !surroundingText.isEmpty())
+    if(!matchedText.isEmpty() && !surroundingText.isEmpty())
     {
-        const bool matchCase = index.data(SearchModel::MatchCaseRole).toBool();
-        const bool wholeWords = index.data(SearchModel::WholeWordsRole).toBool();
-
-        paintSurroundingText(painter, option, text, surroundingText, matchCase, wholeWords);
+        paintSurroundingText(painter, option, matchedText, surroundingText);
         return;
     }
 }
@@ -675,7 +656,7 @@
 }
 
 void SearchItemDelegate::paintSurroundingText(QPainter* painter, const QStyleOptionViewItem& option,
-                                              const QString& text, const QString& surroundingText, bool matchCase, bool wholeWords) const
+                                              const QString& matchedText, const QString& surroundingText) const
 {
     const int textMargin = QApplication::style()->pixelMetric(QStyle::PM_FocusFrameHMargin) + 1;
     const QRect textRect = option.rect.adjusted(textMargin, 0, -textMargin, 0);
@@ -691,7 +672,7 @@
     textLayout.setText(elidedText);
     textLayout.setFont(option.font);
 
-    emphasizeText(text, matchCase, wholeWords, surroundingText, textLayout);
+    emphasizeText(matchedText, surroundingText, textLayout);
 
 
     textLayout.beginLayout();

=== modified file 'sources/miscellaneous.h'
--- sources/miscellaneous.h	2015-01-17 12:47:36 +0000
+++ sources/miscellaneous.h	2015-01-24 11:17:40 +0000
@@ -345,7 +345,7 @@
     void paintProgress(QPainter* painter, const QStyleOptionViewItem& option,
                        int progress) const;
     void paintSurroundingText(QPainter* painter, const QStyleOptionViewItem& option,
-                              const QString& text, const QString& surroundingText, bool matchCase, bool wholeWords) const;
+                              const QString& matchedText, const QString& surroundingText) const;
 
 };
 

=== modified file 'sources/searchmodel.cpp'
--- sources/searchmodel.cpp	2015-01-11 13:17:34 +0000
+++ sources/searchmodel.cpp	2015-01-24 11:19:51 +0000
@@ -174,6 +174,8 @@
             return view->searchMatchCase();
         case WholeWordsRole:
             return view->searchWholeWords();
+        case MatchedTextRole:
+            return fetchMatchedText(view, result);
         case SurroundingTextRole:
             return fetchSurroundingText(view, result);
         case Qt::ToolTipRole:
@@ -377,7 +379,9 @@
         return;
     }
 
-    m_textCache.insert(job.key, job.object, job.object->length());
+    const int cost = job.object->first.length() + job.object->second.length();
+
+    m_textCache.insert(job.key, job.object, cost);
 
     emit dataChanged(createIndex(0, 0, view), createIndex(results->count() - 1, 0, view));
 }
@@ -424,19 +428,28 @@
     return createIndex(row, 0);
 }
 
+QString SearchModel::fetchMatchedText(DocumentView* view, const SearchModel::Result& result) const
+{
+    const TextCacheObject* object = fetchText(view, result);
+
+    return object != 0 ? object->first : QString();
+}
+
 QString SearchModel::fetchSurroundingText(DocumentView* view, const Result& result) const
 {
-    if(view == 0)
-    {
-        return QString();
-    }
-
+    const TextCacheObject* object = fetchText(view, result);
+
+    return object != 0 ? object->second : QString();
+}
+
+const SearchModel::TextCacheObject* SearchModel::fetchText(DocumentView* view, const SearchModel::Result& result) const
+{
     const TextCacheKey key = textCacheKey(view, result);
     const TextCacheObject* object = m_textCache.object(key);
 
     if(object != 0)
     {
-        return *object;
+        return object;
     }
 
     if(m_textWatchers.size() < 20 && !m_textWatchers.contains(key))
@@ -449,7 +462,7 @@
         watcher->setFuture(QtConcurrent::run(textJob, key, result));
     }
 
-    return QLatin1String("...");
+    return 0;
 }
 
 inline SearchModel::TextCacheKey SearchModel::textCacheKey(DocumentView* view, const Result& result)
@@ -465,9 +478,9 @@
 
 SearchModel::TextJob SearchModel::textJob(const TextCacheKey& key, const Result& result)
 {
-    const QString surroundingText = key.first->surroundingText(result.first, result.second);
+    const QPair< QString, QString > text = key.first->surroundingText(result.first, result.second);
 
-    return TextJob(key, new QString(surroundingText));
+    return TextJob(key, new TextCacheObject(text));
 }
 
 } // qpdfview

=== modified file 'sources/searchmodel.h'
--- sources/searchmodel.h	2015-01-11 13:17:34 +0000
+++ sources/searchmodel.h	2015-01-24 11:14:34 +0000
@@ -58,6 +58,7 @@
         TextRole,
         MatchCaseRole,
         WholeWordsRole,
+        MatchedTextRole,
         SurroundingTextRole
     };
 
@@ -107,7 +108,7 @@
 
 
     typedef QPair< DocumentView*, QByteArray > TextCacheKey;
-    typedef QString TextCacheObject;
+    typedef QPair< QString, QString > TextCacheObject;
 
     struct TextJob
     {
@@ -124,8 +125,11 @@
     mutable QCache< TextCacheKey, TextCacheObject > m_textCache;
     mutable QHash< TextCacheKey, TextWatcher* > m_textWatchers;
 
+    QString fetchMatchedText(DocumentView* view, const Result& result) const;
     QString fetchSurroundingText(DocumentView* view, const Result& result) const;
 
+    const TextCacheObject* fetchText(DocumentView* view, const Result& result) const;
+
     static TextCacheKey textCacheKey(DocumentView* view, const Result& result);
     static TextJob textJob(const TextCacheKey& key, const Result& result);
 

Attachment: signature.asc
Description: OpenPGP digital signature


References