← Back to team overview

zim-wiki team mailing list archive

caching and progressively slower page loads

 

When Zim starts up, switching pages is nice and snappy.

https://www.youtube.com/watch?v=tl4_t6dW1l0

After a while, loading even the smallest page has a noticeable delay.

https://www.youtube.com/watch?v=euyGgEwLaIk

Note the time required to display the new page in addition to the time before the cursor becomes active.

This is running a recent Github version (commit e46a4b40a90d6903b6fa860bd72a4464dab7d704) with all add-ons disabled, including built-in add-ons. However, it was originally reproduced on older versions, so it is not a new effect.

A few backtraces with GDB during a page switch shows most of the time is spent in this query from the "get_mytreeiter()" function in zim/notebook/index/pages.py:

    for i, row in enumerate(self.db.execute('''
        SELECT * FROM pages WHERE parent=?
        ORDER BY sortkey, name LIMIT 20 OFFSET ?
        ''',
        (parent_id, offset)
    )):

https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/e46a4b40a90d6903b6fa860bd72a4464dab7d704/zim/notebook/index/pages.py#L804

This is being called from the main loop.

    Traceback (most recent call first):
File "/home/nathaniel/.local/lib/python2.7/site-packages/zim/notebook/index/pages.py", line 805, in get_mytreeiter
        (parent_id, offset)
File "/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py", line 224, in on_get_iter
        return self.get_mytreeiter(treepath)
File "/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py", line 238, in on_iter_next
        return self.on_get_iter(treepath)
File "/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py", line 703, in _run_main_loop
        gtk.main()
File "/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py", line 656, in _run_cmd
        self._run_main_loop(cmd)
File "/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py", line 626, in run
        self._run_cmd(cmd, args) # test seam
File "/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py", line 845, in main

The query itself is fast, so it seems strange that so much time is spent here. However, running Zim with "cprofile" for about a day shows there are roughly 1.3e6 calls to the sqlite3 execute method, so even though the time per call is less than a millisecond, the cumulative time is considerable.

    Wed Jan 17 10:33:24 2018    zim.notebook.index.pages_1516168791.prof

43403639 function calls (42725089 primitive calls) in 86784.644 seconds

       Ordered by: internal time
       List reduced from 2168 to 4 due to restriction <4>

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1 85453.595 85453.595 86765.186 86765.186 {gtk._gtk.main}
1347825 1154.685 0.001 1154.686 0.001 {method 'execute' of 'sqlite3.Connection' objects} 5138231 50.881 0.000 1207.454 0.000 /home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py:231(on_iter_next) 11 43.953 3.996 48.992 4.454 /home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/widgets.py:3152(run)

By setting a breakpoint in GDB:

    break _pysqlite_query_execute

and then using "ignore" and "info break", we can count how many queries are run. In my notebook of about 1,400 pages, there are queries amounting to roughly twice that on startup. When the pages begin loading slowly, it runs correspondingly more queries.

Since "get_mytreeiter()" caches the page on each call, is it possible that the cache is being duplicated or otherwise not working properly, resulting in an excessive number of queries?

If the cache is working as intended, could it be changed so that it does not become progressively slower?

Should I file a bug report on Github, or should I wait until there is something more concrete to submit?


Nathaniel Beaver

P.S. Upon request, I am happy to provide full data and details on collection methods.

Please let me know if there is something I overlooked or a flaw in the methodology.

This is a follow-up to my message from December 29, 2017 with subject "profiling zim for performance?"


Follow ups