zim-wiki team mailing list archive
-
zim-wiki team
-
Mailing list archive
-
Message #04614
caching and progressively slower page loads
When Zim starts up, switching pages is nice and snappy.
https://www.youtube.com/watch?v=tl4_t6dW1l0
After a while, loading even the smallest page has a noticeable delay.
https://www.youtube.com/watch?v=euyGgEwLaIk
Note the time required to display the new page in addition to the time
before the cursor becomes active.
This is running a recent Github version (commit
e46a4b40a90d6903b6fa860bd72a4464dab7d704) with all add-ons disabled,
including built-in add-ons. However, it was originally reproduced on
older versions, so it is not a new effect.
A few backtraces with GDB during a page switch shows most of the time is
spent in this query from the "get_mytreeiter()" function in
zim/notebook/index/pages.py:
for i, row in enumerate(self.db.execute('''
SELECT * FROM pages WHERE parent=?
ORDER BY sortkey, name LIMIT 20 OFFSET ?
''',
(parent_id, offset)
)):
https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/e46a4b40a90d6903b6fa860bd72a4464dab7d704/zim/notebook/index/pages.py#L804
This is being called from the main loop.
Traceback (most recent call first):
File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/notebook/index/pages.py",
line 805, in get_mytreeiter
(parent_id, offset)
File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py",
line 224, in on_get_iter
return self.get_mytreeiter(treepath)
File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py",
line 238, in on_iter_next
return self.on_get_iter(treepath)
File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
line 703, in _run_main_loop
gtk.main()
File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
line 656, in _run_cmd
self._run_main_loop(cmd)
File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
line 626, in run
self._run_cmd(cmd, args) # test seam
File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
line 845, in main
The query itself is fast, so it seems strange that so much time is spent
here. However, running Zim with "cprofile" for about a day shows there
are roughly 1.3e6 calls to the sqlite3 execute method, so even though
the time per call is less than a millisecond, the cumulative time is
considerable.
Wed Jan 17 10:33:24 2018 zim.notebook.index.pages_1516168791.prof
43403639 function calls (42725089 primitive calls) in
86784.644 seconds
Ordered by: internal time
List reduced from 2168 to 4 due to restriction <4>
ncalls tottime percall cumtime percall filename:lineno(function)
1 85453.595 85453.595 86765.186 86765.186 {gtk._gtk.main}
1347825 1154.685 0.001 1154.686 0.001 {method 'execute' of
'sqlite3.Connection' objects}
5138231 50.881 0.000 1207.454 0.000
/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py:231(on_iter_next)
11 43.953 3.996 48.992 4.454
/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/widgets.py:3152(run)
By setting a breakpoint in GDB:
break _pysqlite_query_execute
and then using "ignore" and "info break", we can count how many queries
are run. In my notebook of about 1,400 pages, there are queries
amounting to roughly twice that on startup. When the pages begin loading
slowly, it runs correspondingly more queries.
Since "get_mytreeiter()" caches the page on each call, is it possible
that the cache is being duplicated or otherwise not working properly,
resulting in an excessive number of queries?
If the cache is working as intended, could it be changed so that it does
not become progressively slower?
Should I file a bug report on Github, or should I wait until there is
something more concrete to submit?
Nathaniel Beaver
P.S. Upon request, I am happy to provide full data and details on
collection methods.
Please let me know if there is something I overlooked or a flaw in the
methodology.
This is a follow-up to my message from December 29, 2017 with subject
"profiling zim for performance?"
Follow ups