zim-wiki team mailing list archive

Re: PageRank for Zim

 

Hi Laecy,

All this information is in the sqlite database that backs the zim
index, so one could easily write a plugin to extract and present the data.

Just listing by number of links is easy.
Finding similarities is more difficult, since you need a strict algorithm
for "similar".
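
A rough sketch of the "list by number of links" part. It runs against a small
in-memory sqlite database rather than a real zim index; the table and column
names (`pages`, `links`, `source`, `target`) are assumptions for illustration
and may not match the actual index schema.

```python
import sqlite3

def backlink_counts(conn):
    """Return (page name, backlink count) rows, most-linked pages first."""
    return conn.execute(
        """
        SELECT p.name, COUNT(l.source) AS n_backlinks
        FROM pages AS p
        LEFT JOIN links AS l ON l.target = p.id
        GROUP BY p.id
        ORDER BY n_backlinks DESC, p.name
        """
    ).fetchall()

# Hypothetical stand-in for the index database, NOT the real zim schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE links (source INTEGER, target INTEGER);
    INSERT INTO pages VALUES
        (1, 'BookA'), (2, 'BookB'), (3, 'Study1'), (4, 'Study2');
    INSERT INTO links VALUES (1, 3), (2, 3), (1, 4);
    """
)

counts = backlink_counts(conn)
for name, n in counts:
    print(name, n)
```

The LEFT JOIN keeps pages without any backlinks in the list, with a count of 0.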

One way I can think of: if you want to find pages that are linked from
the same pages, you generate their backlink lists, diff them, and then sort
based on the number of lines in the diff as a percentage of the total
length of the lists.
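
One possible reading of that measure, as a sketch: treat each backlink list as
a set, count the entries that appear in only one of the two lists (the "diff"),
and divide by the combined length of both lists. The function name and the
exact normalisation are my assumptions; with this choice the score is the same
as the Sørensen-Dice coefficient.

```python
def backlink_similarity(backlinks_a, backlinks_b):
    """Similarity between 0.0 and 1.0 for two backlink lists.

    The "diff" is the set of entries present in only one of the lists;
    the score is 1 minus the diff size over the combined list length.
    """
    a, b = set(backlinks_a), set(backlinks_b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # two pages without backlinks: nothing to compare
    diff = len(a ^ b)  # symmetric difference
    return 1.0 - diff / total

same = backlink_similarity(["BookA", "BookB"], ["BookB", "BookA"])
half = backlink_similarity(["BookA", "BookB"], ["BookA", "BookC"])
none = backlink_similarity(["BookA"], ["BookB"])
print(same, half, none)
```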

However, you will still see pages with only 2 backlinks, one of which is
shared, showing a 50% match, which intuitively is not so "similar".

Probably you want to filter on both number of backlinks larger than X and
similarity more than Y percent.
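
A sketch of that double filter for the "many to many" case. The input is a
plain dict mapping each page to the pages that link to it; the function name,
the parameter names and the default thresholds are hypothetical, and the
similarity is the shared backlinks as a fraction of the combined list length
(one possible definition, not the only one).

```python
from itertools import combinations

def similar_pages(backlinks, min_backlinks=3, min_similarity=0.5):
    """Return (page, page, similarity) tuples, most similar pairs first.

    backlinks: dict mapping page name -> iterable of linking page names.
    Pages with fewer than min_backlinks backlinks are skipped entirely.
    """
    sets = {page: set(links) for page, links in backlinks.items()}
    sets = {page: s for page, s in sets.items() if len(s) >= min_backlinks}
    results = []
    for p, q in combinations(sorted(sets), 2):
        a, b = sets[p], sets[q]
        sim = 2 * len(a & b) / (len(a) + len(b))
        if sim >= min_similarity:
            results.append((p, q, sim))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Made-up example data.
example = {
    "Study1": ["BookA", "BookB", "BookC", "BookD"],
    "Study2": ["BookA", "BookB", "BookC"],
    "Study3": ["BookA", "BookE"],  # dropped: fewer than 3 backlinks
    "Study4": ["BookE", "BookF", "BookG"],
}
pairs = similar_pages(example)
for p, q, sim in pairs:
    print(p, q, round(sim, 2))
```

This compares every remaining pair of pages, so the work grows quadratically
with the number of pages; the backlink threshold also helps keep that down.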

Maybe as a clarifying question, do you start with a specific page to which
you want to compare other pages, or is it more a "many to many" comparison
that you are looking for?

I'm interested in creating an experimental plugin for this, as it is an
interesting use case.  In the past I thought the "link view" would address
questions like this by visually showing the hot nodes in the network, but
with the number of pages you are talking about, this won't work.

(Mental note: once we have a plugin based on links, we could also think of
similarity in terms of tags and other properties.)

Regards,

Jaap

On Thu, Jul 13, 2017 at 3:07 PM Laecy . <laesaleigh@xxxxxxxxx> wrote:

> Hello all,
>
> First, thanks for the 0.67 release! It's fantastic so far. My laptop is a
> dinky little thing I only use for writing (can't get distracted by gaming
> if my comp can't play the games :P), and I can definitely see an
> improvement in speed and reaction time, so whatever you guys did behind the
> scenes made a big difference. Really good job, thank you.
>
> I've been loading all my reference material into the wiki for easy access,
> and it occurred to me that I can use that to see which of my references
> draw on the same references themselves - if twelve books draw their
> conclusions from the same 100 studies, then I'm not getting the diversity
> of perspective I think I'm getting.
>
> To that end, the backlinks feature has been invaluable.
>
> The trouble is, at this point I have thousands of primary sources and will
> be accumulating more. The only way I know of to assess how many pages refer
> to any given page is to actually look at each individual one. Is there a
> way to generate an ordered list based on the number of backlinks and/or
> find pages with highly similar backlink lists?
>
> Thanks again!!
>
> -Laecy
> _______________________________________________
> Mailing list: https://launchpad.net/~zim-wiki
> Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~zim-wiki
> More help   : https://help.launchpad.net/ListHelp
>
