ubuntu-phone team mailing list archive

Re: Fwd: Re: Desktop file parsing - lets standardize

To: Gerry Boland <gerry.boland@xxxxxxxxxxxxx>
From: Ryan Lortie <desrt@xxxxxxxx>
Date: Tue, 12 Nov 2013 13:29:31 -0500
Cc: ubuntu-phone <ubuntu-phone@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <52825646.2060202@canonical.com>

Hi Gerry,

Thanks for the mails.

On Tue, Nov 12, 2013, at 11:24, Gerry Boland wrote:
> -------- Original Message --------
> From: Thomas Voß <thomas.voss@xxxxxxxxxxxxx>
>
> What is the underlying technology used for the cache and how will it
> be exposed to the rest of the system?

The cache is a simple binary format that is easy to read.  There are no
library dependencies for consuming it.  The reader side is more or less
contained in these 600 lines of C:

  https://github.com/desrt/desktop-file-index/blob/master/dfi-reader.c

The goal was to keep it easy to consume by various desktops without
needing to use 3rd party libraries -- this is why we didn't use GVDB
(for example).

The most complicated data structure contained in it is sorted lists.  We
use binary search for lookups.  This is largely consistent with the
design goal of wanting to do prefix matching for things like MIME types
and (particularly) user-initiated application searches.  We can
binary-search for the prefix and iterate through the list until we hit
an item that doesn't have the prefix.
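
To illustrate the lookup described above, here is a minimal sketch in
Python rather than the C of the actual reader.  It is not the code from
dfi-reader.c; the function name and the sample keys are made up for the
example, and it only assumes a sorted list of strings.

```python
from bisect import bisect_left


def prefix_matches(sorted_keys, prefix):
    """Return every key in sorted_keys that starts with prefix.

    Hypothetical helper: assumes sorted_keys is sorted in plain
    string order, as a sorted list in a cache file would be.
    """
    # Binary search for the first key that is >= the prefix.
    i = bisect_left(sorted_keys, prefix)
    matches = []
    # Iterate until we hit an item that doesn't have the prefix.
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches


# Made-up sample data, sorted once up front.
keys = sorted(["application/pdf", "image/jpeg", "image/png",
               "text/html", "text/plain"])
print(prefix_matches(keys, "image/"))
```

All keys sharing a prefix sit next to each other in sorted order, which
is why one binary search plus a short forward walk is enough.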

The result is pretty fast -- searching through applications takes
single-digit microseconds on my machine.

An additional goal of the cache file is saner treatment of translations.
The current situation on Ubuntu is a pretty huge disaster.  We ship a
patch to GLib (which will never go upstream) that allows us to have the
translations split out of desktop files and put in gettext files.  The
reason here is to facilitate language packs: if we don't install the
language pack for language X then we don't have its translation in the
original desktop file.  The savings here are substantial when multiplied
by ~300 language packs...

The problem is that the reader needs to mmap the gettext .mo file for
every single desktop file it opens in order to look up the translations.
When iterating all the desktop files on your system (for example, as
Unity does) this gets pretty ridiculous.  It doesn't help that these
files can never be unmapped.  If you're using your system in a non-English
locale, take a look at the number of .mo files currently mapped into various
processes that use desktop files... For example, I notice that the .mo
from gnome-control-center (just to take one) ends up being used from
gnome-session, bamf, compiz, update-manager, and the application scope. 
The application scope has 88 .mo files mapped into it...
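
One way to check this yourself on Linux is to read /proc/<pid>/maps,
where the last field of each line is the path of the mapped file.  The
following is a rough sketch; the function names and the sample text are
made up for the example, and it ignores corner cases such as deleted
files.

```python
def mapped_mo_files(maps_text):
    """Return the distinct .mo paths named in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if path.endswith(".mo"):
                paths.add(path)
    return paths


def mapped_mo_files_for_pid(pid):
    """Same as above, reading the maps file of a running process."""
    with open(f"/proc/{pid}/maps") as f:
        return mapped_mo_files(f.read())


# Made-up sample in the /proc/<pid>/maps layout.
sample = """\
7f00a0000000-7f00a0010000 r--p 00000000 08:01 1001 /usr/share/locale/de/LC_MESSAGES/example.mo
7f00a0010000-7f00a0020000 r--p 00000000 08:01 1002 /usr/lib/libexample.so
7f00a0020000-7f00a0030000 rw-p 00000000 00:00 0
"""
print(len(mapped_mo_files(sample)))
```

Calling mapped_mo_files_for_pid() with the pid of a process you own
gives the real set for that process.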

The desktop file index will try to resolve this situation by having all
of the translation mapping done at the time that the cache is built (ie:
all languages in the same file).  This produces a very large file if you
have a lot of language packs installed, but the builder is careful to
group languages together -- so you'll only end up faulting in the
portion of the file that corresponds to your current locale.  I've done
some testing to see which pages actually end up in the kernel page cache
and it's very small.

> If we want to standardize, I would propose to look at the
> implementation that is the most toolkit-/runtime-agnostic one, with
> the fewest dependencies to make sure that higher levels of the stack
> can easily consume it. Ideally something with a very simple API (and
> desktop file parsing is not rocket science, so it should be fairly
> small).

I'm not sure you could call GLib toolkit-agnostic (since it's pretty
clearly tied with Gtk) but it's also designed to be used on its own and
to be very easy to bind (and has more bindings than any other library I
know).  I'd also argue that it's essentially 'no additional dependency'
since we already have it on Ubuntu and it won't be going anywhere for a
very long time...

Despite what I say above about reading the desktop file index boiling
down to about 600 lines of .c, handling desktop files is a very
complicated affair.  There are a lot of non-trivial details in the
specification, and it's hard to get them right against the wide
range of applications in the wild.  There are also considerations like
Ubuntu's language pack handling that we'd have to patch into the new
solution.  Reading the keyfile contents is really only a small part of
dealing with this.

> -------- Original Message --------
> From: Ted Gould <ted@xxxxxxxxxx>
> 
> I don't think that the cache is really useful for our use-cases.  If we
> do want a cache, I think it'd be better to add a Click desktop hook and
> cache what is specifically needed for that case.  For instance, the
> shell needs an icon and name cache, it doesn't need the other data.
> Same for scopes, the exec line is mostly useless, but breaking down the
> fields into the indexer at install time would save time when searching.

glick works by extracting desktop files from the app bundle as they're
installed and running update-desktop-database on them.  This will soon
result in the regeneration of the desktop index as well.  I assume that
Click does (or should do) something similar to this.

There's no need to invent additional magic here specifically for the
case of app bundles...

> It seems to me the most difficult thing here is parsing of the ini-ish
> file format, not creating a higher level abstraction of the keys.  Which
> most of the libraries you mentioned are trying to do.  I see benefit in
> having one parser of the file format, but the rest seems mostly icing
> and is always going to be task specific.

As I mention above, I disagree.  GKeyFile is a very small part of the
complication with desktop files.  It's not as simple as asking "get me
the name" and getting back the value from the keyfile.  You also have
to deal with:

 - finding the files in various directories
 - cross-directory masking when you have multiple files with the same
   basename
 - checking for the case of '-' to '/' in filenames (kde4-kate ->
   kde4/kate.desktop)
 - gettext translation hacks as covered above
 - TryExec/NoDisplay/Hidden handling
 - OnlyShowIn/NotShowIn handling
 - expanding the list of arguments on the format string of the Exec line
 - DBusActivatable
 - desktop actions
 - the insanely complicated mime type and "open this file with default
   app" handling
 - probably about a dozen other things not on my mind at the moment
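
To give a flavour of just two of these items (the '-' to '/' check and
cross-directory masking), here is a simplified sketch.  It is not how
GDesktopAppInfo does it; the function names are made up, the caller is
assumed to pass the applications directories in precedence order, and
everything else on the list is left out.

```python
import itertools
import os


def candidate_paths(desktop_id):
    """Yield relative paths a desktop file ID could refer to.

    Any '-' in the ID may stand for a '/' in the real path, so
    'kde4-kate.desktop' yields itself and 'kde4/kate.desktop'.
    """
    parts = desktop_id.split("-")
    for seps in itertools.product("-/", repeat=len(parts) - 1):
        yield parts[0] + "".join(s + p for s, p in zip(seps, parts[1:]))


def find_desktop_file(desktop_id, app_dirs):
    """Return the first match, searching app_dirs in precedence order.

    A hit in an earlier directory masks any file with the same ID
    in a later one.  Returns None if nothing matches.
    """
    for directory in app_dirs:
        for rel in candidate_paths(desktop_id):
            path = os.path.join(directory, rel)
            if os.path.isfile(path):
                return path
    return None


print(list(candidate_paths("kde4-kate.desktop")))
```

Even this toy version has to try several paths per directory for a
single ID, before any key in the file has been looked at.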

The non-keyfile parts of the spec are _extremely_ non-trivial.  You
really absolutely want to have a robust existing parser to deal with all
of this for you.  I also think that we do want the desktop index instead
of inventing a new format just for Click packages.  In my opinion this
hard-limits us to choosing either GDesktopAppInfo or using the desktop
file component from KDE (that will hopefully be pure Qt or even go
upstream into Qt some day).  Both of these will also include support for
the index soon (as David Faure of KDE has told me he plans to implement
it).  Of the two, I'd obviously favour GDesktopAppInfo from the
standpoint that Gerry's stated goal is to find one solution that we can
use from everywhere.

Cheers