arsenal-user team mailing list archive

Announcing json_check script


The lpltk 2.0 release includes a new script, 'json_check', which
validates JSON data files.

It can also be used to validate launchpadlib cache files, which is why
I'm specifically announcing it.

If you've cronned launchpadlib scripts, you'll be familiar with the
problem:  out of the blue one day, your script starts randomly failing
with tracebacks that include exceptions like:

  cElementTree.ParseError: unclosed token: line XXXX, column Y
  cElementTree.ParseError: no element found: line XXXX, column Y

  simplejson.decoder.JSONDecodeError: Extra data: line 1 column YYY -
  line 1 column YYY (char AAA - ZZZ)

  httplib.IncompleteRead: IncompleteRead(XYZ bytes read, PDQ more expected)
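
For anyone who wants to see the "Extra data" case in isolation, here is
a minimal sketch.  It uses Python's standard json module rather than
simplejson, and the helper name describe_json_error is made up for this
illustration.  It only shows that otherwise valid JSON followed by a
leftover fragment fails to parse with this kind of error.

```python
import json


def describe_json_error(text):
    """Return the parser's error message, or None if text is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


# A well-formed document, and the same document with a stray copy of its
# tail appended, standing in for a corrupted file.
intact = '{"name": "gnome-bugs", "active": true}'
corrupted = intact + '"active": true}'

print(describe_json_error(intact))     # no error for the intact text
print(describe_json_error(corrupted))  # message begins with "Extra data"
```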

When I diagnosed these failures, the root cause turned out to be cache
corruption (see LP: #545401).  As a workaround, you can simply delete
your cache files.  For example, I used to use a cron job like this:

  # Clear out cruft from launchpadlib cache daily
  01 00 * * * find $HOME/.cache/lpltk/api.launchpad.net/cache/ -type f -mtime +7 -exec rm -f {} \;

However, you may have a LOT of stuff cached, and it's a waste to
discard thousands of cached files when probably only a single file is
broken.

json_check gives a more precise way to resolve the issue:  it finds the
broken cache files, so only those need to be removed.

$ MY_LPLTK_CACHE=~/.cache/lpltk/api.launchpad.net/cache
$ json_check $MY_LPLTK_CACHE

$ tail -n 1 $MY_LPLTK_CACHE/api.launchpad.net,devel,bugs,bugtrackers,gnome-bugs-application,json,36ee5fc239e628d96a8d8e8c7041ac45

{"registrant_link": "https://api.launchpad.net/devel/~dholbach", "contact_details": "bugmaster@xxxxxxxxx", "name": "gnome-bugs", "bug_tracker_type": "Bugzilla", "title": "GNOME Bug Tracker", "watches_collection_link": "https://api.launchpad.net/devel/bugs/bugtrackers/gnome-bugs/watches", "has_lp_plugin": null, "web_link": "https://bugs.launchpad.net/bugs/bugtrackers/gnome-bugs", "base_url": "https://bugzilla.gnome.org", "active": true, "self_link": "https://api.launchpad.net/devel/bugs/bugtrackers/gnome-bugs", "http_etag": "\"df5ab1003193f3b363d236eb54e4aa7dc0b38f5c-1063d5d40b083af2f35211ede4c4d7af6d06a83a\"", "summary": "The GNOME bug database and tracking system. It is used to to track bug reports and requests for enhancements for the GNOME Desktop, and related software such as  GTK+.", "resource_type_link": "https://api.launchpad.net/devel/#bug_tracker", "base_url_aliases": ["http://bugs.gnome.org/", "https://bugs.gnome.org/"]}"base_url_aliases": ["http://bugs.gnome.org/", "https://bugs.gnome.org/"]}

$ cd $MY_LPLTK_CACHE && json_check $MY_LPLTK_CACHE | xargs rm -v

removed `api.launchpad.net,devel,bugs,bugtrackers,gnome-bugs-application,json,36ee5fc239e628d96a8d8e8c7041ac45'
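
If you're curious what a check like this involves, here is a rough
sketch in Python.  To be clear, this is NOT json_check's actual
implementation, which I'm not reproducing here.  The function names are
invented, and it rests on two assumptions:  that the JSON payload is the
last line of each cache file (as the tail -n 1 output above suggests),
and that JSON cache files can be recognized by "json" appearing in the
file name.  Empty files are treated as broken, which is also a guess.

```python
import json
import os


def last_line_is_valid_json(path):
    """Return True if the final non-blank line of the file parses as JSON."""
    with open(path, "rb") as handle:
        lines = handle.read().splitlines()
    # Ignore trailing blank lines so a final newline does not matter.
    while lines and not lines[-1].strip():
        lines.pop()
    if not lines:
        return False  # assumption: an empty file counts as broken
    try:
        json.loads(lines[-1].decode("utf-8"))
    except ValueError:
        # Both JSON parse errors and UTF-8 decode errors are ValueErrors.
        return False
    return True


def find_broken_json_files(cache_dir):
    """Return sorted names of files in cache_dir that fail the check."""
    broken = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        # Assumption: only names containing "json" hold JSON payloads.
        if os.path.isfile(path) and "json" in name:
            if not last_line_is_valid_json(path):
                broken.append(name)
    return broken
```

Calling find_broken_json_files() on a cache directory returns bare file
names, so a caller could print them one per line and feed them to
something like the xargs rm -v step above.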

So, now my cronjob looks like this:

SHELL = /bin/bash
PATH = ~/bin:/usr/bin/:/bin
LPLTK_CACHE = $HOME/.cache/lpltk/api.launchpad.net/cache

# Clear broken cache files from launchpadlib cache hourly
 01  * * * * cd $LPLTK_CACHE && json_check $LPLTK_CACHE | xargs rm -v

The json_check script can take a minute or so to run on a very full
cache, so I cron all my other lpltk scripts to start running after
minute 05.