
getdeb-collaboration team mailing list archive

New "mirror-selector"

 

Hello,
I have done some research on the mirrorbrain tool (thanks to Christoph for
the ninja packaging).

Mirrorbrain determines file availability by fully scanning each mirror
(collecting file sizes and last-modified times); scanning our slowest
mirror took more than 10 minutes.
I am a bit disappointed because I was hoping for on-demand availability
detection.
I believe on-demand checking is more efficient despite the added response
time: instead of scanning the entire mirror, we only fetch information for
the files currently being requested. The response-time cost of issuing an
HTTP HEAD request before redirecting the user to a mirror can be minimized
with a cache. Unlike mirrorbrain, no database is required, although,
depending on the caching policy, a database might be a good option for the
cache.
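A minimal sketch of an on-demand check with a cache, assuming that "available" means the mirror answers a HEAD request with status 200 and a Content-Length equal to the local file's size (comparing Last-Modified as well would be a stricter variant). The names `remote_size_matches` and `CachedChecker`, the size-only criterion and the 300-second TTL are illustrative assumptions, not mirror-selector's actual code.

```python
import http.client
import time
from typing import Callable, Dict, Tuple


def remote_size_matches(host: str, path: str, local_size: int, timeout: float = 10.0) -> bool:
    """Send one HEAD request and compare Content-Length with the local file size."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # HEAD responses have no body, but drain the response anyway
        length = resp.getheader("Content-Length")
        return resp.status == 200 and length is not None and int(length) == local_size
    except (OSError, http.client.HTTPException, ValueError):
        # Unreachable mirror, malformed response or bad header: treat as unavailable
        return False
    finally:
        conn.close()


class CachedChecker:
    """Remember results per (host, path, size) for `ttl` seconds (hypothetical policy)."""

    def __init__(self, check: Callable[[str, str, int], bool], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.check = check
        self.ttl = ttl
        self.clock = clock
        self._cache: Dict[Tuple[str, str, int], Tuple[float, bool]] = {}

    def is_available(self, host: str, path: str, local_size: int) -> bool:
        # The local size is part of the key, so a changed local file forces a new check
        key = (host, path, local_size)
        now = self.clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: no network round trip
        result = self.check(host, path, local_size)
        self._cache[key] = (now, result)
        return result
```

Usage would be along the lines of `CachedChecker(remote_size_matches).is_available("mirror.example.org", "/pool/foo.deb", size)`, where the host name is hypothetical. Only the first request for a file pays for the HEAD round trip; later requests within the TTL are answered from memory.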

GetDeb's mirror selection has always been based on on-demand checks, using a
PHP script that checked the remote mirrors. It was highly inefficient
because the checks were done separately in each web server thread and there
was no cache.

Because I strongly believe in the technical merit of on-demand checking, I
decided to implement a mirror selection system from scratch in Python.
The project is called "mirror-selector". It runs as a standalone HTTP
server whose only purpose is to handle GET requests for static files: it
checks that the requested file exists in a local directory (so it must run
on a local mirror) and then redirects the client to a mirror after
verifying that an exact copy of the file is available there.
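The request flow above can be sketched with Python's standard `http.server`: answer 404 when the file is not in the local mirror, otherwise redirect (302) to the first mirror that passes an availability check. `resolve_local`, `make_handler`, the first-match mirror order and the 503 fallback when no mirror qualifies are hypothetical choices, not taken from mirror-selector.

```python
import os
import posixpath
from http.server import BaseHTTPRequestHandler
from typing import Callable, List, Optional, Tuple
from urllib.parse import quote, unquote, urlsplit


def resolve_local(root: str, raw_path: str) -> Optional[Tuple[str, str]]:
    """Return (normalized URL path, local file), or None if the file is not mirrored locally."""
    url_path = posixpath.normpath(unquote(urlsplit(raw_path).path))
    if not url_path.startswith("/") or ".." in url_path.split("/"):
        return None  # reject relative paths and directory traversal
    local_file = os.path.join(root, url_path.lstrip("/"))
    return (url_path, local_file) if os.path.isfile(local_file) else None


def make_handler(root: str, mirrors: List[str],
                 is_available: Callable[[str, str, int], bool]):
    """Build a handler class bound to a local root, a mirror list and a check function."""

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            found = resolve_local(root, self.path)
            if found is None:
                self.send_error(404, "Not on the local mirror")
                return
            url_path, local_file = found
            size = os.path.getsize(local_file)
            for base in mirrors:  # first available mirror wins (assumed policy)
                if is_available(base, url_path, size):
                    self.send_response(302)
                    self.send_header("Location", base + quote(url_path))
                    self.send_header("Content-Length", "0")
                    self.end_headers()
                    return
            self.send_error(503, "No mirror has an exact copy")  # assumed fallback

    return RedirectHandler
```

The handler can be served with `ThreadingHTTPServer(("", 8080), make_handler("/srv/mirror", mirrors, check)).serve_forever()`, where the path, port and `check` (any function taking a mirror base URL, a URL path and a size) are placeholders. Note that `ThreadingHTTPServer` starts a thread per request rather than using the fixed-size pool mirror-selector uses.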

The HTTP server uses a fixed-size thread pool; each client request is
handled by one of the pool's HTTP server threads.
When mirror-selector starts, it launches one thread per mirror; each
mirror thread provides an input queue that any HTTP server thread may
use.
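The dispatch described above can be sketched with `queue.Queue`: one worker thread per mirror owns an input queue, and an HTTP thread submits a job and blocks on a private reply queue. `MirrorWorker`, `submit`, the `check` callback and the timeout policy are illustrative assumptions; the post does not describe mirror-selector's actual queue protocol.

```python
import queue
import threading
from typing import Callable, Optional, Tuple

Job = Tuple[str, int, "queue.Queue[bool]"]  # (url_path, local_size, reply queue)


class MirrorWorker(threading.Thread):
    """One thread per mirror: every check for that mirror runs on this thread."""

    def __init__(self, base: str, check: Callable[[str, str, int], bool]):
        super().__init__(name=f"mirror:{base}", daemon=True)
        self.base = base
        self.check = check
        self.inbox: "queue.Queue[Optional[Job]]" = queue.Queue()

    def run(self) -> None:
        while True:
            job = self.inbox.get()
            if job is None:  # sentinel: stop the worker
                break
            url_path, size, reply = job
            try:
                reply.put(self.check(self.base, url_path, size))
            except Exception:
                reply.put(False)  # a failing check counts as "unavailable"

    def submit(self, url_path: str, size: int, timeout: float = 15.0) -> bool:
        """Called from an HTTP thread: enqueue a job and wait for the answer."""
        reply: "queue.Queue[bool]" = queue.Queue(maxsize=1)
        self.inbox.put((url_path, size, reply))
        try:
            return reply.get(timeout=timeout)
        except queue.Empty:
            return False  # treat a slow mirror as unavailable (assumed policy)

    def stop(self) -> None:
        self.inbox.put(None)
```

Because each worker drains its own queue serially, concurrent requests for the same mirror never run checks in parallel; that serialization is what the following paragraph relies on.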
With this architecture, all requests for a given mirror are handled on a
single thread, which makes it easy to reuse the same TCP connection for
multiple requests via HTTP/1.1 Keep-Alive. The cache is also simpler to
implement because it works on a per-thread basis.
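Connection reuse and the per-thread cache could look like this, assuming plain HTTP and that the object is only ever used from its mirror's thread. `KeepAliveChecker` is a hypothetical name; it keeps one `http.client.HTTPConnection` open across HEAD requests, reconnects once if the mirror has dropped the idle socket, and stores results in a plain dict with no lock, which is safe only because a single thread touches it.

```python
import http.client
from typing import Dict, Tuple


class KeepAliveChecker:
    """Owned by one mirror thread: a reusable HTTP/1.1 connection plus an unlocked cache."""

    def __init__(self, host: str, port: int = 80, timeout: float = 10.0):
        self.conn = http.client.HTTPConnection(host, port, timeout=timeout)
        self.cache: Dict[Tuple[str, int], bool] = {}

    def _head(self, path: str) -> http.client.HTTPResponse:
        # http.client speaks HTTP/1.1 and keeps the socket open between requests
        self.conn.request("HEAD", path)
        resp = self.conn.getresponse()
        resp.read()  # the response must be consumed before the next request
        return resp

    def is_available(self, path: str, size: int) -> bool:
        key = (path, size)
        if key in self.cache:
            return self.cache[key]  # cache hit: no round trip to the mirror
        try:
            try:
                resp = self._head(path)
            except (BrokenPipeError, ConnectionResetError):
                # The mirror closed the idle keep-alive socket (RemoteDisconnected is a
                # ConnectionResetError subclass): close, let http.client reconnect, retry once.
                self.conn.close()
                resp = self._head(path)
            length = resp.getheader("Content-Length")
            ok = resp.status == 200 and length is not None and int(length) == size
        except (OSError, http.client.HTTPException, ValueError):
            self.conn.close()
            return False  # transient failure: report unavailable but do not cache it
        self.cache[key] = ok
        return ok
```

This cache never expires and grows without bound, so a real deployment would need an eviction or expiry policy; the point of the sketch is only that one thread, one connection and one dict need no synchronization.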

The code is available on Launchpad: bzr branch lp:mirror-selector (see
the README to test it). It should be considered alpha quality.

GetDeb's and PlayDeb's main archive pool has already been switched to
mirror-selector; we may temporarily switch back to the legacy selector if
serious problems are found.

To check whether it is up and to see some statistics:
http://archive.getdeb.net/status/

Thanks

-- 
João Luís Marques Pinto
GetDeb Team Leader
http://www.getdeb.net
http://blog.getdeb.net