duplicity-team team mailing list archive
Message #03717
[Merge] lp:~mwilck/duplicity/duplicity into lp:duplicity
Martin Wilck has proposed merging lp:~mwilck/duplicity/duplicity into lp:duplicity.
Requested reviews:
duplicity-team (duplicity-team)
For more details, see:
https://code.launchpad.net/~mwilck/duplicity/duplicity/+merge/301268
Glob matching in current duplicity uses a selection function that calls path_matches_glob() for every path. As a result, each time a filename is tested, path_matches_glob() translates the glob expression into the regular expressions used for filename and directory matching all over again.
Instead, my patches make path_matches_glob_fn() return a closure that uses precompiled regular expressions, so the regular expressions are constructed only once, when the selection function is created.
This change speeds up duplicity a *lot* when complex include/exclude lists are in use: in my use case (a dry-run backup of an SSD filesystem), the speedup is roughly a factor of 25 (runtime: about 4 s instead of 90 s).
--
Your team duplicity-team is requested to review the proposed merge of lp:~mwilck/duplicity/duplicity into lp:duplicity.
=== modified file 'duplicity/globmatch.py'
--- duplicity/globmatch.py 2016-06-27 21:12:18 +0000
+++ duplicity/globmatch.py 2016-07-27 13:08:36 +0000
@@ -49,8 +49,9 @@
     return list(map(glob_to_regex, prefixes))
 
 
-def path_matches_glob(path, glob_str, include, ignore_case=False):
-    """Tests whether path matches glob, as per the Unix shell rules, taking as
+def path_matches_glob_fn(glob_str, include, ignore_case=False):
+    """Return a function test_fn(path) which
+    tests whether path matches glob, as per the Unix shell rules, taking as
     arguments a path, a glob string and include (0 indicating that the glob
     string is an exclude glob and 1 indicating that it is an include glob,
     returning:
@@ -83,16 +84,19 @@
     scan_comp_re = re_comp("^(%s)$" %
                            "|".join(_glob_get_prefix_regexs(glob_str)))
 
-    if match_only_dirs and not path.isdir():
-        # If the glob ended with a /, only match directories
-        return None
-    elif glob_comp_re.match(path.name):
-        return include
-    elif include == 1 and scan_comp_re.match(path.name):
-        return 2
-    else:
-        return None
-
+    def test_fn(path):
+
+        if match_only_dirs and not path.isdir():
+            # If the glob ended with a /, only match directories
+            return None
+        elif glob_comp_re.match(path.name):
+            return include
+        elif include == 1 and scan_comp_re.match(path.name):
+            return 2
+        else:
+            return None
+
+    return test_fn
 
 def glob_to_regex(pat):
     """Returned regular expression equivalent to shell glob pat
=== modified file 'duplicity/selection.py'
--- duplicity/selection.py 2016-07-24 12:30:45 +0000
+++ duplicity/selection.py 2016-07-27 13:08:36 +0000
@@ -33,7 +33,7 @@
 from duplicity import diffdir
 from duplicity import util # @Reimport
 from duplicity.globmatch import GlobbingError, FilePrefixError, \
-    path_matches_glob
+    path_matches_glob_fn
 """Iterate exactly the requested files in a directory
@@ -544,13 +544,10 @@
             ignore_case = True
 
         # Check to make sure prefix is ok
-        if not path_matches_glob(self.rootpath, glob_str, include=1):
+        if not path_matches_glob_fn(glob_str, include=1)(self.rootpath):
             raise FilePrefixError(glob_str)
 
-        def sel_func(path):
-            return path_matches_glob(path, glob_str, include, ignore_case)
-
-        return sel_func
+        return path_matches_glob_fn(glob_str, include, ignore_case)
 
     def exclude_older_get_sf(self, date):
         """Return selection function based on files older than modification date """