
duplicity-team team mailing list archive

[Merge] lp:~mwilck/duplicity/duplicity into lp:duplicity

 

Martin Wilck has proposed merging lp:~mwilck/duplicity/duplicity into lp:duplicity.

Requested reviews:
  duplicity-team (duplicity-team)

For more details, see:
https://code.launchpad.net/~mwilck/duplicity/duplicity/+merge/301268

Glob matching in current duplicity uses a selector function that calls path_matches_glob(). As a result, every time a filename is tested, path_matches_glob() converts the glob expression into the regular expressions for filename and directory matching all over again.

My proposed patches instead create a closure that uses precalculated regular expressions, so the regular expressions are constructed only once, at initialization time.
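
As a rough illustration of the idea (not the patch itself, which is in the diff below), the following sketch compiles a glob once and returns a closure that only runs the cheap regex match per name. The name make_glob_matcher is hypothetical, and the standard library's fnmatch.translate() is used here as a stand-in for duplicity's own glob_to_regex(), whose matching rules differ (for example in how "*" treats "/").

```python
import fnmatch
import re


def make_glob_matcher(glob_str, include, ignore_case=False):
    """Hypothetical stand-in for path_matches_glob_fn: compile once, match often."""
    flags = re.IGNORECASE if ignore_case else 0
    # The glob-to-regex translation and compilation happen here, exactly once.
    # Assumption: fnmatch.translate() replaces duplicity's glob_to_regex().
    glob_re = re.compile(fnmatch.translate(glob_str), flags)

    def test_fn(name):
        # Per-name work is reduced to a single precompiled regex match.
        return include if glob_re.match(name) else None

    return test_fn


# Build the selector once, then reuse it for every file name.
sel_func = make_glob_matcher("*.py", include=1)
results = [sel_func(name) for name in ("setup.py", "README")]
```

The closure keeps glob_re alive between calls, so the cost of building the regular expression no longer scales with the number of files scanned.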

This change speeds up duplicity a *lot* when complex include/exclude lists are in use: for my use case (dry-run, backup of an SSD filesystem), the speedup is a factor of 25 (runtime: 4s rather than 90s).

-- 
Your team duplicity-team is requested to review the proposed merge of lp:~mwilck/duplicity/duplicity into lp:duplicity.
=== modified file 'duplicity/globmatch.py'
--- duplicity/globmatch.py	2016-06-27 21:12:18 +0000
+++ duplicity/globmatch.py	2016-07-27 13:08:36 +0000
@@ -49,8 +49,9 @@
     return list(map(glob_to_regex, prefixes))
 
 
-def path_matches_glob(path, glob_str, include, ignore_case=False):
-    """Tests whether path matches glob, as per the Unix shell rules, taking as
+def path_matches_glob_fn(glob_str, include, ignore_case=False):
+    """Return a function test_fn(path) which
+    tests whether path matches glob, as per the Unix shell rules, taking as
     arguments a path, a glob string and include (0 indicating that the glob
     string is an exclude glob and 1 indicating that it is an include glob,
     returning:
@@ -83,16 +84,19 @@
     scan_comp_re = re_comp("^(%s)$" %
                            "|".join(_glob_get_prefix_regexs(glob_str)))
 
-    if match_only_dirs and not path.isdir():
-        # If the glob ended with a /, only match directories
-        return None
-    elif glob_comp_re.match(path.name):
-        return include
-    elif include == 1 and scan_comp_re.match(path.name):
-        return 2
-    else:
-        return None
-
+    def test_fn(path):
+
+        if match_only_dirs and not path.isdir():
+            # If the glob ended with a /, only match directories
+            return None
+        elif glob_comp_re.match(path.name):
+            return include
+        elif include == 1 and scan_comp_re.match(path.name):
+            return 2
+        else:
+            return None
+
+    return test_fn
 
 def glob_to_regex(pat):
     """Returned regular expression equivalent to shell glob pat

=== modified file 'duplicity/selection.py'
--- duplicity/selection.py	2016-07-24 12:30:45 +0000
+++ duplicity/selection.py	2016-07-27 13:08:36 +0000
@@ -33,7 +33,7 @@
 from duplicity import diffdir
 from duplicity import util  # @Reimport
 from duplicity.globmatch import GlobbingError, FilePrefixError, \
-    path_matches_glob
+    path_matches_glob_fn
 
 """Iterate exactly the requested files in a directory
 
@@ -544,13 +544,10 @@
             ignore_case = True
 
         # Check to make sure prefix is ok
-        if not path_matches_glob(self.rootpath, glob_str, include=1):
+        if not path_matches_glob_fn(glob_str, include=1)(self.rootpath):
             raise FilePrefixError(glob_str)
 
-        def sel_func(path):
-            return path_matches_glob(path, glob_str, include, ignore_case)
-
-        return sel_func
+        return path_matches_glob_fn(glob_str, include, ignore_case)
 
     def exclude_older_get_sf(self, date):
         """Return selection function based on files older than modification date """

