← Back to team overview

duplicity-team team mailing list archive

[Merge] lp:~aaron-whitehouse/duplicity/08-merge-glob-parsers into lp:duplicity

 

Aaron Whitehouse has proposed merging lp:~aaron-whitehouse/duplicity/08-merge-glob-parsers into lp:duplicity.

Commit message:
* Use a single code path for glob strings whether or not these contain special characters/wildcards (glob_get_normal_sf) and remove glob_get_filename_sf and glob_get_tuple_sf.
* Remove run-tests-ve as this was identical to run-tests.

Requested reviews:
  duplicity-team (duplicity-team)

For more details, see:
https://code.launchpad.net/~aaron-whitehouse/duplicity/08-merge-glob-parsers/+merge/315851

Move to using a single code path for glob strings whether or not these contain special characters/wildcards. 

Note that a few test cases were changed in this patch, i.e. behaviour of the code has changed. In my view these changes were corrections, where the selection function was returning the incorrect value (e.g. an include, rather than a scan).

In my testing there has been no reduction in performance by using a parser that supports wildcards for all glob strings.

* Make globs of "/" work with globbing code (previously these were handled by non-globbing code).
* Make globs of "/" match everything (equivalent to "/**").
* Move to using single (globing, glob_get_normal_sf) function for glob strings with and without globbing characters. No performance impact.
* Removed unused non-globbing code (_glob_get_filename_sf and _glob_get_tuple_sf).
* Add more scan tests.

Minor ancillary change that were made:
* Removed run-tests-ve, which was identical to run-tests.

There is substantial refactoring that I plan to do to both the testing and substantive code, but I will do that as a separate branch to keep things tidy.
-- 
Your team duplicity-team is requested to review the proposed merge of lp:~aaron-whitehouse/duplicity/08-merge-glob-parsers into lp:duplicity.
=== modified file 'duplicity/globmatch.py'
--- duplicity/globmatch.py	2016-12-11 16:53:12 +0000
+++ duplicity/globmatch.py	2017-01-29 21:31:37 +0000
@@ -59,10 +59,15 @@
     1 - if the file should be included
     2 - if the folder should be scanned for any included/excluded files
     None - if the selection function has nothing to say about the file
+
+    Note: including a folder implicitly includes everything within it.
     """
     glob_ends_w_slash = False
 
-    if glob_str != "/" and glob_str[-1] == "/":
+    if glob_str == "/":
+        # If the glob string is '/', it implicitly includes everything
+        glob_str = "/**"
+    elif glob_str[-1] == "/":
         glob_ends_w_slash = True
         # Remove trailing / from directory name (unless that is the entire
         # string)
@@ -80,7 +85,8 @@
     # string translated into regex
     # ($|/) nothing must follow except for the end of the string, newline or /
     # Note that the "/" at the end of the regex means that it will match
-    # if the glob matches a parent folders of path
+    # if the glob matches a parent folders of path, i.e. including a folder
+    # includes everything within it.
     glob_comp_re = re_comp("^%s($|/)" % glob_to_regex(glob_str))
 
     if glob_ends_w_slash:
@@ -101,6 +107,11 @@
                            "|".join(_glob_get_prefix_regexs(glob_str)))
 
     def test_fn(path):
+        assert not path.name[-1] == "/" or path.name == "/", \
+            "path.name should never end in '/' during normal operation for " \
+            "normal paths (except '/' alone)\n" \
+            "path.name here is " + path.name + " and glob is " + glob_str
+
         if glob_comp_re.match(path.name):
             # Path matches glob, or is contained within a matching folder
             if not glob_ends_w_slash:

=== modified file 'duplicity/selection.py'
--- duplicity/selection.py	2016-12-26 21:15:37 +0000
+++ duplicity/selection.py	2017-01-29 21:31:37 +0000
@@ -429,9 +429,6 @@
         assert include == 0 or include == 1
         if glob_str == "**":
             sel_func = lambda path: include
-        elif not self.glob_re.match(glob_str):
-            # normal file
-            sel_func = self.glob_get_filename_sf(glob_str, include)
         else:
             sel_func = self.glob_get_normal_sf(glob_str, include)
 
@@ -477,66 +474,6 @@
                         (include and "include-if-present" or "exclude-if-present", filename)
         return sel_func
 
-    def glob_get_filename_sf(self, filename, include):
-        """Get a selection function given a normal filename
-
-        Some of the parsing is better explained in
-        filelist_parse_line.  The reason this is split from normal
-        globbing is things are a lot less complicated if no special
-        globbing characters are used.
-
-        """
-        # Internal. Used by glob_get_sf and unit tests.
-        # ToDo: Make all globbing/non-globbing use same code path
-        # This distinction has bitten us too many times with bugs in one or
-        # the other.
-        match_only_dirs = False
-
-        if filename != "/" and filename[-1] == "/":
-            match_only_dirs = True
-            # Remove trailing / from directory name (unless that is the entire
-            # string)
-            filename = filename[:-1]
-
-        if not filename.startswith(self.prefix):
-            raise FilePrefixError(filename)
-        index = tuple(filter(lambda x: x,
-                             filename[len(self.prefix):].split("/")))
-        return self.glob_get_tuple_sf(index, include, match_only_dirs)
-
-    def glob_get_tuple_sf(self, tuple, include, match_only_dirs=False):
-        """Return selection function based on tuple"""
-        # Internal. Used by glob_get_filename_sf.
-
-        def include_sel_func(path):
-            if len(tuple) == len(path.index) and match_only_dirs and not path.isdir():
-                # If we are assessing the actual directory (rather than the
-                # contents of the directory) and the glob ended with a /,
-                # only match directories
-                return None
-            elif (path.index == tuple[:len(path.index)] or
-                    path.index[:len(tuple)] == tuple):
-                return 1  # /foo/bar implicitly matches /foo, vice-versa
-            else:
-                return None
-
-        def exclude_sel_func(path):
-            if match_only_dirs and not path.isdir():
-                # If the glob ended with a /, only match directories
-                return None
-            elif path.index[:len(tuple)] == tuple:
-                return 0  # /foo excludes /foo/bar, not vice-versa
-            else:
-                return None
-
-        if include == 1:
-            sel_func = include_sel_func
-        elif include == 0:
-            sel_func = exclude_sel_func
-        sel_func.exclude = not include
-        sel_func.name = "Tuple select %s" % (tuple,)
-        return sel_func
-
     def glob_get_normal_sf(self, glob_str, include):
         """Return selection function based on glob_str
 
@@ -562,9 +499,13 @@
             glob_str = glob_str[len("ignorecase:"):].lower()
             ignore_case = True
 
-        # Check to make sure prefix is ok
-        if not path_matches_glob_fn(glob_str, include=1)(self.rootpath):
-            raise FilePrefixError(glob_str)
+        # Check to make sure prefix is ok, i.e. the glob string is within
+        # the root folder being backed up
+        file_prefix_selection = path_matches_glob_fn(glob_str, include=1)(self.rootpath)
+        if not file_prefix_selection:
+            # file_prefix_selection == 1 (include) or 2 (scan)
+            raise FilePrefixError(glob_str + " glob with " + self.rootpath.name +
+                                  " path gives " + str(file_prefix_selection))
 
         return path_matches_glob_fn(glob_str, include, ignore_case)
 

=== modified file 'testing/functional/test_selection.py'
--- testing/functional/test_selection.py	2017-01-15 18:59:42 +0000
+++ testing/functional/test_selection.py	2017-01-29 21:31:37 +0000
@@ -1088,5 +1088,36 @@
         restored = self.directory_tree_to_list_of_lists(restore_dir)
         self.assertEqual(restored, [])
 
+
+class TestAbsolutePaths(IncludeExcludeFunctionalTest):
+    """ Tests include/exclude options with absolute paths"""
+
+    def test_absolute_paths_non_globbing(self):
+        """ Test --include and --exclude work with absolute paths"""
+        self.backup("full", os.path.abspath("testfiles/select2"),
+                    options=["--include", os.path.abspath("testfiles/select2/3/3sub3/3sub3sub2/3sub3sub2_file.txt"),
+                             "--exclude", os.path.abspath("testfiles/select2/3/3sub3/3sub3sub2"),
+                             "--include", os.path.abspath("testfiles/select2/3/3sub2/3sub2sub2"),
+                             "--include", os.path.abspath("testfiles/select2/3/3sub3"),
+                             "--exclude", os.path.abspath("testfiles/select2/3/3sub1"),
+                             "--exclude", os.path.abspath("testfiles/select2/2/2sub1/2sub1sub3"),
+                             "--exclude", os.path.abspath("testfiles/select2/2/2sub1/2sub1sub2"),
+                             "--include", os.path.abspath("testfiles/select2/2/2sub1"),
+                             "--exclude", os.path.abspath("testfiles/select2/1/1sub3/1sub3sub2"),
+                             "--exclude", os.path.abspath("testfiles/select2/1/1sub3/1sub3sub1"),
+                             "--exclude", os.path.abspath("testfiles/select2/1/1sub2/1sub2sub3"),
+                             "--include", os.path.abspath("testfiles/select2/1/1sub2/1sub2sub1"),
+                             "--exclude", os.path.abspath("testfiles/select2/1/1sub1/1sub1sub3/1sub1sub3_file.txt"),
+                             "--exclude", os.path.abspath("testfiles/select2/1/1sub1/1sub1sub2"),
+                             "--exclude", os.path.abspath("testfiles/select2/1/1sub2"),
+                             "--include", os.path.abspath("testfiles/select2/1.py"),
+                             "--include", os.path.abspath("testfiles/select2/3"),
+                             "--include", os.path.abspath("testfiles/select2/1"),
+                             "--exclude", os.path.abspath("testfiles/select2/**")])
+        self.restore()
+        restore_dir = 'testfiles/restore_out'
+        restored = self.directory_tree_to_list_of_lists(restore_dir)
+        self.assertEqual(restored, self.expected_restored_tree)
+
 if __name__ == "__main__":
     unittest.main()

=== removed file 'testing/run-tests-ve'
--- testing/run-tests-ve	2015-07-31 14:49:46 +0000
+++ testing/run-tests-ve	1970-01-01 00:00:00 +0000
@@ -1,23 +0,0 @@
-#!/bin/bash
-#
-# Copyright 2002 Ben Escoto <ben@xxxxxxxxxxx>
-# Copyright 2007 Kenneth Loafman <kenneth@xxxxxxxxxxx>
-# Copyright 2011 Canonical Ltd
-#
-# This file is part of duplicity.
-#
-# Duplicity is free software; you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by the
-# Free Software Foundation; either version 2 of the License, or (at your
-# option) any later version.
-#
-# Duplicity is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with duplicity; if not, write to the Free Software Foundation,
-# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-
-tox

=== modified file 'testing/unit/test_globmatch.py'
--- testing/unit/test_globmatch.py	2016-12-11 16:53:12 +0000
+++ testing/unit/test_globmatch.py	2017-01-29 21:31:37 +0000
@@ -44,6 +44,25 @@
         assert r == "[a*b-c]e[^]]", r
 
 
+class TestMiscellaneousGlobMatching(UnitTestCase):
+    """Test various glob matching"""
+
+    def test_glob_scans_parent_directories(self):
+        """Test glob scans parent"""
+        with patch('duplicity.path.Path.isdir') as mock_isdir:
+            mock_isdir.return_value = True
+            # Note: paths ending in '/' do not work!
+            # self.assertEqual(
+            #     path_matches_glob_fn("testfiles/parent/sub/", 1)(Path(
+            #         "testfiles/parent/")), 2)
+            self.assertEqual(
+                path_matches_glob_fn("testfiles/parent/sub", 1)(Path(
+                    "testfiles/parent")), 2)
+            self.assertEqual(
+                path_matches_glob_fn("testfiles/select2/3/3sub2", 1)(Path(
+                "testfiles/select2/3")), 2)
+
+
 class TestDoubleAsteriskOnIncludesExcludes(UnitTestCase):
     """Test ** on includes and exclude patterns"""
 
@@ -163,29 +182,26 @@
             mock_isdir.return_value = False
             self.assertEqual(
                 path_matches_glob_fn("fold*/", 1)(Path("folder/file.txt")), 1)
-    #
-    # def test_slash_matches_everything(self):
-    #     """Test / matches everything"""
-    #     # ToDo: Not relevant at this stage, as "/" would not go through
-    #     # globmatch because it has no special characters, but it should be
-    #     # made to work
-    #     with patch('duplicity.path.Path.isdir') as mock_isdir:
-    #         mock_isdir.return_value = True
-    #         self.assertEqual(
-    #             path_matches_glob_fn("/",
-    #                                  1)(Path("/tmp/testfiles/select/1/2")), 1)
-    #         self.assertEqual(
-    #             path_matches_glob_fn("/",
-    #                                  1)(Path("/test/random/path")), 1)
-    #         self.assertEqual(
-    #             path_matches_glob_fn("/",
-    #                                  1)(Path("/")), 1)
-    #         self.assertEqual(
-    #             path_matches_glob_fn("/",
-    #                                  1)(Path("/var/log")), 1)
-    #         self.assertEqual(
-    #             path_matches_glob_fn("/",
-    #                                  1)(Path("/var/log/log.txt")), 1)
+
+    def test_slash_matches_everything(self):
+        """Test / matches everything"""
+        with patch('duplicity.path.Path.isdir') as mock_isdir:
+            mock_isdir.return_value = True
+            self.assertEqual(
+                path_matches_glob_fn("/",
+                                     1)(Path("/tmp/testfiles/select/1/2")), 1)
+            self.assertEqual(
+                path_matches_glob_fn("/",
+                                     1)(Path("/test/random/path")), 1)
+            self.assertEqual(
+                path_matches_glob_fn("/",
+                                     1)(Path("/")), 1)
+            self.assertEqual(
+                path_matches_glob_fn("/",
+                                     1)(Path("/var/log")), 1)
+            self.assertEqual(
+                path_matches_glob_fn("/",
+                                     1)(Path("/var/log/log.txt")), 1)
 
     def test_slash_star_scans_folder(self):
         """Test that folder/* scans folder/"""

=== modified file 'testing/unit/test_selection.py'
--- testing/unit/test_selection.py	2016-12-11 16:53:12 +0000
+++ testing/unit/test_selection.py	2017-01-29 21:31:37 +0000
@@ -67,12 +67,12 @@
             # with patch('duplicity.path.ROPath.isdir', return_value=True):
             # as build system's mock is too old to support it.
 
-            assert sf2(self.makeext("usr")) == 1
-            assert sf2(self.makeext("usr/local")) == 1
-            assert sf2(self.makeext("usr/local/bin")) == 1
-            assert sf2(self.makeext("usr/local/doc")) is None
-            assert sf2(self.makeext("usr/local/bin/gzip")) == 1
-            assert sf2(self.makeext("usr/local/bingzip")) is None
+            self.assertEqual(sf2(self.makeext("usr")), 2)
+            self.assertEqual(sf2(self.makeext("usr/local")), 2)
+            self.assertEqual(sf2(self.makeext("usr/local/bin")), 1)
+            self.assertEqual(sf2(self.makeext("usr/local/doc")), None)
+            self.assertEqual(sf2(self.makeext("usr/local/bin/gzip")), 1)
+            self.assertEqual(sf2(self.makeext("usr/local/bingzip")), None)
 
     def test_tuple_exclude(self):
         """Test exclude selection function made from a regular filename"""
@@ -138,13 +138,11 @@
         assert select.glob_get_sf("**.py", 1)(Path("/")) == 2
         assert select.glob_get_sf("**.py", 1)(Path("foo")) == 2
         assert select.glob_get_sf("**.py", 1)(Path("usr/local/bin")) == 2
-        assert select.glob_get_sf("/testfiles/select/**.py", 1)(Path("/testfiles/select/")) == 2
-        assert select.glob_get_sf("/testfiles/select/test.py", 1)(Path("/testfiles/select/")) == 1
-        assert select.glob_get_sf("/testfiles/select/test.py", 0)(Path("/testfiles/select/")) is None
-        # assert select.glob_get_normal_sf("/testfiles/se?ect/test.py", 1)(Path("/testfiles/select/")) is None
-        # ToDo: Not sure that the above is sensible behaviour (at least that it differs from a non-globbing
-        # include)
-        assert select.glob_get_normal_sf("/testfiles/select/test.py", 0)(Path("/testfiles/select/")) is None
+        assert select.glob_get_sf("/testfiles/select/**.py", 1)(Path("/testfiles/select")) == 2
+        assert select.glob_get_sf("/testfiles/select/test.py", 1)(Path("/testfiles/select")) == 2
+        assert select.glob_get_normal_sf("/testfiles/se?ect/test.py", 1)(Path("/testfiles/select")) == 2
+        assert select.glob_get_sf("/testfiles/select/test.py", 0)(Path("/testfiles/select")) is None
+        assert select.glob_get_normal_sf("/testfiles/select/test.py", 0)(Path("/testfiles/select")) is None
 
     def test_ignore_case(self):
         """test_ignore_case - try a few expressions with ignorecase:"""
@@ -160,11 +158,11 @@
         root = Path("/")
         select = Select(root)
 
-        assert select.glob_get_sf("/", 1)(root) == 1
-        assert select.glob_get_sf("/foo", 1)(root) == 1
-        assert select.glob_get_sf("/foo/bar", 1)(root) == 1
-        assert select.glob_get_sf("/", 0)(root) == 0
-        assert select.glob_get_sf("/foo", 0)(root) is None
+        self.assertEqual(select.glob_get_sf("/", 1)(root), 1)
+        self.assertEqual(select.glob_get_sf("/foo", 1)(root), 2)
+        self.assertEqual(select.glob_get_sf("/foo/bar", 1)(root), 2)
+        self.assertEqual(select.glob_get_sf("/", 0)(root), 0)
+        self.assertEqual(select.glob_get_sf("/foo", 0)(root), None)
 
         assert select.glob_get_sf("**.py", 1)(root) == 2
         assert select.glob_get_sf("**", 1)(root) == 1
@@ -809,7 +807,7 @@
 
     def test_exclude_after_scan(self):
         """Test select with an exclude after a pattern that would return a scan for that file"""
-        self.root = Path("testfiles/select2/3/")
+        self.root = Path("testfiles/select2/3")
         self.ParseTest([("--include", "testfiles/select2/3/**file.txt"),
                         ("--exclude", "testfiles/select2/3/3sub2"),
                         ("--include", "testfiles/select2/3/3sub1"),
@@ -892,7 +890,8 @@
 
     def test_glob_get_normal_sf_exclude_root(self):
         """Test simple exclude with / as the glob."""
-        self.assertEqual(self.exclude_glob_tester("/.git", "/"), None)
+        self.assertEqual(self.exclude_glob_tester("/.git", "/"), 0)
+        self.assertEqual(self.exclude_glob_tester("/testfile", "/"), 0)
 
     def test_glob_get_normal_sf_2(self):
         """Test same behaviour as the functional test test_globbing_replacement."""
@@ -954,7 +953,6 @@
     def test_glob_get_normal_sf_3_double_asterisks_dirs_to_scan(self):
         """Test double asterisk (**) replacement in glob_get_normal_sf with directories that should be scanned"""
         # The new special pattern, **, expands to any string of characters whether or not it contains "/".
-        self.assertEqual(self.include_glob_tester("/long/example/path/", "/**/hello.txt"), 2)
         self.assertEqual(self.include_glob_tester("/long/example/path", "/**/hello.txt"), 2)
 
     def test_glob_get_normal_sf_3_ignorecase(self):
@@ -976,5 +974,55 @@
         self.assertEqual(self.exclude_glob_tester("TEstFiles/SeLect2/2", "ignorecase:t?stFile**ect2/2",
                                                   "testfiles/select2"), 0)
 
+    def test_glob_dirs_to_scan(self):
+        """Test parent directories are marked as needing to be scanned"""
+        with patch('duplicity.path.Path.isdir') as mock_isdir:
+            mock_isdir.return_value = True
+            self.assertEqual(
+                self.glob_tester("parent", "parent/hello.txt", 1, "parent"), 2)
+
+    def test_glob_dirs_to_scan_glob(self):
+        """Test parent directories are marked as needing to be scanned - globs"""
+        with patch('duplicity.path.Path.isdir') as mock_isdir:
+            mock_isdir.return_value = True
+            self.assertEqual(
+                self.glob_tester("testfiles/select/1", "*/select/1/1", 1,
+                                 "testfiles/select"), 2)
+            self.assertEqual(
+                self.glob_tester("testfiles/select/1/2",
+                                 "*/select/1/2/1", 1, "testfiles/select"), 2)
+            self.assertEqual(
+                self.glob_tester("parent", "parent/hel?o.txt", 1, "parent"), 2)
+            self.assertEqual(
+                self.glob_tester("test/parent/folder",
+                                 "test/par*t/folder/hello.txt", 1, "test"), 2)
+            self.assertEqual(
+                self.glob_tester("testfiles/select/1/1",
+                                 "**/1/2/1", 1, "testfiles"), 2)
+            self.assertEqual(
+                self.glob_tester("testfiles/select2/3/3sub2",
+                                 "testfiles/select2/3/**file.txt", 1,
+                                 "testfiles"), 2)
+            self.assertEqual(
+                self.glob_tester("testfiles/select/1/2",
+                                 "*/select/1/2/1", 1, "testfiles"), 2)
+            self.assertEqual(
+                self.glob_tester("testfiles/select/1",
+                                 "testfiles/select**/2", 1, "testfiles"), 2)
+            self.assertEqual(
+                self.glob_tester("testfiles/select/efools",
+                                 "testfiles/select/*foo*/p*", 1,
+                                 "testfiles"), 2)
+            self.assertEqual(
+                self.glob_tester("testfiles/select/3",
+                                 "testfiles/select/**2", 1, "testfiles"), 2)
+            self.assertEqual(
+                self.glob_tester("testfiles/select2/1/1sub1/1sub1sub2",
+                                 "testfiles/select2/**/3sub3sub2/3sub3su?2_file.txt",
+                                 1, "testfiles"), 2)
+            self.assertEqual(
+                self.glob_tester("testfiles/select/1",
+                                 "*/select/1/1", 1, "testfiles"), 2)
+
 if __name__ == "__main__":
     unittest.main()


References