← Back to team overview

launchpad-reviewers team mailing list archive

Re: [Merge] lp:~cjwatson/launchpad/codeimport-git-worker into lp:launchpad

 

I've sprinkled a bunch of extra comments around in various places.

Diff comments:

> 
> === modified file 'lib/lp/codehosting/codeimport/worker.py'
> --- lib/lp/codehosting/codeimport/worker.py	2016-10-11 15:36:32 +0000
> +++ lib/lp/codehosting/codeimport/worker.py	2016-10-11 15:36:32 +0000
> @@ -918,3 +973,67 @@
>          """See `PullingImportWorker.probers`."""
>          from bzrlib.bzrdir import BzrProber, RemoteBzrProber
>          return [BzrProber, RemoteBzrProber]
> +
> +
> +class GitToGitImportWorker(ImportWorker):
> +    """An import worker for imports from Git to Git."""
> +
> +    def _runGit(self, *args, **kwargs):
> +        """Run git with arguments, sending output to the logger."""
> +        cmd = ["git"] + list(args)
> +        git_process = subprocess.Popen(
> +            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, **kwargs)
> +        for line in git_process.stdout:
> +            line = line.decode("UTF-8", "replace").rstrip("\n")
> +            # Remove any user/password data from URLs.
> +            line = re.sub(r"://([^:]*:[^@]*@)(\S+)", r"://\2", line)
> +            self._logger.info(line)
> +        retcode = git_process.wait()
> +        if retcode:
> +            raise subprocess.CalledProcessError(retcode, cmd)
> +
> +    def _doImport(self):
> +        self._logger.info("Starting job.")
> +        self._logger.info(config.codeimport.git_repository_store)
> +        try:
> +            self._opener_policy.checkOneURL(self.source_details.url)

As far as I can see from http.c, git follows up to 20 redirects as long as they're to http/https/ftp/ftps schemes, and there's no way to configure this.

The only things that this would let through that checkOneURL tries to forbid would be redirects to git.launchpad.net and redirects to a blacklisted hostname, which in practice is always localhost or 127.0.0.1.  Perhaps we can get away with this state for the time being?  Redirects to git.launchpad.net would basically be a slightly annoying waste of resources but nothing more, while redirects to localhost should in practice be harmless given that code import workers don't run web or FTP servers.

The only way I can think of to improve isolation here would be to direct all git requests through a proxy, which would be a non-trivial chunk of work.

> +        except BadUrl as e:
> +            self._logger.info("Invalid URL: %s" % e)
> +            return CodeImportWorkerExitCode.FAILURE_FORBIDDEN
> +        unauth_target_url = urljoin(
> +            config.codeimport.git_repository_store,
> +            self.source_details.target_id)
> +        split = urlsplit(unauth_target_url)
> +        if split.hostname:
> +            target_netloc = ":%s@%s" % (
> +                self.source_details.macaroon.serialize(), split.hostname)
> +        else:
> +            target_netloc = ""
> +        target_url = urlunsplit([
> +            split.scheme, target_netloc, split.path, "", ""])
> +        # XXX cjwatson 2016-10-11: Ideally we'd put credentials in a
> +        # credentials store instead.  However, git only accepts credentials
> +        # that have both a non-empty username and a non-empty password.
> +        self._logger.info("Getting existing repository from hosting service.")
> +        try:
> +            self._runGit("clone", "--bare", target_url, "repository")
> +        except subprocess.CalledProcessError as e:
> +            self._logger.info(
> +                "Unable to get existing repository from hosting service: "
> +                "%s" % e)
> +            return CodeImportWorkerExitCode.FAILURE
> +        self._logger.info("Fetching remote repository.")
> +        try:
> +            self._runGit(
> +                "remote", "add", "-f", "--mirror=fetch",
> +                "mirror", self.source_details.url, cwd="repository")

More or less, with the addition of --prune.  Though good grief HEAD handling is horrible, but I've just added a comment about that for now.

> +        except subprocess.CalledProcessError as e:
> +            self._logger.info("Unable to fetch remote repository: %s" % e)
> +            return CodeImportWorkerExitCode.FAILURE_INVALID
> +        self._logger.info("Pushing repository to hosting service.")
> +        try:
> +            self._runGit("push", "--mirror", target_url, cwd="repository")
> +        except subprocess.CalledProcessError as e:
> +            self._logger.info("Unable to push to hosting service: %s" % e)
> +            return CodeImportWorkerExitCode.FAILURE
> +        return CodeImportWorkerExitCode.SUCCESS
> 
> === modified file 'lib/lp/services/config/schema-lazr.conf'
> --- lib/lp/services/config/schema-lazr.conf	2016-10-11 15:36:32 +0000
> +++ lib/lp/services/config/schema-lazr.conf	2016-10-11 15:36:32 +0000
> @@ -417,7 +421,7 @@
>  svn_revisions_import_limit: 500
>  
>  # Secret key for macaroons used to grant git push permission to workers.
> -macaroon_secret_key:
> +macaroon_secret_key: none

ImplicitTypeSection._convert does:

    elif match.group('none'):
        return None

So yeah.  It works out the same either way since CodeImportJobMacaroonIssuer._root_secret just does "if not secret:", but I think None is a tidier default than the empty string.

>  
>  [codeimportdispatcher]
>  # The directory where the code import worker should be directed to


-- 
https://code.launchpad.net/~cjwatson/launchpad/codeimport-git-worker/+merge/308142
Your team Launchpad code reviewers is subscribed to branch lp:launchpad.


References