← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] ~ilasc/turnip:stats-collection-point into turnip:master

 

Ioana Lasc has proposed merging ~ilasc/turnip:stats-collection-point into turnip:master.

Commit message:
Add metrics collection points to GitProcessProtocol

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~ilasc/turnip/+git/turnip/+merge/391365

This MP uses print statements to show points in the code where we could potentially collect stats to be sent to statsd. The purpose of the MP is to give us a better starting point in discussing how and where is best to collect stats: in the PackBackendProtocol over pipes, inside the GitProcessProtocol itself, in a separate thread or otherwise.

So far I've tried several approaches:
1. wrap "default_reactor.spawnProcess(self.peer, cmd, args, env=env)" into a subprocess.call 
2. write to GitProcessProtocol from the PackBackendProtocol and back 
3. add additional fd to "default_reactor.spawnProcess(self.peer, cmd, args, env=env)" in the form "childFDs = { 0: "w", 1: "r", 2: "r" , 4: "w"}" and attempted to call writeToChild(4, _data_) but that didn't work as PackBackendProtocol is not a twisted.internet.protocol.ProcessProtocol 
4. from the failure of option 3 I thought if I could introduce another layer of twisted.internet.protocol.ProcessProtocol between PBP and GitProcessProtocol I might be able to use additional pipes to preserve Git's normal output as well as communicate metrics. 
5. spawn a "self written" separate thread to monitor the GitProcessProtocol process from outside
Essentially all 4 major approaches failed by breaking existing PBP <-> Git pipes or by not being able to pass metrics data through, I didn't pursue the 5th as I understand we do not want to go down the multithreading route. 

The current proposal uses the GitProcessProtocol itself as a stats collection point.Running the "test_a_single_clone_and_push" unit test will produce: 
turnip.pack.tests.test_functional.TestBackendFunctional.test_a_single_clone_and_push (v0 protocol) ... 

PID  4545  rss with psutil:  0  MB
PID  4545  maxrss with getrusage:  88  MB
PID  4545  cpu utilization as a percentage  0.0
PID  4545  hostname:  turnip-bionic
PID  4545  snapshot CPU times:  pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0, iowait=0.0)
PID  4545  execution time: 0.856162071228
PID  4545  command: git-upload-pack
PID  4545  repository:  /test


PID  4551  rss with psutil:  72  MB
PID  4551  maxrss with getrusage:  88  MB
PID  4551  cpu utilization as a percentage  0.0
PID  4551  hostname:  turnip-bionic
PID  4551  snapshot CPU times:  pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0, iowait=0.0)
PID  4551  execution time: 0.970222949982
PID  4551  command: git-receive-pack
PID  4551  repository:  /test
ok
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~ilasc/turnip:stats-collection-point into turnip:master.
diff --git a/turnip/pack/git.py b/turnip/pack/git.py
index 4407085..f0d9244 100644
--- a/turnip/pack/git.py
+++ b/turnip/pack/git.py
@@ -12,8 +12,11 @@ __metaclass__ = type
 
 
 import uuid
-
+import psutil
+import resource
 import six
+import socket
+import time
 from twisted.internet import (
     defer,
     error,
@@ -242,14 +245,34 @@ class GitProcessProtocol(protocol.ProcessProtocol):
     def __init__(self, peer):
         self.peer = peer
         self.out_started = False
+        self.create_time = 0
+        self.pid = 0
+        self.os_process = None
+        self.monitoring_thread = None
+
+        # increase number of processes counter here 
+        # and spwan send to statsd 
+        #self.statsd_client.incr('turnip.hostname.total.processes', rate=1)
 
     def connectionMade(self):
+        self.pid = self.transport.pid
+        self.os_process = psutil.Process(pid = self.pid)
+        self.create_time = self.os_process.create_time()
+
         self.peer.setPeer(self)
         self.peer.transport.registerProducer(self, True)
         self.transport.registerProducer(
             UnstoppableProducerWrapper(self.peer.transport), True)
         self.peer.resumeProducing()
 
+        print('\n')
+        print('PID ',self.pid, ' rss with psutil: ', self.os_process.memory_info().rss / 1024**2, ' MB')
+        print('PID ',self.pid, ' maxrss with getrusage: ', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, ' MB')
+        print('PID ',self.pid, ' cpu utilization as a percentage ', self.os_process.cpu_percent() / psutil.cpu_count())
+        print('PID ',self.pid, ' hostname: ', socket.gethostname())
+        print('PID ',self.pid, ' snapshot CPU times: ', self.os_process.cpu_times())
+
+
     def outReceived(self, data):
         self.out_started = True
         self.peer.sendRawData(data)
@@ -292,6 +315,22 @@ class GitProcessProtocol(protocol.ProcessProtocol):
         self.transport.closeChildFD(0)
 
     def processEnded(self, reason):
+        # this instance of the GitProcessProtocol just ended 
+        # we record it duration for statsd metrics here 
+        process_duration = time.time() - self.create_time
+        if (self.peer.command == b'git-upload-pack' or
+            self.peer.command == b'git-receive-pack'):
+            print('PID ',self.pid, ' execution time:', 
+                (time.time() - self.create_time))
+            print('PID ',self.pid, ' command:', self.peer.command)
+            print('PID ',self.pid, ' repository: ',self.peer.raw_pathname)
+            
+        #self.statsd_client.timing('turnip.hostname.operation.timer', process_duration)
+
+        # decrease number of processes counter here 
+        # and spwan send to statsd 
+        #self.statsd_client.decr('turnip.hostname.total.processes')
+
         if reason.check(error.ProcessDone) and reason.value.exitCode != 0:
             code = reason.value.exitCode
             self.peer.log.info(
diff --git a/turnip/pack/tests/test_functional.py b/turnip/pack/tests/test_functional.py
index c06c4e2..34ea4f4 100644
--- a/turnip/pack/tests/test_functional.py
+++ b/turnip/pack/tests/test_functional.py
@@ -179,6 +179,28 @@ class FunctionalTestMixin(WithScenarios):
         defer.returnValue((out, err))
 
     @defer.inlineCallbacks
+    def test_a_single_clone_and_push(self):
+        # Test a full clone, commit, push, clone, commit, push, pull
+        # cycle using the backend server.
+        test_root = self.useFixture(TempDir()).path
+        clone1 = os.path.join(test_root, 'clone1')
+
+        # Clone the empty repo from the backend and commit to it.
+        yield self.assertCommandSuccess((b'git', b'clone', self.url, clone1))
+        yield self.assertCommandSuccess(
+            (b'git', b'config', b'user.name', b'Test User'), path=clone1)
+        yield self.assertCommandSuccess(
+            (b'git', b'config', b'user.email', b'test@xxxxxxxxxxx'),
+            path=clone1)
+        yield self.assertCommandSuccess(
+            (b'git', b'commit', b'--allow-empty', b'-m', b'Committed test'),
+            path=clone1)
+
+        # Push it back up to the backend.
+        yield self.assertCommandSuccess(
+            (b'git', b'push', b'origin', b'master'), path=clone1)
+
+    @defer.inlineCallbacks
     def test_clone_and_push(self):
         # Test a full clone, commit, push, clone, commit, push, pull
         # cycle using the backend server.