beeseek-devs team mailing list archive

[Branch ~beeseek-devs/beeseek/trunk] Rev 205: Add the script to track user actions to the pages sent with the proxy.

 

------------------------------------------------------------
revno: 205
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: trunk
timestamp: Thu 2009-01-29 16:54:54 +0100
message:
  Add the script to track user actions to the pages sent with the proxy.
  All HTML pages now have a <script> tag in the <head> section that points
  to the JavaScript file that tracks and sends the user's actions.
renamed:
  beeseek/honeybee/session.py => beeseek/honeybee/main.py
modified:
  beeseek/decoders/chunks.py
  beeseek/decoders/gzip.py
  beeseek/honeybee/handler.py
  beeseek/network/highlevel.py
  beeseek/network/http.py
  beeseek/network/lowlevel.py
  honeybee
    ------------------------------------------------------------
    revno: 200.2.17
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Thu 2009-01-29 16:52:54 +0100
    message:
      Check the response status before adding the script.
    modified:
      beeseek/honeybee/handler.py
    ------------------------------------------------------------
    revno: 200.2.16
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Wed 2009-01-28 20:43:31 +0100
    message:
      Adapt code.
    modified:
      beeseek/decoders/gzip.py
      beeseek/honeybee/handler.py
    ------------------------------------------------------------
    revno: 200.2.15
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Wed 2009-01-28 20:25:02 +0100
    message:
      Merge with trunk.
    modified:
      beeseek/decoders/base.py
      beeseek/decoders/chunks.py
      beeseek/decoders/gzip.py
      beeseek/network/http.py
    ------------------------------------------------------------
    revno: 200.2.14
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Sun 2009-01-25 17:32:13 +0100
    message:
      Do not raise errors if the peer has closed the connection.
    modified:
      beeseek/network/lowlevel.py
    ------------------------------------------------------------
    revno: 200.2.13
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Sun 2009-01-25 13:22:10 +0100
    message:
      Send Host as first header.
    modified:
      beeseek/network/http.py
    ------------------------------------------------------------
    revno: 200.2.12
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Sat 2009-01-24 13:11:30 +0100
    message:
      Check if Connection: close before using a chunked Transfer-Encoding.
    modified:
      beeseek/honeybee/handler.py
    ------------------------------------------------------------
    revno: 200.2.11
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Sat 2009-01-24 13:07:19 +0100
    message:
      Decode multiple gzip members.
    modified:
      beeseek/decoders/gzip.py
    ------------------------------------------------------------
    revno: 200.2.10
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Fri 2009-01-23 14:28:42 +0100
    message:
      Add flush and improve data sending.
    modified:
      beeseek/decoders/base.py
      beeseek/decoders/gzip.py
      beeseek/honeybee/handler.py
      beeseek/network/highlevel.py
      beeseek/network/http.py
    ------------------------------------------------------------
    revno: 200.2.9
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Tue 2009-01-20 20:43:14 +0100
    message:
      Decode data from the server.
    modified:
      beeseek/honeybee/handler.py
    ------------------------------------------------------------
    revno: 200.2.8
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Tue 2009-01-20 20:42:50 +0100
    message:
      Various fixes.
    modified:
      beeseek/decoders/base.py
      beeseek/decoders/chunks.py
      beeseek/decoders/gzip.py
    ------------------------------------------------------------
    revno: 200.2.7
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Tue 2009-01-20 20:18:24 +0100
    message:
      Merge with trunk to get the GzipDecoder.
    added:
      beeseek/decoders/
      beeseek/decoders/__init__.py
      beeseek/decoders/base.py
      beeseek/decoders/chunks.py
      beeseek/decoders/gzip.py
    modified:
      beeseek/network/http.py
      beeseek/tests/network.py
    ------------------------------------------------------------
    revno: 200.2.6
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Sun 2009-01-18 19:55:24 +0100
    message:
      Temporary workaround to allow safe browsing.
    modified:
      beeseek/honeybee/handler.py
    ------------------------------------------------------------
    revno: 200.2.5
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Sun 2009-01-18 13:33:35 +0100
    message:
      Modify Content-Length when sending modified pages.
    modified:
      beeseek/honeybee/handler.py
    ------------------------------------------------------------
    revno: 200.2.4
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Sun 2009-01-18 13:07:56 +0100
    message:
      Try to keep the message as unchanged as possible to reduce the CPU footprint.
    modified:
      beeseek/honeybee/handler.py
    ------------------------------------------------------------
    revno: 200.2.3
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: add-script
    timestamp: Sun 2009-01-18 13:04:54 +0100
    message:
      Do not send the message terminator in _iter_raw_chunked_body().
    modified:
      beeseek/network/http.py
    ------------------------------------------------------------
    revno: 200.2.2
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: results-collect
    timestamp: Sun 2009-01-18 12:44:40 +0100
    message:
      Modify pages adding the BeeSeek script.
    modified:
      beeseek/honeybee/handler.py
      beeseek/network/http.py
    ------------------------------------------------------------
    revno: 200.2.1
    committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
    branch nick: results-collect
    timestamp: Sat 2009-01-17 20:09:43 +0100
    message:
      Renamed module to avoid confusion.
    renamed:
      beeseek/honeybee/session.py => beeseek/honeybee/main.py
    modified:
      honeybee

=== modified file 'beeseek/decoders/chunks.py'
--- beeseek/decoders/chunks.py	2009-01-27 17:50:41 +0000
+++ beeseek/decoders/chunks.py	2009-01-28 19:25:02 +0000
@@ -41,7 +41,7 @@
 
 
     def recv(self, fp, size=-1):
-        chunklen = self.chunklen
+        chunklen = self._chunklen
         if chunklen == 0:
             chunklen = self._read_chunksize(fp)
             if chunklen == 0:

=== modified file 'beeseek/decoders/gzip.py'
--- beeseek/decoders/gzip.py	2009-01-28 18:51:01 +0000
+++ beeseek/decoders/gzip.py	2009-01-28 19:43:31 +0000
@@ -24,16 +24,18 @@
 
 class GzipDecoder(object):
 
-    __slots__ = '_decompress', '_buffer', '_header'
+    __slots__ = '_decompressor', '_buffer', '_header', '_unused_data'
     implements(IDecoder)
 
     def __init__(self):
-        self._decompress = None
+        self._decompressor = None
         self._buffer = ''
         self._header = 10
+        self._unused_data = ''
 
     def _decode(self, data):
-        if not self._decompress:
+        if not self._decompressor:
+            data = self._unused_data + data
             header = self._header
             read = header - len(data)
             if read >= 0:
@@ -41,8 +43,14 @@
                 return ''
             else:
                 data = data[header:]
-            self._decompress = zlib.decompressobj().decompress
-        return self._decompress(data)
+            self._decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
+        data = self._decompressor.decompress(data)
+
+        if self._decompressor.unused_data:
+            self._unused_data = self._decompressor.unused_data
+            self._decompressor = None
+
+        return data
 
     def _read_buffer(self, size):
         data = self._buffer
@@ -53,13 +61,13 @@
         if size < 0:
             data = self._buffer
             if not data:
-                return self._decode(fp.raw_recv(size))
+                return self._decode(fp.recv(size))
             else:
                 self._buffer = ''
                 return data
         else:
             data = self._read_buffer(size)
             if not data:
-                return self._decode(fp.raw_recv(size))
+                return self._decode(fp.recv(size))
             else:
                 return data
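
Note on the gzip.py change above: it restarts the decompressor whenever
zlib reports unused_data, so that a body made of several concatenated
gzip members is decoded in full. A minimal standalone sketch of the same
idea follows. It is written for Python 3 (the diff targets Python 2), it
works on a complete byte string rather than a socket, and it lets zlib
parse the gzip header and trailer itself (wbits = 16 + MAX_WBITS) instead
of skipping the header by hand. The function name is hypothetical and is
not part of BeeSeek.

```python
import gzip
import zlib


def decode_gzip_members(payload: bytes) -> bytes:
    """Decompress every gzip member concatenated in payload."""
    output = []
    while payload:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
        output.append(decompressor.decompress(payload))
        output.append(decompressor.flush())
        # Bytes after the end of this member belong to the next member.
        payload = decompressor.unused_data
    return b"".join(output)


# Two independently compressed members sent back to back.
two_members = gzip.compress(b"<html>") + gzip.compress(b"</html>")
result = decode_gzip_members(two_members)
```

A streaming decoder, like the one in the diff, has to keep the leftover
bytes between calls; here the loop variable plays that role.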

=== modified file 'beeseek/honeybee/handler.py'
--- beeseek/honeybee/handler.py	2009-01-17 16:41:25 +0000
+++ beeseek/honeybee/handler.py	2009-01-29 15:52:54 +0000
@@ -20,6 +20,7 @@
 from urllib import unquote_plus
 from beeseek.interfaces import implements
 from beeseek import network, log, instance
+from beeseek.decoders import NullDecoder, GzipDecoder
 from beeseek.network import (IPSocket, IServerApplication,
                              HTTPServerApplication, HTTPClientApplication)
 from beeseek.ui import html
@@ -31,6 +32,10 @@
     __slots__ = ()
     implements(IServerApplication)
 
+    scripttag = ('<script type="text/javascript" '
+                 'src="http://www.beeseek.org/search-data/script.js">'
+                 '</script>')
+
     def handle(self):
         try:
             requestline, clientheaders = self.read_request()
@@ -97,12 +102,97 @@
         host.flush()
 
         statusline, serverheaders = host.read_response()
-        self.start_response(statusline[0], statusline[1], serverheaders,
-                            version=statusline[2])
-        self.raw_writelines(host.iter_raw_body())
+        if (requestline[0] == self.GET and statusline[0] == 200 and
+           'Content-Type' in serverheaders):
+            contenttype = serverheaders['Content-Type']
+            if ';' in contenttype:
+                contenttype = contenttype.split(';', 1)[0].rstrip()
+
+            if contenttype == 'text/html':
+                if 'Content-Encoding' in serverheaders:
+                    # We don't need to recompress the content that we send to
+                    # the client: that would only cost extra CPU time
+                    contentencoding = serverheaders.pop('Content-Encoding')
+                    serverheaders['X-Original-Encoding'] = contentencoding
+                    decoder = GzipDecoder()
+                    if 'Content-Length' in serverheaders:
+                        del serverheaders['Content-Length']
+                        if ('Connection' not in serverheaders
+                           or serverheaders['Connection'] != 'close'):
+                            serverheaders['Transfer-Encoding'] = 'chunked'
+                else:
+                    decoder = NullDecoder()
+                    if 'Content-Length' in serverheaders:
+                        contentlen = (host._decoder._chunklen +
+                                      len(self.scripttag))
+                        serverheaders['Content-Length'] = contentlen
+
+                self.start_response(statusline[0], statusline[1],
+                                    serverheaders, version=statusline[2])
+                self.send_script(host, decoder)
+                while True:
+                    data = decoder.recv(host)
+                    if not data:
+                        break
+                    self.write(data)
+            else:
+                self.start_response(statusline[0], statusline[1],
+                                    serverheaders, version=statusline[2])
+                self.raw_writelines(host.iter_raw_body())
+        else:
+            self.start_response(statusline[0], statusline[1],
+                                serverheaders, version=statusline[2])
+            self.raw_writelines(host.iter_raw_body())
+
         self.end_response()
         self.flush()
 
+    def _find_head_start(self, host, decoder, pool):
+        while True:
+            data = decoder.recv(host)
+            if not data:
+                return
+
+            while True:
+                i = data.find('<')
+                if i < 0:
+                    pool.append(data)
+                    break
+                else:
+                    pool.append(data[:i])
+                    data = data[i:]
+                    if len(data) < 5:
+                        data += decoder.read(host, 5 - len(data))
+                    if data[:5].lower() == '<head':
+                        return data
+                    else:
+                        pool.append(data[0])
+                        data = data[1:]
+
+    def _find_head_end(self, host, decoder, pool, data):
+        while data:
+            while True:
+                i = data.find('>')
+                if i < 0:
+                    pool.append(data)
+                    break
+                else:
+                    i += 1
+                    pool.append(data[:i])
+                    return data[i:]
+            data = decoder.recv(host)
+
+    def send_script(self, host, decoder):
+        pool = []
+        data = self._find_head_start(host, decoder, pool)
+        if not data:
+            return
+        data = self._find_head_end(host, decoder, pool, data)
+        self.write(''.join(pool))
+        self.write(self.scripttag)
+        self.write(data)
+
+
     def handle_connect(self, requestline):
         hostname, port = requestline[1].split(':')
         host = IPSocket()
@@ -126,6 +216,7 @@
                     self.raw_write(data)
                     self.flush()
 
+
     def handle_search(self, keywords):
         keywords = keywords.lower().split()
         keysdb = instance.keysdb
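
Note on send_script() above: it forwards everything up to and including
the opening <head ...> tag, writes the script tag, and then continues
with the rest of the page. The sketch below shows the same placement on
a complete page held in memory. It is an illustration only: it targets
Python 3, uses a regular expression instead of the incremental scan in
the diff, and the names HEAD_OPEN and inject_script are hypothetical.

```python
import re

# Matches the opening <head> tag with optional attributes, in any case.
# The \b keeps tags such as <header> from matching.
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)

SCRIPT_TAG = ('<script type="text/javascript" '
              'src="http://www.beeseek.org/search-data/script.js">'
              '</script>')


def inject_script(page: str, tag: str = SCRIPT_TAG) -> str:
    """Return page with tag inserted right after the opening <head> tag."""
    match = HEAD_OPEN.search(page)
    if match is None:
        # No <head> section: leave the page untouched.
        return page
    end = match.end()
    return page[:end] + tag + page[end:]


page = inject_script("<html><head><title>t</title></head></html>")
```

Pages without a <head> section are returned unchanged, which mirrors the
early return in send_script() when no head is found.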

=== renamed file 'beeseek/honeybee/session.py' => 'beeseek/honeybee/main.py'
=== modified file 'beeseek/network/highlevel.py'
--- beeseek/network/highlevel.py	2009-01-17 16:41:25 +0000
+++ beeseek/network/highlevel.py	2009-01-23 13:28:42 +0000
@@ -81,7 +81,7 @@
 
     def writelines(self, data):
         for item in data:
-            self.write(data)
+            self.write(item)
 
     # Handler methods
 

=== modified file 'beeseek/network/http.py'
--- beeseek/network/http.py	2009-01-28 19:08:55 +0000
+++ beeseek/network/http.py	2009-01-28 19:25:02 +0000
@@ -54,6 +54,9 @@
     def readline(self, size=-1):
         """Read the next line."""
 
+    def iter_body(self):
+        pass
+
     def iter_raw_body(self):
         pass
 
@@ -170,6 +173,13 @@
     def readline(self, size=-1):
         return self._decoder.readline(self, size)
 
+    def iter_body(self):
+        while True:
+            data = self.recv()
+            if not data:
+                return
+            yield data
+
     def iter_raw_body(self):
         return self._decoder.iter_raw_body(self)
 
@@ -226,7 +236,7 @@
         self.raw_write('\r\n')
 
     def _write_chunk(self, data):
-        self.raw_write('%x\r\n%s\r\n' % (len(data), data))
+        self.raw_write('%x\r\n%s' % (len(data), data))
 
     def _end_chunked_message(self):
         self.raw_write('0\r\n\r\n')
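
Note on _write_chunk() above: for reference, HTTP/1.1 chunked transfer
coding frames each chunk as the size in hexadecimal, CRLF, the data, and
a closing CRLF, and ends the body with a zero-size chunk. Whether the
closing CRLF is written by _write_chunk() or elsewhere is not visible in
this hunk. The sketch below is independent of BeeSeek's helpers; it is
written for Python 3 and its function names are hypothetical.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(data), data)


def encode_chunked(parts) -> bytes:
    """Encode an iterable of byte strings as a complete chunked body."""
    # Empty parts are skipped: a zero-size chunk would end the body early.
    framed = [encode_chunk(part) for part in parts if part]
    # The last chunk has size 0 and is followed by an empty trailer section.
    framed.append(b"0\r\n\r\n")
    return b"".join(framed)


body = encode_chunked([b"hello", b"", b" world!"])
```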

=== modified file 'beeseek/network/lowlevel.py'
--- beeseek/network/lowlevel.py	2009-01-17 16:41:25 +0000
+++ beeseek/network/lowlevel.py	2009-01-25 16:32:13 +0000
@@ -222,20 +222,32 @@
 
     def raw_recv(self, size=-1):
         if size < 0:
-            size = 1024
-        data = os.read(self._fileno, size)
+            size = 5120
+        try:
+            data = os.read(self._fileno, size)
+        except IOError:
+            self.closed = True
+            return ''
         if not data:
             self.closed = True
         return data
 
     def raw_read(self, size=-1):
-        data = self._rfile.read(size)
+        try:
+            data = self._rfile.read(size)
+        except IOError:
+            self.closed = True
+            return ''
         if not data:
             self.closed = True
         return data
 
     def raw_readline(self, size=-1):
-        data = self._rfile.readline(size)
+        try:
+            data = self._rfile.readline(size)
+        except IOError:
+            self.closed = True
+            return ''
         if not data:
             self.closed = True
         return data
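
Note on the lowlevel.py change above: read errors are now reported as an
empty result, so callers see a failed connection the same way they see a
normal end of stream. A minimal sketch of that pattern follows, assuming
Python 3, where IOError is an alias of OSError and os.read() raises
OSError. The function name read_or_eof is hypothetical, and the sketch
leaves out the closed flag that the real methods set.

```python
import os


def read_or_eof(fileno: int, size: int = 5120) -> bytes:
    """Read up to size bytes, returning b'' on end of stream or on error."""
    try:
        return os.read(fileno, size)
    except OSError:
        # A failed read is treated as if the peer had closed the connection.
        return b""


# Demonstration with a pipe instead of a socket.
reader, writer = os.pipe()
os.write(writer, b"ping")
first = read_or_eof(reader)    # data is available
os.close(writer)
second = read_or_eof(reader)   # writer closed: end of stream
os.close(reader)
third = read_or_eof(reader)    # descriptor closed: error mapped to b''
```

Mapping every error to an empty result keeps the callers simple, at the
cost of hiding the reason the read failed.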

=== modified file 'honeybee'
--- honeybee	2009-01-13 13:08:37 +0000
+++ honeybee	2009-01-17 19:09:43 +0000
@@ -23,7 +23,7 @@
 
 if __name__ == '__main__':
     try:
-        from beeseek.honeybee.session import HoneybeeSession
+        from beeseek.honeybee.main import HoneybeeSession
     except ImportError:
         print >> sys.stderr, ('%s: ERROR: Cannot find the BeeSeek Library. '
                               'Please check your installation.\n\n' %



--
BeeSeek mainline
https://code.launchpad.net/~beeseek-devs/beeseek/trunk
