beeseek-devs team mailing list archive
Message #00119
[Branch ~beeseek-devs/beeseek/trunk] Rev 205: Add the script to track user actions to the pages sent with the proxy.
------------------------------------------------------------
revno: 205
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: trunk
timestamp: Thu 2009-01-29 16:54:54 +0100
message:
Add the script to track user actions to the pages sent with the proxy.
All HTML pages now have a <script> tag in the <head> section that points
to the JavaScript file that tracks and sends all of the user's actions.
renamed:
beeseek/honeybee/session.py => beeseek/honeybee/main.py
modified:
beeseek/decoders/chunks.py
beeseek/decoders/gzip.py
beeseek/honeybee/handler.py
beeseek/network/highlevel.py
beeseek/network/http.py
beeseek/network/lowlevel.py
honeybee
------------------------------------------------------------
revno: 200.2.17
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Thu 2009-01-29 16:52:54 +0100
message:
Check the response status before adding the script.
modified:
beeseek/honeybee/handler.py
------------------------------------------------------------
revno: 200.2.16
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Wed 2009-01-28 20:43:31 +0100
message:
Adapt code.
modified:
beeseek/decoders/gzip.py
beeseek/honeybee/handler.py
------------------------------------------------------------
revno: 200.2.15
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Wed 2009-01-28 20:25:02 +0100
message:
Merge with trunk.
modified:
beeseek/decoders/base.py
beeseek/decoders/chunks.py
beeseek/decoders/gzip.py
beeseek/network/http.py
------------------------------------------------------------
revno: 200.2.14
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Sun 2009-01-25 17:32:13 +0100
message:
Do not raise errors if the peer has closed the connection.
modified:
beeseek/network/lowlevel.py
------------------------------------------------------------
revno: 200.2.13
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Sun 2009-01-25 13:22:10 +0100
message:
Send Host as first header.
modified:
beeseek/network/http.py
------------------------------------------------------------
revno: 200.2.12
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Sat 2009-01-24 13:11:30 +0100
message:
Check if Connection: close before using a chunked Transfer-Encoding.
modified:
beeseek/honeybee/handler.py
------------------------------------------------------------
revno: 200.2.11
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Sat 2009-01-24 13:07:19 +0100
message:
Decode multiple gzip members.
modified:
beeseek/decoders/gzip.py
------------------------------------------------------------
revno: 200.2.10
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Fri 2009-01-23 14:28:42 +0100
message:
Add flush and improve data sending.
modified:
beeseek/decoders/base.py
beeseek/decoders/gzip.py
beeseek/honeybee/handler.py
beeseek/network/highlevel.py
beeseek/network/http.py
------------------------------------------------------------
revno: 200.2.9
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Tue 2009-01-20 20:43:14 +0100
message:
Decode data from the server.
modified:
beeseek/honeybee/handler.py
------------------------------------------------------------
revno: 200.2.8
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Tue 2009-01-20 20:42:50 +0100
message:
Various fixes.
modified:
beeseek/decoders/base.py
beeseek/decoders/chunks.py
beeseek/decoders/gzip.py
------------------------------------------------------------
revno: 200.2.7
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Tue 2009-01-20 20:18:24 +0100
message:
Merge with trunk to get the GzipDecoder.
added:
beeseek/decoders/
beeseek/decoders/__init__.py
beeseek/decoders/base.py
beeseek/decoders/chunks.py
beeseek/decoders/gzip.py
modified:
beeseek/network/http.py
beeseek/tests/network.py
------------------------------------------------------------
revno: 200.2.6
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Sun 2009-01-18 19:55:24 +0100
message:
Temporary workaround to allow safe browsing.
modified:
beeseek/honeybee/handler.py
------------------------------------------------------------
revno: 200.2.5
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Sun 2009-01-18 13:33:35 +0100
message:
Modify Content-Length when sending modified pages.
modified:
beeseek/honeybee/handler.py
------------------------------------------------------------
revno: 200.2.4
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Sun 2009-01-18 13:07:56 +0100
message:
Try to keep the message as unchanged as possible, reducing the CPU footprint.
modified:
beeseek/honeybee/handler.py
------------------------------------------------------------
revno: 200.2.3
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: add-script
timestamp: Sun 2009-01-18 13:04:54 +0100
message:
Do not send the message terminator in _iter_raw_chunked_body().
modified:
beeseek/network/http.py
------------------------------------------------------------
revno: 200.2.2
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: results-collect
timestamp: Sun 2009-01-18 12:44:40 +0100
message:
Modify pages adding the BeeSeek script.
modified:
beeseek/honeybee/handler.py
beeseek/network/http.py
------------------------------------------------------------
revno: 200.2.1
committer: Andrea Corbellini <andrea.corbellini@xxxxxxxxxxx>
branch nick: results-collect
timestamp: Sat 2009-01-17 20:09:43 +0100
message:
Renamed module to avoid confusion.
renamed:
beeseek/honeybee/session.py => beeseek/honeybee/main.py
modified:
honeybee
=== modified file 'beeseek/decoders/chunks.py'
--- beeseek/decoders/chunks.py 2009-01-27 17:50:41 +0000
+++ beeseek/decoders/chunks.py 2009-01-28 19:25:02 +0000
@@ -41,7 +41,7 @@
def recv(self, fp, size=-1):
- chunklen = self.chunklen
+ chunklen = self._chunklen
if chunklen == 0:
chunklen = self._read_chunksize(fp)
if chunklen == 0:
=== modified file 'beeseek/decoders/gzip.py'
--- beeseek/decoders/gzip.py 2009-01-28 18:51:01 +0000
+++ beeseek/decoders/gzip.py 2009-01-28 19:43:31 +0000
@@ -24,16 +24,18 @@
 class GzipDecoder(object):
-    __slots__ = '_decompress', '_buffer', '_header'
+    __slots__ = '_decompressor', '_buffer', '_header', '_unused_data'
     implements(IDecoder)
     def __init__(self):
-        self._decompress = None
+        self._decompressor = None
         self._buffer = ''
         self._header = 10
+        self._unused_data = ''
     def _decode(self, data):
-        if not self._decompress:
+        if not self._decompressor:
+            data = self._unused_data + data
             header = self._header
             read = header - len(data)
             if read >= 0:
@@ -41,8 +43,14 @@
                 return ''
             else:
                 data = data[header:]
-            self._decompress = zlib.decompressobj().decompress
-        return self._decompress(data)
+            self._decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
+        data = self._decompressor.decompress(data)
+
+        if self._decompressor.unused_data:
+            self._unused_data = self._decompressor.unused_data
+            self._decompressor = None
+
+        return data
     def _read_buffer(self, size):
         data = self._buffer
@@ -53,13 +61,13 @@
         if size < 0:
             data = self._buffer
             if not data:
-                return self._decode(fp.raw_recv(size))
+                return self._decode(fp.recv(size))
             else:
                 self._buffer = ''
                 return data
         else:
             data = self._read_buffer(size)
             if not data:
-                return self._decode(fp.raw_recv(size))
+                return self._decode(fp.recv(size))
             else:
                 return data
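The gzip.py change above relies on the `unused_data` attribute of a zlib decompression object to notice that one gzip member has ended and another follows. A minimal sketch of that technique is given here, written for Python 3 rather than the Python 2 of the codebase. It is not the BeeSeek implementation: `decode_gzip_members` is a hypothetical name, the whole input is assumed to be in memory, and zlib is asked to parse the gzip header itself (`16 + zlib.MAX_WBITS`) instead of skipping a fixed 10-byte header.

```python
import gzip
import zlib


def decode_gzip_members(data):
    """Decompress bytes holding one or more concatenated gzip members."""
    output = []
    while data:
        # 16 + MAX_WBITS makes zlib parse the gzip header and trailer itself.
        decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
        output.append(decompressor.decompress(data))
        output.append(decompressor.flush())
        # Bytes found after the end of this member belong to the next one.
        data = decompressor.unused_data
    return b''.join(output)


# Two independently compressed members, concatenated as a server might send.
two_members = gzip.compress(b'<html>') + gzip.compress(b'</html>')
result = decode_gzip_members(two_members)
```

A fresh decompression object is created for each member because a zlib decompressor cannot be reused once it has reached the end of a stream.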
=== modified file 'beeseek/honeybee/handler.py'
--- beeseek/honeybee/handler.py 2009-01-17 16:41:25 +0000
+++ beeseek/honeybee/handler.py 2009-01-29 15:52:54 +0000
@@ -20,6 +20,7 @@
from urllib import unquote_plus
from beeseek.interfaces import implements
from beeseek import network, log, instance
+from beeseek.decoders import NullDecoder, GzipDecoder
from beeseek.network import (IPSocket, IServerApplication,
HTTPServerApplication, HTTPClientApplication)
from beeseek.ui import html
@@ -31,6 +32,10 @@
__slots__ = ()
implements(IServerApplication)
+ scripttag = ('<script type="text/javascript" '
+ 'src="http://www.beeseek.org/search-data/script.js">'
+ '</script>')
+
def handle(self):
try:
requestline, clientheaders = self.read_request()
@@ -97,12 +102,97 @@
host.flush()
statusline, serverheaders = host.read_response()
- self.start_response(statusline[0], statusline[1], serverheaders,
- version=statusline[2])
- self.raw_writelines(host.iter_raw_body())
+ if (requestline[0] == self.GET and statusline[0] == 200 and
+ 'Content-Type' in serverheaders):
+ contenttype = serverheaders['Content-Type']
+ if ';' in contenttype:
+ contenttype = contenttype.split(';', 1)[0].rstrip()
+
+ if contenttype == 'text/html':
+ if 'Content-Encoding' in serverheaders:
+ # We don't need to compress the content that we send to
+ # the client: this will just require much CPU
+ contentencoding = serverheaders.pop('Content-Encoding')
+ serverheaders['X-Original-Encoding'] = contentencoding
+ decoder = GzipDecoder()
+ if 'Content-Length' in serverheaders:
+ del serverheaders['Content-Length']
+ if ('Connection' not in serverheaders
+ or serverheaders['Connection'] != 'close'):
+ serverheaders['Transfer-Encoding'] = 'chunked'
+ else:
+ decoder = NullDecoder()
+ if 'Content-Length' in serverheaders:
+ contentlen = (host._decoder._chunklen +
+ len(self.scripttag))
+ serverheaders['Content-Length'] = contentlen
+
+ self.start_response(statusline[0], statusline[1],
+ serverheaders, version=statusline[2])
+ self.send_script(host, decoder)
+ while True:
+ data = decoder.recv(host)
+ if not data:
+ break
+ self.write(data)
+ else:
+ self.start_response(statusline[0], statusline[1],
+ serverheaders, version=statusline[2])
+ self.raw_writelines(host.iter_raw_body())
+ else:
+ self.start_response(statusline[0], statusline[1],
+ serverheaders, version=statusline[2])
+ self.raw_writelines(host.iter_raw_body())
+
self.end_response()
self.flush()
+ def _find_head_start(self, host, decoder, pool):
+ while True:
+ data = decoder.recv(host)
+ if not data:
+ return
+
+ while True:
+ i = data.find('<')
+ if i < 0:
+ pool.append(data)
+ break
+ else:
+ pool.append(data[:i])
+ data = data[i:]
+ if len(data) < 5:
+ data += decoder.read(host, 5 - len(data))
+ if data[:5].lower() == '<head':
+ return data
+ else:
+ pool.append(data[0])
+ data = data[1:]
+
+ def _find_head_end(self, host, decoder, pool, data):
+ while data:
+ while True:
+ i = data.find('>')
+ if i < 0:
+ pool.append(data)
+ break
+ else:
+ i += 1
+ pool.append(data[:i])
+ return data[i:]
+ data = decoder.recv(host)
+
+ def send_script(self, host, decoder):
+ pool = []
+ data = self._find_head_start(host, decoder, pool)
+ if not data:
+ return
+ data = self._find_head_end(host, decoder, pool, data)
+ self.write(''.join(pool))
+ self.write(self.scripttag)
+ self.write(data)
+
+
def handle_connect(self, requestline):
hostname, port = requestline[1].split(':')
host = IPSocket()
@@ -126,6 +216,7 @@
self.raw_write(data)
self.flush()
+
def handle_search(self, keywords):
keywords = keywords.lower().split()
keysdb = instance.keysdb
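In the handler.py change above, `send_script` reads the decoded response until the opening `<head>` tag has been passed and then writes `scripttag` before the rest of the page. The same idea can be sketched on a complete, already-decoded page as follows. This is an illustration in Python 3, not the streaming code from the commit: `insert_script`, `HEAD_OPEN` and `SCRIPT_TAG` are hypothetical names, and a regular expression stands in for the manual scan done by `_find_head_start` and `_find_head_end`.

```python
import re

# Hypothetical constant, modelled on the scripttag attribute in the diff.
SCRIPT_TAG = ('<script type="text/javascript" '
              'src="http://www.beeseek.org/search-data/script.js">'
              '</script>')

# An opening <head> tag, with or without attributes, in any letter case.
# The optional group must start with whitespace, so <header> does not match.
HEAD_OPEN = re.compile(r'<head(\s[^>]*)?>', re.IGNORECASE)


def insert_script(page, tag=SCRIPT_TAG):
    """Return page with tag inserted right after the opening <head> tag."""
    match = HEAD_OPEN.search(page)
    if match is None:
        # No <head> element found: leave the page untouched.
        return page
    end = match.end()
    return page[:end] + tag + page[end:]


page = '<html><HEAD lang="en"><title>t</title></head></html>'
patched = insert_script(page, tag='<script></script>')
```

Working on the full page keeps the sketch short; a proxy that must not buffer whole responses would need the incremental approach the commit takes.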
=== renamed file 'beeseek/honeybee/session.py' => 'beeseek/honeybee/main.py'
=== modified file 'beeseek/network/highlevel.py'
--- beeseek/network/highlevel.py 2009-01-17 16:41:25 +0000
+++ beeseek/network/highlevel.py 2009-01-23 13:28:42 +0000
@@ -81,7 +81,7 @@
def writelines(self, data):
for item in data:
- self.write(data)
+ self.write(item)
# Handler methods
=== modified file 'beeseek/network/http.py'
--- beeseek/network/http.py 2009-01-28 19:08:55 +0000
+++ beeseek/network/http.py 2009-01-28 19:25:02 +0000
@@ -54,6 +54,9 @@
def readline(self, size=-1):
"""Read the next line."""
+ def iter_body(self):
+ pass
+
def iter_raw_body(self):
pass
@@ -170,6 +173,13 @@
def readline(self, size=-1):
return self._decoder.readline(self, size)
+ def iter_body(self):
+ while True:
+ data = self.recv()
+ if not data:
+ return
+ yield data
+
def iter_raw_body(self):
return self._decoder.iter_raw_body(self)
@@ -226,7 +236,7 @@
self.raw_write('\r\n')
def _write_chunk(self, data):
- self.raw_write('%x\r\n%s\r\n' % (len(data), data))
+ self.raw_write('%x\r\n%s' % (len(data), data))
def _end_chunked_message(self):
self.raw_write('0\r\n\r\n')
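For reference, the chunked transfer coding described by the HTTP/1.1 specification frames each chunk as a hexadecimal length, CRLF, the data, and a closing CRLF, and ends the body with a zero-length chunk. The sketch below shows that framing in Python 3. `encode_chunk` and `encode_chunked` are hypothetical names that are not part of beeseek/network/http.py, and where BeeSeek writes the CRLF that follows the chunk data is not visible from this hunk.

```python
def encode_chunk(data):
    """Frame one non-empty chunk: hex length, CRLF, data, CRLF."""
    return b'%x\r\n%s\r\n' % (len(data), data)


def encode_chunked(parts):
    """Encode an iterable of byte strings as a chunked message body."""
    # Empty parts are skipped: a zero-length chunk would end the body early.
    body = b''.join(encode_chunk(part) for part in parts if part)
    # The last chunk has length zero and is followed by an empty trailer.
    return body + b'0\r\n\r\n'


encoded = encode_chunked([b'Hello, ', b'', b'world!'])
```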
=== modified file 'beeseek/network/lowlevel.py'
--- beeseek/network/lowlevel.py 2009-01-17 16:41:25 +0000
+++ beeseek/network/lowlevel.py 2009-01-25 16:32:13 +0000
@@ -222,20 +222,32 @@
     def raw_recv(self, size=-1):
         if size < 0:
-            size = 1024
-        data = os.read(self._fileno, size)
+            size = 5120
+        try:
+            data = os.read(self._fileno, size)
+        except IOError:
+            self.closed = True
+            return ''
         if not data:
             self.closed = True
         return data
     def raw_read(self, size=-1):
-        data = self._rfile.read(size)
+        try:
+            data = self._rfile.read(size)
+        except IOError:
+            self.closed = True
+            return ''
         if not data:
             self.closed = True
         return data
     def raw_readline(self, size=-1):
-        data = self._rfile.readline(size)
+        try:
+            data = self._rfile.readline(size)
+        except IOError:
+            self.closed = True
+            return ''
         if not data:
             self.closed = True
         return data
=== modified file 'honeybee'
--- honeybee 2009-01-13 13:08:37 +0000
+++ honeybee 2009-01-17 19:09:43 +0000
@@ -23,7 +23,7 @@
 if __name__ == '__main__':
     try:
-        from beeseek.honeybee.session import HoneybeeSession
+        from beeseek.honeybee.main import HoneybeeSession
     except ImportError:
         print >> sys.stderr, ('%s: ERROR: Cannot find the BeeSeek Library. '
                               'Please check your installation.\n\n' %
--
BeeSeek mainline
https://code.launchpad.net/~beeseek-devs/beeseek/trunk