dulwich-users team mailing list archive

Thread
Date

Re: Bug report for dulwich

To: Sebastian Egner <sebastian_egner@xxxxxxxxx>
From: Jelmer Vernooij <jelmer@xxxxxxxxx>
Date: Fri, 12 Nov 2010 14:49:21 +0100
Cc: dulwich-users <dulwich-users@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4CDD175B.8040105@yahoo.com>

On Fri, 2010-11-12 at 11:30 +0100, Sebastian Egner wrote:
> On 11.11.10 18:03, Jelmer Vernooij wrote:
> > On Thu, 2010-11-11 at 16:49 +0100, Sebastian Egner wrote:
> >> On 11.11.10 15:20, Jelmer Vernooij wrote:
> >>> We'd need an alternative way to determine whether there's data available
> >>> from a subprocess. I'm not sure what the best way to do that is on
> >>> Windows. The subprocess module doesn't provide any hints about this.
> >> Your are right, of course. I checked: The object proc.stdout will be a
> >> Python 'file' object on Windows---and these don't come with a method of
> >> polling, unless the entire file is switched into asynchronous mode.
> > Switching the file to some sort of non-blocking mode should be fine, if
> > that gives you a way to poll whether there is data available.
> Unfortuntately, there is no commonly known way to do this in Python on
> Windows.
> http://stackoverflow.com/questions/375427/non-blocking-read-on-a-stream-in-python
> (Note that the 'fcntl' module is only available on the Unix platforms.)
> 
> >> However, from the source code I got the impression that it is sufficient
> >> to detect EOF because can_read() only checks if some bytes are waiting
> >> to be read, so the subsequent read() can still block on the second byte
> >> of the stream. This means, we can probably get away with a simple
> >> read-ahead mechanism: can_read() does a blocking read(1) and stores it,
> >> and the next Procotol.read(n) returns the stored byte appended with
> >> read(n-1).
> > This is not a file but a stream. Not being able to read from it does not
> > necessarily imply EOF. Attempting to read a single byte will mean
> > blocking until the other side provides that byte, while we could be
> > spending that time crunching data. The sYour patch adds econd byte could indeed block as
> > well, but because of the way the protocol works this will not be the
> > case (a packet is always at least 4 bytes).

> (2) I had traced the actual calls to SubprocessWrapper.{read,can_read}
> in a real example. It turned out that can_read() is called very
> infrequently. Most of the time, Dulwich simply calls read(n) without
> checking can_read() first---so the read(n) also blocks in the Unix
> version. (Or am I missing something?)
How often can_read() is called depends on the local and remote
repository and how much they are out of sync, it depends on the speed of
the remote server and on the latency.

> > Can you please attach diffs instead? They're much easier to read and
> > review.
> Attached.
Thanks.

> Please tell me what you intend to do with my suggestion, or what more I
> can do to help. I am in the process of selecting a Git toolchain for me
> and my Windows dev team. Hg/Hg-Git looked promising until I found out
> that it neither supports https transport nor git+ssh.
I agree that this is a genuine bug, and I'd like to see it fixed.
However, this patch is a hack, and I think it should be fixed properly.
There must be some way to poll whether there's data available, even if
it would just be by having a separate thread do the reading.

Alternatively, we could use paramiko instead of talking with a local ssh
process using subprocess.

There's a bug report here, somebody else ran into the same issue:

https://bugs.launchpad.net/dulwich/+bug/670035 

Cheers,

Jelmer

Attachment: signature.asc
Description: This is a digitally signed message part

References

Re: Bug report for dulwich
From: Jelmer Vernooij, 2010-11-11