maria-discuss team mailing list archive

Re: Why isn't SO_SNDTIMEO used?

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: Michael Widenius <monty@xxxxxxxxxxxx>
Date: Sun, 21 Mar 2010 22:21:29 +0200
Cc: maria-discuss@xxxxxxxxxxxxxxxxxxx
In-reply-to: <87sk7xcgds.fsf@knielsen-hq.org>
Reply-to: monty@xxxxxxxxxxxx

Hi!

>>>>> "Kristian" == Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:

<cut>

>> The problem is the following:
>> 
>> The alarm code now makes sure that we don't send the signal if we are
>> not waiting for it;  It may not be safe for the thread to receive the
>> kill signal at any point in time (for example in thread engine code,
>> which we don't want to interrupt).

Kristian> I do not see any problems sending the signal at any time. Of course, there
Kristian> should be an appropriate handler set up (so we do not kill ourselves), but any
Kristian> interruptible system call (like socket read()/write()) should in any case be
Kristian> coded in a way that is safe for EAGAIN interruption. But maybe I did not
Kristian> understand what particular problem you had in mind, not sure what you mean by
Kristian> "thread engine code" and why we do not want to interrupt it.

It depends on how good all the other libraries in use are.
For example, assume that we send a signal while a storage engine is
doing a read on a file.  There is a notable chance that the storage
engine will not retry the read in case of interrupts, especially if it
uses some library to do the reads/writes.  This is because in normal
cases one never gets a signal during read/write in MySQL.
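
For illustration, the kind of retry loop that such code would need can be sketched as below. This is a minimal sketch, not code from MySQL or any storage engine; the helper name read_retry_on_eintr is hypothetical. It assumes the signal handler was installed without SA_RESTART, so an interrupted read() fails with EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from MySQL/MariaDB): read up to `len` bytes,
   restarting the call whenever a signal handler interrupts it (EINTR). */
ssize_t read_retry_on_eintr(int fd, void *buf, size_t len)
{
  ssize_t n;

  assert(buf != NULL);
  do
  {
    n = read(fd, buf, len);
  } while (n < 0 && errno == EINTR);
  return n;   /* bytes read, 0 on end of file, -1 on a real error */
}
```

A library that calls read() directly and treats every -1 as a hard error would report a failure where this loop simply tries again.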

>> The alarm code makes sure that the signal is never missed.  For
>> example, if we would send the signal just before we enter read with
>> SO_SNDTIMEO, the thread would miss the signal and the 'kill command'
>> would not have any effect.

Kristian> Yes, you are right, this would be prone to races with missed signal.
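
The missed-signal race can be reproduced in a small sketch. The assumptions here are not from the thread: SIGUSR1 stands in for the kill signal, the function and counter names are hypothetical, and SO_RCVTIMEO is used because it is the receive-side counterpart of SO_SNDTIMEO and is the option that bounds a read(). The signal is delivered just before read() is entered, so the handler runs but the read is not interrupted and only ends when the timeout expires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Counts deliveries of the wakeup signal (hypothetical name). */
static volatile sig_atomic_t signals_seen = 0;

static void count_signal(int sig)
{
  (void) sig;
  signals_seen++;
}

/* Deliver SIGUSR1 to the calling thread just before it enters read() on an
   idle socket. Returns 0 if read() succeeded, otherwise the errno it set. */
int read_after_early_signal(int fd)
{
  struct sigaction sa;
  struct timeval tv;
  char buf[8];

  assert(fd >= 0);
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = count_signal;      /* no SA_RESTART in sa_flags */
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, NULL) != 0)
    return errno;

  tv.tv_sec = 0;
  tv.tv_usec = 200000;               /* 200 ms receive timeout */
  if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0)
    return errno;

  raise(SIGUSR1);                    /* handler runs here, before read() */
  if (read(fd, buf, sizeof(buf)) < 0)
    return errno;                    /* set by the timeout, not the signal */
  return 0;
}
```

On an idle socket the read fails with EAGAIN (or EWOULDBLOCK) from the timeout rather than EINTR, even though the handler has already run once.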

Kristian> One option might be to call shutdown(2) on the socket, and then send the
Kristian> signal. But this only works for killing the connection, not for just killing a
Kristian> query. So not sure if this is a good idea.

Yes, we can't use shutdown() as we also want to be able to just kill
queries.  The other problem is that if we do a shutdown() we can't
tell the client that we did a 'graceful kill' and it didn't hit a bug.
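
A small sketch of what shutdown() does to a connection, assuming a local socket pair in place of a real client connection (the helper name is hypothetical): after the call, a read() on the same socket returns 0, and the peer likewise sees only end of file, with no room for an error message.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: shut down both directions of `fd` and report what a
   subsequent read() on the same socket returns. */
ssize_t read_after_shutdown(int fd)
{
  char buf[16];

  assert(fd >= 0);
  if (shutdown(fd, SHUT_RDWR) != 0)
    return -1;
  return read(fd, buf, sizeof(buf));   /* 0: the stream is at end of file */
}
```

Because both ends just observe end of file, the client cannot distinguish a deliberate kill from a crashed or dropped connection.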

>> To solve this, we would need to add the following mechanism:
>> 
>> - Add a flag to THD that signals if we are in a read() call on
>> a connection.  This flag should be modified under a mutex to ensure
>> that the 'kill thread-id' code knows if it should send a signal or
>> not.

Kristian> I did not understand why it is important not to send a signal if we are not in
Kristian> read().

Kristian> (Protecting with a mutex seems a bit of a problem, as I think there is no way
Kristian> to atomically unlock the mutex and initiate the read() call?)

The above is needed to ensure that we really get a signal during read
and we don't miss it.

Pseudo code:

Thread1:
get_mutex();
thd->in_read= 1;
release_mutex();
if (!thd->killed)
  read();
get_mutex();
thd->in_read= 0;
release_mutex();

The mutex would of course be a local mutex so there is never a
conflict from this, except if someone wants to send a kill signal.

Thread2 (when sending a kill):

do
{
  get_mutex();
  in_read= thd->in_read;
  thd->killed= 1;
  release_mutex();
  if (!in_read)
    break;
  send_kill();
  sleep(1);
} while (1);

As you see, we don't need to hold the mutex over the read.
We do, however, need the mutex to ensure that we don't miss the kill
signal whatever happens.

In the above code, a single kill signal may be missed, but this is ok as
we will retry until thread 2 succeeds in breaking the read.

Without a mutex, there is a chance that thread 2 will not detect that
thread 1 will do a read and just set the killed flag, while thread 1
may not see the killed flag but instead block in the read.
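
The scheme above can be sketched with POSIX threads roughly as follows. This is a minimal sketch under stated assumptions, not the server's code: conn_t is a hypothetical stand-in for THD, SIGUSR1 stands in for the kill signal, and pthread_kill() plays the role of send_kill(). One deliberate difference from the pseudo code: the killed flag is read while the mutex is still held, which keeps the same ordering guarantee and avoids an unsynchronized read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the relevant parts of THD. */
typedef struct
{
  pthread_mutex_t lock;   /* protects in_read and killed */
  int in_read;
  int killed;
  pthread_t thread;       /* thread that does the read */
  int fd;
  ssize_t result;         /* what guarded_read() returned */
} conn_t;

static void wakeup_handler(int sig) { (void) sig; }

/* Install a do-nothing handler without SA_RESTART, so that a signal makes
   a blocked read() fail with EINTR instead of killing the process. */
int install_wakeup_handler(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = wakeup_handler;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGUSR1, &sa, NULL);
}

/* Thread 1: announce the read under the mutex, skip it if already killed. */
ssize_t guarded_read(conn_t *c, void *buf, size_t len)
{
  ssize_t n = -1;
  int killed;

  assert(c != NULL);
  pthread_mutex_lock(&c->lock);
  c->in_read = 1;
  killed = c->killed;
  pthread_mutex_unlock(&c->lock);
  if (!killed)
    n = read(c->fd, buf, len);
  pthread_mutex_lock(&c->lock);
  c->in_read = 0;
  pthread_mutex_unlock(&c->lock);
  return n;
}

/* Thread entry point that performs one guarded read. */
void *reader_thread(void *arg)
{
  conn_t *c = arg;
  char buf[64];
  c->result = guarded_read(c, buf, sizeof(buf));
  return NULL;
}

/* Thread 2: set killed, then keep signalling until the reader has left
   its read. A signal that lands just before read() is retried here. */
void kill_connection(conn_t *c)
{
  for (;;)
  {
    int in_read;
    pthread_mutex_lock(&c->lock);
    in_read = c->in_read;
    c->killed = 1;
    pthread_mutex_unlock(&c->lock);
    if (!in_read)
      break;
    pthread_kill(c->thread, SIGUSR1);
    sleep(1);
  }
}
```

A caller would initialise the mutex and flags, call install_wakeup_handler() once, start reader_thread with pthread_create(&c.thread, ...), and later call kill_connection(&c) from the killing thread. Whether the kill arrives before or during the read, guarded_read() returns -1 in this sketch when no data is pending.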

>> - The kill code should send multiple kill commands to the thread,
>> until the 'read()' flag changes state to 'not in read'.

Kristian> If this is acceptable (looping, sending kill and waiting a bit for the thread
Kristian> to respond), then the race can be solved easily enough this way.

Yes, but you need a mutex to make this foolproof.

Regards,
Monty

References

Why isn't SO_SNDTIMEO used?
From: MARK CALLAGHAN, 2009-08-14
Re: Why isn't SO_SNDTIMEO used?
From: Stewart Smith, 2010-02-16
Re: Why isn't SO_SNDTIMEO used?
From: Michael Widenius, 2010-03-10
Re: Why isn't SO_SNDTIMEO used?
From: MARK CALLAGHAN, 2010-03-16
Re: Why isn't SO_SNDTIMEO used?
From: Kristian Nielsen, 2010-03-17
Re: Why isn't SO_SNDTIMEO used?
From: Michael Widenius, 2010-03-18
Re: Why isn't SO_SNDTIMEO used?
From: Kristian Nielsen, 2010-03-19