maria-developers team mailing list archive
Mailing list archive
Re: MySQL / Maria Windows Port
>> The my_pwrite() code now looks like:
>> #if defined(_WIN32)
>> readbytes= my_win_pread(Filedes, Buffer, Count, offset);
>> And there is ew my_winfile.c file.
>> Is that your code?
Jeremiah> Yeah, this is definitely 99% my code. Someone spent the time to clean
Jeremiah> it, normalize it, and add a few fixes and polish, but it's essentially
Jeremiah> the same as what I submitted. Unfortunately, there's no one line
Jeremiah> attribution I was promised for contributing the code. :) But at least
Jeremiah> it's in there, that's a great start.
Sorry about the attribution; I will ensure that when we pull/implement
this to MariaDB we will add it!
>> Could you take a look a the MySQL-6.0 mysys/windows code and see if
>> you think if it's now good enough.
>> If not, do you think we should take the 6.0 code as a base for our
>> tree or should we start with your code?
Jeremiah> The MySQL 6.0 code for Windows file handling is a good starting point,
Jeremiah> but you can see the primary "hack" of my code is the my_open_osfhandle
Jeremiah> function, which is a table specifically for emulating the C style
Jeremiah> integer file handles. If I recall correctly (it has been years) the
Jeremiah> only reasons I had to implement this were:
Jeremiah> As you said, the my_file_info structure, which uses the file descriptor
Jeremiah> itself as an index.
Jeremiah> if ((uint) fd < my_file_limit && my_file_info[fd].type != UNOPEN)
Jeremiah> There you can see the opacity of the file descriptor is violated by
Jeremiah> using it as a bounds check for the array, although that's obviously
Jeremiah> follows using it as an index into the array. When I made the patch my
Jeremiah> understanding was that I needed to keep as many of the changes as
Jeremiah> possible within Windows ifdefs, and that changing the structure that
Jeremiah> contains all of the file handles significantly would be too much change
Jeremiah> to non-Windows code.
Yes, we should to strive to keep most of the code 'similar' to ensure
easy pulls between MySQL versions.
Jeremiah> There are obviously a lot of ways to resolve this. If I were writing it
Jeremiah> from scratch, I wouldn't pass the file descriptor around to different
Jeremiah> functions, I would pass a pointer to the my_file_info structure around
Jeremiah> instead, and all of the mysys file handling code would take a
Jeremiah> my_file_info structure pointer as the file desc parameter. There's no
Jeremiah> need to index into an array if you already have the pointer to the
Jeremiah> struct in question. The code would become portable, and the
Jeremiah> serialization on file open/close (fairly minor) would disappear.
Agree that a much better way would be to move away from int's and have
a structure instead.
It should however not be that hard to do this properly as most of the
MySQL code is using the type 'File' for the file handle.
My old idea was to create a new library, my_file, and copy all the
file handling and file name functions from this to my_file and then
change this to use a pointer to a structure instead of a File object.
The File objects should also contains the file name in parts and one
should be able to manipulate the file name without having the file open.
The changes in the code would mainly be:
- Create the new library.
- Add a separate call to populate the File object with the name parts
- Change all calls to my_open / my_read to my_file_open, my_file_read
etc... (Can be done trivially with one replace command)
- Change error handling to instead of checking if my_file_open returns
-1 see if it returns MY_INVALID_FILE
- Over time, change all file and file name handling functions to use the new
file handling functions.
You can find the original description of this task here:
I have included a halfdone version if this code in this email.
basharteg> If things are
basharteg> being designed in a way that isn't entirely UNIX-centric, it's easy to make
basharteg> MySQL for Windows fly. If we get to the point where we're willing to look
basharteg> at async socket I/O and thread queues / I/O Completion Ports for work
basharteg> dispatch, we can do even more.
>> I don't know much about that, so I would be really thankful for any
>> help I can get in this area.
Jeremiah> It's going to take a fair amount of design work to figure out how to
Jeremiah> adapt MySQL to this kind of model. It could be implemented on all
Jeremiah> platforms with epoll on Linux, kqueue on FreeBSD, and I/O Completion
Jeremiah> Ports on Windows. Event notification I/O is the key to serious
Jeremiah> scalability, but it requires significant design changes, especially when
Jeremiah> you're coming from a 1:1 threading model.
I have always been worried about the fact that if you have other
threads doing IO, you will always get a thread switch + some atomic
operations / mutex when doing any read or write operations.
The 1:1 model makes handling io operations much easier and if things
are buffered then this can be superior compared to having a pool for data.
Also, if all the threads are mixing cpu bound and io operations, this
gives you somewhat balanced load that works reasonable good in many
I understand that pools are superior in some context; I am just not
yet sure when it's best to use a pool and when to use 1:1 threads.
>> In MySQL 6.0 we you can choose at startup if you want to have one
>> thread per connection or a pool of threads to handle all connections.
>> I wrote the patch in such a way that it's < 10 min of works to pull
>> this code back to 5.1 (or MariaDB); Will do this next week.
Jeremiah> This change makes significant progress towards moving to event
Jeremiah> notification I/O. That means you broke the state information up between
Jeremiah> thread and connection and eliminated the bad thread == connection
Yes. The patch is now in lp:maria
basharteg> So in terms of MySQL, it definitely runs with a 1:1 threading model using
basharteg> blocking I/O. While this isn't uncommon for UNIX applications, you'll
basharteg> notice the "hot" UNIX apps that are more interested in the C10K (
basharteg> http://www.kegel.com/c10k.html) target for their scalability are starting to
basharteg> move to async I/O implementations. This has produced some good output like
basharteg> lighttpd (http://www.lighttpd.net/).
>> I haven't yet made up my mind what's the best route for MariaDB in
>> this case; I am waiting for a benchmark or nice code that will make up
>> my mind for me...
The thread goes both ways.
When MySQL was created, it was more common for people to use using
async instead of an 1:1 thread model. As far as people have told me,
after MySQL and as the thread libraries got better a lot of programs
changed to us a 1:1 thread model.
Don't think it has yet been proved what is the best model/mix long term.
A lot depends on the worklog and what the threads are doing.
For example in the MySQL case:
- If all your queries are short and there are very little conflicts
a pool-of-threads version is much better than a 1:1 model.
- If you have a big mix long running queries and short running
queries, a 1:1 thread application is superior to simple pole based
Jeremiah> Yeah, benchmarks are tough too, because serial benchmarks will tend to
Jeremiah> show the overhead involved in asynchronous I/O operations. You have to
Jeremiah> decide that you're developing for the multi-core world and accept the
Jeremiah> minor overhead in exchange for the significant scalability enhancements,
Jeremiah> and design your benchmarks to implement concurrent loads.
See above. What is best is mostly depending one what the threads are
supposed to do. Multi-core is not the most critical part here.
>> Note that the thread pool is still 'beta' quality, in the sense it
>> still have some notable week spots:
>> - If all threads are working or hang (in lock tables) then no new
>> connections can be established, one can't examine what is going on
>> and one can't kill offending threads (as one can't send in new queries)
>> The easyest fix would be to have another port where one could connect
>> with the 1:1 model and check what's going on.
I added this to the MariaDB thread-pool implementation that is now pushed.
Jeremiah> Having an administrative connection is something MySQL has needed for a
Jeremiah> very long time anyway. However, as someone who uses thread pools
Jeremiah> extensively on a daily basis, you can handle the cases of thread pool
Jeremiah> exhaustion in several ways, including safety timer events, timeouts on
Jeremiah> lock waits, event based notification on lock acquisition (may be
This is not easy when you don't want to abort any of the running
As the lock between threads can bappen very low, for example inside a
storage engine for which you don't have any control, none of the above
solutions are easy or even possible to implement.
Jeremiah> But as someone who runs a fairly large MySQL system for a
Jeremiah> telephony company, I can tell you that the 1:1 threading model doesn't
Jeremiah> save you from thread exhaustion any more than having a thread pool
Jeremiah> would. When we get threads stuck waiting for locks our servers can
Jeremiah> easily become thread exhausted with no way to kill the threads.
As long as you limit max_connections under the number of threads your
system can handle, you should always be able to connect as a SUPER
user and fix things.
Jeremiah> Besides, in a good design, you shouldn't have just one thread pool. The
Jeremiah> thread pool that handles I/O events (like accepting new connections)
Jeremiah> should be different from the worker thread pool that executes SQL
In the current thread pool implementation, the threads are waiting in
the pool for any request on connection port; In this case the
connection/login handling and reading commands from the user are both
Jeremiah> In the Windows async I/O world, we consider connections to be relatively
Jeremiah> inexpensive. I don't set connection limits in my applications. Each
Jeremiah> connection is a little bit of non-paged pool kernel memory, plus the
Jeremiah> little struct I use to keep the connection state information, plus one
Jeremiah> async I/O read pending (among thousands).
You also need to keep the connection state, which in MySQL means the
THD object. This is a semi-expensive object (10 K)
Jeremiah> But yes, one of the basics of our design is having at least two thread
Jeremiah> pools, one for socket I/O (at least so far as new connections, posting
Jeremiah> reads, and receiving queries) and one for SQL statements. The socket
Jeremiah> I/O thread pool should not be something that deadlocks, since whatever
Jeremiah> locking is used, it would be far more deterministic than the locking
Jeremiah> involved in executing SQL statements.
In the current implementation, we have only one pool and in MySQL:s
case it may be enough.
The reason for this is that when the thread gets a signal that there
is data to be read, it will have all data it needs in the buffer (as
the client commands are mostly very short and if they are not short,
the client is working full time to send it as they are always send
This means that there is in practice no deadlocks when it comes to
reading data and you win the context switch from reading data and then
giving it to a worker thread.
Jeremiah> Another thing to throw out there since I'm already being terribly
Jeremiah> verbose is, each query should have a timeout value, and when that
Jeremiah> timeout value is exceeded, the query should terminate. One of the worst
Jeremiah> parts about working in MySQL is the idea of a stuck query, especially
Jeremiah> when you need a separate connection to kill it. In systems like
Jeremiah> Microsoft SQL, the default query timeout is 2 minutes, and if you need
Jeremiah> to exceed that or you don't want a timeout, you can change it as part of
Jeremiah> the connection settings. Runaway queries running indefinitely doesn't
Jeremiah> make any practical sense.
This agains depends on the application. Most applications need to do
long statistical queries from time to time and you don't want to kill
Same thing with any queries that does critical updates or moves things
around (like ALTER TABLE, REPAIR etc).
In other words; I agree we need a timeout mechanism. I am however not
sure what should be the best default value for this.
That said, Guilhem did once implement a timeout for queries but this
was unfortunately never pushed becasue it changed the code too much at
the time when the patch was made and we where close to GA.
We should dig out this patch and fix it for the current code and add it.
Jeremiah> Also, when we get async I/O in place, you
Jeremiah> could eventually add an extension to the protocol that allows the client
Jeremiah> to cancel the query in progress. Since the incoming I/Os are handled by
Jeremiah> I/O threads rather than the SQL threads, you can still receive commands
Jeremiah> while the query is in progress. So even as you're executing a query for
Jeremiah> someone, you have an async read on the socket waiting for the next
Jeremiah> command. If the next command is received while the query is still in
Jeremiah> progress, you check to see if it's a cancel command, or simply a
Jeremiah> pipelined SQL query. If it's a cancel command, you flag your SQL thread
Jeremiah> to stop when it can, and if it's a pipelined SQL query they're probably
Jeremiah> violating the MySQL protocol anyway since they need to wait for a
Jeremiah> response to the previous query, but if we allowed pipelined queries some
Jeremiah> day in the future, they've lost their right to cancel this query and you
Jeremiah> simply queue the SQL query until the current query is finished. In the
Jeremiah> far future, multiple queries on the same connection are fine, and are
Jeremiah> given query identifiers and all handled through async I/O, and all
Jeremiah> queries are cancellable.
Now you can achieve the same thing by just sending the kill one another
connection (could for example be used through the extra-port).
Drizzle has done some work in the direction of sending many queries on
the same connection. We should look at this when their implementation
I am still a bit sceptical of having two thread pools; My major worry
is that it will for normal users slow down things instead of giving a
speed increase (because of the extra context switch and mutex/atomic
operations to shuffle things around)
We need to find someone that can write a prototype and show the
benefits trough some agreed to benchmark...
basharteg> We need to find the points in there where I/Os are taking place that can be
basharteg> adapted to an async model. Basically, the disk reads that are happening to
basharteg> satisfy the query, the network reads to get the query, the network writes to
basharteg> send the results, and the network waits for the next query. All of those
basharteg> things need to happen without a thread being associated with this
We need a proof of concept and benchmark of this...
>> For mysqld this isn't trivial as we can't easily change the code for
>> the engines.
>> What I don't like with the async model is that for reads on disk it
>> doesn't speed up things as much and it may even slow down thing as you
>> now have two threads communication and the original is just waiting
>> for the other to take the message, execute it, and tell the other that
>> it's now ready.
Jeremiah> Well, you can avoid async I/O for disk I/O if you think it's going to be
Jeremiah> a performance negative. On single or few drive SATA systems, you're
Jeremiah> right. On high end I/O subsystems, async I/O can definitely have a
Jeremiah> throughput benefit, especially when you have an advanced storage
Jeremiah> subsystem driver that combines operations. As we get to Solid State
Jeremiah> Disks in the future, it's going to be the same thing as with the
Jeremiah> multi-core CPUs we're adapting to. If you're not doing multiple things
Jeremiah> at once, your performance is going to be weak. For now though, you can
Jeremiah> leave out async disk I/O, but async socket I/O is the only way to get up
Jeremiah> to tens of thousands of connections and beyond. You shouldn't be afraid
Jeremiah> of the overhead of using multiple threads and multiple thread pools. If
Jeremiah> done properly, the penalty to serial operations is minimal, and the
Jeremiah> performance gains are significant (in addition to the robustness
Jeremiah> enhancements previously discussed).
The pool-of-threads code we have now already gives us 100,000 of
But I am still afread of splitting the current thread pool to 2 parts...
>> What would be the advantages of using Apache portable runtime?
>> Should this be instead of mysys ?
>> The benefits of using mysys are:
>> - Well working and integrated with all MySQL parts
>> - Well integrated with dbug statements
>> - Provides a lot of extra features, like hiding windows thread
Jeremiah> I can understand moving to apr from mysys might not be feasible, I just
Jeremiah> bring it up because they've abstracted async and event driven I/O and a
Jeremiah> lot of other platform differences very well. It works great on
Jeremiah> Windows, FreeBSD, and Linux. Probably not something we can go for
Jeremiah> right now, but they may have some useful designs and hints.
I am happy to take any good ideas from their libraires and add them to
ours and even move some of our code in their direction (if there is a
clear gain for this).
Description: Binary data