← Back to team overview

drizzle-discuss team mailing list archive

Re: Understanding MySQL Internals

 

Rehan,

On 16 Mar 2010, at 00:15, Rehan Iftikhar wrote:

Hi

I was wondering how relevant the content in Understanding MySQL Internals (http://www.amazon.com/Understanding-MySQL-Internals-Sasha-Pachev/dp/0596009577 ) is to drizzle?



Sasha Pachev's book on mySQL internals would be significantly more relevant to Drizzle than a book on Oracle internals.

The Core Drizzzle team are likely going to clam that the two are very different, but they have spent the last year making it different, and working on the differences. So the similarities might be somewhat invisible to them. So I will take a shot at an overview and they can correct me as needed.

I would say that until the Drizzle internals book is written that book would be the closest. I think most would agree that the differences are significant, but still the core of drizzle is a derived work from MySQL. In addition the storage engine concept and compatibility with the major MySQL storage engines such as innodb is in drizzle as well.

I can not find my copy of Sasha Pachev's book or I would go through it and tell you how well it covers the concepts that are similar.

The Drizzle team can add some details but as far as I can tell here are some things that are different and some things that are the same.

Authentication:
	Drizzle is  plugin based PAM and http_auth and others.
MySQL built in authentication of user,host, password at the DB, table and column level. Stored in a database table

Thread management:
I assume that this is similar. MySQL until very recently had a single multi threaded process. A thread was allocated per connection. That thread might be cached when the user disconnected and re-used for an incoming connection. In recent versions a "pool of threads" optimization has been added where a pool of threads is allocated for user connections and those threads are used as needed. I am not sure what code base drizzle started with and how stable pool of threads was anyhow, so what drizzle uses.. I do not know.

Parser:
I have not heard much chatter about the drizzle parser so I assume it is derived from the MySQL parser. I imagine MySQL dual license model caused the MySQL parser to not reuse some open source parser libraries so perhaps the code has been cleaned up.. but I am only guessing.

Optimizer:
I have not heard much chatter here either, so I assume that the optimizer is derived from the MySQL optimizer as well. The drizzle team can correct me if I am wrong.

Replication:
MySQL relied upon what had been a statement based binary log. Meaning that if a statement might have changed data, it was written to a log file at the SQL layer and the slave would replay the statements. Features were added in MySQL 5.1 to instead of logging the statements that may have changed or added rows we instead ask the storage engine for copies of the changes and place those in a "row" based replication log the binary log. Statement based was still supported and it was/is messy. Statement based had some issues, but row based added some issues and some confusion and some bugs. Drizzle, tore all of that out and implemented replication capability based upon google protocol buffers. http://code.google.com/apis/protocolbuffers/docs/overview.html Jay covers the internals fairly well in a series of blog posts here..http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/

It is fair to say that there are significant differences between the two systems regarding replication.

Transaction stuff:
mySQL at the core was not a transactional database, it was made to work somehow with transactional storage engines. I sometimes would think of the MySQL server or the SQL layer as coordinating a group transaction to the underlying storage engines. The relationship was complicated, and it complicated replication somewhat in ways that are a little complicated to go into, but in a purely transactional system the same log and system that is used for transactional consistency and durability can typical be used to assist the replication process. I am not sure what Drizzles statement of intent regarding transactions is. But it is important to note that the MySQL way.. led to a somewhat messy implementation and it seems that Drizzle is hard at work even lately in terms of cleaning that up.

It seems that any Database that allows plugins for Storage engines is going to have to hand off the Durability Requirements to the storage engines so in a rough outline things are somewhat similar, but the differences will be many.

I could picture Drizzle be more transactional and still allowing the storage engines to ignore the transactional stuff. Whereas MySQL was not transaction and forced the storage engines to do extra work in order to be transactional.

If you are looking for an understanding of the Relational Model and how SQL is optimized and how Joins are performed, I found Dan Tow's book SQL tuning http://www.amazon.com/SQL-Tuning-Dan-Tow/dp/0596005733 to be helpful. It goes through the concepts of indexes and joins really well.

If I find my copy of Sasha Paschev's book I can give you a better review.



--
Tom  Hanlon








--
-Rehan
_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : drizzle-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp




Follow ups

References