drizzle-discuss team mailing list archive
Mailing list archive
Re: Understanding MySQL Internals
On 16 Mar 2010, at 00:15, Rehan Iftikhar wrote:
I was wondering how relevant the content in Understanding MySQL
) is to drizzle?
Sasha Pachev's book on mySQL internals would be significantly more
relevant to Drizzle than a book on Oracle internals.
The Core Drizzzle team are likely going to clam that the two are very
different, but they have spent the last year making it different, and
working on the differences. So the similarities might be somewhat
invisible to them. So I will take a shot at an overview and they can
correct me as needed.
I would say that until the Drizzle internals book is written that
book would be the closest. I think most would agree that the
differences are significant, but still the core of drizzle is a
derived work from MySQL. In addition the storage engine concept and
compatibility with the major MySQL storage engines such as innodb is
in drizzle as well.
I can not find my copy of Sasha Pachev's book or I would go through it
and tell you how well it covers the concepts that are similar.
The Drizzle team can add some details but as far as I can tell here
are some things that are different and some things that are the same.
Drizzle is plugin based PAM and http_auth and others.
MySQL built in authentication of user,host, password at the DB, table
and column level. Stored in a database table
I assume that this is similar. MySQL until very recently had a single
multi threaded process. A thread was allocated per connection. That
thread might be cached when the user disconnected and re-used for an
incoming connection. In recent versions a "pool of threads"
optimization has been added where a pool of threads is allocated for
user connections and those threads are used as needed. I am not sure
what code base drizzle started with and how stable pool of threads was
anyhow, so what drizzle uses.. I do not know.
I have not heard much chatter about the drizzle parser so I assume it
is derived from the MySQL parser. I imagine MySQL dual license model
caused the MySQL parser to not reuse some open source parser libraries
so perhaps the code has been cleaned up.. but I am only guessing.
I have not heard much chatter here either, so I assume that the
optimizer is derived from the MySQL optimizer as well. The drizzle
team can correct me if I am wrong.
MySQL relied upon what had been a statement based binary log. Meaning
that if a statement might have changed data, it was written to a log
file at the SQL layer and the slave would replay the statements.
Features were added in MySQL 5.1 to instead of logging the statements
that may have changed or added rows we instead ask the storage engine
for copies of the changes and place those in a "row" based replication
log the binary log. Statement based was still supported and it was/is
messy. Statement based had some issues, but row based added some
issues and some confusion and some bugs.
Drizzle, tore all of that out and implemented replication capability
based upon google protocol buffers. http://code.google.com/apis/protocolbuffers/docs/overview.html
Jay covers the internals fairly well in a series of blog posts here..http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/
It is fair to say that there are significant differences between the
two systems regarding replication.
mySQL at the core was not a transactional database, it was made to
work somehow with transactional storage engines. I sometimes would
think of the MySQL server or the SQL layer as coordinating a group
transaction to the underlying storage engines. The relationship was
complicated, and it complicated replication somewhat in ways that are
a little complicated to go into, but in a purely transactional system
the same log and system that is used for transactional consistency and
durability can typical be used to assist the replication process. I am
not sure what Drizzles statement of intent regarding transactions is.
But it is important to note that the MySQL way.. led to a somewhat
messy implementation and it seems that Drizzle is hard at work even
lately in terms of cleaning that up.
It seems that any Database that allows plugins for Storage engines is
going to have to hand off the Durability Requirements to the storage
engines so in a rough outline things are somewhat similar, but the
differences will be many.
I could picture Drizzle be more transactional and still allowing the
storage engines to ignore the transactional stuff. Whereas MySQL was
not transaction and forced the storage engines to do extra work in
order to be transactional.
If you are looking for an understanding of the Relational Model and
how SQL is optimized and how Joins are performed, I found Dan Tow's
book SQL tuning http://www.amazon.com/SQL-Tuning-Dan-Tow/dp/0596005733
to be helpful. It goes through the concepts of indexes and joins
If I find my copy of Sasha Paschev's book I can give you a better
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : drizzle-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help : https://help.launchpad.net/ListHelp