
drizzle-discuss team mailing list archive

Re: VOTE PLEASE - Drizzle Replication - Group ID vs. Global Transaction ID

 

So, this issue is more complex than my original email described. I chatted with Mark a bit yesterday about the Google patches, and what Mark describes as "unique to the scope" is correct: in Google's version of global transaction identifiers, the identifier for a group of events in the binlog (which is where the term "group" comes from) is unique within the set of nodes serving a specific set of tables/schemas.

In other words, there is a consistent hierarchical relationship enforced in the Google group_id replication system which ensures there is always a single master for a slave.

In Drizzle's replication system, this restriction does not exist. Multi-source replication is perfectly acceptable and this introduces a larger "scope" in which this identifier must be unique.

Everyone interested in this should fully read the Google FAQ linked in the original mailing list post and pay attention to the parts where Justin writes about possible solutions for multi-source/multi-master replication.

This is basically where I stand right now, though I'm going to use the holidays to think it over and put more ideas on the wiki:

1) Decide on a tuple format that Drizzle will use internally for the global identifier for a Transaction message. This could be:

(server_id, group_id)

or

(server_id, timestamp, other_identifier)

or

UUID

or something completely different...
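To make the first option concrete, here is a minimal sketch (not Drizzle code) of why a group ID on its own stops being unique once a node reads from more than one source, while the (server_id, group_id) pair stays unique. The type aliases and the two helper functions are assumptions made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical aliases for illustration only.
using ServerId = std::uint32_t;
using GroupId = std::uint64_t;
using GlobalTransactionId = std::pair<ServerId, GroupId>;

// Number of distinct identifiers if transactions are keyed by group ID alone.
inline std::size_t distinctByGroupOnly(
    const std::vector<GlobalTransactionId>& trxs) {
  std::set<GroupId> ids;
  for (const auto& trx : trxs) ids.insert(trx.second);
  return ids.size();
}

// Number of distinct identifiers if transactions are keyed by the full pair.
inline std::size_t distinctByPair(
    const std::vector<GlobalTransactionId>& trxs) {
  std::set<GlobalTransactionId> ids(trxs.begin(), trxs.end());
  return ids.size();
}
```

With two sources that each number their own groups 1 and 2, four transactions arrive but only two distinct group IDs exist; keyed by the pair, all four remain distinguishable.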


2) Focus on the interfaces

Standardize the interface where logging mechanisms and replication plugins can ask a publisher for a global identifier representing its last consistent state.

Standardize the interface where a plugin can map Drizzle's internal global identifier to its own type of global identifier. For instance, let's say Drizzle's global identifier type is defined as:

#include <stdint.h>  /* uint32_t, uint64_t */
#include <utility>   /* std::pair */

typedef uint32_t ServerId;
typedef uint64_t GroupId;
typedef std::pair<ServerId, GroupId> GlobalTransactionId;

However, Tungsten's replication system uses a UUID as its global transaction identifier. There needs to be an interface for translating/mapping a value of one type to the other...
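As a rough sketch of what these two interfaces might look like (an assumption for discussion, not an existing Drizzle or Tungsten API): `Publisher`, `IdMapper`, and `StringIdMapper` are hypothetical names, and the "server_id:group_id" string format is only a stand-in for a plugin-specific identifier such as a UUID.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

typedef std::uint32_t ServerId;
typedef std::uint64_t GroupId;
typedef std::pair<ServerId, GroupId> GlobalTransactionId;

// Hypothetical: what a log or replication plugin would ask a publisher.
class Publisher {
 public:
  virtual ~Publisher() = default;
  // Identifier of the last transaction in the publisher's consistent state.
  virtual GlobalTransactionId lastConsistentId() const = 0;
};

// Hypothetical: translates between the internal ID and a plugin's own type.
template <typename ExternalId>
class IdMapper {
 public:
  virtual ~IdMapper() = default;
  virtual ExternalId toExternal(const GlobalTransactionId& id) const = 0;
  virtual GlobalTransactionId toInternal(const ExternalId& id) const = 0;
};

// Toy mapper: encodes the pair as the text "server_id:group_id".
class StringIdMapper : public IdMapper<std::string> {
 public:
  std::string toExternal(const GlobalTransactionId& id) const override {
    return std::to_string(id.first) + ":" + std::to_string(id.second);
  }

  GlobalTransactionId toInternal(const std::string& text) const override {
    const std::size_t sep = text.find(':');
    if (sep == std::string::npos)
      throw std::invalid_argument("expected \"server_id:group_id\"");
    return GlobalTransactionId(
        static_cast<ServerId>(std::stoul(text.substr(0, sep))),
        std::stoull(text.substr(sep + 1)));
  }
};
```

The string encoding round-trips because it carries both halves of the pair. A mapping to an arbitrary externally generated UUID would presumably need a lookup table instead, since such a UUID cannot in general be decoded back into a (server_id, group_id) pair.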

Anyway, like I said, over the holidays I'll be working on putting all of these disparate thoughts onto the Drizzle wiki. I'll post to the mailing list when I have a good, clean wiki describing the problems and possible interfaces and solutions.

Thanks!

Jay

Jobin Augustine wrote:
If I understand correctly, replacing a "globally unique ID" with a "local ID" is a good move.

My vote is for you.
++

Why? Because it is forward-looking.
Eric Day had a blog post regarding eventually consistent databases.
Even if Drizzle is strictly consistent internally, that may not hold once we think about geographically distributed databases (say, many independent Drizzle instances) talking to each other. In a highly distributed environment, eventual consistency is unavoidable, and in my humble opinion a globally unique transaction ID does not mean much there, so this move is in the right direction.

By the way, the name "group id" is again confusing. It automatically raises the question: "group of what?"
Any better name for it?

Thank you,
Jobin.


On Wed, Dec 23, 2009 at 8:46 PM, Jay Pipes <Jay.Pipes@xxxxxxx> wrote:

    Hi all,

    I'd like to get some consensus votes to solidify the terminology
    around something that is soon to hit Drizzle's replication system:

    A way to uniquely identify a specific Transaction in a global
    replication environment.

    There are two different sets of terms in use regarding the above
    functionality, and I'd like to be able to settle on one set or the
    other.

    Indeed, if one looks at Google's implementation of the above
    functionality for MySQL 5.0, the terms "group id" and "global
    transaction ID" seem to be freely intermingled.  Even the URL and
    title of the Google FAQ on the subject/patch have contradicting terms!:

    http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds

    Note the URL says "Global Transaction IDs" and the page title says
    "Global Group IDs".  Very confusing to me.  Anyone else?

    I'd like to settle this confusion and just start referring to this
    functionality by a single term: "Group ID"

    The reason is that the group ID is actually *not* a global
    identifier. The global identifier is actually the server ID *plus*
    the group ID, and therefore referring to the group ID as the global
    transaction ID is a bit of a misnomer.

    I would like to change the TransactionContext message format from this:

    message TransactionContext
    {
     required uint32 server_id = 1; /* Unique identifier of a server */
     required uint64 transaction_id = 2; /* Globally-unique transaction ID */
     required uint64 start_timestamp = 3;
     required uint64 end_timestamp = 4;
    }

    to this:

    message TransactionContext
    {
     required uint32 server_id = 1; /* Unique identifier of a server */
     required uint64 group_id = 2; /* Unique ID of trx on this server */
     required uint64 start_timestamp = 3;
     required uint64 end_timestamp = 4;
    }

    Please let me know if this is OK with folks.  Thanks!

    Jay






