maria-developers team mailing list archive
Message #01656
Re: [Merge] lp:~paul-mccullagh/maria/maria-pbxt-rc3 into lp:maria
Hi Sergei,
I have tested this, and it all works as you specified, but now I have
found a problem with PBXT.
The problem stems from code in PBXT (in functions ha_pbxt::write_row,
ha_pbxt::update_row, ha_pbxt::delete_row) which I have pasted below.
The code is a hack!
According to the comment I had a problem with calling:
trans_register_ha(pb_mysql_thd, FALSE, pbxt_hton), to register the
start of a transaction statement.
When I did this unconditionally at the start of a statement,
ha_commit_trans() was not always called (and MySQL became confused,
thinking a transaction was still running). On reading the MySQL code
(a long time ago) I noticed that ha_commit_trans() was only guaranteed
to be called when an update was performed.
So I delayed the trans_register_ha(pb_mysql_thd, FALSE, pbxt_hton)
call until write_row(), etc. was called.
The problem is that MySQL now fails to recognize PBXT transactions
as XA (see ha_write_row below), because mark_trx_read_write() is
called before write_row() (i.e. before the trans_register_ha() call).
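
The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not server or PBXT code: all the Toy* names and run_statement() are hypothetical, and it assumes that marking a transaction read-write only takes effect for an engine that has already registered, which is what the behaviour described above implies.

```cpp
#include <cassert>

// Toy stand-in for the server's per-statement transaction state.
// (Hypothetical type, not part of MySQL or PBXT.)
struct ToyStatementTrans {
    bool engine_registered = false;  // set by the toy trans_register_ha()
    bool marked_read_write = false;  // set by the toy mark_trx_read_write()
};

struct ToyHandler {
    ToyStatementTrans &trans;
    bool register_at_statement_start;  // true: InnoDB/NDB style, false: the "hack"

    // Called once when the statement begins.
    void start_statement() {
        if (register_at_statement_start)
            trans.engine_registered = true;
    }

    // Engine-level write: with the hack, registration happens only here.
    void write_row() {
        if (!trans.engine_registered)
            trans.engine_registered = true;
        assert(trans.engine_registered);
    }

    // Assumption: the mark is silently skipped if the engine is not registered yet.
    void mark_trx_read_write() {
        if (trans.engine_registered)
            trans.marked_read_write = true;
    }

    // Same call order as handler::ha_write_row(): mark first, then write.
    void ha_write_row() {
        mark_trx_read_write();
        write_row();
    }
};

// Runs one single-row statement and reports whether the transaction
// ended up marked as read-write.
bool run_statement(bool register_at_statement_start) {
    ToyStatementTrans trans;
    ToyHandler handler{trans, register_at_statement_start};
    handler.start_statement();
    handler.ha_write_row();
    return trans.marked_read_write;
}
```

In this model run_statement(true) leaves the transaction marked as read-write, while run_statement(false) does not, because the registration arrives one call too late.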
I noticed that InnoDB and NDB call trans_register_ha() unconditionally
at the start of a statement.
Could it be that the problem that led to my "hack" has been fixed?
Best regards,
Paul
PBXT code in ha_pbxt::write_row, etc.
------------------------------------------------------------
/* GOTCHA: I have a huge problem with the transaction statement.
 * It is not ALWAYS committed (I mean ha_commit_trans() is
 * not always called - for example in SELECT).
 *
 * If I call trans_register_ha() but ha_commit_trans() is not called
 * then MySQL thinks a transaction is still running (while
 * I have committed the auto-transaction in ha_pbxt::external_lock()).
 *
 * This causes all kinds of problems, like transactions
 * are killed when they should not be.
 *
 * To prevent this, I only inform MySQL that a transaction
 * has been started when an update is performed. I have determined
 * that ha_commit_trans() is only guaranteed to be called if an
 * update is done.
 */
if (!pb_open_tab->ot_thread->st_stat_trans) {
    trans_register_ha(pb_mysql_thd, FALSE, pbxt_hton);
    XT_PRINT0(pb_open_tab->ot_thread, "ha_pbxt::write_row trans_register_ha all=FALSE\n");
    pb_open_tab->ot_thread->st_stat_trans = TRUE;
}
-------------------------------
int handler::ha_write_row(uchar *buf)
{
  int error;
  Log_func *log_func= Write_rows_log_event::binlog_row_logging_function;
  DBUG_ENTER("handler::ha_write_row");
  mark_trx_read_write();
  if (unlikely(error= write_row(buf)))
    DBUG_RETURN(error);
  if (unlikely(error= binlog_log_row(table, 0, buf, log_func)))
    DBUG_RETURN(error); /* purecov: inspected */
  DBUG_RETURN(0);
}
On Dec 2, 2009, at 12:26 PM, Sergei Golubchik wrote:
Hi, Paul!
On Dec 02, Paul McCullagh wrote:
Now, if a binlog doesn't indicate a crash (last binlog file was
closed properly, showing a normal shutdown procedure) or, for
example, there're no binlog files at all, the server cannot perform
the recovery - it doesn't know what transactions should be committed
and what transactions should be rolled back. A transaction may be
prepared in one engine and already committed (or rolled back) in
another. In this case the server requests the user to make the
decision: the user has to request an explicit commit or rollback
with the --tc-heuristic-recover command-line switch.
So if I understand this correctly, tc-heuristic-recover handles a
situation that should actually never occur.
Yes, that's mainly to account for a user mistake - deleting binlog
after a crash or copying the live datadir from master to a slave
without binlog. Or something similar.
But if there's only one XA-capable storage engine, it's always
safe to rollback all prepared transactions, there can be no other
engine that has them committed. That's what the ifdef-ed code does -
if InnoDB is the only XA-capable storage engine on recovery without
binlog it forces a rollback of all not committed transactions,
preserving the pre-XA InnoDB behavior.
So, in fact, we could change this code:
to:
if (total_ha_2pc == (ulong) opt_bin_log+1) {
  tc_heuristic_recover= TC_HEURISTIC_RECOVER_ROLLBACK; // forcing ROLLBACK
  info.dry_run=FALSE;
}
I intentionally did it with ifdef - the idea was that if there could
*possibly* be another XA-engine, we need to go the safe way. Otherwise
one could, for example, restart the server with disabled pbxt and
unknowingly enable auto-rollback mode, causing inconsistent data.
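
The trade-off discussed here can be sketched as a small decision function. This is a hedged illustration only, not the server's recovery code: the Toy* names and choose_recovery() are hypothetical, and the sketch assumes the participant count includes the binlog when it is enabled, mirroring the total_ha_2pc == opt_bin_log+1 comparison above.

```cpp
#include <cassert>

// Possible outcomes of the toy recovery decision (hypothetical names).
enum class ToyRecoveryAction {
    UseBinlog,      // binlog shows a crash: recover from its contents
    ForceRollback,  // exactly one XA-capable engine: roll back prepared transactions
    AskUser         // ambiguous: needs an explicit --tc-heuristic-recover choice
};

struct ToyRecoveryInput {
    // Assumption: counts XA-capable engines, plus one if the binlog is enabled.
    unsigned long total_2pc_participants;
    bool binlog_enabled;
    bool binlog_shows_crash;
};

// Runtime variant of the check: decide from the engines loaded right now.
ToyRecoveryAction choose_recovery(const ToyRecoveryInput &in) {
    const unsigned long binlog_count = in.binlog_enabled ? 1UL : 0UL;
    assert(in.total_2pc_participants >= binlog_count);

    if (in.binlog_enabled && in.binlog_shows_crash)
        return ToyRecoveryAction::UseBinlog;

    // Only one real engine takes part in 2PC, so no other engine can
    // hold a committed copy of a prepared transaction.
    if (in.total_2pc_participants == binlog_count + 1)
        return ToyRecoveryAction::ForceRollback;

    return ToyRecoveryAction::AskUser;
}
```

With two XA-capable engines and a cleanly closed binlog the function returns AskUser. If the same server is restarted with one engine disabled, the count drops and the result silently becomes ForceRollback, which is the risk a compile-time ifdef avoids.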
Regards / Mit vielen Grüßen,
Sergei
--
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <serg@xxxxxxx>
 / /|_/ / // /\ \/ /_/ / /__  Principal Software Engineer/Server Architect
/_/  /_/\_, /___/\___\_\___/  Sun Microsystems GmbH, HRB München 161028
       <___/                  Sonnenallee 1, 85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Häring
--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com