maria-developers team mailing list archive

Re: Questions re MDEV-4736 and MDEV-4739 (was Re: Spider's installation sql file)

 

Hi Kentoku,

I just reviewed one of your revisions, specifically
bzr diff -c3829 lp:~kentokushiba/maria/10.0.4-spider-3.0/

I believe things are a bit more complex: the 2PC protocol doesn't seem to permit
cohorts to fail during the commit phase:
http://en.wikipedia.org/wiki/Two-phase_commit_protocol#Commit_phase

<quot>
If the coordinator received an agreement message from all cohorts during the
commit-request phase:
  1. The coordinator sends a commit message to all the cohorts.
  2. Each cohort completes the operation, and releases all the locks and
     resources held during the transaction.
  3. Each cohort sends an acknowledgment to the coordinator.
  4. The coordinator completes the transaction when all acknowledgments have
     been received.
</quot>

I read the above as: the only problem the coordinator may experience is a
missing acknowledgement. What should the coordinator do if some cohorts
acknowledged the commit but others did not? Perhaps Spider should detect the
failure earlier?
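To make the question concrete, here is a minimal C++ model of the protocol quoted above. Cohort, Outcome and two_phase_commit() are hypothetical names for illustration only; they are not Spider or MariaDB handler APIs. The model shows the asymmetry: a "no" vote in the commit-request phase lets the coordinator roll everyone back cleanly, whereas a failure in the commit phase leaves the other cohorts already committed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a 2PC participant ("cohort"); not a Spider or
// MariaDB handler API.
struct Cohort {
  std::string name;
  bool vote_yes;        // answer given in the commit-request phase
  bool fail_on_commit;  // simulate an error during the commit phase
  bool prepared = false;
  bool committed = false;

  Cohort(std::string n, bool vote, bool fail)
      : name(std::move(n)), vote_yes(vote), fail_on_commit(fail) {}

  bool prepare() { prepared = vote_yes; return vote_yes; }
  void rollback() { prepared = false; }
  bool commit() {
    assert(prepared && "commit without a successful prepare");
    if (fail_on_commit) return false;  // no acknowledgement
    committed = true;
    return true;
  }
};

enum class Outcome { Committed, Aborted, PartiallyCommitted };

// Coordinator side of the protocol.
Outcome two_phase_commit(std::vector<Cohort>& cohorts) {
  // Commit-request phase: a single "no" vote still allows a clean abort.
  for (Cohort& c : cohorts) {
    if (!c.prepare()) {
      for (Cohort& r : cohorts) r.rollback();
      return Outcome::Aborted;
    }
  }
  // Commit phase: the decision is final. A cohort that fails here cannot
  // undo commits already made by the others, so the result is mixed.
  std::size_t acks = 0;
  for (Cohort& c : cohorts)
    if (c.commit()) ++acks;
  return acks == cohorts.size() ? Outcome::Committed
                                : Outcome::PartiallyCommitted;
}
```

With two cohorts where only the second has fail_on_commit set, two_phase_commit() returns PartiallyCommitted and the first cohort's committed flag is already true; nothing in the protocol undoes it, which is why the failure would have to be caught before the commit decision.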

Sergei, what's your opinion?

Regards,
Sergey

On Mon, Sep 30, 2013 at 05:45:13AM +0900, kentoku wrote:
> Hi Sergey,
> 
> Thank you for the information. I could reproduce it. I have tried to fix it
> and pushed the fix.
> 
> Thanks,
> Kentoku
> 
> 
> 2013/9/27 Sergey Vojtovich <svoj@xxxxxxxxxxx>
> 
> > Hi Kentoku,
> >
> > BUILD/compile-amd64-debug-max
> > cd mysql-test
> > cat > t/AAA.test
> > --source include/have_innodb.inc
> >
> > install soname 'ha_spider.so';
> >
> > --connection default
> > eval CREATE TABLE t1 (a INT) ENGINE=InnoDB;
> >
> > --connect (con1,localhost,root,,)
> > XA START 'xa1';
> > INSERT INTO t1 (a) VALUES (1),(2);
> > XA END 'xa1';
> > XA PREPARE 'xa1';
> >
> > --connection default
> > --enable_reconnect
> > --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> > restart
> > EOF
> > --shutdown_server 0
> > --source include/wait_until_disconnected.inc
> > --source include/wait_until_connected_again.inc
> > XA RECOVER;
> > XA COMMIT 'xa1';
> > --End of file
> > ./mtr AAA
> >
> > Regards,
> > Sergey
> >
> > On Fri, Sep 27, 2013 at 11:21:01PM +0900, kentoku wrote:
> > > Hi Sergey,
> > >
> > > > > > The above should have been fixed back in the beginning of 2012. Which
> > > > > > MariaDB revision are you testing with?
> > > > > It's the same as lp:~kentokushiba/maria/10.0.4-spider-3.0.
> > > > That's very strange. How did you build it and what command line options
> > > > did you use (including those that are listed in my.cnf)?
> > > O.K. I will send them to you later.
> > >
> > > > > By the way, I found a problem in Spider. It is now fixed and pushed,
> > > > > and I can't reproduce the assertion failure. Can you still reproduce
> > > > > this failure?
> > > > I just tested 10.0.4-spider-3.0 with rev.3827; it still fails.
> > > Could you please tell me the build options and command line options that
> > > you used?
> > >
> > > Thanks,
> > > Kentoku
> > >
> > >
> > >
> > > 2013/9/27 Sergey Vojtovich <svoj@xxxxxxxxxxx>
> > >
> > > > Hi Kentoku,
> > > >
> > > > On Wed, Sep 25, 2013 at 02:53:17AM +0900, kentoku wrote:
> > > > > Hi Sergey,
> > > > >
> > > > > > The above should have been fixed back in the beginning of 2012. Which
> > > > > > MariaDB revision are you testing with?
> > > > > It's the same as lp:~kentokushiba/maria/10.0.4-spider-3.0.
> > > > That's very strange. How did you build it and what command line options
> > > > did you use (including those that are listed in my.cnf)?
> > > >
> > > > > By the way, I found a problem in Spider. It is now fixed and pushed,
> > > > > and I can't reproduce the assertion failure. Can you still reproduce
> > > > > this failure?
> > > > I just tested 10.0.4-spider-3.0 with rev.3827; it still fails.
> > > >
> > > > Regards,
> > > > Sergey
> > > >
> > > > >
> > > > > Thanks,
> > > > > Kentoku
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2013/9/24 Sergey Vojtovich <svoj@xxxxxxxxxxx>
> > > > >
> > > > > > Hi Kentoku,
> > > > > >
> > > > > > On Thu, Sep 19, 2013 at 11:47:33PM +0900, kentoku wrote:
> > > > > > > Hi Sergey,
> > > > > > >
> > > > > > > > I'm afraid fixing rnd_end() callers in the server may stall for a
> > > > > > > > long time.
> > > > > > > O.K. No need to fix it if it is not easy. This is not a
> > > > > > > high-priority request.
> > > > > > > >
> > > > > > > > Is it acceptable for Spider to use the bulk update and delete API
> > > > > > > > instead (see handler.h: start_bulk_update/start_bulk_delete)?
> > > > > > > Spider uses it already. This API is not used if the target table has
> > > > > > > an AFTER trigger. I understand why this API is not used in this case,
> > > > > > > but it sometimes causes performance problems. So I thought it was
> > > > > > > better to provide an alternative for such cases. Anyway, I will
> > > > > > > disable the bulk updating/deleting feature that does not use the API.
> > > > > > I see. Hope it is acceptable.
> > > > > >
> > > > > > >
> > > > > > > > Nope, it looks like a bug in the thread pool. MDEV-4739 has a
> > > > > > > > different trace; how did you get this one? Did you just execute the
> > > > > > > > given test?
> > > > > > > I got this trace when I tried to reproduce MDEV-4739. I did the
> > > > > > > following:
> > > > > > >
> > > > > > > 1. start mysqld
> > > > > > > 2. log in to mysqld
> > > > > > > 3. mysql> CREATE TABLE t1 (a INT) ENGINE=InnoDB;
> > > > > > > 4. mysql> XA START 'xa1';
> > > > > > > 5. mysql> INSERT INTO t1 (a) VALUES (1),(2);
> > > > > > > 6. mysql> XA END 'xa1';
> > > > > > > 7. mysql> XA PREPARE 'xa1';
> > > > > > > 8. kill -9 mysqld_safe and mysqld from another terminal
> > > > > > > 9. start mysqld on gdb
> > > > > > >
> > > > > > > At that time, InnoDB and Spider were enabled and log-bin was
> > > > > > > disabled. So probably "total_ha_2pc > 1" was true and "opt_bin_log"
> > > > > > > was false.
> > > > > > > Does that help?
> > > > > > We haven't been able to reproduce it yet. :(
> > > > > >
> > > > > > Looking through the code I noticed that the call to thd_wait_begin()
> > > > > > looks as follows:
> > > > > >
> > > > > > static void scheduler_wait_sync_begin(void) {
> > > > > >   thd_wait_begin(NULL, THD_WAIT_SYNC);
> > > > > > }
> > > > > >
> > > > > > Note that thd is always NULL. And it must be NULL at this point,
> > > > > > because we're booting. But according to your trace thd is not NULL.
> > > > > >
> > > > > > #0  0x00000000005eabf6 in thd_wait_begin (
> > > > > >     thd=0x29da060, wait_type=10)
> > > > > >     at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
> > > > > > #1  0x000000000072a114 in scheduler_wait_sync_begin ()
> > > > > >     at /ssd1/mariadb-10.0.4/sql/scheduler.cc:59
> > > > > > ...
> > > > > >
> > > > > > The above should have been fixed back in the beginning of 2012. Which
> > > > > > MariaDB revision are you testing with?
> > > > > >
> > > > > > Thanks,
> > > > > > Sergey
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Kentoku
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 2013/9/19 Sergey Vojtovich <svoj@xxxxxxxxxxx>
> > > > > > >
> > > > > > > > Hi Kentoku,
> > > > > > > >
> > > > > > > > I'm adding MariaDB developers to CC.
> > > > > > > >
> > > > > > > > On Thu, Sep 19, 2013 at 01:19:13AM +0900, kentoku wrote:
> > > > > > > > > Hi Sergey,
> > > > > > > > >
> > > > > > > > > > But what kind of errors are possible in your case? Other storage
> > > > > > > > > > engines don't seem to suffer from this API violation.
> > > > > > > > >
> > > > > > > > > Spider supports bulk updating and deleting to avoid network round
> > > > > > > > > trips to the data nodes. Sometimes the last bulk update is executed
> > > > > > > > > in the rnd_end() function, so rnd_end() can get errors from a data
> > > > > > > > > node.
> > > > > > > > I'm afraid fixing rnd_end() callers in the server may stall for a
> > > > > > > > long time.
> > > > > > > > Is it acceptable for Spider to use the bulk update and delete API
> > > > > > > > instead (see handler.h: start_bulk_update/start_bulk_delete)?
> > > > > > > >
> > > > > > > > > By the way, about MDEV-4739. I get the following stack trace.
> > > > > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > > > > 0x00000000005eabf6 in thd_wait_begin (thd=0x29da060,
> > > > > > > > >     wait_type=10)
> > > > > > > > >     at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
> > > > > > > > > 4277      MYSQL_CALLBACK(thd->scheduler, thd_wait_begin, (thd, wait_type));
> > > > > > > > > (gdb) print thd
> > > > > > > > > $1 = (THD *) 0x29da060
> > > > > > > > > (gdb) bt
> > > > > > > > > #0  0x00000000005eabf6 in thd_wait_begin (
> > > > > > > > >     thd=0x29da060, wait_type=10)
> > > > > > > > >     at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
> > > > > > > > > #1  0x000000000072a114 in scheduler_wait_sync_begin ()
> > > > > > > > >     at /ssd1/mariadb-10.0.4/sql/scheduler.cc:59
> > > > > > > > > #2  0x0000000000d6dc20 in my_sync (fd=23, my_flags=0)
> > > > > > > > >     at /ssd1/mariadb-10.0.4/mysys/my_sync.c:76
> > > > > > > > > #3  0x0000000000d6b54f in my_msync (fd=23,
> > > > > > > > >     addr=0x7ffff7ff4000, len=4096, flags=4)
> > > > > > > > >     at /ssd1/mariadb-10.0.4/mysys/my_mmap.c:27
> > > > > > > > > #4  0x00000000008bea03 in TC_LOG_MMAP::open (
> > > > > > > > >     this=0x16e6a00, opt_name=0xe19c87 "tc.log")
> > > > > > > > >     at /ssd1/mariadb-10.0.4/sql/log.cc:7735
> > > > > > > > > #5  0x00000000005751cb in init_server_components ()
> > > > > > > > >     at /ssd1/mariadb-10.0.4/sql/mysqld.cc:4797
> > > > > > > > > #6  0x0000000000575a07 in mysqld_main (argc=30,
> > > > > > > > >     argv=0x1f204d0)
> > > > > > > > >     at /ssd1/mariadb-10.0.4/sql/mysqld.cc:5208
> > > > > > > > > #7  0x000000000056d884 in main (argc=11,
> > > > > > > > >     argv=0x7fffffffe3a8)
> > > > > > > > >     at /ssd1/mariadb-10.0.4/sql/main.cc:25
> > > > > > > > > (gdb) print thd->scheduler
> > > > > > > > > $2 = (scheduler_functions *) 0x8f8f8f8f8f8f8f8f
> > > > > > > > > (gdb) print thd_wait_begin
> > > > > > > > > $3 = {void (THD *,
> > > > > > > > >     int)} 0x5eaba4 <thd_wait_begin(THD*, int)>
> > > > > > > > > (gdb) print wait_type
> > > > > > > > > $4 = 10
> > > > > > > > >
> > > > > > > > > It looks like "thd->scheduler" is not initialized. What do you
> > > > > > > > > think? Must the storage engine set it?
> > > > > > > > Nope, it looks like a bug in the thread pool. MDEV-4739 has a
> > > > > > > > different trace; how did you get this one? Did you just execute the
> > > > > > > > given test?
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Sergey
> > > > > > > >
> > > > > >
> > > >
> >
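The rnd_end() problem discussed in the quoted thread (Spider buffering bulk updates and sending the last batch only at the end of the scan) can be modeled in a few lines. This is a hypothetical C++ sketch, not Spider code: BufferedUpdater, SendBatch, update_row() and flush() are invented names, and it does not reproduce the real handler.h signatures of start_bulk_update()/start_bulk_delete(). It shows why the last batch is awkward: rows are buffered to save network round trips, so the final send, and any error from the data node, only happens at the flush, after every individual update call has already reported success.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transport to a remote data node: one call is one network
// round trip, returning 0 on success or a non-zero error code.
using SendBatch = std::function<int(const std::vector<std::string>&)>;

// Hypothetical buffering updater; not Spider code and not handler.h.
class BufferedUpdater {
 public:
  BufferedUpdater(SendBatch send, std::size_t batch_size)
      : send_(std::move(send)), batch_size_(batch_size) {
    assert(batch_size_ > 0);
  }

  // Queue one row change. Returns 0 if the row was only buffered,
  // otherwise the result of sending the now-full batch.
  int update_row(const std::string& change) {
    pending_.push_back(change);
    return pending_.size() >= batch_size_ ? flush() : 0;
  }

  // Send the remaining partial batch. This call can fail after every
  // update_row() has already succeeded, so its result must be checked.
  int flush() {
    if (pending_.empty()) return 0;
    int err = send_(pending_);
    pending_.clear();
    return err;
  }

 private:
  SendBatch send_;
  std::size_t batch_size_;
  std::vector<std::string> pending_;
};
```

If flush() is called from a place whose return value the server does not act on, which is how the thread describes the rnd_end() callers, a data node error is lost; calling it from an end-of-bulk-operation hook whose result is checked lets the error reach the client.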

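For the MDEV-4739 trace, the expectation stated in the thread (scheduler_wait_sync_begin() always passes a NULL thd, and that must be tolerated while the server is booting) can be illustrated with a hypothetical model. SchedulerFunctions, Thd, current_thd_model and thd_wait_begin_model() are invented for this sketch and are not the actual MariaDB definitions; it shows only a NULL-tolerant dispatch pattern, not the real fix referred to above.

```cpp
// Hypothetical, simplified model of the scheduler callback path in the
// trace; the names below are NOT the real MariaDB definitions.
struct SchedulerFunctions {
  void (*wait_begin)(int wait_type);
};

struct Thd {
  SchedulerFunctions* scheduler;
};

// Stand-in for current_thd: stays NULL while the server is still booting.
thread_local Thd* current_thd_model = nullptr;

// NULL-tolerant dispatch: a caller such as scheduler_wait_sync_begin() may
// pass NULL, so fall back to the current thread's THD and do nothing if
// there is none.
void thd_wait_begin_model(Thd* thd, int wait_type) {
  if (thd == nullptr) {
    thd = current_thd_model;
    if (thd == nullptr) return;  // no session yet, e.g. during startup
  }
  if (thd->scheduler != nullptr && thd->scheduler->wait_begin != nullptr)
    thd->scheduler->wait_begin(wait_type);
}
```

Like any pointer check, the model only guards against NULL: a non-NULL THD whose scheduler pointer holds garbage (compare `$2 = (scheduler_functions *) 0x8f8f8f8f8f8f8f8f` in the gdb session) passes both checks and is still dereferenced.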
