maria-discuss team mailing list archive

Re: Fwd: failed to connect mariadb at addition node in MariaDB galera cluster

 

Hi,

On Fri, Sep 2, 2016 at 11:50 PM, 西门吹牛 <zh1029@xxxxxxxx> wrote:

> Hi,
>   Thanks for the response.
>   No, I started mysqld on the second node via the command "mysqld
> --defaults-extra-file=mdb.my.cnf --debug".
>

So the log below is from the first node? I ask because I see the following in it:

> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: 'wsrep-new-cluster'
> option used, bootstrapping the cluster



>   The problem is permanent.
>   No abnormal output is seen on the console. I feel MariaDB may be stuck
> somewhere. From the trace log (attached mdb.mysqld.trace), it looks stuck at
> wsrep_replication_process. It may be waiting for some message from the
> primary node, but I have no idea what it is waiting for.
>

Apparently it's waiting for SST (state snapshot transfer), as indicated by
the last few lines of the attached trace:

> T@4    : | | | | enter: buffer: WSREP: Gap in state sequence. Need state
> transfer.
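
To check which of these states a given log or trace reached, the marker strings quoted in this thread can be searched for mechanically. Below is a minimal Python sketch, not part of MariaDB or Galera: the names `MARKERS`, `normalize` and `wsrep_events` are made up for illustration, and the marker strings are simply copied from the log and trace excerpts in this thread, so they may read differently in other versions.

```python
# Marker substrings copied from the log and trace excerpts in this thread.
# The dictionary keys are hypothetical labels chosen for this sketch.
MARKERS = {
    "bootstrapped": "'wsrep-new-cluster' option used, bootstrapping the cluster",
    "needs_state_transfer": "Gap in state sequence. Need state transfer.",
    "waiting_for_sst": "Waiting for SST to complete.",
    "sst_complete": "SST complete, seqno:",
    "synced": "Shifting JOINED -> SYNCED",
}


def normalize(text):
    """Strip mail quote prefixes and join wrapped lines with single spaces,
    so that a marker split across two lines by the mail client still matches."""
    lines = (line.lstrip("> \t").rstrip() for line in text.splitlines())
    return " ".join(part for part in lines if part)


def wsrep_events(text):
    """Return the labels of all markers found in the text, in MARKERS order."""
    flat = normalize(text)
    return [name for name, marker in MARKERS.items() if marker in flat]


if __name__ == "__main__":
    trace_tail = (
        "T@4    : | | | | enter: buffer: WSREP: Gap in state sequence. Need state\n"
        "transfer.\n"
    )
    print(wsrep_events(trace_tail))
```

If the second node's trace yields `needs_state_transfer` but never `sst_complete` or `synced`, that would be consistent with the node still waiting for the state transfer.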

You can also start both nodes with --wsrep-debug=ON to collect more details.
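
As a rough illustration of how the two nodes' command lines would differ, here is a small Python sketch that builds them. The helper `mysqld_argv` is hypothetical and exists only for this example; the option names and config file names are the ones that appear in this thread, and the sketch assumes that only the first (bootstrapping) node gets --wsrep-new-cluster.

```python
def mysqld_argv(defaults_extra_file, bootstrap=False, wsrep_debug=True):
    """Build a mysqld argument list for one Galera node (illustrative helper)."""
    # --defaults-extra-file is placed first, as in the commands in this thread.
    argv = ["mysqld", f"--defaults-extra-file={defaults_extra_file}"]
    if bootstrap:
        # Only the node that creates the cluster should bootstrap it.
        argv.append("--wsrep-new-cluster")
    if wsrep_debug:
        # Extra WSREP diagnostics in the error log.
        argv.append("--wsrep-debug=ON")
    return argv


if __name__ == "__main__":
    # Config file names taken from the commands quoted in this thread.
    print(" ".join(mysqld_argv("./mmn.my.cnf", bootstrap=True)))
    print(" ".join(mysqld_argv("mdb.my.cnf")))
```

The joining node's command line deliberately lacks --wsrep-new-cluster, so it would try to join the existing cluster instead of starting a new one.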

Thanks,
Nirbhay



>
> --------------------------------
>
>
> ----- Original Message -----
> From: Nirbhay Choubey <nirbhay@xxxxxxxxxxx>
> To: zh1029@xxxxxxxx
> Cc: maria-discuss <maria-discuss@xxxxxxxxxxxxxxxxxxx>, "yan-jack.chen" <
> yan-jack.chen@xxxxxxxxx>
> Subject: Re: [Maria-discuss] Fwd: failed to connect mariadb at addition
> node in MariaDB galera cluster
> Date: 2 Sep 2016, 23:26
>
> Hi,
>
> On Thu, Sep 1, 2016 at 11:08 PM, 西门吹牛 <zh1029@xxxxxxxx> wrote:
>
> Hi,
>   I deployed MariaDB Galera on two nodes to build a cluster, but I can't
> connect to MariaDB on the second node. It seems MariaDB never opens its
> port there, apparently because it is stuck somewhere.
>   Version: mysqld 10.1.17-MariaDB-debug with galera-3-25.3.17
>
> I started MariaDB on the first node and it seems fine: port 3307 was opened
> and I can log in to MariaDB with the mysql client.
>
>
> You shouldn't use --wsrep-new-cluster to start the 2nd node (in case you
> are).
> Regarding the hang:
> Does it happen all the time? Repeatable?
> Is that all you see in the error log? Nothing after the partial last line?
> Would it be possible to attach a debugger to mysqld to check exactly
> where it hangs?
>
> Best,
> Nirbhay
>
>
>
>
>
>
> [root@MMN-0(RCP-69) /root/test]
>
> # /home/_rcpadmin/bin/mariadb/bin/mysqld --defaults-extra-file=./mmn.my.cnf
> --wsrep-new-cluster --debug
>
> 2016-09-01 16:41:37 140716544316288 [Note] /home/_rcpadmin/bin/mariadb/bin/mysqld
> (mysqld 10.1.17-MariaDB-debug) starting as process 15248 ...
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Setting wsrep_ready to 0
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Read nil XID from
> storage engines, skipping position init
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: wsrep_load(): loading
> provider library '/usr/lib64/libgalera_smm.so'
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: wsrep_load(): Galera
> 3.17(r0) by Codership Oy <info@xxxxxxxxxxxxx> loaded successfully.
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: CRC-32C: using hardware
> acceleration.
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Found saved state:
> 900987cc-7003-11e6-b25f-de0a52317f1d:0
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Passing config to GCS:
> base_dir = /mariadb/; base_host = MMN-0; base_port = 4567;
> cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin =
> PT1S; evs.delayed_keep_period
> = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S;
> evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3;
> evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout =
> PT5S; evs.user_send_window = 2; evs.view_forget_timeout
> = PT24H; gcache.dir = /mariadb/; gcache.keep_pages_size = 0;
> gcache.mem_size = 0; gcache.name = /mariadb//galera.cache;
> gcache.page_size = 300M; gcache.size = 300M; gcomm.thread_prio = ;
> gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16;
> gcs.fc_master_slave
> = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25;
> gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25;
> gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0;
> pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum
> = false; pc.ignore_sb = false; pc
>
> 2016-09-01 16:41:38 140716029634304 [Note] WSREP: Service thread queue
> flushed.
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Assign initial position
> for certification: 0, protocol version: -1
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: wsrep_sst_grab()
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Start replication
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: 'wsrep-new-cluster'
> option used, bootstrapping the cluster
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Setting initial position
> to 900987cc-7003-11e6-b25f-de0a52317f1d:0
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: protonet asio version 0
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Using CRC-32C for
> message checksums.
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: backend: asio
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: gcomm thread scheduling
> priority set to other:0
>
> 2016-09-01 16:41:38 140716544316288 [Warning] WSREP: access
> file(/mariadb//gvwstate.dat) failed(No such file or directory)
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: restore pc from disk
> failed
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: GMCast version 0
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: (e5a32d3c, 'tcp://
> 0.0.0.0:4567') listening at tcp://0.0.0.0:4567
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: (e5a32d3c, 'tcp://
> 0.0.0.0:4567') multicast: , ttl: 1
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: EVS version 0
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: gcomm: bootstrapping new
> group 'example_cluster'
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: start_prim is enabled,
> turn off pc_recovery
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Node e5a32d3c state prim
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP:
> view(view_id(PRIM,e5a32d3c,1) memb {
>
>         e5a32d3c,0
>
> } joined {
>
> } left {
>
> } partitioned {
>
> })
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: save pc into disk
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr
> without UUID: tcp://169.254.0.4:4567
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr
> proto entry 0x5652b0352ef0
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr
> without UUID: tcp://169.254.0.5:4567
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr
> proto entry 0x5652b035b720
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr
> without UUID: tcp://169.254.0.6:4567
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr
> proto entry 0x5652b0363ea0
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: gcomm: connected
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Changing maximum packet
> size to 64500, resulting msg size: 32636
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Shifting CLOSED -> OPEN
> (TO: 0)
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Opened channel
> 'example_cluster'
>
> 2016-09-01 16:41:38 140715987670784 [Note] WSREP: New COMPONENT: primary =
> yes, bootstrap = no, my_idx = 0, memb_num = 1
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: Waiting for SST to
> complete.
>
> 2016-09-01 16:41:38 140715987670784 [Note] WSREP: STATE_EXCHANGE: sent
> state UUID: e5a3bc45-701f-11e6-ba1c-471590fea490
>
> 2016-09-01 16:41:38 140715987670784 [Note] WSREP: STATE EXCHANGE: sent
> state msg: e5a3bc45-701f-11e6-ba1c-471590fea490
>
> 2016-09-01 16:41:38 140715987670784 [Note] WSREP: STATE EXCHANGE: got
> state msg: e5a3bc45-701f-11e6-ba1c-471590fea490 from 0 (MMN-0)
>
> 2016-09-01 16:41:38 140715987670784 [Note] WSREP: Quorum results:
>
>         version    = 4,
>
>         component  = PRIMARY,
>
>         conf_id    = 0,
>
>         members    = 1/1 (joined/total),
>
>         act_id     = 0,
>
>         last_appl. = -1,
>
>         protocols  = 0/7/3 (gcs/repl/appl),
>
>         group UUID = 900987cc-7003-11e6-b25f-de0a52317f1d
>
> 2016-09-01 16:41:38 140715987670784 [Note] WSREP: Flow-control interval:
> [16, 16]
>
> 2016-09-01 16:41:38 140715987670784 [Note] WSREP: Restored state OPEN ->
> JOINED (0)
>
> 2016-09-01 16:41:38 140715987670784 [Note] WSREP: Member 0.0 (MMN-0)
> synced with group.
>
> 2016-09-01 16:41:38 140716542806784 [Note] WSREP: New cluster view: global
> state: 900987cc-7003-11e6-b25f-de0a52317f1d:0, view# 1: Primary, number
> of nodes: 1, my index: 0, protocol version 3
>
> 2016-09-01 16:41:38 140715987670784 [Note] WSREP: Shifting JOINED ->
> SYNCED (TO: 0)
>
> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: SST complete, seqno: 0
>
> 2016-09-01 16:41:38 140716544316288 [Note] InnoDB: Using mutexes to ref
> count buffer pool pages
>
> 2016-09-01 16:41:38 140716544316288 [Note] InnoDB:  InnoDB: !!!!!!!!
> UNIV_DEBUG switched on !!!!!!!!!
>
> 2016-09-01 16:41:38 140716544316288 [Note] InnoDB:  InnoDB: !!!!!!!!
> UNIV_SYNC_DEBUG switched on !!!!!!!!!
>
> 2016-09-01 16:41:38 140716544316288 [Note] InnoDB: The InnoDB memory heap
> is disabled
>
> 2016-09-01 16:41:38 140716
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>
>
>
