pbxt-discuss team mailing list archive
Message #00129
Re: Problems with PBXT
Hi Paul,
On Sun, Apr 24, 2011 at 05:42:55PM +0200, Paul McCullagh wrote:
> Hi Erkan,
>
> Looks like the sweeper has not completed its work for the recovery
> phase. It may be hanging for some reason, but I don't know why this is
> the case.
So it might be a good idea to also provide a way to monitor the sweeper
and the other background threads.
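Until something like that exists, one rough way to watch the sweeper uses what is already exposed. The `select * from statistics;` output quoted further down includes a "Log Bytes to Sweep" counter (6818078139 in that sample). The Python sketch below parses two such snapshots and reports how fast that counter shrank between them. The helper names are hypothetical, and reading "the counter shrinks" as "the sweeper is making progress" is an assumption, not documented PBXT behaviour. The parser also assumes the pipe-delimited layout printed by the mysql command-line client.

```python
import re
from typing import Dict

# One data row of the mysql client table, e.g. "| 41 | Log Bytes to Sweep |  6818078139 |".
ROW = re.compile(r"^\|\s*(\d+)\s*\|\s*(.+?)\s*\|\s*(-?\d+)\s*\|$")


def parse_statistics(text: str) -> Dict[str, int]:
    """Hypothetical helper: turn `select * from statistics;` output into {name: value}.

    Separator lines ("+----+...") and anything else that is not a data row are skipped.
    """
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            stats[m.group(2)] = int(m.group(3))
    return stats


def sweeper_progress(before: str, after: str, seconds: float) -> float:
    """Bytes per second by which 'Log Bytes to Sweep' shrank between two snapshots.

    Assumption: a positive value means the sweeper is advancing; zero or a
    negative value means it is stalled or new log arrives faster than it is swept.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    b = parse_statistics(before)["Log Bytes to Sweep"]
    a = parse_statistics(after)["Log Bytes to Sweep"]
    return (b - a) / seconds


sample_1 = "| 41 | Log Bytes to Sweep    |  6818078139 |"
# Hypothetical second snapshot taken 60 s later, with the counter unchanged.
sample_2 = "| 41 | Log Bytes to Sweep    |  6818078139 |"
print(sweeper_progress(sample_1, sample_2, 60.0))
```

With an unchanged counter, as in the usage lines, it prints 0.0, which under the assumption above would point at a sweeper that is not advancing.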
Are there any known (and fixed) issues with the way PBXT allocates memory? I
regularly see PBXT failing on long-running transactions such as
INSERT INTO innodb_table SELECT * FROM pbxt_table.
The client reports:
ERROR 1297 (HY000): Got temporary error -1 'Cannot allocate memory' from PBXT
And the error log shows:
110426 00:11:29 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156) errno (12): Cannot allocate memory
110426 00:11:29 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156)
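For context on these allocation failures, the sizes in the `show global variables like '%pbxt%'` output quoted further down can be totalled. This is a sketch under two assumptions. First, the size suffixes are binary; the Index Cache Usage counter of 3221225472 bytes in the quoted statistics is exactly 3 GiB, which fits. Second, each of these four settings is allocated once at its configured size. Per-thread buffers, the XtraDB buffer pool and the rest of mysqld are not counted.

```python
# Binary multipliers; assumption: "3G" means 3 * 1024**3 bytes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(size: str) -> int:
    """Convert a size string such as '3G', '512M' or '32MB' to bytes."""
    s = size.strip().upper().rstrip("B")  # treat "32MB" like "32M"
    if s and s[-1] in UNITS:
        return int(s[:-1]) * UNITS[s[-1]]
    return int(s)


# Values copied from the quoted `show global variables like '%pbxt%'` output.
caches = {
    "pbxt_index_cache_size": "3G",
    "pbxt_record_cache_size": "3G",
    "pbxt_log_cache_size": "1G",
    "pbxt_log_buffer_size": "512M",
}

total = sum(to_bytes(v) for v in caches.values())
print(total, total / 1024**3)
```

It prints 8053063680 bytes, i.e. 7.5 GiB for these four settings alone, before any other allocation.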
Regards
Erkan
>
> On Apr 23, 2011, at 5:42 PM, erkan yanar wrote:
>
> >Hi,
> >Given: | 5.2.5-MariaDB-log |
> > 110423 18:16:22 PBXT 1.0.11-7 Pre-GA STATUS OUTPUT
> >
> >I was converting a table from XtraDB to PBXT.
> >Because of a misconfiguration the system started swapping, and I got:
> >110423 05:10:33 [Error] SW-mysql_data void* xt_malloc_ns(memory_xt.cc:156) errno (12): Cannot allocate memory
> >
> >So I killed the server with kill -9.
> >Let's have a look at the error log, then:
> >
> >110423 5:11:06 Percona XtraDB (http://www.percona.com) 1.0.15-12.5 started; log sequence number 221799126990
> >110423 5:11:06 [Note] Recovering after a crash using tc.log
> >110423 5:11:06 [Note] Starting crash recovery...
> >110423 05:11:07 [Note] PBXT: Recovering from 1-69, bytes to read: 6844334523
> >110423 05:13:17 [Note] PBXT: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
> >110423 06:17:46 [Note] PBXT: 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
> >110423 07:38:43 [Note] PBXT: 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
> >110423 08:59:01 [Note] PBXT: 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
> >110423 10:11:00 [Note] PBXT: Recovering complete at 191-11307935, bytes read: 6844334523
> >110423 10:11:00 [Note] Table pdns.records: free row count (1) has been set to the number of rows on the list: 1
> >110423 10:11:00 [Note] Crash recovery finished.
> >
> >IMHO that is quite slow... then:
> >110423 10:11:02 [Note] Waiting for 'mysql_data' sweeper...
> >FYI: datadir := /data/mysql/mysql_data
> >This is the last entry. Does it mean it is still waiting?
> >In fact, I can't access the table: it just hangs, even on a simple
> >SELECT with LIMIT 1.
> >
> >Running strace (on the 23rd at 18:23:07), I see:
> >[pid 9633] <... pwrite resumed> ) = 12800
> >[pid 9633] pwrite(7, "\241-\1'www.5eabfd0749382558294a429f"..., 11776, 3689697280) = 11776
> >[pid 9633] pwrite(7, ",-\1#28fca10a02a6064013d946a93c56"..., 11776, 3707686912) = 11776
> >[pid 9633] pwrite(7, "\3103\1#5bee153b03d21ba03338c17bab94"..., 13312, 3721564160 <unfinished ...>
> >[pid 8558] <... nanosleep resumed> NULL) = 0
> >[pid 8558] nanosleep({0, 10000000}, <unfinished ...>
> >[pid 9633] <... pwrite resumed> ) = 13312
> >[pid 9633] pwrite(7, "\2241\1#6453b2a23d7b45ba6199a68ebfa1"..., 12800, 3745730560) = 12800
> >[pid 9633] pwrite(7, "N-\1(blog.6aa66275aedaeb43c6cd62b"..., 11776, 3764375552) = 11776
> >[pid 9633] pwrite(7, "\237,\1#243877286abd33de621d0caf4bf5"..., 11776, 3767013376 <unfinished ...>
> >[pid 9634] <... nanosleep resumed> NULL) = 0
> >[pid 9634] nanosleep({0, 10000000}, <unfinished ...>
> >[pid 9633] <... pwrite resumed> ) = 11776
> >[pid 9633] pwrite(7, "\266-\1(blog.ed01180d84db6d59f9af453"..., 11776, 3777302528 <unfinished ...>
> >[pid 8558] <... nanosleep resumed> NULL) = 0
> >[pid 8558] nanosleep({0, 10000000}, <unfinished ...>
> >[pid 10561] <... nanosleep resumed> NULL) = 0
> >[pid 10561] nanosleep({0, 100000000}, <unfinished ...>
> >[pid 9633] <... pwrite resumed> ) = 11776
> >[pid 9633] pwrite(7, "\2740\1*schock.3ce6deea5e12d77de535d"..., 12800, 3791163392 <unfinished ...>
> >[pid 9634] <... nanosleep resumed> NULL) = 0
> >[pid 9634] nanosleep({0, 10000000}, ^C <unfinished ...>
> >
> >
> >What is being written here is the table data. Does this mean the sweeper
> >is still working? Why can't I access the table?
> >
> >root@localhost [pbxt]> show global variables like '%pbxt%';
> >+------------------------------+-------+
> >| Variable_name                | Value |
> >+------------------------------+-------+
> >| pbxt_auto_increment_mode     | 0     |
> >| pbxt_checkpoint_frequency    | 24M   |
> >| pbxt_data_file_grow_size     | 50M   |
> >| pbxt_data_log_threshold      | 256M  |
> >| pbxt_flush_log_at_trx_commit | 0     |
> >| pbxt_garbage_threshold       | 50    |
> >| pbxt_index_cache_size        | 3G    |
> >| pbxt_log_buffer_size         | 512M  |
> >| pbxt_log_cache_size          | 1G    |
> >| pbxt_log_file_count          | 10    |
> >| pbxt_log_file_threshold      | 32MB  |
> >| pbxt_max_threads             | 2007  |
> >| pbxt_offline_log_function    | 0     |
> >| pbxt_record_cache_size       | 3G    |
> >| pbxt_row_file_grow_size      | 10M   |
> >| pbxt_support_xa              | ON    |
> >| pbxt_sweeper_priority        | 2     |
> >| pbxt_transaction_buffer_size | 8M    |
> >+------------------------------+-------+
> >
> >select * from statistics;
> >
> >+----+-----------------------+-------------+
> >|  1 | Current Time          |  1303576276 |
> >|  2 | Time Since Last Call  | 47879149862 |
> >|  3 | Commit Count          |           0 |
> >|  4 | Rollback Count        |           0 |
> >|  5 | Wait for Xact Count   |           0 |
> >|  6 | Dirty Xact Count      |           4 |
> >|  7 | Read Statements       |           0 |
> >|  8 | Write Statements      |           0 |
> >|  9 | Record Bytes Read     |  3420546564 |
> >| 10 | Record Bytes Written  |   442544326 |
> >| 11 | Record File Flushes   |           9 |
> >| 12 | Record Flush Time     |    30046591 |
> >| 13 | Record Cache Hits     |    28554990 |
> >| 14 | Record Cache Misses   |      234060 |
> >| 15 | Record Cache Frees    |      146298 |
> >| 16 | Record Cache Usage    |  2898603144 |
> >| 17 | Index Bytes Read      |  4162967552 |
> >| 18 | Index Bytes Written   |  2231254528 |
> >| 19 | Index File Flushes    |         159 |
> >| 20 | Index Flush Time      |  4939549599 |
> >| 21 | Index Cache Hits      |   744372857 |
> >| 22 | Index Cache Misses    |     1826952 |
> >| 23 | Index Cache Usage     |  3221225472 |
> >| 24 | Index Log Bytes In    |  3944771819 |
> >| 25 | Index Log Bytes Out   |   681640564 |
> >| 26 | Index Log File Syncs  |         639 |
> >| 27 | Index Log Sync Time   |  2048645036 |
> >| 28 | Xact Log Bytes In     |  2523484015 |
> >| 29 | Xact Log Bytes Out    |   873305600 |
> >| 30 | Xact Log File Syncs   |       10315 |
> >| 31 | Xact Log Sync Time    |  3357323411 |
> >| 32 | Xact Log Cache Hits   |       20337 |
> >| 33 | Xact Log Cache Misses |      209087 |
> >| 34 | Xact Log Cache Usage  |  1073708456 |
> >| 35 | Data Log Bytes In     |           0 |
> >| 36 | Data Log Bytes Out    |           0 |
> >| 37 | Data Log File Syncs   |           0 |
> >| 38 | Data Log Sync Time    |           0 |
> >| 39 | Bytes to Checkpoint   |  6818078139 |
> >| 40 | Log Bytes to Write    |   125051695 |
> >| 41 | Log Bytes to Sweep    |  6818078139 |
> >| 42 | Sweeper Wait on Xact  |           0 |
> >| 43 | Index Scan Count      |           1 |
> >| 44 | Table Scan Count      |           0 |
> >| 45 | Select Row Count      |           1 |
> >| 46 | Insert Row Count      |           0 |
> >| 47 | Update Row Count      |           0 |
> >| 48 | Delete Row Count      |           0 |
> >+----+-----------------------+-------------+
> >
> >So why can't I access the table?
> >What did I do wrong?
> >
> >Regards
> >Erkan
> >
> >--
> >above the borders, freedom must surely be cloudless
> >
> >_______________________________________________
> >Mailing list: https://launchpad.net/~pbxt-discuss
> >Post to : pbxt-discuss@xxxxxxxxxxxxxxxxxxx
> >Unsubscribe : https://launchpad.net/~pbxt-discuss
> >More help : https://help.launchpad.net/ListHelp
>
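The "quite slow" crash recovery quoted above can be put in numbers. The log shows "bytes to read: 6844334523" at 05:11:07 and "Recovering complete" at 10:11:00 on 110423 (2011-04-23). This is plain arithmetic on those two timestamps, assuming both fall on the same day as the log prefix indicates.

```python
from datetime import datetime

# Timestamps and byte count copied from the quoted PBXT recovery log (110423 = 2011-04-23).
start = datetime(2011, 4, 23, 5, 11, 7)   # "Recovering from 1-69, bytes to read: ..."
end = datetime(2011, 4, 23, 10, 11, 0)    # "Recovering complete at 191-11307935, ..."
bytes_read = 6844334523

elapsed = (end - start).total_seconds()
rate = bytes_read / elapsed  # average bytes replayed per second
print(f"{elapsed:.0f} s, {rate / 1024:.0f} KiB/s, {rate / 1024**2:.2f} MiB/s")
```

That is just under five hours (17993 s) for about 6.4 GiB of log, or roughly 370 KiB/s on average.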
--
above the borders, freedom must surely be cloudless