
pbxt-discuss team mailing list archive

Re: Problems with PBXT

 


On Apr 25, 2011, at 11:34 PM, erkan yanar wrote:

On Sun, Apr 24, 2011 at 05:42:55PM +0200, Paul McCullagh wrote:
Hi Erkan,

Looks like the sweeper has not completed its work for the recovery
phase. It may be hanging for some reason, but I don't know why this is
the case.

So maybe it would be a good idea to be able to monitor the sweeper
and the other background threads as well.

Yes, this would help.


Are there known (fixed) issues with the way PBXT allocates memory? On a
regular basis I see PBXT failing on long-running transactions such as
INSERT INTO innodb_table SELECT * FROM pbxt_table.
This gives in the CLI:
ERROR 1297 (HY000): Got temporary error -1 'Cannot allocate memory' from PBXT

How long does this operation take?

I can see a potential problem if PBXT is executing many transactions at the same time as the long running transaction.

A structure is held in memory for each transaction. This structure is not large, but could amount to millions if you are executing 1000s of operations per second, and the long transaction runs for an hour.
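
As a rough back-of-the-envelope sketch of that estimate: the rate (1000 transactions per second) and the duration (one hour) are the figures mentioned above, while the size of 100 bytes per structure is purely a hypothetical placeholder, since the real size of the per-transaction structure is not stated here.

```python
# Rough estimate of how many per-transaction structures could pile up
# while one long-running transaction keeps them from being freed.

def pending_structures(tx_per_second: int, long_tx_seconds: int) -> int:
    """Number of structures accumulated over the long transaction."""
    return tx_per_second * long_tx_seconds

def estimated_bytes(structure_count: int, bytes_per_structure: int) -> int:
    """Memory needed for that many structures."""
    return structure_count * bytes_per_structure

TX_PER_SECOND = 1000        # "1000s of operations per second"
LONG_TX_SECONDS = 60 * 60   # "the long transaction runs for an hour"
ASSUMED_STRUCT_BYTES = 100  # hypothetical size, NOT taken from PBXT

count = pending_structures(TX_PER_SECOND, LONG_TX_SECONDS)
total = estimated_bytes(count, ASSUMED_STRUCT_BYTES)

print(f"structures held: {count}")
print(f"approx. memory : {total / 2**20:.1f} MiB (at {ASSUMED_STRUCT_BYTES} bytes each)")
```

With these numbers the count comes to 3.6 million structures; the memory figure scales linearly with whatever the real structure size turns out to be.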

But, of course, it would help to know where the system ran out of memory. Unfortunately that information is not yet provided, because no one has ever run into this problem.


And in the error-log:

110426 00:11:29 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156) errno (12): Cannot allocate memory
110426 00:11:29 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156)

Regards
Erkan



On Apr 23, 2011, at 5:42 PM, erkan yanar wrote:

Hi,
Given: | 5.2.5-MariaDB-log |
    110423 18:16:22 PBXT 1.0.11-7 Pre-GA STATUS OUTPUT

I was converting a table from xtradb -> pbxt
Because of misconfiguration the system swapped and:
110423 05:10:33 [Error] SW-mysql_data void* xt_malloc_ns(memory_xt.cc:156) errno (12): Cannot allocate memory

So I did it the -9 way (kill -9).
Let's have a look at the error log then:

110423  5:11:06 Percona XtraDB (http://www.percona.com) 1.0.15-12.5 started; log sequence number 221799126990
110423  5:11:06 [Note] Recovering after a crash using tc.log
110423  5:11:06 [Note] Starting crash recovery...
110423 05:11:07 [Note] PBXT: Recovering from 1-69, bytes to read: 6844334523
110423 05:13:17 [Note] PBXT:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
110423 06:17:46 [Note] PBXT: 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
110423 07:38:43 [Note] PBXT: 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
110423 08:59:01 [Note] PBXT: 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
110423 10:11:00 [Note] PBXT: Recovering complete at 191-11307935, bytes read: 6844334523
110423 10:11:00 [Note] Table pdns.records: free row count (1) has been set to the number of rows on the list: 1
110423 10:11:00 [Note] Crash recovery finished.
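
To put a number on the recovery speed, here is a small sketch that derives the average throughput from the two PBXT log lines above (start of recovery and "Recovering complete"). It assumes the log timestamps are in the YYMMDD HH:MM:SS form shown and that all 6844334523 bytes were read between those two lines; the variable names are made up for illustration.

```python
from datetime import datetime

# Timestamps and byte count copied from the PBXT recovery log lines.
LOG_TIME_FORMAT = "%y%m%d %H:%M:%S"
recovery_start = datetime.strptime("110423 05:11:07", LOG_TIME_FORMAT)
recovery_end = datetime.strptime("110423 10:11:00", LOG_TIME_FORMAT)
bytes_read = 6844334523

# Wall-clock duration of the PBXT recovery phase in seconds.
elapsed_seconds = int((recovery_end - recovery_start).total_seconds())

# Average throughput over the whole recovery.
bytes_per_second = bytes_read / elapsed_seconds

print(f"elapsed   : {elapsed_seconds} s")
print(f"throughput: {bytes_per_second / 1024:.1f} KiB/s")
```

That is just under five hours for about 6.8 GB of log, or a few hundred KiB per second on average.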

IMHO quite slow... then:
110423 10:11:02 [Note] Waiting for 'mysql_data' sweeper...
FYI: datadir := /data/mysql/mysql_data
This is the last entry. Does it mean it is still waiting?
In fact I can't access the table; it just hangs on a simple select
(limit 1).

So doing an strace (on the 23rd at 18:23:07) I see:
[pid  9633] <... pwrite resumed> )      = 12800
[pid  9633] pwrite(7, "\241-\1'www.5eabfd0749382558294a429f"..., 11776, 3689697280) = 11776
[pid  9633] pwrite(7, ",-\1#28fca10a02a6064013d946a93c56"..., 11776, 3707686912) = 11776
[pid  9633] pwrite(7, "\3103\1#5bee153b03d21ba03338c17bab94"..., 13312, 3721564160 <unfinished ...>
[pid  8558] <... nanosleep resumed> NULL) = 0
[pid  8558] nanosleep({0, 10000000},  <unfinished ...>
[pid  9633] <... pwrite resumed> )      = 13312
[pid  9633] pwrite(7, "\2241\1#6453b2a23d7b45ba6199a68ebfa1"..., 12800, 3745730560) = 12800
[pid  9633] pwrite(7, "N-\1(blog.6aa66275aedaeb43c6cd62b"..., 11776, 3764375552) = 11776
[pid  9633] pwrite(7, "\237,\1#243877286abd33de621d0caf4bf5"..., 11776, 3767013376 <unfinished ...>
[pid  9634] <... nanosleep resumed> NULL) = 0
[pid  9634] nanosleep({0, 10000000},  <unfinished ...>
[pid  9633] <... pwrite resumed> )      = 11776
[pid  9633] pwrite(7, "\266-\1(blog.ed01180d84db6d59f9af453"..., 11776, 3777302528 <unfinished ...>
[pid  8558] <... nanosleep resumed> NULL) = 0
[pid  8558] nanosleep({0, 10000000},  <unfinished ...>
[pid 10561] <... nanosleep resumed> NULL) = 0
[pid 10561] nanosleep({0, 100000000},  <unfinished ...>
[pid  9633] <... pwrite resumed> )      = 11776
[pid  9633] pwrite(7, "\2740\1*schock.3ce6deea5e12d77de535d"..., 12800, 3791163392 <unfinished ...>
[pid  9634] <... nanosleep resumed> NULL) = 0
[pid  9634] nanosleep({0, 10000000}, ^C <unfinished ...>


This is the table data. Does this mean the sweeper is still
working? Why can't I access the table?

root@localhost [pbxt]> show global variables like '%pbxt%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| pbxt_auto_increment_mode     | 0     |
| pbxt_checkpoint_frequency    | 24M   |
| pbxt_data_file_grow_size     | 50M   |
| pbxt_data_log_threshold      | 256M  |
| pbxt_flush_log_at_trx_commit | 0     |
| pbxt_garbage_threshold       | 50    |
| pbxt_index_cache_size        | 3G    |
| pbxt_log_buffer_size         | 512M  |
| pbxt_log_cache_size          | 1G    |
| pbxt_log_file_count          | 10    |
| pbxt_log_file_threshold      | 32MB  |
| pbxt_max_threads             | 2007  |
| pbxt_offline_log_function    | 0     |
| pbxt_record_cache_size       | 3G    |
| pbxt_row_file_grow_size      | 10M   |
| pbxt_support_xa              | ON    |
| pbxt_sweeper_priority        | 2     |
| pbxt_transaction_buffer_size | 8M    |
+------------------------------+-------+
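
Since the system ended up swapping, it may help to add up the larger buffers configured above. The sketch below sums the cache and buffer sizes from the variable listing; it assumes each of them is allocated in full and once only, which is a simplification and not something the listing itself confirms. The helper name `parse_size` is made up for this example.

```python
# Sum the larger PBXT memory settings from the SHOW VARIABLES output.
# Assumption: each setting is allocated in full, exactly once.

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_size(text: str) -> int:
    """Convert a value such as '3G', '512M' or '32MB' to bytes."""
    text = text.strip().upper()
    if text.endswith("B"):      # tolerate a trailing 'B' as in '32MB'
        text = text[:-1]
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

# Values copied from the variable listing.
memory_settings = {
    "pbxt_index_cache_size": "3G",
    "pbxt_record_cache_size": "3G",
    "pbxt_log_cache_size": "1G",
    "pbxt_log_buffer_size": "512M",
    "pbxt_transaction_buffer_size": "8M",
}

total_bytes = sum(parse_size(value) for value in memory_settings.values())
print(f"configured PBXT buffers: {total_bytes / 1024**3:.2f} GiB")
```

This covers only the PBXT settings listed; memory used by XtraDB and the rest of the server comes on top of it.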

select * from statistics;

+----+-----------------------+-------------+
|  1 | Current Time          |  1303576276 |
|  2 | Time Since Last Call  | 47879149862 |
|  3 | Commit Count          |           0 |
|  4 | Rollback Count        |           0 |
|  5 | Wait for Xact Count   |           0 |
|  6 | Dirty Xact Count      |           4 |
|  7 | Read Statements       |           0 |
|  8 | Write Statements      |           0 |
|  9 | Record Bytes Read     |  3420546564 |
| 10 | Record Bytes Written  |   442544326 |
| 11 | Record File Flushes   |           9 |
| 12 | Record Flush Time     |    30046591 |
| 13 | Record Cache Hits     |    28554990 |
| 14 | Record Cache Misses   |      234060 |
| 15 | Record Cache Frees    |      146298 |
| 16 | Record Cache Usage    |  2898603144 |
| 17 | Index Bytes Read      |  4162967552 |
| 18 | Index Bytes Written   |  2231254528 |
| 19 | Index File Flushes    |         159 |
| 20 | Index Flush Time      |  4939549599 |
| 21 | Index Cache Hits      |   744372857 |
| 22 | Index Cache Misses    |     1826952 |
| 23 | Index Cache Usage     |  3221225472 |
| 24 | Index Log Bytes In    |  3944771819 |
| 25 | Index Log Bytes Out   |   681640564 |
| 26 | Index Log File Syncs  |         639 |
| 27 | Index Log Sync Time   |  2048645036 |
| 28 | Xact Log Bytes In     |  2523484015 |
| 29 | Xact Log Bytes Out    |   873305600 |
| 30 | Xact Log File Syncs   |       10315 |
| 31 | Xact Log Sync Time    |  3357323411 |
| 32 | Xact Log Cache Hits   |       20337 |
| 33 | Xact Log Cache Misses |      209087 |
| 34 | Xact Log Cache Usage  |  1073708456 |
| 35 | Data Log Bytes In     |           0 |
| 36 | Data Log Bytes Out    |           0 |
| 37 | Data Log File Syncs   |           0 |
| 38 | Data Log Sync Time    |           0 |
| 39 | Bytes to Checkpoint   |  6818078139 |
| 40 | Log Bytes to Write    |   125051695 |
| 41 | Log Bytes to Sweep    |  6818078139 |
| 42 | Sweeper Wait on Xact  |           0 |
| 43 | Index Scan Count      |           1 |
| 44 | Table Scan Count      |           0 |
| 45 | Select Row Count      |           1 |
| 46 | Insert Row Count      |           0 |
| 47 | Update Row Count      |           0 |
| 48 | Delete Row Count      |           0 |
+----+-----------------------+-------------+
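
For reading the cache counters above more easily, here is a small sketch that turns the hit and miss counts into hit ratios. The numbers are copied from the statistics output; the function name `hit_ratio` is just a helper made up for this example.

```python
# Cache hit ratios computed from the hit/miss counters in the
# statistics table (rows 13-14, 21-22 and 32-33).

def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache (0.0 if no lookups)."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0

record_cache = hit_ratio(28554990, 234060)
index_cache = hit_ratio(744372857, 1826952)
xact_log_cache = hit_ratio(20337, 209087)

for name, ratio in [
    ("Record cache", record_cache),
    ("Index cache", index_cache),
    ("Xact log cache", xact_log_cache),
]:
    print(f"{name:15s} {ratio:7.2%}")
```

The record and index caches come out above 99%, while the transaction log cache is served from memory in fewer than one in ten lookups.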

So why can't I access the table?
What did I do wrong?

Regards
Erkan

--
über den grenzen muß die freiheit wohl wolkenlos sein
