pbxt-discuss team mailing list archive
-
pbxt-discuss team
-
Mailing list archive
-
Message #00131
Re: Problems with PBXT
Moin Paul,
On Tue, Apr 26, 2011 at 10:27:11AM +0200, Paul McCullagh wrote:
>
> How long does this operation take?
depends on the memory left on the system. This is another behavior triggered be the same statement
root@localhost [pdns]> SELECT TABLE_NAME,INDEX_LENGTH,DATA_LENGTH FROM information_schema.TABLES where TABLE_NAME = 'records';
+------------+--------------+-------------+
| TABLE_NAME | INDEX_LENGTH | DATA_LENGTH |
+------------+--------------+-------------+
| records | 5684629504 | 12380356432 |
+------------+--------------+-------------+
oxymoron:~ # free -m
total used free shared buffers cached
Mem: 12010 3526 8484 0 5 2608
-/+ buffers/cache: 912 11097
Swap: 2047 1 2046
root@localhost [pdns]> select now(); insert into records_i SELECT * from records; select now();
+---------------------+
| now() |
+---------------------+
| 2011-04-26 12:06:32 |
+---------------------+
1 row in set (0.01 sec)
ERROR 2013 (HY000): Lost connection to MySQL server during query
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/opt/app/mysql/run/mysql.sock' (111)
ERROR: Can't connect to the server
Same time:
oxymoron:~ # vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 1084 8691712 5472 2670924 0 0 5 8 7 3 1 0 99 0 0
0 2 1084 7921916 5476 3015972 0 0 27292 9661 1797 1283 2 0 95 3 0
1 0 1084 6842644 5480 3268668 0 0 305 42661 1698 966 6 0 93 1 0
0 2 1084 5881704 5480 3534124 0 0 5529 38684 1776 1078 5 0 93 2 0
1 1 1084 5165524 5480 3856720 0 0 20224 22219 1938 1277 3 0 92 4 0
0 1 1084 4506692 5480 4152792 0 0 18664 26030 1925 1274 3 0 92 4 0
0 3 1084 3855564 5480 4444508 0 0 18446 23725 1914 1248 4 0 93 4 0
1 1 1084 3208820 5480 4737164 0 0 18200 24025 1903 1257 3 0 92 4 0
0 1 1084 2594752 5480 5011100 0 0 17561 24291 1935 1276 3 0 92 4 0
2 0 1084 2054100 5480 5164352 0 0 15307 23424 1842 1244 4 0 91 5 0
1 0 1084 1128240 5480 5164352 0 0 16 73608 2233 1256 6 0 93 0 0
2 0 1084 938616 5480 5192516 0 0 2837 63901 1839 1271 6 0 92 2 0
1 0 1084 522708 5480 5359420 0 0 16650 27594 1732 1356 4 0 90 6 0
1 0 1084 138232 5480 5513432 0 0 15381 24631 1632 1301 3 0 91 5 0
0 1 1084 35744 1200 5512868 0 0 14048 27467 1715 1313 3 0 92 5 0
0 1 1084 37856 1132 5511020 0 0 14601 27223 1712 1285 3 0 93 4 0
0 1 1084 38276 1132 5510872 0 0 14906 26874 1818 1295 3 0 93 4 0
0 2 1084 34448 1132 5513956 0 0 14720 26579 1842 1252 3 0 93 4 0
0 1 1084 36852 1132 5319328 0 0 14232 25499 2023 1372 3 0 93 4 0
1 0 1084 38676 1132 5125040 0 0 13928 26504 1980 1378 3 0 93 4 0
0 1 1084 33884 1088 5129804 0 0 14444 27538 1759 1303 3 0 93 4 0
0 2 1084 35944 1064 5126900 0 0 14900 26734 1757 1286 3 0 93 4 0
1 0 1084 37676 1080 5124676 0 0 14699 27387 1794 1309 3 0 93 4 0
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 2 1084 33988 1080 5078096 0 0 14403 26239 1803 1317 3 0 93 4 0
1 0 1084 37320 1080 4851364 0 0 15213 26896 1969 1470 3 0 93 4 0
0 1 1084 38064 1056 4792764 0 0 14746 28942 1797 1346 3 0 93 4 0
2 0 1084 36132 1036 4794592 0 0 15115 26748 1704 1302 3 0 93 4 0
0 1 1084 34784 980 4794976 0 0 14579 27944 1715 1308 3 0 92 4 0
0 1 1084 34800 888 4766560 0 0 14840 28162 1838 1363 3 0 93 3 0
0 3 1084 38924 760 4696972 0 0 14718 26414 1921 1398 3 0 93 3 0
2 0 1084 34116 760 4707056 0 0 15038 27688 1902 1342 3 0 93 3 0
0 1 1084 36844 760 4664440 0 0 13927 26201 1888 1342 3 0 93 3 0
110426 12:11:41 mysqld_safe Number of processes running now: 0
110426 12:11:41 mysqld_safe mysqld restarted
1 0 1084 5851496 916 4671840 0 0 1129 356 1611 925 5 1 94 0 0
Errorlog:
#v+
110426 12:11:40 [ERROR] mysqld: Out of memory (Needed 3431992 bytes)
110426 12:11:40 [ERROR] Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space
110426 12:11:40 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=0
read_buffer_size=131072
max_used_connections=1
max_threads=2002
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4381051 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7fa7a8267680
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fa8d22200f8 thread_stack 0x48000
./bin/mysqld(my_print_stacktrace+0x2e) [0x9be9ae]
./bin/mysqld(handle_segfault+0x3a4) [0x60fe44]
/lib64/libpthread.so.0 [0x7fa8d1414a90]
./bin/mysqld(fill_record(THD*, Field**, List<Item>&, bool)+0x187) [0x65dc07]
./bin/mysqld(fill_record_n_invoke_before_triggers(THD*, Field**, List<Item>&, bool, Table_triggers_list*, trg_event_type)+0x2c) [0x65dcbc]
./bin/mysqld(select_insert::send_data(List<Item>&)+0x54) [0x696794]
./bin/mysqld [0x670959]
./bin/mysqld [0x6818ac]
./bin/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x6d) [0x6819fd]
./bin/mysqld [0x68413c]
./bin/mysqld(JOIN::exec()+0x112c) [0x69129c]
./bin/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_sele
ct_lex*)+0x199) [0x692739]
./bin/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x16f) [0x69308f]
./bin/mysqld(mysql_execute_command(THD*)+0x5452) [0x6209d2]
./bin/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x299) [0x621c39]
./bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xb7b) [0x6227bb]
./bin/mysqld(do_command(THD*)+0x101) [0x6231b1]
./bin/mysqld(handle_one_connection+0xd2) [0x614862]
/lib64/libpthread.so.0 [0x7fa8d140d070]
/lib64/libc.so.6(clone+0x6d) [0x7fa8d0a1510d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x1804ed8): insert into records_i SELECT * from records
Connection ID (thread ID): 1
Status: NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
110426 12:11:41 mysqld_safe Number of processes running now: 0
110426 12:11:41 mysqld_safe mysqld restarted
#v-
Even if this is not the error we had in the last email. It is triggerd from the same query. Ive seen both last days triggered from the same query.
The table records was created before in the same way from innodb to pbxt.
It is the only query on that database.
>
> I can see a potential problem if PBXT is executing many transactions
> at the same time as the long running transaction.
>
> A structure is held in memory for each transaction. This structure is
> not large, but could amount to millions if you are executing 1000s of
> operations per second, and the long transaction runs for an hour.
It was the only query running for about 310 seconds.
As I understood pbxt it only needs to know the pointer to the dataset of that transaction,
so that shouldn't (if Im right) need that much memory.
To make it sure I inserted from pbxt to pbxt:
root@localhost [pdns]> insert into records_p SELECT * from records;
ERROR 1297 (HY000): Got temporary error -1 'Cannot allocate memory' from PBXT
The most important part from vmstat:
0 2 824688 37544 652 2949040 142 31 1281 17772 2196 1938 1 0 87 11 0
1 3 824692 34328 652 2942156 291 0 509 14359 1468 724 0 0 84 15 0
0 1 824944 38560 652 2944216 92 25 258 8017 1669 935 0 0 86 14 0
0 3 827004 35752 648 2943260 216 226 906 40563 1761 838 1 0 87 12 0
0 2 827768 35072 636 2932224 1442 76 2523 47264 2457 1206 1 0 90 9 0
Error happended here
0 2 538876 2955436 656 2996796 2744 253 10766 1424 3117 1797 2 1 90 7 0
1 1 531120 3460992 656 3188120 967 0 13036 5700 2217 1831 5 0 88 7 0
1 1 531120 3195572 656 3369304 653 0 12858 5955 2108 1811 5 0 88 7 0
Errorlog:
110426 12:59:31 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156) errno (12): Cannot allocate memory
110426 12:59:31 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156)
Hope this helps
Regards
Erkan
>
> But, of course, it would help to know where the system ran out of
> memory. Unfortunately that information is not yet provided, because no
> one has ever run into this problem.
>
> >
> >And in the error-log:
> >
> >110426 00:11:29 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156)
> >errno (12): Cannot allocate memory
> >110426 00:11:29 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156)
> >
> >Regards
> >Erkan
> >
> >
> >>
> >>On Apr 23, 2011, at 5:42 PM, erkan yanar wrote:
> >>
> >>>Moin,
> >>>Given: | 5.2.5-MariaDB-log |
> >>> 110423 18:16:22 PBXT 1.0.11-7 Pre-GA STATUS OUTPUT
> >>>
> >>>I was converting a table from xtradb -> pbxt
> >>>Because of misconfiguration the system swapped and:
> >>>110423 05:10:33 [Error] SW-mysql_data void*
> >>>xt_malloc_ns(memory_xt.cc:156) errno (12): Cannot allocate memory
> >>>
> >>>So I did it the -9 way.
> >>>Lets have a look into the errorlog then:
> >>>
> >>>110423 5:11:06 Percona XtraDB (http://www.percona.com) 1.0.15-12.5
> >>>started; log sequence number 221799126990
> >>>110423 5:11:06 [Note] Recovering after a crash using tc.log
> >>>110423 5:11:06 [Note] Starting crash recovery...
> >>>110423 05:11:07 [Note] PBXT: Recovering from 1-69, bytes to read:
> >>>6844334523
> >>>110423 05:13:17 [Note] PBXT: 1 2 3 4 5 6 7 8 9 10 11 12 13
> >>>14 15 16 17 18 19 20 21 22 23 24 25
> >>>110423 06:17:46 [Note] PBXT: 26 27 28 29 30 31 32 33 34 35 36 37 38
> >>>39 40 41 42 43 44 45 46 47 48 49 50
> >>>110423 07:38:43 [Note] PBXT: 51 52 53 54 55 56 57 58 59 60 61 62 63
> >>>64 65 66 67 68 69 70 71 72 73 74 75
> >>>110423 08:59:01 [Note] PBXT: 76 77 78 79 80 81 82 83 84 85 86 87 88
> >>>89 90 91 92 93 94 95 96 97 98 99 100
> >>>110423 10:11:00 [Note] PBXT: Recovering complete at 191-11307935,
> >>>bytes read: 6844334523
> >>>110423 10:11:00 [Note] Table pdns.records: free row count (1) has
> >>>been set to the number of rows on the list: 1
> >>>110423 10:11:00 [Note] Crash recovery finished.
> >>>
> >>>Imho quit slow .. then:
> >>>110423 10:11:02 [Note] Waiting for 'mysql_data' sweeper...
> >>>FYI: datadir := /data/mysql/mysql_data
> >>>This is the last entry. Does it mean it is still waiting?
> >>>In fact I can't access the table it just hangs doing a simple select
> >>>(limit 1)
> >>>
> >>>So doing a strace (at 23 18:23:07 ) I see :
> >>>pid 9633] <... pwrite resumed> ) = 12800
> >>>[pid 9633] pwrite(7, "\241-\1'www.5eabfd0749382558294a429f"...,
> >>>11776, 3689697280) = 11776
> >>>[pid 9633] pwrite(7, ",-\1#28fca10a02a6064013d946a93c56"..., 11776,
> >>>3707686912) = 11776
> >>>[pid 9633] pwrite(7, "\3103\1#5bee153b03d21ba03338c17bab94"...,
> >>>13312, 3721564160 <unfinished ...>
> >>>[pid 8558] <... nanosleep resumed> NULL) = 0
> >>>[pid 8558] nanosleep({0, 10000000}, <unfinished ...>
> >>>[pid 9633] <... pwrite resumed> ) = 13312
> >>>[pid 9633] pwrite(7, "\2241\1#6453b2a23d7b45ba6199a68ebfa1"...,
> >>>12800, 3745730560) = 12800
> >>>[pid 9633] pwrite(7, "N-\1(blog.6aa66275aedaeb43c6cd62b"..., 11776,
> >>>3764375552) = 11776
> >>>[pid 9633] pwrite(7, "\237,\1#243877286abd33de621d0caf4bf5"...,
> >>>11776, 3767013376 <unfinished ...>
> >>>[pid 9634] <... nanosleep resumed> NULL) = 0
> >>>[pid 9634] nanosleep({0, 10000000}, <unfinished ...>
> >>>[pid 9633] <... pwrite resumed> ) = 11776
> >>>[pid 9633] pwrite(7, "\266-\1(blog.ed01180d84db6d59f9af453"...,
> >>>11776, 3777302528 <unfinished ...>
> >>>[pid 8558] <... nanosleep resumed> NULL) = 0
> >>>[pid 8558] nanosleep({0, 10000000}, <unfinished ...>
> >>>[pid 10561] <... nanosleep resumed> NULL) = 0
> >>>[pid 10561] nanosleep({0, 100000000}, <unfinished ...>
> >>>[pid 9633] <... pwrite resumed> ) = 11776
> >>>[pid 9633] pwrite(7, "\2740\1*schock.3ce6deea5e12d77de535d"...,
> >>>12800, 3791163392 <unfinished ...>
> >>>[pid 9634] <... nanosleep resumed> NULL) = 0
> >>>[pid 9634] nanosleep({0, 10000000}, ^C <unfinished ...>
> >>>
> >>>
> >>>This is the table data. Does this means the sweeper is still
> >>>working? Why can't I access the table?
> >>>
> >>>root@localhost [pbxt]> show global variables like '%pbxt%';
> >>>+------------------------------+-------+
> >>>| Variable_name | Value |
> >>>+------------------------------+-------+
> >>>| pbxt_auto_increment_mode | 0 |
> >>>| pbxt_checkpoint_frequency | 24M |
> >>>| pbxt_data_file_grow_size | 50M |
> >>>| pbxt_data_log_threshold | 256M |
> >>>| pbxt_flush_log_at_trx_commit | 0 |
> >>>| pbxt_garbage_threshold | 50 |
> >>>| pbxt_index_cache_size | 3G |
> >>>| pbxt_log_buffer_size | 512M |
> >>>| pbxt_log_cache_size | 1G |
> >>>| pbxt_log_file_count | 10 |
> >>>| pbxt_log_file_threshold | 32MB |
> >>>| pbxt_max_threads | 2007 |
> >>>| pbxt_offline_log_function | 0 |
> >>>| pbxt_record_cache_size | 3G |
> >>>| pbxt_row_file_grow_size | 10M |
> >>>| pbxt_support_xa | ON |
> >>>| pbxt_sweeper_priority | 2 |
> >>>| pbxt_transaction_buffer_size | 8M |
> >>>+------------------------------+-------+
> >>>
> >>>select * from statistics;
> >>>
> >>>+----+-----------------------+-------------+
> >>>| 1 | Current Time | 1303576276 |
> >>>| 2 | Time Since Last Call | 47879149862 |
> >>>| 3 | Commit Count | 0 |
> >>>| 4 | Rollback Count | 0 |
> >>>| 5 | Wait for Xact Count | 0 |
> >>>| 6 | Dirty Xact Count | 4 |
> >>>| 7 | Read Statements | 0 |
> >>>| 8 | Write Statements | 0 |
> >>>| 9 | Record Bytes Read | 3420546564 |
> >>>| 10 | Record Bytes Written | 442544326 |
> >>>| 11 | Record File Flushes | 9 |
> >>>| 12 | Record Flush Time | 30046591 |
> >>>| 13 | Record Cache Hits | 28554990 |
> >>>| 14 | Record Cache Misses | 234060 |
> >>>| 15 | Record Cache Frees | 146298 |
> >>>| 16 | Record Cache Usage | 2898603144 |
> >>>| 17 | Index Bytes Read | 4162967552 |
> >>>| 18 | Index Bytes Written | 2231254528 |
> >>>| 19 | Index File Flushes | 159 |
> >>>| 20 | Index Flush Time | 4939549599 |
> >>>| 21 | Index Cache Hits | 744372857 |
> >>>| 22 | Index Cache Misses | 1826952 |
> >>>| 23 | Index Cache Usage | 3221225472 |
> >>>| 24 | Index Log Bytes In | 3944771819 |
> >>>| 25 | Index Log Bytes Out | 681640564 |
> >>>| 26 | Index Log File Syncs | 639 |
> >>>| 27 | Index Log Sync Time | 2048645036 |
> >>>| 28 | Xact Log Bytes In | 2523484015 |
> >>>| 29 | Xact Log Bytes Out | 873305600 |
> >>>| 30 | Xact Log File Syncs | 10315 |
> >>>| 31 | Xact Log Sync Time | 3357323411 |
> >>>| 32 | Xact Log Cache Hits | 20337 |
> >>>| 33 | Xact Log Cache Misses | 209087 |
> >>>| 34 | Xact Log Cache Usage | 1073708456 |
> >>>| 35 | Data Log Bytes In | 0 |
> >>>| 36 | Data Log Bytes Out | 0 |
> >>>| 37 | Data Log File Syncs | 0 |
> >>>| 38 | Data Log Sync Time | 0 |
> >>>| 39 | Bytes to Checkpoint | 6818078139 |
> >>>| 40 | Log Bytes to Write | 125051695 |
> >>>| 41 | Log Bytes to Sweep | 6818078139 |
> >>>| 42 | Sweeper Wait on Xact | 0 |
> >>>| 43 | Index Scan Count | 1 |
> >>>| 44 | Table Scan Count | 0 |
> >>>| 45 | Select Row Count | 1 |
> >>>| 46 | Insert Row Count | 0 |
> >>>| 47 | Update Row Count | 0 |
> >>>| 48 | Delete Row Count | 0 |
> >>>+----+-----------------------+-------------+
> >>>
> >>>So why can't I access the table?
> >>>What did I wrong?
> >>>
> >>>Regards
> >>>Erkan
> >>>
> >>>--
> >>>über den grenzen muß die freiheit wohl wolkenlos sein
> >>>
> >>>_______________________________________________
> >>>Mailing list: https://launchpad.net/~pbxt-discuss
> >>>Post to : pbxt-discuss@xxxxxxxxxxxxxxxxxxx
> >>>Unsubscribe : https://launchpad.net/~pbxt-discuss
> >>>More help : https://help.launchpad.net/ListHelp
> >>
> >
> >--
> >über den grenzen muß die freiheit wohl wolkenlos sein
>
--
über den grenzen muß die freiheit wohl wolkenlos sein
References