← Back to team overview

pbxt-discuss team mailing list archive

Re: Problems with PBXT

 

Moin Paul,

On Tue, Apr 26, 2011 at 10:27:11AM +0200, Paul McCullagh wrote:
> 
> How long does this operation take?
depends on the memory left on the system. This is another behavior triggered be the same statement

root@localhost [pdns]> SELECT TABLE_NAME,INDEX_LENGTH,DATA_LENGTH FROM information_schema.TABLES where TABLE_NAME = 'records';
+------------+--------------+-------------+
| TABLE_NAME | INDEX_LENGTH | DATA_LENGTH |
+------------+--------------+-------------+
| records    |   5684629504 | 12380356432 |
+------------+--------------+-------------+



oxymoron:~ # free -m
             total       used       free     shared    buffers     cached
Mem:         12010       3526       8484          0          5       2608
-/+ buffers/cache:        912      11097
Swap:         2047          1       2046

root@localhost [pdns]> select now(); insert into records_i SELECT * from records; select now();

+---------------------+
| now()               |
+---------------------+
| 2011-04-26 12:06:32 |
+---------------------+
1 row in set (0.01 sec)

        ERROR 2013 (HY000): Lost connection to MySQL server during query
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/opt/app/mysql/run/mysql.sock' (111)
ERROR: Can't connect to the server

Same time:

oxymoron:~ # vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0   1084 8691712   5472 2670924    0    0     5     8    7    3  1  0 99  0  0
 0  2   1084 7921916   5476 3015972    0    0 27292  9661 1797 1283  2  0 95  3  0
 1  0   1084 6842644   5480 3268668    0    0   305 42661 1698  966  6  0 93  1  0
 0  2   1084 5881704   5480 3534124    0    0  5529 38684 1776 1078  5  0 93  2  0
 1  1   1084 5165524   5480 3856720    0    0 20224 22219 1938 1277  3  0 92  4  0
 0  1   1084 4506692   5480 4152792    0    0 18664 26030 1925 1274  3  0 92  4  0
 0  3   1084 3855564   5480 4444508    0    0 18446 23725 1914 1248  4  0 93  4  0
 1  1   1084 3208820   5480 4737164    0    0 18200 24025 1903 1257  3  0 92  4  0
 0  1   1084 2594752   5480 5011100    0    0 17561 24291 1935 1276  3  0 92  4  0
 2  0   1084 2054100   5480 5164352    0    0 15307 23424 1842 1244  4  0 91  5  0
 1  0   1084 1128240   5480 5164352    0    0    16 73608 2233 1256  6  0 93  0  0
 2  0   1084 938616   5480 5192516    0    0  2837 63901 1839 1271  6  0 92  2  0
 1  0   1084 522708   5480 5359420    0    0 16650 27594 1732 1356  4  0 90  6  0
 1  0   1084 138232   5480 5513432    0    0 15381 24631 1632 1301  3  0 91  5  0
 0  1   1084  35744   1200 5512868    0    0 14048 27467 1715 1313  3  0 92  5  0
 0  1   1084  37856   1132 5511020    0    0 14601 27223 1712 1285  3  0 93  4  0
 0  1   1084  38276   1132 5510872    0    0 14906 26874 1818 1295  3  0 93  4  0
 0  2   1084  34448   1132 5513956    0    0 14720 26579 1842 1252  3  0 93  4  0
 0  1   1084  36852   1132 5319328    0    0 14232 25499 2023 1372  3  0 93  4  0
 1  0   1084  38676   1132 5125040    0    0 13928 26504 1980 1378  3  0 93  4  0
 0  1   1084  33884   1088 5129804    0    0 14444 27538 1759 1303  3  0 93  4  0
 0  2   1084  35944   1064 5126900    0    0 14900 26734 1757 1286  3  0 93  4  0
 1  0   1084  37676   1080 5124676    0    0 14699 27387 1794 1309  3  0 93  4  0
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  2   1084  33988   1080 5078096    0    0 14403 26239 1803 1317  3  0 93  4  0
 1  0   1084  37320   1080 4851364    0    0 15213 26896 1969 1470  3  0 93  4  0
 0  1   1084  38064   1056 4792764    0    0 14746 28942 1797 1346  3  0 93  4  0
 2  0   1084  36132   1036 4794592    0    0 15115 26748 1704 1302  3  0 93  4  0
 0  1   1084  34784    980 4794976    0    0 14579 27944 1715 1308  3  0 92  4  0
 0  1   1084  34800    888 4766560    0    0 14840 28162 1838 1363  3  0 93  3  0
 0  3   1084  38924    760 4696972    0    0 14718 26414 1921 1398  3  0 93  3  0
 2  0   1084  34116    760 4707056    0    0 15038 27688 1902 1342  3  0 93  3  0
 0  1   1084  36844    760 4664440    0    0 13927 26201 1888 1342  3  0 93  3  0
110426 12:11:41 mysqld_safe Number of processes running now: 0
110426 12:11:41 mysqld_safe mysqld restarted
 1  0   1084 5851496    916 4671840    0    0  1129   356 1611  925  5  1 94  0  0

Errorlog:
#v+
110426 12:11:40 [ERROR] mysqld: Out of memory (Needed 3431992 bytes)
110426 12:11:40 [ERROR] Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space
110426 12:11:40 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=0
read_buffer_size=131072
max_used_connections=1
max_threads=2002
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4381051 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fa7a8267680
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fa8d22200f8 thread_stack 0x48000
./bin/mysqld(my_print_stacktrace+0x2e) [0x9be9ae]
./bin/mysqld(handle_segfault+0x3a4) [0x60fe44]
/lib64/libpthread.so.0 [0x7fa8d1414a90]
./bin/mysqld(fill_record(THD*, Field**, List<Item>&, bool)+0x187) [0x65dc07]
./bin/mysqld(fill_record_n_invoke_before_triggers(THD*, Field**, List<Item>&, bool, Table_triggers_list*, trg_event_type)+0x2c) [0x65dcbc]
./bin/mysqld(select_insert::send_data(List<Item>&)+0x54) [0x696794]
./bin/mysqld [0x670959]
./bin/mysqld [0x6818ac]
./bin/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x6d) [0x6819fd]
./bin/mysqld [0x68413c]
./bin/mysqld(JOIN::exec()+0x112c) [0x69129c]
./bin/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_sele
ct_lex*)+0x199) [0x692739]
./bin/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x16f) [0x69308f]
./bin/mysqld(mysql_execute_command(THD*)+0x5452) [0x6209d2]
./bin/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x299) [0x621c39]
./bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xb7b) [0x6227bb]
./bin/mysqld(do_command(THD*)+0x101) [0x6231b1]
./bin/mysqld(handle_one_connection+0xd2) [0x614862]
/lib64/libpthread.so.0 [0x7fa8d140d070]
/lib64/libc.so.6(clone+0x6d) [0x7fa8d0a1510d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x1804ed8): insert into records_i SELECT * from records
Connection ID (thread ID): 1
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
110426 12:11:41 mysqld_safe Number of processes running now: 0
110426 12:11:41 mysqld_safe mysqld restarted
#v-


Even if this is not the error we had in the last email. It is triggerd from the same query. Ive seen both last days triggered from the same query. 
The table records was created before in the same way from innodb to pbxt.
It is the only query on that database.

> 
> I can see a potential problem if PBXT is executing many transactions  
> at the same time as the long running transaction.
> 
> A structure is held in memory for each transaction. This structure is  
> not large, but could amount to millions if you are executing 1000s of  
> operations per second, and the long transaction runs for an hour.

It was the only query running for about 310 seconds.
As I understood pbxt it only needs to know the pointer to the dataset of that transaction,
so that shouldn't (if Im right) need that much memory.


To make it sure I inserted from pbxt to pbxt:

root@localhost [pdns]> insert into records_p SELECT * from records;
ERROR 1297 (HY000): Got temporary error -1 'Cannot allocate memory' from PBXT

The most important part from vmstat:
 0  2 824688  37544    652 2949040  142   31  1281 17772 2196 1938  1  0 87 11  0
 1  3 824692  34328    652 2942156  291    0   509 14359 1468  724  0  0 84 15  0
 0  1 824944  38560    652 2944216   92   25   258  8017 1669  935  0  0 86 14  0
 0  3 827004  35752    648 2943260  216  226   906 40563 1761  838  1  0 87 12  0
 0  2 827768  35072    636 2932224 1442   76  2523 47264 2457 1206  1  0 90  9  0
Error happended here
 0  2 538876 2955436    656 2996796 2744  253 10766  1424 3117 1797  2  1 90  7  0
 1  1 531120 3460992    656 3188120  967    0 13036  5700 2217 1831  5  0 88  7  0
 1  1 531120 3195572    656 3369304  653    0 12858  5955 2108 1811  5  0 88  7  0

Errorlog:
110426 12:59:31 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156) errno (12): Cannot allocate memory
110426 12:59:31 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156)


Hope this helps

Regards
Erkan


> 
> But, of course, it would help to know where the system ran out of  
> memory. Unfortunately that information is not yet provided, because no  
> one has ever run into this problem.
> 
> >
> >And in the error-log:
> >
> >110426 00:11:29 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156)  
> >errno (12): Cannot allocate memory
> >110426 00:11:29 [Error] user_1 void* xt_malloc_ns(memory_xt.cc:156)
> >
> >Regards
> >Erkan
> >
> >
> >>
> >>On Apr 23, 2011, at 5:42 PM, erkan yanar wrote:
> >>
> >>>Moin,
> >>>Given: | 5.2.5-MariaDB-log |
> >>>    110423 18:16:22 PBXT 1.0.11-7 Pre-GA STATUS OUTPUT
> >>>
> >>>I was converting a table from xtradb -> pbxt
> >>>Because of misconfiguration the system swapped and:
> >>>110423 05:10:33 [Error] SW-mysql_data void*
> >>>xt_malloc_ns(memory_xt.cc:156) errno (12): Cannot allocate memory
> >>>
> >>>So I did it the -9 way.
> >>>Lets have a look into the errorlog then:
> >>>
> >>>110423  5:11:06 Percona XtraDB (http://www.percona.com) 1.0.15-12.5
> >>>started; log sequence number 221799126990
> >>>110423  5:11:06 [Note] Recovering after a crash using tc.log
> >>>110423  5:11:06 [Note] Starting crash recovery...
> >>>110423 05:11:07 [Note] PBXT: Recovering from 1-69, bytes to read:
> >>>6844334523
> >>>110423 05:13:17 [Note] PBXT:  1  2  3  4  5  6  7  8  9 10 11 12 13
> >>>14 15 16 17 18 19 20 21 22 23 24 25
> >>>110423 06:17:46 [Note] PBXT: 26 27 28 29 30 31 32 33 34 35 36 37 38
> >>>39 40 41 42 43 44 45 46 47 48 49 50
> >>>110423 07:38:43 [Note] PBXT: 51 52 53 54 55 56 57 58 59 60 61 62 63
> >>>64 65 66 67 68 69 70 71 72 73 74 75
> >>>110423 08:59:01 [Note] PBXT: 76 77 78 79 80 81 82 83 84 85 86 87 88
> >>>89 90 91 92 93 94 95 96 97 98 99 100
> >>>110423 10:11:00 [Note] PBXT: Recovering complete at 191-11307935,
> >>>bytes read: 6844334523
> >>>110423 10:11:00 [Note] Table pdns.records: free row count (1) has
> >>>been set to the number of rows on the list: 1
> >>>110423 10:11:00 [Note] Crash recovery finished.
> >>>
> >>>Imho quit slow .. then:
> >>>110423 10:11:02 [Note] Waiting for 'mysql_data' sweeper...
> >>>FYI: datadir := /data/mysql/mysql_data
> >>>This is the last entry. Does it mean it is still waiting?
> >>>In fact I can't access the table it just hangs doing a simple select
> >>>(limit 1)
> >>>
> >>>So doing a strace (at 23 18:23:07 ) I see :
> >>>pid  9633] <... pwrite resumed> )      = 12800
> >>>[pid  9633] pwrite(7, "\241-\1'www.5eabfd0749382558294a429f"...,
> >>>11776, 3689697280) = 11776
> >>>[pid  9633] pwrite(7, ",-\1#28fca10a02a6064013d946a93c56"..., 11776,
> >>>3707686912) = 11776
> >>>[pid  9633] pwrite(7, "\3103\1#5bee153b03d21ba03338c17bab94"...,
> >>>13312, 3721564160 <unfinished ...>
> >>>[pid  8558] <... nanosleep resumed> NULL) = 0
> >>>[pid  8558] nanosleep({0, 10000000},  <unfinished ...>
> >>>[pid  9633] <... pwrite resumed> )      = 13312
> >>>[pid  9633] pwrite(7, "\2241\1#6453b2a23d7b45ba6199a68ebfa1"...,
> >>>12800, 3745730560) = 12800
> >>>[pid  9633] pwrite(7, "N-\1(blog.6aa66275aedaeb43c6cd62b"..., 11776,
> >>>3764375552) = 11776
> >>>[pid  9633] pwrite(7, "\237,\1#243877286abd33de621d0caf4bf5"...,
> >>>11776, 3767013376 <unfinished ...>
> >>>[pid  9634] <... nanosleep resumed> NULL) = 0
> >>>[pid  9634] nanosleep({0, 10000000},  <unfinished ...>
> >>>[pid  9633] <... pwrite resumed> )      = 11776
> >>>[pid  9633] pwrite(7, "\266-\1(blog.ed01180d84db6d59f9af453"...,
> >>>11776, 3777302528 <unfinished ...>
> >>>[pid  8558] <... nanosleep resumed> NULL) = 0
> >>>[pid  8558] nanosleep({0, 10000000},  <unfinished ...>
> >>>[pid 10561] <... nanosleep resumed> NULL) = 0
> >>>[pid 10561] nanosleep({0, 100000000},  <unfinished ...>
> >>>[pid  9633] <... pwrite resumed> )      = 11776
> >>>[pid  9633] pwrite(7, "\2740\1*schock.3ce6deea5e12d77de535d"...,
> >>>12800, 3791163392 <unfinished ...>
> >>>[pid  9634] <... nanosleep resumed> NULL) = 0
> >>>[pid  9634] nanosleep({0, 10000000}, ^C <unfinished ...>
> >>>
> >>>
> >>>This is the table data. Does this means the sweeper is still
> >>>working? Why can't I access the table?
> >>>
> >>>root@localhost [pbxt]> show global variables like '%pbxt%';
> >>>+------------------------------+-------+
> >>>| Variable_name                | Value |
> >>>+------------------------------+-------+
> >>>| pbxt_auto_increment_mode     | 0     |
> >>>| pbxt_checkpoint_frequency    | 24M   |
> >>>| pbxt_data_file_grow_size     | 50M   |
> >>>| pbxt_data_log_threshold      | 256M  |
> >>>| pbxt_flush_log_at_trx_commit | 0     |
> >>>| pbxt_garbage_threshold       | 50    |
> >>>| pbxt_index_cache_size        | 3G    |
> >>>| pbxt_log_buffer_size         | 512M  |
> >>>| pbxt_log_cache_size          | 1G    |
> >>>| pbxt_log_file_count          | 10    |
> >>>| pbxt_log_file_threshold      | 32MB  |
> >>>| pbxt_max_threads             | 2007  |
> >>>| pbxt_offline_log_function    | 0     |
> >>>| pbxt_record_cache_size       | 3G    |
> >>>| pbxt_row_file_grow_size      | 10M   |
> >>>| pbxt_support_xa              | ON    |
> >>>| pbxt_sweeper_priority        | 2     |
> >>>| pbxt_transaction_buffer_size | 8M    |
> >>>+------------------------------+-------+
> >>>
> >>>select * from statistics;
> >>>
> >>>+----+-----------------------+-------------+
> >>>|  1 | Current Time          |  1303576276 |
> >>>|  2 | Time Since Last Call  | 47879149862 |
> >>>|  3 | Commit Count          |           0 |
> >>>|  4 | Rollback Count        |           0 |
> >>>|  5 | Wait for Xact Count   |           0 |
> >>>|  6 | Dirty Xact Count      |           4 |
> >>>|  7 | Read Statements       |           0 |
> >>>|  8 | Write Statements      |           0 |
> >>>|  9 | Record Bytes Read     |  3420546564 |
> >>>| 10 | Record Bytes Written  |   442544326 |
> >>>| 11 | Record File Flushes   |           9 |
> >>>| 12 | Record Flush Time     |    30046591 |
> >>>| 13 | Record Cache Hits     |    28554990 |
> >>>| 14 | Record Cache Misses   |      234060 |
> >>>| 15 | Record Cache Frees    |      146298 |
> >>>| 16 | Record Cache Usage    |  2898603144 |
> >>>| 17 | Index Bytes Read      |  4162967552 |
> >>>| 18 | Index Bytes Written   |  2231254528 |
> >>>| 19 | Index File Flushes    |         159 |
> >>>| 20 | Index Flush Time      |  4939549599 |
> >>>| 21 | Index Cache Hits      |   744372857 |
> >>>| 22 | Index Cache Misses    |     1826952 |
> >>>| 23 | Index Cache Usage     |  3221225472 |
> >>>| 24 | Index Log Bytes In    |  3944771819 |
> >>>| 25 | Index Log Bytes Out   |   681640564 |
> >>>| 26 | Index Log File Syncs  |         639 |
> >>>| 27 | Index Log Sync Time   |  2048645036 |
> >>>| 28 | Xact Log Bytes In     |  2523484015 |
> >>>| 29 | Xact Log Bytes Out    |   873305600 |
> >>>| 30 | Xact Log File Syncs   |       10315 |
> >>>| 31 | Xact Log Sync Time    |  3357323411 |
> >>>| 32 | Xact Log Cache Hits   |       20337 |
> >>>| 33 | Xact Log Cache Misses |      209087 |
> >>>| 34 | Xact Log Cache Usage  |  1073708456 |
> >>>| 35 | Data Log Bytes In     |           0 |
> >>>| 36 | Data Log Bytes Out    |           0 |
> >>>| 37 | Data Log File Syncs   |           0 |
> >>>| 38 | Data Log Sync Time    |           0 |
> >>>| 39 | Bytes to Checkpoint   |  6818078139 |
> >>>| 40 | Log Bytes to Write    |   125051695 |
> >>>| 41 | Log Bytes to Sweep    |  6818078139 |
> >>>| 42 | Sweeper Wait on Xact  |           0 |
> >>>| 43 | Index Scan Count      |           1 |
> >>>| 44 | Table Scan Count      |           0 |
> >>>| 45 | Select Row Count      |           1 |
> >>>| 46 | Insert Row Count      |           0 |
> >>>| 47 | Update Row Count      |           0 |
> >>>| 48 | Delete Row Count      |           0 |
> >>>+----+-----------------------+-------------+
> >>>
> >>>So why can't I access the table?
> >>>What did I wrong?
> >>>
> >>>Regards
> >>>Erkan
> >>>
> >>>-- 
> >>>über den grenzen muß die freiheit wohl wolkenlos sein
> >>>
> >>>_______________________________________________
> >>>Mailing list: https://launchpad.net/~pbxt-discuss
> >>>Post to     : pbxt-discuss@xxxxxxxxxxxxxxxxxxx
> >>>Unsubscribe : https://launchpad.net/~pbxt-discuss
> >>>More help   : https://help.launchpad.net/ListHelp
> >>
> >
> >-- 
> >über den grenzen muß die freiheit wohl wolkenlos sein
> 

-- 
über den grenzen muß die freiheit wohl wolkenlos sein 


References