← Back to team overview

pbxt-discuss team mailing list archive

Re: PBXT "hangs" sometimes but regularly

 

Hi Vlad,

There are a number of things which are strange about this test.

Firstly the flush takes 25 minutes. From the xtstat dump we can see that the main flushing on the index file stops at 101202 13:00:13. This is correct according to the "select now()" below (12:35:46 + 24 minutes).

This may not be the problem, but rather this:

1. Checkpointing does not seem to be happening, because to-chkpt is at 14.9G, and seems to remain there.

The lack of checkpoint may be explained be second bit of strange activity:

2. Lots of little flusing on the ilog, but no update to the index data:

ind ind ind ind ind ind ind ind ind ilog ilog ilog ilog ilog in out rd.ms wr.ms syncs/ms hits miss %use %dty in out rd.ms wr.ms syncs/ms 0 0 0 0 0/0 0 0 100 0.0 0 54 0 0 6/3 0 0 0 0 0/0 0 0 100 0.0 0 36 0 0 4/2 0 0 0 0 0/0 0 0 100 0.0 0 54 0 0 6/3

This is very strange and need to be investigated. Basically, the ilog should be written, then the ilog is read, and the index file is written. That normally concludes the flushing of the index.

It looks like the checkpointer thread is in a loop here, and does not get to the end of flushing the index, and so it does not get to the end of the checkpoint.

@Vlad or Erkan, could you please report this as a bug, with the information we have so far. Next we will have to see if we can repeat this strange index "non-flush" behavior. Maybe using a similar test, but a little less data.

On Dec 2, 2010, at 12:31 PM, erkan yanar wrote:

Hi Paul,

I repeatet it for you;)

mysql> select now(); truncate rein;
+---------------------+
| now()               |
+---------------------+
| 2010-12-02 12:35:46 |
+---------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (24 min 49.47 sec)

So it was successfull.

The xtstat-ouptut is attached. So maybe yo see something that might helpfull.

Regards
Erkan

P.S.
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 0.00 666.00 0.00 9040.00 13.57 186.48 327.86 1.50 100.00


On Thu, Dec 02, 2010 at 09:13:48AM +0100, Paul McCullagh wrote:
Hi Erkan,

Yes. Strange pause. But I cannot see what the engine is doing in that
time.

There must be activity in some missing statistics, for example,
record, row or index file writing.

Unfortunately, when displaying all statistics, the output is very
wide. What xtstat needs is a feature which logs all statistics but
just displays what you select.

Then, if you see a problem that does not show up in the displayed
statistics you can still go back and check the others at the time.

On Nov 30, 2010, at 3:05 AM, erkan yanar wrote:

Moin Paul

On Mon, Nov 29, 2010 at 10:23:38AM +0100, Paul McCullagh wrote:


The problem here is the index cache is full, and the thread which is flushing the index is hanging in sync (i.e. flush to disk). The rest of the system is hanging because it needs a clean page in the index.

I did quite a bit of work to improve this in 1.1 and 1.5.

But in general, if the index cache is really large, it need to be
flushed more regularly to ensure that too many changes do not
accumulate. Checkpointing should do this, but there is a bug in 1.0.

You checkpoint frequency is 128M, xtstat says to-chkpt = 1049M. This
means the checkpoint has not occurred as often as he should.

This problem should be fixed in 1.1.


While 1.5.02 Beta handles IO a lot better than 1.0.11-7, you may be
interested in that checkpointing issue:
pbxt_checkpoint_frequency    = 67108864

        time  time  xact  xact  xact  xact  stat  stat  ilog
ilog  ilog  ilog     ilog  xlog  xlog  xlog  xlog  xlog  xlog  xlog
xlog    to    to    to sweep   row   row   row   row
        curr  msec commt rollb waits dirty  read write    in   out
rd.ms wr.ms syncs/ms    in   out rd.ms wr.ms syncs sy.ms  hits  miss
chkpt write sweep waits   sel   ins   upd   del
101130 03:45:13  5001  198t     0     0 10.9t     0  198t  320M
0    83     0      0/0     0 21.1M     0    12     9  3576 69.6t
0  813M 9833K     0    13     0  198t     0     0
101130 03:45:18  5001  9501     0     0  5148     0  9501     0
0     0     0      0/0     0     0     0     0     0  5001     0
0  813M 9833K     0     1     0  9501     0     0
101130 03:45:23  5001     0     0     0  5148     0     0     0
0     0     0      0/0     0     0     0     0     0  5001     0
0  813M 9833K     0     0     0     0     0     0
101130 03:45:28  5001 80.1t     0     0 13.8t     0 80.1t     0
0     0     0      0/0     0 8192K     0     4     1  4993     1
0  814M 17.6M  115M     5     0 80.1t     0     0
101130 03:45:33  5001  179t     0     0 10.3t     0  179t     0
0     0     0      0/0 1244K 19.7M     0     8    20  4395  119t
39  837M 6772K     0    13     0  179t     0     0
101130 03:45:38  5001 12.7t     0     0  9084     0 12.7t     0
0     0     0      0/0     0     0     0     0     0  5001  5507
0  837M 3756K     0     1     0 12.7t     0     0
101130 03:45:43  5001 78.4t     0     0  8907     0 78.4t     0
0     0     0      0/0     0 8192K     0     3     1  4994 18.1t
0  840M 8192K     0     5     0 78.4t     0     0
101130 03:45:48  5001  156t     0     0  4410     0  156t     0
0     0     0      0/0     0 16.0M     0     8     2  4985 67.5t
0  856M 8192K     0    11     0  156t     0     0
101130 03:45:53  5001 78.9t     0     0  7213     0 78.9t     0
0     0     0      0/0     0 8192K     0     6     1  4991     1
0  864M 16.0M     0     7     0 78.9t     0     0


Best Regards
Erkan

--
über den grenzen muß die freiheit wohl wolkenlos sein


--
über den grenzen muß die freiheit wohl wolkenlos sein
<xtstat-truncate.log.gz>




Follow ups

References