← Back to team overview

pbxt-discuss team mailing list archive

Re: PBXT "hangs" sometimes but regularly

 

Hi Vlad, Hi Paul,

maybe it is a misleading observation. I never had any checkpointing "delay" with
tests having pbxt_flush_log_at_trx_commit=1 instead
pbxt_flush_log_at_trx_commit=2. 

Regards
Erkan

On Thu, Dec 02, 2010 at 03:31:34PM +0100, Paul McCullagh wrote:
> Hi Vlad,
> 
> There are a number of things which are strange about this test.
> 
> Firstly the flush takes 25 minutes. From the xtstat dump we can see  
> that the main flushing on the index file stops at 101202 13:00:13.  
> This is correct according to the "select now()" below (12:35:46 + 24  
> minutes).
> 
> This may not be the problem, but rather this:
> 
> 1. Checkpointing does not seem to be happening, because to-chkpt is at  
> 14.9G, and seems to remain there.
> 
> The lack of checkpoint may be explained be second bit of strange  
> activity:
> 
> 2. Lots of little flusing on the ilog, but no update to the index data:
> 
>    ind   ind   ind   ind      ind   ind   ind  ind  ind  ilog  ilog   
> ilog  ilog     ilog
>     in   out rd.ms wr.ms syncs/ms  hits  miss %use %dty    in   out  
> rd.ms wr.ms syncs/ms
>      0     0     0     0      0/0     0     0  100  0.0     0     
> 54     0     0      6/3
>      0     0     0     0      0/0     0     0  100  0.0     0     
> 36     0     0      4/2
>      0     0     0     0      0/0     0     0  100  0.0     0     
> 54     0     0      6/3
> 
> This is very strange and need to be investigated. Basically, the ilog  
> should be written, then the ilog is read, and the index file is  
> written. That normally concludes the flushing of the index.
> 
> It looks like the checkpointer thread is in a loop here, and does not  
> get to the end of flushing the index, and so it does not get to the  
> end of the checkpoint.
> 
> @Vlad or Erkan, could you please report this as a bug, with the  
> information we have so far. Next we will have to see if we can repeat  
> this strange index "non-flush" behavior. Maybe using a similar test,  
> but a little less data.
> 
> On Dec 2, 2010, at 12:31 PM, erkan yanar wrote:
> 
> >Hi Paul,
> >
> >I repeatet it for you;)
> >
> >mysql> select now(); truncate rein;
> >+---------------------+
> >| now()               |
> >+---------------------+
> >| 2010-12-02 12:35:46 |
> >+---------------------+
> >1 row in set (0.00 sec)
> >
> >Query OK, 0 rows affected (24 min 49.47 sec)
> >
> >So it was successfull.
> >
> >The xtstat-ouptut is attached. So maybe yo see something that might  
> >helpfull.
> >
> >Regards
> >Erkan
> >
> >P.S.
> >Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s  
> >avgrq-sz avgqu-sz   await  svctm  %util
> >dm-0              0.00     0.00    0.00  666.00     0.00  9040.00     
> >13.57   186.48  327.86   1.50 100.00
> >
> >
> >On Thu, Dec 02, 2010 at 09:13:48AM +0100, Paul McCullagh wrote:
> >>Hi Erkan,
> >>
> >>Yes. Strange pause. But I cannot see what the engine is doing in that
> >>time.
> >>
> >>There must be activity in some missing statistics, for example,
> >>record, row or index file writing.
> >>
> >>Unfortunately, when displaying all statistics, the output is very
> >>wide. What xtstat needs is a feature which logs all statistics but
> >>just displays what you select.
> >>
> >>Then, if you see a problem that does not show up in the displayed
> >>statistics you can still go back and check the others at the time.
> >>
> >>On Nov 30, 2010, at 3:05 AM, erkan yanar wrote:
> >>
> >>>Moin Paul
> >>>
> >>>On Mon, Nov 29, 2010 at 10:23:38AM +0100, Paul McCullagh wrote:
> >>>>
> >>>>
> >>>>The problem here is the index cache is full, and the thread which  
> >>>>is
> >>>>flushing the index is hanging in sync (i.e. flush to disk). The  
> >>>>rest
> >>>>of the system is hanging because it needs a clean page in the  
> >>>>index.
> >>>>
> >>>>I did quite a bit of work to improve this in 1.1 and 1.5.
> >>>>
> >>>>But in general, if the index cache is really large, it need to be
> >>>>flushed more regularly to ensure that too many changes do not
> >>>>accumulate. Checkpointing should do this, but there is a bug in  
> >>>>1.0.
> >>>>
> >>>>You checkpoint frequency is 128M, xtstat says to-chkpt = 1049M.  
> >>>>This
> >>>>means the checkpoint has not occurred as often as he should.
> >>>>
> >>>>This problem should be fixed in 1.1.
> >>>>
> >>>
> >>>While 1.5.02 Beta handles IO a lot better than 1.0.11-7, you may be
> >>>interested in that checkpointing issue:
> >>>pbxt_checkpoint_frequency    = 67108864
> >>>
> >>>        time  time  xact  xact  xact  xact  stat  stat  ilog
> >>>ilog  ilog  ilog     ilog  xlog  xlog  xlog  xlog  xlog  xlog  xlog
> >>>xlog    to    to    to sweep   row   row   row   row
> >>>        curr  msec commt rollb waits dirty  read write    in   out
> >>>rd.ms wr.ms syncs/ms    in   out rd.ms wr.ms syncs sy.ms  hits  miss
> >>>chkpt write sweep waits   sel   ins   upd   del
> >>>101130 03:45:13  5001  198t     0     0 10.9t     0  198t  320M
> >>>0    83     0      0/0     0 21.1M     0    12     9  3576 69.6t
> >>>0  813M 9833K     0    13     0  198t     0     0
> >>>101130 03:45:18  5001  9501     0     0  5148     0  9501     0
> >>>0     0     0      0/0     0     0     0     0     0  5001     0
> >>>0  813M 9833K     0     1     0  9501     0     0
> >>>101130 03:45:23  5001     0     0     0  5148     0     0     0
> >>>0     0     0      0/0     0     0     0     0     0  5001     0
> >>>0  813M 9833K     0     0     0     0     0     0
> >>>101130 03:45:28  5001 80.1t     0     0 13.8t     0 80.1t     0
> >>>0     0     0      0/0     0 8192K     0     4     1  4993     1
> >>>0  814M 17.6M  115M     5     0 80.1t     0     0
> >>>101130 03:45:33  5001  179t     0     0 10.3t     0  179t     0
> >>>0     0     0      0/0 1244K 19.7M     0     8    20  4395  119t
> >>>39  837M 6772K     0    13     0  179t     0     0
> >>>101130 03:45:38  5001 12.7t     0     0  9084     0 12.7t     0
> >>>0     0     0      0/0     0     0     0     0     0  5001  5507
> >>>0  837M 3756K     0     1     0 12.7t     0     0
> >>>101130 03:45:43  5001 78.4t     0     0  8907     0 78.4t     0
> >>>0     0     0      0/0     0 8192K     0     3     1  4994 18.1t
> >>>0  840M 8192K     0     5     0 78.4t     0     0
> >>>101130 03:45:48  5001  156t     0     0  4410     0  156t     0
> >>>0     0     0      0/0     0 16.0M     0     8     2  4985 67.5t
> >>>0  856M 8192K     0    11     0  156t     0     0
> >>>101130 03:45:53  5001 78.9t     0     0  7213     0 78.9t     0
> >>>0     0     0      0/0     0 8192K     0     6     1  4991     1
> >>>0  864M 16.0M     0     7     0 78.9t     0     0
> >>>
> >>>
> >>>Best Regards
> >>>Erkan
> >>>
> >>>-- 
> >>>über den grenzen muß die freiheit wohl wolkenlos sein
> >>
> >
> >-- 
> >über den grenzen muß die freiheit wohl wolkenlos sein
> ><xtstat-truncate.log.gz>
> 

-- 
über den grenzen muß die freiheit wohl wolkenlos sein 



References