
dhis2-devs team mailing list archive

Re: Problems with detailed metadata export..

 

Hi Bob
In this system, there are only 277 users, but about 18,000 data
elements. This problem surfaced after the upgrade to 2.17 and after
the injection of about 14,000 new data elements into the system.
Before that, it was fine.

Hope it can be solved quickly, as even the loading of data elements in
the analysis tools is excruciatingly slow.

Best regards,
Jason


On Fri, Jan 23, 2015 at 1:10 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> Hi Jason
>
> Just spent a little time looking further at this. I think the stack dump has
> the answer after all.
>
> I believe the inefficiency is coming from the fact that for each object
> access there is a call to getAccess( T object ) in DefaultAclService.  The
> trouble is that this in turn calls currentUserService.getCurrentUser() each
> time.  You wouldn't think this is such a load, but each time I look at a
> snapshot of the running thread it is busy in that method, so it's obviously
> significant.  I guess it is doing a string comparison search each time
> (userService.getUserCredentialsByUsername( username )), so with a sufficient
> number of users in the system it's accounting for the bulk of the time - a
> lot of users multiplied by a lot of dataelements and it's only going to get
> more problematic.  Do you have a lot of users?
>
> The solution has got to be to call  currentUserService.getCurrentUser() just
> once, and then use the getAccess( T object, User user ) method instead.  I'm
> guessing that will reduce the computation load for returning a list of
> dataelements hugely.
>
> The trouble is I am not 100% sure where that getAccess call is coming from
> in this object sharing code.
>
> Morten will know better, but I guess everything starts in the
> AbstractCrudController.  Given that currentUserService.getCurrentUser() is
> not such a lightweight operation, we clearly need to be much more
> conservative in how often we use it.  Currently it is peppered all over the
> place.  Can't we do this just once per session?  And from there on make sure
> to use getAccess( T object, User user )?
>
> I'd bet a pint of best that this would reduce the time for that call to list
> dataelements by 90%.
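>
> Roughly the shape I have in mind - just a hand-written sketch, not a patch;
> apart from getCurrentUser() and the two getAccess variants, the names and
> the surrounding method are made up for illustration:
>
>     // Hypothetical helper in something like AbstractCrudController.
>     // aclService and currentUserService are assumed to be injected fields.
>     private void applyAccess( List<DataElement> dataElements )
>     {
>         // Resolve the current user once per request, not once per object.
>         User currentUser = currentUserService.getCurrentUser();
>
>         for ( DataElement dataElement : dataElements )
>         {
>             // The two-argument overload avoids any further user lookups.
>             Access access = aclService.getAccess( dataElement, currentUser );
>             dataElement.setAccess( access ); // however the access ends up attached - illustrative only
>         }
>     }
>
> The single-argument getAccess( dataElement ) version does the same work but
> resolves the current user internally on every call, which is exactly the
> per-object lookup we want to avoid.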
>
> Bob
>
> On 22 January 2015 at 10:58, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>>
>> Sorry Jason.  A quick look through isn't shedding much light for me :-(
>>
>> Is this a test environment, or did you run that against a running server
>> with other users?  There doesn't seem to be any tight loop like we saw with
>> the paging bug, so I guess what we have here is not so much a logical
>> error as some sort of gross inefficiency somewhere.
>> One thing I notice is that almost all RUNNABLE threads are somehow engaged
>> with operations relating to the user service: getCurrentUser,
>> getUserCredentials etc.  Given that this is a static snapshot in time of
>> where exactly each thread is at, it seems odd that so many are caught up
>> chasing users instead of doing real work.  This might be a red herring or a
>> pointer to that gross inefficiency - if the user credentials are being
>> retrieved and tested each time the user touches a metadata object (and this
>> metadata export is presumably touching many of them), that could be leading
>> to excessive activity.  Do you have a vast number of users in the system by
>> any chance?
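>>
>> To illustrate the kind of pattern I am guessing at - not the actual code,
>> just the shape that would produce a thread profile like this:
>>
>>     // Guessed per-object pattern: every metadata object triggers a fresh
>>     // lookup of the current user and their credentials, so the cost grows
>>     // with (number of objects) x (cost of the user lookup).
>>     for ( DataElement dataElement : dataElements )
>>     {
>>         User user = currentUserService.getCurrentUser(); // getUserCredentials lookup each time
>>         applySharing( dataElement, user );                // hypothetical per-object access check
>>     }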
>>
>> On 22 January 2015 at 08:09, Jason Pickering <jason.p.pickering@xxxxxxxxx>
>> wrote:
>>>
>>> OK, is this better?  Stack attached.
>>>
>>> Tasks: 178 total,   1 running, 177 sleeping,   0 stopped,   0 zombie
>>> Cpu(s): 99.1%us,  0.1%sy,  0.0%ni,  0.4%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.3%st
>>> Mem:  16433100k total, 16330904k used,   102196k free,    71828k buffers
>>> Swap:   262140k total,    25108k used,   237032k free, 11411252k cached
>>>
>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 28091 dhis      20   0 12.0g 3.6g 5916 S  796 23.0 328:48.52 java
>>>
>>>
>>> Command used was
>>>
>>> curl --verbose
>>> "http://XXXX/dhis/api/dataElements.json?links=false&paging=false"; -u
>>> XXXX:XXXX
>>>
>>> Regards,
>>> Jason
>>>
>>>
>>> On Wed, Jan 21, 2015 at 11:05 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>>> wrote:
>>>>
>>>> Jason, the CPU is not really "through the roof" here.  Can you try
>>>> another dump (while the CPU is through the roof)?
>>>>
>>>> The interesting threads to look at are the http-bio-8080 threads, which
>>>> are the Tomcat worker threads.  If the server is not being accessed by
>>>> dozens of users, most of them should be sitting in a WAITING state.  So then
>>>> it is interesting to look at those which are marked as RUNNABLE.  These are
>>>> the ones which are trying as hard as they can to get the attention of the
>>>> scheduler in order to chew CPU cycles.
>>>>
>>>> So if your process has become CPU-bound then sometimes it's possible to
>>>> figure out which thread it is and what it is trying to do.  The attached
>>>> stack dump doesn't reveal much unfortunately, but then again the CPU (from
>>>> top) is showing 62% idle.  So try to get the CPU burning up again and grab
>>>> 2 or 3 jstack samples.  If it is CPU-bound then it might show.
>>>>
>>>>
>>>>
>>>> On 21 January 2015 at 19:28, Jason Pickering
>>>> <jason.p.pickering@xxxxxxxxx> wrote:
>>>>>
>>>>> Not really sure what is going on, but here is the stack dump. Maybe it
>>>>> will help.
>>>>>
>>>>> And some more info below.
>>>>>
>>>>>
>>>>> top - 21:26:40 up 477 days, 16:20,  2 users,  load average: 2.99, 2.05,
>>>>> 1.03
>>>>> Tasks: 171 total,   1 running, 170 sleeping,   0 stopped,   0 zombie
>>>>> Cpu(s): 38.3%us,  0.0%sy,  0.0%ni, 61.6%id,  0.0%wa,  0.0%hi,  0.0%si,
>>>>> 0.1%st
>>>>> Mem:  16433100k total, 15416512k used,  1016588k free,    86276k
>>>>> buffers
>>>>> Swap:   262140k total,    18872k used,   243268k free, 12412748k cached
>>>>>
>>>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>> 28091 dhis      20   0 11.8g 2.1g  13m S  307 13.2  29:39.99 java
>>>>>
>>>>>
>>>>> Regards,
>>>>> Jason
>>>>>
>>>>>
>>>>> On Wed, Jan 21, 2015 at 7:42 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> If it is running for 15 minutes then almost certainly it is stuck in
>>>>>> some kind of loop.  Find the PID of the running JVM and also the owner of
>>>>>> the java process.  Then run:
>>>>>>
>>>>>> sudo -u <process owner> jstack <pid>  > jstack.txt
>>>>>>
>>>>>> That might give an idea of what the threads are up to ...
>>>>>>
>>>>>> On 21 January 2015 at 18:26, Morten Olav Hansen <mortenoh@xxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> 18k is not really that much.. is it giving you any kind of errors?
>>>>>>> Given the SL demo db, we can easily export many more objects than that..
>>>>>>>
>>>>>>> --
>>>>>>> Morten
>>>>>>>
>>>>>>> On Thu, Jan 22, 2015 at 12:42 AM, Bob Jolliffe
>>>>>>> <bobjolliffe@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Can you try to get a jstack dump while it is busy chomping CPU?
>>>>>>>> That will probably indicate where in the code it is spinning.  There was a
>>>>>>>> problem earlier with a weakness in the paging code, but that should have
>>>>>>>> been fixed by that build.
>>>>>>>>
>>>>>>>> On 21 January 2015 at 15:00, Jason Pickering
>>>>>>>> <jason.p.pickering@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In one DHIS2 instance, we have a rather large number of data
>>>>>>>>> elements, more than 18,000. During the detailed metadata export, a request
>>>>>>>>> is made to "api/dataElements.json?paging=false&links=false". This throws the
>>>>>>>>> CPU of the server through the roof, with the operation taking many minutes
>>>>>>>>> (it has been running for 15 minutes now with no response). There are no
>>>>>>>>> errors in the logs or the browser. The build revision is not known, but it
>>>>>>>>> is 2.17 from 2015-01-17 02:03.
>>>>>>>>>
>>>>>>>>> Any idea what might be going on here?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Jason
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jason P. Pickering
>>>>>>>>> email: jason.p.pickering@xxxxxxxxx
>>>>>>>>> tel:+46764147049
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jason P. Pickering
>>>>> email: jason.p.pickering@xxxxxxxxx
>>>>> tel:+46764147049
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Jason P. Pickering
>>> email: jason.p.pickering@xxxxxxxxx
>>> tel:+46764147049
>>
>>
>



-- 
Jason P. Pickering
email: jason.p.pickering@xxxxxxxxx
tel:+46764147049

