dhis2-devs team mailing list archive

Re: Importing large data files - server settings

 

Hi

FYI, I've just done another import of a 120 MB xml-zip file: the upload took
around 30 minutes and the actual import around 18 minutes.

Regards
Calle

On 3 December 2015 at 20:34, Calle Hedberg <calle.hedberg@xxxxxxxxx> wrote:

> Hi
>
> By the way, using CSV should reduce the file size and speed up the import,
> but there seems to be a bug somewhere in the CSV export: the total number of
> records exported was slightly higher than the actual number of data records
> selected (we are talking 3-4 duplicated records out of 8 million), and as a
> result the import crashes as soon as the first duplicate record is
> encountered. This does not happen when exporting XML. I drilled into one
> of these duplicates and found it to be the "first" orgunit in the
> alphabetical list and also the "earliest" period in the source system.
>
> Regards
> Calle
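A minimal Python sketch for finding such duplicates in a CSV export before importing it. It assumes the DHIS2 data value CSV layout in which the first five columns (data element, period, org unit, category option combo, attribute option combo) together identify one data value, and that the file starts with a header row; those column positions and the function name find_duplicate_keys are assumptions for illustration, not part of DHIS2.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: the first five fields form the data value key
# (dataelement, period, orgunit, categoryoptioncombo, attributeoptioncombo).
KEY_WIDTH = 5


def find_duplicate_keys(
    rows: Iterable[List[str]], has_header: bool = True
) -> List[Tuple[Tuple[str, ...], int]]:
    """Return (key, count) for every data value key seen more than once."""
    counts: Counter = Counter()
    it = iter(rows)
    if has_header:
        next(it, None)  # skip the header row, if any
    for row in it:
        if len(row) < KEY_WIDTH:
            continue  # ignore blank or truncated lines
        counts[tuple(row[:KEY_WIDTH])] += 1
    return [(key, n) for key, n in counts.items() if n > 1]


if __name__ == "__main__":
    # Tiny in-memory sample with one duplicated key (hypothetical IDs).
    sample = io.StringIO(
        "dataelement,period,orgunit,catoptcombo,attroptcombo,value\n"
        "deA,201501,ouA,cocA,aocA,10\n"
        "deA,201501,ouA,cocA,aocA,10\n"
        "deB,201501,ouA,cocA,aocA,7\n"
    )
    for key, n in find_duplicate_keys(csv.reader(sample)):
        print(n, key)
```

For a real export, pass csv.reader(open(path, newline="")) instead of the sample. The counter keeps one entry per key in memory, so memory use grows with the number of rows in the file.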
>
>
>
> On 3 December 2015 at 20:28, Calle Hedberg <calle.hedberg@xxxxxxxxx>
> wrote:
>
>> Jason,
>>
>> I fully understand that, and it's only done infrequently (and outside
>> office hours if it's a production instance).
>>
>> The 75 MB xml-zip file that I just uploaded had only around 8 million
>> records: the upload took 20 minutes and the actual import around 13
>> minutes. No problemo...
>>
>> Regards
>> Calle
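The timings in this thread translate into rough throughput figures. A small Python sketch of the arithmetic, using the 75 MB / 20 minute upload and the roughly 8 million records / 13 minute import reported above; the extrapolation to a 200 MB file assumes the upload rate stays constant, which is an assumption rather than a measurement.

```python
def rate_per_second(amount: float, minutes: float) -> float:
    """Average rate for an amount processed over a number of minutes."""
    return amount / (minutes * 60)


# Figures reported in this thread.
upload_mb_per_s = rate_per_second(75, 20)               # 75 MB uploaded in ~20 min
import_records_per_s = rate_per_second(8_000_000, 13)   # ~8 million records in ~13 min

# Assumption: the upload rate is the same for a file at the 200M limit.
minutes_for_200mb = 200 / upload_mb_per_s / 60

print(f"upload ~{upload_mb_per_s * 1000:.0f} KB/s")
print(f"import ~{import_records_per_s:,.0f} records/s")
print(f"200 MB upload ~{minutes_for_200mb:.0f} min")
```

This works out to roughly 60 KB/s for the upload, about 10,000 records/s for the import, and on the order of 50 minutes to upload a 200 MB file. The 120 MB upload reported at the top of this thread (about 30 minutes) gives a similar upload rate.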
>>
>> On 3 December 2015 at 20:12, Jason Pickering <jason.p.pickering@xxxxxxxxx>
>> wrote:
>>
>>> Hi Calle,
>>>
>>> I think you would want to be very careful with this. If you change the
>>> maximum file size to 200 MB, this could potentially be an unzipped file of
>>> several (or even tens of) gigabytes, or several million rows of data. This
>>> could put significant stress on the server, and that is really the entire
>>> point of the restriction: to prevent huge uploads from being imported. If
>>> you have limited who can upload data to the server, it may be OK, but be
>>> aware that a 200 MB zip file can be much, much larger once unzipped (by an
>>> order of magnitude or two), and result in a very long process.
>>>
>>> Regards,
>>>
>>> Jason
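To gauge the risk Jason describes before uploading, the declared uncompressed size of a zip archive can be read from its central directory without decompressing anything. A minimal Python sketch using the standard zipfile module; the helper names are hypothetical, and because the sizes come from the archive headers, a deliberately malformed file could misreport them.

```python
import io
import zipfile


def uncompressed_size(zip_source) -> int:
    """Sum the declared uncompressed sizes of all members of a zip archive."""
    with zipfile.ZipFile(zip_source) as zf:
        return sum(info.file_size for info in zf.infolist())


def expansion_ratio(zip_source, compressed_bytes: int) -> float:
    """How many times larger the content is once unzipped."""
    return uncompressed_size(zip_source) / compressed_bytes


if __name__ == "__main__":
    # Build a small in-memory archive of repetitive XML-like rows,
    # similar in spirit to a data value export (hypothetical content).
    row = '<dataValue dataElement="x" period="201501" orgUnit="y" value="1"/>\n'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.xml", row * 50_000)
    compressed = len(buf.getvalue())
    print(compressed, uncompressed_size(buf), round(expansion_ratio(buf, compressed), 1))
```

Repetitive exports like this compress very well, which is why a size limit on the zip file says little about how much data the server will actually have to import.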
>>>
>>>
>>>
>>> On Thu, Dec 3, 2015, 18:52 Calle Hedberg <calle.hedberg@xxxxxxxxx>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> We are now standardising on 200M
>>>>
>>>> Regards
>>>> Calle
>>>>
>>>> On 3 December 2015 at 17:38, Alan Ivey <aivey@xxxxxxxxxxxxxx> wrote:
>>>>
>>>>> Also, it's worth noting that the default for "client_max_body_size" is
>>>>> only 1 MB:
>>>>> http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size .
>>>>> It will need to be increased on most deployments of DHIS2.
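A minimal Python sketch of the check nginx applies: it converts a size value in nginx syntax (plain bytes, or a k/K or m/M suffix, which nginx treats as multiples of 1024) to bytes and reports whether a request body of a given size would be refused with 413 (Request Entity Too Large). A value of 0 disables the check. The function names are hypothetical, only the bytes/k/m forms are handled, and the body of a multipart upload is slightly larger than the file itself.

```python
import re

# nginx size units handled by this sketch: bytes, k/K, m/M (powers of 1024).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2}
_SIZE_RE = re.compile(r"^(\d+)([kKmM]?)$")


def parse_nginx_size(value: str) -> int:
    """Convert an nginx size string such as '1m' or '200M' to bytes."""
    match = _SIZE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not an nginx size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def exceeds_body_limit(body_bytes: int, limit: str = "1m") -> bool:
    """True if nginx would reject a request body of this size (HTTP 413).

    The default mirrors nginx's 1m default; a limit of 0 disables the check.
    """
    limit_bytes = parse_nginx_size(limit)
    return limit_bytes != 0 and body_bytes > limit_bytes


if __name__ == "__main__":
    file_bytes = 75 * 1024 ** 2  # a 75 MB upload, as discussed in this thread
    print(exceeds_body_limit(file_bytes))          # against the 1m default
    print(exceeds_body_limit(file_bytes, "200M"))  # against the 200M setting
```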
>>>>>
>>>>> On Thu, Dec 3, 2015 at 9:09 AM, Lars Helge Øverland <
>>>>> larshelge@xxxxxxxxx> wrote:
>>>>>
>>>>>> If you are indeed using nginx, the "client_max_body_size" directive
>>>>>> is part of the installation docs example and can be increased as appropriate:
>>>>>>
>>>>>>
>>>>>> https://www.dhis2.org/doc/snapshot/en/implementer/html/ch08s04.html#d5e575
>>>>>>
>>>>>>
>>>>>>
>>>>>> Lars
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 3, 2015 at 3:02 PM, Lars Helge Øverland <
>>>>>> larshelge@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hi Calle,
>>>>>>>
>>>>>>> I think this depends on the web server configuration. One can
>>>>>>> configure the max file size for uploads in both the proxy (nginx, apache)
>>>>>>> and the servlet container (tomcat, see
>>>>>>> <http://stackoverflow.com/questions/2947683/httprequest-maximum-allowable-size-in-tomcat>).
>>>>>>>
>>>>>>> On nginx
>>>>>>> <http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size>
>>>>>>> the directive is:
>>>>>>>
>>>>>>> client_max_body_size 200M;
>>>>>>>
>>>>>>>
>>>>>>> regards,
>>>>>>>
>>>>>>> Lars
>>>>>>>
>>>>>>> On Thu, Dec 3, 2015 at 2:44 PM, Calle Hedberg <
>>>>>>> calle.hedberg@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have found that there is a limitation in file size when importing
>>>>>>>> data into our SERVER-based instances, while I have found no equivalent
>>>>>>>> limitation when importing large data files (e.g. XML format) into an
>>>>>>>> equivalent localhost instance. A few key aspects:
>>>>>>>>
>>>>>>>> 1. Both the server and localhost are running the latest version of
>>>>>>>> 2.20
>>>>>>>> 2. Both are running Java 8 (64-bit) and Tomcat 8.0.26 or 8.0.29
>>>>>>>> 3. Localhost Tomcat has 4GB (min) and 8GB (max) allocated
>>>>>>>> 4. The server instance (running under Ubuntu Linux) has ~5.3GB RAM,
>>>>>>>> but increasing/decreasing RAM has no effect on the issue.
>>>>>>>>
>>>>>>>> The problem is related to the upload process.
>>>>>>>>
>>>>>>>> Example:
>>>>>>>> When importing a 75MB data file with around 8 million data records
>>>>>>>> (XML, zipped) on localhost, the initial upload step is almost instantaneous
>>>>>>>> (2-3 seconds) and then the actual import starts (takes about 10 minutes
>>>>>>>> overall).
>>>>>>>>
>>>>>>>> When importing the same file to the equivalent instance on the
>>>>>>>> server, it takes around 30 seconds to reach 2% upload, and then the
>>>>>>>> upload restarts at 0%; this goes on ad infinitum.
>>>>>>>>
>>>>>>>> Smaller files (e.g. 10-20MB) may get 15-20% of the way, then reset
>>>>>>>> to 0% and start over.
>>>>>>>>
>>>>>>>> It seems to me that the problem is related to the DHIS2 web server
>>>>>>>> configuration: it does not allow sufficient time for the upload to happen.
>>>>>>>>
>>>>>>>> Any indications of how to fix this would be appreciated. While
>>>>>>>> dumping the server instance to localhost, importing the data there, and
>>>>>>>> then uploading/restoring the instance does work, it is a pain in the b....
>>>>>>>>
>>>>>>>> Regards from a sunny Cape Town
>>>>>>>> Calle
>>>>>>>>
>>>>>>>> *******************************************
>>>>>>>>
>>>>>>>> Calle Hedberg
>>>>>>>>
>>>>>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>>>>>>>
>>>>>>>> Tel/fax (home): +27-21-685-6472
>>>>>>>>
>>>>>>>> Cell: +27-82-853-5352
>>>>>>>>
>>>>>>>> Iridium SatPhone: +8816-315-19119
>>>>>>>>
>>>>>>>> Email: calle.hedberg@xxxxxxxxx
>>>>>>>>
>>>>>>>> Skype: calle_hedberg
>>>>>>>>
>>>>>>>> *******************************************
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Lars Helge Øverland
>>>>>>> Lead developer, DHIS 2
>>>>>>> University of Oslo
>>>>>>> Skype: larshelgeoverland
>>>>>>> http://www.dhis2.org <https://www.dhis2.org>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
>
>

