dhis2-devs team mailing list archive

Re: Importing large data files - server settings

 

Hi

By the way, using CSV should reduce file size and speed up the import, but
there seems to be a bug somewhere in the CSV export: the total number of
records exported was slightly higher than the actual number of data records
selected (we are talking about 3-4 duplicated records out of 8 million),
and as a result the import crashes as soon as the first duplicate record is
encountered. This does not happen when exporting XML. I drilled into one
of these duplicates and found that it belongs to the "first" orgunit in the
alphabetical list and the "earliest" period in the source system.
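
Until the export is fixed, one workaround is to scan the CSV for duplicate
keys before importing it. The following is only a minimal sketch: the key
column names are an assumption about the header row of the export (adjust
them to match the actual file), and find_duplicate_keys is a hypothetical
helper, not part of DHIS2.

```python
import csv
from collections import Counter

# Assumed key columns; change these to match the header row of your export.
KEY_COLUMNS = (
    "dataelement",
    "period",
    "orgunit",
    "categoryoptioncombo",
    "attributeoptioncombo",
)


def find_duplicate_keys(path, key_columns=KEY_COLUMNS):
    """Return {key_tuple: count} for every key that occurs more than once."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first row of the file as column names.
        for row in csv.DictReader(handle):
            key = tuple(row.get(column, "") for column in key_columns)
            counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Calling find_duplicate_keys("export.csv") (a hypothetical file name) returns
an empty dict when every record is unique. The counter holds one entry per
distinct key, so for a file with around 8 million records it needs a fair
amount of memory.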

Regards
Calle



On 3 December 2015 at 20:28, Calle Hedberg <calle.hedberg@xxxxxxxxx> wrote:

> Jason,
>
> I fully understand that, and it's only done infrequently (and outside
> office hours if it's a production instance).
>
> The 75MB xml-zip file that I just uploaded had only around 8 million
> records - the upload took 20 minutes and the actual import around 13
> minutes. No problemo...
>
> Regards
> Calle
>
> On 3 December 2015 at 20:12, Jason Pickering <jason.p.pickering@xxxxxxxxx>
> wrote:
>
>> Hi Calle,
>>
>> I think you would want to be very careful with this. If you change the
>> maximum file size to 200 MB, an upload could potentially unzip to a file
>> of several (tens of) gigabytes, or several million rows of data. This
>> could put significant stress on the server, and preventing huge uploads
>> from being imported is really the entire point of the restriction. If you
>> have limited who can upload data to the server, it may be OK, but just be
>> aware that a 200 MB zip file can be much, much larger once unzipped (by
>> an order of magnitude or two) and result in a very long process.
>>
>> Regards,
>>
>> Jason
>>
>>
>>
>> On Thu, Dec 3, 2015, 18:52 Calle Hedberg <calle.hedberg@xxxxxxxxx> wrote:
>>
>>> Hi
>>>
>>> We are now standardising on 200M
>>>
>>> Regards
>>> Calle
>>>
>>> On 3 December 2015 at 17:38, Alan Ivey <aivey@xxxxxxxxxxxxxx> wrote:
>>>
>>>> Also, it's worth noting that the default for "client_max_body_size" is
>>>> only 1 MB:
>>>> http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size .
>>>> It will need to be increased on most deployments of DHIS2.
>>>>
>>>> On Thu, Dec 3, 2015 at 9:09 AM, Lars Helge Øverland <
>>>> larshelge@xxxxxxxxx> wrote:
>>>>
>>>>> If you are indeed using nginx, the "client_max_body_size" directive
>>>>> is part of the installation docs example and can be increased as
>>>>> appropriate:
>>>>>
>>>>>
>>>>> https://www.dhis2.org/doc/snapshot/en/implementer/html/ch08s04.html#d5e575
>>>>>
>>>>>
>>>>>
>>>>> Lars
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 3, 2015 at 3:02 PM, Lars Helge Øverland <
>>>>> larshelge@xxxxxxxxx> wrote:
>>>>>
>>>>>> Hi Calle,
>>>>>>
>>>>>> I think this depends on the web server configuration. One can
>>>>>> configure max file size for uploads in both the proxy (nginx, apache) and
>>>>>> servlet container (tomcat
>>>>>> <http://stackoverflow.com/questions/2947683/httprequest-maximum-allowable-size-in-tomcat>
>>>>>> ).
>>>>>>
>>>>>> On nginx
>>>>>> <http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size>
>>>>>> the directive is:
>>>>>>
>>>>>> client_max_body_size 200M;
>>>>>>
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>> Lars
>>>>>>
>>>>>> On Thu, Dec 3, 2015 at 2:44 PM, Calle Hedberg <
>>>>>> calle.hedberg@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have found that there is a limitation in file size when importing
>>>>>>> data into our SERVER-based instances, while I have found no equivalent
>>>>>>> limitation when importing large data files (e.g. XML format) into an
>>>>>>> equivalent localhost instance. A few key aspects:
>>>>>>>
>>>>>>> 1. Both the server and localhost are running the latest version of
>>>>>>> 2.20
>>>>>>> 2. Both are running Java 8 (64-bit) and Tomcat 8.0.26 or 8.0.29
>>>>>>> 3. Localhost tomcat has 4GB (min) and 8GB (max) allocated
>>>>>>> 4. The server instance (running under Ubuntu Linux) has ~5.3GB RAM,
>>>>>>> but increasing/decreasing RAM has no effect on the issue.
>>>>>>>
>>>>>>> The problem is related to the upload process.
>>>>>>>
>>>>>>> Example:
>>>>>>> When importing a 75MB data file with around 8 million data records
>>>>>>> (XML, zipped) on localhost, the initial upload step is almost instantaneous
>>>>>>> (2-3 seconds) and then the actual import starts (takes about 10 minutes
>>>>>>> overall).
>>>>>>>
>>>>>>> When importing the same file to the equivalent instance on the
>>>>>>> server, it takes around 30 seconds to reach 2% upload and then the upload
>>>>>>> re-starts at 0% - this goes on ad infinitum.
>>>>>>>
>>>>>>> Smaller files - e.g. 10-20MB - will maybe import 15-20%, then reset
>>>>>>> to 0% and start over.
>>>>>>>
>>>>>>> It seems to me that the problem is related to the DHIS2 web server
>>>>>>> configuration: it does not allow sufficient time for the upload to happen.
>>>>>>>
>>>>>>> Any indications of how to fix this would be appreciated. While
>>>>>>> dumping the server instance into localhost, importing the data, and then
>>>>>>> uploading/restoring the instance does work, it is a pain in the b....
>>>>>>>
>>>>>>> Regards from a sunny Cape Town
>>>>>>> Calle
>>>>>>>
>>>>>>> *******************************************
>>>>>>>
>>>>>>> Calle Hedberg
>>>>>>>
>>>>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>>>>>>
>>>>>>> Tel/fax (home): +27-21-685-6472
>>>>>>>
>>>>>>> Cell: +27-82-853-5352
>>>>>>>
>>>>>>> Iridium SatPhone: +8816-315-19119
>>>>>>>
>>>>>>> Email: calle.hedberg@xxxxxxxxx
>>>>>>>
>>>>>>> Skype: calle_hedberg
>>>>>>>
>>>>>>> *******************************************
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Lars Helge Øverland
>>>>>> Lead developer, DHIS 2
>>>>>> University of Oslo
>>>>>> Skype: larshelgeoverland
>>>>>> http://www.dhis2.org <https://www.dhis2.org>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>


