dhis2-devs team mailing list archive

Re: Importing large data files - server settings

 

Lars,

NOTE that I used the "skip checking" option, because I was importing into
an instance where I had deleted all records in the datavalue table.

I will try to reproduce it on demo - if it does not happen there, I can
make the "culprit" instance available. I initially noticed those few
duplicates because I was checking the number of records in the datavalue
table against the number of rows in the CSV file - then when importing, it
would bomb out when encountering the first duplicate.
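
A minimal sketch of the kind of check described above: it counts the data rows in a CSV file and lists any record keys that occur more than once. The function name is hypothetical, and the default key columns (the first five columns, taken to be data element, period, orgunit, category option combo and attribute option combo) are an assumption about the export layout, so adjust them to the actual file.

```python
import csv
from collections import Counter


def find_duplicate_keys(path, key_columns=(0, 1, 2, 3, 4), has_header=True):
    """Return (row_count, duplicates) for a CSV data value file.

    key_columns is an ASSUMPTION about which columns identify a record;
    it is not taken from the DHIS2 source and should be adjusted as needed.
    """
    counts = Counter()
    row_count = 0
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row, if there is one
        for row in reader:
            if not row:
                continue  # ignore blank lines
            row_count += 1
            # Build the key from the chosen columns that exist in this row.
            key = tuple(row[i] for i in key_columns if i < len(row))
            counts[key] += 1
    # Keep only the keys that were seen more than once.
    duplicates = {key: n for key, n in counts.items() if n > 1}
    return row_count, duplicates
```

The returned row count can be compared against the number of records in the datavalue table, and the duplicates dictionary shows which keys would collide on import. Note that this holds every key in memory, so a file with millions of rows needs a correspondingly large amount of RAM.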

Regards
Calle

On 7 December 2015 at 14:31, Lars Helge Øverland <larshelge@xxxxxxxxx>
wrote:

> Hi Calle,
>
> that's strange - I just tested by importing a CSV data value file on the
> demo
> <https://play.dhis2.org/demo/dhis-web-importexport/displayImportDataValueForm.action> and
> duplicates were ignored.  What version are you on? Can you reproduce on
> demo?
>
> regards,
>
> Lars
>
>
> On Fri, Dec 4, 2015 at 11:31 AM, Calle Hedberg <calle.hedberg@xxxxxxxxx>
> wrote:
>
>> Hi
>>
>> FYI, I've just done another import of a 120MB xml-zip file - upload took
>> around 30 minutes and actual import around 18 minutes.
>>
>> Regards
>> Calle
>>
>> On 3 December 2015 at 20:34, Calle Hedberg <calle.hedberg@xxxxxxxxx>
>> wrote:
>>
>>> Hi
>>>
>>> By the way, using CSV should reduce file size and speed up the import,
>>> but there seems to be a bug somewhere in the CSV export: the total number
>>> of records exported was slightly higher than the actual number of data
>>> records selected (we are talking 3-4 duplicated records out of 8 million),
>>> and as a result the import crashes as soon as the first duplicate record
>>> is encountered. This does not happen when exporting xml. I did drill into
>>> one of these duplicates and found it to be the "first" orgunit in the
>>> alphabetical list and also the "earliest" period in the source system.
>>>
>>> Regards
>>> Calle
>>>
>>>
>>>
>>> On 3 December 2015 at 20:28, Calle Hedberg <calle.hedberg@xxxxxxxxx>
>>> wrote:
>>>
>>>> Jason,
>>>>
>>>> I fully understand that, and it's only done infrequently (and outside
>>>> office hours if it's a production instance).
>>>>
>>>> The 75MB xml-zip file that I just uploaded had only around 8 mill
>>>> records - the upload took 20 minutes and the actual import around 13
>>>> minutes. No problemo...
>>>>
>>>> Regards
>>>> Calle
>>>>
>>>> On 3 December 2015 at 20:12, Jason Pickering <
>>>> jason.p.pickering@xxxxxxxxx> wrote:
>>>>
>>>>> Hi Calle,
>>>>>
>>>>> I think you would want to be very careful with this. If you change
>>>>> the maximum file size to 200 MB, this could potentially be an unzipped file
>>>>> of several (tens of) gigabytes, or several million rows of data. This
>>>>> could put significant stress on the server, and that is really the entire
>>>>> point of the restriction: to prevent huge uploads from being imported. If you
>>>>> have limited who can upload data to the server, it may be OK, but just be
>>>>> aware that a zip file of 200 MB can be much, much larger when unzipped (by an
>>>>> order of magnitude or two), and result in a very long process.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Jason
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 3, 2015, 18:52 Calle Hedberg <calle.hedberg@xxxxxxxxx>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> We are now standardising on 200M
>>>>>>
>>>>>> Regards
>>>>>> Calle
>>>>>>
>>>>>> On 3 December 2015 at 17:38, Alan Ivey <aivey@xxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Also, it's worth noting that the default for "client_max_body_size"
>>>>>>> is only 1 MB:
>>>>>>> http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size .
>>>>>>> It will need to be increased on most deployments of DHIS2.
>>>>>>>
>>>>>>> On Thu, Dec 3, 2015 at 9:09 AM, Lars Helge Øverland <
>>>>>>> larshelge@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> If you are indeed using nginx, the "client_max_body_size" directive
>>>>>>>> is part of the installation docs example and can be increased as appropriate:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://www.dhis2.org/doc/snapshot/en/implementer/html/ch08s04.html#d5e575
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Lars
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 3, 2015 at 3:02 PM, Lars Helge Øverland <
>>>>>>>> larshelge@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> Hi Calle,
>>>>>>>>>
>>>>>>>>> I think this depends on the web server configuration. One can
>>>>>>>>> configure max file size for uploads in both the proxy (nginx, apache) and
>>>>>>>>> servlet container (tomcat
>>>>>>>>> <http://stackoverflow.com/questions/2947683/httprequest-maximum-allowable-size-in-tomcat>
>>>>>>>>> ).
>>>>>>>>>
>>>>>>>>> On nginx
>>>>>>>>> <http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size>
>>>>>>>>> the directive is:
>>>>>>>>>
>>>>>>>>> client_max_body_size 200M;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> regards,
>>>>>>>>>
>>>>>>>>> Lars
>>>>>>>>>
>>>>>>>>> On Thu, Dec 3, 2015 at 2:44 PM, Calle Hedberg <
>>>>>>>>> calle.hedberg@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have found that there is a limitation in file size when
>>>>>>>>>> importing data into our SERVER-based instances, while I have found no
>>>>>>>>>> equivalent limitation when importing large data files (e.g. XML format)
>>>>>>>>>> into an equivalent localhost instance. A few key aspects:
>>>>>>>>>>
>>>>>>>>>> 1. Both the server and localhost are running the latest version
>>>>>>>>>> of 2.20
>>>>>>>>>> 2. Both are running 64-bit Java 8 and Tomcat 8.0.26 or 8.0.29
>>>>>>>>>> 3. Localhost tomcat has 4GB (min) and 8GB (max) allocated
>>>>>>>>>> 4. The server instance (running under Ubuntu Linux) has ~5.3GB
>>>>>>>>>> RAM, but increasing/decreasing RAM has no effect on the issue.
>>>>>>>>>>
>>>>>>>>>> The problem is related to the upload process.
>>>>>>>>>>
>>>>>>>>>> Example:
>>>>>>>>>> When importing a 75MB data file with around 8 mill data records
>>>>>>>>>> (XML, zipped) on localhost, the initial upload step is almost instantaneous
>>>>>>>>>> (2-3 seconds) and then the actual import starts (takes about 10 minutes
>>>>>>>>>> overall).
>>>>>>>>>>
>>>>>>>>>> When importing the same file to the equivalent instance on the
>>>>>>>>>> server, it takes around 30 seconds to reach 2% upload and then the upload
>>>>>>>>>> re-starts at 0% - this goes on ad infinitum.
>>>>>>>>>>
>>>>>>>>>> Smaller files - e.g. 10-20MB - will maybe import 15-20%, then
>>>>>>>>>> reset to 0% and start over.
>>>>>>>>>>
>>>>>>>>>> It seems to me that the problem is related to the DHIS2 web
>>>>>>>>>> server configuration, which does not allow sufficient time for the upload to
>>>>>>>>>> happen.
>>>>>>>>>>
>>>>>>>>>> Any indications of how to fix this would be appreciated. While
>>>>>>>>>> dumping the server instance into localhost, importing the data, and then
>>>>>>>>>> uploading/restoring the instance does work, it is a pain in the b....
>>>>>>>>>>
>>>>>>>>>> Regards from a sunny Cape Town
>>>>>>>>>> Calle
>>>>>>>>>>
>>>>>>>>>> *******************************************
>>>>>>>>>>
>>>>>>>>>> Calle Hedberg
>>>>>>>>>>
>>>>>>>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>>>>>>>>>
>>>>>>>>>> Tel/fax (home): +27-21-685-6472
>>>>>>>>>>
>>>>>>>>>> Cell: +27-82-853-5352
>>>>>>>>>>
>>>>>>>>>> Iridium SatPhone: +8816-315-19119
>>>>>>>>>>
>>>>>>>>>> Email: calle.hedberg@xxxxxxxxx
>>>>>>>>>>
>>>>>>>>>> Skype: calle_hedberg
>>>>>>>>>>
>>>>>>>>>> *******************************************
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>>>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Lars Helge Øverland
>>>>>>>>> Lead developer, DHIS 2
>>>>>>>>> University of Oslo
>>>>>>>>> Skype: larshelgeoverland
>>>>>>>>> http://www.dhis2.org <https://www.dhis2.org>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>


