
dhis2-devs team mailing list archive

Re: Importing large data files - server settings

 

Hi Calle,

that's strange - I just tested by importing a CSV data value file on the demo
<https://play.dhis2.org/demo/dhis-web-importexport/displayImportDataValueForm.action>
and duplicates were ignored. What version are you on? Can you reproduce on
the demo?
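
For anyone who wants to confirm whether an exported CSV file really contains duplicated records before importing it, a minimal sketch along these lines may help. It assumes the first five columns (data element, period, org unit and the two option combos) identify a data value; that column layout, the `find_duplicate_keys` name and the sample rows are assumptions made for illustration, so adjust `key_columns` to match the actual export.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(csv_file, key_columns=(0, 1, 2, 3, 4), has_header=True):
    """Return {key: count} for every key that occurs more than once.

    key_columns is an assumption about which columns identify a data value;
    it is not taken from the DHIS2 source.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    counts = Counter()
    for row in reader:
        if not row:
            continue  # ignore blank lines
        key = tuple(row[i] if i < len(row) else "" for i in key_columns)
        counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data: the second row repeats the first.
sample = io.StringIO(
    "dataelement,period,orgunit,catoptcombo,attroptcombo,value\n"
    "de1,201501,ou1,coc1,aoc1,10\n"
    "de1,201501,ou1,coc1,aoc1,10\n"
    "de1,201502,ou1,coc1,aoc1,12\n"
)
duplicates = find_duplicate_keys(sample)
print(duplicates)
```

The sketch keeps one counter entry per distinct key, so a file with several million rows needs a correspondingly large amount of memory.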

regards,

Lars


On Fri, Dec 4, 2015 at 11:31 AM, Calle Hedberg <calle.hedberg@xxxxxxxxx>
wrote:

> Hi
>
> FYI, I've just done another import of a 120mb xml-zip file - upload took
> around 30 minutes and actual import around 18 minutes.
>
> Regards
> Calle
>
> On 3 December 2015 at 20:34, Calle Hedberg <calle.hedberg@xxxxxxxxx>
> wrote:
>
>> Hi
>>
>> By the way, using CSV should reduce file size and speed up the import,
>> but there seems to be a bug somewhere in the CSV export: the total number
>> of records exported was slightly higher than the actual number of data
>> records selected (we are talking 3-4 duplicated records out of 8 million),
>> and as a result the import crashes as soon as the first duplicate record
>> is encountered. This does not happen when exporting xml. I did drill into
>> one of these duplicates and found it to be the "first" orgunit in the
>> alphabetical list and also the "earliest" period in the source system.
>>
>> Regards
>> Calle
>>
>>
>>
>> On 3 December 2015 at 20:28, Calle Hedberg <calle.hedberg@xxxxxxxxx>
>> wrote:
>>
>>> Jason,
>>>
>>> I fully understand that, and it's only done infrequently (and outside
>>> office hours if it's a production instance).
>>>
>>> The 75MB xml-zip file that I just uploaded had only around 8 mill
>>> records - the upload took 20 minutes and the actual import around 13
>>> minutes. No problemo...
>>>
>>> Regards
>>> Calle
>>>
>>> On 3 December 2015 at 20:12, Jason Pickering <
>>> jason.p.pickering@xxxxxxxxx> wrote:
>>>
>>>> Hi Calle,
>>>>
>>>> I think you would want to be very careful with this. If you change the
>>>> maximum file size to 200 MB, this could potentially be an unzipped file of
>>>> several (tens of) gigabytes, or several million rows of data. This could
>>>> put significant stress on the server, and is really the entire point of the
>>>> restriction: to prevent huge uploads from being imported. If you
>>>> have limited who can upload data to the server, it may be OK, but just be
>>>> aware that a zip file of 200 MB can be much, much larger once unzipped (by
>>>> an order of magnitude or two), and result in a very long process.
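
One way to see how large a zipped export becomes once unpacked, before uploading it, is to read the sizes recorded in the archive itself. The following is only a sketch using Python's standard `zipfile` module; the `uncompressed_size` helper and the in-memory sample archive are made up for illustration and are not part of DHIS2.

```python
import io
import zipfile


def uncompressed_size(zip_source):
    """Sum the uncompressed sizes recorded for every entry in the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return sum(info.file_size for info in archive.infolist())


# Build a small, highly repetitive archive in memory to illustrate the ratio.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("dataValues.xml", "<dataValue/>" * 100_000)

compressed = len(buffer.getvalue())
expanded = uncompressed_size(buffer)
print(f"{compressed} bytes zipped, {expanded} bytes unzipped")
```

For a real export, pass the path of the zip file to `uncompressed_size` instead of the in-memory buffer.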
>>>>
>>>> Regards,
>>>>
>>>> Jason
>>>>
>>>>
>>>>
>>>> On Thu, Dec 3, 2015, 18:52 Calle Hedberg <calle.hedberg@xxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> We are now standardising on 200M
>>>>>
>>>>> Regards
>>>>> Calle
>>>>>
>>>>> On 3 December 2015 at 17:38, Alan Ivey <aivey@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Also, it's worth noting that the default for "client_max_body_size"
>>>>>> is only 1 MB:
>>>>>> http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size .
>>>>>> It will need to be increased on most deployments of DHIS2.
>>>>>>
>>>>>> On Thu, Dec 3, 2015 at 9:09 AM, Lars Helge Øverland <
>>>>>> larshelge@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> If you are indeed using nginx, the "client_max_body_size" directive
>>>>>>> is part of the installation docs example and can be increased as appropriate:
>>>>>>>
>>>>>>>
>>>>>>> https://www.dhis2.org/doc/snapshot/en/implementer/html/ch08s04.html#d5e575
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Lars
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 3, 2015 at 3:02 PM, Lars Helge Øverland <
>>>>>>> larshelge@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> Hi Calle,
>>>>>>>>
>>>>>>>> I think this depends on the web server configuration. One can
>>>>>>>> configure max file size for uploads in both the proxy (nginx, apache) and
>>>>>>>> servlet container (tomcat
>>>>>>>> <http://stackoverflow.com/questions/2947683/httprequest-maximum-allowable-size-in-tomcat>).
>>>>>>>>
>>>>>>>> On nginx
>>>>>>>> <http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size>
>>>>>>>> the directive is:
>>>>>>>>
>>>>>>>> client_max_body_size 200M;
>>>>>>>>
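
To check in advance whether a given file would fit under such a limit, a rough sketch like the one below can be used. The `parse_size` and `fits_limit` helpers are hypothetical and only mimic the k/m/g suffix notation; they are not nginx's own parser. Note also that the request body of a form upload is slightly larger than the file itself, so leave some headroom.

```python
def parse_size(value):
    """Convert a size such as '200M' or '512k' to bytes (sketch only)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = value.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # no suffix: plain bytes


def fits_limit(file_size_bytes, limit):
    """Return True if a body of file_size_bytes would pass the given limit."""
    limit_bytes = parse_size(limit)
    # In nginx, a client_max_body_size of 0 disables the size check.
    return limit_bytes == 0 or file_size_bytes <= limit_bytes


seventy_five_mb = 75 * 1024 * 1024
print(fits_limit(seventy_five_mb, "200M"))  # limit from the directive above
print(fits_limit(seventy_five_mb, "1m"))    # nginx default of 1 MB
```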
>>>>>>>>
>>>>>>>> regards,
>>>>>>>>
>>>>>>>> Lars
>>>>>>>>
>>>>>>>> On Thu, Dec 3, 2015 at 2:44 PM, Calle Hedberg <
>>>>>>>> calle.hedberg@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have found that there is a limitation in file size when
>>>>>>>>> importing data into our SERVER-based instances, while I have found no
>>>>>>>>> equivalent limitation when importing large data files (e.g. XML format)
>>>>>>>>> into an equivalent localhost instance. A few key aspects:
>>>>>>>>>
>>>>>>>>> 1. Both the server and localhost are running the latest version of
>>>>>>>>> 2.20
>>>>>>>>> 2. Both are running 64-bit Java 8 and Tomcat 8.0.26 or 8.0.29
>>>>>>>>> 3. Localhost tomcat has 4GB (min) and 8GB (max) allocated
>>>>>>>>> 4. The server instance (running under Ubuntu Linux) has ~5.3GB
>>>>>>>>> RAM, but increasing/decreasing RAM has no effect on the issue.
>>>>>>>>>
>>>>>>>>> The problem is related to the upload process.
>>>>>>>>>
>>>>>>>>> Example:
>>>>>>>>> When importing a 75MB data file with around 8 mill data records
>>>>>>>>> (XML, zipped) on localhost, the initial upload step is almost instantaneous
>>>>>>>>> (2-3 seconds) and then the actual import starts (takes about 10 minutes
>>>>>>>>> overall).
>>>>>>>>>
>>>>>>>>> When importing the same file to the equivalent instance on the
>>>>>>>>> server, it takes around 30 seconds to reach 2% upload and then the upload
>>>>>>>>> re-starts at 0% - this goes on ad infinitum.
>>>>>>>>>
>>>>>>>>> Smaller files - e.g. 10-20MB - will maybe import 15-20%, then
>>>>>>>>> reset to 0% and start over.
>>>>>>>>>
>>>>>>>>> It seems to me that the problem is related to the DHIS2 web server
>>>>>>>>> configuration: it does not allow sufficient time for the upload to happen.
>>>>>>>>>
>>>>>>>>> Any indications of how to fix this would be appreciated. While
>>>>>>>>> dumping the server instance into localhost, importing the data, and then
>>>>>>>>> uploading/restoring the instance does work, it is a pain in the b....
>>>>>>>>>
>>>>>>>>> Regards from a sunny Cape Town
>>>>>>>>> Calle
>>>>>>>>>
>>>>>>>>> *******************************************
>>>>>>>>>
>>>>>>>>> Calle Hedberg
>>>>>>>>>
>>>>>>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>>>>>>>>
>>>>>>>>> Tel/fax (home): +27-21-685-6472
>>>>>>>>>
>>>>>>>>> Cell: +27-82-853-5352
>>>>>>>>>
>>>>>>>>> Iridium SatPhone: +8816-315-19119
>>>>>>>>>
>>>>>>>>> Email: calle.hedberg@xxxxxxxxx
>>>>>>>>>
>>>>>>>>> Skype: calle_hedberg
>>>>>>>>>
>>>>>>>>> *******************************************
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Lars Helge Øverland
>>>>>>>> Lead developer, DHIS 2
>>>>>>>> University of Oslo
>>>>>>>> Skype: larshelgeoverland
>>>>>>>> http://www.dhis2.org <https://www.dhis2.org>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>


-- 
Lars Helge Øverland
Lead developer, DHIS 2
University of Oslo
Skype: larshelgeoverland
http://www.dhis2.org <https://www.dhis2.org>
