
dhis2-devs-core team mailing list archive

Re: ADX data import proposal

 

Sure, that will be easy enough to refactor later.  There are a
couple of sensible options.  For the moment I want to get it
functional, and I'll make sure the memory doesn't go pop.
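
As an illustration of the kind of memory guard mentioned here (and of the limit on buffered datavalues proposed further down the thread), the following is a minimal sketch only, not the code that was committed. The class name BoundedValueBuffer, the maxValues parameter and the simplified dataValue attributes are all assumptions made for the example; XML escaping is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: buffers one group's data values in memory, up to a fixed cap.
final class BoundedValueBuffer {
    private final int maxValues;
    private final StringBuilder xml = new StringBuilder("<dataValueSet>");
    private int count;

    BoundedValueBuffer(int maxValues) {
        this.maxValues = maxValues;
    }

    // Appends one value, or fails fast once the cap is reached so memory stays bounded.
    // Note: values are not XML-escaped in this sketch.
    void add(String dataElement, String value) {
        if (count >= maxValues) {
            throw new IllegalStateException(
                "More than " + maxValues + " data values in a single group");
        }
        count++;
        xml.append("<dataValue dataElement=\"").append(dataElement)
           .append("\" value=\"").append(value).append("\"/>");
    }

    // Exposes the buffered document as an InputStream for a stream-based consumer.
    InputStream toInputStream() {
        String document = xml + "</dataValueSet>";
        return new ByteArrayInputStream(document.getBytes(StandardCharsets.UTF_8));
    }
}
```

A caller would create one buffer per adx group, call add() for each value, and hand toInputStream() to the importer. A group that exceeds the cap is rejected instead of growing without limit.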


On 18 June 2015 at 20:54, Lars Helge Øverland <larshelge@xxxxxxxxx> wrote:
> Hi, okay, yes, there is maybe no ideal solution here. I think I would favor a
> PipedOutputStream/PipedInputStream pair with a separate thread over an
> in-memory DOM.
>
> Do we really need a separate thread pool? We already fork off threads in many
> places in the system, e.g. with parallel analytics queries. I thought that as
> long as it's limited to one or a few per process, the JVM should handle it.
> But I might be wrong.
>
>
>
>
>
> On Thu, Jun 18, 2015 at 8:46 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>>
>> Hi Lars
>>
>> The problem is that the dataValueSetService requires an InputStream to
>> feed off.  There are only two ways to provide an InputStream that I can
>> think of: either create a pipe, or buffer (e.g. with a string).
>>
>> Creating a pipe is doable, but then you also need to create a separate
>> thread to read it, which is another resource to manage (e.g. with a pool).
>> That seemed like more effort than it is worth.
>>
>> What I can do short term as a defensive measure is to place a limit on
>> the number of datavalues which can be buffered for a single
>> datavalueset.  That way it should not be possible to explode the
>> memory.  I'll do that soon.
>>
>> Note that in "normal" use this should not be a problem as a single adx
>> group corresponds to the data for one orgunit, for one period - what
>> is envisaged typically is a single dataset's worth.
>>
>> The other "alternative" is not to use the datavalueSetService at all
>> but just duplicate the code.
>>
>> Bob
>>
>> On 18 June 2015 at 15:22, Lars Helge Øverland <larshelge@xxxxxxxxx> wrote:
>> > Hi Bob,
>> >
>> > as you say, this creates a hard limit on memory. All it will now take to
>> > bring down a DHIS 2 instance is to submit a sufficiently large import
>> > file. Seems like this will cause headaches for server admins ;) Can we
>> > find a stream-based solution which scales well?
>> >
>> > Lars
>> >
>> >
>> > On Thu, Jun 18, 2015 at 2:49 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>> > wrote:
>> >>
>> >> WIP committed and slight adjustment of strategy ...
>> >>
>> >> I was not comfortable with creating a new thread just to pipe from adx
>> >> to dxf.
>> >>
>> >> So instead, for each adx group corresponding to a dataValueSet with
>> >> orgUnit, period (and potentially attributeOptionCombo), I create a
>> >> dataValueSet DOM document and present that to the dxf2 stream importer
>> >> as a stream.  Given that this data is bound by a single orgunit and
>> >> period, I don't think the DOM document is going to break the memory
>> >> bank.
>> >>
>> >> Basic conversion to dxf2 is working fine.
>> >>
>> >> Next task is to "implode" the categories.
>> >>
>> >> A luta Continua.
>> >>
>> >> On 12 June 2015 at 13:40, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>> >> > Hi
>> >> >
>> >> > As you have seen, I have already started to commit a few bits of code
>> >> > in support of the ADX implementation.  I hadn't been planning to do
>> >> > this so will proceed quite slowly, but let me outline the approach I
>> >> > am considering for your comment and suggestion.
>> >> >
>> >> > 1.  Currently we have a datavalueset service which can import dxf2
>> >> > data from an inputstream.
>> >> >
>> >> > 2.  I would like to use that existing service and place the adx
>> >> > service as a thin veneer above it rather than create a lot of
>> >> > duplicated code.
>> >> >
>> >> > 3.  The adx data importer would read its adx input from a stream and
>> >> > convert that into a dxf2 stream.  The main tasks it would need to
>> >> > perform are:
>> >> > (i)  convert periods into dxf2 format
>> >> > (ii) lookup catoptcombos and attributeoptioncombos for the dimensions
>> >> > in the adx message
>> >> > All other attributes and ImportOptions would be passed through
>> >> > directly to the dxf2 datavalueset service.
>> >> >
>> >> > 4.  In order to present the resulting dxf2 to the service as an
>> >> > InputStream it would have to use a PipeReader/PipeWriter combination
>> >> > (something Lars will recall from earlier dxf1 code).  The equivalent
>> >> > alternative would be to post the dxf2 datasets back out to the REST
>> >> > endpoint, but that seems wasteful and more awkward.
>> >> >
>> >> > Does that approach sound reasonable?
>> >> >
>> >> > I have some lingering uncertainty about the best way to deal with
>> >> > ImportSummary.  The adx data is naturally grouped by orgunit/period.
>> >> > So I would likely split the stream and post each as a separate dxf2
>> >> > datavalueset.  So probably this would imply collecting the results
>> >> > into an <ImportSummaries ... /> element.  ADX is currently silent on
>> >> > the result message as it deliberately does not define the transaction
>> >> > (just the message) so we have some latitude here to do whatever is
>> >> > best.  The above is my best suggestion.
>> >> >
>> >> > Cheers
>> >> > Bob
>> >>
>> >> --
>> >> Mailing list: https://launchpad.net/~dhis2-devs-core
>> >> Post to     : dhis2-devs-core@xxxxxxxxxxxxxxxxxxx
>> >> Unsubscribe : https://launchpad.net/~dhis2-devs-core
>> >> More help   : https://help.launchpad.net/ListHelp
>> >
>> >
>> >
>> >
>> > --
>> > Lars Helge Øverland
>> > Lead developer, DHIS 2
>> > University of Oslo
>> > Skype: larshelgeoverland
>> > http://www.dhis2.org
>> >
>
>
>
>
> --
> Lars Helge Øverland
> Lead developer, DHIS 2
> University of Oslo
> Skype: larshelgeoverland
> http://www.dhis2.org
>
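
For readers following the pipe-versus-buffer discussion in this thread, here is a minimal sketch of the PipedOutputStream/PipedInputStream option with a separate reader thread. It is an illustration under assumptions, not DHIS 2 code: the class PipedImportSketch and the methods consume() and convertAndImport() are hypothetical, and consume() merely counts bytes where a real stream importer would sit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PipedImportSketch {

    // Hypothetical stand-in for a stream-based importer: drains the stream, returns the byte count.
    static long consume(InputStream in) throws IOException {
        try (InputStream stream = in) {
            byte[] chunk = new byte[4096];
            long total = 0;
            int n;
            while ((n = stream.read(chunk)) != -1) {
                total += n;
            }
            return total;
        }
    }

    // Writes a generated document into the pipe while a second thread reads it.
    // Only the pipe's fixed-size buffer (8 KiB here) is held in memory at any time.
    static long convertAndImport(int valueCount) throws Exception {
        PipedInputStream in = new PipedInputStream(8192);
        PipedOutputStream out = new PipedOutputStream(in);
        ExecutorService reader = Executors.newSingleThreadExecutor();
        try {
            Callable<Long> task = () -> consume(in);
            Future<Long> imported = reader.submit(task);
            // Closing the writer signals end-of-stream to the reader thread.
            try (OutputStream o = out) {
                o.write(bytes("<dataValueSet>"));
                for (int i = 0; i < valueCount; i++) {
                    o.write(bytes("<dataValue value=\"" + i + "\"/>"));
                }
                o.write(bytes("</dataValueSet>"));
            }
            return imported.get();
        } finally {
            reader.shutdown();
        }
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

The writer blocks whenever the pipe is full, so memory use does not depend on the size of the import. The cost is the extra thread, which is the resource-management concern raised above; a shared executor would be one way to bound it.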

