dhis2-devs-core team mailing list archive

Re: ADX data import proposal

 

Hi, okay, yes, there is maybe no ideal solution here. I think I would favor
a PipedOutputStream/PipedInputStream pair with a separate thread over an
in-memory DOM.
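
To make the pipe idea concrete, here is a rough sketch rather than actual
DHIS 2 code. The class name and the writeDxf2 and consume methods are
hypothetical stand-ins for the adx-to-dxf2 converter and the stream importer.
The only real APIs used are java.io.PipedInputStream and
java.io.PipedOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

final class PipedConversionSketch {

    /** Hypothetical stand-in for the adx-to-dxf2 converter: writes the payload to the pipe. */
    static void writeDxf2(String payload, OutputStream out) throws IOException {
        out.write(payload.getBytes(StandardCharsets.UTF_8));
    }

    /** Hypothetical stand-in for the stream importer: drains the stream into a string. */
    static String consume(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toString(StandardCharsets.UTF_8.name());
    }

    /** Writes on a separate thread and reads on the calling thread, so nothing is buffered whole. */
    static String convertThroughPipe(String payload) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            // Closing the output stream is what signals end-of-stream to the reader.
            try (OutputStream o = out) {
                writeDxf2(payload, o);
            } catch (IOException e) {
                // The reader went away or the write failed; nothing more to send.
            }
        }, "adx-pipe-writer");
        writer.start();

        // The input stream is closed before the finally block runs, which
        // unblocks a writer that is still waiting for buffer space.
        try (InputStream i = in) {
            return consume(i);
        } finally {
            writer.join();
        }
    }
}
```

The pipe's internal buffer is small and fixed, so the writer blocks whenever
the reader falls behind. Memory use then stays flat regardless of the size of
the import file.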

Do we really need a separate thread pool? We already fork off threads in many
places in the system, e.g. with parallel analytics queries. I thought that as
long as it's limited to one or a few per process, it should be handled by the
JVM. But I might be wrong.
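
For comparison, if we did want a hard cap on the number of writer threads, a
small fixed pool from java.util.concurrent would do it. This is only a sketch.
The class and method names are made up, and the sleep is a placeholder for
writing one data value set to a pipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedWriterPoolSketch {

    /** Runs taskCount placeholder tasks on a fixed pool and returns the peak number running at once. */
    static int peakConcurrency(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(10); // placeholder for the real pipe-writing work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            // Wait for every task so the peak value is final.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

Calling peakConcurrency(2, 8) never reports more than 2, since the pool queues
the remaining tasks instead of starting more threads.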





On Thu, Jun 18, 2015 at 8:46 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:

> Hi Lars
>
> The problem is that the dataValueSetService requires an inputstream to
> feed off.  There are only 2 ways to provide an inputstream that I can
> think of.  Either create a pipe or buffer (eg with a string).
>
> Creating a pipe is doable, but then you also need to create a separate
> thread to read it, which is another resource to manage (eg with a pool),
> and that seemed like more effort than it is worth.
>
> What I can do short term as a defensive measure is to place a limit on
> the number of datavalues which can be buffered for a single
> datavalueset.  That way it should not be possible to explode the
> memory.  I'll do that soon.
>
> Note that in "normal" use this should not be a problem as a single adx
> group corresponds to the data for one orgunit, for one period - what
> is envisaged typically is a single dataset's worth.
>
> The other "alternative" is not to use the datavalueSetService at all
> but just duplicate the code.
>
> Bob
>
> On 18 June 2015 at 15:22, Lars Helge Øverland <larshelge@xxxxxxxxx> wrote:
> > Hi Bob,
> >
> > as you say this creates a hard limit on memory. Now all it will take to
> > bring down a DHIS 2 instance is to submit a sufficiently large import
> > file. Seems like this will provide headaches for server admins ;) Can we
> > find a stream-based solution which scales well?
> >
> > Lars
> >
> >
> > On Thu, Jun 18, 2015 at 2:49 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> >>
> >> WIP committed and slight adjustment of strategy ...
> >>
> >> I was not comfortable with creating a new thread just to pipe from adx to
> >> dxf.
> >>
> >> So instead, for each adx group corresponding to a dataValueSet with
> >> orgUnit, period (and potentially attributeOptionCombo), I create a
> >> dataValueSet DOM document and present that to the dxf2 stream importer
> >> as a stream.  Given that this data is bound by a single orgunit and
> >> period I don't think the DOM document is going to break the memory
> >> bank.
> >>
> >> Basic conversion to dxf2 is working fine.
> >>
> >> Next task is to "implode" the categories.
> >>
> >> A luta Continua.
> >>
> >> On 12 June 2015 at 13:40, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> >> > Hi
> >> >
> >> > As you have seen I have already started to commit a few bits of code
> >> > in support of the ADX implementation.  I hadn't been planning to do
> >> > this so will proceed quite slowly, but let me outline the approach I
> >> > am considering for your comment and suggestion.
> >> >
> >> > 1.  Currently we have a datavalueset service which can import dxf2 data
> >> > from an inputstream.
> >> >
> >> > 2.  I would like to use that existing service and place the adx
> >> > service as a thin veneer above it rather than create a lot of
> >> > duplicated code.
> >> >
> >> > 3.  The adx data importer would read its adx input from a stream and
> >> > convert that into a dxf2 stream.  The main tasks it would need to
> >> > perform are:
> >> > (i)  convert periods into dxf2 format
> >> > (ii) look up catoptcombos and attributeoptioncombos for the dimensions
> >> > in the adx message
> >> > All other attributes and ImportOptions would be passed through
> >> > directly to the dxf2 datavalueset service.
> >> >
> >> > 4.  In order to present the resulting dxf2 to the service as an
> >> > InputStream it would have to use PipeReader/PipeWriter combination
> >> > (Something Lars will recall from earlier dxf1 code).  The equivalent
> >> > alternative would be to post the dxf2 datasets back out to the REST
> >> > endpoint but that seems wasteful and more awkward.
> >> >
> >> > Does that approach sound reasonable?
> >> >
> >> > I have some lingering uncertainty about the best way to deal with
> >> > ImportSummary.  The adx data is naturally grouped by orgunit/period.
> >> > So I would likely split the stream and post each as a separate dxf2
> >> > datavalueset.  So probably this would imply collecting the results
> >> > into an <ImportSummaries ... /> element.  ADX is currently silent on
> >> > the result message as it deliberately does not define the transaction
> >> > (just the message) so we have some latitude here to do whatever is
> >> > best.  The above is my best suggestion.
> >> >
> >> > Cheers
> >> > Bob
> >>
> >
> >
> >
> >
> > --
> > Lars Helge Øverland
> > Lead developer, DHIS 2
> > University of Oslo
> > Skype: larshelgeoverland
> > http://www.dhis2.org
> >
>



-- 
Lars Helge Øverland
Lead developer, DHIS 2
University of Oslo
Skype: larshelgeoverland
http://www.dhis2.org
