dhis2-users team mailing list archive
Message #10467
Re: [Dhis2-devs] Creation of CategoryOptionCombinations
It just seems that if you have five million cat combos, you would need many
orders of magnitude more data to support them. If the data were imported
as events instead of aggregate data, you would not need to create all of
those dimensions explicitly, but could still produce aggregate figures from
them.
It just feels like there is no way all of those cat combos are going to be
filled, unless you really have a TON of data.
Regards,
Jason
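
To put numbers on the point above: the count of category option combinations is the product of the option counts of the categories involved, so each added dimension multiplies the total. A minimal Python sketch follows. The option counts are hypothetical (the thread does not state the real ones) and were picked only so that the totals land near the figures quoted below (~20,000, ~400,000, ~5 million).

```python
from math import prod


def combination_count(option_counts):
    """Return the number of option combinations for the given categories.

    Every option of one category is combined with every option of all
    the others, so the total is the product of the per-category counts.
    """
    return prod(option_counts)


# Hypothetical option counts per category; NOT taken from the thread.
five_dims = [8, 7, 7, 7, 7]
seven_dims = five_dims + [5, 4]
ten_dims = [5, 5, 5, 5, 5, 5, 5, 4, 4, 4]

print(combination_count(five_dims))   # 19208
print(combination_count(seven_dims))  # 384160
print(combination_count(ten_dims))    # 5000000
```

Even with only four or five options per category, ten categories already yield millions of combinations, most of which are unlikely ever to receive a data value.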
On Wed, Jun 8, 2016 at 2:36 PM, Uwe Wahser <uwe@xxxxxxxxx> wrote:
> Hi Jason,
>
> importing aggregate data into data-sets (see my reply to Lars yesterday
> evening:
> https://lists.launchpad.net/dhis2-users/msg10452.html)
>
> Again: the problem is not the import, but the combination of category
> options.
> Maybe it would already help a lot, if those bombastic strings for the names
> wouldn't be created for categoryOptionCombinations.
>
> Thanks for good ideas,
>
> Uwe
>
> ---
> > Jason Pickering <jason.p.pickering@xxxxxxxxx> wrote on 8 June 2016 at 09:09:
> >
> >
> > Hi Uwe,
> >
> > Are you importing this as aggregate data or as events?
> >
> > Regards,
> > Jason
> >
> >
> > On Wed, Jun 8, 2016 at 2:27 AM, Morten Olav Hansen <morten@xxxxxxxxx> wrote:
> >
> > >> Just to make sure we are talking about the same thing: the problem does
> > >> not appear during import, but when generating all possible combinations
> > >> (when saving the CategoryCombination or when manually invoking the
> > >> update of categoryOptionCombinations)
> > >>
> > >
> > > Ah, sorry, I thought it was the import that was slow. So that part
> > > is OK?
> > >
> > >
> > >> so I can still use /api/metadata without version to call the current
> > >> api-version?
> > >>
> > >
> > > That will give you the legacy importer, so going forward you would need
> > > to use /api/{version}/{endpoint}. We will have more info about it in the
> > > release notes.
> > >
> > > And no, the UI has not been switched to the new importer yet (in 2.24),
> > > and I am not 100% sure it will...
> > >
> > >
> > >>
> > >> Thanks for your replies at this time of the day :-)
> > >>
> > >> Regards, Uwe
> > >>
> > >> ---
> > >>
> > >>
> > >> > Morten Olav Hansen <morten@xxxxxxxxx> wrote on 7 June 2016 at 19:28:
> > >> >
> > >> >
> > >> > Hi Uwe
> > >> >
> > >> > The improvements are mainly for speed and validation. Yes, we are now
> > >> > (in 2.24) introducing a versioned web API, so the importer at that
> > >> > endpoint will be available until 2.26 (we will support 3 versions).
> > >> > In 2.24, the same endpoint is available at /api/24/metadata.
> > >> >
> > >> > If you are using cURL or another utility, the import part would be
> > >> > the same, but the UI in 2.23 cannot be used, as it is hardcoded to
> > >> > the legacy importer.
> > >> >
> > >> > --
> > >> > Morten Olav Hansen
> > >> > Senior Engineer, DHIS 2
> > >> > University of Oslo
> > >> > http://www.dhis2.org
> > >> >
> > >> > On Tue, Jun 7, 2016 at 11:25 PM, Uwe Wahser <uwe@xxxxxxxxx> wrote:
> > >> >
> > >> > > Hi Morten,
> > >> > >
> > >> > > no, I didn't. What would be the procedure for that? Importing
> > >> > > Categories, Options and CategoryCombinations via the API and having
> > >> > > DHIS2 generate the CategoryOptionCombinations? Would that bring
> > >> > > about any change at all, or does the importer use different libs
> > >> > > for generating the COCs?
> > >> > >
> > >> > > By the way, is the 23 in the API link valid for future DHIS2
> > >> > > versions? I noticed it in a few API descriptions recently ...
> > >> > >
> > >> > > Regards, Uwe
> > >> > >
> > >> > > > Morten Olav Hansen <morten@xxxxxxxxx> wrote on 7 June 2016 at 18:50:
> > >> > > >
> > >> > > >
> > >> > > > Hi Uwe
> > >> > > >
> > >> > > > Did you try out the new importer? It is available as
> > >> > > > /api/23/metadata in 2.23.
> > >> > > >
> > >> > > > On Tuesday, 7 June 2016, Uwe Wahser <uwe@xxxxxxxxx> wrote:
> > >> > > >
> > >> > > > > Dear devs,
> > >> > > > >
> > >> > > > > I am experiencing problems when handling category combinations.
> > >> > > > > Our prototype with 5 dimensions went through the process of
> > >> > > > > generating categoryOptionCombinations (~20,000 records) quite
> > >> > > > > well. 7 dimensions (~400,000) worked as well, although it took
> > >> > > > > a very long time.
> > >> > > > >
> > >> > > > > Now we have defined the next data model with 10 dimensions
> > >> > > > > (expecting ~5 million categoryOptionCombinations) and the
> > >> > > > > process dies without further notice. Last words in
> > >> > > > > catalina.out:
> > >> > > > > * INFO 2016-06-07 13:29:33,783 Building object-bridge maps (preheatCache: true, 3 classes). (DefaultObjectBridge.java [http-bio-8180-exec-15])
> > >> > > > > * INFO 2016-06-07 13:29:36,779 Building object-bridge maps took 2.99 seconds. (DefaultObjectBridge.java [http-bio-8180-exec-15])
> > >> > > > > * INFO 2016-06-07 13:29:36,896 'admin' update org.hisp.dhis.dataelement.DataElementCategoryCombo, name: Membership, uid: SCgLXYHqVzz (AuditLogUtil.java [http-bio-8180-exec-15])
> > >> > > > >
> > >> > > > > Ten dimensions with option sets that are not extraordinarily
> > >> > > > > big is actually not unusual, and rather slim, for
> > >> > > > > multi-dimensional data models in data warehouses, so I'd
> > >> > > > > expect DHIS2 to be able to handle this easily.
> > >> > > > >
> > >> > > > > It could of course be a memory problem (I tried up to 14g for
> > >> > > > > Tomcat on a 4-core Ubuntu 14.04 server, DHIS 2.23). Before I
> > >> > > > > start experimenting with other parameters, I am hoping to get
> > >> > > > > some hints on known limitations or workarounds from you (not
> > >> > > > > allowed: reducing the number of options or categories,
> > >> > > > > SQL hacks :-) ). Is there any info on whether optimizations of
> > >> > > > > this process are being planned in the kernel?
> > >> > > > >
> > >> > > > > Some observations on the process:
> > >> > > > >
> > >> > > > > * during generation (either when saving the categoryCombination
> > >> > > > > or in the data maintenance menu):
> > >> > > > > - long names - cOCs are generated with names that become
> > >> > > > > extremely long, as they are mere concatenations of the involved
> > >> > > > > categoryOptions. Could there be an option to just use the codes
> > >> > > > > as a basis, or to leave out the names completely? This could be
> > >> > > > > one reason for memory problems and performance issues.
> > >> > > > > - long log entries - every single entry is logged in
> > >> > > > > catalina.out with several lines of text, causing catalina.out
> > >> > > > > to become extremely big.
> > >> > > > > - during execution a lot of Java memory is being used and no DB
> > >> > > > > memory, which looks to me as if all the logic is happening in
> > >> > > > > the Java machine. It might be more useful to move more logic
> > >> > > > > into SQL in the DB (e.g. use DB cross joins for combining
> > >> > > > > options), as the DB will be more efficient.
> > >> > > > > - because of the log entries I assume that every single
> > >> > > > > combination is being persisted into the DB with a single SQL
> > >> > > > > statement, causing millions of single SQL requests. Prefer
> > >> > > > > batch SQL over single-record processing.
> > >> > > > >
> > >> > > > > * during import/export of categoryOptionCombinations:
> > >> > > > > - prefer batch SQL over single-record processing
> > >> > > > > - huge log entries in catalina.out due to several lines of
> > >> > > > > text per combination
> > >> > > > >
> > >> > > > > I'd be very happy about comments.
> > >> > > > >
> > >> > > > > Thanks in advance,
> > >> > > > >
> > >> > > > > Uwe
> > >> > > > >
> > >> > > > > _______________________________________________
> > >> > > > > Mailing list: https://launchpad.net/~dhis2-users
> > >> > > > > Post to : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > >> > > > > Unsubscribe : https://launchpad.net/~dhis2-users
> > >> > > > > More help : https://help.launchpad.net/ListHelp
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Morten Olav Hansen
> > >> > > > Senior Engineer, DHIS 2
> > >> > > > University of Oslo
> > >> > > > http://www.dhis2.org
> > >> > >
> > >>
> > >
> > >
> > > _______________________________________________
> > > Mailing list: https://launchpad.net/~dhis2-devs
> > > Post to : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> > > Unsubscribe : https://launchpad.net/~dhis2-devs
> > > More help : https://help.launchpad.net/ListHelp
> > >
> > >
> >
> >
> > --
> > Jason P. Pickering
> > email: jason.p.pickering@xxxxxxxxx
> > tel:+46764147049
>
--
Jason P. Pickering
email: jason.p.pickering@xxxxxxxxx
tel:+46764147049
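
The suggestion in the original post, to let the database combine options with a cross join and persist them in bulk rather than row by row, can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the table and column names are hypothetical and do not reflect the DHIS2 schema, SQLite (from the Python standard library) stands in for the production database, and nothing here describes how DHIS2 actually generates categoryOptionCombinations.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real DB (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category_option (category TEXT, name TEXT);
    CREATE TABLE option_combo (sex TEXT, age TEXT, status TEXT);
""")

# Hypothetical categories and options, loaded in one batch call.
options = [
    ("sex", "F"), ("sex", "M"),
    ("age", "<5"), ("age", "5-14"), ("age", "15+"),
    ("status", "new"), ("status", "returning"),
]
conn.executemany("INSERT INTO category_option VALUES (?, ?)", options)

# One set-based statement: the database builds the Cartesian product
# itself, instead of the application issuing one INSERT per combination.
conn.execute("""
    INSERT INTO option_combo (sex, age, status)
    SELECT s.name, a.name, t.name
    FROM category_option AS s
    CROSS JOIN category_option AS a
    CROSS JOIN category_option AS t
    WHERE s.category = 'sex'
      AND a.category = 'age'
      AND t.category = 'status'
""")
conn.commit()

combo_count = conn.execute("SELECT COUNT(*) FROM option_combo").fetchone()[0]
print(combo_count)  # 2 * 3 * 2 = 12
```

The point of the design is that the application sends a single statement regardless of how many combinations result, so the number of round trips no longer grows with the size of the product.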