dhis2-devs team mailing list archive

Re: [Dhis2-users] Creation of CategoryOptionCombinations

 

I am not talking about tracker, but rather anonymous events. So, again, I
have no idea what your data looks like, but I will take a stab.

Age: As an integer or, if you have it, the date of birth
Gender: As an option set (Male/Female)
JobGroup: As an option set
Insurance scheme: As an option set
Weight: As an integer, I guess...
Size: ??
FeesPaid: As numeric

The advantage of representing this as events is that Age, Gender, Job
Group and Insurance scheme can be used to aggregate "FeesPaid" in the event
reports without explicitly defining them as dimensions. Thus you only
create the dimensions (and database index size) you actually need. You
don't end up with many empty cat option combos, but can simply count the
events across those dimensions in the event reports.
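
As a rough illustration of counting events across dimensions, here is a
minimal sketch in plain Python. It is not DHIS2 code: the field names, the
option values and the aggregate() helper are all made up for the example.
It only shows that grouping events on the fly yields just the combinations
that actually occur in the data.

```python
from collections import defaultdict

# Hypothetical anonymous events; field names and values are invented
# for illustration and do not come from any real DHIS2 instance.
events = [
    {"gender": "Female", "job_group": "Farmer", "scheme": "A", "fees_paid": 10.0},
    {"gender": "Male", "job_group": "Farmer", "scheme": "A", "fees_paid": 5.0},
    {"gender": "Female", "job_group": "Farmer", "scheme": "A", "fees_paid": 2.5},
    {"gender": "Female", "job_group": "Teacher", "scheme": "B", "fees_paid": 7.0},
]


def aggregate(events, dimensions, value_field="fees_paid"):
    """Count events and sum value_field per combination of the given dimensions."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for event in events:
        # The grouping key is built from whichever dimensions were requested.
        key = tuple(event[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["sum"] += event[value_field]
    return dict(totals)


# The same events can be grouped by any subset of dimensions at query time.
by_gender = aggregate(events, ["gender"])
by_gender_and_job = aggregate(events, ["gender", "job_group"])
```

Combinations that never occur in the events (for example Male/Teacher in
this made-up data) simply do not show up in the result, so nothing has to be
created for them in advance.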

Again, I have no idea what your data looks like. It just seems that you may
be choosing a difficult way to represent the data, especially if you are
going to end up with a lot of cat option combos which don't have any data.
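
To make the point about empty cat option combos concrete, here is a small
Python sketch. The option counts per category and the sample records are
assumptions chosen for illustration, not figures from this thread. It
compares the number of combinations a full category combo implies (the
product of the option counts) with the combinations that actually appear in
some data.

```python
from math import prod

# Hypothetical number of options in each of ten categories
# (invented for illustration, not taken from the thread).
option_counts = [2, 5, 4, 6, 3, 5, 4, 7, 5, 3]

# A full category combo implies one option combo per element of the
# Cartesian product of all categories.
possible_combos = prod(option_counts)


def populated_combos(records, categories):
    """Return the set of option combinations that actually occur in the records."""
    return {tuple(record[c] for c in categories) for record in records}


# Made-up records using only two categories to keep the example short.
records = [
    {"gender": "Female", "scheme": "A"},
    {"gender": "Female", "scheme": "A"},
    {"gender": "Male", "scheme": "B"},
]
used = populated_combos(records, ["gender", "scheme"])
print(possible_combos, len(used))
```

With these made-up counts the product already runs into the millions, while
only the combinations that occur in the records would ever hold data.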

Regards,
jason


On Wed, Jun 8, 2016 at 4:31 PM, Uwe Wahser <uwe@xxxxxxxxx> wrote:

> Hi Jason,
>
> just to clarify: it's 1 CategoryCombo with ten Categories resulting in 50
> million CategoryOptionCombos (I misspelled this before). Theoretically this
> must be multiplied by the number of dataelements in the dataset, the number
> of orgunits and the number of periods (daily over 50 years) to get the
> number of expected dataValues.
>
> In reality this number of dataValues will not be reached, as there are
> functional dependencies between options, thus leaving lots of combinations
> empty. Actually, I cannot predict just how many combinations (aka records)
> will pop up from the GROUP BY SQL on the source system. In our current
> prototype with 5 Categories in the CatCombo we are getting 4 million values
> in total, of which only 10,000 have to be updated every day - which is a
> very reasonable number. I am actually hoping for similar numbers with the
> extended 10-dim version because of those functional dependencies.
>
> The idea of using the tracker is interesting, although I'd have to get used
> to the idea of using a granular level to upload aggregated data and rethink
> the whole model. I think I'd rather try to reduce the number of categories
> first (I am currently down to 10 million COCs and it seems to work).
>
> How do you rate the chances of getting rid of some of the heavy things in
> DHIS2 core when generating categoryOptionCombinations? I am especially
> thinking of the extraordinarily long names and the huge log entries for
> every new categoryOptionCombination (currently over 3000 characters of log
> for each). This would already take a lot of data volume out of the
> generation process.
>
> Regards, Uwe
>
>
> > On 8 June 2016 at 15:44, Jason Pickering <jason.p.pickering@xxxxxxxxx> wrote:
> >
> >
> > It just seems like if you have five million cat combos, you would need
> > many more orders of magnitude of data to support them. If the data was
> > imported as events, instead of aggregate, you would not need to explicitly
> > create all of those dimensions, but could still create aggregate figures
> > from them.
> >
> > It just feels like there is no way all of those cat combos are going to be
> > filled, unless you really have a TON of data.
> >
> > Regards,
> > Jason
> >
> >
> >
> > On Wed, Jun 8, 2016 at 2:36 PM, Uwe Wahser <uwe@xxxxxxxxx> wrote:
> >
> > > Hi Jason,
> > >
> > > importing aggregate data into data-sets (see my reply to Lars yesterday
> > > evening: https://lists.launchpad.net/dhis2-users/msg10452.html)
> > >
> > > Again: the problem is not the import, but the combination of category
> > > options. Maybe it would already help a lot if those bombastic strings for
> > > the names weren't created for categoryOptionCombinations.
> > >
> > > Thanks for good ideas,
> > >
> > > Uwe
> > >
> > > ---
> > > > On 8 June 2016 at 09:09, Jason Pickering <jason.p.pickering@xxxxxxxxx> wrote:
> > > >
> > > >
> > > > Hi Uwe,
> > > >
> > > > Are you importing this as aggregate data or as events?
> > > >
> > > > Regards,
> > > > Jason
> > > >
> > > >
> > > > On Wed, Jun 8, 2016 at 2:27 AM, Morten Olav Hansen <morten@xxxxxxxxx> wrote:
> > > >
> > > > > >> Just to make sure we are talking about the same thing: the problem
> > > > > >> does not appear during import, but when generating all possible
> > > > > >> combinations (when saving the CategoryCombination or when manually
> > > > > >> invoking the update of categoryOptionCombinations)
> > > > >>
> > > > >
> > > > > > Ah, sorry.. I was thinking it was the import that was slow.. so that
> > > > > > part is ok?
> > > > >
> > > > >
> > > > > >> so I can still use /api/metadata without version to call the current
> > > > > >> api-version?
> > > > >>
> > > > >
> > > > > > That will give you the legacy importer, so going forward you would
> > > > > > need to use /api/{version}/{endpoint}; we will have more info about it
> > > > > > in the release notes.
> > > > > >
> > > > > > And no, the UI is not switched to the new importer yet (in 2.24), not
> > > > > > 100% sure it will...
> > > > >
> > > > >
> > > > >>
> > > > >> Thanks for your replies at this time of the day :-)
> > > > >>
> > > > >> Regards, Uwe
> > > > >>
> > > > >> ---
> > > > >>
> > > > >>
> > > > > >> > On 7 June 2016 at 19:28, Morten Olav Hansen <morten@xxxxxxxxx> wrote:
> > > > >> >
> > > > >> >
> > > > >> > Hi Uwe
> > > > >> >
> > > > > >> > The improvements are mainly for speed and validation. Yes, we are now
> > > > > >> > (in 2.24) introducing a versioned web-api, so that endpoint importer
> > > > > >> > will be available until 2.26 (we will support 3 versions). In 2.24,
> > > > > >> > the same endpoint is available at /api/24/metadata.
> > > > > >> >
> > > > > >> > If you are using cURL, or another utility.. the import part would be
> > > > > >> > the same, but the UI in 2.23 can not be used, as it's hardcoded to the
> > > > > >> > legacy importer.
> > > > >> >
> > > > >> > --
> > > > >> > Morten Olav Hansen
> > > > >> > Senior Engineer, DHIS 2
> > > > >> > University of Oslo
> > > > >> > http://www.dhis2.org
> > > > >> >
> > > > > >> > On Tue, Jun 7, 2016 at 11:25 PM, Uwe Wahser <uwe@xxxxxxxxx> wrote:
> > > > >> >
> > > > >> > > Hi Morten,
> > > > >> > >
> > > > > >> > > no, I didn't. What would be the procedure for that? Importing
> > > > > >> > > Categories, Options and CategoryCombinations via api and having DHIS2
> > > > > >> > > generate the CategoryOptionCombinations? Would that bring about any
> > > > > >> > > change at all, or does the importer use different libs for generating
> > > > > >> > > the COCs?
> > > > > >> > >
> > > > > >> > > btw. is the 23 in the api link valid for future dhis2 versions? I
> > > > > >> > > noticed it in a few api descriptions recently ...
> > > > >> > >
> > > > >> > > Regards, Uwe
> > > > >> > >
> > > > > >> > > > On 7 June 2016 at 18:50, Morten Olav Hansen <morten@xxxxxxxxx> wrote:
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Hi Uwe
> > > > >> > > >
> > > > > >> > > > Did you try out the new importer? Available as /api/23/metadata in 2.23
> > > > >> > > >
> > > > >> > > > On Tuesday, 7 June 2016, Uwe Wahser <uwe@xxxxxxxxx> wrote:
> > > > >> > > >
> > > > >> > > > > Dear devs,
> > > > >> > > > >
> > > > > >> > > > > I am experiencing problems when handling category
> > > > > >> > > > > combinations. Our prototype with 5 dimensions went through
> > > > > >> > > > > the process of generating categoryOptionCombinations
> > > > > >> > > > > (~20,000 records) quite well. 7 dimensions (~400,000) worked
> > > > > >> > > > > as well, although it took a very long time.
> > > > > >> > > > >
> > > > > >> > > > > Now we defined the next data model with 10 dimensions
> > > > > >> > > > > (expecting ~5 million categoryOptionCombinations) and the
> > > > > >> > > > > process dies without further notice. Last words in
> > > > > >> > > > > catalina.out:
> > > > > >> > > > > * INFO  2016-06-07 13:29:33,783 Building object-bridge maps (preheatCache: true, 3 classes). (DefaultObjectBridge.java [http-bio-8180-exec-15])
> > > > > >> > > > > * INFO  2016-06-07 13:29:36,779 Building object-bridge maps took 2.99 seconds. (DefaultObjectBridge.java [http-bio-8180-exec-15])
> > > > > >> > > > > * INFO  2016-06-07 13:29:36,896 'admin' update org.hisp.dhis.dataelement.DataElementCategoryCombo, name: Membership, uid: SCgLXYHqVzz (AuditLogUtil.java [http-bio-8180-exec-15])
> > > > >> > > > >
> > > > > >> > > > > Ten dimensions with not extraordinarily big option sets is
> > > > > >> > > > > actually not unusual and rather slim for multi-dimensional
> > > > > >> > > > > data models in data warehouses, so I'd expect DHIS2 to be
> > > > > >> > > > > able to handle this easily.
> > > > > >> > > > >
> > > > > >> > > > > Could of course be a memory problem (tried up to 14g for
> > > > > >> > > > > tomcat on a 4-core Ubuntu 14.04 server, DHIS 2.23). Before I
> > > > > >> > > > > start experimenting with other parameters, I am hoping to
> > > > > >> > > > > get some hints on known limitations or workarounds from you
> > > > > >> > > > > (not allowed: reducing the number of options or categories,
> > > > > >> > > > > sql-hacks :-) ). Is there any info on whether optimizations
> > > > > >> > > > > on this process are being planned in the kernel?
> > > > >> > > > >
> > > > >> > > > > Some observations on the process:
> > > > >> > > > >
> > > > > >> > > > > * during generation (either when saving the
> > > > > >> > > > > categoryCombination or in the data maintenance menu):
> > > > > >> > > > > - long names - cOCs are generated with names that are getting
> > > > > >> > > > > extremely long as they are mere concats of the involved
> > > > > >> > > > > categoryOptions. Could there be an option to just use the
> > > > > >> > > > > codes as basis or to leave away the names completely? Could
> > > > > >> > > > > be one reason for a memory problem and performance issues.
> > > > > >> > > > > - long log entries - every single entry is logged in
> > > > > >> > > > > catalina.out with several lines of text, causing
> > > > > >> > > > > catalina.out to become extremely big.
> > > > > >> > > > > - during execution lots of Java memory is being used and no
> > > > > >> > > > > DB memory, which looks to me as if all the logic is
> > > > > >> > > > > happening in the Java machine. It might be more useful to
> > > > > >> > > > > transfer more logic into SQL in the DB (e.g. use DB
> > > > > >> > > > > cross-joins for combining options) as the DB will be more
> > > > > >> > > > > efficient.
> > > > > >> > > > > - because of the log entries I assume that every single
> > > > > >> > > > > combination is being persisted into the DB with a single SQL
> > > > > >> > > > > statement, causing millions of single SQL requests. Prefer
> > > > > >> > > > > batch SQL instead of single record processing.
> > > > > >> > > > >
> > > > > >> > > > > * during import/export of categoryOptionCombinations:
> > > > > >> > > > > - prefer batch SQL instead of single record processing
> > > > > >> > > > > - huge log entries in catalina.out due to several lines of
> > > > > >> > > > > text per combination
> > > > >> > > > >
> > > > >> > > > > I'd be very happy about comments.
> > > > >> > > > >
> > > > >> > > > > Thanks in advance,
> > > > >> > > > >
> > > > >> > > > > Uwe
> > > > >> > > > >
> > > > >> > > > > _______________________________________________
> > > > >> > > > > Mailing list: https://launchpad.net/~dhis2-users
> > > > >> > > > > Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > > > >> > > > > Unsubscribe : https://launchpad.net/~dhis2-users
> > > > >> > > > > More help   : https://help.launchpad.net/ListHelp
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > --
> > > > >> > > > Morten Olav Hansen
> > > > >> > > > Senior Engineer, DHIS 2
> > > > >> > > > University of Oslo
> > > > >> > > > http://www.dhis2.org
> > > > >> > >
> > > > >>
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Mailing list: https://launchpad.net/~dhis2-devs
> > > > > Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> > > > > Unsubscribe : https://launchpad.net/~dhis2-devs
> > > > > More help   : https://help.launchpad.net/ListHelp
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Jason P. Pickering
> > > > email: jason.p.pickering@xxxxxxxxx
> > > > tel:+46764147049
> > >
> >
> >
> >
> > --
> > Jason P. Pickering
> > email: jason.p.pickering@xxxxxxxxx
> > tel:+46764147049
>



-- 
Jason P. Pickering
email: jason.p.pickering@xxxxxxxxx
tel:+46764147049
