
dhis2-users team mailing list archive

Re: [Dhis2-devs] Creation of CategoryOptionCombinations

 

Hi Lars,

I'm not using them for data-element disaggregation. I also understand that this
would be a bit awkward, as it would technically result in millions of
dataElements, if I understood the concept correctly.
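Why "millions": a category combination produces one category option combination for every element of the Cartesian product of its categories' options, so the count is simply the product of the option counts. A minimal Python sketch of that arithmetic follows; the option counts are hypothetical and only illustrate the growth, they are not the counts from this thread.

```python
from math import prod


def combo_count(option_counts):
    """Number of category option combinations for a category combination
    whose categories have the given numbers of options (Cartesian product)."""
    return prod(option_counts)


# Hypothetical option counts per category, for illustration only.
five = [4, 5, 6, 7, 8]
seven = five + [4, 5]
ten = seven + [3, 4, 4]

for label, sizes in [("5 categories", five), ("7 categories", seven), ("10 categories", ten)]:
    print(f"{label}: {combo_count(sizes):,} combinations")
```

Each added category multiplies the total, which is why going from 7 to 10 dimensions can move the count from hundreds of thousands into the millions.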

I am using them as the categoryCombination for a dataSet. In my current
understanding, this is the one structure in DHIS2 that comes closest to classical
cubes with facts (aka dataElements) and dimensions (aka categories). A use case
would be aggregated patient data, with dataElements like {Headcount, Weight,
Size, FeesPaid} in the dataset and categories like {AgeGroup, Gender, JobGroup,
InsuranceScheme}. The data are not entered via a data entry form (which would also
be awkward), but imported via the API from a GROUP BY SQL query on a source system
with a relational DB. This enables reports, pivot tables and charts like "Number of
patients by AgeGroup and Gender" or "Fees collected by JobGroup and InsuranceScheme",
or any other combination of the categories. It works nicely for five dimensions so far.
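The import path described above, where a GROUP BY query on the source database yields one aggregated value per dataElement and combination of category options, can be sketched in Python as follows. The row layout, field names and option values are hypothetical; the sketch covers only the aggregation step, not the DHIS2 import payload.

```python
# Hypothetical source rows, standing in for a relational patient table.
rows = [
    {"age_group": "0-14", "gender": "F", "fees_paid": 10.0},
    {"age_group": "0-14", "gender": "F", "fees_paid": 0.0},
    {"age_group": "15-49", "gender": "M", "fees_paid": 25.0},
]

DIMENSIONS = ("age_group", "gender")  # the categories
MEASURES = ("fees_paid",)             # summed facts (dataElements)


def aggregate(rows, dimensions, measures):
    """Python equivalent of
    SELECT <dimensions>, COUNT(*), SUM(<measure>) ... GROUP BY <dimensions>.
    Each key tuple corresponds to one category option combination."""
    cube = {}
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cell = cube.setdefault(key, {"headcount": 0, **{m: 0.0 for m in measures}})
        cell["headcount"] += 1
        for m in measures:
            cell[m] += row[m]
    return cube


for key, values in sorted(aggregate(rows, DIMENSIONS, MEASURES).items()):
    print(key, values)
```

A report such as "Number of patients by AgeGroup" is then a roll-up of these cells over the remaining dimensions, which is the flexibility that a model with few dataElements and many categories is meant to preserve.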

Of course I could think about creating dataElements per AgeGroup or even per
AgeGroup/Gender combination (which makes a lot of sense in the context of manual
data entry from manual summary reports), but for the sake of flexibility I
prefer data models with few dataElements and lots of dimensions.

Hope that clarifies the scenario a bit ...

Thanks,

Uwe

> Lars Helge Øverland <lars@xxxxxxxxx> wrote on 7 June 2016 at 20:20:
> 
> 
> Hi Uwe,
> 
> I agree ten dimensions is not much per se, but you might say 10 categories
> for disaggregation per data element is a lot. Would it be possible to
> redesign the model a bit, and rely more on data element group sets + groups
> where you classify your data elements? This as opposed to having everything
> as categories / options.
> 
> I think 5 million option combos will in any case take some time to generate
> and maintain. If you are willing to share some more info on your use case,
> perhaps someone can offer some views.
> 
> regards,
> 
> Lars
> 
> 
> On Tue, Jun 7, 2016 at 12:28 PM, Morten Olav Hansen <morten@xxxxxxxxx>
> wrote:
> 
> > Hi Uwe
> >
> > The improvements are mainly for speed and validation. Yes, in 2.24 we are
> > introducing a versioned Web API, so that importer endpoint will remain
> > available until 2.26 (we will support 3 versions). In 2.24, the same
> > endpoint is available at /api/24/metadata.
> >
> > If you are using cURL or another utility, the import part would be the
> > same, but the UI in 2.23 cannot be used, as it's hardcoded to the legacy
> > importer.
> >
> > --
> > Morten Olav Hansen
> > Senior Engineer, DHIS 2
> > University of Oslo
> > http://www.dhis2.org
> >
> > On Tue, Jun 7, 2016 at 11:25 PM, Uwe Wahser <uwe@xxxxxxxxx> wrote:
> >
> >> Hi Morten,
> >>
> >> No, I didn't. What would be the procedure for that? Importing Categories,
> >> Options and CategoryCombinations via the API and having DHIS2 generate the
> >> CategoryOptionCombinations? Would that bring about any change at all, or
> >> does the importer use different libs for generating the COCs?
> >>
> >> By the way, is the 23 in the API link valid for future DHIS2 versions? I
> >> noticed it in a few API descriptions recently ...
> >>
> >> Regards, Uwe
> >>
> >> > Morten Olav Hansen <morten@xxxxxxxxx> wrote on 7 June 2016 at 18:50:
> >> >
> >> >
> >> > Hi Uwe
> >> >
> >> > Did you try out the new importer? It is available as /api/23/metadata in 2.23.
> >> >
> >> > On Tuesday, 7 June 2016, Uwe Wahser <uwe@xxxxxxxxx> wrote:
> >> >
> >> > > Dear devs,
> >> > >
> >> > > I am experiencing problems when handling category combinations. Our
> >> > > prototype with 5 dimensions went through the process of generating
> >> > > categoryOptionCombinations (~20,000 records) quite well. 7 dimensions
> >> > > (~400,000) worked as well, although it took a very long time.
> >> > >
> >> > > Now we defined the next data model with 10 dimensions (expecting ~5 million
> >> > > categoryOptionCombinations), and the process dies without further notice.
> >> > > Last words in catalina.out:
> >> > > * INFO  2016-06-07 13:29:33,783 Building object-bridge maps (preheatCache: true, 3 classes). (DefaultObjectBridge.java [http-bio-8180-exec-15])
> >> > > * INFO  2016-06-07 13:29:36,779 Building object-bridge maps took 2.99 seconds. (DefaultObjectBridge.java [http-bio-8180-exec-15])
> >> > > * INFO  2016-06-07 13:29:36,896 'admin' update org.hisp.dhis.dataelement.DataElementCategoryCombo, name: Membership, uid: SCgLXYHqVzz (AuditLogUtil.java [http-bio-8180-exec-15])
> >> > >
> >> > > Ten dimensions with moderately sized option sets are actually not
> >> > > unusual, and rather slim, for multi-dimensional data models in data
> >> > > warehouses, so I'd expect DHIS2 to be able to handle this easily.
> >> > >
> >> > > This could of course be a memory problem (I tried up to 14g for Tomcat
> >> > > on a 4-core Ubuntu 14.04 server, DHIS 2.23). Before I start experimenting
> >> > > with other parameters, I am hoping to get some hints on known limitations
> >> > > or workarounds from you (not allowed: reducing the number of options or
> >> > > categories, SQL hacks :-) ). Is there any info on whether optimizations
> >> > > of this process are being planned in the kernel?
> >> > >
> >> > > Some observations on the process:
> >> > >
> >> > > * during generation (either when saving the categoryCombination or in
> >> > > the data maintenance menu):
> >> > > - long names: cOCs are generated with names that get extremely long,
> >> > > as they are mere concatenations of the involved categoryOptions. Could
> >> > > there be an option to use just the codes as a basis, or to leave out
> >> > > the names completely? This could be one reason for the memory problem
> >> > > and the performance issues.
> >> > > - long log entries: every single entry is logged in catalina.out with
> >> > > several lines of text, causing catalina.out to become extremely big.
> >> > > - during execution a lot of Java memory is used and no DB memory, which
> >> > > looks to me as if all the logic is happening in the JVM. It might be
> >> > > more useful to move more of the logic into SQL on the DB side (e.g. use
> >> > > DB cross joins for combining options), as the DB will be more efficient.
> >> > > - because of the log entries, I assume that every single combination is
> >> > > persisted into the DB with its own SQL statement, causing millions of
> >> > > single SQL requests. Prefer batch SQL over single-record processing.
> >> > >
> >> > > * during import/export of categoryOptionCombinations:
> >> > > - prefer batch SQL over single-record processing
> >> > > - huge log entries in catalina.out due to several lines of text per
> >> > > combination
> >> > >
> >> > > I'd be very happy to receive comments.
> >> > >
> >> > > Thanks in advance,
> >> > >
> >> > > Uwe
> >> > >
> >> > > _______________________________________________
> >> > > Mailing list: https://launchpad.net/~dhis2-users
> >> > > Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> >> > > Unsubscribe : https://launchpad.net/~dhis2-users
> >> > > More help   : https://help.launchpad.net/ListHelp
> >> > >
> >> >
> >> >
> >> > --
> >> > Morten Olav Hansen
> >> > Senior Engineer, DHIS 2
> >> > University of Oslo
> >> > http://www.dhis2.org
> >>
> >
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~dhis2-devs
> > Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> > Unsubscribe : https://launchpad.net/~dhis2-devs
> > More help   : https://help.launchpad.net/ListHelp
> >
> >
> 
> 
> -- 
> Lars Helge Øverland
> Lead developer, DHIS 2
> University of Oslo
> Skype: larshelgeoverland
> lars@xxxxxxxxx
> http://www.dhis2.org
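Uwe's first observation, that generated option-combination names are plain concatenations of option names and grow very long, can be illustrated with a small generator. The Python sketch below enumerates combinations lazily with `itertools.product` and derives a compact key from option codes, as he proposes, alongside the long concatenated name. The categories, codes and `_` separator are hypothetical, and this is not how DHIS2 itself builds or names combinations.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Hypothetical categories: each option is a (code, name) pair.
CATEGORIES: Dict[str, List[Tuple[str, str]]] = {
    "AgeGroup": [("AG1", "0 to 14 years"), ("AG2", "15 to 49 years"), ("AG3", "50 years and older")],
    "Gender": [("F", "Female"), ("M", "Male")],
    "InsuranceScheme": [("IS1", "National health insurance"), ("IS2", "Private insurance")],
}


def option_combos(categories: Dict[str, List[Tuple[str, str]]]) -> Iterator[Tuple[str, str]]:
    """Yield (code_key, concatenated_name) for every option combination.

    itertools.product is lazy, so combinations can be streamed (for example
    into batched inserts) without holding millions of them in memory."""
    for combo in product(*categories.values()):
        code_key = "_".join(code for code, _ in combo)
        long_name = ", ".join(name for _, name in combo)
        yield code_key, long_name


combos = list(option_combos(CATEGORIES))
print(len(combos))
print(combos[0])
```

The code-based key stays short no matter how verbose the option names are, while the concatenated name grows with every category added.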

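Uwe's other suggestions, letting the database build the combinations with cross joins and using batch SQL instead of one statement per record, can be sketched with Python's built-in `sqlite3` module as a stand-in database. The table and column names are hypothetical and are not DHIS2's schema; the point is the shape of the statements: one `executemany` call for a batch of rows, and one set-based `INSERT ... SELECT ... CROSS JOIN` that produces every combination inside the database.

```python
import sqlite3

# Hypothetical schema for illustration only; not DHIS2's actual tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category_option (category TEXT, code TEXT)")
conn.execute("CREATE TABLE option_combo (age TEXT, gender TEXT, scheme TEXT)")

# Batch insert: one call for all options instead of one INSERT per row.
options = (
    [("AgeGroup", c) for c in ("AG1", "AG2", "AG3")]
    + [("Gender", c) for c in ("F", "M")]
    + [("InsuranceScheme", c) for c in ("IS1", "IS2")]
)
conn.executemany("INSERT INTO category_option (category, code) VALUES (?, ?)", options)

# Set-based generation: the database computes the Cartesian product itself.
conn.execute("""
    INSERT INTO option_combo (age, gender, scheme)
    SELECT a.code, g.code, s.code
    FROM category_option AS a
    CROSS JOIN category_option AS g
    CROSS JOIN category_option AS s
    WHERE a.category = 'AgeGroup'
      AND g.category = 'Gender'
      AND s.category = 'InsuranceScheme'
""")
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM option_combo").fetchone()[0]
print(count)
```

`CROSS JOIN` and `INSERT ... SELECT` are standard SQL, so the same statement shape should carry over to other relational databases, though this sketch has only been reasoned about against SQLite.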

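Morten's note that the new importer is reachable at the versioned paths /api/23/metadata (2.23) and /api/24/metadata (2.24), and can be driven with cURL or any other HTTP client, can be sketched in Python with the standard library. The request is built but not sent. The host, credentials, UIDs and the minimal payload (collections `categoryOptions`, `categories`, `categoryCombos`, objects referenced by `id`) are assumptions for illustration; check the field set against the DHIS2 Web API documentation for your version before relying on it.

```python
import base64
import json
import urllib.request


def metadata_request(base_url, api_version, payload, username, password):
    """Build (but do not send) a POST to the versioned metadata endpoint,
    e.g. /api/23/metadata on 2.23 or /api/24/metadata on 2.24."""
    url = f"{base_url.rstrip('/')}/api/{api_version}/metadata"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


# Hypothetical minimal payload; UIDs, names and field set are assumptions.
payload = {
    "categoryOptions": [
        {"id": "opt00000001", "name": "Female"},
        {"id": "opt00000002", "name": "Male"},
    ],
    "categories": [
        {"id": "cat00000001", "name": "Gender",
         "categoryOptions": [{"id": "opt00000001"}, {"id": "opt00000002"}]},
    ],
    "categoryCombos": [
        {"id": "cc000000001", "name": "Gender", "categories": [{"id": "cat00000001"}]},
    ],
}

req = metadata_request("https://dhis.example.org", 24, payload, "admin", "secret")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping the API version as a parameter reflects Morten's point that the same import works across versions while only the path segment changes.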