← Back to team overview

dhis2-devs team mailing list archive

Re: Pivot tables

 

2010/10/25 Lars Helge Øverland <larshelge@xxxxxxxxx>:
>
> Hi thanks for the nice update, interesting. The thinking sounds very
> reasonable to me.
> Ola and I will work a bit on the optimization of SQL view. One issue here is
> that when writing "generic" sql queries for tables like
> orgunitgroupsetstructure, one needs to select * (all) columns in order to be
> portable because the table has each column per groupset, ie. its dynamic.
> But I guess we can fix these things when generating the SQL from Java /
> DHIS.

Yes. The more we optimize these (in terms of number of fields) the
less enormous the excel file will be.  And given that excel holds all
in memory we must do the best we can.

> Re size, a good thing is that a typical pivot table won't contain the whole
> database, rather the data for a district and its facilities. We want to be
> able to compare with other districts (maybe inside the same province only)
> but for those we only need the aggregated data at the district level (not
> facility data) - which means a lot less data.

I'm trying to focus on the particular problems related to enormous
spreadsheets.  If they are small we don't really have much of a
problem either way.  But if we can deal with the really large ones
then we can maybe deal with medium large ones quite snappily.

> Its also interesting to see how excel uses a "minimal" xml (<r><x
> v="0"/></r> etc). We could consider that for DXF 2 datavalues.
> Size/performance vs readability.

This is an OOXML "feature" which has been heavily criticized by almost
all in the xml world.  Not being religious, I am in two minds about
it.  I think it might be reasonable for the datavalues in dxf, but I'd
stick my heels in if we started doing the same thing for metadata.  In
fact I think there is just such a minimal syntax which I have
suggested to Jo some time back as being maybe useful for a lightweight
mobile version of sdmx.  I wouldn't have a problem with:

<dv de='32' o='3' p='4' d=''2' />  <!-- where 'd' = disaggregation -->

(I'd still like to see the other attributes, comment, createdby etc
being aggregated up to a 'datavalueset' but that's another story)

Regarding ooxml I think there is a better justification doing it for a
data oriented document format like a spreadsheet, than there is for
word docs and presentations.  Certainly M$ has achieved significant
performance benefits to Excel in terms of load times.  File size is
not that significant a benefit because the compressed files shouldn't
be that different.  What also really helps is the use of
sharedstrings.  You can see that they do a similar shared strings type
optimization in the pivotcache.  All of which allows us to stretch the
use of spreadsheets into domains where they shouldn't really go.

Cheers
Bob

>
>
> Lars
>



Follow ups

References