
dhis2-devs team mailing list archive

Re: Pivot tables

 

Hi Bob

Sorry for not returning to this earlier; I've been battling with Windows
64-bit ODBC drivers.

2010/10/22 Bob Jolliffe <bobjolliffe@xxxxxxxxx>

> Hi Lars, Ola and all
>
> Thinking about exporting pivot tables .. we need to pen up some
> requirements which we can get into a blueprint.  While we are
> considering that, I've given some thoughts below.
>
> My initial thought was that we would construct a pivot table with a
> jdbc connection like we would normally do manually, and then simply
> populate the pivot cache so that the spreadsheet would be
> "free-standing".  On reflection that is probably neither the best nor
> the simplest thing to do.  Playing around with excel (in wine :-)  I
> figured it is actually much more straightforward to have two sheets;
> one for the data and one for the pivot table.  The pivot table would
> operate on the data in the other worksheet.  This is more explicit
> than simply hiding the data in the pivot cache.  Though I need to test
> this a bit more.  If the pivotcache is an obligatory data store then
> it might not make sense to carry the data twice.  The pivot table
> constructs are fairly complex but not outlandishly so - once I'm done
> unpicking, I will write up a short description of the anatomy of a
> pivot table so we can see what needs to be done.
>
>
This sounds just fine to me.


> One possibility would be that the pivot table could be generated as
> part of the export of a report table, i.e. optionally export to excel or
> export to excel-with-pivot.
>
> Given that there is no advance indication of the columns in a report
> table, a challenge will be how and when to define the pivot model -
> ie. the pivot fields -  and how to persist that model (we don't want
> users to have to recreate the pivot model each time).  This will
> probably require an extra object in our data model (PivotModel) which
> defines the row and column pivot fields and the data field, where a
> ReportTable can have a 0-* relationship with PivotModels.  A
> possible beneficial side effect of this could be that we also leave
> open the ability to generate the pivot table rendition with things
> other than excel.  I don't see this as an immediate requirement but
> useful to have the pivotmodel abstraction anyway.  In reality the
> design of the pivotmodel will be based on what excel requires.
>
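To make the PivotModel idea concrete, here is a minimal sketch of what such an object might look like. The class and field names are my own invention for illustration, not existing DHIS2 code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical PivotModel: defines which fields go on rows and columns
// and which field carries the data values. Names are illustrative only.
public class PivotModel
{
    private final String name;
    private final String dataField; // e.g. "value"
    private final List<String> rowFields = new ArrayList<>();    // e.g. "orgunit", "period"
    private final List<String> columnFields = new ArrayList<>(); // e.g. "dataelement"

    public PivotModel( String name, String dataField )
    {
        this.name = name;
        this.dataField = dataField;
    }

    public void addRowField( String field ) { rowFields.add( field ); }
    public void addColumnField( String field ) { columnFields.add( field ); }

    public String getName() { return name; }
    public String getDataField() { return dataField; }
    public List<String> getRowFields() { return rowFields; }
    public List<String> getColumnFields() { return columnFields; }
}
```

A ReportTable would then hold a collection of these, and the excel serializer (or any other renderer) would consume them.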
> Generating the excel spreadsheet off the reporttable+pivotmodel will
> produce an OOXML xlsx file - actually a bundle of xml streams which
> need to be zipped.  It's a bit unfortunate that M$ defines a flat-file
> single-XML format for other OPC office documents, but sadly not for excel.
> Dealing with a single flat file is much easier than zip containers
> with multiple streams but so be it.  At least they end up smaller.
> Though given that these reporttables can be very large I don't think
> an xslt approach is really the most efficient at least for producing
> the data worksheet.  It might make sense if the source data was xml,
> but even then probably not.  So simple iteration through the table
> rows with a stax writer will work best.  The second sheet (the pivot
> sheet) would just be a serialization of the pivot model.
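Just to illustrate the container side: an .xlsx is a plain zip of XML parts, which the standard ZipOutputStream handles fine. The part names below follow the usual OPC conventions, but the contents are placeholders, not a complete spreadsheet:

```java
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class XlsxContainerSketch
{
    // Assembles a zip with typical OPC part names; the XML bodies here
    // are stubs to show the packaging, not a valid workbook.
    public static void write( String path ) throws Exception
    {
        try ( ZipOutputStream zip = new ZipOutputStream( new FileOutputStream( path ) ) )
        {
            putEntry( zip, "[Content_Types].xml",
                "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\"/>" );
            putEntry( zip, "xl/workbook.xml", "<workbook/>" );
            putEntry( zip, "xl/worksheets/sheet1.xml", "<worksheet/>" ); // data sheet
            putEntry( zip, "xl/worksheets/sheet2.xml", "<worksheet/>" ); // pivot sheet
        }
    }

    private static void putEntry( ZipOutputStream zip, String name, String content ) throws Exception
    {
        zip.putNextEntry( new ZipEntry( name ) );
        zip.write( content.getBytes( StandardCharsets.UTF_8 ) );
        zip.closeEntry();
    }
}
```

Each XML part can be streamed straight into its zip entry, so the whole file never has to sit in memory.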
>
> We will probably have to implement the excel optimization of
> sharedstrings (where literal strings are not written into the
> spreadsheet - just index values from a table).  This adds complexity
> but I think we are talking pretty large files here.  If we don't do
> the sharedstrings optimization, Excel will do it anyway the first time
> it opens and saves the file, but I suspect we will have to do it up
> front.
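For what it's worth, the shared-strings mechanism itself is simple: each distinct string is stored once in xl/sharedStrings.xml and cells reference it by index. A sketch of the index-building side (my own minimal version, not DHIS2 code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a shared-strings table: each distinct string gets a stable
// index, and sheet cells store the index instead of the literal string.
public class SharedStrings
{
    private final Map<String, Integer> indexes = new LinkedHashMap<>();

    // Returns the index for this string, registering it on first sight.
    public int indexOf( String s )
    {
        Integer i = indexes.get( s );
        if ( i == null )
        {
            i = indexes.size();
            indexes.put( s, i );
        }
        return i;
    }

    public int uniqueCount()
    {
        return indexes.size();
    }
}
```

Since org unit and data element names repeat on every row, the table stays small even when the sheet has hundreds of thousands of rows, which is where the size win comes from.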
>
> Regarding UI and PivotModel I want to take a look at our existing work
> on web pivot tables to see whether we can't leverage some of this.  If
> we follow roughly the outline above the user would have the option to
> define 0 or more pivottable definitions at the time of creating, or
> editing, a reporttable.  What do you think?  Is the above close to
> what you guys have in mind?  Let's hammer out some requirements and
> create the blueprint.
>
>
We won't base this on report tables as (at least I) can't see any huge
benefit. Excel pivot tables themselves are more capable of putting
dimensions on columns, rows and filters than our report tables. In fact we
usually use a SQL view as datasource for the pivot tables - which has a
fixed number of columns. The view is based on the aggregateddatavalue table
and joins in the dataelement, orgunit, period and periodtype tables. In
addition we join in a series of resource tables to get information about the
orgunit structure, dataelement groupset structure, orgunit groupset
structure (additional dimensional information!) to make the pivot table more
valuable for analysis. (This also goes for indicators.)

Since this should be more of an offline analysis tool, I think exporting all
data elements/indicators will be most appropriate. The user interface could
simply request a start and end date (and/or relative periods), indicators vs
data elements, a parent organisation unit and an organisation unit level.
Ola will maybe have some views here...

When it comes to SQL views, we have recently implemented a function for this
where SQL views can be persisted as application objects and re-generated at
any time. This means we can compile in a few default SQL views which can be
used for this in DHIS. As you say, an iteration over the rows in this view
with e.g. a StAX writer would work.
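A rough sketch of that iteration with a StAX writer, emitting SpreadsheetML <row>/<c> elements. In practice the rows would come from a JDBC ResultSet over the SQL view; a plain list stands in here to keep the sketch self-contained, and the element layout follows the inline-string cell form:

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class SheetWriter
{
    // Streams table rows into SpreadsheetML <row>/<c> elements.
    // In DHIS2 the rows would be iterated off a JDBC ResultSet over the
    // SQL view, so the whole sheet never needs to be held in memory.
    public static String writeSheet( List<Object[]> rows ) throws Exception
    {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter( out );

        xml.writeStartElement( "worksheet" );
        xml.writeStartElement( "sheetData" );

        for ( Object[] row : rows )
        {
            xml.writeStartElement( "row" );
            for ( Object cell : row )
            {
                xml.writeStartElement( "c" );
                xml.writeAttribute( "t", "inlineStr" ); // inline-string cell
                xml.writeStartElement( "is" );
                xml.writeStartElement( "t" );
                xml.writeCharacters( String.valueOf( cell ) );
                xml.writeEndElement(); // t
                xml.writeEndElement(); // is
                xml.writeEndElement(); // c
            }
            xml.writeEndElement(); // row
        }

        xml.writeEndElement(); // sheetData
        xml.writeEndElement(); // worksheet
        xml.flush();
        return out.toString();
    }
}
```

Swapping the inline strings for shared-string index references would be a small change on top of this.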

Ola/I have made some samples.

Demo database with SQL views:
http://folk.uio.no/larshelg/files/dhis2sl.backup
Small pivot table based on the aggregateddatavalue orgunit level 2 view:
http://folk.uio.no/larshelg/files/dataou2.xlsx
Bigger pivot table based on the aggregateddatavalue orgunit level 3 view:
http://folk.uio.no/larshelg/files/dataou3.xlsx

If you want to create more pivot tables, install Postgres for Windows from:
http://www.postgresql.org/download/
Then install the psqlODBC driver for Windows from:
http://www.postgresql.org/ftp/odbc/versions/msi/
You can create a data source directly from Excel by going to Data - From
other sources - From Microsoft Query and choosing the postgres odbc driver
under 2. (Stay clear of x64! :)

Your thoughts on shared-string optimization and zipped stream bundles sound
fine to me; in any case you know better here :)

regards, Lars
