dhis2-devs team mailing list archive

Thread
Date
Pivot tables

To: Lars Helge Øverland <larshelge@xxxxxxxxx>
From: Bob Jolliffe <bobjolliffe@xxxxxxxxx>
Date: Fri, 22 Oct 2010 09:43:48 +0100
Cc: dhis2-devs <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
Hi Lars, Ola and all

Thinking about exporting pivot tables .. we need to pen up some
requirements which we can get into a blueprint.  While we are
considering that, I've given some thoughts below.

My initial thought was that we would construct a pivot table with a
jdbc connection like we would normally do manually, and then simply
populate the pivot cache so that the spreadsheet would be
"free-standing".  On reflection that is probably neither the best nor
the simplest thing to do.  Playing around with excel (in wine :-)  I
figured it is actually much more straightforward to have two sheets;
one for the data and one for the pivot table.  The pivot table would
operate on the data in the other worksheet.  This is more explicit
than simply hiding the data in the pivot cache.  Though I need to test
this a bit more.  If the pivotcache is an obligatory data store then
it might not make sense to carry the data twice.  The pivot table
constructs are fairly complex but not outlandishly so - once I'm done
unpicking, I will write up a short description of the anatomy of a
pivot table so we can see what needs to be done.

One possibility would be that the pivot table could be generated as
part of the export of a report table ie. optionally export to excel or
export to excel-with-pivot.

Given that there is no advance indication of the columns in a report
table, a challenge will be how and when to define the pivot model -
ie. the pivot fields -  and how to persist that model (we don't want
users to have to recreate the pivot model each time).  This will
probably require an extra object in our data model (PivotModel) which
defines the row and column pivot fields and data field, where a
ReportTable can have a 0-* relationship with 0 or more PivotModels.  A
possible beneficial side effect of this could be that we also leave
open the ability to generate the pivot table rendition with things
other than excel.  I don't see this as an immediate requirement but
useful to have the pivotmodel abstraction anyway.  In reality the
design of the pivotmodel will be based on what excel requires.

Generating the excel spreadsheet off the reporttable+pivotmodel will
produce an OOXML xlsx file - actually a bundle of xml streams which
need to be zipped.  Its a bit unfortunate that M$ defines a flat file
single xml for other OPC office documents, but sadly not excel.
Dealing with a single flat file is much easier than zip containers
with multiple streams but so be it.  At least they end up smaller.
Though given that these reporttables can be very large I don't think
an xslt approach is really the most efficient at least for producing
the data worksheet.  It might make sense if the source data was xml,
but even then probably not.  So simple iteration through the table
rows with a stax writer will work best.  The second sheet (the pivot
sheet) would just be a serialization of the pivot model.

We will probably have to implement the excel optimization of
sharedstrings (where literal strings are not written into the
spreadsheet - just index values from a table).  This adds complexity
but I think we are talking pretty large files here.  If we don't do
the sharedstrings optimization, Excel will do it anyway the first time
it opens and saves the file, but I suspect we will have to do it up
front.

Regarding UI and PivotModel I want to take a look at our existing work
on web pivot tables to see whether we can't leverage some of this.  If
we follow roughly the outline above the user would have the option to
define 0 or more pivottable definitions at the time of creating, or
editing, a reporttable.  What do you think?  Is the above close to
what you guys have in mind.  Lets hammer out some requirements and
create the blueprint.

Regards
Bob
Follow ups

Re: Pivot tables
From: Lars Helge Øverland, 2010-10-22