← Back to team overview

dhis2-devs team mailing list archive

Re: Pivot tables

 

2010/10/27 Bob Jolliffe <bobjolliffe@xxxxxxxxx>

> Attached is a simple skeleton overview diagram of how it *can* be
> done.  I'm not sure if I'm 100% comfortable with the approach yet but
> I can verify that it works (caveat below).
>
> From the diagram, the point ultimately is to populate the
> pivottablecache of the pivottable within the local excel spreadsheet
> with the data from the view in the remote database.  This is what you
> would normally do (in a connected setting) by setting an odbc type
> connection to the postgres database.
>
> What I have done is instead to create a connection to a local csv
> file.  This is done by creating a odc connection description file and
> pointing the pivottable at that.  Selected nippet from example odc
> connection file:
> <xml id='msodc'>
>  <odc:OfficeDataConnection
>  xmlns:odc="urn:schemas-microsoft-com:office:odc"
>  xmlns="http://www.w3.org/TR/REC-html40";>
>  <odc:Connection odc:Type="OLEDB">
>   <odc:ConnectionString>Provider=MSDASQL.1;Persist Security
> Info=False;Data Source=csv;Extended
>
> Properties=&quot;DSN=csv;DefaultDir=C:\Users\bobj\Documents\excel;DriverId=27;FIL=text;MaxBufferSize=2048;PageTimeout=5;&quot;;Initial
> Catalog=C:\Users\bobj\Documents\excel</odc:ConnectionString>
>   <odc:CommandType>Table</odc:CommandType>
>
> <odc:CommandText>`C:\Users\bobj\Documents\excel\test.csv`</odc:CommandText>
>  </odc:Connection>
>  </odc:OfficeDataConnection>
> </xml>
>
> I've taken a 30MB csv file dumped from the
> _VIEW_PIVOTSOURCEROUTINEDATAOU3ALL view of Ola's sample data.
> (Grabbed from h2 with: CALL CSVWRITE('test.csv', 'SELECT * FROM
> _VIEW_PIVOTSOURCEROUTINEDATAOU3ALL');  )
>
> Using the odc above, refreshing the pivottable causes a copy from the
> csv file into the pivot cache which seems pretty quick (4-5 seconds) -
> certainly quicker than pulling it via tcp-odbc-jdbc or whatever.
>
> Note that worksheet can have more than one pivottable and more than
> one pivotcache and that more than one pivottable can share the same
> pivot cache.  I haven't gone into those details.
>
> So the upside is local pivottable refresh is quick and appending
> monthly csv snippets is trivial.  Another positive spinoff in havng
> the datset as a local csv is that you can also pivot with all kinds of
> other tools other than excel (ranging from primitive awk to fancy 'r'
> reshaping - both of which I think I prefer to excel :-) ).  But
> managing this setup is fairly straightforward - 3 files:  the excel
> file, the odc and the csv files which grow with monthly updates (I
> know there must also be a facility to amend changed data but we'll
> cross that bridge ..) .   And there are of course downsides.
>
> These csv files are really pretty big and hugely sub optimal.  The
> normalization of the database has been flattened out for convenience
> of pivoting so we are dealing with tens of thousands of repeated
> strings (datelement names, ou names etc).  From a transport
> perspective this is maybe not too bad - I'm guessing a reasonable
> compression might factor out some of this.   And appending monthly
> chunks wouldn't be too bad.  But they are pretty big files to manage.
>  But I think they are maybe more robust than depending solely on the
> pivotcache as a local primary store.
>
> That's it for now.  Thought to sleep on is whether we can't somehow
> just maintain a local database which replicates the necessary tables
> from the online server.  But its complicated.  Maintaining a couple
> (hopefully not too many) csv files might be easier.
>


Nice. Explicitly separating presentation from data with csv sounds more
manageable. I think we can live with "yearly" pivot tables, in that these
things won't keep growing forever.


>
> Bob
>
> PS.  The caveat.  My csv source for pivottables sort of works but all
> my datavalues are being treated as zero :-(  I think this is probably
> to do with setting the datatype of the field somehow.  I think its
> fixable.
>

Might be related to the fact that DataValue.value is a String. Maybe Ola has
some experience?

Follow ups

References