← Back to team overview

dhis2-devs team mailing list archive

Re: Pivot tables

 

2010/10/27 Lars Helge Øverland <larshelge@xxxxxxxxx>:
>
>
> 2010/10/27 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>>
>> Attached is a simple skeleton overview diagram of how it *can* be
>> done.  I'm not sure if I'm 100% comfortable with the approach yet but
>> I can verify that it works (caveat below).
>>
>> From the diagram, the point ultimately is to populate the
>> pivottablecache of the pivottable within the local excel spreadsheet
>> with the data from the view in the remote database.  This is what you
>> would normally do (in a connected setting) by setting an odbc type
>> connection to the postgres database.
>>
>> What I have done is instead to create a connection to a local csv
>> file.  This is done by creating a odc connection description file and
>> pointing the pivottable at that.  Selected nippet from example odc
>> connection file:
>> <xml id='msodc'>
>>  <odc:OfficeDataConnection
>>  xmlns:odc="urn:schemas-microsoft-com:office:odc"
>>  xmlns="http://www.w3.org/TR/REC-html40";>
>>  <odc:Connection odc:Type="OLEDB">
>>   <odc:ConnectionString>Provider=MSDASQL.1;Persist Security
>> Info=False;Data Source=csv;Extended
>>
>> Properties=&quot;DSN=csv;DefaultDir=C:\Users\bobj\Documents\excel;DriverId=27;FIL=text;MaxBufferSize=2048;PageTimeout=5;&quot;;Initial
>> Catalog=C:\Users\bobj\Documents\excel</odc:ConnectionString>
>>   <odc:CommandType>Table</odc:CommandType>
>>
>> <odc:CommandText>`C:\Users\bobj\Documents\excel\test.csv`</odc:CommandText>
>>  </odc:Connection>
>>  </odc:OfficeDataConnection>
>> </xml>
>>
>> I've taken a 30MB csv file dumped from the
>> _VIEW_PIVOTSOURCEROUTINEDATAOU3ALL view of Ola's sample data.
>> (Grabbed from h2 with: CALL CSVWRITE('test.csv', 'SELECT * FROM
>> _VIEW_PIVOTSOURCEROUTINEDATAOU3ALL');  )
>>
>> Using the odc above, refreshing the pivottable causes a copy from the
>> csv file into the pivot cache which seems pretty quick (4-5 seconds) -
>> certainly quicker than pulling it via tcp-odbc-jdbc or whatever.
>>
>> Note that worksheet can have more than one pivottable and more than
>> one pivotcache and that more than one pivottable can share the same
>> pivot cache.  I haven't gone into those details.
>>
>> So the upside is local pivottable refresh is quick and appending
>> monthly csv snippets is trivial.  Another positive spinoff in havng
>> the datset as a local csv is that you can also pivot with all kinds of
>> other tools other than excel (ranging from primitive awk to fancy 'r'
>> reshaping - both of which I think I prefer to excel :-) ).  But
>> managing this setup is fairly straightforward - 3 files:  the excel
>> file, the odc and the csv files which grow with monthly updates (I
>> know there must also be a facility to amend changed data but we'll
>> cross that bridge ..) .   And there are of course downsides.
>>
>> These csv files are really pretty big and hugely sub optimal.  The
>> normalization of the database has been flattened out for convenience
>> of pivoting so we are dealing with tens of thousands of repeated
>> strings (datelement names, ou names etc).  From a transport
>> perspective this is maybe not too bad - I'm guessing a reasonable
>> compression might factor out some of this.   And appending monthly
>> chunks wouldn't be too bad.  But they are pretty big files to manage.
>>  But I think they are maybe more robust than depending solely on the
>> pivotcache as a local primary store.
>>
>> That's it for now.  Thought to sleep on is whether we can't somehow
>> just maintain a local database which replicates the necessary tables
>> from the online server.  But its complicated.  Maintaining a couple
>> (hopefully not too many) csv files might be easier.
>
>
> Nice. Explicitly separating presentation from data with csv sounds more
> manageable. I think we can live with "yearly" pivot tables, in that these
> things won't keep growing forever.
>
>>
>> Bob
>>
>> PS.  The caveat.  My csv source for pivottables sort of works but all
>> my datavalues are being treated as zero :-(  I think this is probably
>> to do with setting the datatype of the field somehow.  I think its
>> fixable.
>
> Might be related to the fact that DataValue.value is a String. Maybe Ola has
> some experience?

Yes it has to do with interpreting fields as numeric rather than
string.  csv driver defaults to string assumption.  Haven't figured
the best place to tell it otherwise yet.  But I think Ola has ..

>



Follow ups

References