dhis2-devs team mailing list archive
-
dhis2-devs team
-
Mailing list archive
-
Message #08123
Re: Pivot tables
BODY { font-family:Arial, Helvetica, sans-serif;font-size:12px; }Hi
Bob,
Export (ALL) failed with the error below. This began on upgrade
from previous to dhis2.0.5.
org.hibernate.InstantiationException: Cannot instantiate abstract
class or interface: org.hisp.dhis.source.Source
at
org.hibernate.tuple.PojoInstantiator.instantiate(PojoInstantiator.java:101)
at
org.hibernate.tuple.PojoInstantiator.instantiate(PojoInstantiator.java:123)
at
org.hibernate.tuple.entity.AbstractEntityTuplizer.instantiate(AbstractEntityTuplizer.java:606)
at
org.hibernate.persister.entity.AbstractEntityPersister.instantiate(AbstractEntityPersister.java:3888)
at
org.hibernate.impl.SessionImpl.instantiate(SessionImpl.java:1412)
at
org.hibernate.impl.SessionImpl.instantiate(SessionImpl.java:1401)
at
org.hibernate.loader.Loader.instanceNotYetLoaded(Loader.java:1342)
at org.hibernate.loader.Loader.getRow(Loader.java:1251)
at
org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:619)
at org.hibernate.loader.Loader.doQuery(Loader.java:745)
at
org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:270)
at org.hibernate.loader.Loader.doList(Loader.java:2294)
at
org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2172)
at org.hibernate.loader.Loader.list(Loader.java:2167)
at
org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:119)
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1706)
at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:347)
at
org.hisp.dhis.source.hibernate.HibernateSourceStore.getAllSources(HibernateSourceStore.java:98)
at
org.hisp.dhis.organisationunit.DefaultOrganisationUnitService.getAllOrganisationUnits(DefaultOrganisationUnitService.java:164)
at
org.hisp.dhis.organisationunit.DefaultOrganisationUnitService.getOrganisationUnits(DefaultOrganisationUnitService.java:169)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:309)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at
org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:89)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at $Proxy12.getOrganisationUnits(Unknown Source)
at
org.hisp.dhis.importexport.dxf.converter.OrganisationUnitConverter.write(OrganisationUnitConverter.java:115)
at
org.hisp.dhis.importexport.ExportPipeThread.doRun(ExportPipeThread.java:164)
at
org.hisp.dhis.system.process.OpenSessionThread.run(OpenSessionThread.java:58)
Exception in thread "Thread-21" java.lang.RuntimeException: Writing
failedorg.hibernate.InstantiationException: Cannot instantiate
abstract class or interface: org.hisp.dhis.source.Source
at
org.hisp.dhis.importexport.ExportPipeThread.doRun(ExportPipeThread.java:188)
at
org.hisp.dhis.system.process.OpenSessionThread.run(OpenSessionThread.java:58)
On Fri 22/10/10 1:10 PM , Ola Hodne Titlestad olati@xxxxxxxxxx
sent:
2010/10/22 Lars Helge Øverland
Hi Bob
Sorry for not returning to this earlier, been battling with windows
64 bit odbc drivers.
2010/10/22 Bob Jolliffe
Hi Lars, Ola and all
Thinking about exporting pivot tables .. we need to pen up some
requirements which we can get into a blueprint. While we are
considering that, I've given some thoughts below.
My initial thought was that we would construct a pivot table with a
jdbc connection like we would normally do manually, and then simply
populate the pivot cache so that the spreadsheet would be
"free-standing". On reflection that is probably neither the best
nor
the simplest thing to do. Playing around with excel (in wine :-) I
figured it is actually much more straightforward to have two sheets;
one for the data and one for the pivot table. The pivot table would
operate on the data in the other worksheet. This is more explicit
than simply hiding the data in the pivot cache. Though I need to
test
this a bit more. If the pivotcache is an obligatory data store then
it might not make sense to carry the data twice. The pivot table
constructs are fairly complex but not outlandishly so - once I'm
done
unpicking, I will write up a short description of the anatomy of a
pivot table so we can see what needs to be done.
This sounds just fine to me.
This might work, just have in mind that we will then need several
pairs of data+pivot worksheets as we want to have multiple pivot
tables in an Excel file. Also have in mind that we can talk about up
to around 1.500.000 rows (that's when my laptop starts to complain at
least) of data here, so these data sheets will be large. Typically,
number of rows of data will be somewhere around 200-600K.
One possibility would be that the pivot table could be generated
as
part of the export of a report table ie. optionally export to excel
or
export to excel-with-pivot.
Given that there is no advance indication of the columns in a report
table, a challenge will be how and when to define the pivot model -
ie. the pivot fields - and how to persist that model (we don't want
users to have to recreate the pivot model each time). This will
probably require an extra object in our data model (PivotModel)
which
defines the row and column pivot fields and data field, where a
ReportTable can have a 0-* relationship with 0 or more PivotModels.
A
possible beneficial side effect of this could be that we also leave
open the ability to generate the pivot table rendition with things
other than excel. I don't see this as an immediate requirement but
useful to have the pivotmodel abstraction anyway. In reality the
design of the pivotmodel will be based on what excel requires.
Generating the excel spreadsheet off the reporttable+pivotmodel will
produce an OOXML xlsx file - actually a bundle of xml streams which
need to be zipped. Its a bit unfortunate that M$ defines a flat
file
single xml for other OPC office documents, but sadly not excel.
Dealing with a single flat file is much easier than zip containers
with multiple streams but so be it. At least they end up smaller.
Though given that these reporttables can be very large I don't think
an xslt approach is really the most efficient at least for producing
the data worksheet. It might make sense if the source data was xml,
but even then probably not. So simple iteration through the table
rows with a stax writer will work best. The second sheet (the pivot
sheet) would just be a serialization of the pivot model.
We will probably have to implement the excel optimization of
sharedstrings (where literal strings are not written into the
spreadsheet - just index values from a table). This adds complexity
but I think we are talking pretty large files here. If we don't do
the sharedstrings optimization, Excel will do it anyway the first
time
it opens and saves the file, but I suspect we will have to do it up
front.
Regarding UI and PivotModel I want to take a look at our existing
work
on web pivot tables to see whether we can't leverage some of this.
If
we follow roughly the outline above the user would have the option
to
define 0 or more pivottable definitions at the time of creating, or
editing, a reporttable. What do you think? Is the above close to
what you guys have in mind. Lets hammer out some requirements and
create the blueprint.
We won't base this on report tables as (at least I) can't see any
huge benefits. The excel pivot tables themselves are more capable of
having dimensions on columns, rows, filters than our report tables.
In fact we usually use a SQL view as datasource for the pivot tables
- which has a fixed number of columns. The view is based on the
aggregateddatavalue table and joins in the dataelement, orgunit,
period and periodtype tables. In addition we join in a series of
resource tables to get information about the orgunit structure,
dataelement groupset structure, orgunit groupset structure
(additional dimensional information!) to make the pivot table more
valuable for analysis. (This also goes for indicators.)
Since this should be more of a offline analysis tool I think
exporting all dataelements/indicators will be most appropriate. The
user interface could simply request a start and endate (and/or
relative periods), indicators vs dataelements, parent organisation
unit and organisation unit level. Ola will maybe have some views
here...
Agree with Lars, I don't think report tables should be used as
source for pivot tables, at least not for the standard "ALL data"
pivot tables. For more ad-hoc use the pivots generated off a report
table might be useful as an export option, but let's start with the
most common use cases, the big pivot tables with all the data for
browsing and data analysis.
The pivot source views (sql) that we have used e.g. in the newly
distributed pivots for the demo database can be found in an email to
the list on Sep 21, together with the pivots and the database dumps:
http://www.mail-archive.com/dhis2-devs@xxxxxxxxxxxxxxxxxxx/msg07341.html
[3] (from Sep 21)
These views have been stable for a long time and describe the fields
of the basic pivot tables that are needed for DHIS data analysis.
These set of fields go back as far as DHIS 1.3, so the basic pivot
tables are stable and we know their structure, basically the core
data dimensions of DHIS2 (see chapter 2 in user manual).
What always will be variables are the orgunit level of aggregation
for the data or indicator values and the period type. You will see
these two are specified at the end of every where clause in the pivot
views. These are used as a filters when fetching data from the data
mart.
If you open the pivot table for download in that email you can
follow my reasoning below:Typically we have two tables for monthly
(raw data+indicator data) for the lowest level plus at least two more
for a higher level. E.g. in the demo based on SL we use the health
facility level (level 4) and the Chiefdom level (3). The reason for
using Chiefdom level as well is that these tables are much faster to
manipulate since there are a lot less data when shaving away the
lowest level. How many levels and which levels to put in the pivot
table will vary from place to place. In addition to these at least
four tables, there is often at least one table for population data,
which has a yearly period type and therefore needs a separate view
and table. So for the demo database we have 5 basic tables which
covers all the data for data elements and indicators. This set of
pivot tables is what I would define as a minimum pivot table setup
for a DHIS database. Over time (depending on the amount of data
collected) it might be necessary to split up these basic tables by
year since they can get too big for excel to handle. Such a split can
be facilitated by a simple from and to filter on the start date in the
period field in the sql.
Then there can be many more customised pivot tables which make use
of additional dimensions to the data like the data element
categories, and data element and orgunit group sets. The simplest
approach there is to (still talking sql) do a select * from the
various resource tables. Then you will have a data source for all
other customised tables (using the same period type and aggregation
level). Working with data element categories in pivot tables it
doesn't make sense to use too many at the same time, and rather work
on data elements that share the same categories (within or across
datasets). You can see in the demo pivot table file that we have set
up multiple tables making use of data element categories (EPI, RCH,
HIV etc.). These are all using the same source data (chiefdom level
monthly raw data), but are using different pivot fields (columns in
the result set). The total number of available fields are the same
for all these custom chiefdom tables, but we are only using a small
subset of these fields in each table. This means that the same data
source could also be used for the basic tables (the two chiefdom
tables at least), and just select even fewer fields in the table. The
users can also easily add more fields to a table in Excel, using drag
and drop functionality.
It seems to me that there is a need to separate between the source
data and the pivot structure, to reuse the source data in multiple
tables. I think that is how Excel is thinking too, but it doesn't
always seem to work that way, e.g. when creating a copy of an
existing worksheet with a pivot table. I am not sure how much of
this customisation we need to support inside DHIS2 and how much we
can leave to Excel (which has nice tools for copying tables, drag and
drop pivot fields etc.). First we need to come up with something that
works for the basic tables and then look at these more custom use
cases.
When it comes to pivot tables using period types other than monthly
or yearly (both covered in the basic tables listed above), like
six-monthly, quarterly or weekly these are also important to support.
The pivot fields are the same as in the basic tables, it is just a
different periodtype filter on the data source, so these might be
easier to accommodate.
Looking at the next steps, to me the key new features that I would
like to see with regards to pivot tables are:
1) a pivot table update service, where users that are on the online
server (no local install) can update their local pivot tables against
the online DHIS server Since a pivot table can contain months and
years of data we do not want to download the full tables every month,
but simply append the latest month of data to the existing local pivot
tables
2) a pivot generator, where the users can download a new pivot
table file directly off a DHIS2 server without having to do any local
configuration etc. , and then later use the update service to update
their tables
with 1) being the most critical and urgent, but I realise that we
might need 2) first, in order to deal with a standard set of tables
and pivot data sources in 1).
Ola --------- When it comes to SQL views we have recently
implemented a function for this where SQL views can be persisted as
an application object and re-generated at any time. This means we can
compile in a few default SQL views which can be used for this in DHIS.
As you say an iteration over the rows in this view with eg. a stax
writer would work.
Ola/I have made some samples.
Demo database with SQL views:
http://folk.uio.no/larshelg/files/dhis2sl.backup [4] Small pivot
table based on the aggregateddatavalue orgunit level 2 view:
http://folk.uio.no/larshelg/files/dataou2.xlsx [5]Bigger pivot table
based on the aggregateddatavalue orgunit level 3 view:
http://folk.uio.no/larshelg/files/dataou3.xlsx [6]
If you want to create more pivot tables install postgres on windows
here: http://www.postgresql.org/download/ [7] Then install psql odbc
windows here: http://www.postgresql.org/ftp/odbc/versions/msi/ [8]
You can create a data source directly from Excel by going to Data -
>From other sources - From Microsoft Query and choosing the postgres
odbc driver under 2. (Stay clear of x64! :)
Your thoughts on shared string optimizations and zipped stream
bundles sound fine to me, in any case you know better here:)
regards, Lars
_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs [9]
Post to :
Unsubscribe : https://launchpad.net/~dhis2-devs [11]
More help : https://help.launchpad.net/ListHelp [12]
Links:
------
[3]
http://www.mail-archive.com/dhis2-devs@xxxxxxxxxxxxxxxxxxx/msg07341.html
[4] http://folk.uio.no/larshelg/files/dhis2sl.backup
[5] http://folk.uio.no/larshelg/files/dataou2.xlsx
[6] http://folk.uio.no/larshelg/files/dataou3.xlsx
[7] http://www.postgresql.org/download/
[8] http://www.postgresql.org/ftp/odbc/versions/msi/
[9] https://launchpad.net/%7Edhis2-devs
[11] https://launchpad.net/%7Edhis2-devs
[12] https://help.launchpad.net/ListHelp