dhis2-devs team mailing list archive

Thread
Date
Fwd: [OPENMRS-DEV] Pivot Agent

To: DHIS 2 developers <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
From: Knut Staring <knutst@xxxxxxxxx>
Date: Fri, 26 Feb 2010 22:44:13 +0100
In-reply-to: <4B861E6A.3020409@openmrs.org>
Interesting summary on issues around reporting from OpenMRS which are
relevant to the DHIS community as well. Bob and Ryan has responded to that
list with a status update on the OMRS-DHIS integration using SDMX.

---------- Forwarded message ----------
From: Justin Miranda <justin@xxxxxxxxxxx>
Date: Thu, Feb 25, 2010 at 7:53 AM
Subject: Re: [OPENMRS-DEV] Pivot Agent
To: openmrs-devel-l@xxxxxxxxxxxxxxxxxx


 Andy

Thanks for bringing this up again.  The short answer is:  no.  :(

The long epic answer is that we don't really have a clear path at the
moment, but there are lots of off-the-shelf solutions out there and probably
enough interest amongst the community to explore those solutions
collaboratively.  There's a "data warehouse" project somewhere on the
OpenMRS roadmap, but unfortunately no resources or time to work on it, so I
think it's a matter of getting a bunch of implementers and developers
together to work on it.  And not to burst anyone's bubble, but the
"one-size-fits-all" data warehouse solution probably doesn't exist.  There
are far too many complex reporting use cases that need to be solved.   So,
in most cases, the best we can hope for is being able to collaborate and
share some ETL code/scripts, some dimensional data modeling techniques, and
a whole lot of best practices.

With that said, I'm really excited about working on a DWH project in the
future.  I have spent more time than I care to admit researching and
evaluating different data warehousing concepts and solutions.  I was hoping
to get a better understanding for when/if we try to roll our own within the
OpenMRS, but while I've pushed to build this kind of tool within the
reporting framework in the past, I think it would be the least ideal
solution in the long run.  Therefore, I think implementers should work with
developers to explore some of the ETL / data warehouse solutions that exist
(I've noted some of the better open source systems below).

With respect to solving this within the reporting framework, we've already
tried to roll our own flat data model solution with the Data Export tool
that Ben wrote.  And while the data export tool is simple, elegant, and
somewhat expressive, it doesn't scale well over medium to large data sets
(on the order of a few 1000 to 10,000 patients).  Mike, Darius and I have
spent a considerable amount of time talking through and trying to design a
de-normalized ("flat model") solution within the reporting framework that is
scalable.  But unfortunately, we've realized over and over again that the
scalability issues are too difficult to overcome with a homegrown Java-based
solution.

For one, a transactional data model will never perform well when we scale to
large datasets and require queries with lots of joins, self-joins, and
sub-queries.  And secondly, even if the time to pull the data was
reasonable, executing the complex logic that our reports require (at the
Java layer) will also be a performance bottleneck.  The logic service will
work well for individual patients, but not for large cohorts of patients.
The idea is to add a cache to help improve performance, but I don't think
anyone would argue that this would be an ideal solution.  That's not a flaw
within the design of the logic service, but rather a consequence of using a
normalized data model (especially one that uses EAV to the extent that we
do).  And there's not much we can do within a Java application to improve
the performance.  The fact is, reporting on a transactional system just
doesn't scale well.  And there's no amount of code we could write to solve
those problems -- in fact, we've realized that we'd essentially be
re-writing a database system if we tried.

That doesn't mean we're not going to provide some essential tools for
flattening the data -- like a simple front end to the logic service that
allows users to execute a set of logic rules for a set of patients.  Or a
module that runs ETL scripts to transform data into a
de-normalized/dimensional data model (both of which have been on my
back-burner for a while).  Those tools will eventually be developed (I
hope), but I think our whole notion of reporting needs to be turned on its
head and we need to re-focus on designing / building data warehouses for
specific implementations.  It seems clear to me that the answer is (1) we
need to stop trying to boil the ocean, building our own generalizable
frameworks and (2) we need to work together on solving collective reporting
problems through the use of existing BI tools, and then sharing our
successes back to the community.

I'm not exactly sure what your specific end goals are, but it's probably
safe to say that ALL of the tools below should be included in your reporting
solution.

    * OpenMRS reporting framework for indicator reports, exportable data
sets, and simple patient lists (like data quality reports).
    * DHIS 2.0 (plus an OpenMRS indicator export tool) for storing/querying
aggregate data.
    * Business Intelligence suites (like Pentaho) to provide all aspects of
reporting (dashboards, ETL, OLAP, ad-hoc query tools, etc).
    * Column-oriented databases (*) like InfiniDB, InfoBright, MonetDB,
LucidDB.

Unfortunately, right now, the first bullet is the only reporting solution
that is actively being worked on.  As a community, we need to realize that
this is not going to be solved by developers alone.  I think we need to put
our heads together and try to work toward robust DWH solutions for specific
implementation that integrate existing tools in ways that can be replicated,
rather than trying to continue to roll our own generalizable reporting
frameworks that try to solve everyone's problems.

Apologies for the long email -- thanks to those that actually read this far.
:)

Justin

(*) http://en.wikipedia.org/wiki/Column-oriented_DBMS


Andrew Kanter wrote:

 Folks, I have mentioned this before, but am running into this issue again
and again... and I am not yet familiar with the ins and outs of the new
reporting framework... However, have we considered an overall agent which
would produce separate relational tables which can be used for reporting?
What we need to do is have these created in a separate mysql DB for
permission purposes (only allow ODBC access to de-identified data, for
example). The pivoting from the EAV database to flat file tables is
critical, and eventually we also want this as a way to create a centralized
data warehouse.

 Any further ideas? Hamish, is this where you are going with the reporting
framework?  We are still on 1.4.6 (avoiding the GUIDs so we can still work
directly with the DB).

 Thanks!
 Andy
--------------------
Andrew S. Kanter, MD MPH

- Director of Health Information Systems/Medical Informatics
Millennium Villages Project, Earth Institute, Columbia University
- Asst. Prof. of Clinical Biomedical Informatics and Clinical Epidemiology
Columbia University


Email: andrew.kanter@xxxxxxxxxxxxxxxxx
Mobile: +1 (646) 469-2421
Office: +1 (212) 305-4842
Skype: akanter-ippnw
Yahoo: andy_kanter

  ------------------------------
Click here to unsubscribe<LISTSERV@xxxxxxxxxxxxxxxxxx?body=SIGNOFF%20openmrs-devel-l>from
OpenMRS Developers' mailing list


-- 
Justin C. Miranda
Software Developer
Partners In Health
www.pih.org

*Stand With Haiti:* www.standwithhaiti.org
* Please donate now to support our earthquake relief efforts in Haiti*
  ------------------------------
Click here to unsubscribe<LISTSERV@xxxxxxxxxxxxxxxxxx?body=SIGNOFF%20openmrs-devel-l>from
OpenMRS Developers' mailing list



-- 
Cheers,
Knut Staring