dhis2-devs team mailing list archive
Message #12655
[Branch ~dhis2-documenters/dhis2/dhis2-docbook-docs] Rev 342: Added chapter on organisation units
------------------------------------------------------------
revno: 342
committer: Lars Helge Overland <larshelge@xxxxxxxxx>
branch nick: dhis2-docbook-docs
timestamp: Sat 2011-06-18 22:20:19 +0200
message:
Added chapter on organisation units
added:
src/docbkx/en/dhis2_implementation_guide_organisation_units.xml
src/docbkx/en/resources/images/implementation_guide/organisation_unit_hiearchy.png
modified:
src/docbkx/en/dhis2_implementation_guide_data_warehouse.xml
src/docbkx/en/dhis2_implementation_guide_en.xml
--
lp:~dhis2-documenters/dhis2/dhis2-docbook-docs
https://code.launchpad.net/~dhis2-documenters/dhis2/dhis2-docbook-docs
Your team DHIS 2 developers is subscribed to branch lp:~dhis2-documenters/dhis2/dhis2-docbook-docs.
To unsubscribe from this branch go to https://code.launchpad.net/~dhis2-documenters/dhis2/dhis2-docbook-docs/+edit-subscription
=== modified file 'src/docbkx/en/dhis2_implementation_guide_data_warehouse.xml'
--- src/docbkx/en/dhis2_implementation_guide_data_warehouse.xml 2011-06-18 17:48:25 +0000
+++ src/docbkx/en/dhis2_implementation_guide_data_warehouse.xml 2011-06-18 20:20:19 +0000
@@ -1,5 +1,47 @@
<?xml version='1.0' encoding='UTF-8'?>
+<!-- This document was created with Syntext Serna Free. -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" []>
<chapter>
<title>DHIS 2 as Data Warehouse</title>
+ <para>This chapter will discuss the role and place of the DHIS 2 application in a system architecture context. It will show that DHIS 2 can serve the purpose of both a data warehouse and an operational system.</para>
+ <section>
+ <title>Data warehouses and operational systems</title>
+ <para>A <emphasis role="italic">data warehouse</emphasis> is commonly understood as a database used for analysis. Typically data is uploaded from various operational / transactional systems. Before data is loaded into the data warehouse it usually goes through various stages where it is cleaned of anomalies and redundancy and transformed to conform with the overall structure of the integrated database. Data is then made available for analysis, also known under terms such as <emphasis role="italic">data mining</emphasis> and <emphasis role="italic">online analytical processing</emphasis>. The data warehouse design is optimized for speed of data retrieval and analysis. To improve performance the data storage is often redundant in the sense that the data is stored both in its most granular form and in an aggregated (summarized) form.</para>
+ <para>A <emphasis role="italic">transactional system</emphasis> (or <emphasis role="italic">operational system</emphasis> from a data warehouse perspective) is a system that collects, stores and modifies low level data. This system is typically used on a day-to-day basis for data entry and validation. The design is optimized for fast insert and update performance.</para>
+ <para>There are several benefits of maintaining a data warehouse, some of them being:</para>
+ <itemizedlist>
+ <listitem>
+ <para><emphasis role="italic">Consistency:</emphasis> It provides a common data model for all relevant data and acts as an abstraction over a potentially high number of data sources and feeding systems which makes it a lot easier to perform analysis.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Reliability:</emphasis> It is detached from the sources from which the data originated and is hence not affected if data in the operational systems is purged or lost.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Analysis performance:</emphasis> It is designed for maximum performance for data retrieval and analysis, in contrast to operational systems, which are often optimized for data capture.</para>
+ </listitem>
+ </itemizedlist>
+ <para>There are however also significant challenges with a data warehouse approach:</para>
+ <itemizedlist>
+ <listitem>
+ <para><emphasis role="italic">High cost:</emphasis> There is a high cost associated with moving data from various sources into a common data warehouse, especially when the operational systems are not similar in nature. Often long-term existing systems (referred to as legacy systems) put heavy constraints on the data transformation process.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Data validity:</emphasis> The process of moving data into the data warehouse is often complex and hence often not performed at regular and timely intervals. This leaves the data users with outdated and irrelevant data that is not suitable for planning and informed decision making.</para>
+ </listitem>
+ </itemizedlist>
+ <para>Due to the mentioned challenges it has lately become increasingly popular to merge the functions of the data warehouse and operational system, either into a single system which performs both tasks or with tightly integrated systems hosted together. With this approach the system provides functionality for data capture and validation as well as data analysis, and manages the process of converting low-level atomic data into aggregate data suitable for analysis. This sets high standards for the system and its design as it must provide appropriate performance for both of those functions; however, advances in hardware and parallel processing are increasingly making such an approach feasible.</para>
+ <para>In this regard, the DHIS 2 application is designed to serve as a tool for data capture, validation, analysis and presentation of data. It provides modules for all of the mentioned aspects, including data entry functionality and a wide array of analysis tools such as reports, charts, maps, pivot tables and dashboards.</para>
+ </section>
+ <section>
+ <title>Aggregation strategies in DHIS 2</title>
+ <para>DHIS 2 is designed to run in low-end environments, which puts certain restrictions on performance. Two strategies for aggregation of data are offered. <emphasis role="italic">Real-time aggregation</emphasis> means that the system will generate aggregated data on-the-fly based on the low-level data every time a report is requested. This implies that the aggregate data will reflect the very latest captured data and is useful if producing reports immediately after data entry has been done is a priority. The downside is that this will not perform adequately on an online server where the database contains a large number of records and there is high user concurrency.</para>
+ <para><emphasis role="italic">Batch aggregation</emphasis> means that the system will generate aggregated data every night for a defined time-span (typically the last two years) based on the low-level data and write this data to a data mart. A data mart is a data store optimized for meeting the most common user requests for data analysis. The DHIS 2 data mart contains data aggregated in the <emphasis role="italic">space dimension</emphasis> (the organisation unit hierarchy), <emphasis role="italic">time dimension</emphasis> (over multiple periods) and for <emphasis role="italic">indicator formulas</emphasis> (mathematical expressions including data elements). This strategy for aggregation provides great performance even in high-concurrency environments since most requests for analysis can be served with a single, simple database query against the data mart. The aggregation engine in DHIS 2 is capable of processing low-level data in the multi-millions and managing most national-level databases, and it can be said to provide <emphasis role="italic">near real-time access</emphasis> to aggregate data. The downside of this approach is that captured data will not be available for aggregated analysis until the next day. However, for a routine system like DHIS 2 where data is typically collected with a monthly periodicity this is not a significant problem.</para>
+ <para><emphasis role="italic">Hint</emphasis>: The aggregation strategy can be set in “Settings” - “System settings”, while scheduling of data mart exports can be enabled in “Reporting” - “Scheduling”.</para>
+ </section>
+ <section>
+ <title>Data storage approach</title>
+ <para>There are two leading approaches for storing data in a data warehouse, namely the <emphasis role="italic">normalized</emphasis> and <emphasis role="italic">dimensional</emphasis> approach. DHIS 2 borrows a little from the former but mostly from the latter. In the dimensional approach the data is partitioned into <emphasis role="italic">dimensions</emphasis> and <emphasis role="italic">facts</emphasis>. Facts generally refer to transactional numeric data while dimensions are the reference data that give context and meaning to the facts. The strict rules of this approach make it easy for users to understand the data warehouse structure and provide for good performance since few tables must be combined to produce meaningful analysis. On the other hand, they might make the system less flexible and harder to change.</para>
+ <para>
+In DHIS 2 the facts correspond to the data value object in the data model. The data value captures data as numbers, yes/no or text. The <emphasis role="italic">compulsory dimensions</emphasis> which give meaning to the facts are the <emphasis role="italic">data element</emphasis>, <emphasis role="italic">organisation unit hierarchy</emphasis> and <emphasis role="italic">period</emphasis> dimensions. These dimensions are referred to as compulsory since they must be provided for all stored data records. DHIS 2 also has a custom dimensional model which makes it possible to represent any kind of dimensionality. This model must be defined prior to data capture. In addition, DHIS 2 has a flexible model of groups and group sets which makes it possible to add custom dimensionality to the compulsory dimensions after data capture has taken place. You can read more about dimensionality in DHIS 2 in the chapter by the same name.</para>
+ </section>
</chapter>
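
To illustrate the two aggregation strategies described in the data warehouse chapter above, here is a minimal Python sketch. It is not DHIS 2 code: the hierarchy, the data values and every function name are hypothetical, and the "data mart" is just an in-memory dictionary. It only shows the trade-off the text describes: the batch step adds each low-level value into every organisation unit above it once, so a later request becomes a single lookup, while the real-time variant scans all low-level values on every request.

```python
from collections import defaultdict

# Hypothetical organisation unit hierarchy: child -> parent (None marks the root).
PARENT = {
    "Country": None,
    "Province A": "Country",
    "District 1": "Province A",
    "District 2": "Province A",
    "Facility X": "District 1",
    "Facility Y": "District 1",
    "Facility Z": "District 2",
}

# Hypothetical low-level data values: (data element, organisation unit, period, value).
DATA_VALUES = [
    ("ANC visits", "Facility X", "2011-05", 10),
    ("ANC visits", "Facility Y", "2011-05", 20),
    ("ANC visits", "Facility Z", "2011-05", 5),
    ("ANC visits", "Facility X", "2011-06", 7),
]


def ancestors_and_self(unit):
    """Yield the unit itself and then every ancestor up to the root."""
    while unit is not None:
        yield unit
        unit = PARENT[unit]


def build_data_mart(data_values):
    """Batch strategy: add each value into every level above it, once."""
    mart = defaultdict(int)
    for element, unit, period, value in data_values:
        for level_unit in ancestors_and_self(unit):
            mart[(element, level_unit, period)] += value
    return dict(mart)


def aggregate_on_the_fly(data_values, element, unit, period):
    """Real-time strategy: scan all low-level values on every request."""
    return sum(
        value
        for e, u, p, value in data_values
        if e == element and p == period and unit in ancestors_and_self(u)
    )


data_mart = build_data_mart(DATA_VALUES)
# After the batch run, a report request is a single dictionary lookup.
print(data_mart[("ANC visits", "District 1", "2011-05")])
print(aggregate_on_the_fly(DATA_VALUES, "ANC visits", "District 1", "2011-05"))
```

Both calls return the same figure for the same request; the difference is only where the work is done. The sketch covers the space dimension only; aggregation over multiple periods and indicator formulas would follow the same pattern.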
=== modified file 'src/docbkx/en/dhis2_implementation_guide_en.xml'
--- src/docbkx/en/dhis2_implementation_guide_en.xml 2011-06-18 17:48:25 +0000
+++ src/docbkx/en/dhis2_implementation_guide_en.xml 2011-06-18 20:20:19 +0000
@@ -41,9 +41,10 @@
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../build.properties" encoding="UTF-8"/>
</bookinfo>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dhis2_implementation_guide_conceptual_design_principles.xml" encoding="UTF-8"/>
- <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dhis2_implementation_guide_users_and_user_roles.xml" encoding="UTF-8"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dhis2_implementation_guide_organisation_units.xml" encoding="UTF-8"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dhis2_implementation_guide_data_elements_and_custom_dimensions.xml" encoding="UTF-8"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dhis2_implementation_guide_indicators.xml" encoding="UTF-8"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dhis2_implementation_guide_users_and_user_roles.xml" encoding="UTF-8"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dhis2_implementation_guide_data_analysis_tools_overview.xml" encoding="UTF-8"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dhis2_implementation_guide_pivot_tables_and_mydatamart.xml" encoding="UTF-8"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="dhis2_implementation_guide_integration.xml" encoding="UTF-8"/>
=== added file 'src/docbkx/en/dhis2_implementation_guide_organisation_units.xml'
--- src/docbkx/en/dhis2_implementation_guide_organisation_units.xml 1970-01-01 00:00:00 +0000
+++ src/docbkx/en/dhis2_implementation_guide_organisation_units.xml 2011-06-18 20:20:19 +0000
@@ -0,0 +1,36 @@
+<?xml version='1.0' encoding='UTF-8'?>
+<!-- This document was created with Syntext Serna Free. -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" []>
+<chapter>
+ <title>Organisation units</title>
+ <para>In DHIS 2 the location of the data, the geographical context, is represented as organisation units. An organisation unit can be either a health facility or department/sub-unit providing services, or an administrative unit representing a geographical area (e.g. a health district).</para>
+ <para>Organisation units are located within a hierarchy, also referred to as a tree. The hierarchy will reflect the health administrative structure and its levels. Typical levels in such a hierarchy are the national, province, district and facility levels. In DHIS 2 there is a single organisation unit hierarchy, so the way this is defined and mapped to reality needs careful consideration. The geographical areas and levels defined in the main hierarchy will have a major impact on the usability and performance of the application. Additionally, there are ways of addressing alternative hierarchies and levels, as explained in the section called Organisation unit groups and group sets further down.</para>
+ <section>
+ <title>Organisation unit hierarchy design</title>
+ <para>The process of designing a sensible organisation unit hierarchy has many aspects:</para>
+ <itemizedlist>
+ <listitem>
+ <para><emphasis role="italic">Include all reporting health facilities:</emphasis> All health facilities which contribute to the national data collection should be included in the system. Facilities of all kinds of ownership should be incorporated, including private, public, NGO and faith-oriented facilities. Often private facilities constitute half of the total number of facilities in a country and have policies for data reporting imposed on them, which means that incorporating data from such facilities is necessary to get realistic, national aggregate figures.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Emphasize the health administrative hierarchy:</emphasis> A country typically has multiple administrative hierarchies which are often neither well coordinated nor harmonized. When considering what to emphasize when designing the DHIS 2 database one should keep in mind which areas are most interesting and will be most frequently requested for data analysis. DHIS 2 primarily manages health data and performs analysis based on the health administrative structure. This implies that even if adjustments might be made to cater for areas such as finance and local government, the point of departure for the DHIS 2 organisation unit hierarchy should be the health administrative areas.</para>
+ </listitem>
+ </itemizedlist>
+ <itemizedlist>
+ <listitem>
+ <para><emphasis role="italic">Limit the number of organisation unit hierarchy levels:</emphasis> To cater for analysis requirements coming from various organisational bodies such as local government and the treasury, it is tempting to include all of these areas as organisation units in the DHIS 2 database. However, due to performance considerations one should try to limit the organisation unit hierarchy levels to the smallest possible number. The hierarchy is used as the basis for aggregation of data to be presented in any of the reporting tools, so when producing aggregate data for the higher levels, the DHIS 2 application must search for and add together data registered for all organisation units located further down the hierarchy. Increasing the number of organisation units will hence negatively impact the performance of the application, and an excessively large number might become a significant problem in that regard. In addition, a central feature of most of the analysis tools in DHIS 2 is dynamically selecting the “parent” organisation unit of the units which are intended to be included. For instance, one would want to select a province and have the districts belonging to that province included in the report. If the district level is the most interesting level from an analysis point of view and several hierarchy levels exist between this and the province level, this kind of report will be rendered unusable. When building up the hierarchy, one should focus on the levels that will be used frequently in reports and data analysis and leave out levels that are rarely or never used, as this will have an impact on both the performance and usability of the application.</para>
+ </listitem>
+ </itemizedlist>
+ <para>Another guiding principle for designing the hierarchy is to avoid connecting levels that have close to one-to-one parent-child ratios, meaning that for instance a district (parent) should have on average more than one local council (child) at the level below before it makes sense to add a local council level to the hierarchy. Parent-child ratios of 1:4 or more are much more useful for data analysis purposes as one can start to look at e.g. how a district’s data is distributed in the different sub-areas and how these vary. Such drill-down exercises are not very useful when the level below has the same target population and the same serving health facilities as the parent.</para>
+ <para>Skipping geographical levels when mapping reality to the DHIS 2 organisation unit hierarchy can be difficult and can easily lead to resistance among certain stakeholders, but one should keep in mind that there are ways of producing reports based on geographical levels that are not part of the organisation unit hierarchy in DHIS 2, as will be explained in the next section.</para>
+ </section>
+ <section>
+ <title>Organisation unit groups and group sets</title>
+ <para>In DHIS 2, organisation units can be grouped in organisation unit groups, and these groups can be further organised into group sets. Together they can mimic an alternative organisational hierarchy which can be used when creating reports and other data output. In addition to representing alternative geographical locations not part of the main hierarchy, these groups are useful for assigning classification schemes to health facilities, e.g. based on the type or ownership of the facilities. Any number of group sets and groups can be defined in the application through the user interface, and all these are defined locally for each DHIS 2 database.</para>
+ <para>An example illustrates this best: Typically one would want to provide analysis based on the ownership of the facilities. In that case one would create a group for each ownership type, for instance “MoH”, “Private” and “NGO”. All facilities in the database must then be classified and assigned to one and only one of these three groups. Next one would create a group set called “Ownership” to which the three groups above are assigned, as illustrated in the figure below.</para>
+ <graphic fileref="resources/images/implementation_guide/organisation_unit_hiearchy.png" align="center"/>
+ <para>In a similar way one can create a group set for an additional administrative level, e.g. local councils. All local councils must be defined as organisation unit groups and then assigned to a group set called “Local Council”. The final step is then to assign all health facilities to one and only one of the local council groups. This enables DHIS 2 to produce aggregate reports by each local council (adding together the data for all assigned health facilities) without having to include the local council level in the main organisation unit hierarchy. The same approach can be followed for any additional administrative or geographical level that is needed, with one group set per additional level. Before one can go ahead and design this in DHIS 2, a mapping between the areas of the additional geographical level and the health facilities serving in each area is needed.</para>
+ <para>A key property of the group set concept in DHIS 2 is <emphasis role="italic">exclusivity</emphasis>, which implies that an organisation unit can be a member of exactly one of the groups in a group set. A violation of this rule would lead to duplication of data when aggregating health facility data by the different groups, as a facility assigned to two groups in the same group set would be counted twice.</para>
+ <para>With this structure in place, DHIS 2 can provide aggregated data for each of the organisation unit ownership types through the “Organisation unit group set report” in the “Reporting” module or through the Excel pivot table third-party tool. For instance one can view and compare utilisation rates aggregated by the different types of ownership (e.g. MoH, Private, NGO). In addition, DHIS 2 can provide statistics on the distribution of facilities in the “Organisation unit distribution report” in the “Reporting” module. For instance one can view how many facilities exist under any given organisation unit in the hierarchy for each of the various ownership types. In the GIS module, given that health facility coordinates have been registered in the system, one can view the locations of the different types of health facilities (with different symbols for each type), and also combine this information with another map layer showing indicators e.g. by district.</para>
+ </section>
+</chapter>
=== added file 'src/docbkx/en/resources/images/implementation_guide/organisation_unit_hiearchy.png'
Binary files src/docbkx/en/resources/images/implementation_guide/organisation_unit_hiearchy.png 1970-01-01 00:00:00 +0000 and src/docbkx/en/resources/images/implementation_guide/organisation_unit_hiearchy.png 2011-06-18 20:20:19 +0000 differ
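
The exclusivity rule for group sets described in the organisation units chapter above can be checked mechanically. The following is a minimal Python sketch, not DHIS 2 code: the "Ownership" group set, the facility names, the values and both function names are hypothetical. It flags facilities that are not in exactly one group of a group set, and shows how a facility assigned to two groups inflates the aggregated totals.

```python
from collections import Counter

# Hypothetical "Ownership" group set: group name -> member facilities.
OWNERSHIP = {
    "MoH": {"Facility X", "Facility Y"},
    "Private": {"Facility Z"},
    "NGO": {"Facility W"},
}

# Hypothetical reported value per facility for one data element and period.
FACILITY_VALUES = {
    "Facility X": 10,
    "Facility Y": 20,
    "Facility Z": 5,
    "Facility W": 8,
}


def exclusivity_violations(group_set, facilities):
    """Return {facility: group count} for facilities not in exactly one group."""
    counts = Counter(f for members in group_set.values() for f in members)
    return {f: counts[f] for f in facilities if counts[f] != 1}


def aggregate_by_group(group_set, values):
    """Add together the values of all facilities assigned to each group."""
    return {
        group: sum(values[f] for f in members)
        for group, members in group_set.items()
    }


# A valid group set: the group totals add up to the overall total.
print(exclusivity_violations(OWNERSHIP, FACILITY_VALUES))
print(aggregate_by_group(OWNERSHIP, FACILITY_VALUES))

# Assigning Facility X to a second group breaks exclusivity,
# and its value is then counted in both groups.
broken = dict(OWNERSHIP, NGO={"Facility W", "Facility X"})
print(exclusivity_violations(broken, FACILITY_VALUES))
print(sum(aggregate_by_group(broken, FACILITY_VALUES).values()),
      "versus", sum(FACILITY_VALUES.values()))
```

The check also reports facilities that belong to no group at all, since the text requires every facility to be assigned to one and only one group.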