dhis2-users team mailing list archive
-
dhis2-users team
-
Mailing list archive
-
Message #00207
Re: Data Import tools for DHIS2
Hello again,
Let me again underline that the process of adding legacy data to DHIS2
or any information system consists of three parts - Extraction,
Transformation and Loading. I agree very much with Roger and Jason
that this process will differ for each country, and that one needs to
use a set of tools that are suited to ETL work and have thorough
knowledge of the data that is to be loaded and the model that one
wants to load into.
I do think doing a bit of manual entry of the legacy data into a test
instance of DHIS 2 (not the master copy) is a good exercise for
understanding what is needed in each particular case. There is no way
around a (manual) analytical phase, where one actively has to
look at the data available and their quality.
Still, there are certainly aspects of the process that are common to
all cases, and I think we could develop more support for those parts,
and I think the example files from Kenya can be very instructive in
this regard:
1) Extraction
There are over 20 Excel files for just Baringo district. Since you
have 149 districts, that means around 3000 Excel files in total, just
for 2009 and 2010. Each file type has its own structure, which
hopefully is quite constant across the districts. For this, I
recommend using a Python script to extract the data from the thousands
of files. I have started on such a script, but it needs a lot more
work -
and must be tailored to each of the formats.
2) Transformation
It is likely that not all the existing data will fit the new data
model well. For this cleaning stage, a database GUI such as PgAdmin
III or Access can be helpful, and I propose that the script mentioned
above would produce a CSV file for easy upload to an empty database
(not DHIS2).
There are bound to be quality issues, as Roger referred to. The data
need to go through a cleaning process, and the metadata must be
extracted on their own - i.e. Orgunit hierarchy, the Data Elements and
the dimensional breakdown (age, sex etc.). There are no ways to
automate all the cleaning and analysis, but think we could build up a
collection of useful SQL queries to be shared.
In combination with looking at the current data entry forms, this will
determine the database structure. It is vital to finalize the database
structure before starting to import the legacy data.
3) Loading
Finally, the only part I think we could standardize more is the actual
uploading to DHIS2. Data could be converted to a generic format for
import to DHIS2 the most tricky bit will likely be handling dimensions
and perhaps periods.
The best option here is probably the DHIS DXF (not the Autocad GIS
format). I think it should be feasible for the project to create a
DataLoader for DHIS2, that would allow people to format their data in
a spreadsheet or a database view, and then DHIS2 will take care of the
import. But this is a sizeable project in itself, and will have to go
into the overall prioritization of blueprints.
Knut
On Wed, Nov 10, 2010 at 2:52 PM, samuel cheburet
<samuelcheburet@xxxxxxxxx> wrote:
> Thanks Suzuki,
> Attached is a sample district file in FTP.
> Looking at data structure which is similar to what we have customized to
> dhis2 which capture data by health facility and FTP data is district data
> which form legacy data which require migration.
> Regards
>
> On Wed, Nov 10, 2010 at 4:18 PM, Shinichi Suzuki <shin461@xxxxxxxxx> wrote:
>>
>> Hello Json,
>>
>> Thanks for your answer.
>> I tried to export the some data and looked it by text editor to understand
>> XDF format.
>> >From my web search, XDF is a AutoCAD or other 3D CAD standard format.
>> I could not create from Excel or CSV file to that '.action' file like.
>> ETL tools are looks useful for data exporting for private analysis and not
>> for import into DHIS2.
>> Please give me your recommended tool(s) name to do this.
>> Hopefully, could you send me your document draft when you write?
>> I am afraid I am confusion right now.
>>
>> Many thanks again,
>> Shinichi Suzuki
>>
>> -----Original Message-----
>> From: Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
>> Sent: Tuesday, November 09, 2010 5:25 PM
>> To: Shinichi Suzuki
>> Cc: Knut Staring; dhis2-users@xxxxxxxxxxxxxxxxxxx
>> Subject: Re: [Dhis2-users] Data Import tools for DHIS2
>>
>> Hi Shinchi,
>>
>> In fact as Knut highlighted, the appropriate method would be the
>> production of XML data (DXF) which DHIS2 already has robust import
>> mechanisms for. We have used direct insertion of data in the past, but
>> as Knut points out, and I will again, you must be very careful when
>> doing this. Much better to try and get the data in the correct format,
>> and import it in the recommended way via DXF if you can.
>>
>> I am right in the middle of migrating data from a legacy system into
>> DHIS. I will try and document as much of the process as I can, which
>> may help you in your import.
>>
>> Best regards,
>> Jason
>>
>>
>> On Tue, Nov 9, 2010 at 3:35 PM, Shinichi Suzuki <shin461@xxxxxxxxx> wrote:
>> > Hi Knut,
>> >
>> > Thanks a lot. I am encouraged to read your mail.
>> > I will try to read two home pages. But I feel familiar with your Access
>> based
>> > tools.
>> > I know your concerning to use these direct modification.
>> > I think DHIS2 should have a API to do this and open them. Then your
>> concerning
>> > will be disappeared.
>> >
>> > Best regards,
>> > Shinichi Suzuki
>> >
>> > -----Original Message-----
>> > From: Knut Staring [mailto:knutst@xxxxxxxxx]
>> > Sent: Monday, November 08, 2010 11:28 AM
>> > To: Shinichi Suzuki
>> > Cc: larshelge@xxxxxxxxx; dhis2-users@xxxxxxxxxxxxxxxxxxx
>> > Subject: Re: Data Import tools for DHIS2
>> >
>> > Hi Shinichi,
>> >
>> > You are completely right that it is important to be able to easily
>> > import the orgunit hierarchy and historical data into DHIS2. The
>> > general term for such operations is Extract, Transform and Load (ETL),
>> > for which there are a number of powerful general tools available. Here
>> > are two:
>> > http://kettle.pentaho.com/
>> > http://www.talend.com/index.php
>> > I have personally also used ODBC connections in Access to transform
>> > and load Excel data into Postgres directly.
>> >
>> > A disadvantage of going directly into the database is that one looses
>> > the validations that the DHIS2 import mechanism can perform (in
>> > addition to the database constraints), and of course it is burdensome
>> > to learn to use the above tools. DHIS2 has the capability to import
>> > XML files, and thus also modern Excel files (.xlsx), and this should
>> > probably be the common way.However, given the many different types of
>> > data and vast range of potential data sources, we will probably never
>> > have a simple wizard that does it all (and there is no sense in trying
>> > to replicate Kettle or Talend).
>> >
>> > Still, we have been thinking about defining a suitable standard format
>> > for such import, as well as providing the user with some assistance in
>> > transforming the data into shape for loading. We have also done some
>> > work on extraction of data automatically from a large number of Excel
>> > files (e.g. one or more per district) using Python. Work remains
>> > before this work will reach a stage where it is robust and generic
>> > enough - but I think working with the Kenyan data could help move this
>> > process forward. I will refresh my memory on the status of the Python
>> > work and get back to you.
>> >
>> > Knut
>> >
>> >
>> > On Mon, Nov 8, 2010 at 8:42 AM, Shinichi Suzuki <shin461@xxxxxxxxx>
>> > wrote:
>> >> Hello, Knut and dhis2_devs, users
>> >>
>> >> Do you have a data import tool written by Python and available to use?
>> >> I need the tools for the following area.
>> >> 1. Existing HIS EXCEL data import to DHIS2. We want to load about
>> >> several
>> >> years historical data which has been corrected into EXCEL files.
>> >> 2. Import from Master Facility List (Web based Facility Master
>> maintenance
>> >> system) to DHIS2 for facility data synchronization. This MFL has a
>> >> capability to export the data into EXCEL file. But MFL does not have a
>> >> "Short name" and need to add the data manually. Then want to import it.
>> >>
>> >> If you do not have it or not available to use it, please advise me if
>> >> possible.
>> >> I believe this import capability is important for the new user of DHIS
>> and
>> >> also could use inter system connection more widely.
>> >>
>> >> Best regards,
>> >> Shinichi Suzuki
>> >> -----------------------------------------------------
>> >> Shinichi Suzuki
>> >> MIS Division MOPHS: LG37 AFYA House, Nairobi, KENYA
>> >> E-Mail: shin461@xxxxxxxxx
>> >> Phone: 0712-754-963
>> >> (JICA Senior Volunteer 21-4)
>> >> -----------------------------------------------------
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Cheers,
>> > Knut Staring
>> >
>> >
>> >
>> > _______________________________________________
>> > Mailing list: https://launchpad.net/~dhis2-users
>> > Post to : dhis2-users@xxxxxxxxxxxxxxxxxxx
>> > Unsubscribe : https://launchpad.net/~dhis2-users
>> > More help : https://help.launchpad.net/ListHelp
>> >
>>
>>
>>
>> --
>> Jason P. Pickering
>> email: jason.p.pickering@xxxxxxxxx
>> tel:+260968395190
>>
>>
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-users
>> Post to : dhis2-users@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-users
>> More help : https://help.launchpad.net/ListHelp
>>
>
>
>
> --
> Samuel Cheburet
> Ministry Of Health
> P.O. Box 20781
> Nairobi, Kenya
> Mobile- 0721624338
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-users
> Post to : dhis2-users@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-users
> More help : https://help.launchpad.net/ListHelp
>
>
--
Cheers,
Knut Staring
Follow ups
References