dhis2-users team mailing list archive

Re: [Dhis2-devs] DHIS2 with R

 

On 27 May 2010 17:27, Knut Staring <knutst@xxxxxxxxx> wrote:
> DHIS already has the concept of datamart and report tables, which do
> provide some separation of the transactional from the analytical,
> though we also have plans for improving this.
>
> R-node with Jquery /GeoExt looks interesting:
> http://www.r-statistics.com/2010/04/r-node-a-web-front-end-to-r-with-protovis/

Slight aside: protovis looks like a pretty powerful visualization library.

>
> On Thu, May 27, 2010 at 5:10 PM, Jason Pickering
> <jason.p.pickering@xxxxxxxxx> wrote:
>> Hi Roger,
>>
>> Valid concerns, but I would assume that in the typical use case for R
>> the user would only be looking at 1) a view that has been prepared
>> for them or 2) selected tables to which they have been given read-only
>> access. I would not expect locks to be a problem in this case.
>>
>> However, your point is well taken. That is why I have been
>> tinkering around with LucidDB, a column-oriented database that may be
>> more appropriate for analysis. Which database should be used is
>> probably a separate discussion, but the point is that a separation
>> between the "transactional" database that is being used for data entry
>> and the "analysis" database is a good idea. I would regard this as
>> probably outside the scope of what DHIS is really intended to do. If
>> people need to use tools like R, they will likely also be capable of
>> coming up with their own solution.
>>
>> However, this does not rule out building certain simple examples
>> into DHIS. Obviously performance is a concern, but of course it
>> depends on what you are trying to do. R is incredibly powerful when it
>> comes to producing graphics, as I am sure you are aware, and
>> light years ahead of the other components we are using (jPlot I think).
>> So, I would think that the typical use case would be to leverage R,
>> possibly as an extension to DHIS2 for those that need it, for the
>> generation of analysis tables and graphics, that would be beyond the
>> scope of the "basic" package, which is really limited to aggregation.
>>
>> Anyway, just a few more thoughts.
>>
>> Regards,
>> Jason
>>
>>
>> On Thu, May 27, 2010 at 3:39 PM, Friedman, Roger (CDC/OID/NCHHSTP)
>> (CTR) <rdf4@xxxxxxx> wrote:
>>> My concern would be DB performance; there's no telling what kind of locks R or any other product using ODBC/JDBC is going to use.  I'm already worried about simultaneous transactional and reporting use.  Have there been any large-volume performance tests?  Has any thought been given to splitting reporting and data entry between different DB servers?  I know everyone has been focused on getting the distributed DB aspects right, but assuming universal availability of internet, how would DHIS2 perform on a single (possibly clustered) national DB server?
>>>
>>> -----Original Message-----
>>> From: dhis2-users-bounces+rdf4=cdc.gov@xxxxxxxxxxxxxxxxxxx [mailto:dhis2-users-bounces+rdf4=cdc.gov@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jason Pickering
>>> Sent: Thursday, May 27, 2010 7:03 AM
>>> To: Bob Jolliffe
>>> Cc: dhis2-users@xxxxxxxxxxxxxxxxxxx; dhis2-devs
>>> Subject: Re: [Dhis2-users] [Dhis2-devs] DHIS2 with R
>>>
>>> Yeah, I guess this comes back time and time again to my somewhat
>>> uncomfortable relationship with Hibernate and Java. Clearly, we need
>>> to think about how to reconcile making certain procedures cross-platform
>>> (in the sense of working across Postgres/MySQL and other DBs) with the
>>> need to offer advanced analysis capabilities with acceptable
>>> performance.
>>>
>>> There could be multiple ways of doing it, but in the absence of having
>>> R integrated into DHIS2, I think the most likely short-term use case
>>> would be just some documentation on how to use the R client with the
>>> DHIS2 database. Perhaps those users that use R over time with DHIS2
>>> could contribute their procedures, which should be able to be
>>> generalized either with PL/R.
>>>
>>> Of course the difference with using Postgres is that R procedures can
>>> be embedded as a new language inside the DB. I am not really sure this
>>> is possible with MySQL. This of course reduces the internal overhead
>>> of getting the data out of Postgres, through Java, and into the R
>>> interpreter, but I am not sure really what the impact of this might be
>>> without testing it.
>>>
>>>
>>>
>>> On Thu, May 27, 2010 at 12:26 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>>>> On 27 May 2010 11:15, Jason Pickering <jason.p.pickering@xxxxxxxxx> wrote:
>>>>> Hi Bob,
>>>>>
>>>>> Yes, I suspect that most R users would probably want to do things
>>>>> their own way. It has a rather steep learning curve. :)
>>>>>
>>>>> As for canned R scripts, the best way would probably be with PL/R, a
>>>>> procedural PostgreSQL language which utilizes R.
>>>>>
>>>>> http://www.joeconway.com/plr/doc/index.html
>>>>>
>>>>> I have done some very basic testing and it seems to work just fine on
>>>>> the server side.
>>>>
>>>> Swings and roundabouts to a certain extent.  The main thing is that
>>>> the r scripts are evaluated using the r c library.  If they were
>>>> invoked from within java/dhis then I guess data access would be slower
>>>> than from pl/r (we'd need to have a way to get the data to the r
>>>> interpreter), but number crunching would be similar and would also
>>>> work with mysql and friends.  Not sure which of these are bigger
>>>> problems in typical/possible scenarios.
>>>>
>>>>>
>>>>> I think they are two separate problems really, but I totally agree, C
>>>>> is likely going to be faster than Java for big operations. However, I
>>>>> do think (as all of you know) that the use of stored procedures (with
>>>>> the wrapper facade type of approach) for certain functions (like
>>>>> aggregation and heavy cross tab operations) would be much better to be
>>>>> executed on the database server as a native stored procedure.
>>>>>
>>>>> Regards,
>>>>> Jason
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 27, 2010 at 11:45 AM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>>>>>> We've talked before about integrating a scripting engine (such as R)
>>>>>> into dhis : http://www.rforge.net/rscript/
>>>>>>
>>>>>> But my guess is that most R users are going to be of a level of
>>>>>> sophistication that they would be most comfortable doing the kind of
>>>>>> thing you describe - connecting directly to the db with an R client
>>>>>> and doing their stuff.
>>>>>>
>>>>>> OTOH if there were sufficiently useful "canned" dhis R scripts which
>>>>>> could take some number crunching load off the jvm and produce canned
>>>>>> useful analysis then that would be different.
>>>>>>
>>>>>> Sadly I don't know enough about R to know.  But I sense it ...
>>>>>>
>>>>>> Regards
>>>>>> Bob
>>>>>>
>>>>>> On 27 May 2010 10:08, Jason Pickering <jason.p.pickering@xxxxxxxxx> wrote:
>>>>>>> Hi everyone. I have had a recent question from a user about how DHIS2
>>>>>>> can be used with R. I am including a trivial example here about how to
>>>>>>> use R as a client to access data and produce a graph from DHIS2 data.
>>>>>>>
>>>>>>> Just get a copy of R and install the DBI and RPostgreSQL packages with
>>>>>>>
>>>>>>>>install.packages(c("DBI", "RPostgreSQL"))
>>>>>>>
>>>>>>>
>>>>>>> After that, just connect to the DB, retrieve your data (in this case
>>>>>>> from a report table) and produce a graph.
>>>>>>>
>>>>>>>>library(DBI)
>>>>>>>
>>>>>>>>library(RPostgreSQL)
>>>>>>>
>>>>>>>>drv <- dbDriver("PostgreSQL")
>>>>>>>
>>>>>>>>con <- dbConnect(drv, dbname="dhis2_zm_prod2", user="postgres", password="postgres")
>>>>>>>
>>>>>>>>rs <- dbSendQuery(con, "SELECT * FROM _report_malaria_indicators_district WHERE organisationunitid = 3904")
>>>>>>>
>>>>>>>>data <- fetch(rs,n=-1)
>>>>>>>
>>>>>>>>barplot(data$malaria_confirm_incidence, names.arg=as.character(data$periodname), main=as.character(data$organisationunitname[1]),las=2)
>>>>>>>
>>>>>>>>dev.print(png, file="/home/jason/test.png")
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jason
>>>>>>>
>>>>>>> ---
>>>>>>> Jason P. Pickering
>>>>>>> email: jason.p.pickering@xxxxxxxxx
>>>>>>> tel:+260968395190
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Jason P. Pickering
>>>>> email: jason.p.pickering@xxxxxxxxx
>>>>> tel:+260968395190
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> --
>>> Jason P. Pickering
>>> email: jason.p.pickering@xxxxxxxxx
>>> tel:+260968395190
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>>
>>
>> --
>> --
>> Jason P. Pickering
>> email: jason.p.pickering@xxxxxxxxx
>> tel:+260968395190
>>
>>
>
>
>
> --
> Cheers,
> Knut Staring
>
>
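
A footnote on the plotting step in Jason's original example quoted above: dev.print() copies from a screen device, so it is not suited to a script that runs unattended, such as the "canned" scripts discussed in this thread. A minimal sketch of one alternative, opening a PNG file device directly, is below. This is only an illustration and not part of the original example: the data frame is a stand-in for the query result with made-up values, and the output path is hypothetical.

```r
# Stand-in for the data frame returned by fetch(); the column names follow
# the report table used in the original example, but the values are
# made-up placeholders, not real DHIS2 data.
data <- data.frame(
  organisationunitname = rep("Example District", 3),
  periodname = c("Jan 2010", "Feb 2010", "Mar 2010"),
  malaria_confirm_incidence = c(12.5, 9.8, 14.1),
  stringsAsFactors = FALSE
)

# Hypothetical output path; a temporary directory keeps the sketch
# self-contained.
out_file <- file.path(tempdir(), "test.png")

# Open a PNG file device directly, so no screen device is required.
png(out_file, width = 800, height = 600)
barplot(data$malaria_confirm_incidence,
        names.arg = as.character(data$periodname),
        main = as.character(data$organisationunitname[1]),
        las = 2)
# Closing the device writes the file to disk; invisible() hides the
# device number that dev.off() would otherwise print.
invisible(dev.off())
```

With a live connection, the stand-in data frame would simply be replaced by the result of fetch(rs, n=-1) from the quoted example.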


