
Scalability issue - assignment of organisationUnits to dataSets

Hi team,

We have a scalability issue that we'd like to discuss with the developer
community along with potential solutions.

*The problem*

   - The number of organisationUnits (the DHIS2 demo has ~2000) usually
   far exceeds the number of dataSets (the demo has ~20).
   - When you assign an organisationUnit to a dataSet, the *lastUpdated*
   timestamp is only updated on the dataSet, not the organisationUnit.
   - This means that when we modify one of the 2000 organisationUnits to
   have an additional dataSet, we cannot use the lastUpdated filter to
   download that organisationUnit with its associated dataSet IDs. We can
   only use the lastUpdated filter to download the updated dataSet *along
   with the organisationUnit IDs of all the other (potentially up to 2000)
   organisationUnits already assigned to the dataSet* (see the sketch
   after this list). The organisationUnit IDs make up ~85% of the dataSets
   payload on the DHIS2 demo.
   - Since organisationUnits are far greater in number, they are created
   or modified more frequently. If those modifications include dataSet
   assignments, then the dataSets will be "updated" each time and
   downloaded repeatedly.
   - The payloads are still relatively small for moderate internet
   connections: with gzip compression, downloading all dataSets is
   approximately 80KB. However, for poor internet connections in remote
   locations, a recurring 80KB download can be problematic.
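
For concreteness, a sync client polling the demo API with the lastUpdated
filter would issue requests along these lines (the timestamp and field
selections are illustrative):

    GET /api/organisationUnits.json?filter=lastUpdated:gt:2016-01-01&fields=id,dataSets

misses the modified organisationUnit (its timestamp was never touched),
whereas

    GET /api/dataSets.json?filter=lastUpdated:gt:2016-01-01&fields=id,organisationUnits

returns the updated dataSet, dragging along the IDs of every
organisationUnit assigned to it.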

*Potential solutions*

   - When assigning an organisationUnit to a dataSet, update the
   *lastUpdated* timestamp on the organisationUnit as well as the dataSet.
   This doesn't solve the problem of dataSets being downloaded repeatedly,
   but it would allow API consumers to exclude the organisationUnit IDs
   from the dataSets payload (~85% of the payload size) and mitigate the
   payload size issue on poor internet connections (see the sketch after
   this list).
   - When assigning an organisationUnit to a dataSet, only update the
   *lastUpdated* timestamp on the organisationUnit (and not on the
   dataSet). Whilst this solves the issue of downloading dataSets
   repeatedly, it presents backwards compatibility issues for existing
   consumers of the dataSets API.
   - Update the *lastUpdated* timestamp of either the dataSet or the
   organisationUnit, *depending upon the API resource used*. In the DHIS2
   demo you are able to assign/remove organisationUnits to/from dataSets
   using POST/DELETE requests to either
   */api/dataSets/<id>/organisationUnits/<id>* or
   */api/organisationUnits/<id>/dataSets/<id>*. If you use the former (the
   dataSets resource) then update the timestamp on the dataSet, whereas if
   you use the latter (the organisationUnits resource) then update the
   timestamp on the organisationUnit. This has the advantage of solving
   the issue of downloading dataSets repeatedly whilst also not
   introducing backwards compatibility issues for consumers of the
   dataSets API.
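
To sketch the mitigation in the first option: once organisationUnit
timestamps are bumped on assignment, a consumer on a poor connection
could strip the heavy membership list from the dataSets payload with the
Web API's field exclusion syntax, and pick up the assignment from the
organisationUnit side instead (queries are illustrative):

    GET /api/dataSets.json?filter=lastUpdated:gt:2016-01-01&fields=:all,!organisationUnits
    GET /api/organisationUnits.json?filter=lastUpdated:gt:2016-01-01&fields=id,dataSets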

Thoughts?

Cheers,

-doh