dhis2-devs team mailing list archive
Message #46554
Scalability issue - assignment of organisationUnits to dataSets
Hi team,
We have a scalability issue that we'd like to discuss with the developer
community along with potential solutions.
*The problem*
- The number of organisationUnits (DHIS2 demo has ~2000) usually far
outnumbers the number of dataSets (DHIS2 demo has ~20).
- When you assign an organisationUnit to a dataSet, the *lastUpdated*
timestamp is only updated on the dataSet, not the organisationUnit.
- This means that when we modify one of the 2000 organisationUnits to
have an additional dataSet, we cannot use the lastUpdated filter to
download just that organisationUnit together with its associated
dataSet IDs. We can only use the lastUpdated filter to download the
updated dataSet *along with the organisationUnit IDs of all the other
(potentially up to 2000) organisationUnits already assigned to the dataSet*.
The organisationUnit IDs make up ~85% of the dataSets payload on the DHIS2
demo.
- Since organisationUnits are far more numerous, they are likely to be
created or modified more frequently. If those modifications include
dataSet assignments, then the affected dataSets will be marked as
"updated" each time and downloaded repeatedly.
- The payloads are still relatively small for moderate internet
connections. With gzip compression, downloading all dataSets is
approximately 80 KB. However, for poor internet connections in remote
locations, an 80 KB payload can be problematic, especially if it occurs
on a recurring basis.
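To make the asymmetry concrete, here is a rough sketch of the two incremental sync requests a client might build. The host name, the helper function and the exact `filter`/`fields` syntax are assumptions for illustration only, so please check them against your DHIS2 version.

```python
from urllib.parse import urlencode


def sync_url(base_url, resource, since, fields):
    """Build an incremental sync URL for one metadata resource.

    Hypothetical helper: the filter and fields syntax below is an
    assumption about the API, not taken from this message.
    """
    query = urlencode(
        [("filter", f"lastUpdated:gt:{since}"), ("fields", fields)],
        safe=":,[]",  # keep the query readable instead of percent-encoding it
    )
    return f"{base_url}/api/{resource}?{query}"


BASE = "https://dhis2.example.org"  # placeholder host

# What we would like to rely on: only the changed organisationUnit comes
# back, with its dataSet IDs. Today it is not returned, because its
# lastUpdated timestamp does not change on assignment.
wanted = sync_url(BASE, "organisationUnits", "2017-01-01", "id,dataSets[id]")

# What we have to do instead: re-download the dataSet, including the IDs
# of every organisationUnit already assigned to it.
actual = sync_url(BASE, "dataSets", "2017-01-01", "id,organisationUnits[id]")
```

The second request is the one that drags the full list of organisationUnit IDs along with every change.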
*Potential solutions*
- When assigning an organisationUnit to a dataSet, update the
*lastUpdated* timestamp on the organisationUnit as well as the dataSet. This doesn't
solve the problem of dataSets being downloaded repeatedly, but it would
allow API consumers to exclude the organisationUnit IDs from the dataSets
payload (~85% of the payload size) and mitigate the payload size issue on
poor internet connections.
- When assigning an organisationUnit to a dataSet, only update the
*lastUpdated* timestamp on the organisationUnit (and not on the
dataSet). Whilst this solves the issue of downloading dataSets repeatedly,
it presents backwards compatibility issues for existing consumers of the
dataSets API.
- Update the *lastUpdated* timestamp of either the dataSet or the
organisationUnit, *depending upon the API resource used*. In the DHIS2
demo you are able to assign/remove organisationUnits to/from dataSets using
POST/DELETE requests to either
*/api/dataSets/<id>/organisationUnits/<id>* or
*/api/organisationUnits/<id>/dataSets/<id>*. If you use the former (the
dataSets resource) then update the timestamp on the dataSet, whereas if you
use the latter (the organisationUnits resource) then update the timestamp
on the organisationUnit. This has the advantage of solving the issue of
downloading dataSets repeatedly whilst also not introducing backwards
compatibility issues for consumers of the dataSets API.
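As a rough illustration of the third option, the sketch below models the rule in memory: the association is always recorded on both sides, but only the object whose resource received the request has its timestamp bumped. The `Entity` class and `assign` function are hypothetical and are not how DHIS2 implements this.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Toy stand-in for a dataSet or an organisationUnit (hypothetical)."""
    uid: str
    last_updated: int = 0
    linked: set = field(default_factory=set)


def assign(data_set, org_unit, via, now):
    """Link the two objects and bump lastUpdated on one side only.

    `via` names the API resource the request was sent to:
    "dataSets" or "organisationUnits".
    """
    if via not in ("dataSets", "organisationUnits"):
        raise ValueError(f"unknown resource: {via}")

    # The association itself is always stored on both sides.
    data_set.linked.add(org_unit.uid)
    org_unit.linked.add(data_set.uid)

    # Only the object owning the requested resource is marked as updated.
    if via == "dataSets":
        data_set.last_updated = now
    else:
        org_unit.last_updated = now


ds = Entity("dataSetA")
ou = Entity("orgUnit1")

# Request sent to the organisationUnits resource:
# the dataSet keeps its old timestamp and is not re-downloaded.
assign(ds, ou, via="organisationUnits", now=100)
```

Existing consumers that keep using the dataSets resource would see the same behaviour as today, which is why this option avoids the backwards compatibility problem.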
Thoughts?
Cheers,
-doh