oship-dev team mailing list archive

Re: Automatic Python code generation [Re: adl2py fundamentals]

 

Hi, Tim:
For reference, I ran all my tests on Linux Mint 7 Gloria (a distro based on Ubuntu 9.04). I downloaded the latest version of generateDS.py (1.17d) from SourceForge (http://sourceforge.net/projects/generateds/) and installed it "the old way" (sudo python setup.py install) without any problems.

Archetype.xsd depends on Resource.xsd (via <xs:include>) and Resource.xsd, in turn, depends on BaseTypes.xsd. So I merged these 3 files into a single one (using the "process_includes.py" utility supplied with generateDS.py, which depends on the "python-lxml" package). I also put the line <xs:element name="archetype" type="ARCHETYPE"/> at the top of the file, to make sure that "archetype" would be correctly taken as the root object by the generateDS.py executable.
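To make the merge step concrete, here is a minimal sketch of the idea using only the standard library. It is not process_includes.py itself, and I have not checked how that utility works internally. The helper names (flatten_includes, add_root_element) are hypothetical, and the three tiny schemas are toy stand-ins for the real files (which type lives in which file is assumed purely for illustration).

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Toy stand-ins for the real schema files (assumed content, for illustration).
SOURCES = {
    "Archetype.xsd": (
        '<xs:schema xmlns:xs="%s">'
        '<xs:include schemaLocation="Resource.xsd"/>'
        '<xs:complexType name="ARCHETYPE"/>'
        '</xs:schema>' % XS
    ),
    "Resource.xsd": (
        '<xs:schema xmlns:xs="%s">'
        '<xs:include schemaLocation="BaseTypes.xsd"/>'
        '<xs:complexType name="RESOURCE_DESCRIPTION"/>'
        '</xs:schema>' % XS
    ),
    "BaseTypes.xsd": (
        '<xs:schema xmlns:xs="%s">'
        '<xs:complexType name="CODE_PHRASE"/>'
        '</xs:schema>' % XS
    ),
}


def flatten_includes(name, sources, seen=None):
    """Return the schema root for `name` with every xs:include inlined."""
    seen = set() if seen is None else seen
    seen.add(name)
    root = ET.fromstring(sources[name])
    merged = []
    for child in list(root):
        if child.tag == "{%s}include" % XS:
            target = child.get("schemaLocation")
            if target not in seen:  # skip files that were already inlined
                merged.extend(list(flatten_includes(target, sources, seen)))
        else:
            merged.append(child)
    # Replace the original children with the flattened list.
    for child in list(root):
        root.remove(child)
    root.extend(merged)
    return root


def add_root_element(root, name, type_name):
    """Insert <xs:element name=... type=.../> as the first child."""
    element = ET.Element("{%s}element" % XS, {"name": name, "type": type_name})
    root.insert(0, element)


schema = flatten_includes("Archetype.xsd", SOURCES)
add_root_element(schema, "archetype", "ARCHETYPE")
names = [child.get("name") for child in schema]
print(names)
```

The included definitions end up before the ones that depend on them, and the "archetype" element sits at the top of the merged schema.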

Let's recall that the XML version of openEHR (either 1.0.1 or 1.0.2) is less thoroughly tested than the ADL one, so it's normal to find some small typos and inconsistencies here and there. After a few minor corrections to the .xsd files (BaseTypes.xsd is wrongly referenced as "basetypes.xsd" in Resource.xsd; maxOccurs="unbounded" had to be added to the "parent_resource" element), the program quickly (in less than 1 second) generated a huge (~400 kB) "classes" file. This file already includes 2 important utility functions: "parse" (which processes a .xml file compatible with the .xsd schema and returns an "XML copy" of it) and "parseLiteral" (which processes the same .xml file and returns a "Python literal" version of it; see the example below).

I called this output file "test.py" (not very creative, I know...). I edited its main() function to use "parseLiteral" (instead of the default "parse") and then tried to parse our openEHR XML files with it. After (again) some minor corrections, this time to the .xml files ("C_CODE_PHRASE" was replaced by "CODE_PHRASE"; "C_DV_QUANTITY" was replaced by "DV_QUANTITY"), I finally got (e.g.) the following output from the "openEHR-EHR-COMPOSITION.encounter.v1.xml" file:

-----snip-----
from test import *

rootObj = archetype(
   original_language=model_.CODE_PHRASE(
       terminology_id=model_.TERMINOLOGY_ID(
       ),
       code_string='en',
   ),
   is_controlled=None,
   description=model_.RESOURCE_DESCRIPTION(
       original_author=[
           model_.original_author(
               id = name,
               valueOf_ = "Thomas Beale",
           ),
           model_.original_author(
               id = organisation,
               valueOf_ = "Ocean Informatics",
           ),
           model_.original_author(
               id = date,
               valueOf_ = "2005-10-10",
           ),
       ],
       other_contributors=[
       ],
       lifecycle_state='AuthorDraft',
       resource_package_uri='None',
       other_details=[
       ],
       details=[
           model_.details(
               language=model_.CODE_PHRASE(
                   terminology_id=model_.TERMINOLOGY_ID(
                   ),
                   code_string='en',
               ),
               purpose='Record of encounter as a progress note.',
               keywords=[
                   'progress',
                   'note',
                   'encounter',
               ],
               use='',
               misuse='',
               copyright='None',
               original_resource_uri=[
               ],
               other_details=[
               ],
           ),
       ],
       parent_resource=[
       ],
   ),
   translations=[
   ],
   archetype_id=model_.ARCHETYPE_ID(
   ),
   adl_version='1.4',
   concept='at0000',
   definition=model_.C_COMPLEX_OBJECT(
       valueOf_ = "",
       rm_type_name='COMPOSITION',
       occurrences=model_.IntervalOfInteger(
           lower_included=True,
           upper_included=True,
           lower_unbounded=False,
           upper_unbounded=False,
           lower=1,
           upper=1,
       ),
       node_id='at0000',
       valueOf_ = "",
       attributes=[
           model_.attributes(
           ),
       ],
   ),
   invariants=[
   ],
   ontology=model_.ARCHETYPE_ONTOLOGY(
       term_definitions=[
           model_.term_definitions(
               language = en,
               items=[
                   model_.items(
                       code = at0000,
                       items=[
                           model_.items(
                               id = description,
                               valueOf_ = "Generic encounter or progress note composition",
                           ),
                           model_.items(
                               id = text,
                               valueOf_ = "Encounter",
                           ),
                       ],
                   ),
               ],
           ),
       ],
       constraint_definitions=[
       ],
       term_bindings=[
       ],
       constraint_bindings=[
       ],
   ),
)
-----/snip-----

Please note that if one tries to "run" this code in Python, it will complain that "valueOf_" is given twice inside "definition". I suppose that this is also a minor problem with the XML schema ("definition" is a C_COMPLEX_OBJECT, C_COMPLEX_OBJECT extends C_DEFINED_OBJECT, C_DEFINED_OBJECT extends C_OBJECT etc). In any case, the error messages given by generateDS.py during all these tests were informative enough to help me find these "minor" errors, even without any prior knowledge of the schema's details.
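The complaint about the repeated "valueOf_" can be reproduced without the generated classes at all. The sketch below only compiles a shortened, made-up stand-in for the "definition" call (it never evaluates it, so C_COMPLEX_OBJECT does not have to exist). The exact wording of the message differs between Python versions, so the code only captures it.

```python
# Shortened, hypothetical stand-in for the generated literal: the same
# keyword appears twice in one call, as "valueOf_" does inside "definition".
snippet = 'C_COMPLEX_OBJECT(valueOf_="", node_id="at0000", valueOf_="")'

try:
    # compile() checks the call syntax without running anything.
    compile(snippet, "<generated literal>", "eval")
    error = None
except SyntaxError as exc:
    error = exc.msg

print(error)
```

Since the problem is caught at compile time, the literal output cannot even be imported as it stands; one of the two "valueOf_" lines has to be removed first.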

My opinion is that this approach deserves further investigation (maybe in a new Blueprint?), at least as a way to cross-check the XML against the ADL. The only downside I found was that Unicode strings (like the German definitions in openEHR-EHR-OBSERVATION.blood_pressure.v1.xml) were not properly handled, but maybe this is because I am doing something wrong -- I am still "discovering" the program and the schemas (schemata?).
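One thing that could be ruled out first is the input itself. The sketch below is only a sanity check with a made-up German sample (not taken from the real archetype file): it parses UTF-8 bytes with the standard library and confirms that the non-ASCII character survives. It says nothing about what generateDS.py does with the text afterwards, which I have not investigated.

```python
import xml.etree.ElementTree as ET

# Made-up sample; "\u00f6" is the letter o with umlaut.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<items id="text">Blutdruck (K\u00f6rperlage)</items>'
).encode("utf-8")

# Parsing from bytes lets the parser honour the declared encoding.
element = ET.fromstring(sample)
text = element.text
print(repr(text))
```

If a check like this passes on the real file, the problem is more likely in how the generated code writes the strings out than in the XML.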

 Cheers,
Roberto.

Tim Cook wrote:
Hi Roberto,

If you have time could you please install this app and then run it
against the blood pressure XML files[1] and then send me the Python
output so I can compare it to the ADL?

I used generateDS several years ago without much success.


[1] The schemas are here:
http://www.openehr.org/releases/1.0.2/its/XML-schema/index.html

The archetypes are:
openEHR-EHR-CLUSTER.device.v1.adl
openEHR-EHR-CLUSTER.level_of_exertion.v1.adl
openEHR-EHR-COMPOSITION.encounter.v1.adl
openEHR-EHR-OBSERVATION.blood_pressure.v1.adl

their XML representations can be found in their categories at:
http://www.openehr.org/svn/knowledge/archetypes/dev-uk-nhs/gen/xml/openehr/ehr/
Thanks,
Tim
On Mon, 2009-07-06 at 15:13 +0200, Roberto Siqueira wrote:
Hi, all:
I was checking the current state of the openEHR XML representation (http://www.openehr.org/releases/1.0.1/its/XML-schema/index.html) and also reviewing the different XML modules available in Python to represent data (DOM, objectify, ElementTree, lxml etc) when I found this: http://www.rexx.com/~dkuhlman/generateDS.html -- a module that generates Python classes from XML schemas (XSD files). It's not exactly what we were looking for back in April, but it may be useful anyway. Please have a look at it when you have some time.
  Best regards,
Roberto.

On 23.04.2009 21:55, Roberto Siqueira wrote:
[...] By the way: generation of Python code using Python itself is what is called metaprogramming (http://en.wikipedia.org/wiki/Metaprogramming). It would be wonderful to find some sort of Python "metaprogrammer" ("disassembler" or "decompiler") library ready to use, don't you think? Unfortunately, the only ones I've found (up to now) are "low level" bytecode decompilers like http://docs.python.org/library/dis.html, which are not capable of decompiling "high level" objects like classes, mixins etc. In any case, I suppose that the small "helper class" described in http://effbot.org/zone/python-code-generator.htm will be of some use here, as handling indentation can sometimes be very cumbersome. [...]
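To illustrate why such a helper is handy, here is a minimal sketch of an indentation-tracking writer. It is not the class from the effbot page (I am not reproducing its interface); the name CodeWriter, its methods, and the generated CodePhrase class are all made up for this example.

```python
class CodeWriter(object):
    """Collects source lines and keeps track of the indentation level."""

    def __init__(self, indent="    "):
        self._lines = []
        self._level = 0
        self._indent = indent

    def write(self, line=""):
        # Blank lines are stored without trailing whitespace.
        self._lines.append(self._indent * self._level + line if line else "")

    def indent(self):
        self._level += 1

    def dedent(self):
        if self._level == 0:
            raise ValueError("cannot dedent below column 0")
        self._level -= 1

    def source(self):
        return "\n".join(self._lines) + "\n"


# Generate a tiny class and load it, to show that the output is valid Python.
writer = CodeWriter()
writer.write("class CodePhrase(object):")
writer.indent()
writer.write("def __init__(self, code_string):")
writer.indent()
writer.write("self.code_string = code_string")
writer.dedent()
writer.dedent()

namespace = {}
exec(writer.source(), namespace)
phrase = namespace["CodePhrase"]("en")
print(writer.source())
```

The generator code never has to build the leading whitespace by hand; it only says when a block starts and ends.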


