oship-dev team mailing list archive

Re: Automatic Python code generation [Re: adl2py fundamentals]

To: timothywayne.cook@xxxxxxxxx
From: Roberto Siqueira <siqueira@xxxxxxxxxxxxxxx>
Date: Wed, 08 Jul 2009 15:57:24 +0200
Cc: OSHIP-Dev <oship-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <1246908711.6030.92.camel@localhost>
Organization: LMTG (Laboratoire des Mécanismes et Transferts en Géologie)
User-agent: Thunderbird 2.0.0.22 (X11/20090608)

Hi, Tim:

For reference, I did all my tests on Linux Mint 7 Gloria (a distro that is based on Ubuntu 9.04). I downloaded the latest version of generateDS.py (1.17d) from SourceForge (http://sourceforge.net/projects/generateds/) and installed it "the old way" (sudo python setup.py install) without any problems.

Archetype.xsd depends on (via <xs:include>) Resource.xsd and, in its turn, Resource.xsd depends on BaseTypes.xsd. So I grouped these 3 files into a single one (using the "process_includes.py" utility supplied by generateDS.py, which depends on the "python-lxml" package) and I also put the line <xs:element name="archetype" type="ARCHETYPE"/> at the top of the file, to make sure that "archetype" would be correctly taken as the root object (by the generateDS.py executable).
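The grouping step can be sketched in a few lines of Python. This is only an illustration of the idea, assuming plain <xs:include> elements with a schemaLocation attribute. It is not how process_includes.py is implemented, it uses the standard library instead of lxml, and the placeholder type names and the in-memory SOURCES dictionary are made up for the example.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
INCLUDE_TAG = "{%s}include" % XS
ELEMENT_TAG = "{%s}element" % XS


def inline_includes(name, sources, seen=None):
    """Return schema `name` with each xs:include replaced by the included
    schema's top-level declarations. `sources` maps file name -> XSD text
    (a stand-in for reading files from disk). Each file is inlined once."""
    if seen is None:
        seen = set()
    seen.add(name)
    root = ET.fromstring(sources[name])
    merged = []
    for child in list(root):
        if child.tag == INCLUDE_TAG:
            target = child.get("schemaLocation")
            if target not in seen:
                # Splice in the declarations of the included schema.
                merged.extend(list(inline_includes(target, sources, seen)))
        else:
            merged.append(child)
        root.remove(child)
    root.extend(merged)
    return root


def add_root_element(schema, name, type_name):
    """Put an xs:element declaration at the top of the merged schema."""
    schema.insert(0, ET.Element(ELEMENT_TAG, {"name": name, "type": type_name}))


def _schema(body):
    return '<xs:schema xmlns:xs="%s">%s</xs:schema>' % (XS, body)


# Hypothetical miniature schemas; only the include chain mirrors the real files.
SOURCES = {
    "Archetype.xsd": _schema('<xs:include schemaLocation="Resource.xsd"/>'
                             '<xs:complexType name="ARCHETYPE"/>'),
    "Resource.xsd": _schema('<xs:include schemaLocation="BaseTypes.xsd"/>'
                            '<xs:complexType name="PlaceholderResourceType"/>'),
    "BaseTypes.xsd": _schema('<xs:complexType name="PlaceholderBaseType"/>'),
}

merged = inline_includes("Archetype.xsd", SOURCES)
add_root_element(merged, "archetype", "ARCHETYPE")
names = [child.get("name") for child in merged]
print(names)
```

The included declarations end up before the declarations of the including file, and the "archetype" element comes first.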

Let's recall that the XML version of openEHR (either 1.0.1 or 1.0.2) is less thoroughly tested than the ADL one, so it's normal to find some small typos and inconsistencies here and there. After a few minor corrections to the .xsd files (BaseTypes.xsd is wrongly referenced as "basetypes.xsd" in Resource.xsd; maxOccurs="unbounded" had to be added to the "parent_resource" element), a huge (~400kB) "classes" file was quickly generated (less than 1 sec.) by the program. This file already includes 2 important utility functions: "parse" (which processes a .xml file compatible with the .xsd schema and returns an "xml copy" of it) and "parseLiteral" (which processes the same .xml file and returns a "Python literal" version of it -- see example below).
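The "basetypes.xsd" problem is the kind of mismatch that only shows up on a case-sensitive file system such as the Linux one used here. A small check along the following lines could flag such references before running the generator. It is a hypothetical helper, not part of generateDS.py, and the two lists are typed in by hand for the example.

```python
def find_case_mismatches(referenced, existing):
    """Map each referenced name that matches an existing file only when
    case is ignored to that file's real name."""
    by_lower = {name.lower(): name for name in existing}
    mismatches = {}
    for ref in referenced:
        if ref not in existing and ref.lower() in by_lower:
            mismatches[ref] = by_lower[ref.lower()]
    return mismatches


existing = ["Archetype.xsd", "Resource.xsd", "BaseTypes.xsd"]
referenced = ["Resource.xsd", "basetypes.xsd"]
mismatches = find_case_mismatches(referenced, existing)
print(mismatches)
```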

I called this output file "test.py" (not very creative, I know...). I edited it to use the "parseLiteral" function (instead of the default "parse") in its main() function and then I tried to parse our openEHR xml files with it. After (again) some minor corrections, now on the .xml files ("C_CODE_PHRASE" was replaced by "CODE_PHRASE"; "C_DV_QUANTITY" was replaced by "DV_QUANTITY"), I finally got (e.g.) the following output from the "openEHR-EHR-COMPOSITION.encounter.v1.xml" file:


-----snip-----
from test import *

rootObj = archetype(
   original_language=model_.CODE_PHRASE(
       terminology_id=model_.TERMINOLOGY_ID(
       ),
       code_string='en',
   ),
   is_controlled=None,
   description=model_.RESOURCE_DESCRIPTION(
       original_author=[
           model_.original_author(
               id = name,
               valueOf_ = "Thomas Beale",
           ),
           model_.original_author(
               id = organisation,
               valueOf_ = "Ocean Informatics",
           ),
           model_.original_author(
               id = date,
               valueOf_ = "2005-10-10",
           ),
       ],
       other_contributors=[
       ],
       lifecycle_state='AuthorDraft',
       resource_package_uri='None',
       other_details=[
       ],
       details=[
           model_.details(
               language=model_.CODE_PHRASE(
                   terminology_id=model_.TERMINOLOGY_ID(
                   ),
                   code_string='en',
               ),
               purpose='Record of encounter as a progress note.',
               keywords=[
                   'progress',
                   'note',
                   'encounter',
               ],
               use='',
               misuse='',
               copyright='None',
               original_resource_uri=[
               ],
               other_details=[
               ],
           ),
       ],
       parent_resource=[
       ],
   ),
   translations=[
   ],
   archetype_id=model_.ARCHETYPE_ID(
   ),
   adl_version='1.4',
   concept='at0000',
   definition=model_.C_COMPLEX_OBJECT(
       valueOf_ = "",
       rm_type_name='COMPOSITION',
       occurrences=model_.IntervalOfInteger(
           lower_included=True,
           upper_included=True,
           lower_unbounded=False,
           upper_unbounded=False,
           lower=1,
           upper=1,
       ),
       node_id='at0000',
       valueOf_ = "",
       attributes=[
           model_.attributes(
           ),
       ],
   ),
   invariants=[
   ],
   ontology=model_.ARCHETYPE_ONTOLOGY(
       term_definitions=[
           model_.term_definitions(
               language = en,
               items=[
                   model_.items(
                       code = at0000,
                       items=[
                           model_.items(
                               id = description,
                               valueOf_ = "Generic encounter or progress note composition",
                           ),
                           model_.items(
                               id = text,
                               valueOf_ = "Encounter",
                           ),
                       ],
                   ),
               ],
           ),
       ],
       constraint_definitions=[
       ],
       term_bindings=[
       ],
       constraint_bindings=[
       ],
   ),
)
-----/snip-----

Please note that if one tries to "run" this code in Python, it will complain that "valueOf_" is given twice inside "definition". I suppose that this is also a minor problem with the XML schema ("definition" is a C_COMPLEX_OBJECT, C_COMPLEX_OBJECT extends C_DEFINED_OBJECT, C_DEFINED_OBJECT extends C_OBJECT etc). In any case, the error messages given by generateDS.py during all these tests were informative enough to help me find these "minor" errors, and that without any prior knowledge of the schema's details.
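The complaint about "valueOf_" can be reproduced without the generated module, because Python rejects a repeated keyword argument when it compiles the call, before any name is looked up. The sketch below uses a shortened, hand-written version of the "definition" call. The helper name check_call_source is made up for the example, and the exact wording of the message varies between Python versions.

```python
def check_call_source(source):
    """Return None if `source` compiles as an expression,
    otherwise the SyntaxError message."""
    try:
        compile(source, "<generated>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return None


# Shortened stand-ins for the generated call, with and without the duplicate.
duplicated = 'C_COMPLEX_OBJECT(valueOf_="", rm_type_name="COMPOSITION", valueOf_="")'
fixed = 'C_COMPLEX_OBJECT(valueOf_="", rm_type_name="COMPOSITION")'

print(check_call_source(duplicated))
print(check_call_source(fixed))
```

Note also that values such as "id = name" in the output above are bare names, not quoted strings, so they would need quoting as well before the literal could be evaluated.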

My opinion is that this approach deserves to be further investigated (maybe on a new Blueprint?), at least as a way to cross-check the XML against the ADL. The only downside I found was that unicode strings (like the German definitions in openEHR-EHR-OBSERVATION.blood_pressure.v1.xml) were not properly handled, but maybe this is because I am doing something wrong -- I am still "discovering" the program and the schemas (schemata?).
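The cause of the unicode problem is not established here. Purely as a point of comparison, the sketch below shows one way non-ASCII text can survive the trip from XML to a Python literal. It assumes Python 3 and the standard library parser, which is not the setup used in the tests above, and the sample element is made up rather than taken from the blood_pressure archetype.

```python
import ast
import xml.etree.ElementTree as ET

# Hypothetical sample with German characters, encoded as UTF-8 bytes.
xml_bytes = ('<?xml version="1.0" encoding="UTF-8"?>'
             '<items id="text">Gr\u00f6\u00dfe</items>').encode("utf-8")

elem = ET.fromstring(xml_bytes)
text = elem.text

# ascii() yields a literal that uses only ASCII characters (with escapes),
# so it can be written to a .py file regardless of the file's encoding.
literal = ascii(text)
round_tripped = ast.literal_eval(literal)

print(literal)
```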


 Cheers,
Roberto.

Tim Cook wrote:

Hi Roberto,

If you have time could you please install this app and then run it
against the blood pressure XML files[1] and then send me the Python
output so I can compare it to the ADL?

I used generateDS several years ago without much success.


[1] The schemas are here:
http://www.openehr.org/releases/1.0.2/its/XML-schema/index.html

The archetypes are:
openEHR-EHR-CLUSTER.device.v1.adl
openEHR-EHR-CLUSTER.level_of_exertion.v1.adl
openEHR-EHR-COMPOSITION.encounter.v1.adl
openEHR-EHR-OBSERVATION.blood_pressure.v1.adl

their XML representations can be found in their categories at:
http://www.openehr.org/svn/knowledge/archetypes/dev-uk-nhs/gen/xml/openehr/ehr/

Thanks,
Tim

On Mon, 2009-07-06 at 15:13 +0200, Roberto Siqueira wrote:
Hi, all:
I was checking the state of the art of the openEHR XML representation (http://www.openehr.org/releases/1.0.1/its/XML-schema/index.html) and also reviewing the different XML modules available in Python to represent data (DOM, objectify, ElementTree, lxml etc) when I found this: http://www.rexx.com/~dkuhlman/generateDS.html -- a module that generates Python classes from XML schemas (XSD files). It's not exactly what we were looking for back then in April, but may be useful anyway. Please have a look at it when you have some time.
  Best regards,
Roberto.

On 23.04.2009 21:55, Roberto Siqueira wrote:
[...] By the way: generation of Python code using Python itself is what is called metaprogramming (http://en.wikipedia.org/wiki/Metaprogramming). It would be wonderful to find some sort of Python "metaprogrammer" ("disassembler" or "decompiler") library ready to use, don't you think? Unfortunately, the only ones I've found (up to now) are "low level" bytecode disassemblers like http://docs.python.org/library/dis.html, which are not capable of decompiling "high level" objects like classes, mixins etc. In any case, I suppose that the small "helper class" described in http://effbot.org/zone/python-code-generator.htm will have some utility here, as handling indentation can be very cumbersome sometimes. [...]
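To show why an indentation helper is useful when generating Python code, here is a minimal sketch of one. It is written from scratch for illustration and does not reproduce the interface of the helper class on the effbot page. The names CodeWriter and CodePhrase are made up for the example.

```python
from contextlib import contextmanager


class CodeWriter:
    """Collects lines of source code and tracks the indentation level."""

    def __init__(self, indent="    "):
        self._lines = []
        self._level = 0
        self._indent = indent

    def line(self, text=""):
        # Empty lines are kept empty instead of being padded with spaces.
        self._lines.append(self._indent * self._level + text if text else "")

    @contextmanager
    def block(self, header):
        """Write `header`, then indent everything written inside the block."""
        self.line(header)
        self._level += 1
        try:
            yield self
        finally:
            self._level -= 1

    def source(self):
        return "\n".join(self._lines) + "\n"


# Generate a tiny class, then execute the generated source.
writer = CodeWriter()
with writer.block("class CodePhrase:"):
    with writer.block("def __init__(self, code_string):"):
        writer.line("self.code_string = code_string")

namespace = {}
exec(writer.source(), namespace)
phrase = namespace["CodePhrase"]("en")
print(writer.source())
```

Nesting the "with" blocks mirrors the nesting of the generated code, so the caller never has to count spaces.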
