
c2c-oerpscenario team mailing list archive

[Bug 787908] [NEW] sxw2rml does not support Simplified Chinese OpenOffice 1.0 documents

 

Public bug reported:


I used the following Python script to convert an SXW (OpenOffice 1.0) document to an RML document.

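<pre>

import zipfile

import libxml2
import libxslt
from pyopenoffice import PyOpenOffice

fname = r'c:\test.sxw'

# OpenOffice 1.0 (.sxw) files use normalized_oo2rml.xsl; ODF text (.odt) files
# use normalized_odt2rml.xsl. The stored 'mimetype' entry tells them apart.
with zipfile.ZipFile(fname, 'r') as z:
    mimetype = z.read('mimetype').decode('ascii').strip()
if mimetype.split('/')[-1] == 'vnd.oasis.opendocument.text':
    xsl_file = './normalized_odt2rml.xsl'
else:
    xsl_file = './normalized_oo2rml.xsl'

with open(xsl_file) as f:
    xsl = f.read()

# Unpack the document and get its normalized XML as a string.
tool = PyOpenOffice('.', save_pict=False)
res = tool.unpackNormalize(fname)

# Compile the stylesheet and apply it to the normalized document.
styledoc = libxml2.parseDoc(xsl)
style = libxslt.parseStylesheetDoc(styledoc)
doc = libxml2.parseMemory(res, len(res))
result = style.applyStylesheet(doc, None)

# Serialize the RML result, honoring the stylesheet's <xsl:output> settings.
print(style.saveResultToString(result))

# The libxml2/libxslt bindings do not free documents automatically.
style.freeStylesheet()   # also frees styledoc, which the stylesheet owns
doc.freeDoc()
result.freeDoc()

</pre>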

There are some bugs in the minidom-based Python code, and I fixed them.

In tiny_sxw2rml.py (5.x) or openerp_sxw2rml.py, I found this code:

<pre>
        styles_styles = self.styles_dom.getElementsByTagName("style:style")

</pre>

I fixed it like this:

<pre>
        ....
        styles_styles = []
        styles_styles = styles_styles + self.styles_dom.getElementsByTagName("style:style")
        styles_styles = styles_styles + self.styles_dom.getElementsByTagName("style:font-decl")
        ....
</pre>
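For reference, here is a self-contained sketch of the same change, run against a tiny, made-up styles.xml fragment using only `xml.dom.minidom` from the standard library. The fragment, its style and font names, and the helper `collect_styles` are illustrative assumptions, not code taken from sxw2rml. Because minidom's `getElementsByTagName()` matches the qualified tag name (prefix included), `"style:font-decl"` finds the font declarations alongside `"style:style"`.

```python
from xml.dom import minidom

# Hypothetical, trimmed-down styles.xml fragment from an OpenOffice 1.0 (.sxw) file.
# Namespace URIs are the OpenOffice.org 1.0 ones; style/font names are made up.
STYLES_XML = """<office:document-styles
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:style="http://openoffice.org/2000/style"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <office:font-decls>
    <style:font-decl style:name="宋体" fo:font-family="宋体"/>
  </office:font-decls>
  <office:styles>
    <style:style style:name="Standard" style:family="paragraph"/>
  </office:styles>
</office:document-styles>"""


def collect_styles(dom):
    """Return style:style elements followed by style:font-decl elements as one list."""
    # minidom's NodeList is a list subclass; list() makes the concatenation explicit.
    return (list(dom.getElementsByTagName("style:style")) +
            list(dom.getElementsByTagName("style:font-decl")))


dom = minidom.parseString(STYLES_XML.encode("utf-8"))
nodes = collect_styles(dom)
print([n.tagName for n in nodes])
```

Without the second call, the `style:font-decl` entry (and its Chinese font name) would not be in the list at all.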

There is a similar problem with the "content_styles" variable...


In normalized_oo2rml.xsl, I found this code:

<pre>
<xsl:when test="not($fontName='') and boolean($fontName)">

....
    <xsl:when test="contains($fontName,'Courier')">

...
    <xsl:when test="contains($fontName,'Helvetica') or contains($fontName,'Arial') or contains($fontName,'Sans')">

...
    <xsl:otherwise>                       <-------------------- Otherwise 1

...
<xsl:otherwise>                           <-------------------- Otherwise 2
...
</pre>

In a Simplified Chinese OpenOffice 1.0 document, the font names are "宋体" (SimSun), "黑体" (SimHei), and so on.
With my test.sxw file, normalized_oo2rml.xsl reaches "Otherwise 2", so the document's "宋体" font is replaced with "Times-Roman".
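The branch selection in the XSL excerpt can be traced with a small Python mirror of its `xsl:choose` tests. This is a sketch, assuming `$fontName` behaves like a plain string (in the stylesheet it may be an attribute value or node-set); the helper name `xsl_font_branch` is hypothetical, and it returns only a branch label because the branch bodies are elided in the excerpt.

```python
def xsl_font_branch(font_name):
    """Report which branch of the excerpted xsl:choose a font name would reach.

    Assumes $fontName acts like a plain string; branch bodies are not modeled.
    """
    # Outer test: not($fontName='') and boolean($fontName) -> non-empty string.
    if font_name:
        # XPath contains() is a case-sensitive substring test, like Python's "in".
        if "Courier" in font_name:
            return "Courier branch"
        if any(s in font_name for s in ("Helvetica", "Arial", "Sans")):
            return "Helvetica/Arial/Sans branch"
        return "Otherwise 1"
    return "Otherwise 2"


print(xsl_font_branch("Courier New"))  # Courier branch
print(xsl_font_branch("宋体"))          # Otherwise 1
print(xsl_font_branch(""))             # Otherwise 2
```

Read this way, a non-empty "宋体" would fall into Otherwise 1 rather than Otherwise 2, so reaching Otherwise 2 suggests `$fontName` was empty or missing when the template ran.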

So how can this be fixed, and how can a docinit/registerFont node be added to the generated
RML file, so that OpenERP can convert Simplified Chinese OpenOffice 1.0
documents to RML?
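One possible workaround for the second part is to post-process the generated RML with the standard library's `xml.etree.ElementTree`. This sketch assumes the report engine reads `<docinit><registerFont fontName="..." fontFile="..."/></docinit>` as the first child of the `<document>` root (check the RML parser in your OpenERP version); the helper `add_register_font`, the font name "SimSun", and the font path below are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_register_font(rml_text, font_name, font_file):
    """Return rml_text with a <registerFont> entry added under <docinit>.

    Assumes the root element is <document>; a <docinit> element is created
    as its first child if one does not exist yet.
    """
    root = ET.fromstring(rml_text)
    docinit = root.find("docinit")
    if docinit is None:
        docinit = ET.Element("docinit")
        root.insert(0, docinit)
    ET.SubElement(docinit, "registerFont", fontName=font_name, fontFile=font_file)
    return ET.tostring(root, encoding="unicode")


rml = '<document filename="test.pdf"><template/><stylesheet/><story/></document>'
print(add_register_font(rml, "SimSun", "/path/to/simsun.ttf"))
```

Registering the font only makes the name available; the paragraph styles would still need `fontName="SimSun"` instead of "Times-Roman" for the Chinese text to use it.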

Thanks...

mrshelly
2011/05/25

** Affects: openobject-server
     Importance: Undecided
         Status: New


** Tags: fontname mrshelly report rml sxw2rml

-- 
You received this bug notification because you are a member of C2C
OERPScenario, which is subscribed to the OpenERP Project Group.
https://bugs.launchpad.net/bugs/787908
