openlp-core team mailing list archive

[Merge] lp:~phill-ridout/openlp/1213254_2.0 into lp:openlp/2.0

 

Phill has proposed merging lp:~phill-ridout/openlp/1213254_2.0 into lp:openlp/2.0.

Requested reviews:
  OpenLP Core (openlp-core)

For more details, see:
https://code.launchpad.net/~phill-ridout/openlp/1213254_2.0/+merge/180683

Fix the import of some OpenLP 1 databases by filtering out characters that are not valid in XML. The code comes from http://stackoverflow.com/questions/8733233/filtering-out-certain-bytes-in-python
-- 
https://code.launchpad.net/~phill-ridout/openlp/1213254_2.0/+merge/180683
Your team OpenLP Core is requested to review the proposed merge of lp:~phill-ridout/openlp/1213254_2.0 into lp:openlp/2.0.
=== modified file 'openlp/plugins/songs/lib/xml.py'
--- openlp/plugins/songs/lib/xml.py	2012-12-30 19:41:24 +0000
+++ openlp/plugins/songs/lib/xml.py	2013-08-17 08:12:37 +0000
@@ -92,6 +92,27 @@
         self.song_xml = objectify.fromstring(u'<song version="1.0" />')
         self.lyrics = etree.SubElement(self.song_xml, u'lyrics')
 
+    @staticmethod
+    def valid_xml_char_ordinal(char):
+        """
+        Return True if ``char`` (a code point) is valid in XML; control characters are not.
+        Source <http://stackoverflow.com/questions/8733233/filtering-out-certain-bytes-in-python>
+        """
+        return (
+            0x20 <= char <= 0xD7FF
+            or char in (0x9, 0xA, 0xD)
+            or 0xE000 <= char <= 0xFFFD
+            or 0x10000 <= char <= 0x10FFFF
+        )
+
+    @staticmethod
+    def clean_xml_string(xml):
+        """
+        Return ``xml`` with every character that is invalid in XML removed.
+        Source <http://stackoverflow.com/questions/8733233/filtering-out-certain-bytes-in-python>
+        """
+        return ''.join(char for char in xml if SongXML.valid_xml_char_ordinal(ord(char)))
+
     def add_verse_to_lyrics(self, type, number, content, lang=None):
         """
         Add a verse to the ``<lyrics>`` tag.
@@ -112,6 +133,7 @@
             The verse's language code (ISO-639). This is not required, but
             should be added if available.
         """
+        content = self.clean_xml_string(content)
         verse = etree.Element(u'verse', type=unicode(type),
             label=unicode(number))
         if lang:
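
For reviewers who want to try the filter outside OpenLP, here is a minimal standalone sketch of the same check. It assumes Python 3 and plain module-level functions rather than the ``SongXML`` static methods in the patch, and the sample string is invented for illustration. The ranges are the ones used in the patch, which match the ``Char`` production of XML 1.0.

```python
# Standalone sketch of the filter from the patch, assuming Python 3.
# The function names mirror the patch; the sample lyric is made up.


def valid_xml_char_ordinal(code_point):
    """Return True if code_point is allowed by the XML 1.0 Char production."""
    return (
        0x20 <= code_point <= 0xD7FF
        or code_point in (0x9, 0xA, 0xD)      # tab, line feed, carriage return
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def clean_xml_string(text):
    """Return text with every character that XML 1.0 disallows removed."""
    return ''.join(ch for ch in text if valid_xml_char_ordinal(ord(ch)))


# A vertical tab (0x0B) and a NUL byte are dropped; tab and newline survive.
sample = 'Amazing\x0b grace\x00,\thow sweet\nthe sound'
cleaned = clean_xml_string(sample)
print(repr(cleaned))
```

The invalid characters are dropped silently rather than replaced, so the cleaned text can be shorter than the input.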

