kicad-developers team mailing list archive

UTF8 class and wchar_t issues fix

 

Hi Orson

Could you have a look at this patch?

I think it is also a fix for bug 1737143.

Currently our UTF8 class only has an append operator for ASCII7 chars.
But if the code uses something like this (which the compiler accepts):
my_utf_8_string += mychar;
and mychar is an int (or a wchar_t) containing a non-ASCII7 value, the UTF8 string is broken.
The patch adds an append operator for non-ASCII7 values (wchar_t wide chars).
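
To illustrate the failure mode outside of KiCad, here is a minimal sketch using plain std::string. The helper names truncating_append and append_utf8 are hypothetical and are not part of the patch or of the UTF8 class; the encoder is a generic UTF-8 encoder, not the conversion the patch actually uses, and it does not reject surrogate code points.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: mimics what happens when a code point is pushed
// through a char-only append operator. The value is narrowed to one byte,
// so anything above 0x7F leaves a stray, invalid byte in the string.
std::string truncating_append( std::string s, unsigned int cp )
{
    s += static_cast<char>( cp );   // the implicit int -> char conversion does the same
    return s;
}

// Hypothetical helper: append one Unicode code point as a UTF-8 sequence
// (1 to 4 bytes). Surrogates are not checked in this sketch.
void append_utf8( std::string& out, char32_t cp )
{
    assert( cp <= 0x10FFFF );

    if( cp <= 0x7F )
    {
        out += static_cast<char>( cp );
    }
    else if( cp <= 0x7FF )
    {
        out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
    else if( cp <= 0xFFFF )
    {
        out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
    else
    {
        out += static_cast<char>( 0xF0 | ( cp >> 18 ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
}
```

With these helpers, appending U+00E9 through truncating_append leaves the single byte 0xE9, which is not valid UTF-8 on its own, while append_utf8 produces the two-byte sequence 0xC3 0xA9.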


I am pretty sure this kind of issue also exists in the Eagle import (when Eagle files use non-ASCII7
chars in texts).

Thanks.

-- 
Jean-Pierre CHARRAS
 common/text_utils.cpp |  5 ++++-
 common/utf8.cpp       | 19 +++++++++++++++++++
 include/utf8.h        |  4 ++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/common/text_utils.cpp b/common/text_utils.cpp
index 2d0d635..dc2a270 100644
--- a/common/text_utils.cpp
+++ b/common/text_utils.cpp
@@ -44,7 +44,10 @@ std::pair<UTF8, std::vector<bool>> ProcessOverbars( const UTF8& aText )
             // If it is a double tilda, just process the second one
         }
 
-        text += *chIt;
+        // remember: *chIt is not necessarily an ASCII7 char.
+        // it is an unsigned ( wchar_t ) giving a multibyte char in UTF8 strings
+        text += wchar_t( *chIt );
+
         flags.push_back( overbar );
     }
 
diff --git a/common/utf8.cpp b/common/utf8.cpp
index 1d4bf1d..7be3e0f 100644
--- a/common/utf8.cpp
+++ b/common/utf8.cpp
@@ -218,6 +218,25 @@ UTF8::UTF8( const wchar_t* txt ) :
 }
 
 
+    /// Append a wide (unicode) char to the UTF8 string.
+    /// if this char is not an ASCII7 char, it will be added as a UTF8 multibyte sequence
+UTF8& UTF8::operator+=( wchar_t aUniChar )
+{
+    if( aUniChar <= 0x7F )
+        m_s.operator+=( char( aUniChar ) );
+    else
+    {
+        wchar_t wide_chr[2];    // null-terminated buffer holding the wide (unicode) char aUniChar
+        wide_chr[1] = 0;
+        wide_chr[0] = aUniChar;
+        UTF8 substr( wide_chr );
+        m_s += substr.m_s;
+    }
+
+    return (UTF8&) *this;
+}
+
+
 #if 0   // some unit tests:
 
 #include <stdio.h>
diff --git a/include/utf8.h b/include/utf8.h
index e78e8fd..48150f3 100644
--- a/include/utf8.h
+++ b/include/utf8.h
@@ -145,6 +145,10 @@ public:
         return (UTF8&) *this;
     }
 
+    /// Append a wide (unicode) char to the UTF8 string.
+    /// if this char is not an ASCII7 char, it will be added as a UTF8 multibyte sequence
+    UTF8& operator+=( wchar_t aUniChar );
+
     // std::string::npos is not constexpr, so we can't use it in an
     // initializer.
     static constexpr std::string::size_type npos = -1;
