openlp-core team mailing list archive
Message #05799
[Bug 706211] Re: Biblegateway serves defect UTF8 for chinese bibles and breaks the charset detection
A workaround is to use the Bibleserver Chinese Bible, which works fine.
I tried using a cleaner to remove the meta description. It did remove it,
but the problem remained, so either it's not just those two characters
or BeautifulSoup does its detection before its cleaning, which doesn't
help us.
** Changed in: openlp
Status: New => Confirmed
** Changed in: openlp
Importance: Undecided => Low
--
You received this bug notification because you are a member of OpenLP
Core, which is subscribed to OpenLP.
https://bugs.launchpad.net/bugs/706211
Title:
Biblegateway serves defect UTF8 for chinese bibles and breaks the
charset detection
Status in OpenLP - Worship Presentation Software:
Confirmed
Bug description:
This was already mentioned on the forum. As I'm not sure when I'll find
the time to work on it, I'm writing this report.
When downloading references like:
http://www.biblegateway.com/passage/?search=John%203&version=CUV
the received HTML is UTF-8 encoded. The last two characters in <meta name="description" content="... are invalid UTF-8. This causes BeautifulSoup to fall back to cp1252 (which is wrong).
Forcing BeautifulSoup to use UTF-8 didn't work for me. It still fell back to cp1252. The best thing would be to rewrite the relevant code for all three servers using lxml, since BeautifulSoup itself recommends lxml and it would make more sense to unify library use in OpenLP.
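
One possible stopgap, sketched below and not tested against BeautifulSoup or Biblegateway: decode the downloaded bytes as UTF-8 ourselves, replacing invalid sequences, and hand the resulting text to the parser so that there should be no charset left for it to guess. The function names and the sample page bytes are made up for illustration; the sample only imitates the reported defect by cutting a multi-byte character short inside the meta description.

```python
def decode_utf8_leniently(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def find_invalid_utf8(raw: bytes):
    """Return (start, end) byte offsets of the first invalid sequence, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.end
    return None


# Hypothetical stand-in for the downloaded page: valid Chinese text,
# then a three-byte character truncated to two bytes before the closing quote.
good = '<meta name="description" content="約翰福音'.encode("utf-8")
page = good + "三".encode("utf-8")[:2] + b'">'

# Locate the defect, then decode without letting it abort the whole page.
print(find_invalid_utf8(page))
text = decode_utf8_leniently(page)
print(text)
```

The replaced characters end up as U+FFFD in the meta description only, so the passage text itself would be unaffected if the defect really is limited to that tag.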