zim-wiki team mailing list archive

Re: HTML2Zim


You're right Rod, thanks.
clipcli.py produces UTF-16LE output with a BOM when I copy text *from Firefox*, so I have to convert it before passing it to html2zim.py.
This command now works fine:
$ ~/clipcli.py text/html | iconv -f UTF-16LE -t UTF-8 | ~/html2zim.py -w ~/

On the other hand, clipcli.py produces plain ASCII text when I copy *from Chromium*. I'll try to write a bash script that checks the result with the "file" utility.
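
As an alternative to a bash script around "file", the check could be done in Python by looking at the byte order mark directly. This is only a sketch under assumptions: the function names to_utf8 and main are made up for illustration, input without a BOM is assumed to be UTF-8 (which also covers the plain ASCII that Chromium gives), and UTF-32 input is not handled.

```python
import codecs
import sys


def to_utf8(raw: bytes) -> bytes:
    """Return the input bytes re-encoded as UTF-8 without a BOM."""
    if raw.startswith(codecs.BOM_UTF8):
        # UTF-8 with a BOM: drop the three BOM bytes before decoding.
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM, picks the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # Assumption: no BOM means UTF-8, which includes plain ASCII.
        text = raw.decode("utf-8")
    return text.encode("utf-8")


def main() -> None:
    """Act as a filter: read bytes on stdin, write UTF-8 on stdout."""
    sys.stdout.buffer.write(to_utf8(sys.stdin.buffer.read()))
```

Saved under a hypothetical name such as to_utf8.py, with a call to main() added at the end, it could take the place of the iconv step in the pipeline and should behave the same for Firefox and Chromium.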

Next, I am having trouble integrating it with Zim.

I created a "custom tool" with:
[x] Output should replace current selection

The problem is that it doesn't work if the cursor is:
* on a blank line
* between two white spaces
* more generally, after any character and before a character other than [0-9a-zA-Z]

It only works if the cursor is just before a character matching [0-9a-zA-Z]. In that case it selects the word to the right of the cursor and replaces it.

I would like it to paste the result:
* wherever the cursor is, if nothing is selected
* or in place of the selection, if there is one
Is that possible?


On 18/12/2015 22:08, Rod Morehead wrote:
Looks like the conversion script assumes the input file textual data is in UTF-8 format, and your file has characters that aren't valid UTF-8.

You can probably use a text editor or conversion utility to convert your file(s) into UTF-8 format, if you can figure out which encoding they are currently using.

Another possibility is that the file starts with a BOM (byte order mark - https://en.wikipedia.org/wiki/Byte_order_mark) and perhaps the html2zim script doesn't recognize or skip the BOM.

You might try removing the BOM by removing the first couple of binary characters from your file and see if that helps.
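
Removing the BOM can be sketched in a few lines of Python. The names BOMS and strip_bom are made up for illustration; the BOM constants come from the standard codecs module. Note that stripping the BOM only helps if the remaining bytes are already UTF-8. If the file is really UTF-16, it still needs to be converted.

```python
import codecs

# Known BOMs. The UTF-32 ones come first because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading byte order mark, if it has one."""
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data
```

For example, the contents of a file could be read with open(path, "rb").read(), passed through strip_bom, and written back out in binary mode.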



On 12/18/2015 09:52 AM, Charles Nepote wrote:
Hi all,

Has anyone tried html2zim (https://github.com/MacroBull/html2zim)? clipcli is working fine. When I run the following command (under Ubuntu Linux), I get a good result:

$ ./clipcli.py text/html
��<pre><code>clipcli text/html</code></pre>

But html2zim doesn't work.

$ ./clipcli.py text/html -f ./index.html | ./html2zim.py -w /home/charles ./index.html
Traceback (most recent call last):
  File "./html2zim.py", line 432, in <module>
  File "./html2zim.py", line 424, in main
    buf = open(sys.argv[1]).read().decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

Any ideas? Unfortunately, I don't know any Python.


Mailing list: https://launchpad.net/~zim-wiki
Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~zim-wiki
More help   : https://help.launchpad.net/ListHelp