zim-wiki team mailing list archive

Re: HTML2Zim

To: Rod Morehead <rmore@xxxxxxxxx>, "zim-wiki@xxxxxxxxxxxxxxxxxxx" <zim-wiki@xxxxxxxxxxxxxxxxxxx>
From: Charles Nepote <charles@xxxxxxxxxx>
Date: Mon, 21 Dec 2015 15:41:41 +0100
In-reply-to: <567475E2.9050304@rmore.net>
Reply-to: charles@xxxxxxxxxx
User-agent: Mozilla/5.0 (X11; Linux i686; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

You're right Rod, thanks.

clipcli.py produces UTF-16LE output with a BOM when I copy text *from Firefox*, so I have to convert it before passing it to html2zim.py.

This command now works fine:
$ ~/clipcli.py text/html | iconv -f UTF-16LE -t UTF-8 | ~/html2zim.py -w ~/

On the other hand, clipcli.py produces plain "ASCII text" output when I copy *from Chromium*. I'll try to write a bash script that checks the result with the "file" utility.
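As an alternative to calling "file" from bash, the encoding check could be sketched in Python by looking for a BOM at the start of the clipboard data. This is only a sketch under assumptions: it targets Python 3 (html2zim.py in this thread runs on Python 2.7), the function name to_utf8 is made up for illustration, and data without a BOM is assumed to be UTF-8 already (plain ASCII is a subset of UTF-8).

```python
import codecs
import sys

# BOMs are checked longest first, so a UTF-32LE BOM (FF FE 00 00)
# is not mistaken for a UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_utf8(raw):
    """Return the given bytes re-encoded as UTF-8, without a BOM.

    Hypothetical helper: the name and behaviour are assumptions,
    not part of clipcli.py or html2zim.py.
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # Drop the BOM, decode the rest, and re-encode as UTF-8.
            return raw[len(bom):].decode(encoding).encode("utf-8")
    # No BOM found: assume the data is already UTF-8 (or plain ASCII).
    return raw.decode("utf-8").encode("utf-8")


def main():
    """Filter stdin to stdout, converting to UTF-8 along the way."""
    sys.stdout.buffer.write(to_utf8(sys.stdin.buffer.read()))
```

To use it as a filter, one would call main() at the end of the file and put the script in the pipeline in place of iconv. Since the same code path handles both cases, it should not matter whether the text was copied from Firefox or from Chromium.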



Next, I am having difficulty integrating it with Zim.

I created a "custom tool" with:
[x] Output should replace current selection

The problem is that it doesn't work if the cursor is:
* in a blank line
* between two white spaces

* more generally, after any character and before a character other than [0-9a-zA-Z]

It only works if the cursor is before a character like [0-9a-zA-Z]. Then it selects the word to the right of the cursor and replaces it.


I would like it to paste the result:
* wherever the cursor is, if nothing is selected
* or in place of the selection, if there is one
Is that possible?

Charles.


On 18/12/2015 22:08, Rod Morehead wrote:

Looks like the conversion script assumes the input file's textual data is in UTF-8 format, and your file has characters that aren't valid UTF-8.
You can probably use a text editor or conversion utility to convert your file(s) into UTF-8 format, if you can figure out which encoding they are currently using.
Another possibility is that the file starts with a BOM (byte order mark - https://en.wikipedia.org/wiki/Byte_order_mark) and perhaps the html2zim script doesn't understand or ignore the BOM.
You might try removing the BOM by removing the first couple of binary characters from your file and see if that helps.
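To illustrate the diagnosis, here is a small Python 3 sketch (the thread's traceback comes from Python 2.7, so this is an approximation, not the script's own code). It builds sample bytes that are assumed to resemble what clipcli.py wrote, namely UTF-16LE text preceded by a BOM, and tries to decode them two ways. The helper name try_decode is made up for the example.

```python
import codecs

# Assumed sample: the HTML from the original post, encoded as UTF-16LE
# with a leading BOM (bytes FF FE).
html = "<pre><code>clipcli text/html</code></pre>"
sample = codecs.BOM_UTF16_LE + html.encode("utf-16-le")


def try_decode(raw, encoding):
    """Decode raw bytes, or return a short failure message instead."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc.reason


# Decoding as UTF-8 fails on the very first byte (0xFF),
# which is never valid in UTF-8.
utf8_result = try_decode(sample, "utf-8")

# The "utf-16" codec reads the BOM to pick the byte order and drops it.
utf16_result = try_decode(sample, "utf-16")

print(utf8_result)
print(utf16_result)
```

If this matches the real data, it suggests the BOM alone is not the whole story: the text after it is UTF-16 as well, so stripping the first two bytes would probably not be enough and the data would still need converting to UTF-8.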
Thanks,

--Rod


On 12/18/2015 09:52 AM, Charles Nepote wrote:
Hi all,
I wonder whether anyone has tried html2zim: https://github.com/MacroBull/html2zim
clipcli is working fine. When I run the following command (under Ubuntu Linux), I get a good result:
$ ./clipcli.py text/html
��<pre><code>clipcli text/html</code></pre>


But html2zim doesn't work.
$ ./clipcli.py text/html -f ./index.html | ./html2zim.py -w /home/charles ./index.html
Traceback (most recent call last):
  File "./html2zim.py", line 432, in <module>
    sys.exit(main())
  File "./html2zim.py", line 424, in main
    buf = open(sys.argv[1]).read().decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
Any ideas? Unfortunately, I don't have any knowledge of Python.


Charles.


