
zim-wiki team mailing list archive

Re: Migration from OneNote to Zim


On Fri, Mar 15, 2013 at 8:12 AM, Jaap Karssenberg
<jaap.karssenberg@xxxxxxxxx> wrote:
> On Thu, Mar 14, 2013 at 5:39 PM, Michael Spranger
> <mikeitsecurity@xxxxxxxxx> wrote:
>> How much effort would it take to get that self contained HTML to import into
>> zim?  I am not a scripter so I am of no help there.
> I got some code to unpack the stand alone HTML, that part is easy.
> Next step will be converting the HTML to text while preserving at
> least images and bullet lists. Some other markup can be preserved, but
> most may get lost. Tables will end up as lines of text.
> One limitation I see at the moment for the OneNote importer is that
> when I export a section from OneNote I get multiple pages in a single
> HTML file. Unfortunately the start of a new page is not clearly marked
> in the HTML, so splitting up in multiple pages will not be very
> robust.

OK, I also found some code I hacked some time ago to import fragments
of HTML. Will have to put the two together to get a real solution.
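
For reference, the unpacking step can be sketched with Python's standard library alone. This is not the code mentioned above, only a minimal illustration. It assumes the .mht export is an ordinary MIME multipart/related message (as MHTML files normally are), and the function name unpack_mht is made up for this example.

```python
import email
from email import policy


def unpack_mht(data):
    """Split the bytes of an .mht file into its HTML and its resources.

    Returns (html, resources). resources maps each part's
    Content-Location (or Content-ID) header to the decoded bytes.
    Hypothetical helper for illustration, not part of zim.
    """
    msg = email.message_from_bytes(data, policy=policy.default)
    html = None
    resources = {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        if part.get_content_type() == "text/html" and html is None:
            # get_content() decodes the transfer encoding and the charset
            html = part.get_content()
        else:
            key = part.get("Content-Location") or part.get("Content-ID")
            if key:
                # decode=True undoes base64 / quoted-printable encoding
                resources[str(key)] = part.get_payload(decode=True)
    return html, resources
```

The images come back keyed by the location the HTML refers to, so they can be written to the page's attachment folder and the links rewritten afterwards.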

What I need at this point to proceed is some test data:
* .mht export of a notebook section containing multiple pages
* include some images
* include some bullet lists
* include headings and sub-headings (level 1 / 2 / ...)
* use bold / italic / ...

Please make sure that such test data is not private and is free of
copyright, so I can eventually add it to zim's test suite. Try to make it
look like realistic notes; that also makes it easier to check whether the
result looks good. (So far I have been using an export of OneNote's welcome
pages, which is good example data but all copyrighted by Microsoft.)

Given good test data I can probably have a working import function in
a week or two.
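
To make the conversion step concrete, here is a rough sketch of turning HTML into wiki-style text while keeping headings, bold / italic, bullet lists and images. It is only an illustration under stated assumptions, not the planned importer: the class and function names are hypothetical, the output markup ("====== Heading ======", "**bold**", "//italic//", "{{image}}", tab-indented "* " bullets) is assumed to match zim's page syntax, and table cells simply end up as lines of text.

```python
import re
from html.parser import HTMLParser


class HtmlToWikiText(HTMLParser):
    """Hypothetical converter; the wiki markup below is an assumption."""

    HEADINGS = {"h1": 6, "h2": 5, "h3": 4, "h4": 3, "h5": 2}
    INLINE = {"b": "**", "strong": "**", "i": "//", "em": "//"}
    SKIP = ("script", "style", "title")

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.list_depth = 0
        self.skip = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self.skip += 1
        elif tag in self.HEADINGS:
            self.out.append("\n" + "=" * self.HEADINGS[tag] + " ")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in ("ul", "ol"):
            self.list_depth += 1
        elif tag == "li":
            # nested items get one tab per extra list level
            self.out.append("\n" + "\t" * max(0, self.list_depth - 1) + "* ")
        elif tag == "img" and attrs.get("src"):
            self.out.append("{{" + attrs["src"] + "}}")
        elif tag in ("p", "br", "tr"):
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip = max(0, self.skip - 1)
        elif tag in self.HEADINGS:
            self.out.append(" " + "=" * self.HEADINGS[tag] + "\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in ("ul", "ol"):
            self.list_depth = max(0, self.list_depth - 1)
            if self.list_depth == 0:
                self.out.append("\n")
        elif tag in ("td", "th"):
            self.out.append(" ")  # table cells become words on one line

    def handle_data(self, data):
        if self.skip:
            return
        text = re.sub(r"\s+", " ", data)  # collapse source whitespace
        last = self.out[-1] if self.out else "\n"
        if last.endswith(("\n", " ")):
            text = text.lstrip()
        if text:
            self.out.append(text)

    def text(self):
        lines = [ln.rstrip() for ln in "".join(self.out).split("\n")]
        cleaned = []
        for ln in lines:
            # drop leading blank lines and runs of blank lines
            if ln or (cleaned and cleaned[-1]):
                cleaned.append(ln)
        return "\n".join(cleaned).strip() + "\n"


def html_to_wiki(html):
    parser = HtmlToWikiText()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Splitting a section export into separate pages is deliberately left out, since the start of a new page is not clearly marked in the HTML.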


