zim-wiki team mailing list archive

Re: Questions about Wiki markup and its Parser

 

On Mon, Dec 13, 2010 at 4:52 PM, Michael Nagel <ubuntu@xxxxxxxxxxxxxxxxx> wrote:

> in my opinion moving the markup and its lexer/parser to something generic
> and future-proof is a very good idea. It could (and should) be the first
> step to make zim markup compatible with some widespread markup syntax like
> markdown or restructured text...


Zim's wiki language is already close enough to Creole.

One issue with Zim is that the interactive wiki-to-neat feature requires
either:

   1. Access to specific parts of the parser.
   2. Repeating parsing logic in the UI.

I took a look at the source code of python-creoleparser and found it quite
complicated. Its dependency on Genshi is a plus or a minus depending on how
you look at it.

I will take the time now to post the prototype parser I wrote to the issue
base to get feedback.

I think that the modular structure of the parser will be useful in Zim,
because the parts may be used independently. These are the well-defined
tasks it performs:

   1. Break the text into lines.
   2. Classify each line according to its potential role in the document:
   heading, bullet, indent, paragraph.
   3. Group lines into blocks with a top-down parser: <pre>, <hn>, <ul>,
   <ol>, and <p>.
   4. Resolve inline markup where applicable (not in <pre>), allowing for
   reasonable nesting (italics within bold, for instance).
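
To make the four stages concrete, here is a minimal sketch of how they could
be kept as independent functions. It is not the prototype itself: every name
and regular expression below is an assumption made for illustration, indented
lines are treated as <pre> purely for simplicity, <ol> is left out, and a
simple grouping loop stands in for the top-down parser.

```python
import re


def split_lines(text):
    """Stage 1: break the text into lines."""
    return text.splitlines()


def classify(line):
    """Stage 2: tag a line with its potential role (hypothetical rules)."""
    if not line.strip():
        return ("blank", "")
    m = re.match(r"^(=+)\s*(.*?)\s*=*$", line)
    if m:
        return ("heading", m.group(2))
    m = re.match(r"^\s*\*\s+(.*)$", line)
    if m:
        return ("bullet", m.group(1))
    if line[0] in " \t":
        return ("indent", line.strip())
    return ("paragraph", line.strip())


def group_blocks(classified):
    """Stage 3: merge runs of same-kind lines into blocks.

    Assumption: indented lines become <pre>; a blank line closes a block.
    """
    tags = {"heading": "h", "bullet": "ul", "indent": "pre", "paragraph": "p"}
    blocks = []
    open_tag = None
    for kind, text in classified:
        if kind == "blank":
            open_tag = None
            continue
        tag = tags[kind]
        if tag == open_tag and tag != "h":
            blocks[-1][1].append(text)
        else:
            blocks.append((tag, [text]))
            open_tag = tag
    return blocks


def resolve_inline(text):
    """Stage 4: inline markup; bold first, so italics can nest inside it."""
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)
    text = re.sub(r"//(.+?)//", r"<i>\1</i>", text)
    return text


def parse(text):
    """Run all four stages; <pre> blocks skip inline resolution."""
    classified = [classify(line) for line in split_lines(text)]
    return [
        (tag, lines if tag == "pre" else [resolve_inline(l) for l in lines])
        for tag, lines in group_blocks(classified)
    ]
```

Because each stage is a plain function, the UI could call classify() or
resolve_inline() on a single line while the user types, without running the
whole pipeline.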

Come to think of it, the same parser structure can be applied to any wiki
dialect.

I was thinking about the role of xml.etree in the parsing. My current take
on it is that the parse tree should support the Visitor pattern to make it
easy to render into several formats, including xml.etree. The
*compiler* module and its way of doing things may be exactly what's needed:

http://docs.python.org/library/compiler.html#compiler.ast.Node
http://docs.python.org/library/compiler.html#compiler.visitor.ASTVisitor
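
As a rough illustration of the idea, the sketch below gives a parse tree
node an accept() method and renders the same tree twice, once to xml.etree
and once to plain text. The Node, Visitor, EtreeRenderer and
PlainTextRenderer classes are hypothetical names, not part of Zim or of the
compiler module. The dispatch by method name is only in the spirit of
ASTVisitor, and the compiler package itself is not used because it is no
longer available in Python 3.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical parse tree node: a tag, leading text, and children."""

    def __init__(self, tag, text="", children=()):
        self.tag = tag
        self.text = text
        self.children = list(children)

    def accept(self, visitor):
        # Dispatch to visit_<tag> if the visitor defines it, else to default.
        method = getattr(visitor, "visit_" + self.tag, visitor.default)
        return method(self)


class Visitor:
    def default(self, node):
        raise NotImplementedError(node.tag)


class EtreeRenderer(Visitor):
    """Render every node as an xml.etree element with the same tag."""

    def default(self, node):
        elem = ET.Element(node.tag)
        elem.text = node.text
        for child in node.children:
            elem.append(child.accept(self))
        return elem


class PlainTextRenderer(Visitor):
    """Render the tree as plain text, with upper-cased headings."""

    def default(self, node):
        return node.text + "".join(c.accept(self) for c in node.children)

    def visit_h1(self, node):
        return self.default(node).upper() + "\n"

    def visit_p(self, node):
        return self.default(node) + "\n"


tree = Node("page", children=[
    Node("h1", "Title"),
    Node("p", "Hello ", [Node("b", "world")]),
])

xml_output = ET.tostring(tree.accept(EtreeRenderer()), encoding="unicode")
text_output = tree.accept(PlainTextRenderer())
```

Adding another output format then means writing one more visitor class; the
parser and the tree stay untouched.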

-- 
Juancarlo *Añez*
