
zim-wiki team mailing list archive

Questions about Wiki markup and its Parser

 

I really like Zim. It has let me take care of my day-to-day design notes in
a simple, useful, and safe way.

I was examining the source code for Zim with the aim of adding support for
enumerated lists, but found the code that deals with wiki text too
complicated to be amenable to a small hack.

Questions:

   1. Is there a reason why Zim doesn't use one of the wiki markup libraries
   readily available for Python?
   2. Would it help (would it be accepted) if I wrote a simpler and more
   extensible wiki parser?

I specialized in programming language theory at the university, and later
taught the subject as a professor, and I think I know why parsing wiki tends
to be complicated:

   1. Wiki is not a regular language, so regular expressions alone don't cut
   it.
   2. Wiki has languages within languages, much like Python format strings
   are a language within Python.
   3. Some of the sub-languages in Wiki are regular, and the outer languages
   seem amenable to top-down analysis, but the whole doesn't look LL or LR, so
   it is unlikely that a grammar can be built for it.
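
To illustrate point 1, here is a minimal sketch of why nesting needs more
than a regular expression: recovering the structure of an indented list
requires a stack that tracks the current depth. The "* " bullet marker, the
4-space indent, and the name nest_items are assumptions made for this
example, not Zim's actual syntax or code.

```python
def nest_items(lines, indent_width=4):
    """Turn indented '* item' lines into nested (text, children) tuples.

    Hypothetical syntax: each line is '* text', indented by multiples of
    indent_width spaces to show nesting.
    """
    root = []
    stack = [root]  # stack[d] is the list that collects items at depth d
    for line in lines:
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // indent_width
        text = stripped[2:]  # drop the leading "* "
        # An item can be at most one level deeper than the previous item.
        depth = min(depth, len(stack) - 1)
        del stack[depth + 1:]  # close any deeper lists that just ended
        children = []
        stack[depth].append((text, children))
        stack.append(children)  # the next line may nest under this item
    return root


tree = nest_items(["* a", "    * b", "        * c", "* d"])
```

The regular expression part (recognizing a single item line) is trivial; it
is the unbounded depth bookkeeping that falls outside what a regular language
can express.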

The above means that parsing of Wiki should probably be done by several
parsers set up in layers or pipelines. It also means that the sub-languages
should be more formally defined (with the likely consequence that some odd
stuff that currently "just works" may stop working).
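
As a rough sketch of what I mean by layers: a first pass recognizes
block-level structure (paragraphs and enumerated list items), and a second
pass parses inline markup inside each block. The markup assumed here
("1. " for enumerated items, **bold**, //italic//) and all function names
are illustrative assumptions, not Zim's existing parser or its exact syntax.

```python
import re

# Assumed syntax for this sketch only.
ENUM_ITEM = re.compile(r"^(\d+)\.\s+(.*)$")
INLINE = re.compile(r"\*\*(?P<bold>.+?)\*\*|//(?P<italic>.+?)//")


def parse_blocks(text):
    """Layer 1: split text into ('enum', number, body) and ('para', body)."""
    blocks = []
    para = []

    def flush():
        # Emit the paragraph collected so far, if any.
        if para:
            blocks.append(("para", " ".join(para)))
            para.clear()

    for line in text.splitlines():
        m = ENUM_ITEM.match(line)
        if m:
            flush()
            blocks.append(("enum", int(m.group(1)), m.group(2)))
        elif line.strip():
            para.append(line.strip())
        else:
            flush()  # a blank line ends the current paragraph
    flush()
    return blocks


def parse_inline(body):
    """Layer 2: split a block body into (kind, string) spans."""
    spans = []
    pos = 0
    for m in INLINE.finditer(body):
        if m.start() > pos:
            spans.append(("text", body[pos:m.start()]))
        kind = m.lastgroup  # 'bold' or 'italic'
        spans.append((kind, m.group(kind)))
        pos = m.end()
    if pos < len(body):
        spans.append(("text", body[pos:]))
    return spans


def parse(text):
    """Pipeline: run layer 2 on the body of every block from layer 1."""
    return [b[:-1] + (parse_inline(b[-1]),) for b in parse_blocks(text)]


tree = parse("1. **first** item\n2. plain\n\nA //para//\ngraph")
```

Each layer is small enough to be defined and tested on its own, and adding a
new block type (such as enumerated lists) only touches the first layer.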

Again, if there's interest, I can give it a shot. The purpose would be to
have a parser that's easier to understand and improve (somewhere in the Zim
docs it says that performance is a non-issue in parsing).

-- 
Juancarlo *Añez*
