ubuntu-manual team mailing list archive

Re: Content management changes to simplify collaboration


On Tue, 2010-05-18 at 10:13 -0700, Ilya Haykinson wrote:
>  - structured data storage with a concept of a "unit of content"
>     - this is the smallest "addressable" item
>     - I propose that the units of content be paragraphs of copy

I don't know of any open source help systems that treat a
paragraph as a unit of information. You can build your own
system, of course, using the markup from an open format
like Mallard, DocBook, or DITA.

There are some proprietary systems that work this way. I'm
not proposing you use any of them, of course. But you may
want to look at programs like Author-it for ideas.

>  - hierarchical content organization that organizes the content units
>  - ability to create "works" that combine content units in a different
> organizational structure
>  - full-text search capability
>  - data storage using semantic markup
>  - data input using either plain text, or the semantic markup
>  - simple user account management
>  - ability to either instantly commit (for authenticated users) or
> queue up changes for review (for anonymous users)
>  - ability to process a review queue
>  - audits of all changesets

Some of these points are relevant to producing content,
and some of them are relevant to deploying it. I realize
that we have to worry about both ends in open source, but
I think it helps to separate them when designing systems.
For example, full-text search is a deployment issue, and
probably doesn't affect decisions for writing and storage.

>  - DocBook
>     - Pro: strong in semantics; lots of tools
>     - Cons: complex to write manually
>  - Mallard
>     - Pro: simpler than DocBook
>     - Cons: weaker semantics; fewer tools
> Of these, my preference would be for DocBook -- mainly for its
> comprehensiveness in markup capabilities, and because we could always
> convert to Mallard from it (but not the other way).

I actually think it's easier to go from Mallard to DocBook.
Granted, the information in a Mallard document isn't nearly
as semantically rich as that in DocBook. But excepting the
dynamic link system, there are natural and obvious DocBook
elements to target for most or all Mallard elements.

In some cases, you will lose information if you try to
round-trip. For example, Mallard doesn't have a modeled
classsynopsis element. To convert a DocBook classsynopsis
to Mallard, you would pre-format it and put it inside a
code block in a synopsis. When going back to DocBook,
you're not going to get back to a modeled classsynopsis,
but you can do a trivial conversion to a programlisting
inside a synopsis element.
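
As a rough sketch of the DocBook-to-Mallard half of that
round-trip (not an existing converter), the Python below
flattens a classsynopsis to plain text and wraps it in a
Mallard synopsis/code pair. The function name and the sample
class are made up for illustration, and the "pre-formatting"
is deliberately crude. The input is assumed to be DocBook 4
style, with no namespace.

```python
import xml.etree.ElementTree as ET

MAL = "http://projectmallard.org/1.0/"  # Mallard 1.0 namespace


def classsynopsis_to_mallard(cs):
    """Flatten a DocBook classsynopsis into a Mallard synopsis/code block."""
    # Crude pre-formatting: keep only the text, one space between pieces.
    # A real converter would lay the text out per programming language.
    text = " ".join(t.strip() for t in cs.itertext() if t.strip())
    synopsis = ET.Element(f"{{{MAL}}}synopsis")
    code = ET.SubElement(synopsis, f"{{{MAL}}}code")
    code.text = text
    return synopsis


# Hypothetical sample input.
docbook = ET.fromstring(
    "<classsynopsis>"
    "<ooclass><classname>Stack</classname></ooclass>"
    "<methodsynopsis><type>int</type>"
    "<methodname>pop</methodname><void/></methodsynopsis>"
    "</classsynopsis>"
)
mallard = classsynopsis_to_mallard(docbook)
print(ET.tostring(mallard, encoding="unicode"))
```

The result holds only text. The classname, type, and
methodname distinctions are gone, which is why the trip back
to DocBook cannot rebuild the modeled element.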

Converting from DocBook to Mallard will present difficulties
because DocBook's content model is so loose. For example,
Mallard is very strict about block and inline content models.
You can't mix block elements into inline content. If you have
a DocBook file with an itemizedlist inside a para, you need
to split the content for Mallard.
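
To make the splitting concrete, here is a minimal Python
sketch. It assumes DocBook 4 style input with no namespace,
and the set of block elements is only an illustrative subset.
The function names are my own. It does the split only;
renaming the resulting elements to their Mallard equivalents
would be a separate step.

```python
import copy
import xml.etree.ElementTree as ET

# Block elements that DocBook allows inside para; illustrative subset only.
BLOCKS = {"itemizedlist", "orderedlist", "programlisting"}


def has_content(el):
    """True if the element has non-blank text or any child element."""
    return bool((el.text or "").strip()) or len(el) > 0


def split_para(para):
    """Split a para with nested blocks into a flat list of siblings."""
    out = []
    current = ET.Element("para")
    current.text = para.text
    for child in para:
        if child.tag in BLOCKS:
            # Close the paragraph collected so far, then emit the block.
            if has_content(current):
                out.append(current)
            block = copy.deepcopy(child)
            block.tail = None
            out.append(block)
            # Text after the block starts a new paragraph.
            current = ET.Element("para")
            current.text = child.tail
        else:
            # Inline element: keep it (and its tail text) in the paragraph.
            current.append(copy.deepcopy(child))
    if has_content(current):
        out.append(current)
    return out


src = (
    "<para>Choose one:"
    "<itemizedlist><listitem><para>A</para></listitem></itemizedlist>"
    "Then press <guibutton>OK</guibutton>.</para>"
)
parts = split_para(ET.fromstring(src))
print([p.tag for p in parts])
```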

Furthermore, if the DocBook content is going to be converted
to Mallard automatically so that it can be embedded in
another Mallard document (e.g. upstream help pages embedded
into the Desktop Help), you're going to have to figure out
a way to carry the link information. If you use DocBook 5,
you can embed Mallard-namespaced elements directly into an
info element, which is a nice solution.
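
A small Python sketch of what that could look like, under the
assumption that the document already has an info element. The
helper name, the sample article, and the "guide"/"index" link
values are illustrative only.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"    # DocBook 5 namespace
MAL = "http://projectmallard.org/1.0/"  # Mallard 1.0 namespace
ET.register_namespace("db", DB)
ET.register_namespace("mal", MAL)


def add_mallard_link(root, link_type, xref):
    """Append a Mallard link element to the DocBook 5 info element."""
    info = root.find(f"{{{DB}}}info")
    if info is None:
        raise ValueError("document has no info element")
    return ET.SubElement(
        info, f"{{{MAL}}}link", {"type": link_type, "xref": xref}
    )


# Hypothetical DocBook 5 input.
article = ET.fromstring(
    f'<article xmlns="{DB}" version="5.0">'
    "<info><title>Printing</title></info>"
    "<para>How to print.</para>"
    "</article>"
)
add_mallard_link(article, "guide", "index")
xml_text = ET.tostring(article, encoding="unicode")
print(xml_text)
```

A DocBook-to-Mallard converter could then copy the
mal:link elements straight into the info element of the
generated Mallard page.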

I think you need to decide exactly what type of semantic
markup you actually need. Do you need to distinguish between
a menu label and a submenu label, or is one markup element
for GUI labels enough? (And, by the way, you can distinguish
these in Mallard with a style hint, if you need to.) Do you
need modeled class synopses and EBNF productions, or is it
sufficient to pre-format them? (GTK+'s documentation system
pre-formats anyway, even though it uses DocBook.) Do you
need a special element for Q-and-A lists, or is a regular
terms list enough? (You can still use a style hint to give
it special formatting.)
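
To illustrate the first question, here is a hedged Python
sketch that maps several DocBook GUI elements onto Mallard's
single gui element and keeps the distinction in a style hint.
The hint names in the table are my own placeholders, not
values taken from the Mallard specification, and the sketch
assumes each element contains text only.

```python
import xml.etree.ElementTree as ET

MAL = "http://projectmallard.org/1.0/"  # Mallard 1.0 namespace

# DocBook GUI elements mapped to a style hint on Mallard's gui element.
# The hint names are illustrative assumptions, not from the Mallard spec.
GUI_STYLE_HINTS = {
    "guimenu": "menu",
    "guisubmenu": "submenu",
    "guimenuitem": "menuitem",
    "guilabel": "label",
    "guibutton": "button",
}


def docbook_gui_to_mallard(el):
    """Convert one DocBook GUI element to a Mallard gui with a style hint."""
    gui = ET.Element(f"{{{MAL}}}gui", {"style": GUI_STYLE_HINTS[el.tag]})
    gui.text = el.text
    gui.tail = el.tail
    return gui


menu = docbook_gui_to_mallard(ET.fromstring("<guimenu>File</guimenu>"))
submenu = docbook_gui_to_mallard(
    ET.fromstring("<guisubmenu>Export</guisubmenu>")
)
print(menu.get("style"), submenu.get("style"))
```

Both results are the same element, so tools that ignore the
hint still treat them as GUI labels.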

There are a lot more formats than DocBook and Mallard. They
all have their strengths and weaknesses. The only way to
make a good decision is by looking at your requirements and
how those formats stack up against them.