
ubuntu-manual team mailing list archive

Content management changes to simplify collaboration

 

All,

I would like to jump-start a discussion of the content management system
(i.e., the content pool), which I outline below.

Here is what I would want from this type of system:

 - structured data storage with a concept of a "unit of content"
    - this is the smallest "addressable" item
    - I propose that the units of content be paragraphs of copy
 - hierarchical organization of the content units
 - ability to create "works" that combine content units into a different
organizational structure
 - full-text search capability
 - data storage using semantic markup
 - data input using either plain text or the semantic markup
 - simple user account management
 - ability to either commit changes instantly (for authenticated users) or
queue them for review (for anonymous users)
 - ability to process a review queue
 - an audit trail of all changesets
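
To make the list above a bit more concrete, here is a rough sketch (Python, purely for illustration) of how paragraph-sized units, a hierarchy, "works", and the commit-or-queue rule might fit together. All names (ContentUnit, Node, Change, ContentPool) are hypothetical, and treating a missing author as "anonymous" is an assumption standing in for real account management.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ContentUnit:
    """Smallest addressable item: one paragraph of copy."""
    unit_id: str
    body: str  # would hold semantic markup (e.g. a DocBook <para>) in practice


@dataclass
class Node:
    """A node in a content hierarchy (or in a "work"); refers to units by id."""
    title: str
    unit_ids: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def walk_units(self) -> List[str]:
        """Unit ids in reading order: this node's units, then each child's, depth-first."""
        ids = list(self.unit_ids)
        for child in self.children:
            ids.extend(child.walk_units())
        return ids


@dataclass
class Change:
    unit_id: str
    new_body: str
    author: Optional[str]  # assumption: None stands in for an anonymous contributor


class ContentPool:
    def __init__(self) -> None:
        self.units: Dict[str, ContentUnit] = {}
        self.review_queue: List[Change] = []
        self.audit_log: List[Change] = []  # every change that was actually applied

    def submit(self, change: Change) -> bool:
        """Commit at once for known users; queue anonymous edits. True if committed."""
        if change.author is None:
            self.review_queue.append(change)
            return False
        self._apply(change)
        return True

    def process_queue(self, decide: Callable[[Change], bool]) -> int:
        """Drain the queue, applying the changes a reviewer accepts; return how many."""
        applied = 0
        while self.review_queue:
            change = self.review_queue.pop(0)
            if decide(change):
                self._apply(change)
                applied += 1
        return applied

    def render(self, structure: Node) -> List[str]:
        """Resolve a hierarchy or a work into the paragraph bodies it references."""
        return [self.units[uid].body for uid in structure.walk_units()]

    def _apply(self, change: Change) -> None:
        self.units[change.unit_id] = ContentUnit(change.unit_id, change.new_body)
        self.audit_log.append(change)
```

The point of the design is that a work is just another tree of references: the manual's hierarchy and, say, a quick-start guide can render the same units in a different order without copying any text.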

Assuming agreement on the above (not a given, but I wanted to explore
further), this approach raises some open questions:

 - What data format should we use for storage?
 - What should be the backing store for this data?
 - What technology should be used to implement this?

Candidates for the storage format:

 - DocBook
    - Pros: strong semantics; lots of tools
    - Cons: complex to write by hand
 - Mallard
    - Pros: simpler than DocBook
    - Cons: weaker semantics; fewer tools

Of these, I would prefer DocBook, mainly for its comprehensive markup
capabilities, and because we could always convert from it to Mallard (but
not the other way around).
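
On the "plain text or semantic markup" input point: if we did go with DocBook, each unit could be stored as a single <para> element. The sketch below, using Python's standard xml.etree.ElementTree, accepts either form: input starting with "<" is parsed as markup and must be a <para>, anything else is treated as plain text, with line wrapping collapsed and special characters escaped. The "starts with <" rule, the function name to_para, and the use of un-namespaced (DocBook 4-style) element names are my assumptions; it does not validate against the DocBook schema.

```python
import xml.etree.ElementTree as ET


def to_para(text: str) -> str:
    """Normalize one unit of input into a serialized DocBook <para> element."""
    stripped = text.strip()
    if stripped.startswith("<"):
        # Heuristic: treat as markup. Raises ET.ParseError if not well-formed XML.
        elem = ET.fromstring(stripped)
        if elem.tag != "para":
            raise ValueError(f"expected one <para> per unit, got <{elem.tag}>")
    else:
        # Plain text: collapse line wrapping; ElementTree escapes &, < and > on output.
        elem = ET.Element("para")
        elem.text = " ".join(stripped.split())
    return ET.tostring(elem, encoding="unicode")


print(to_para("Use the <Tab> key\n  & the arrows"))
# <para>Use the &lt;Tab&gt; key &amp; the arrows</para>
print(to_para("<para>Press <keycap>Enter</keycap>.</para>"))
# <para>Press <keycap>Enter</keycap>.</para>
```

Converting to Mallard later would then be a transform over these stored elements, which is the one-way direction described above.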

Candidates for backing store:

 - Database
   - Pros: we can design whatever storage structure we want; easier to
search or query; can store metadata alongside data; could work well with
review queues
   - Cons: we'd have to build an RCS-style system within a DB; harder to
contribute offline (can't check out the whole corpus)
 - An RCS (revision control system) such as Bazaar or Git
   - Pros: already-built RCS; strong support for changesets; able to check
out the whole corpus and work offline; easier to write file-based tools
   - Cons: poor support for storing metadata; limited querying

I don't have a strong recommendation here. Because metadata is hard to store
in an RCS, I could see either going with a database-only system (and taking
the hit of implementing an RCS-lite), or going with a hybrid where all the
metadata and the latest version live in a database while the revisions,
etc., are stored in an RCS.
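
For the database-only option, the "RCS-lite" need not be elaborate: if revisions are append-only and grouped into changesets, then the latest version, per-unit history, the changeset audit trail, and search all become ordinary queries. Below is a sketch using SQLite through Python's sqlite3 module. The schema and function names are hypothetical, and the naive LIKE search is only a stand-in for a real full-text index.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS unit (
    unit_id   TEXT PRIMARY KEY,
    parent_id TEXT            -- hierarchy metadata stored alongside the content
);
CREATE TABLE IF NOT EXISTS changeset (
    changeset_id INTEGER PRIMARY KEY AUTOINCREMENT,
    author       TEXT NOT NULL,
    message      TEXT NOT NULL,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS revision (
    unit_id      TEXT NOT NULL REFERENCES unit(unit_id),
    changeset_id INTEGER NOT NULL REFERENCES changeset(changeset_id),
    body         TEXT NOT NULL,
    PRIMARY KEY (unit_id, changeset_id)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def commit(conn: sqlite3.Connection, author: str, message: str,
           edits: Dict[str, str]) -> int:
    """Record one changeset touching one or more units atomically; return its id."""
    with conn:  # commits on success, rolls back if any statement fails
        cur = conn.execute(
            "INSERT INTO changeset (author, message) VALUES (?, ?)", (author, message))
        changeset_id = cur.lastrowid
        for unit_id, body in edits.items():
            conn.execute("INSERT OR IGNORE INTO unit (unit_id) VALUES (?)", (unit_id,))
            conn.execute(
                "INSERT INTO revision (unit_id, changeset_id, body) VALUES (?, ?, ?)",
                (unit_id, changeset_id, body))
    return changeset_id


def latest(conn: sqlite3.Connection, unit_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT body FROM revision WHERE unit_id = ? ORDER BY changeset_id DESC LIMIT 1",
        (unit_id,)).fetchone()
    return row[0] if row else None


def history(conn: sqlite3.Connection, unit_id: str) -> List[Tuple[int, str, str]]:
    """(changeset_id, author, body) for every revision of a unit, oldest first."""
    return conn.execute(
        "SELECT r.changeset_id, c.author, r.body FROM revision r "
        "JOIN changeset c ON c.changeset_id = r.changeset_id "
        "WHERE r.unit_id = ? ORDER BY r.changeset_id", (unit_id,)).fetchall()


def search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Ids of units whose latest revision contains term (LIKE wildcards not escaped)."""
    rows = conn.execute(
        "SELECT r.unit_id FROM revision r "
        "WHERE r.changeset_id = "
        "(SELECT MAX(changeset_id) FROM revision WHERE unit_id = r.unit_id) "
        "AND r.body LIKE ? ORDER BY r.unit_id", (f"%{term}%",)).fetchall()
    return [row[0] for row in rows]
```

Because nothing is updated in place, the changeset table doubles as the audit trail. The hybrid variant would keep the unit and changeset metadata plus the latest bodies in tables like these, and push the revision history into Bazaar or Git instead.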

I would leave the technology question for a later date.

Thoughts? Feedback?

-ilya haykinson
