Text Handling

From Total Knowledge
Current revision (19:55, 19 May 2011)

Text in UU will, for the most part, be represented with a safe subset of XHTML. We might add a few extensions for inserting media and math formulae (MathML? LaTeX?).

Text Editor

Editing will be done with TinyMCE. We'll have to write a TinyMCE plugin to provide simple UI-based access to multimedia objects.

We shall also provide a simple plain-text editing mode for browsers without JavaScript and for text-only browsers.

On the back end, we will need a filter system that verifies the validity of the text against the following criteria:

  • Only allowed XHTML tags are used
  • All media references are valid
  • All formulae are compileable

The sets of allowed tags and multimedia (MM) objects will depend on the kind of text.
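The tag check could be sketched roughly as follows. This is only an illustration: the `TextKind` values, the tag lists, and the function names are all hypothetical, since this page does not say which kinds of text exist or which tags each one allows.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical kinds of text; the real set is not specified on this page.
enum class TextKind { Comment, Article };

// Hypothetical per-kind whitelist of XHTML tags.
const std::set<std::string>& allowedTags(TextKind kind)
{
    static const std::map<TextKind, std::set<std::string>> table = {
        {TextKind::Comment, {"p", "em", "strong", "a"}},
        {TextKind::Article, {"p", "em", "strong", "a", "h2", "ul", "li", "img"}},
    };
    return table.at(kind);
}

// Returns the tags that are not permitted for the given kind of text,
// in the order in which they were encountered.
std::vector<std::string> disallowedTags(const std::vector<std::string>& tagsInText,
                                        TextKind kind)
{
    const std::set<std::string>& allowed = allowedTags(kind);
    std::vector<std::string> rejected;
    for (const std::string& tag : tagsInText) {
        if (allowed.count(tag) == 0) {
            rejected.push_back(tag);
        }
    }
    return rejected;
}
```

An empty result would mean the text passes the first criterion; media references and formulae would need their own checks.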

Multimedia Objects

Once validation passes, the list of media references will be extracted, and UMO parent-child relationships with the corresponding media objects will be established.
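Extracting the media references amounts to a walk over the validated tree. The sketch below assumes a hypothetical `Element` type and a hypothetical `media` tag carrying the ID of the referenced object; the real tag name and XML classes are not given here.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed element of the validated text.
struct Element {
    std::string tag;
    std::string mediaId;            // meaningful only when tag == "media" (assumed tag name)
    std::vector<Element> children;
};

// Collect the IDs of all embedded media objects under root.
// A set is used so that an object referenced twice is linked only once.
void collectMediaIds(const Element& root, std::set<std::string>& ids)
{
    if (root.tag == "media") {
        ids.insert(root.mediaId);
    }
    for (const Element& child : root.children) {
        collectMediaIds(child, ids);
    }
}
```

Each collected ID would then get one parent-child relationship with the UMO that owns the text.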

Links to multimedia objects work just like any other UMO links (new versions will either notify the owner or auto-update). The only open question is whether we want to auto-rewrite the original object with new IDs when switching to a new media object version, or to make separate IDs for such links, which would be permanently preserved.

Cache

The final HTML should be cached. It would make sense to cache it at the time the object is published, since a published version is immutable.
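Because a published version never changes, such a cache needs no invalidation logic. A minimal sketch, assuming a hypothetical `RenderedHtmlCache` class keyed by a numeric version ID (neither the class nor the ID type comes from the actual code base):

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical cache of rendered HTML, keyed by published version ID.
class RenderedHtmlCache {
public:
    // Render once, at publish time, and store the result.
    void onPublish(long versionId, const std::function<std::string()>& render)
    {
        m_html[versionId] = render();
    }

    // Return the cached HTML, or nullptr if this version was never published.
    const std::string* find(long versionId) const
    {
        auto it = m_html.find(versionId);
        return it == m_html.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<long, std::string> m_html;
};
```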

Implementation

On the back end, any rich text will be represented as XML, which will include a subset of allowed HTML tags (perhaps configurable by the administrator) and our special tags for embedding media. Whenever such text is needed for display, the theme will request it, passing in an object that implements the ITextRenderer interface. That interface provides functions for turning embedded objects into HTML tags (in fact, they will have to return the corresponding sptk::CXmlElements).

Overall procedure will be as follows:

  • Get the root element of the text to render (it does not have to be a complete document).
  • Call uu::util::parseText(), passing in the root and the renderer.
  • parseText will go through all tags, applying the renderer as needed, and will throw an exception if an invalid tag is encountered.
  • parseText will serialize the resulting XML tree into a string and return it.
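The procedure above can be illustrated with a self-contained sketch. Everything in it is an assumption made for illustration: `Node` stands in for the sptk XML element classes, the `renderMedia` method, the `media` tag, the `/media/<id>` URL scheme, and the extra `allowed` parameter are hypothetical, and the function is named `parseTextSketch` to make clear that it is not the real `uu::util::parseText()`.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an XML element.
struct Node {
    std::string tag;                           // empty for a text node
    std::string text;                          // used only by text nodes
    std::map<std::string, std::string> attrs;
    std::vector<Node> children;
};

// The interface named in the text; the method name and signature are assumptions.
class ITextRenderer {
public:
    virtual ~ITextRenderer() = default;
    // Turn an embedded media element into a plain HTML element.
    virtual Node renderMedia(const Node& media) const = 0;
};

// Example renderer: maps <media id="N"> to <img src="/media/N"> (made-up URL scheme).
class ImgRenderer : public ITextRenderer {
public:
    Node renderMedia(const Node& media) const override
    {
        return Node{"img", "", {{"src", "/media/" + media.attrs.at("id")}}, {}};
    }
};

// Copy the tree, replacing media tags via the renderer and rejecting unknown tags.
Node transform(const Node& node, const ITextRenderer& renderer,
               const std::set<std::string>& allowed)
{
    if (node.tag.empty()) return node;                        // text node: keep as-is
    if (node.tag == "media") return renderer.renderMedia(node);
    if (allowed.count(node.tag) == 0)
        throw std::invalid_argument("invalid tag: " + node.tag);
    Node result{node.tag, "", node.attrs, {}};
    for (const Node& child : node.children)
        result.children.push_back(transform(child, renderer, allowed));
    return result;
}

// Serialize a tree to a string (no escaping in this sketch).
void serialize(const Node& node, std::string& out)
{
    if (node.tag.empty()) { out += node.text; return; }
    out += "<" + node.tag;
    for (const auto& attr : node.attrs)
        out += " " + attr.first + "=\"" + attr.second + "\"";
    out += ">";
    for (const Node& child : node.children) serialize(child, out);
    out += "</" + node.tag + ">";
}

// Walk the tree, apply the renderer, and return the serialized result.
std::string parseTextSketch(const Node& root, const ITextRenderer& renderer,
                            const std::set<std::string>& allowed)
{
    std::string html;
    serialize(transform(root, renderer, allowed), html);
    return html;
}
```

Passing the renderer as an interface keeps the decision of how media is displayed with the theme, while the tag validation stays in one place.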

Embedded media objects (images, Flash, etc.) are stored as separate UMOs, and thus must be linked to the UMO being rendered in order to be used. These objects are referenced in the text through the base ID of the embedded UMO. When the UMO is published, the umo_to_parent_umo ID is supplemented in the text. This way all the normal rules for versioning and relations between UMO versions apply, while the need to keep track of media UMOs is reduced and the procedure is simplified.