CT is a conversion tool that extracts and cleans content from MS Word documents into a variety of tagged data formats, and can convert these formats back to Word.
Using the GUI or XML config files, CT can be easily configured to convert MS Word files in any condition and produce clean, consistent output for use in publishing, database propagation and archiving, or simply to standardise and clean the Word files.
Converting back to Word is just as easy, with full control over the resulting Word document's styling via the default styles GUI and/or XML config.
CT currently provides 6 output formats:
- Interchange DOM – a property-rich, legible translation of the entire document content and styling properties, used by CT internally to produce the other output formats
- SML – our own typesetting-focused tagged data format, used exclusively within our Financial Typesetting Solution
- SMX – a well-formed XML version of SML
- XHTML
- JSON
- Text only (content only, no tags)
- DITA (coming soon!)
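To give a feel for what the text-only output involves, here is a minimal Python sketch of pulling untagged paragraph text out of a docx file. It is not CT's implementation and does not use any CT API. It relies only on the fact that a docx file is a ZIP archive whose main body is stored in word/document.xml. The function names are hypothetical and chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_from_document_xml(xml_data):
    """Return the plain text of each non-empty paragraph in a document.xml payload."""
    root = ET.fromstring(xml_data)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join every text run (w:t) in the paragraph and discard the tags
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def docx_to_text(path_or_file):
    """Read word/document.xml from a .docx (a ZIP archive) and return plain text."""
    with zipfile.ZipFile(path_or_file) as archive:
        xml_data = archive.read("word/document.xml")
    return "\n".join(paragraphs_from_document_xml(xml_data))
```

This sketch ignores styling, headers, footers, footnotes and embedded objects, which a full converter would also need to handle.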
Key Features
- Simple drag and drop GUI or command line operation
- Configurable via the GUI and XML config files
- Fast Word document (docx) conversion
- Output formats: simplified, typesetting-focused SMX/SML, property-rich DOM, XHTML, JSON and text
- Extensive options to control output
- Extract MS Office objects
- Extract images (high res)
- Correct common table alignment errors
- Purge unnecessary internal properties
- Heading style detection based on content
- Clean up empty table columns/rows
- Intelligent financial table content merging
- Table header row auto-detection
- Create Word documents directly from XML
- Inline regex operations
- Call external programs for further processing, e.g. XSLT, Perl
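As an illustration of what cleaning up empty table columns and rows can mean in practice, the following Python sketch removes fully blank rows and columns from a table held as a list of rows. It is an assumption-laden example and not how CT itself works. The table representation and the function names are hypothetical.

```python
def is_blank(cell):
    """Treat None and whitespace-only values as empty cells."""
    return cell is None or str(cell).strip() == ""


def drop_empty_rows_and_columns(table):
    """Return a copy of the table without fully blank rows or columns."""
    # Keep only rows that contain at least one non-blank cell
    rows = [row for row in table if not all(is_blank(c) for c in row)]
    if not rows:
        return []
    # Pad ragged rows so every row has the same number of columns
    width = max(len(row) for row in rows)
    padded = [list(row) + [""] * (width - len(row)) for row in rows]
    # Keep only columns that contain at least one non-blank cell
    keep = [i for i in range(width) if not all(is_blank(row[i]) for row in padded)]
    return [[row[i] for i in keep] for row in padded]


table = [
    ["Item", "", "2023"],
    ["", "", ""],
    ["Revenue", " ", "100"],
]
cleaned = drop_empty_rows_and_columns(table)
```

Here `cleaned` contains two rows and two columns: the blank middle row and the blank middle column are removed.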
For more information or a free 14-day trial, please contact us.