Examples

Here are two large files that show what HTML Cleaner for Word can do.

The first is a Microsoft technical white paper, "Microsoft Office 2000 to Microsoft Office 2003 Migration Issues".

The second is another Microsoft white paper, "Microsoft Office Word 2003 Rich Text Format (RTF) Specification". Both are large files full of complex tables, so preserving their exact appearance is essential.

Let's look at the results of cleaning them with HTML Cleaner for Word.
(You can download the example files to compare the results in your browser and the code in your editor.)

Example File - MS Office Migration Issues
File Name                  Status                        Size
office00-03Delta03_o.htm   Original, saved by MS Word    1,416 KB
office00-03Delta03_m.htm   Cleaned in Medium Scheme        715 KB
office00-03Delta03_x.htm   Cleaned in Extreme Scheme       475 KB

Download the package of these example files

Example File - Rich Text Format Specification
File Name                  Status                        Size
RTF18-03_o.htm             Original, saved by MS Word 2003   2,776 KB
RTF18-03_m.htm             Cleaned in Medium Scheme      1,235 KB
RTF18-03_x.htm             Cleaned in Extreme Scheme     1,002 KB

Download the package of these example files

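With the Medium scheme, both files shrank to about half their original size or less (715 KB of 1,416 KB, and 1,235 KB of 2,776 KB) while looking exactly the same as the originals, so the cleaning is lossless. The Extreme scheme saves more bytes, but the result no longer looks identical to the original. The best-quality (Medium) output is only a bit larger than the equivalent pure HTML.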
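The percentages follow directly from the two tables. The short Python sketch below recomputes them. The sizes are copied from the tables (in KB), and the helper names `reduction_percent` and `report` are hypothetical, used only for this illustration.

```python
# File sizes in KB, copied from the two example tables.
SIZES = {
    "office00-03Delta03": {"original": 1416, "medium": 715, "extreme": 475},
    "RTF18-03": {"original": 2776, "medium": 1235, "extreme": 1002},
}


def reduction_percent(original_kb: int, cleaned_kb: int) -> float:
    """Return how much smaller the cleaned file is, as a percentage of the original."""
    return (original_kb - cleaned_kb) / original_kb * 100


def report(sizes: dict) -> dict:
    """Map each file to its Medium and Extreme reductions, rounded to one decimal."""
    result = {}
    for name, s in sizes.items():
        result[name] = {
            scheme: round(reduction_percent(s["original"], s[scheme]), 1)
            for scheme in ("medium", "extreme")
        }
    return result


if __name__ == "__main__":
    for name, pct in report(SIZES).items():
        print(f"{name}: medium -{pct['medium']}%, extreme -{pct['extreme']}%")
```

Run as a script, it prints `office00-03Delta03: medium -49.5%, extreme -66.5%` and `RTF18-03: medium -55.5%, extreme -63.9%`.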

Now let's look at some sample code.

The original HTML code generated by MS Word 2003

Cleaned HTML produced by HTML Cleaner for Word 1.6

Isn't that much easier to read and edit? If you like, you can go further and remove the paragraph tags (<p class=MainBodyNoPad>) from table cells (<td>) by checking "Optimize tables" in the options. (This may change the font or margins of the text in the cells.)
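The idea behind removing paragraphs from cells can be illustrated with a small Python sketch. This is not how HTML Cleaner for Word implements "Optimize tables". It is a simplified, regex-based assumption that only handles a cell holding a single paragraph with the given class, and the function name `unwrap_cell_paragraphs` is hypothetical. A real tool would more likely work on a parsed document tree.

```python
import re


def unwrap_cell_paragraphs(html: str, css_class: str = "MainBodyNoPad") -> str:
    """Drop a <p class=...> wrapper when it is the only paragraph inside a <td>.

    Simplified illustration: assumes Word-style markup where the paragraph
    carries only a class attribute (quoted or unquoted). Cells containing
    more than one paragraph, or a paragraph of another class, are unchanged.
    """
    cls = re.escape(css_class)
    pattern = re.compile(
        r"(<td\b[^>]*>)\s*"                   # group 1: opening cell tag, kept
        rf"<p\s+class=[\"']?{cls}[\"']?\s*>"  # paragraph opening tag, dropped
        r"((?:(?!</?p\b).)*?)"                # group 2: cell content with no other <p> tags
        r"</p>\s*(</td>)",                    # </p> dropped; group 3: </td> kept
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub(r"\1\2\3", html)


if __name__ == "__main__":
    cell = "<td width=200 valign=top>\n<p class=MainBodyNoPad>Hello</p>\n</td>"
    print(unwrap_cell_paragraphs(cell))
```

Cells with more than one paragraph are left alone, because merging them would change the layout. Once the wrapper is gone, the text no longer picks up the style rules defined for the MainBodyNoPad class, which is presumably why the option can change the font or margins in the cells.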