Parsing Common Document Types


Rich-text file formats are a mixed blessing for Web 3.0 applications that require general processing of text and at least some degree of semantic understanding. On the positive side, rich text lets you use styling information such as headings, tables, and metadata to identify important or specific parts of documents. On the negative side, dealing with rich text is more complex than working with plain text. You’ll get more in-depth coverage of style markup in Chapter 10, but I’ll cover some basics here.


Base Class, Document Type, Plain Text, Code Snippet, Cascade Style Sheets
These keywords were added by machine and not by the authors.



Copyright information

© Mark Watson 2009
