Abstract
HTML is a standard file format for pages that can be viewed in a Web browser. There are many ways to create HTML pages, ranging from graphical editors such as Dreamweaver to text editors such as Notepad, Emacs, or Vim. Unfortunately, graphical editors generally do not work well with most Web frameworks, including Rails, and HTML written in a text editor may not be standards-compliant: it may lack closing tags or use invalid tag combinations or attributes. Some graphical editors also produce bad HTML. The problem is exacerbated by the fact that modern browsers are very tolerant of HTML that is not standards-compliant, so badly malformed HTML will often still display properly.
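To make the "lacks closing tags" problem concrete, here is a minimal sketch of a tag-balance check. It is not the tidy gem the chapter covers and not the author's method: it uses Python's standard-library `html.parser` only so that the example runs without extra dependencies. The names `TagBalanceChecker` and `find_tag_problems` are hypothetical, invented for this illustration. The check is deliberately stricter than the HTML standard, which allows some end tags (such as `</p>` or `</li>`) to be omitted, and it does not look at attributes or nesting rules at all.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: tracks open tags and records imbalances."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open element names
        self.problems = []    # human-readable descriptions of issues

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"stray closing tag </{tag}>")
            return
        # Pop until the matching open tag; anything skipped was left unclosed.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"<{top}> was never closed")


def find_tag_problems(html):
    """Return a list of tag-balance problems found in the given HTML string."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    problems = list(checker.problems)
    # Whatever is still on the stack at end of input was never closed.
    problems.extend(f"<{t}> was never closed" for t in reversed(checker.open_tags))
    return problems


if __name__ == "__main__":
    for issue in find_tag_problems("<div><p>Hello <b>world</p>"):
        print(issue)
```

A browser would render the sample input without complaint, which is exactly why such errors go unnoticed; a checker like this only flags them, whereas a cleaning tool also rewrites the markup into a valid form.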
Copyright information
© 2007 David Berube
About this chapter
Cite this chapter
Berube, D. (2007). Cleaning Dirty HTML with tidy. In: Practical Ruby Gems. Apress. https://doi.org/10.1007/978-1-4302-0193-9_32
DOI: https://doi.org/10.1007/978-1-4302-0193-9_32
Publisher Name: Apress
Print ISBN: 978-1-59059-811-5
Online ISBN: 978-1-4302-0193-9
eBook Packages: Professional and Applied Computing, Professional and Applied Computing (R0), Apress Access Books