Case Study: Porting chardet to Python 3

  • Mark Pilgrim

Abstract

Unknown or incorrect character encoding is the number one cause of gibberish text on the web, in your inbox, and indeed across every computer system ever written. In Chapter 4, I talked about the history of character encoding and the creation of Unicode, the “one encoding to rule them all.” I’d love it if I never had to see a gibberish character on a web page again, but that would require that all authoring systems store accurate encoding information, all transfer protocols be Unicode-aware, and every system that handles text maintain perfect fidelity when converting between encodings.

Keywords

State Machine · Regular Expression · Relative Import · Recent Call · Character Encode
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Mark Pilgrim 2009
