Case Study: Porting chardet to Python 3

  • Mark Pilgrim


Unknown or incorrect character encoding is the number one cause of gibberish text on the web, in your inbox, and indeed across every computer system ever written. In Chapter 4, I talked about the history of character encoding and the creation of Unicode, the “one encoding to rule them all.” I’d love never to see a gibberish character on a web page again, but that would require that all authoring systems store accurate encoding information, that all transfer protocols be Unicode-aware, and that every system handling text maintain perfect fidelity when converting between encodings.


Keywords: State Machine, Regular Expression, Relative Import, Recent Call, Character Encode



Copyright information

© Mark Pilgrim 2009

