Internationalization


Abstract

Web authors publish in all languages of the world, and several technologies support this multilingual Web. A key factor in correct character representation on the Web is applying the appropriate character encoding. Although this also depends on server settings, web developers can contribute effectively to proper internationalization of the physical and syntactic structures of web documents. One of the very first steps in standard web site development is to apply national settings at both the file level and the document content level. Unicode can be considered the ultimate encoding and is described from the standardistas’ point of view. The use of Unicode byte-order marks, which indicate the order of the bytes within each code unit of this multibyte character encoding, can be confusing. Special characters and symbols can often be written in several ways, including entity sets, escape codes, and hexadecimal notation.
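As an illustration of the last two points, the following is a minimal Python sketch, not taken from the chapter, that checks a byte string for a Unicode byte-order mark and shows the decimal and hexadecimal character references for a single character. The function names `detect_bom` and `escape_forms` are hypothetical; the BOM constants and `html.unescape` come from the Python standard library.

```python
import codecs
import html

# Known BOM byte sequences. UTF-32 comes first because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def escape_forms(ch: str) -> dict:
    """Return the decimal and hexadecimal numeric character references for one character."""
    code_point = ord(ch)
    return {"decimal": f"&#{code_point};", "hex": f"&#x{code_point:X};"}


if __name__ == "__main__":
    sample = "é".encode("utf-8-sig")  # the utf-8-sig codec prepends the UTF-8 BOM
    print(detect_bom(sample))
    print(escape_forms("é"))
    # The named entity and both numeric references decode to the same character.
    print(html.unescape("&eacute;") == html.unescape("&#233;") == html.unescape("&#xE9;"))
```

A file saved without a BOM yields `(None, 0)` here, in which case the encoding has to come from elsewhere, such as the server's HTTP header or an in-document declaration.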


Copyright information

© 2011 Leslie F. Sikos, Ph.D.

About this chapter

Cite this chapter

Sikos, L.F. (2011). Internationalization. In: Web Standards. Apress. https://doi.org/10.1007/978-1-4302-4042-6_2
