
Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard

by: Richard Gillam


On-line Price: $84.95 (includes GST)

Paperback, 896 pages

20% Off Retail Price

You save: $21.00

This item is available to backorder. Usually ships within 3 - 4 weeks.

Retail Price: $105.95

Publisher: ,Sep-2002

Category: SOFTWARE ENGINEERING Level:

ISBN: 0201700522
ISBN13: 9780201700527


Summary


      'Rich has a clear, colloquial style that allows him to make even complex Unicode matters understandable. People dealing with Unicode will find this book a valuable resource.'


  --Dr. Mark Davis, President, The Unicode Consortium


  As the software marketplace becomes more global in scope, programmers are recognizing the importance of the Unicode standard for engineering robust software that works across multiple regions, countries, languages, alphabets, and scripts. Unicode Demystified offers an in-depth introduction to the encoding standard and provides the tools and techniques necessary to create today's globally interoperable software systems.


  An ideal complement to specifics found in The Unicode Standard, Version 3.0 (Addison-Wesley, 2000), this practical guidebook brings the 'big picture' of Unicode into practical focus for the day-to-day programmer and the internationalization specialist alike. Beginning with a structural overview of the standard and a discussion of its heritage and motivations, the book then shifts focus to the various writing systems represented by Unicode--along with the challenges associated with each. From there, the book looks at Unicode in action and presents strategies for implementing various aspects of the standard.


  Topics covered include:


  The basics of Unicode--what it is and what it isn't

The history and development of character encoding

The architecture and salient features of Unicode, including character properties, normalization forms, and storage and serialization formats

The character repertoire: scripts of Europe, the Middle East, Africa, Asia, and more, plus numbers, punctuation, symbols, and special characters

Implementation techniques: conversions, searching and sorting, rendering, and editing

Using Unicode with the Internet, programming languages, and operating systems

With this book as a guide, programmers now have the tools necessary to understand, create, and deploy dynamic software systems across today's increasingly global marketplace.






                  Author Bio


      Richard Gillam is a senior development engineer at Trilogy, a leading developer of large-enterprise e-commerce solutions. He is a former member of IBM's Globalization Center of Competency, where he was one of the original designers of the open-source International Components for Unicode and was responsible for several of the international frameworks in the Java Class Libraries. Rich is a former columnist for C++ Report, a regular presenter at the International Unicode Conferences, and a Specialist Member of the Unicode Consortium.



Table of Contents

Preface.


  I. UNICODE IN ESSENCE: AN ARCHITECTURAL OVERVIEW OF THE UNICODE STANDARD.


      1. Language, Computers, and Unicode.


  What Unicode Is.

What Unicode Isn't.

The Challenge of Representing Text in Computers.

What This Book Does.

How This Book Is Organized.

Part I: Unicode in Essence.

Part II: Unicode in Depth.

Part III: Unicode in Action.


      2. A Brief History of Character Encoding.


  Prehistory.

The Telegraph and Morse Code.

The Teletypewriter and Baudot Code.

Other Teletype and Telegraphy Codes.

FIELDATA and ASCII.

Hollerith and EBCDIC.

Single-Byte Encoding Systems.

Eight-Bit Encoding Schemes and the ISO 2022 Model.

ISO 8859.

Other 8-Bit Encoding Schemes.

Character Encoding Terminology.

Multiple-Byte Encoding Systems.

East Asian Coded Character Sets.

Character Encoding Schemes for East Asian Coded Character Sets.

Other East Asian Encoding Systems.

ISO 10646 and Unicode.

How the Unicode Standard Is Maintained.


      3. Architecture: Not Just a Pile of Code Charts.


  The Unicode Character-Glyph Model.

Character Positioning.

The Principle of Unification.

Alternate-Glyph Selection.

Multiple Representations.

Flavors of Unicode.

Character Semantics.

Unicode Versions and Unicode Technical Reports.

Unicode Standard Annexes.

Unicode Technical Standards.

Unicode Technical Reports.

Draft and Proposed Draft Technical Reports.

Superseded Technical Reports.

Unicode Versions.

Unicode Stability Policies.

Arrangement of the Encoding Space.

Organization of the Planes.

The Basic Multilingual Plane.

The Supplementary Planes.

Noncharacter Code Point Values.

Conforming to the Standard.

General.

Producing Text as Output.

Interpreting Text from the Outside World.

Passing Text Through.

Drawing Text on the Screen or Other Output Devices.

Comparing Character Strings.

Summary.


      4. Combining Character Sequences and Unicode Normalization.


  How Unicode Non-spacing Marks Work.

Dealing Properly with Combining Character Sequences.

Canonical Decompositions.

Canonical Accent Ordering.

Double Diacritics.

Compatibility Decompositions.

Singleton Decompositions.

Hangul.

Unicode Normalization Forms.

Grapheme Clusters.


      5. Character Properties and the Unicode Character Database.


  Where to Get the Unicode Character Database.

The UNIDATA Directory.

UnicodeData.txt.

PropList.txt.

General Character Properties.

Standard Character Names.

Algorithmically Derived Names.

Control-Character Names.

ISO 10646 Comments.

Block and Script.

General Category.

Letters.

Marks.

Numbers.

Punctuation.

Symbols.

Separators.

Miscellaneous.

Other Categories.

Properties of Letters.

SpecialCasing.txt.

CaseFolding.txt.

Properties of Digits, Numerals, and Mathematical Symbols.

Layout-Related Properties.

Bidirectional Layout.

Mirroring.

Arabic Contextual Shaping.

East Asian Width.

Line-Breaking Property.

Normalization-Related Properties.

Decomposition.

Decomposition Type.

Combining Class.

Composition Exclusion List.

Normalization Test File.

Derived Normalization Properties.

Grapheme Cluster-Related Properties.

Unihan.txt.


      6. Unicode Storage and Serialization Formats.


  A Historical Note.

UTF-32.

UTF-16 and the Surrogate Mechanism.

Endian-ness and the Byte Order Mark.

UTF-8.

CESU-8.

UTF-EBCDIC.

UTF-7.

Standard Compression Scheme for Unicode.

BOCU.

Detecting Unicode Storage Formats.


  II. UNICODE IN DEPTH: A GUIDED TOUR OF THE CHARACTER REPERTOIRE.


          7. Scripts of Europe.


  The Western Alphabetic Scripts.

The Latin Alphabet.

The Latin-1 Characters.

The Latin Extended A Block.

The Latin Extended B Block.

The Latin Extended Additional Block.

The International Phonetic Alphabet.

Diacritical Marks.

Isolated Combining Marks.

Spacing Modifier Letters.

The Greek Alphabet.

The Greek Block.

The Greek Extended Block.

The Coptic Alphabet.

The Cyrillic Alphabet.

The Cyrillic Block.

The Cyrillic Supplementary Block.

The Armenian Alphabet.

The Georgian Alphabet.


      8. Scripts of the Middle East.


  Bidirectional Text Layout.

The Unicode Bidirectional Layout Algorithm.

Inherent Directionality.

Neutrals.

Numbers.

The Left-to-Right and Right-to-Left Marks.

The Explicit Override Characters.

The Explicit Embedding Characters.

Mirroring Characters.

Line and Paragraph Boundaries.

Bidirectional Text in a Text-Editing Environment.

The Hebrew Alphabet.

The Hebrew Block.

The Arabic Alphabet.

The Arabic Block.

Joiners and Non-joiners.

The Arabic Presentation Forms B Block.

The Arabic Presentation Forms A Block.

The Syriac Alphabet.

The Syriac Block.

The Thaana Script.

The Thaana Block.


      9. Scripts of India and Southeast Asia.


  Devanagari.

The Devanagari Block.

Bengali.

The Bengali Block.

Gurmukhi.

The Gurmukhi Block.

Gujarati.

The Gujarati Block.

Oriya.

The Oriya Block.

Tamil.

The Tamil Block.

Telugu.

The Telugu Block.

Kannada.

The Kannada Block.

Malayalam.

The Malayalam Block.

Sinhala.

The Sinhala Block.

Thai.

The Thai Block.

Lao.

The Lao Block.

Khmer.

The Khmer Block.

Myanmar.

The Myanmar Block.

Tibetan.

The Tibetan Block.

The Philippine Scripts.


      10. Scripts of East Asia.


  The Han Characters.

Variant Forms of Han Characters.

Han Characters in Unicode.

The CJK Unified Ideographs Area.

The CJK Unified Ideographs Extension A Area.

The CJK Unified Ideographs Extension B Area.

The CJK Compatibility Ideographs Block.

The CJK Compatibility Ideographs Supplement Block.

The Kangxi Radicals Block.

The CJK Radicals Supplement Block.

Ideographic Description Sequences.

Bopomofo.

The Bopomofo Block.

The Bopomofo Extended Block.

Japanese.

The Hiragana Block.

The Katakana Block.

The Katakana Phonetic Extensions Block.

The Kanbun Block.

Korean.

The Hangul Jamo Block.

The Hangul Compatibility Jamo Block.

The Hangul Syllables Area.

Half-width and Full-width Characters.

The Half-width and Full-width Forms Block.

Vertical Text Layout.

Ruby.

The Interlinear Annotation Characters.

Yi.

The Yi Syllables Block.

The Yi Radicals Block.


      11. Scripts from Other Parts of the World.


  Mongolian.

The Mongolian Block.

Ethiopic.

The Ethiopic Block.

Cherokee.

The Cherokee Block.

Canadian Aboriginal Syllabics.

The Unified Canadian Aboriginal Syllabics Block.

Historical Scripts.

Runic.

Ogham.

Old Italic.

Gothic.

Deseret.


      12. Numbers, Punctuation, Symbols, and Specials.


  Numbers.

Western Positional Notation.

Alphabetic Numerals.

Roman Numerals.

Han Characters as Numerals.

Other Numeration Systems.

Numeric Presentation Forms.

National and Nominal Digit Shapes.

Punctuation.

Script-Specific Punctuation.

The General Punctuation Block.

The CJK Symbols and Punctuation Block.

Spaces.

Dashes and Hyphens.

Quotation Marks, Apostrophes, and Similar-Looking Characters.

Paired Punctuation.

Dot Leaders.

Bullets and Dots.

Special Characters.

Line and Paragraph Separators.

Segment and Page Separators.

Control Characters.

Characters That Control Word Wrapping.

Characters That Control Glyph Selection.

The Grapheme Joiner.

Bidirectional Formatting Characters.

Deprecated Characters.

Interlinear Annotation.

The Object Replacement Character.

The General Substitution Character.

Tagging Characters.

Noncharacters.

Symbols Used with Numbers.

Numeric Punctuation.

Currency Symbols.

Unit Markers.

Math Symbols.

Mathematical Alphanumeric Symbols.

Other Symbols and Miscellaneous Characters.

Musical Notation.

Braille.

Other Symbols.

Presentation Forms.

Miscellaneous Characters.


  III. UNICODE IN ACTION: IMPLEMENTING AND USING THE UNICODE STANDARD.


          13. Techniques and Data Structures for Handling Unicode Text.


  Useful Data Structures.

Testing for Membership in a Class.

The Inversion List.

Performing Set Operations on Inversion Lists.

Mapping Single Characters to Other Values.

Inversion Maps.

The Compact Array.

Two-Level Compact Arrays.

Mapping Single Characters to Multiple Values.

Exception Tables.

Mapping Multiple Characters to Other Values.

Exception Tables and Key Closure.

Tries as Exception Tables.

Tries as the Main Lookup Table.

Single Versus Multiple Tables.


      14. Conversions and Transformations.


  Converting Between Unicode Encoding Forms.

Converting Between UTF-16 and UTF-32.

Converting Between UTF-8 and UTF-32.

Converting Between UTF-8 and UTF-16.

Implementing Unicode Compression.

Unicode Normalization.

Canonical Decomposition.

Compatibility Decomposition.

Canonical Composition.

Optimizing Unicode Normalization.

Testing Unicode Normalization.

Converting Between Unicode and Other Standards.

Getting Conversion Information.

Converting Between Unicode and Single-Byte Encodings.

Converting Between Unicode and Multibyte Encodings.

Other Types of Conversions.

Handling Exceptional Conditions.

Dealing with Differences in Encoding Philosophy.

Choosing a Converter.

Line-Break Conversion.

Case Mapping and Case Folding.

Case Mapping on a Single Character.

Case Mapping on a String.

Case Folding.

Transliteration.


      15. Searching and Sorting.


  The Basics of Language-Sensitive String Comparison.

Multilevel Comparisons.

Ignorable Characters.

French Accent Sorting.

Contracting Character Sequences.

Expanding Characters.

Context-Sensitive Weighting.

Putting It All Together.

Other Processes and Equivalences.

Language-Sensitive Comparison on Unicode Text.

Unicode Normalization.

Reordering.

A General Implementation Strategy.

The Unicode Collation Algorithm.

The Default UCA Sort Order.

Alternate Weighting.

Optimizations and Enhancements.

Language-Insensitive String Comparison.

Sorting.

Collation Strength and Secondary Keys.

Exposing Sort Keys.

Minimizing Sort Key Length.

Searching.

The Boyer-Moore Algorithm.

Using the Boyer-Moore Algorithm with Unicode.

'Whole Word' Searches.

Using Unicode with Regular Expressions.


      16. Rendering and Editing.


  Line Breaking.

Line-Breaking Properties.

Implementing Boundary Analysis with Pair Tables.

Implementing Boundary Analysis with State Machines.

Performing Boundary Analysis Using a Dictionary.

A Few More Thoughts on Boundary Analysis.

Performing Line Breaking.

Line Layout.

Glyph Selection and Positioning.

Font Technologies.

Poor Man's Glyph Selection.

Glyph Selection and Placement in AAT.

Glyph Selection and Placement in OpenType.

Special-Purpose Rendering Technology.

Compound and Virtual Fonts.

Special Text-Editing Considerations.

Optimizing for Editing Performance.

Accepting Text Input.

Handling Arrow Keys.

Handling Discontiguous Selection.

Handling Multiple-Click Selection.


      17. Unicode and Other Technologies.


  Unicode and the Internet.

The W3C Character Model.

XML.

HTML and HTTP.

URLs and Domain Names.

Mail and Usenet.

Unicode and Programming Languages.

The Unicode Identifier Guidelines.

Java.

C and C++.

JavaScript and JScript.

Visual Basic.

Perl.

ICU.

Unicode and Operating Systems.

Microsoft Windows.

MacOS.

Varieties of UNIX.

Conclusion.


      Glossary.

Bibliography.

Index.
