Originally compiled and posted 10 May 2000.
Content last modified sometime in 2000.
External links last verified 18 Nov. 2000.

Important: This is a historic page that is no longer being maintained. The information is outdated, and some of it never was fully accurate. Links are no longer being checked and may well fail. It is being left online as a historical reference for those who might need to try and figure out what was going on with WWW pages made before Unicode was standardized.

Using Character Encodings, References, and Entities - Real-World Considerations

In a perfect world, there would be one uniform standard which would encompass all characters of all human languages, with room for new additions... a standard used on all computing equipment. Actually, there now is such a standard: Unicode.

Thankfully, Unicode is becoming the standard character set for Microsoft Windows and the Mac OS (Apple and Xerox were early major proponents of the Unicode standard), as well as other essential platforms with which the author has insufficient familiarity to discuss here.

Problem is, there is a whole bunch of legacy software and hardware out there that does not comprehend Unicode, and is likely to continue being used for at least the next few years. This presents a major headache for the web designer, since there is no one single way to represent characters - even just Roman characters used in English - which will display correctly on everything.

The easiest and safest approach, where possible, is to stick with "low ASCII", the characters of the ASCII character set below number 128. The text of these paragraphs is an example - standard characters, no accents, no typographical embellishments such as "curly" quotation marks.
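One way to confirm that a page really does stay within "low ASCII" is to scan it for any character numbered 128 or above. What follows is only a minimal sketch in Python; the function names is_low_ascii and find_high_characters are made up for illustration and are not part of any standard tool.

```python
def is_low_ascii(text):
    """Return True if every character is numbered below 128."""
    return all(ord(ch) < 128 for ch in text)


def find_high_characters(text):
    """Return (position, character) pairs for everything outside low ASCII."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) >= 128]


# A straight apostrophe is low ASCII; a "curly" one (U+2019) is not.
plain = "Sonic's page"
curly = "Sonic\u2019s page"
```

Running find_high_characters on a page before publishing shows exactly which characters would need a reference or entity instead.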

If one is in a totally isolated environment such as a fully private Intranet, and all the computing equipment on the Intranet uses the same character set, one can take advantage of character encoding declarations. Preferably (from what i read), this is done server-side. The next best option is under the control of the HTML author, in the form of a META tag in the <HEAD> part:

<META http-equiv="Content-Type" content="text/html; charset=ISO-10646-Unicode-Latin-1">

(For a full listing of possible charset [actually character encoding] entries, see the IANA registered charset values.)
When one of these options is taken, the web page author may generate and use any characters available, just as if using a word processor on that platform.
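The two options can be put side by side in a short Python sketch. The helper names (content_type_header, meta_tag, build_response) are hypothetical, and ISO-8859-1 is used only as an example value. The point is that the declared name and the bytes actually sent have to agree, whether the declaration travels in the server's Content-Type header or in the META tag.

```python
def content_type_header(charset):
    """Value a server would send in the HTTP Content-Type header."""
    return "text/html; charset=" + charset


def meta_tag(charset):
    """The author-side fallback: the same declaration as a META tag."""
    return '<META http-equiv="Content-Type" content="%s">' % content_type_header(charset)


def build_response(html_text, charset="ISO-8859-1"):
    """Encode the page with the declared encoding and return (headers, body).

    str.encode raises UnicodeEncodeError if the page contains a character
    the declared encoding cannot represent, which catches mismatches early.
    """
    body = html_text.encode(charset)
    headers = {
        "Content-Type": content_type_header(charset),
        "Content-Length": str(len(body)),
    }
    return headers, body
```

A real server would have its own way of setting the header; this only shows what the declaration contains and why the encoding step must use the same name.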

Why is this not useful on the public Internet? Because most personal computers/web browsers (especially older ones) are terrible at supporting anything besides ISO Latin 1 (ISO-8859-1) and whatever is native on their platform - and not all systems agree on how to display ISO Latin 1!

What to do? A majority of web authors seem to prefer using a de-facto flavor of ISO Latin 1 as promulgated by Netscape with early versions of Navigator, seemingly based upon Windows ANSI, which it closely matches. This works for Netscape products (and probably for MS Internet Explorer, though the author of these pages avoids MS products and has never checked) and other products which follow Netscape's definition of mapping characters between numbers 128 and 159. Due to the history of many conflicting character set mappings in the range 128 to 159, W3C officially declares 128 to 159 to be forbidden territory, to be avoided. Even so, the popularity of the Big Two browsers makes this a viable choice for many web authors.

From ongoing personal experience, this choice alienates users of minority browsers, some of which correctly do not make assumptions about characters in the range 128 to 159, and will display "?" or a similar symbol indicative of undefined data (the popular Cyberdog 2.0 for Macintosh is one example).
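The disagreement over 128 to 159 can be seen by decoding the same bytes two ways. This Python sketch assumes that the de-facto mapping matches the Windows code page that Python calls "cp1252"; the helper name bytes_in_forbidden_range is invented for illustration.

```python
# Bytes as a Windows-era editor might save them: "Hi" inside "curly" double quotes.
raw = bytes([0x93, 0x48, 0x69, 0x94])


def bytes_in_forbidden_range(data):
    """Return the byte values that fall in the range 128 to 159."""
    return [b for b in data if 128 <= b <= 159]


# Read as Windows-1252, bytes 0x93 and 0x94 become the Unicode curly quotes
# U+201C and U+201D.
as_windows = raw.decode("cp1252")

# Read as true ISO-8859-1, the same bytes are control characters with nothing
# to display, which is why a strict browser shows "?" or similar.
as_latin1 = raw.decode("iso-8859-1")
```

Checking a file with bytes_in_forbidden_range before publishing flags exactly the bytes that strict browsers will refuse to guess about.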

Since there is no one solution for displaying "high ASCII" while meeting the needs of all browsers (short of serving custom pages designed for the foibles of the various browsers, which is an outstanding choice if one has the wherewithal to pull it off), anyone who uses those "special" characters has to choose which browser users to alienate. The popular de-facto Netscape standard has been discussed above; next i present one alternative that i have chosen and would like you to consider.

On my personal pages, where no large sums of money are at stake, and no one will lose their job if visitors leave unhappy, i have the luxury of pleasing myself. This i have done by making the site look stellar on whatever my favorite browser is at the time.

For a while, this was the very old MacWeb 1.1.1e, one of the few browsers to work on the Mac Plus. MacWeb could only use the native Standard Mac Roman Character Set, i had no idea at the time how to properly declare a character encoding, and even if i had declared one, many (if not all) Wintel systems would not have even tried to remap for proper display. So i probably managed to alienate all kinds of folks. I knew this was suboptimal, even for a backwater personal site.

Since switching to Cyberdog 2.0 several years ago, learning of Cyberdog's visionary and early support of the Unicode mapping, and learning about what i am sharing with you on these pages, i now have a much better, though still imperfect, setup.

I start by not declaring a character encoding. (Actually, this is not that good an idea; when i have time i will likely declare ISO-8859-1, since nearly everything can, or thinks it can, handle that.) Next, i specifically use either character references or character entities whenever i wish to use a "special" character which is not part of "low ASCII". These will display correctly on browsers that know what to do with them no matter what character encoding the browser thinks i am using. In fact, so far this is the same as most web authors using the de-facto Netscape method.

The departure is in what particular references or entities i use. I avoid 128-159 entirely. Instead, i use the proper Unicode reference or entity, such as &#8217; or &rsquo; for the "curly" right apostrophe (a.k.a. right single quote). The way i figure it, users of the Big Two tend to want to upgrade to the latest and greatest when they come out, and at least in the case of Netscape, recent versions support the Unicode references (so far, not so hot on the entities, so i usually use the references these days). This way, i have my browser, Cyberdog, covered, plus current versions of the Big Two, and anything else smart enough to handle Unicode. The only folks i am alienating are older Big Two users, MacWeb users, and other folks whose browsers don't know Unicode from bagels. This method will become more compatible (rather than less) as folks continue to dance the upgrade shuffle, and we all enter our compatible future (so we believe).
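The approach described above can be automated. The Python sketch below is one possible way to do it, not the method actually used to build these pages, and the function names are hypothetical. It relies on Python's built-in "xmlcharrefreplace" error handler, which turns any character that does not fit in ASCII into a decimal reference built from its Unicode number.

```python
def to_numeric_references(text):
    """Replace every character outside low ASCII with a reference such as &#8217;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def windows_bytes_to_references(raw):
    """Convert text saved with Windows-style 128-159 characters.

    The bytes are first decoded as Windows-1252 (an assumption about how the
    file was saved), so that a byte like 0x92 becomes U+2019 rather than
    staying in the 128 to 159 range, and are then written out as references.
    """
    return to_numeric_references(raw.decode("cp1252"))
```

For example, to_numeric_references("Sonic\u2019s") gives "Sonic&#8217;s". Note that this does nothing about characters that are special to HTML itself, such as & and <; those still need their own entities.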

