Special characters or entities

This page introduces the special characters in HTML, the concept of entities, and the entities that are available for use in your documents.

Quick Links:
HTML Entities
En and em spaces
Copyright and trademark
Currency symbols
Real quotes
Understanding encoding

Entities

Some characters cannot simply be typed into an HTML document and have to be referenced by a code instead.
Some of them, such as the registered trademark symbol (®) or the copyright symbol (©), are not found on a standard keyboard. Others, such as the angle brackets (the less-than and greater-than signs, < and >), would confuse the HTML client.
This is because several characters have special meanings in HTML. The less-than sign (<), for example, signals to the browser the beginning of a tag.
Such a character cannot be used directly in normal text within an HTML document. An entity code is written in its place, and the browser then renders the correct character.
These characters are referred to as "entities", and each one is referenced by a particular code in the HTML document.
An entity code always begins with an ampersand (&) and always ends with a semicolon (;).
Here is an example, the entity for the ampersand itself: &amp;
There are three different ways to specify an entity. They are listed below.

  • A mnemonic code, such as &amp; for the ampersand (&) or &copy; for the copyright symbol (©).
  • A decimal value that corresponds to the character, such as &#169; for the copyright symbol.
  • A hexadecimal value that corresponds to the character, such as &#xA9; for the copyright symbol.
Remember: if you use the decimal or hexadecimal method to specify an entity, you need to prefix the value with the number sign (#), also known as the pound or hash sign. A hexadecimal value additionally needs an x after the #.
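As a small sketch of the three forms side by side, the lines below should all display the same copyright symbol. The company name is only a placeholder for illustration.

```html
<!-- Three equivalent ways to write the copyright symbol -->
<p>Mnemonic: &copy; 2003 Example Company</p>
<p>Decimal: &#169; 2003 Example Company</p>
<p>Hexadecimal: &#xA9; 2003 Example Company</p>
```

The mnemonic form is usually the easiest to read in the source, while the numeric forms work even for characters that have no mnemonic name.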
Below is a table containing a few essential entities.
Decimal Entity   Mnemonic Entity   Character
&#34;            &quot;            Double quote mark
&#38;            &amp;             Ampersand
&#60;            &lt;              Less than symbol
&#62;            &gt;              Greater than symbol
&#160;           &nbsp;            Nonbreaking space
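Here is a short, illustrative fragment that uses each of the essential entities from the table. The image file name and the wording are made up for the example.

```html
<!-- Reserved characters are written as entities so they display as text -->
<p>A paragraph starts with the &lt;p&gt; tag.</p>
<p>Fish &amp; chips</p>
<!-- &quot; keeps a double quote from ending the attribute value early -->
<img src="sign.gif" alt="A sign reading &quot;Open&quot;" />
<!-- &nbsp; keeps the number and its unit together on one line -->
<p>The file is 10&nbsp;KB in size.</p>
```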

En and Em spaces and dashes

There are two other types of dashes and spaces in HTML: the en and the em dashes and spaces.
The names of these characters are derived from their relative size. The en characters are about as wide as a capital N, and the em characters are about as wide as a capital M.
These characters are used in specific ways in the English language, as described below.

  • En spaces are used where you need a larger space than a normal space between characters. A good example is between a street number and a street name, to give an address clarity (123 Nowhere Street).
  • Em spaces are usually used to separate elements such as headlines, dates, numbers, figures, and captions (Figure 5–8 A Crankshaft:).
  • En dashes are used instead of hyphens in phone numbers and in element-numbering constructs (1–2–3–4).
  • Em dashes are usually used grammatically, for example to divide thoughts in a sentence (The service was awful—well, that's what I thought anyway).

The table below lists the en and em entities.

Decimal Entity   Mnemonic Entity   Character
&#8194;          &ensp;            En space
&#8195;          &emsp;            Em space
&#8211;          &ndash;           En dash
&#8212;          &mdash;           Em dash
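The uses described in the list above might be marked up roughly like this. It is only a sketch that reuses the sample phrases from this section.

```html
<!-- En space between a street number and a street name -->
<p>123&ensp;Nowhere Street</p>
<!-- En dash in the figure number, em space before the caption -->
<p>Figure 5&ndash;8&emsp;A Crankshaft</p>
<!-- En dashes in a numbering construct -->
<p>1&ndash;2&ndash;3&ndash;4</p>
<!-- Em dash dividing two thoughts in a sentence -->
<p>The service was awful&mdash;well, that's what I thought anyway.</p>
```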

Copyright and Trademark Symbols

The copyright and trademark symbols are special: they signify a legal relationship between individuals or companies and the text, images, etc. used in a document.
The copyright symbol looks like this: ©. It is used to indicate that someone has asserted certain rights over material in the document, for example the text or an image.
Usually some text is included with the symbol to indicate which rights. Written works often include a phrase such as "Copyright © 2003. All rights reserved".

The trademark symbol looks like a small superscripted TM, and the registered symbol looks like this: ®. They are used to indicate that a word or phrase has been trademarked or registered.
This gives the company or individual the right to use this unique trademark. For example, "Windows" is a registered trademark of Microsoft.
Note:
Some fonts do contain the trademark symbol, but it is made up of two characters, so its inclusion is the exception rather than the rule.
It is better not to rely on an entity to display the symbol and instead to use small, superscripted text to achieve the same effect, as the example below illustrates.

  <small><sup>TM</sup></small>

The table below lists the Copyright and the Registered entities.

Decimal Entity   Mnemonic Entity   Character
&#169;           &copy;            Copyright symbol
&#174;           &reg;             Registered symbol
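As a brief illustration, the two entities in the table could appear in a page like this, reusing the sample phrases from this section.

```html
<!-- A typical copyright notice -->
<p>Copyright &copy; 2003. All rights reserved.</p>
<!-- The registered symbol follows the registered name -->
<p>Windows&reg; is a registered trademark of Microsoft.</p>
```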

Currency Symbols

There are quite a few currency symbols, including the British pound (£), the European euro (€), the U.S. dollar ($), and the Japanese yen (¥).
There is also a general currency symbol (¤). The dollar symbol is ASCII character 36 (hexadecimal 24); it is on the keyboard (US layouts) and can be typed directly.
The table below lists the currency entities.

Decimal Entity   Mnemonic Entity   Character
&#162;           &cent;            Cent symbol (¢)
&#163;           &pound;           British pound
&#164;           &curren;          General currency
&#165;           &yen;             Japanese yen
&#8364;          &euro;            European euro
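A quick sketch of the currency entities in use follows. The prices are invented for the example.

```html
<p>Price: &pound;9.99</p>
<p>Price: &euro;12.50</p>
<p>Price: &yen;1500</p>
<p>Price: 99&cent;</p>
<!-- The dollar sign is plain ASCII and needs no entity -->
<p>Price: $10.00</p>
```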

"Real" Quotation Marks

The quotation marks available on your keyboard (" and ') are "straight quotes", which means they are really just small superscripted vertical lines.
The quote marks used in publishing resemble the numbers 6 and 9: dots with a serif leading off them, like these: “ ”
The table below lists the entities for real quotes.

Decimal Entity   Mnemonic Entity   Character
&#8216;          &lsquo;           Left/Opening single-quote
&#8217;          &rsquo;           Right/Closing single-quote and apostrophe
&#8220;          &ldquo;           Left/Opening double-quote
&#8221;          &rdquo;           Right/Closing double-quote
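For illustration, the quote entities from the table might be used like this. The sample text is made up.

```html
<p>&ldquo;Real&rdquo; double quotes</p>
<p>&lsquo;Real&rsquo; single quotes</p>
<!-- The closing single quote doubles as the apostrophe -->
<p>It&rsquo;s also the apostrophe.</p>
```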

There are many more entities for other characters, such as arrows, accented characters, Greek letters, and mathematical symbols. I have only scratched the surface here; these are the most common entities used in HTML documents.

You can find a comprehensive list of entities at the W3Schools web site.

Understanding Character Encodings

Although HTML has its roots firmly in plain text, it needs to be able to display a wide range of characters, many of which cannot be typed on a regular keyboard.
Human language is rich with accented and extended characters, and HTML has many reserved characters of its own, so it defines many entities for inserting special characters into a document.

Put simply, a character encoding maps binary data to the proper character equivalents.
For example, in a standard U.S. English document, character code 65 is matched to a capital letter A.
The American Standard Code for Information Interchange (ASCII) is used in most of the English fonts we use, which means that when we insert a capital A we can be confident that we will see a letter A in our documents.

This is, of course, provided that the document is encoded as English and the specified font is also encoded as English.
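The link between the number 65 and the capital A can be seen with a numeric entity, because numeric entities use the same numbers as ASCII for the basic English characters. In this small sketch, each line should display a capital A.

```html
<!-- Typed directly -->
<p>A</p>
<!-- Decimal character code 65 -->
<p>&#65;</p>
<!-- The same code in hexadecimal (41) -->
<p>&#x41;</p>
```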
The document's encoding is usually passed to the user agent (browser) in the HTTP header. Unfortunately, some browsers do not handle this header correctly.
This is why it is always a good idea to include a meta tag in the head section of an HTML document to explicitly declare the document's encoding, like the example meta tag below.

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
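To show where the meta tag sits, here is a minimal sketch of a complete page. It assumes an XHTML 1.0 Transitional document, and the title and body text are placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- Declare the encoding early, before the title and any other text -->
  <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
  <title>Example page</title>
</head>
<body>
  <p>Caf&eacute; prices start at &pound;2.</p>
</body>
</html>
```

Placing the declaration first in the head means the browser knows the encoding before it reads any text that might depend on it.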

I bet you're now wondering, "What if somebody views my document in Japan or China or Russia?"
This is where the encoding does its job: it helps to ensure that the correct equivalent characters are used when the document is displayed.
Most fonts have an international character set encoded into them as well as the font's native character set.

Where non-native character sets are specified in a document, the browser will try to use the appropriate characters in the appropriate font. If it cannot find the appropriate font, it will use an alternate font, or the current font if it cannot find an appropriate alternative.

However, if the document does not declare its encoding, none of this can be accomplished, and the browser will simply use whatever character corresponds to the character position arriving in the data stream.
This is not good. For example, a capital A would then be translated as whatever the 65th character is in the font the user agent is using.

You can find a complete list of all of the 7-bit ASCII characters used in HTML and XHTML documents.

Valid XHTML1.0 Transitional. Valid CSS. Copyright © 2005 –
www.syntaxsandbox.co.uk