Sandia National Laboratories

HTML Reference Manual
Last Modified: 2 January 1996
Copyright 1996. All rights retained by Sandia National Laboratories. (The date is correct, this is over a year old.)


Introduction to HTML
Sandia Requirements
General HTML Syntax
HTML Comments
HTML Elements and list of elements
Uniform Resource Locator (URL)
Special Characters
Internal Icons

Introduction to HTML

HTML is an evolving language which is used to construct documents which can be viewed by World Wide Web browsers. HTML has been standardized by the WWW consortium as the IETF RFC 1866, commonly referred to as HTML Version 2. However, the RFC is forming the base for further work in extending and enhancing the standard, and various user agent vendors are already shipping products with extensions beyond this RFC. RFC 1866 also declares some earlier HTML elements obsolete. There is active work on-going to standardize specifications of many enhancements to RFC 1866 on a wide variety of topics. These topics include (in no special order):
tables
This is very close to consensus, with only a few areas of contention. It includes significant changes/enhancements from existing implementations of tables, yet does not go as far in providing table layout and presentation control as some wanted. Future enhancements to the area of tables are expected.
internationalization
This involves HTML elements and attributes to ease the use of non-English languages in web pages. Note that this is trying to address not only the problem of right-to-left languages but also non-English alphabets/scripts. This is commonly referred to as the i18n proposal.
file upload
This is designed to allow a FORM to send a local file as part of a FORM submission, so long as the user actively concurs in the action. This has already reached consensus as RFC 1867, but needs to be merged with RFC 1866.
Much work is on-going to merge these two mechanisms to invoke/include some non-HTML at a spot on an HTML page. The desire is to allow/standardize things like JAVA(tm) but to develop a much more generalized HTML markup mechanism. I expect the markup of this whole area to dramatically change in the near future.
link types
This is to deal with the whole area of REL/REV. Work in this area seems to be slow to come to a proposal, but a low level of discussion continues in this area.
client-side image maps
This is to define areas of an IMG which, if selected/clicked, will invoke identified hyperlinks. This capability already exists by placing an IMG with an ISMAP attribute within the contents of an A element which points to a program which will accept the coordinates selected and take action. This proposal is to define the elements/attributes that will allow all this to be specified in the actual HTML document without need for a responding program. A proposal exists in this area, which is included in this reference manual, but little further work is apparent in this area at this time even though many browsers are implementing enhancements which provide this.
style sheets
A full conference on this topic was recently held in France. While some of the implementation details are still in hot debate, the basic concept of providing some linkage from an HTML page to style sheets to specify presentation is completely accepted. I predict that most vendors will let the standards group get closer to final agreement before they begin to offer this capability.
mathematical formulae
There have been preliminary proposals in this area, and discussion sporadically continues, but I have not observed any significant progress towards a formal proposal or consensus on HTML markup for math formulae. At present users are including these as images.
Some of these enhancements were packaged earlier as a single proposal known as Version 3. However, the proposals have been split into separate topics for ease of discussion in the standards group and the package known as Version 3 has currently been withdrawn. HTML+ was a preliminary draft of the proposals known as Version 3. There is now some talk of interim Versions 2.x to standardize the various separate topics as they reach consensus. It is unclear when, or if, there will be a sufficient collection of enhancements to be (again) called Version 3.

This Reference Manual reflects the official RFC as well as two kinds of major enhancements/extensions known to the author: those implemented on user agents currently actively used at Sandia Labs, and those major proposals in the standards group which are, in the opinion of the author, nearing stability and consensus. While this may introduce confusion by describing non-existent features, or features available only in one vendor's product, there is value in being aware of the proposed directions in this language. For further information about HTML and the continuing standards process, see Acknowledgements.

Not all current browsers (user agents) implement all features of RFC 1866. In addition, some browsers define and handle their own extensions. A browser is supposed to ignore any element or attribute of the HTML language that it is not designed to handle.


RFC 1866 defines two subsets of HTML: recommended and deprecated. The recommended subset consists of only those features of the language that are unlikely to compromise the structural integrity of a document. The deprecated subset includes certain features of the language that are necessary for backward compatibility but tend to be used and implemented inconsistently; their use is discouraged. As of the IETF HTML BOF of 26 July 1994, five levels of conformance of the HTML Language elements were defined. RFC 1866 explicitly refers to level 1 and level 2, and implies level 0. RFC 1866 does not currently include anything from level 3 or 4. User agents (browsers) can choose (and declare) what level of conformance they fulfill.
Level 0
Mandatory. Heading, lists, anchors, etc.
(A text only browser is expected to have Level 0 conformance.)
Level 1
Images, Emphasis, Text highlighting
Level 2
Forms, Character Definitions
Level 3
Tables, Figures, etc.
(proposed as extensions to RFC 1866)
Level 4
Mathematical formulae
(proposed as an extension to RFC 1866, not existing practice, not yet included in this Reference Manual)

Sandia Requirements

The Integrated Information Services Division of Sandia National Laboratories has established a set of requirements for HTML documents prepared at Sandia National Laboratories. These requirements are an attempt to maintain document consistency, provide some configuration management hooks, automatically insert links between documents and identify document ownership within Sandia's Internal Web. For individuals with access to Sandia's Internal Web, a more complete specification of those requirements is available. In addition to requirements, that specification also references documents with style guide suggestions for Sandia HTML authors. While this Reference Manual attempts to identify the Sandia requirements, for any conflicts, the specification page on the Internal Web takes precedence.

In addition to the SGML identification line described in General HTML Syntax, and the HTML elements and attributes which are required (see ADDRESS, BODY, HEAD, HTML, IMG, META, LINK, TITLE), all documents shall have a "point-of-contact" name identified in the document as part of the ADDRESS element that is e-mail enabled by surrounding the name with the <A> element. Sandia authors should use Sandia's cgi-bin e-mail program to generate a mail message submission form by setting the HREF attribute of the A element to one of the addresses identified in the Sandia specification document. Until all the browsers which the CIO organization of Sandia supports, and for which Sandia has a site license, accept the newly emerging mailto: URL access code, Sandia authors of documents on our Internal Web are discouraged from setting the HREF attribute of the A element to a mailto: URL. This mechanism is acceptable for documents on the external web, but authors should also display the e-mail address of the "point-of-contact" name so that manually initiated e-mail is possible.

All Sandia documents shall also display a line indicating the date the document was last modified. This is usually placed with the ADDRESS element at either the beginning or end of the document. Recommended format is: Last Modified: 18 Apr 1995
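
As a small illustration of the recommended format, the following Python sketch builds such a line from a date. The function name last_modified_line is hypothetical and not part of any Sandia tool; the month abbreviations are hard-coded so that the result does not depend on the locale of the machine.

```python
from datetime import date

# English month abbreviations, hard-coded so the output is locale-independent.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def last_modified_line(d: date) -> str:
    """Return a line in the recommended form: 'Last Modified: 18 Apr 1995'."""
    return f"Last Modified: {d.day} {MONTHS[d.month - 1]} {d.year}"

print(last_modified_line(date(1995, 4, 18)))
```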

Any Sandia document which is part of a set of HTML subdocuments which form a sequence or hierarchy should include two specific LINK elements identifying the REL values of next and previous. In addition, the author should also place icons or text enabled with the <A> element at the beginning and/or end of the BODY section to make it easy to jump to the document next or previous in the sequence. Only one next and one previous relationship may be specified in a document.
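
The two LINK elements could be generated along the following lines. This is only a sketch: the function name sequence_links and the file names are hypothetical, and it assumes the URLs contain no quote characters that would need escaping.

```python
def sequence_links(previous_url=None, next_url=None):
    """Build the LINK elements for a document's place in a sequence.

    At most one previous and one next relationship is produced,
    matching the one-of-each limit stated above.
    """
    lines = []
    if previous_url is not None:
        lines.append(f'<LINK REL="previous" HREF="{previous_url}">')
    if next_url is not None:
        lines.append(f'<LINK REL="next" HREF="{next_url}">')
    return lines

# Hypothetical file names for the second document of a three-part set.
for line in sequence_links("chapter1.html", "chapter3.html"):
    print(line)
```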

General HTML Syntax

HTML is an application of ISO Standard 8879:1986 -- Standard Generalized Markup Language (SGML). SGML is a system for defining structured document types, and markup languages to represent instances of those document types. HTML is one such markup language. Its syntax follows the syntax of SGML. An HTML file is a file of text whose character set includes [ISO-8859-1] and agrees with [ISO-10646]. This is commonly referred to as an ASCII file of text. This document text also includes instructions to a user agent (often called a browser) mostly about displaying the text. To formally identify the file as containing HTML elements, the beginning of the file should contain a line in SGML syntax identifying the version of HTML being used. For RFC 1866 HTML Version 2.0, this line is
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
Sandia Requirements
The SGML specification line identifying the file as HTML is required before the <HTML> element in the file. It should either specify the version number of HTML to which the file conforms by using a line like the one above, or simply specify that the file is HTML by using the line:
Unless explicitly instructed otherwise, the browser normally treats all white space that is not a blank (e.g. tabs, end-of-line characters, etc.) as a single blank, and collapses multiple white space to a single blank. Browsers are inconsistent in whether they collapse multiple line breaks (such as defined by multiple BR or P elements) into a single line break, but a strict interpretation of the standard would seem to imply that they should.
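
The white space rule for ordinary text can be approximated with a short Python sketch. The function name collapse_whitespace is hypothetical, and the sketch deliberately ignores the cases where a browser is instructed otherwise (for example preformatted text).

```python
import re

def collapse_whitespace(text: str) -> str:
    """Replace every run of blanks, tabs, and end-of-line characters
    with a single blank, roughly as a browser does for ordinary text."""
    return re.sub(r"[ \t\r\n\f]+", " ", text)

print(collapse_whitespace("one\ttwo \n\n three"))
```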

The browser instructions consist of HTML elements which are HTML element names, optionally followed by HTML attributes, all surrounded by "<" and ">". Attributes may be followed by an equals sign and some "value". Some attributes do not define values. Others take a default if no value is specified. White space is allowed around the equals sign. RFC 1866 requires that the "value" must be surrounded by single or double quotes, unless the "value" is one of a fixed list of name tokens. It appears that using quotes is always acceptable, but their absence is sometimes a problem with some browsers. The maximum length of "value" (after parsing) is defined as 1024 characters. Most HTML elements are designed to surround some content, usually text, and thus have both a beginning and ending tag. The ending tag is simply the element name preceded by a "/", all surrounded by "<" and ">". HTML names must begin with a letter and are followed by up to 71 letters, digits, periods, or hyphens. The standard generally describes the HTML names in upper case, but (except for Special Characters) browsers are supposed to ignore the case of HTML names.
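
The naming rule and the attribute syntax described above can be explored with the Python standard library. The following is a sketch rather than a validator: is_valid_name and TagCollector are hypothetical names, map.gif is a made-up file, and html.parser is a lenient modern parser, not an RFC 1866 conformance checker.

```python
import re
from html.parser import HTMLParser

# A letter followed by up to 71 letters, digits, periods, or hyphens.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]{0,71}")

def is_valid_name(name: str) -> bool:
    """Check a candidate HTML name against the rule quoted above."""
    return NAME_RE.fullmatch(name) is not None

class TagCollector(HTMLParser):
    """Record each start tag as a (name, attributes) pair."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # The parser reports element and attribute names in lower case;
        # an attribute written without a value is reported as None.
        self.tags.append((tag, dict(attrs)))

collector = TagCollector()
# Upper-case names, both quote styles, white space around "=",
# and an attribute (ISMAP) that takes no value.
collector.feed('<IMG SRC="map.gif" ALT = \'site map\' ISMAP>')
collector.close()
print(collector.tags)
```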

Proposals to extend RFC 1866 define a few attributes to exist on almost every element. The attributes LANG and DIR are introduced by the internationalization proposal and are designed to deal with alternative languages. The syntax and registry of LANG values is defined by RFC 1766, which is under review for a possible change of language codes from two characters to three characters. The values of DIR are proposed as ltr and rtl, which would inform the browser that the language direction of the content is either left-to-right or right-to-left. The attributes ID, MD and CLASS are designed for use with style sheets. The exact details of this topic are still being discussed and are subject to change.

HTML Comments

To include comments in an HTML document that will be ignored by the HTML browser, surround them with <!-- and -->. After the beginning comment delimiter, all text up to the next occurrence of --> is ignored. Hence comments cannot be nested. However, multiple comments may be included within the <! and > delimiters. Each comment starts with -- and includes all text up to and including the next occurrence of --. White space is allowed between the closing -- and >, but not between the opening <! and the first --.
Minimum Attributes
<!--characters... -->
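
The comment rules above can be expressed as a regular expression. This Python sketch assumes the strict reading given in this section (no white space between "<!" and the first "--", optional white space before ">"); the names COMMENT_DECL and strip_comments are hypothetical, and real browsers may behave differently.

```python
import re

# "<!" followed by one or more "--...--" comments, each optionally
# followed by white space, then ">". Nothing may separate "<!" and "--".
COMMENT_DECL = re.compile(r"<!(?:--.*?--\s*)+>", re.DOTALL)

def strip_comments(html_text: str) -> str:
    """Remove comment declarations, leaving all other text untouched."""
    return COMMENT_DECL.sub("", html_text)

print(strip_comments("a<!-- note -->b"))
# Comments do not nest: the first "-->" ends the comment.
print(strip_comments("a<!-- outer <!-- inner --> tail -->b"))
```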
Common Usages
Not all browsers will properly handle HTML elements (i.e. an "<") inside of a comment. Some documents expect any text that begins with "<!" to be treated as a comment and do not include the two dashes at the beginning "<!--" or the end "-->" of the commented text. RFC 1866 specifies the double dash at beginning and end as required. A document using this syntax of missing double dash is actually coding invalid SGML. Some browsers will simply ignore these invalid and unrecognized SGML commands, and thereby give the (misleading) appearance of treating them as comments. Such browsers will terminate this "comment" upon the first occurrence of a ">". Some browsers incorrectly prohibit the double dash "--" sequence from appearing anywhere within the commented text except at the end.

A few early browsers recognized an element <comment> </comment>. RFC 1866 does not include this element, therefore this element should be considered obsolete.

There is a proposal to add the SGML comment syntax to HTML elements. This proposal would begin a comment with a double dash encountered inside any HTML element (but not inside quotes), and treat everything as comments (including any ">", "<", or quote character) until the next occurring double dash. Such syntax would not only allow comments within any HTML element, but HTML elements within a comment. This proposal is not in RFC 1866.

HTML Elements

The complete description of all RFC 1866 HTML elements and all their attributes, as well as many enhancements, is maintained in a separate Web document to keep the size of this document as small as possible.
Sandia Requirements
Sandia HTML developers have specifically requested that this reference to HTML elements be formatted as a single document. This permits them to access the document once while doing document development, and use the scroll bar to get to all the element descriptions without need for multiple retrievals. For browsers which cache files, the internal document hyperlinks can also be used without need for further retrievals.

Uniform Resource Locator (URL)

Uniform Resource Locators (URL) or Uniform Resource Identifiers (URI) let you specify how to reach an Internet resource. They consist of four basic parts: aaaa://bbb.bbb.bbb/ccc/ccc/ccc?ddd
aaaa: The access method.
This specifies the mechanism to be used by the browser to communicate with the resource. Mechanism codes must be registered to be widely recognized. An X500 mechanism, as well as WHOIS, and Network Database mechanisms are under study. Currently recognized mechanisms include:
http: HyperText Transfer Protocol
This is the most commonly used access method. It requires a program running on the destination computer that understands and responds to this protocol. The file retrieved might be an HTML file, a graphic file, a sound file, an animation sequence file, a file to be executed by the server (e.g. cgi-bin files), or a word processing file. Whether the file retrieved can be handled depends on the browser.
https: HyperText Transfer Protocol
This is a variation on the standard access method designed to provide some level of security of transmission. (ed: I am still researching/working on this definition. Anyone with more details should e-mail me the descriptive URL.)
file: Local File Access
This method causes the browser to load a file from the locally accessible disk system. This is commonly used to preview Web pages being developed on a computer that has a browser, but does not have a server.
ftp: File Transfer Protocol
This method uses normal Internet FTP to retrieve a file. Most browsers will ask for a location/name on the local disk system to store the file. Some browsers will simply display a file that is text.
mailto: E-Mail Form
The argument following the access code is the destination e-mail address. If the browser understands this access code, the browser will automatically generate an input FORM for entering the e-mail message. It may also accept additional arguments for default "Subject:" etc. Most browsers now handle this access code. Note that any special characters in an e-mail address (e.g. "%") must be URL converted (e.g. "%25").
news: USENET News
The only argument following the access code is the group or article name.
nntp: Local Network News Transfer Protocol
(ed: I am still researching/working on this definition.)
wais: Wide Area Information Servers
(ed: I am still researching/working on this definition.)
gopher: GOPHER
(ed: I am still researching/working on this definition.)
telnet: TELNET
The arguments following the access code are the login arguments to the telnet session as user[:password]@host.
cid: Content identifiers for MIME body part
(ed: I am still researching/working on this definition.)
mid: Message identifiers for electronic mail
(ed: I am still researching/working on this definition.)
afs: AFS File Access
(ed: I am still researching/working on this definition.)
prospero: Prospero Link
(ed: I am still researching/working on this definition.)
x-exec: Executable Program
(ed: I am still researching/working on this definition.)
//bbb.bbb.bbb The Internet node
This specifies the node on the Internet where the file is located. If not included, it defaults to the computer on which the browser is running, which is appropriate for the file: access method. The node may optionally be followed by a colon and the port number. Most browsers default to port 80, which is also what most servers use to reply to the browser. For access method ftp: the node name may take the form //user[:password]@host. Without a user name, the user anonymous is used.
/ccc/ccc/ccc The file path
This is the pathname of the file to be retrieved, including the directories, subdirectories, and filename. A server can specify the "root" of the directory system recognized by an access type as some subordinate directory. This restricts access to files subordinate to that directory.
?ddd Arguments
Depending upon the access method, and the file accessed, characters can follow the file name, separated by some pre-defined special character (e.g. "#", "?", "&", etc.). This information can then be used by the access method as additional arguments for the access. In the case of access method http: where the file is an executable cgi-bin program registered with the server, the arguments are passed to the program. For an HTML document, "#" identifies the fragment name internal to an HTML document and identified by the A element with the NAME attribute where presentation will begin.
Further References
BNF Specification of URLs.
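
The four parts described in this section can be picked apart with Python's standard urllib.parse module. The URLs below are made up for illustration (www.example.com and ftp.example.com are placeholder hosts), and the default of port 80 is applied by hand because the parser reports a missing port as None.

```python
from urllib.parse import quote, urlsplit

# Hypothetical URL with all four parts plus a port and a fragment name.
parts = urlsplit("http://www.example.com:8080/docs/manual/index.html?topic=urls#syntax")
print(parts.scheme)    # access method
print(parts.hostname)  # Internet node
print(parts.port)      # port number following the colon
print(parts.path)      # file path
print(parts.query)     # arguments after "?"
print(parts.fragment)  # fragment name after "#"

# Without an explicit port the parser returns None; http commonly means 80.
no_port = urlsplit("http://www.example.com/index.html")
effective_port = no_port.port or 80

# The ftp: form //user[:password]@host.
ftp = urlsplit("ftp://guest:secret@ftp.example.com/pub/readme.txt")
print(ftp.username, ftp.password, ftp.hostname)

# A "%" in an e-mail address must be URL converted to "%25".
address = quote("user%relay@example.com", safe="@")
print("mailto:" + address)
```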

Special Characters

Minimum Attributes
The leading ampersand is required. The ampersand and semicolon delimit an entity name which the user agent will replace with a special character. The trailing semicolon is necessary when the character following the entity is not a space or end of line. It is never incorrect to include the trailing semicolon. These main four special characters are specifically included in RFC 1866.
&lt; (less than sign)
&gt; (greater than sign)
&amp; (The ampersand sign itself)
&quot; (quote character)
Some browsers always require the trailing semicolon. RFC 1866 specifies &quot; as being a double quote, but some older browsers display it as a single quote. The remaining special character entity names defined in RFC 1866, as well as some proposed special character entity names, are listed in a separate document, which can be used to see what they will produce on your browser. Not all of those special characters are recognized by all browsers. All entity names are defined as case sensitive. The entity name of many of the special characters intentionally includes mixed case which must be entered exactly as specified. Since most browsers are insensitive to case for HTML names, many browsers do not require the entity names of the main four special characters to be lower case.
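
As an illustration of the four main entity names, Python's standard html module converts text in both directions. This is a sketch of the substitution only; the sample text is made up, and html.escape follows current practice rather than RFC 1866 specifically.

```python
import html

text = 'if a < b & b > c, print "ok"'

# Replace the four main special characters with their entity names.
encoded = html.escape(text)
print(encoded)

# Replace entity names with the characters they stand for.
decoded = html.unescape(encoded)
print(decoded)
```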

&#nn; ISO Latin-1 character number "nn" (the number sign "#" is required)

A separate document displays all the ISO Latin-1 characters and can be used to see what they will produce on your browser.

RFC 1866 recommends referencing special characters with the entity names described above instead of using these numeric ISO Latin-1 code entity names.
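
The numeric form can be produced from a character's ISO Latin-1 code. In this Python sketch the function name numeric_reference is hypothetical; it relies on the fact that the first 256 character codes in Python coincide with ISO Latin-1.

```python
import html

def numeric_reference(ch: str) -> str:
    """Return the numeric reference for one ISO Latin-1 character."""
    code = ord(ch)
    if code > 255:
        raise ValueError("not an ISO Latin-1 character")
    return f"&#{code};"

# "\xe9" is e with an acute accent, ISO Latin-1 character number 233.
print(numeric_reference("\xe9"))
# Decoding the reference gives the character back.
print(html.unescape("&#233;") == "\xe9")
```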

Internal Icons

Minimum Attributes
A standard set of icon names is proposed to be recognized by the browser for use by a proposed DINGBAT attribute in the H1, H2, H3, H4, H5, H6, and UL elements or as hypertext links to an image. The browser may either display them from local code/files, or may expand them to URLs. Similar to special characters, these names are case sensitive, and are enclosed in the "&" and ";" characters.
Icon entity names are proposed as an extension to RFC 1866. Browsers are likely to provide extensions to this set of icon names. Some browsers already have internal icons but use a different method than entity names to access them. The following document lists the currently proposed icon entity names, as well as one set of known internal icons:


This document is a merging of information from a variety of other documents obtained from the WWW. Everything in this document was copied or borrowed from some other Web document, all of which I have attempted to recognize in the Acknowledgements below. While there exist a number of Beginner's Guides to writing HTML, verbal descriptions of parts of HTML, and technical Document Type Definitions (DTD) specifications of the language, I could not find a concise reference document as a Web page for the language. This reference document is an attempt to fill that gap. Judging by the thousands of sites that have accessed this document, it appears that it is filling a gap for many people. While I have made every effort to be complete concerning the language, that is always an unattained goal. The information in this document is the best I have been able to find. If I have something missing or wrong, please send me e-mail with a definitive source so that I can keep this document as current, accurate, and complete as possible.

The mention, or lack of mention, of any HTML browser product, service, or company shall not be construed as either a positive or negative endorsement of or comment about such product, service, or company. At most it expresses my personal knowledge or ignorance of the information. This document is neither intended to be nor capable of being an exhaustive discussion of browser capabilities. It is focused on Sandia's internal needs. Please also refer to the standard Sandia disclaimer of liability.


Acknowledgements

Thanks to the following sources from which I have liberally borrowed.
A repository of the RFC 1866, the official specification of HTML Version 2.0.
The general home page for HTML information at WWW.
This November 1994 alphabetical listing of HTML elements for the proposed Version 2, constructed by Dan Connolly, formed the original structure of this document. Prior to Dan Connolly's move to World Wide Web Consortium (W3C) at MIT, this reference was:
An excellent starting page for information on SGML.
A quick reference card to the HTML elements, organized by logical function. This card appears in printed form in Spinning the Web (ISBN 1-850-32141-8) by Andrew Ford, and The WorldWideWeb Handbook (ISBN 1-850-32205-8) by Peter Flynn, both published by International Thomson Computer Press.
The Library of Congress index of links about HTML.
A book on HTML Documentation by Ian Graham which I believe has by now been published. Ian has informed me (as of Jan 96) that this collection is recently updated and reorganized, and now covers additional material, including Microsoft and Netscape extensions.
Documentation of Mosaic for X version 2.0 fill-out form support.
The HTML Primer developed by NCAR.
A set of pointers to various publications about HTML including the draft 1.2 of Internet MIME Content Type (RFC 1341) which expired 13 Jan 1994. That was a description of Version 1 HTML written by Tim Berners-Lee and Daniel Connolly.
A quick reference to the "most commonly used elements" of HTML Version 1 and 2 by Michael Grobe.
The Document Type Definition (DTD) of the Version 2 specification for HTML. Prior to Dan Connolly's move to World Wide Web Consortium (W3C) at MIT, this reference was:
HTML Version 2 as of 29 Nov 1994 as an Internet Media Type (RFC 1590) and MIME Content Type (RFC 1521) written by Tim Berners-Lee and Daniel Connolly. This link no longer exists.
The text of HTML Version 2.0.
The description of the extensions to HTML used by Netscape, Version 1.0.
The description of the extensions to HTML used by Netscape, Version 1.1.
The description of the extensions to HTML used by Netscape, Version 2.0.
The description of the extensions to HTML used by Microsoft Internet Explorer, Version 2.0. This is labeled beta documentation, and the last time I accessed it was very incomplete. Many extensions are mentioned but not documented.
This document appears to be a definitive reference to the Common Gateway Interface, which describes the structure of cgi-bin programs. These programs are required for the ISMAP attribute of the IMG element, as well as for processing FORM element input.
This is a comprehensive tutorial on imagemapping. It includes a discussion of an NCSA image map utility. This is a cgi-bin program that is useful with the ISMAP attribute on the IMG element. Earlier, but now obsolete, references to this utility included: and
HTML+ (a superset of Version 2, and the precursor to Version 3) as of 8 Nov 1993, by David Raggett. Includes some interesting possibilities.
An overview of the proposed features of Version 3, and identification of those proposals likely to be deferred until Version 3.1. Also identifies access to a test browser designed to display the new Version 3.0 features as part of the review process.
An early draft of the text for a proposed Version 3 as a (now expired) Internet RFC by David Raggett. This link is identical to
An excellent analysis of the relationship of the ISO8859 Latin-1 character set with HTML entities, including representational tables, identification of browser implementation inconsistencies and differing/invalid presentation.
This is my personal favorite of the many documents describing how to write good HTML, including "Common Errors" and "Things to Avoid". This used to be (and still is) located at: but the new location has been updated.
The Style Guide at the World Wide Web Consortium (W3C) at MIT, written by Tim Berners-Lee.
A list of W3C Tech Reports, which currently only lists the latest proposal for standardization of tables as an extension to RFC 1866.
The extensions to HTML required to access Java(tm) applets, Version Beta 1.0. Note that the HTML tag changed from APP to APPLET from pre-beta to beta.
The site I use to retrieve internet-drafts. These include drafts of proposals being considered by the HTML Working Group developing extensions to RFC 1866. Two specific proposals reviewed for inclusion in this Reference Manual were draft-ietf-html-fileupload-03.txt and draft-ietf-html-clientsideimagemap-01.txt.

Michael J. Hannah
Sandia National Laboratories
