|
In computing, HyperText Markup Language (HTML)
is a markup language designed for the creation of web pages and other
information viewable in a browser. HTML is used to structure
information — denoting certain text as headings, paragraphs, lists and
so on — and can be used to describe, to some degree, the appearance and
semantics of a document.
Originally
defined by Tim Berners-Lee and further developed by the IETF with a
simplified SGML syntax, HTML is now an international standard (ISO/IEC
15445:2000). Later HTML specifications are maintained by the World Wide
Web Consortium (W3C).
Early
versions of HTML were defined with looser syntactic rules which helped
its adoption by those unfamiliar with web publishing. Web browsers
commonly made assumptions about intent and proceeded with rendering of
the page. Over time, the trend in the official standards has been to
create an increasingly strict language syntax; however, browsers still
continue to render pages that are far from valid HTML.
XHTML, which
applies the stricter rules of XML to HTML to make it easier to process
and maintain, is the W3C's successor to HTML. As such, many consider
XHTML to be the "current version" of HTML, but it is a separate,
parallel standard; the W3C continues to recommend the use of either
XHTML 1.1, XHTML 1.0, or HTML 4.01 for web publishing.
Introduction
HTML is a form of markup
that is oriented toward the construction of single-page text documents
rendered by specialized software called HTML user agents, the most common example of which is a web browser.
HTML provides a means by which the document's content can be annotated
with various kinds of metadata and rendering hints. The rendering cues
may range from minor text decorations, such as specifying that a
certain word be underlined or that an image be inserted, to
sophisticated imagemaps and form definitions. The metadata may include
information about the document's title and author, structural
information such as headings, paragraphs and lists, and information that
allows the document to be linked to other documents to form a hypertext
web.
HTML is a text-based format that is designed to be both readable and editable by
humans using a text editor. However, writing and updating a large
number of pages by hand in this way is time-consuming, requires a good
knowledge of HTML and can make consistency difficult to maintain.
Visual HTML editors such as Macromedia Dreamweaver, Adobe GoLive or
Microsoft FrontPage allow the creation of web pages to be treated much
like word processor documents. The code generated by these programs can
be of poor quality. However, the open-source visual HTML editor Nvu
generates code of high quality.
HTML can be
generated on the fly using a server-side scripting system such as Perl,
PHP, JSP, or ASP. Many web applications like content management
systems, wikis and web forums generate HTML pages.
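As a minimal sketch of generating HTML on the fly, the following uses Python rather than one of the systems named above; the function name render_page and the sample data are hypothetical. It shows the two steps such generators usually share: escaping the data, then placing it inside fixed markup.

```python
from html import escape


def render_page(title, items):
    """Return a complete HTML 4.01 Strict document listing the given items."""
    # Escape every piece of data so that characters such as < and &
    # cannot be mistaken for markup by the browser.
    rows = "\n".join(f"      <li>{escape(item)}</li>" for item in items)
    return (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
        '  "http://www.w3.org/TR/html4/strict.dtd">\n'
        "<html>\n"
        "  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        "    <ul>\n"
        f"{rows}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical data, as a wiki or forum might pull from its database.
    print(render_page("Recent changes", ["HTML", "Tim Berners-Lee & the W3C"]))
```

A content management system or wiki would typically take the title and items from stored content instead of literals, but the escaping step stays the same.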
Version history of the standard
- Hypertext Markup Language (First Version), published June 1993 as an Internet Engineering Task Force (IETF) working draft (not a standard).
- HTML 2.0, published November 1995 as IETF RFC 1866, and declared obsolete/historic by RFC 2854 in June 2000.
- HTML 3.2, published January 14, 1997 as a W3C Recommendation.
- HTML 4.0, published December 18, 1997 as a W3C Recommendation.
- HTML 4.01, published December 24, 1999 as a W3C Recommendation.
- ISO/IEC 15445:2000 ("ISO HTML", based on HTML 4.01 Strict), published May 15, 2000 as an ISO/IEC international standard.
- XHTML 1.0, published January 26, 2000 as a W3C Recommendation, later revised and republished August 1, 2002.
There is no official
standard HTML 1.0 specification because there were multiple informal
HTML standards at the time. However, some people consider the initial
edition provided by Tim Berners-Lee to be the definitive HTML 1.0. That
version did not include an IMG element type. Work on a successor for
HTML, then called "HTML+", began in late 1993, designed originally to
be "A superset of HTML…which will allow a gradual rollover from the
previous format of HTML". The first formal specification was therefore
given the version number 2.0 in order to distinguish it from these
unofficial "standards". Work on HTML+ continued, but it never became a
standard.
The HTML 3.0 standard was proposed by the newly formed W3C in March 1995,
and provided many new capabilities such as support for tables, text
flow around figures, and the display of complex math elements. Even
though it was designed to be compatible with HTML 2.0, it was too
complex at the time to be implemented, and when the draft expired in
September 1995 work in this direction was discontinued due to lack of
browser support. HTML 3.1 was never officially proposed, and the next
standard proposal was HTML 3.2 (code-named "Wilbur"), which dropped the
majority of the new features in HTML 3.0 and instead adopted many
browser-specific element types and attributes which had been created
for the Netscape and Mosaic web browsers. Math support as proposed by
HTML 3.0 finally came about years later with a different standard,
MathML.
HTML 4.0
likewise adopted many browser-specific element types and attributes,
but at the same time began to try to "clean up" the standard by marking
some of them as deprecated, and suggesting they not be used.
Minor editorial revisions to the HTML 4.0 specification were published as HTML 4.01.
The most common extension for files containing HTML is .html. However, older operating systems such as DOS limit file extensions to three letters, so the .htm extension is also used. Although perhaps less common now, the shorter form is still widely supported by current software.
Markup element types
Below are the kinds of markup element types in HTML.
- Structural markup. Describes the purpose of text. For example,
  <h2>Golf</h2>
  directs the browser to render "Golf" as a second-level heading, similar to "Markup
  element types" at the start of this section. Structural markup does not
  denote any specific rendering, but most web browsers have standardised
  on how elements should be formatted. For example, by default, headings
  like these will appear in large, bold text. Further styling should be
  done with Cascading Style Sheets (CSS).
- Presentational markup. Describes the appearance of the text, regardless of its function. For example,
  <b>boldface</b>
  will render "boldface" in bold text. In the majority of cases, using
  presentational markup is inappropriate, and presentation should be
  controlled by using CSS. For both <b>bold</b> and <i>italic</i> there are
  elements which usually have an equivalent visual rendering but are more
  semantic in nature, namely <strong>strong emphasis</strong> and
  <em>emphasis</em> respectively. It is easier to see how an aural user
  agent should interpret the latter two elements.
- Hypertext markup. Links parts of the document to other documents. For example,
  <a href="http://wikipedia.org/">Wikipedia</a>
  will render the word Wikipedia as a hyperlink to the specified URL.
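The three kinds of markup can be told apart mechanically. The sketch below uses Python's standard html.parser module to label each element in a fragment. The KINDS table is an illustrative assumption, deliberately incomplete, and not an official classification; the names KindCollector and classify are hypothetical.

```python
from html.parser import HTMLParser

# Illustrative and incomplete mapping from element names to the kinds
# of markup described above (an assumption, not taken from any standard).
KINDS = {
    "h1": "structural", "h2": "structural", "p": "structural",
    "ul": "structural", "li": "structural",
    "b": "presentational", "i": "presentational", "font": "presentational",
    "a": "hypertext",
}


class KindCollector(HTMLParser):
    """Record (element name, kind) for every start tag encountered."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports element names in lower case.
        self.found.append((tag, KINDS.get(tag, "unclassified")))


def classify(markup):
    collector = KindCollector()
    collector.feed(markup)
    collector.close()
    return collector.found


if __name__ == "__main__":
    sample = ('<h2>Golf</h2><b>boldface</b>'
              '<a href="http://wikipedia.org/">Wikipedia</a>')
    for name, kind in classify(sample):
        print(name, kind)
```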
The Document Type Definition
In order to specify
which version of the HTML standard they conform to, all HTML documents
should start with a Document Type Declaration (informally, a
"DOCTYPE"), which makes reference to a Document Type Definition (DTD).
For example:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
This declaration asserts
that the document conforms to the Strict DTD of HTML 4.01, which is
purely structural, leaving formatting to Cascading Style Sheets. In
some cases, the presence or absence of an appropriate DOCTYPE may influence
how a web browser will display the page.
In addition to the Strict DTD, HTML 4.01 provides Transitional and
Frameset DTDs. The Transitional DTD was intended to gradually phase in
the changes made in the Strict DTD, while the Frameset DTD was intended
for those documents which contained frames.
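A program can read the declaration to find out which of the three HTML 4.01 DTDs a document claims to follow. The following is a sketch only, using Python's standard html.parser module. The public identifiers are those of HTML 4.01, while the names DoctypeFinder and html401_variant are hypothetical, and the code does not check that the document actually conforms to the DTD it names.

```python
import re
from html.parser import HTMLParser

# Public identifiers of the three HTML 4.01 DTDs.
HTML401_DTDS = {
    "-//W3C//DTD HTML 4.01//EN": "Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "Transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "Frameset",
}


class DoctypeFinder(HTMLParser):
    """Remember the public identifier of the first DOCTYPE seen."""

    def __init__(self):
        super().__init__()
        self.public_id = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", which may span lines.
        match = re.match(r'DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"', decl,
                         re.IGNORECASE)
        if match and self.public_id is None:
            self.public_id = match.group(1)


def html401_variant(document):
    """Return "Strict", "Transitional", "Frameset", or None if unknown."""
    finder = DoctypeFinder()
    finder.feed(document)
    finder.close()
    return HTML401_DTDS.get(finder.public_id)


if __name__ == "__main__":
    doc = ('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
           '"http://www.w3.org/TR/html4/strict.dtd">\n'
           '<html><head><title>Golf</title></head>'
           '<body><p>Text</p></body></html>')
    print(html401_variant(doc))
```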
Separation of style and content
Efforts of the web
development community have led to new thinking about the way a web
document should be written; XHTML epitomizes this effort. Standards
stress using markup which suggests the structure of the document, like
headings, paragraphs, block quoted text, and tables, instead of using
markup which is written for visual purposes only, like <font>,
<b> (bold), and <i> (italics). Some of these elements are
not permitted in certain varieties of HTML, like HTML 4.01 Strict. CSS
provides a way to separate the HTML structure from the content's
presentation, by keeping all code dealing with presentation defined in
a CSS file. See separation of style and content.
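The split can be illustrated with a small Python sketch that writes a page and its stylesheet as two separate files. The file names index.html and style.css, the sample rules and the function write_site are all hypothetical. The page contains only structural elements and refers to the stylesheet through a link element, while every presentational decision lives in the CSS.

```python
from pathlib import Path

# Structure only: no <font>, <b> or <i>, just elements that say what
# each part of the document is.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <title>Golf</title>
    <link rel="stylesheet" type="text/css" href="style.css">
  </head>
  <body>
    <h2>Golf</h2>
    <p>Golf is played on a <em>course</em>.</p>
  </body>
</html>
"""

# Presentation only: how those structural elements should look.
STYLESHEET = """h2 { font-family: sans-serif; color: navy; }
em { font-style: normal; font-weight: bold; }
"""


def write_site(directory):
    """Write the page and its stylesheet as two separate files."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    (target / "index.html").write_text(PAGE, encoding="utf-8")
    (target / "style.css").write_text(STYLESHEET, encoding="utf-8")


if __name__ == "__main__":
    write_site("site")
```

Changing the appearance of every heading then means editing one rule in the stylesheet, with no change to the markup.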
Serving HTML
The World Wide Web
primarily uses HTTP to serve HTML documents to users. In order to do
this correctly, it is necessary for the document to be described
correctly: the necessary metadata includes the MIME type (typically
"text/html", although other choices include "application/xhtml+xml")
and the character encoding (see Character encodings in HTML).
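Over HTTP, both pieces of metadata are usually carried in the Content-Type response header. The following is a minimal sketch using Python's standard wsgiref module; the page text, the port number and the choice of UTF-8 are assumptions made for illustration.

```python
from wsgiref.simple_server import make_server

# A small page containing a non-ASCII character, so that the declared
# character encoding actually matters.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head><title>Café menu</title></head>
  <body><p>Café au lait</p></body>
</html>
"""


def application(environ, start_response):
    """WSGI application that serves PAGE with its MIME type and encoding."""
    body = PAGE.encode("utf-8")
    headers = [
        # MIME type and character encoding travel together in one header.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local experimentation.
    with make_server("localhost", 8000, application) as server:
        server.serve_forever()
```

The charset named in the header must match the encoding actually used for the bytes; here both are UTF-8.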
HTML Email
HTML is also used in
email messages. Many email clients include a GUI HTML editor for
composing emails and a rendering engine for displaying them once
received. Use of HTML in email is controversial. The main benefit is
the ability to decorate an email with
presentational attributes (bold headings, etc.). However, there are a
number of disadvantages, which include:
- the recipient may not have an email client that can display HTML
- the email is larger, because a heavily formatted message takes up
much more space than its plain text equivalent. This issue is made slightly
worse by the fact that, for compatibility, most clients send a
plaintext version as well.
- overuse of formatting (there was at one stage a craze for
making letterheads using HTML and sending them as part of every e-mail)
- potential security issues of deceiving the recipient into
accepting an email as being from an authoritative source (such as a bank)
when this is not the case; this is related to phishing scams.
- potential security issues of simply rendering a complex format like HTML.
For these reasons many
mailing lists deliberately block HTML email, either stripping out the
HTML part to leave just the plain text part or rejecting the entire
message.
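Both behaviours described above, a client sending a plaintext version alongside the HTML and a mailing list stripping the HTML part, can be sketched with Python's standard email package. The function names and addresses are hypothetical, and real mailing list software handles many more cases (attachments, nested parts, other headers).

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, plain_text, html_text):
    """Build a message carrying both a plain text and an HTML version."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(plain_text)                      # text/plain part
    # Adding an alternative turns the message into multipart/alternative.
    msg.add_alternative(html_text, subtype="html")   # text/html part
    return msg


def strip_html(msg):
    """Return a plain-text-only copy, or None if there is no plain part.

    A list that rejects instead of stripping could treat None as "reject".
    """
    plain = msg.get_body(preferencelist=("plain",))
    if plain is None:
        return None
    stripped = EmailMessage()
    for name in ("From", "To", "Subject"):
        if msg[name] is not None:
            stripped[name] = msg[name]
    stripped.set_content(plain.get_content())
    return stripped


if __name__ == "__main__":
    original = build_message(
        "alice@example.org", "list@example.org", "Meeting",
        "See you at noon.",
        "<html><body><p>See you at <b>noon</b>.</p></body></html>",
    )
    print(original.get_content_type())
    print(strip_html(original).get_content_type())
```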
|