LearningHTML

On this page I keep notes about HTML, the Hyper Text Markup Language - mostly about its syntax and semantics, and mostly about HTML5. I taught myself the basics sometime in the 1990s, and have kept in touch with it on and off since. I'm starting this page now that I plan to get an education in web programming.

The next learning page after this one is LearningCSS.

References

Latest HTML specification, which as of writing this is HTML 5.1
HTML5 tutorial @ w3schools.com
Learning Web Design - A Beginner's Guide to HTML, CSS, JavaScript, and Web Graphics. 4th edition, August 2012. Jennifer Niederst Robbins. O'Reilly. ISBN 978-1-449-31927-4. On this page I refer to this book with [LearningWebDesign].
HTML5 & CSS3 For the Real World. 2nd edition, 2015. Alexis Goldstein, Louis Lazaris, Estelle Weyl. SitePoint. ISBN 978-0-9874674-8-5. On this page I refer to this book with [HtmlForTheRealWorld].

Alphabetical list of all HTML elements
Character references by the Web Standards Project
Browser Support:
- http://caniuse.com/
- http://html5please.com/

Glossary

DOM: Document Object Model
DTD: Document Type Definition. Wikipedia link. A DTD is a markup vocabulary that defines what documents of a specific markup language must look like. DTD is a concept of SGML. HTML5 has no DTD because HTML5 is not based on SGML.
Element: An element of the markup language that has a certain semantic meaning. Example: div. Note the absence of angle brackets - putting angle brackets around an element name creates a tag (cf. there).
HTML: Hyper Text Markup Language. Wikipedia link.
HTML 4.x: Version 4 of the HTML language. Version 4.0 was published in 1997, version 4.01 was published in 1999. All versions of HTML up until 4.01 are an application of SGML.
HTML5: Version 5 of the HTML language (Wikipedia link). The original version of HTML5 (5.0 so-to-speak) was published in 2014, version 5.1 was published in 2016. According to Wikipedia, HTML5 is no longer based on SGML.
Page: A page, or web page, is frequently used to refer to a single HTML document and all of the external resources that it references (e.g. style sheets, images), typically displayed by a web browser after fetching everything from the Internet.
Responsive: A web page is said to be responsive when it is designed to work with all sorts of devices, from desktop computers to tablets to smart phones, by adapting (responding) to the requirements and constraints of each device. A typical constraint is the device's screen size. If there are different versions of a web page, e.g. one for desktop computers and one for mobile devices, then the web page is not responsive.
SGML: Standard Generalized Markup Language. Wikipedia link.
Tag: A tag is a markup construct that begins with < and ends with >. An HTML element typically is represented by an opening and a corresponding closing tag, e.g. <div> and </div>. Sometimes opening and closing tags are contracted into a special form, the empty-element tag. Example: <br />.
XML: Extensible Markup Language. Wikipedia link.
Web, the Web: Short name for the World Wide Web. A lot, if not most of the content of the web is made of HTML pages.
URL: Uniform Resource Locator. An address to a resource that can be found on the Internet. Example: http://www.example.com/foo/bar.html
XHTML: Extensible Hyper Text Markup Language. Wikipedia link. This version of HTML is based on XML and is more restrictive in its syntax than the regular HTML (which is based on SGML). XHTML documents must be well-formed and can be processed by any XML parser, while regular HTML documents usually cannot be processed by an XML parser because they are allowed to have markup that does not conform to the "well-formed" requirement of XML (e.g. some elements do not have to have closing tags).
XHTML 1.x: Version 1.0 was published in 2000 as an official recommendation of the W3C. It is based on HTML 4. Version 1.1 was published in 2001.
XHTML 2.0: Development on this version was abandoned in favour of HTML5 and XHTML5.
XHTML5: XHTML5 is based on HTML5. As of writing this, no version of XHTML5 has been officially published yet as a recommendation of the W3C.

Syntax

HTML vs. XML

As mentioned in the glossary, XHTML is a version of HTML that is based on XML and is more restrictive in its syntax than regular HTML. Notably, the document must be well-formed (according to the XML definition of "well-formed") and must follow the additional XHTML rules. Together this includes:

All elements must appear in the document with a closing tag, or use a special tag type called empty-element tag which denotes both opening and closing at the same time (e.g. <br />).
All element names must be in lowercase
All attributes must have explicit values

In comparison to XHTML, regular HTML is more lenient in its syntax:

Some elements do not need to have a closing tag. TODO: Which ones? At the time that I'm writing this, the W3C Validator declares even documents with missing </body> tag as valid - in fact, I can even have tags with arbitrary element names in the document and it is still accepted as valid!
Element names can be uppercase, lowercase or mixed case - it doesn't matter
Some boolean attributes can be specified with just their name. Examples: checked, selected, multiple. This is called "attribute minimization", which is an SGML practice. In XHTML these attributes must be written out explicitly, with the attribute value being the same as the attribute name.
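
To make the differences listed above more concrete, here is a small sketch that writes the same content twice, first in the lenient HTML syntax and then in XHTML-compatible syntax. The text and the field name "subscribe" are made up for the example.

```html
<!-- Lenient HTML syntax: mixed-case element names, omitted closing tags
     for p and li, and a minimized boolean attribute.
     The content and the name "subscribe" are hypothetical. -->
<P>Pick a colour
<UL>
  <li>Red
  <li>Green
</UL>
<input type="checkbox" name="subscribe" checked>

<!-- The same content in XHTML-compatible syntax -->
<p>Pick a colour</p>
<ul>
  <li>Red</li>
  <li>Green</li>
</ul>
<input type="checkbox" name="subscribe" checked="checked" />
```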

Note: This StackOverflow post shows how the validator.nu online validator can be persuaded to perform validation with stricter rules. The trick is to add a namespace to the html element (<html xmlns="http://www.w3.org/1999/xhtml">) and to select the validator preset "XHTML + SVG 1.1 + MathML 3.0".

DTD

HTML5 is not based on SGML and therefore has no DTD (Document Type Definition).

HTML5 still requires a bare-bones Document Type Declaration (DOCTYPE declaration). Every HTML5 document must begin with the following line:

<!DOCTYPE html>

Whitespace

Spaces, tabs, newlines, carriage returns - all these are collectively called "whitespace". Web browsers do not lay out a document according to the whitespace that they find in it. Instead, they render every type of whitespace character as a regular space character and collapse any run of consecutive whitespace characters into a single space.
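
A short sketch of the effect, with made-up text: both paragraphs below should be rendered the same way in normal text flow.

```html
<!-- Both paragraphs are rendered as "Hello wide world";
     the text is only an illustration -->
<p>Hello wide world</p>
<p>Hello      wide
       world</p>
```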

Comments

Comments are not rendered by web browsers at all. You can place comments inside an HTML document using the following syntax:

<!-- This is a comment text -->

`class` and `id` attributes

There are a number of global attributes that can be set on (almost) any HTML element. Two of those are of particular interest:

class: Used for classifying elements. In any given document, several elements can have the same class. An element can also have several classes; in that case, separate the class names with a space character (e.g. <div class="foo bar">).
id: Used for identifying a specific instance of an element. In any given document, only one element can have a specific id.

class and id are used by CSS as anchors for applying styles.
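
The following sketch shows how the two attributes serve as styling anchors. The class names "note" and "warning" and the id "intro" are hypothetical, and the style rules are arbitrary.

```html
<style>
  /* Applies to every element that has the class "note" (hypothetical name) */
  .note { font-style: italic; }
  /* Applies only to the single element whose id is "intro" (hypothetical name) */
  #intro { font-weight: bold; }
</style>

<p id="intro" class="note">First paragraph.</p>
<p class="note warning">Second paragraph with two classes.</p>
```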

Character references

Certain characters have special meaning and cannot be used as-is in an HTML document. For instance, the character "<" is interpreted as the beginning of an opening or closing tag. Other special characters may be impossible to encode using a document's encoding.

Characters such as these must be "escaped" by writing them in the form of a character reference. The format of a character reference is this:

&foo;

Instead of "foo" you write the reference to the desired character. A character can be referenced in one of two ways:

By using the character's name (e.g. the copyright sign © = &copy;)
By using the character's numeric value (e.g. the copyright sign © = &#169;)

Predefined sets exist both for character names and for numeric values. The "Web Standards Project" (see the References section) has a list for both.
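
As a small illustration of the two forms, plus the escaping of characters that have special meaning in markup (the year and author name are placeholders):

```html
<!-- Named reference and numeric reference for the same character;
     the year and name are placeholders -->
<p>&copy; 2017 Example Author</p>
<p>&#169; 2017 Example Author</p>

<!-- Escaping characters that have a special meaning in markup -->
<p>A tag begins with &lt; and ends with &gt;, a character reference begins with &amp;.</p>
```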

URLs

A few special types of URLs may be used in anchor elements:

#foo: Links to an element in the same document that has the id attribute set to the value "foo"
mailto:foo@bar.com: Causes a mail client to open a "new message" window with the email address set as the message recipient.
tel:123456789: Causes a telephony client to open and to dial the specified number. Usually the user must confirm that the call should be made. Best practice is to specify the number in international format. Some browsers, especially on smart phones, attempt to auto-detect telephone numbers, but if a document contains longish number sequences the auto-detect routines may generate false positives. Auto-detection can be prevented by specifying this in the document header: <meta name="format-detection" content="telephone=no" />
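
A sketch of the three URL types used in anchor elements. The id "contact", the email address and the phone number are all invented for the example.

```html
<!-- The id, the address and the number are hypothetical -->
<h2 id="contact">Contact</h2>

<a href="#contact">Jump to the contact section</a>
<a href="mailto:foo@example.com">Send an email</a>
<a href="tel:+41123456789">Call +41 12 345 67 89</a>
```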

Semantics

Semantics vs. Presentation

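TODO: Mention the role of CSS, and that some traditional elements in HTML 4.01 and XHTML 1.0 are presentational in nature (e.g. <font>, <i> and <center>). Of those, <font> and <center> are obsolete in HTML5, while <i> has been kept with a new semantic definition (see the list of basic inline elements).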

Basic structure of an HTML5 document

This is what a minimal HTML5 document looks like:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Document title</title>
  </head>

  <body>
  </body>
</html>

Notes:

The lang attribute is really optional
The meta element that specifies the character set is recommended and should appear before any content-based elements, such as title
The title element is the only thing that is mandatory in the document head section

External resources

TODO: Style sheets etc. and how they are referenced

The DOM

TODO

Element types

Block vs. inline elements

HTML elements are displayed either as block elements or as inline elements:

Browsers treat block elements as though they are in little rectangular boxes, stacked up within the page. Block elements begin on a new line, and typically some space is also added above and below the element. Examples of block elements: Headings, paragraphs.
Inline elements do not break the flow of the text. HTML5 calls inline elements "text-level semantic elements". Examples of inline elements: em.

Whether an element is displayed as a block or inline element can be controlled with the CSS property display. See the LearningCSS page for details.

Metadata content elements

Metadata content elements are not displayed to the user, instead the information that they contain tells the browser something or other about the page. Examples are meta, style, link and title.

Flow content elements

Flow content elements are almost all elements that can be used in the body of a page. The only elements excluded from this category are elements that have no effect on the document's flow. Examples are meta, script and link in the document's head. TODO: Examples for elements in the document's body.

FWIW, [LearningWebDesign] p. 73 and p. 76 mention that li and dd elements may contain any type of "flow element" or "flow content".

Content-grouping elements

[LearningWebDesign] p. 76 mentions that "content-grouping elements (like paragraphs)" are not allowed to appear in a dt element. Later on that same page it says that the HTML5 specs consider the following elements to "group content":

p
hr
ul, ol, dl
div
blockquote
pre
figure
figcaption

Sectioning content elements

section
article
nav
aside
TODO: Are address, header and footer also sectioning elements?

General notes:

Sectioning elements create a new item in the document outline.
A sectioning element may have its own internal heading hierarchy, regardless of its position in the parent document.

Heading content elements

Examples: h1, h2, etc.

TODO: Give an actual definition of this category.

Sectioning roots

If a heading occurs within an element that is in the category "sectioning roots", then the heading is not included in the document outline. Elements that are sectioning roots:

blockquote
figure
details
fieldset
td
body (TODO: why?)

Phrasing content elements

Phrasing content elements are approximately (but not exactly) those elements that are inline elements. Some examples:

img
em
strong
cite

TODO: Example of a phrasing content element that is not an inline element.

Embedded content elements

Examples: img, video, canvas, embed, object.

TODO: Give an actual definition of this category.

Interactive content elements

Interactive content elements are those that have a representation that the user can, in some way, interact with. Examples:

a
button, select, textarea (the form element itself is not in this category)
audio, but only when the "controls" attribute is present
input, but only when the "type" attribute is not set to "hidden"

Forms

Notes about forms:

The data entered by the user is encoded by the web browser before it is sent to the server. The encoding method is essentially the same as the one used for URLs, e.g. a slash character is encoded as %2F. A space character is encoded as + in form data (in other parts of a URL it is encoded as %20).
The web browser sends the data to the server using an HTTP request of type GET or POST, depending on what the method attribute of the form element specifies
- POST sends the data behind the scenes to the URL that was specified in the action attribute, the browser then displays whatever is sent back by the server as the response. The user does not get to see the form data in the URL used for the POST request.
- GET tacks the data onto the URL that was specified in the action attribute in the form ?variableNameA=variableValueA&variableNameB=variableValueB&... and then sends a request to the resulting URL. As a result, the browser's address field changes and the form data becomes visible as part of the URL. The URL can be bookmarked by the user. A typical use of this is a search engine query.
It is best practice to wrap form controls in semantic HTML elements such as lists or div.
A form control represents a variable. The variable name is specified by the name attribute of the form control element, the variable value is the data that the user enters.
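
The notes above can be sketched with a small search form. The action URL "/search" and the control names "q" and "lang" are assumptions made for the example.

```html
<!-- Hypothetical search form: "/search", "q" and "lang" are made-up names -->
<form action="/search" method="get">
  <div>
    <input type="text" name="q" />
    <input type="hidden" name="lang" value="en" />
    <input type="submit" value="Search" />
  </div>
</form>
<!-- If the user enters "HTML/CSS", the browser requests
     /search?q=HTML%2FCSS&lang=en -->
```

The submit button has no name attribute, so it does not contribute a variable to the request.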

APIs

Canvas API: Adds 2D drawing
Drag and Drop API: Add drag and drop functionality to the browser
Editing API: Create text editors embedded in the browser
Geolocation API: For location-based stuff
Media Player API: Controls audio and video. Markup is done with the audio and video elements.
Offline Web Application API: Makes it possible for a web application to work even when there is no Internet connection.
Session History API: Exposes the browser history for better control over the Back button.
Web Storage API: Go beyond cookies for storing data in the client's browser cache
Web Workers API: Provides a way to run computationally complicated scripts in the background while the browser keeps a responsive UI.
Web Sockets API: Allows network traffic between client/server without the HTTP overhead

Web browsers

TODO: Write about important specifics of only the most important web browsers: Firefox, Internet Explorer, Edge, Safari, Chrome, Opera.

In Firefox 53, if a textarea form control should not contain any default text, you still must add a closing tag. If you use the empty-element syntax, Firefox will gobble up the remainder of the document as the default text of the textarea control.

HTML5 support in Internet Explorer

Old versions of Internet Explorer do not have support for HTML5. This can be fixed by adding some snippets to the header of the HTML document.

Add this snippet for all Internet Explorers that do not have support for HTML5:

<style>
section, article, nav, aside, header, footer, hgroup { display: block; }
</style>

For Internet Explorer 8 and older, the snippet above is not enough. For IE 8 and older, also add this snippet:

<!--[if lt IE 9]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5-els.js"></script>
<![endif]-->

Ask the Duck to find the article on html5doctor.com that explains the issue.

HTML5 outlining system

[LearningWebDesign] p. 81 states that

"As of this writing, no browsers support the HTML5 outlining system, so to make your documents accessible and logically structured for all users, it is safest to use headings in descending numerical order, even within sectioning elements."

Web servers

TODO: Write about a few important things (and only a few!) that pertain to web servers and HTML. Examples:

The default file name is typically index.html, but can also be default.htm
File extensions are typically .html, sometimes .htm

Elements

Basic document structure elements

The elements that make up the basic structure of an HTML document are:

html: The root element.
head: The first of the two elements below the root element. Contains the "header" of the document, which are various things that the browser does not display (with the exception of the document title).
body: The second of the two elements below the root element. Contains the "body" of the document, which is the actual web page content that the browser displays.

Header elements

These are elements that can appear in the document header <head>:

meta: Has meta information about the document, such as the character encoding.
title: The document title. The browser typically displays this in the title of a browser tab.
style: A CSS style sheet that is embedded in the document.

Basic block elements

p: Paragraph. May contain text, images and other inline elements, but no block elements such as headings, lists or sections.

Basic inline elements

a: Anchor, or hypertext link. The href attribute is used to specify the target URL. Whatever is inside the anchor element becomes a clickable link. Typical content inside an anchor element is text or an image. In HTML5, anything can be placed within an anchor element - even block elements. In HTML 4.x, an anchor element could have only inline content. The special URL syntax "#foo" refers to an element in the same document that has the id attribute set to the value "foo". The download attribute indicates that the targeted resource should be downloaded rather than navigated to. The target attribute can be set to the value "_blank" to tell the browser to open the referenced document in a new tab or window. With any other name (apart from the reserved keywords "_self", "_parent" and "_top"), the browser attempts to find a window that already exists with that name and opens the referenced document in that window. Opening in a new window may fail if the browser is set to block popup windows.
abbr: Abbreviation or acronym. The title attribute is used to provide the long version of the abbreviation or acronym.
br: Line break. This is an empty element.
cite: Citation. A reference to the title of a work, such as a book title.
dfn: The first and defining instance of a word or phrase in a document.
em: Emphasized text. Think of a sentence where the emphasis on a given word can change the sentence's meaning.
mark: Highlighted text of some sort. Example: Search terms.
strong: Important text. Semantically, "importance" is not the same as "emphasis". Note that strong elements can be nested to provide even more importance to something that is already important.
i: Text that is in alternate voice or mood, or otherwise offset from the surrounding text in a manner that indicates a different quality of text. Useful examples for this vague definition: The scientific name of a species, or a phrase from another language (reductio ad absurdum), or a voice sounding over the telephone in a piece of fiction. Note that before HTML5 the i element was used for giving typesetting instructions ("italics") - in HTML5 a text marked up with i may still be displayed in italics, but that is pure coincidence :-).
b: Text to which "attention is being drawn for utilitarian purposes without conveying extra importance, and with no implication of an alternate voice or mood". Somewhat useful examples for this vague definition: Keywords in a document abstract, product names in a review, article lead paragraph (aka "lede"). Note that before HTML5 the b element was used for giving typesetting instructions ("bold") - in HTML5 a text marked up with b may still be displayed in bold, but that is pure coincidence :-).
sub: Subscript.
sup: Superscript.
small: Side comments such as a copyright notice at the bottom of a page.
time: Date and/or time information for which a machine-readable form can be established. Typically, the element content is the date/time information in human readable format and the element has a datetime attribute that specifies the date/time in machine-readable form. If the datetime attribute is missing, then the element content itself must be machine-readable. A large number of machine-readable formats are allowed, too many to list here so if in doubt consult the HTML5 specs. Note: The date must be a date on the Gregorian calendar, which means that BC dates are not possible. TODO: The HTML5 specs state that "For times without dates (or times referring to events that recur on multiple dates), specifying the geographic location that controls the time is usually more useful than specifying a time zone offset, because geographic locations change time zone offsets with daylight savings time." How can I specify the geographic location?
wbr: Indicates a word/line break opportunity. This is an empty element.
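
A sketch that combines a few of the inline elements above in one paragraph. The date, the link target and the wording are invented for the example.

```html
<!-- The date and the URL are placeholders -->
<p>
  The <abbr title="World Wide Web Consortium">W3C</abbr> meeting about
  <cite>Learning Web Design</cite> takes place on
  <time datetime="2017-06-15T14:00">15 June at 2 pm</time>.
  <a href="http://www.example.com/agenda.html" target="_blank">Open the agenda</a>
  in a new tab. It is <em>not</em> optional:
  <strong>registration closes the day before</strong>.
</p>
<p>Water is H<sub>2</sub>O, and E = mc<sup>2</sup>.</p>
```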

Heading elements

h1 - h6: Level 1-6 headings
hgroup: Used to group heading elements together. This is used to suppress subheadings from the document outline (e.g. a table of contents) that a browser may generate for a document. The purpose is simple: Subheadings or taglines that merely have a clarifying function in reference to their parent heading should not introduce a new level to the document outline. Adding any number of heading elements to a hgroup element causes the browser to only include the highest-ranked heading within the group in the document outline. Note: The future of the hgroup element is uncertain. The W3C specs have dropped the element entirely, the WHATWG specs still mention it, but according to [HtmlForTheRealWorld] p. 42 browsers do not really support the element.

List and list item elements

ul: Unordered list. Despite the name, the browser displays list items exactly in the order in which they appear in the document - it's only the elements' semantics that say that the reader is not supposed to pay any attention to the list item order. Unordered list items are typically displayed with bullet points to prevent the reader from adding any significance to the order in which the items appear.
ol: Ordered list. List items are typically displayed with increasing numbers to add significance to the order in which the items appear. By default the first list item gets number 1, but this can be changed by setting the list element's start attribute to any number. The reversed attribute specifies that the list items should be numbered in descending order; the items themselves are still displayed in document order.
li: List item in an unordered or ordered list.
dl: Description list. List items consist of two things: A name and a value.
dt, dd: Appear together, usually first dt then dd, to make up a list item in a description list. dt defines the name of the item, dd the value of the item. Because description lists are extremely well suited for creating glossaries, dt can be remembered to define the "term" and dd the "definition" of a glossary item. It is possible to list multiple terms (dt) and then have a single common definition (dd) for all of the terms, or to have a single term with multiple definitions, or even both.

General notes:

Both list elements and list item elements are block elements
A list element may contain only list item elements
A list item element may contain any type of flow element, including another nested list. The exception is the dt element which may not contain "headings or content-grouping elements (like paragraphs)" (cf. the "Learning Web Design" book).
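
A sketch of the list types described above, with placeholder content: an ordered list that starts at 3 and contains a nested unordered list, and a description list in which two terms share one definition.

```html
<!-- Placeholder content -->
<ol start="3">
  <li>Third step</li>
  <li>Fourth step
    <ul>
      <li>A nested, unordered detail</li>
    </ul>
  </li>
</ol>

<dl>
  <dt>HTML</dt>
  <dd>Hyper Text Markup Language</dd>
  <dt>Web</dt>
  <dt>WWW</dt>
  <dd>Short names for the World Wide Web</dd>
</dl>
```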

Table elements

table: The table itself. Tables are made up of rows.
tr: A table row. Rows are made up of table header cells and/or table data cells.
th: A table header cell. The rowspan and colspan attributes define if a cell spans more than 1 row or column. The scope attribute can be used to explicitly state what the header cell is associated with. Possible values are "row", "col", "rowgroup" or "colgroup", meaning that the header cell is either a header for a single row, single column, a group of rows (thead, tfoot, tbody - see below) or a group of columns (colgroup - see below). If scope is not sufficient (e.g. in a table where a lot of column or row spanning is going on), the table header cell can be given an id attribute and all table data cells must then reference back to the header cell using the headers attribute, specifying the header cell's id as the attribute value.
td: A table data cell. Row/column spanning is done exactly the same as with table header cells.
caption: The table caption.
thead, tfoot, tbody: These elements group one or more table rows (tr elements) together. Semantically, the rows are thus placed in the table header, the table footer or the table body.
col: A table column. Table column elements can only appear within a column group element. A table column element may represent several columns, the number is specified with the span attribute. Table columns are enumerated in the order which corresponds to their placement in the table, i.e. the first col element describes the first column, the second col element describes the second column, etc. The class attribute can be used to provide a CSS styling anchor.
colgroup: This element groups one or more columns together. It is possible to have any number of column groups, because column groups do not have any special semantic meaning (unlike row groups, which semantically denote table header, footer and body). A table column group element either contains col elements, or specifies the number of columns it represents using the span attribute. As with columns, the id or class attributes are used to provide styling anchors.

Note: The order in which elements appear in a table is important:

0-1 caption
0-n colgroup
0-1 thead
0-1 tbody or 1-n tr
0-1 tfoot

Instead of at the end, the tfoot element may also appear right after the thead element.
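
A sketch of a small table that follows the element order given above. The content and the class names "day" and "hours" are invented; the example leaves out tfoot.

```html
<!-- Hypothetical content and class names -->
<table>
  <caption>Opening hours</caption>
  <colgroup>
    <col class="day" />
    <col span="2" class="hours" />
  </colgroup>
  <thead>
    <tr>
      <th scope="col">Day</th>
      <th scope="col">Opens</th>
      <th scope="col">Closes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Monday</th>
      <td>09:00</td>
      <td>17:00</td>
    </tr>
    <tr>
      <th scope="row">Sunday</th>
      <td colspan="2">Closed</td>
    </tr>
  </tbody>
</table>
```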

Quotation elements

blockquote: Long quotation. This is a content-grouping element. It is recommended (but not required) that content within blockquote be contained in other elements such as paragraphs, headings or lists.
q: Short quotation. This is an inline element. Note: Browsers automatically put quotation marks around text marked up with q.
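
A sketch of both quotation elements with placeholder text. Note that the q element is written without quotation marks, because the browser adds them.

```html
<!-- Placeholder text -->
<blockquote>
  <p>A longer quotation, wrapped in a paragraph as recommended.</p>
</blockquote>

<p>She said <q>a short quotation</q> and left.</p>
```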

Figure elements

figure: A figure that illustrates some other content in the document. Both the figure and the content it illustrates should be self-contained units that can be separated from each other and that do not have to appear together in the flow of the document. The figure may be an image (img element), but can also be a text snippet of some sort.
figcaption: The caption of a figure. This element must appear inside a figure element.
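
A minimal sketch of a figure with a caption. The image file name and the texts are hypothetical.

```html
<!-- "chart.png" is a hypothetical file name -->
<figure>
  <img src="chart.png" alt="Bar chart of monthly visitors" />
  <figcaption>Figure 1: Monthly visitors</figcaption>
</figure>
```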

Page organization elements

section: Thematic group of content. A section usually contains a heading. Sections are useful to divide an entire document, or an article. TODO: Any restrictions on where a section can be used?
article: Self-contained work that could be reused in some other context.
aside: Content that is related but tangential to the surrounding content. Think of it as a "sidebar".
header: Introductory material of the entire document, or of a section or article. There are no restrictions on what this may contain. The surrounding content defines what the introductory material refers to.
footer: End material of the entire document, or of a section or article. There are no restrictions on what this may contain. The surrounding content defines what the end material refers to.
address: Address of the author of the entire document, or of a section or article. This is not used for arbitrary kinds of addresses, the semantic meaning is specifically intended for author contact information.
nav: Content that is used to provide primary navigation of a site or a lengthy article. The nav element typically contains a elements.
div: This is a generic block element.
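
One way the page organization elements might be combined is sketched below. The structure, the URLs and all texts are assumptions for illustration only. The headings are used in descending numerical order, as recommended in the section about the HTML5 outlining system.

```html
<!-- A hypothetical page layout; URLs and texts are placeholders -->
<body>
  <header>
    <h1>Site title</h1>
    <nav>
      <a href="/">Home</a>
      <a href="/archive.html">Archive</a>
    </nav>
  </header>

  <article>
    <header>
      <h2>Article title</h2>
    </header>
    <section>
      <h3>First section</h3>
      <p>Section text.</p>
    </section>
    <aside>
      <p>A tangential remark about the article.</p>
    </aside>
    <footer>
      <address>Written by <a href="mailto:author@example.com">the author</a></address>
    </footer>
  </article>

  <footer>
    <p><small>Copyright notice</small></p>
  </footer>
</body>
```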

Program code inline elements

The following elements are all inline elements.

code: A small fragment of computer code. Examples: A file name, an XML element name, a keyword from a programming language, etc.
kbd: Keyboard. Text entered by a user. This is typically used in technical documents.
samp: Sample output of a program
var: Variable name
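
A sketch that uses all four elements in one sentence. The script name, the variable and the output are invented.

```html
<!-- "hello.py", "greeting" and the output are hypothetical -->
<p>
  Type <kbd>python hello.py</kbd> to run the script. It stores the text in
  <var>greeting</var>, calls <code>print()</code> and shows
  <samp>Hello, world</samp>.
</p>
```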

Measurement / gauge elements

progress: Describes the current status of a changing process. Defining the completion state is optional. The progress element supports at least these two attributes: max, value. TODO: Are there more attributes?
meter: Describes a value in a well-defined range where the minimum and maximum values are known. Example: Disk usage. Counter examples: Age, height, weight (because these have unknown maximum values). The meter element supports six attributes: min, max, value, high, low and optimum.
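
A sketch of both elements with invented numbers. The text between the tags is fallback content for browsers that do not support the element.

```html
<!-- A hypothetical task that is 30% complete -->
<progress max="100" value="30">30%</progress>

<!-- Hypothetical disk usage: 120 GB used of 200 GB, above 160 GB counts as high -->
<meter min="0" max="200" low="40" high="160" optimum="20" value="120">120 GB of 200 GB</meter>
```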

Form elements

form: The main element that represents a form. Forms cannot be nested, but a form can contain block elements. The action attribute specifies the URL to which the form data should be sent. The method attribute specifies the HTTP request type that should be used with the action URL. Possible values are "get" (the default) and "post".
input: Generic element that represents a form control. This is an empty element. The type attribute specifies the type of the form control. The name attribute specifies the variable name. The value attribute specifies the control's default value. The placeholder attribute is used to display a hint to the user about what kind of data they are supposed to enter. The placeholder text is displayed only if the form control does not contain any data.
button: More modern variant of an input element with type "submit", "reset" or "button". The major advantage of the button element is that it can have other content besides text, for instance an image. The button element also has attributes that can be used to control form submission. The type attribute controls the behaviour of a button when it is activated. Possible values for type are "submit" (the default), "reset" and "button" and they have the same meaning as for the input element. Note: Buttons are not restricted to forms, they can be used anywhere on a page.
textarea: Form control that represents a multiline text field. As with the input element, the textarea element has the attributes name and placeholder. The default value, however, is specified by the text content of the element. Additional attributes are rows and cols that specify the dimensions of the text field.
datalist: Enumerates values for a list of pre-defined values that the user can select from a drop-down menu in a text entry field. The user is not restricted to those values, i.e. this is not a regular combobox. Values are enumerated inside the datalist element using <option value="foo" /> elements. The datalist element has an id attribute through which it can be referenced by the input element (via the list attribute). The datalist element is defined outside the input element. Note: Apparently not all browsers support this.
select: Either a combobox or a listbox. The value of the size attribute determines which one: If the attribute is missing or has value 1 it's a combobox. All other values are a listbox and define how many rows the listbox has. Values are enumerated inside the select element using <option value="the-submit-value">The display value</option> elements. The multiple attribute enables multi-selection in a listbox. The selected attribute defines which options are selected (in a combobox only one option can ever be selected, in a listbox the multiple attribute is important). The proper attribute values for multiple and selected that must be used for XHTML are "multiple" and "selected" (duh!).
output, keygen: Somewhat esoteric and apparently poorly supported controls. See the specs for more info.
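
A sketch of the datalist, select and textarea elements described above. All names and values are made up. The option elements are written with explicit closing tags here, which works in both HTML and XHTML syntax.

```html
<!-- All names and values are hypothetical -->
<input type="text" name="browser" list="browsers" />
<datalist id="browsers">
  <option value="Firefox"></option>
  <option value="Chrome"></option>
  <option value="Safari"></option>
</datalist>

<!-- Combobox: no size attribute -->
<select name="size">
  <option value="s">Small</option>
  <option value="m" selected>Medium</option>
</select>

<!-- Listbox with three visible rows and multi-selection -->
<select name="toppings" size="3" multiple>
  <option value="cheese" selected>Cheese</option>
  <option value="ham">Ham</option>
  <option value="olives" selected>Olives</option>
</select>

<!-- An empty textarea still needs an explicit closing tag -->
<textarea name="comment" rows="4" cols="40"></textarea>
```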

Types of form controls, i.e. these are input elements whose type attribute has been set to the corresponding value:

text: Single-line text field. This is the default type. The minlength and maxlength attributes are useful for this.
password, search, email, tel, url: Text controls for entering specific kinds of text.
submit, reset, button: Button controls. The value attribute can be used to provide a custom button text that overrides the browser default. The types "submit" and "reset" generate buttons that trigger form submission and reset, respectively. The type "button" generates buttons that don't do anything unless you add a JavaScript event handler to their click event.
radio: Radio button control. All radio buttons that have the same value for the name attribute are grouped together. The value for the value attribute must be unique, though, because the value for the selected radio button is sent to the server and identifies the user's choice. The checked attribute, if present, sets a radio button to the checked state - the proper value for the attribute that must be used for XHTML is "checked".
checkbox: Checkbox button control. The same rules apply as for radio buttons, except that any number of checkboxes in a group can be checked at the same time.
file: File selection control. The size attribute defines the width of the text field (if the browser displays one at all). When a file is transmitted via a form, you must use the POST method and set the form's enctype attribute to the value "multipart/form-data".
hidden: A hidden form control whose purpose is to send a pre-determined name/value pair to the server when the form is submitted.
date, time, datetime, datetime-local, month, week: Picker controls for choosing date- and time-related values. The value attribute defines the default value.
number, range: Spinner and slider controls to pick a number. Both controls have the min and max attributes to define the available number range to pick from, and the step attribute, which defines the increment between allowed values and can be set to a fractional value.
color: Color picker control. Values are in hex RGB format (#RRGGBB).
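
As a rough sketch, several of the input types listed above could appear together in a file upload form like the one below. The action URL "/upload", the control names and all values are assumptions made for this example only.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Input types sketch</title>
</head>
<body>
  <!-- A file upload requires method="post" and enctype="multipart/form-data" -->
  <form action="/upload" method="post" enctype="multipart/form-data">
    <!-- text: single-line text field with length limits -->
    <p>Nickname: <input type="text" name="nickname" minlength="2" maxlength="20"></p>

    <!-- radio: same name groups the buttons, value identifies the choice -->
    <p>
      <input type="radio" name="plan" value="free" checked> Free
      <input type="radio" name="plan" value="pro"> Pro
    </p>

    <!-- checkbox: checked state is sent as name=value -->
    <p><input type="checkbox" name="newsletter" value="yes"> Send me the newsletter</p>

    <!-- file: file selection control -->
    <p>Avatar: <input type="file" name="avatar"></p>

    <!-- hidden: pre-determined name/value pair, not visible to the user -->
    <input type="hidden" name="form-version" value="1">

    <!-- number and range: numeric input restricted by min and max -->
    <p>Quantity: <input type="number" name="quantity" min="1" max="10" value="1"></p>
    <p>Volume: <input type="range" name="volume" min="0" max="1" step="0.1" value="0.5"></p>

    <!-- color: value in #RRGGBB format -->
    <p>Accent color: <input type="color" name="accent" value="#336699"></p>

    <!-- date: value attribute sets the default date -->
    <p>Delivery date: <input type="date" name="delivery" value="2016-12-24"></p>

    <!-- submit: value overrides the browser's default button text -->
    <p><input type="submit" value="Upload"></p>
  </form>
</body>
</html>
```

Browsers that do not know one of the newer types (e.g. date or color) typically fall back to a plain text field.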

Accessibility features:

label: Associates a label with a form control. Text in the label element is displayed to label the form control. If a form control is nested within a label element, the association between the two is implicit. If the form control and the label are separately placed in the document, the id attribute value of the form control must be used as the for attribute value of the label element. In that case the association between the two is explicit.
fieldset: Used to place form controls into logical groups. Browsers typically display fieldsets as a group box.
legend: Used to give a fieldset a title. The legend element must be nested within its parent fieldset element.
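
A small sketch of the three accessibility elements above, showing both the explicit and the implicit way to associate a label with a control. The action URL "/signup" and the id and name values are invented for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Label and fieldset sketch</title>
</head>
<body>
  <form action="/signup" method="post">
    <!-- fieldset groups related controls, legend gives the group a title -->
    <fieldset>
      <legend>Account</legend>

      <!-- Explicit association: the for value matches the id of the control -->
      <p>
        <label for="email-field">Email</label>
        <input type="email" id="email-field" name="email">
      </p>

      <!-- Implicit association: the control is nested inside the label -->
      <p>
        <label>Password <input type="password" name="password"></label>
      </p>
    </fieldset>

    <p><button type="submit">Sign up</button></p>
  </form>
</body>
</html>
```

In both variants a click on the label text moves the focus to the associated control.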

Generic elements

The following generic elements have no special semantic meaning and are used when no other, more specific element can be used. Generic elements are typically used in conjunction with the class and/or id attributes to provide an anchor for CSS styling.

div: This is a generic block element. Content marked up by this is conceptually related in some way. A section is an alternative to div that is slightly less generic.
span: This is a generic inline element.
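
To illustrate how generic elements serve as anchors for CSS, here is a minimal sketch. The id "sidebar" and the class names "box" and "warning" are arbitrary names chosen for this example.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Generic elements sketch</title>
  <style>
    /* id selector: matches the single element with id="sidebar" */
    #sidebar { border: 1px solid gray; padding: 0.5em; }
    /* class selector: matches every element with class "warning" */
    .warning { color: red; }
  </style>
</head>
<body>
  <!-- div: generic block element that groups related content -->
  <div id="sidebar" class="box">
    <p>Stock is <span class="warning">running low</span>.</p>
    <!-- span: generic inline element inside running text -->
    <p>Order <span class="warning">today</span> to avoid delays.</p>
  </div>
</body>
</html>
```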

Other elements

hr: Horizontal rule. A logical divider between sections or paragraphs, used on the same level as sections and paragraphs. It indicates a thematic break of some sort. Do not use this to display a horizontal line - it's better to create a border with CSS. This is an empty element.
pre: Preformatted text. Text in this element is displayed as-is, i.e. even whitespace is preserved exactly as it appears in the source document. Preformatted text is typically displayed in a constant-width font, making it ideal for source code snippets or ASCII-style diagrams.
img: An image. This is a phrasing element. This is an empty element. The src attribute is used to specify the URL of the image, the alt attribute is used to specify a replacement text that should be displayed when images are not available, and the title attribute is used to specify a tooltip text. Note that the alt attribute must be present for the document to be valid! Images are inline elements, and they are aligned with the baseline of the surrounding text. According to [LearningWebDesign] p. 123, only the GIF, JPEG and PNG image formats are supported by browsers (TODO: is this true?). The same page also says that the file extensions .gif, .jpg and .png must be used, but this is not strictly true - something else can be used as long as the web server supplies the appropriate content type when the browser requests the image. The width and height attributes can be used to specify the image size upfront - doing so may considerably speed up the layout of the page, but at the cost of flexibility (e.g. the server may want to send differently sized images depending on whether the client is a desktop or a mobile device browser). Also, if an image does not match the predefined size, the browser will resize the image to the prescribed size, possibly causing the image to appear blurry.
picture: Apparently similar to img but lets you specify multiple image sources. Intended to help with responsive web design, e.g. define a low-res version of an image for mobile and a high-res version for desktop. TODO: Add more details. What about browser support?
details: Marks up a part of the document as hidden by default, but the user can expand the section to reveal the additional information. An optional summary element inside the details element provides a short text that the browser displays even when the details section is collapsed. The open attribute, if present, specifies that the details section should be expanded by default. Note that at the time of writing browser support for the details element is still incomplete (no support in Firefox and IE).
iframe: Embeds a separate document in the current document inside an inline frame. The frame displays scrollbars if the embedded document is too large to fit the size specified for the iframe element. Similar to an image, the src, width and height attributes are used to describe the inline frame. If a browser does not support inline frames, it will display the content of the element instead (i.e. there's no alt attribute).
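
The sketch below shows some of the elements from this list in use. The file names "photo.jpg" and "embedded.html" are placeholders that are assumed to exist next to the document, and the texts are made up.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Other elements sketch</title>
</head>
<body>
  <!-- img: alt is required, width and height announce the size upfront -->
  <p>
    <img src="photo.jpg" alt="Sunset over the harbour" title="Taken in June"
         width="320" height="240">
  </p>

  <!-- hr: thematic break between two blocks of content -->
  <hr>

  <!-- pre: whitespace and line breaks are preserved as written -->
<pre>
name      qty
apples      3
pears      12
</pre>

  <!-- details: collapsed by default, the summary is always visible -->
  <details>
    <summary>Shipping information</summary>
    <p>Orders ship within two days.</p>
  </details>

  <!-- details with the open attribute: expanded by default -->
  <details open>
    <summary>Return policy</summary>
    <p>Returns are accepted within 30 days.</p>
  </details>

  <!-- iframe: embeds another document in a frame of the given size -->
  <iframe src="embedded.html" width="400" height="200" title="Embedded document"></iframe>
</body>
</html>
```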

Global attributes

accesskey
Used to assign a keyboard shortcut to an element, typically a form control. When the user hits the keyboard shortcut the element is "activated", i.e. it gets the focus. Valid shortcuts are single characters. Example: accesskey="c".
class, id
Assigns a class or an identifier to an element. These are typically used as anchors for CSS styling. The identifier is sometimes also used to reference one element from another (e.g. labels reference form controls via their id).
contextmenu
Associates a context menu with an element. The attribute value is the identifier of the menu element. TODO: What is the menu element?
draggable, dropzone
Assign drag & drop capabilities to parts of the document. TODO: How does this work?
hidden
If present, an element and its descendants are not rendered.
lang
Two-letter language code of an element (ISO 639-1), for example "en" or "de"
style
Semicolon-separated style rules
tabindex
Location of an element in the tab order of the document. Value -1 removes the element from the tab order.
title
Title of an element, typically displayed as a tooltip
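
A short sketch of several global attributes applied to ordinary elements. All attribute values (ids, class names, texts) are invented for illustration; contextmenu and dropzone are left out because the notes above leave them as open questions.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Global attributes sketch</title>
</head>
<body>
  <!-- id, class, title and style on a paragraph -->
  <p id="intro" class="lead" title="Shown as a tooltip"
     style="color: navy; font-weight: bold">
    Welcome to the page.
  </p>

  <!-- lang: this paragraph is in German although the document is in English -->
  <p lang="de">Guten Tag</p>

  <!-- accesskey: single-character keyboard shortcut for the button -->
  <button type="button" accesskey="c">Copy</button>

  <!-- tabindex="-1": the link is removed from the tab order -->
  <a href="#intro" tabindex="-1">Back to the introduction</a>

  <!-- hidden: the element and its descendants are not rendered -->
  <p hidden>This text is not rendered.</p>

  <!-- draggable: marks the element as draggable by the user -->
  <p draggable="true">Drag me.</p>
</body>
</html>
```

The key combination that triggers an accesskey differs between browsers and operating systems.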
