LearningHTML
On this page I keep notes about HTML, the Hyper Text Markup Language - mostly about its syntax and semantics, and mostly about HTML5. I taught myself the basics sometime in the 1990s, and have been in touch with it on and off since. I'm starting this page now that I plan to get an education in web programming.
The next learning page after this one is LearningCSS.
References
- Latest HTML specification, which as of writing this is HTML 5.1
- HTML5 tutorial @ w3schools.com
- Learning Web Design - A Beginner's Guide to HTML, CSS, JavaScript, and Web Graphics. 4th edition, August 2012. Jennifer Niederst Robbins. O'Reilly. ISBN 978-1-449-31927-4. On this page I refer to this book with [LearningWebDesign].
- HTML5 & CSS3 For the Real World. 2nd edition, 2015. Alexis Goldstein, Louis Lazaris, Estelle Weyl. SitePoint. ISBN 978-0-9874674-8-5. On this page I refer to this book with [HtmlForTheRealWorld].
- Alphabetical list of all HTML elements
- Character references by the Web Standards Project
- Browser Support:
Glossary
- DOM
- Document Object Model
- DTD
- Document Type Definition. Wikipedia link. A DTD is a markup vocabulary that defines what documents of a specific markup language must look like. DTD is a concept of SGML. HTML5 has no DTD because HTML5 is not based on SGML.
- Element
- An element of the markup language that has a certain semantic meaning. Example: div. Note the absence of angle brackets - putting angle brackets around an element name creates a tag (cf. there).
- HTML
- Hyper Text Markup Language. Wikipedia link.
- HTML 4.x
- Version 4 of the HTML language. Version 4.0 was published in 1997, version 4.01 was published in 1999. All versions of HTML up until 4.01 are an application of SGML.
- HTML5
- Version 5 of the HTML language (Wikipedia link). The original version of HTML5 (5.0 so-to-speak) was published in 2014, version 5.1 was published in 2016. According to Wikipedia, HTML5 is no longer based on SGML.
- Page
- A page, or web page, is frequently used to refer to a single HTML document and all of the external resources that it references (e.g. style sheets, images), typically displayed by a web browser after fetching everything from the Internet.
- Responsive
- A web page is said to be responsive when it is designed to work with all sorts of devices, from desktop computers to tablets to smart phones, by adapting (responding) to the requirements and constraints of each device. A typical constraint is the device's screen size. If there are different versions of a web page, e.g. one for desktop computers and one for mobile devices, then the web page is not responsive.
- SGML
- Standard Generalized Markup Language. Wikipedia link.
- Tag
- A tag is a markup construct that begins with < and ends with >. An HTML element typically is represented by an opening and a corresponding closing tag, e.g. <div> and </div>. Sometimes opening and closing tags are contracted into a special form, the empty-element tag. Example: <br />.
- XML
- Extensible Markup Language. Wikipedia link.
- Web, the Web
- Short name for the World Wide Web. A lot, if not most of the content of the web is made of HTML pages.
- URL
- Uniform Resource Locator. An address to a resource that can be found on the Internet. Example:
http://www.example.com/foo/bar.html
- XHTML
- Extensible Hyper Text Markup Language. Wikipedia link. This version of HTML is based on XML and is more restrictive in its syntax than the regular HTML (which is based on SGML). XHTML documents must be well-formed and can be processed by any XML parser, while regular HTML documents usually cannot be processed by an XML parser because they are allowed to have markup that does not conform to the "well-formed" requirement of XML (e.g. some elements do not have to have closing tags).
- XHTML 1.x
- Version 1.0 was published in 2000 as an official recommendation of the W3C. It is based on HTML 4. Version 1.1 was published in 2001.
- XHTML 2.0
- Development on this version was abandoned in favour of HTML5 and XHTML5.
- XHTML5
- XHTML5 is based on HTML5. As of writing this, no version of XHTML5 has been officially published yet as a recommendation of the W3C.
Syntax
HTML vs. XML
As mentioned in the glossary, XHTML is a version of HTML that is based on XML and is more restrictive in its syntax than regular HTML. Notably, the document must be well-formed (according to the XML definition of "well-formed"), which includes
- All elements must appear in the document with a closing tag, or use a special tag type called empty-element tag which denotes both opening and closing at the same time (e.g. <br />).
- All element names must be in lowercase
- All attributes must have explicit values
In comparison to XHTML, regular HTML is more lenient in its syntax:
- Some elements do not need to have a closing tag. TODO: Which ones? At the time that I'm writing this, the W3C Validator declares even documents with missing </body> tag as valid - in fact, I can even have tags with arbitrary element names in the document and it is still accepted as valid!
- Element names can be uppercase, lowercase or mixed case - it doesn't matter
- Some boolean attributes can be specified with just their name. Examples: checked, selected, multiple. This is called "attribute minimization", which is an SGML practice. In XHTML these attributes must be written out explicitly, with the attribute value being the same as the attribute name.
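The differences listed above can be sketched with two fragments that an HTML parser treats the same way. This is only an illustration: the control name "newsletter" is made up, and li is used because its closing tag is one of those that HTML allows to be omitted.

```html
<!-- XHTML style: lowercase names, every element closed, explicit attribute values -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<input type="checkbox" name="newsletter" checked="checked" />

<!-- Regular HTML style: mixed case, omitted closing tags, minimized attribute -->
<UL>
  <LI>First item
  <LI>Second item
</UL>
<Input Type="checkbox" Name="newsletter" Checked>
```

Only the first fragment would pass an XML parser. The second one relies on the leniency of regular HTML.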
Note: This StackOverflow post shows how the validator.nu online validator can be persuaded to perform validation with stricter rules. The trick is to add a namespace to the html element (<html xmlns="http://www.w3.org/1999/xhtml">) and to select the validator preset "XHTML + SVG 1.1 + MathML 3.0".
DTD
HTML5 is not based on SGML and therefore has no DTD (Document Type Definition).
HTML5 still requires a bare-bones Document Type Declaration (DOCTYPE declaration). Every HTML5 document must begin with the following line:
<!DOCTYPE html>
Whitespace
Spaces, tabs, newlines, carriage returns - all these are summarily called "whitespace". Web browsers do not format a document according to the whitespace that they find in a document, instead they render every type of whitespace character as a regular space character. Furthermore, web browsers contract all consecutive whitespace characters and render them as a single space character.
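A small sketch of the whitespace rule described above. With default styling, a browser should render the following paragraph as the single sentence "These words are rendered with single spaces.", regardless of the line breaks and runs of spaces in the source.

```html
<p>These     words
        are      rendered
   with single      spaces.</p>
```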
Comments
Comments are not rendered by web browsers at all. You can place comments inside an HTML document using the following syntax:
<!-- This is a comment text -->
class and id attributes
There are a number of global attributes that can be set on (almost) any HTML element. Two of those are of particular interest:
- class
- Used for classifying elements. In any given document, several elements can have the same class. An element can have several classes, in that case separate the class names with a space character (e.g. <div class="foo bar">).
- id
- Used for identifying a specific instance of an element. In any given document, only one element can have a specific id.
class and id are used by CSS as anchors for applying styles.
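A minimal sketch of how the two attributes might be combined. The names "main-menu", "menu" and "highlighted" are made up for this illustration.

```html
<!-- The id is unique in the document, the classes are shared -->
<div id="main-menu" class="menu highlighted">Main menu</div>
<div class="menu">Secondary menu</div>
<div class="menu">Footer menu</div>
```

A style sheet could then address all three elements through the class menu, and the first one individually through its id.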
Character references
Certain characters have special meaning and cannot be used as-is in an HTML document. For instance, the character "<" is interpreted as the beginning of an opening or closing tag. Other special characters may be impossible to encode using a document's encoding.
Characters such as these must be "escaped" by writing them in the form of a character reference. The format of a character reference is this:
&foo;
Instead of "foo" you write the reference to the desired character. A character can be referenced in one of two ways:
- By using the character's name (e.g. the copyright sign © = &copy;)
- By using the character's numeric value (e.g. the copyright sign © = &#169;)
Predefined sets exist both for character names and for numeric values. The "Web Standards Project" (see the References section) has a list for both.
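The following fragment sketches both kinds of character references in context. The hexadecimal form in the last line is an addition not mentioned in the notes above: a numeric reference may also be written with a leading x followed by the hexadecimal value. The year and author name are placeholders.

```html
<p>Write &lt;div&gt; to show a tag literally, and &amp; to show an ampersand.</p>
<p>&copy; 2017 Example Author (named reference)</p>
<p>&#169; 2017 Example Author (decimal numeric reference)</p>
<p>&#xA9; 2017 Example Author (hexadecimal numeric reference)</p>
```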
URLs
A few special types of URLs may be used in anchor elements:
- #foo
- Links to an element in the same document that has the id attribute set to the value "foo"
- mailto:foo@bar.com
- Causes a mail client to open with a "new message" window open and the email address set as the message receiver.
- tel:123456789
- Causes a telephony client to open and to dial the specified number. Usually the user must confirm that the call should be made. Best practice is to specify the number in international format. Some browsers, especially on smart phones, attempt to auto-detect telephone numbers, but if a document contains long'ish number sequences the auto-detect routines may generate false positives. Auto-detection can be prevented by specifying this in the document header:
<meta name="format-detection" content="telephone=no" />
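The three special URL types could be used like this. The id "contact" and the telephone number are invented for this sketch, and the number is written in international format as recommended above.

```html
<h2 id="contact">Contact</h2>

<p><a href="#contact">Jump to the contact section</a></p>
<p><a href="mailto:foo@bar.com">Send an email</a></p>
<p><a href="tel:+41123456789">Call us</a></p>
```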
Semantics
Semantics vs. Presentation
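TODO: Mention the role of CSS, and that some traditional elements in HTML 4.01 and XHTML 1.0 are presentational in nature (e.g. <font>, <i> and <center>). Of these, <font> and <center> are obsolete in HTML5, while <i> is still allowed but has been given a semantic meaning (see the entry for i further down).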
Basic structure of an HTML5 document
This is what a minimal HTML5 document looks like:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Document title</title>
  </head>
  <body>
  </body>
</html>
Notes:
- The lang attribute is really optional
- The meta element that specifies the character set is recommended and should appear before any content-based elements, such as title
- The title element is the only thing that is mandatory in the document head section
External resources
TODO: Style sheets etc. and how they are referenced
The DOM
TODO
Element types
Block vs. inline elements
HTML elements are displayed either as block elements or as inline elements:
- Browsers treat block elements as though they are in little rectangular boxes, stacked up within the page. Block elements begin on a new line, and typically some space is also added above and below the element. Examples of block elements: Headings, paragraphs.
- Inline elements do not break the flow of the text. HTML5 calls inline elements "text-level semantic elements". Examples of inline elements: em.
Whether an element is displayed as a block or inline element can be controlled with the CSS property display. See the LearningCSS page for details.
Metadata content elements
Metadata content elements are not displayed to the user, instead the information that they contain tells the browser something or other about the page. Examples are meta, style, link and title.
Flow content elements
Flow content elements are almost all elements that can be used in the body of a page. The only elements excluded from this category are elements that have no effect on the document's flow. Examples are meta, script and link in the document's head. TODO: Examples for elements in the document's body.
FWIW, [LearningWebDesign] p. 73 and p. 76 mention that li and dd elements may contain any type of "flow element" or "flow content".
Content-grouping elements
[LearningWebDesign] p. 76 mentions that "content-grouping elements (like paragraphs)" are not allowed to appear in a dt element. Later on that same page it says that the HTML5 spec considers the following elements to "group content":
- p
- hr
- ul, ol, dl
- div
- blockquote
- pre
- figure
- figcaption
Sectioning content elements
- section
- article
- nav
- aside
- TODO: Are address, header and footer also sectioning elements?
General notes:
- Sectioning elements create a new item in the document outline.
- A sectioning element may have its own internal heading hierarchy, regardless of its position in the parent document.
Heading content elements
Examples: h1, h2, etc.
TODO: Give an actual definition of this category.
Sectioning roots
If a heading occurs within an element that is in the category "sectioning roots", then the heading is not included in the document outline. Elements that are sectioning roots:
- blockquote
- figure
- details
- fieldset
- td
- body (TODO: why?)
Phrasing content elements
Phrasing content elements are approximately (but not exactly) those elements that are inline elements. Some examples:
- img
- em
- strong
- cite
TODO: Example of a phrasing content element that is not an inline element.
Embedded content elements
Examples: img, video, canvas, embed, object.
TODO: Give an actual definition of this category.
Interactive content elements
Interactive content elements are those that have a representation that the user can, in some way, interact with. Examples:
- a
- form
- audio, but only when the "controls" attribute is present
- input, but only when the "type" attribute is not set to "hidden"
Forms
Notes about forms:
- The data entered by the user is encoded by the web browser before it is sent to the server. The encoding method is similar to the percent-encoding used for URLs, e.g. a slash character is encoded as %2F. Note that with the default form encoding a space character is sent as + rather than %20.
- The web browser sends the data to the server using an HTTP request of type GET or POST, depending on what the method attribute of the form element specifies
  - POST sends the data behind the scenes to the URL that was specified in the action attribute, the browser then displays whatever is sent back by the server as the response. The user does not get to see the URL used for the POST request.
  - GET tacks the data onto the URL that was specified in the action attribute in the form ?variableNameA=variableValueA&variableNameB=variableValueB&... and then sends a request to the resulting URL. The browser's address field changes as a result of this and the form data becomes visible as part of the URL. The URL can be bookmarked by the user. A typical use of this is search engine queries.
- It is best practice to wrap form controls in semantic HTML elements such as lists or div.
- A form control represents a variable. The variable name is specified by the name attribute of the form control element, the variable value is the data that the user enters.
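The notes above can be sketched as a small search form. The action URL "/search", the variable name "q" and the id "query" are assumptions made for this example. The label element is not covered in these notes and is included only to caption the text field.

```html
<form action="/search" method="get">
  <ul>
    <li>
      <label for="query">Search term</label>
      <input type="text" id="query" name="q" placeholder="e.g. html5" />
    </li>
    <li>
      <button type="submit">Search</button>
    </li>
  </ul>
</form>
```

If the user enters html5 and submits the form, the browser should request /search?q=html5 and show that URL in the address field. With method="post" the same name/value pair would travel in the request body instead.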
APIs
- Canvas API
- Adds 2D drawing
- Drag and Drop API
- Add drag and drop functionality to the browser
- Editing API
- Create text editors embedded in the browser
- Geolocation API
- For location-based stuff
- Media Player API
- Controls audio and video. Markup is done with the audio and video elements.
- Offline Web Application API
- Makes it possible for a web application to work even when there is no Internet connection.
- Session History API
- Exposes the browser history for better control over the Back button.
- Web Storage API
- Go beyond cookies for storing data in the client's browser cache
- Web Workers API
- Provides a way to run computationally complicated scripts in the background while the browser keeps a responsive UI.
- Web Sockets API
- Allows network traffic between client/server without the HTTP overhead
Web browsers
TODO: Write about important specifics of only the most important web browsers: Firefox, Internet Explorer, Edge, Safari, Chrome, Opera.
- In Firefox 53, if a textarea form control should not contain any default text, you still must add a closing tag. If you use the empty-element syntax, Firefox will gobble up the remainder of the document as the default text of the textarea control.
HTML5 support in Internet Explorer
Old versions of Internet Explorer do not have support for HTML5. This can be fixed by adding some snippets to the header of the HTML document.
Add this snippet for all Internet Explorers that do not have support for HTML5:
<style>
  section, article, nav, aside, header, footer, hgroup {
    display: block;
  }
</style>
For Internet Explorer 8 and older, the snippet above is not enough. For IE 8 and older, also add this snippet:
<!--[if lt IE 9]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5-els.js"></script>
<![endif]-->
Ask the Duck to find the article on html5doctor.com that explains the issue.
HTML5 outlining system
[LearningWebDesign] p. 81 states that
"As of this writing, no browsers support the HTML5 outlining system, so to make your documents accessible and logically structured for all users, it is safest to use headings in descending numerical order, even within sectioning elements."
Web servers
TODO: Write about a few important things (and only a few!) that pertain to web servers and HTML. Examples:
- The default file name is typically index.html, but can also be default.htm
- File extensions are typically .html, sometimes .htm
Elements
Basic document structure elements
The elements that make up the basic structure of an HTML document are:
- html
- The root element.
- head
- The first of the two elements below the root element. Contains the "header" of the document, which are various things that the browser does not display (with the exception of the document title).
- body
- The second of the two elements below the root element. Contains the "body" of the document, which is the actual web page content that the browser displays.
Header elements
These are elements that can appear in the document header <head>:
- meta
- Has meta information about the document, such as the character encoding.
- title
- The document title. The browser typically displays this in the title of a browser tab.
- style
- A CSS style sheet that is embedded in the document.
Basic block elements
- p
- Paragraph. May contain text, images and other inline elements, but no block elements such as headings, lists or sections.
Basic inline elements
- a
- Anchor, or hypertext link. The href attribute is used to specify the target URL. Whatever is inside the anchor element becomes a clickable link. Typical content inside an anchor element is text or an image. In HTML5, anything can be placed within an anchor element - even block elements. In HTML 4.x, an anchor element could have only inline content. The special URL syntax "#foo" refers to an element in the same document that has the id attribute set to the value "foo". The download attribute indicates that the targeted resource should be downloaded rather than navigated to. The target attribute can be set to the value "_blank" to tell the browser to open the referenced document in a new tab or window. Any other value than "_blank" will attempt to find a window that already exists with that name and open the referenced document in that window. Opening in a new window may fail if the browser is set to block popup windows.
- abbr
- Abbreviation or acronym. The title attribute is used to provide the long version of the abbreviation or acronym.
- br
- Line break. This is an empty element.
- cite
- Citation. A reference to the title of a work, such as a book title.
- dfn
- The first and defining instance of a word or phrase in a document.
- em
- Emphasized text. Think of a sentence where the emphasis on a given word can change the sentence's meaning.
- mark
- Highlighted text of some sort. Example: Search terms.
- strong
- Important text. Semantically, "importance" is not the same as "emphasis". Note that strong elements can be nested to provide even more importance to something that is already important.
- i
- Text that is in alternate voice or mood, or otherwise offset from the surrounding text in a manner that indicates a different quality of text. Useful examples for this vague definition: The scientific name of a species, or a phrase from another language (reductio ad absurdum), or a voice sounding over the telephone in a piece of fiction. Note that before HTML5 the i element was used for giving typesetting instructions ("italics") - in HTML5 a text marked up with i may still be displayed in italics, but that is pure coincidence :-).
- b
- Text to which "attention is being drawn for utilitarian purposes without conveying extra importance, and with no implication of an alternate voice or mood". Somewhat useful examples for this vague definition: Keywords in a document abstract, product names in a review, article lead paragraph (aka "lede"). Note that before HTML5 the b element was used for giving typesetting instructions ("bold") - in HTML5 a text marked up with b may still be displayed in bold, but that is pure coincidence :-).
- sub
- Subscript.
- sup
- Superscript.
- small
- Side comments such as a copyright notice at the bottom of a page.
- time
- Date and/or time information for which a machine-readable form can be established. Typically, the element content is the date/time information in human readable format and the element has a datetime attribute that specifies the date/time in machine-readable form. If the datetime attribute is missing, then the element content itself must be machine-readable. A large number of machine-readable formats are allowed, too many to list here so if in doubt consult the HTML5 specs. Note: The date must be a date on the Gregorian calendar, which means that BC dates are not possible. TODO: The HTML5 specs state that "For times without dates (or times referring to events that recur on multiple dates), specifying the geographic location that controls the time is usually more useful than specifying a time zone offset, because geographic locations change time zone offsets with daylight savings time." How can I specify the geographic location?
- wbr
- Indicates a word/line break opportunity. This is an empty element.
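As a sketch of the two ways to use the time element described above (the dates are made up): the first paragraph keeps the human-readable text in the element content and the machine-readable form in the datetime attribute, the second has no datetime attribute and therefore uses machine-readable content.

```html
<p>The meeting starts on <time datetime="2017-05-20T14:30">20 May 2017 at 2:30 pm</time>.</p>
<p>Published on <time>2017-05-20</time>.</p>
```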
Heading elements
- h1 - h6
- Level 1-6 headings
- hgroup
- Used to group heading elements together. This is used to suppress subheadings from the document outline (e.g. a table of contents) that a browser may generate for a document. The purpose is simple: Subheadings or taglines that merely have a clarifying function in reference to their parent heading should not introduce a new level to the document outline. Adding any number of heading elements to an hgroup element causes the browser to only include the highest-ranked heading within the group in the document outline. Note: The future of the hgroup element is uncertain. The W3C specs have dropped the element entirely, the WHATWG specs still mention it, but according to [HtmlForTheRealWorld] p. 42 browsers do not really support the element.
List and list item elements
- ul
- Unordered list. Despite the name, the browser displays list items exactly in the order in which they appear in the document - it's only the elements' semantics that say that the reader is not supposed to pay any attention to the list item order. Unordered list items are typically displayed with bullet points to prevent the reader from adding any significance to the order in which the items appear.
- ol
- Ordered list. List items are typically displayed with increasing numbers to add significance to the order in which the items appear. By default the first list item gets number 1, but this can be changed by setting the list element's start attribute to any number. The reversed attribute specifies that the list items should be numbered in descending order; the items themselves still appear in document order.
- li
- List item in an unordered or ordered list.
- dl
- Description list. List items consist of two things: A name and a value.
- dt, dd
- Appear together, usually first dt then dd, to make up a list item in a description list. dt defines the name of the item, dd the value of the item. Because description lists are extremely well suited for creating glossaries, dt can be remembered to define the "term" and dd the "definition" of a glossary item. It is possible to list multiple terms (dt) and then have a single common definition (dd) for all of the terms, or to have a single term with multiple definitions, or even both.
General notes:
- Both list elements and list item elements are block elements
- A list element may contain only list item elements
- A list item element may contain any type of flow element, including another nested list. The exception is the dt element which may not contain "headings or content-grouping elements (like paragraphs)" (cf. the "Learning Web Design" book).
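A short sketch of the list elements described in this section, using glossary entries from this page as sample content. The description list shows two terms sharing one definition. The ordered list uses start and reversed, so its items should be numbered 3, 2, 1.

```html
<dl>
  <dt>HTML</dt>
  <dd>Hyper Text Markup Language.</dd>

  <dt>Web</dt>
  <dt>the Web</dt>
  <dd>Short name for the World Wide Web.</dd>
</dl>

<ol start="3" reversed>
  <li>Third place</li>
  <li>Second place</li>
  <li>First place
    <ul>
      <li>A nested unordered list inside a list item</li>
    </ul>
  </li>
</ol>
```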
Table elements
- table
- The table itself. Tables are made up of rows.
- tr
- A table row. Rows are made up of table header cells and/or table data cells.
- th
- A table header cell. The rowspan and colspan attributes define if a cell spans more than 1 row or column. The scope attribute can be used to explicitly state what the header cell is associated with. Possible values are "row", "col", "rowgroup" or "colgroup", meaning that the header cell is either a header for a single row, single column, a group of rows (thead, tfoot, tbody - see below) or a group of columns (colgroup - see below). If scope is not sufficient (e.g. in a table where a lot of column or row spanning is going on), the table header cell can be given an id attribute and all table data cells must then reference back to the header cell using the headers attribute, specifying the header cell's id as the attribute value.
- td
- A table data cell. Row/column spanning is done exactly the same as with table header cells.
- caption
- The table caption.
- thead, tfoot, tbody
- These elements group one or more table rows (tr elements) together. Semantically, the rows are thus placed in the table header, the table footer or the table body.
- col
- A table column. Table column elements can only appear within a column group element. A table column element may represent several columns, the number is specified with the span attribute. Table columns are enumerated in the order which corresponds to their placement in the table, i.e. the first col element describes the first column, the second col element describes the second column, etc. The class attribute can be used to provide a CSS styling anchor.
- colgroup
- This element groups one or more columns together. It is possible to have any number of column groups, because column groups do not have any special semantic meaning (unlike row groups, which semantically denote table header, footer and body). A table column group element either contains col elements, or specifies the number of columns it represents using the span attribute. As with columns, the id or class attributes are used to provide styling anchors.
Note: The order in which elements appear in a table is important:
- 0-1 caption
- 0-n colgroup
- 0-1 thead
- 0-1 tbody or 1-n tr
- 0-1 tfoot
Instead of at the end, the tfoot element may also appear right after the thead element.
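The element order listed above can be sketched with a small table. All server names, numbers and class names are invented for this illustration.

```html
<table>
  <caption>Disk usage per server</caption>
  <colgroup>
    <col class="name" />
    <col class="figures" span="2" />
  </colgroup>
  <thead>
    <tr>
      <th scope="col">Server</th>
      <th scope="col">Used (GB)</th>
      <th scope="col">Free (GB)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">alpha</th>
      <td>120</td>
      <td>80</td>
    </tr>
    <tr>
      <th scope="row">beta</th>
      <td>45</td>
      <td>155</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <th scope="row">Total</th>
      <td>165</td>
      <td>235</td>
    </tr>
  </tfoot>
</table>
```

The second col element stands for two columns through its span attribute, so the three columns of the table are all covered by the single column group.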
Quotation elements
- blockquote
- Long quotation. This is a content-grouping element. It is recommended (but not required) that content within blockquote be contained in other elements such as paragraphs, headings or lists.
- q
- Short quotation. This is an inline element. Note: Browsers automatically put quotation marks around text marked up with q.
Figure elements
- figure
- A figure that illustrates some other content in the document. Both the figure and the content it illustrates should be self-contained units that can be separated from each other and that do not have to appear together in the flow of the document. The figure may be an image (img element), but can also be a text snippet of some sort.
- figcaption
- The caption of a figure. This element must appear inside a figure element.
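A minimal sketch of a figure with a caption. The image file name and the texts are placeholders.

```html
<figure>
  <img src="outline.png" alt="Diagram of a document outline" />
  <figcaption>Figure 1: The outline of a simple document.</figcaption>
</figure>
```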
Page organization elements
- section
- Thematic group of content. A section usually contains a heading. Sections are useful to divide an entire document, or an article. TODO: Any restrictions on where a section can be used?
- article
- Self-contained work that could be reused in some other context.
- aside
- Content that is related but tangential to the surrounding content. Think of it as a "sidebar".
- header
- Introductory material of the entire document, or of a section or article. There are no restrictions on what this may contain. The surrounding content defines what the introductory material refers to.
- footer
- End material of the entire document, or of a section or article. There are no restrictions on what this may contain. The surrounding content defines what the end material refers to.
- address
- Address of the author of the entire document, or of a section or article. This is not used for arbitrary kinds of addresses, the semantic meaning is specifically intended for author contact information.
- nav
- Content that is used to provide primary navigation of a site or a lengthy article. The nav element typically contains a elements.
- div
- This is a generic block element.
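One way the page organization elements might be combined is sketched here. The URLs and texts are made up, and the headings are kept in descending numerical order. This is just one possible arrangement, not a prescribed layout.

```html
<body>
  <header>
    <h1>Site title</h1>
    <nav>
      <a href="/">Home</a>
      <a href="/articles">Articles</a>
    </nav>
  </header>

  <article>
    <header>
      <h2>Article title</h2>
    </header>
    <section>
      <h3>First part</h3>
      <p>Main text of the article.</p>
    </section>
    <aside>
      <p>A tangential remark about the article.</p>
    </aside>
    <footer>
      <address>Written by <a href="mailto:foo@bar.com">the author</a>.</address>
    </footer>
  </article>

  <footer>
    <small>&copy; Example Author</small>
  </footer>
</body>
```

Note how header and footer appear twice: once for the whole document and once inside the article, where they refer to the article only.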
Program code inline elements
The following elements are all inline elements.
- code
- A small fragment of computer code. Examples: A file name, an XML element name, a keyword from a programming language, etc.
- kbd
- Keyboard. Text entered by a user. This is typically used in technical documents.
- samp
- Sample output of a program
- var
- Variable name
Measurement / gauge elements
- progress
- Describes the current status of a changing process. Defining the completion state is optional. The progress element supports at least these two attributes: max, value. TODO: Are there more attributes?
- meter
- Describes a value in a well-defined range where the minimum and maximum values are known. Example: Disk usage. Counter examples: Age, height, weight (because these have unknown maximum values). The meter element supports six attributes: min, max, value, high, low and optimum.
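Both gauge elements in a short sketch with invented numbers. The text inside each element is fallback content for browsers that do not support the element.

```html
<!-- A task that is 30% complete -->
<progress value="30" max="100">30%</progress>

<!-- Disk usage: 420 GB used out of 500 GB, anything above 400 GB counts as high -->
<meter min="0" max="500" low="100" high="400" optimum="50" value="420">420 of 500 GB</meter>
```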
Form elements
- form
- The main element that represents a form. Forms cannot be nested, but a form can contain block elements. The action attribute specifies the URL to which the form data should be sent. The method attribute specifies the HTTP request type that should be used with the action URL. Possible values are "get" (the default) and "post".
- input
- Generic element that represents a form control. This is an empty element. The type attribute specifies the type of the form control. The name attribute specifies the variable name. The value attribute specifies the control's default value. The placeholder attribute is used to display a hint to the user what kind of data he is supposed to enter. The placeholder text is displayed only if the form control does not contain any data.
- button
- More modern variant of an input element with type "submit", "reset" or "button". The major advantage of the button element is that it can have other content besides text, for instance an image. The button element also has attributes that can be used to control form submission. The type attribute controls the behaviour of a button when it is activated. Possible values for type are "submit" (the default), "reset" and "button" and they have the same meaning as for the input element. Note: Buttons are not restricted to forms, they can be used anywhere on a page.
- textarea
- Form control that represents a multiline text field. As with the input element, the textarea element has the attributes name and placeholder. The default value, however, is specified by the text content of the element. Additional attributes are rows and cols that specify the dimensions of the text field.
- datalist
- Enumerates values for a list of pre-defined values that the user can select from a drop-down menu in a text entry field. The user is not restricted to those values, i.e. this is not a regular combobox. Values are enumerated inside the datalist element using <option value="foo" /> elements. The datalist element has an id attribute through which it can be referenced by the input element (via the list attribute). The datalist element is defined outside the input element. Note: Apparently not all browsers support this.
- select
- Either a combobox or a listbox. The value of the size attribute determines which one: If the attribute is missing or has value 1 it's a combobox. All other values are a listbox and define how many rows the listbox has. Values are enumerated inside the select element using <option value="the-submit-value">The display value</option> elements. The multiple attribute enables multi-selection in a listbox. The selected attribute defines which options are selected (in a combobox only one option can ever be selected, in a listbox the multiple attribute is important). The proper attribute values for multiple and selected that must be used for XHTML are "multiple" and "selected" (duh!).
- output, keygen
- Somewhat esoteric and apparently poorly supported controls. See the specs for more info.
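As a rough illustration of the form elements above, here is a minimal sketch. The action URL /feedback and all name, id and option values are made up for the example and are not taken from any of the references.

```html
<form action="/feedback" method="post">
  <!-- textarea: the default value is the text content of the element -->
  <textarea name="comment" rows="4" cols="40">Default text</textarea>

  <!-- datalist: defined outside the input, referenced via the list attribute -->
  <input type="text" name="browser" list="browsers">
  <datalist id="browsers">
    <option value="Firefox">
    <option value="Chrome">
  </datalist>

  <!-- select without a size attribute: a combobox -->
  <select name="colour">
    <option value="r">Red</option>
    <option value="g" selected>Green</option>
  </select>

  <!-- select with size and multiple: a listbox with multi-selection -->
  <select name="toppings" size="3" multiple>
    <option value="cheese" selected>Cheese</option>
    <option value="ham">Ham</option>
    <option value="olives" selected>Olives</option>
  </select>

  <!-- button: can contain other content besides plain text -->
  <button type="submit"><strong>Send</strong> feedback</button>
  <button type="reset">Start over</button>
</form>
```

The user can still type a browser name that is not in the datalist, which is the difference to the select element.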
Types of form controls, i.e. input elements whose type attribute has been set to the corresponding value:
- text
- Single-line text field. This is the default type. The minlength and maxlength attributes are useful for this.
- password, search, email, tel, url
- Text controls that allow the user to enter text of various kinds.
- submit, reset, button
- Button controls. The value attribute can be used to provide a custom button text that overrides the browser default. The types "submit" and "reset" generate buttons that trigger form submission and reset, respectively. The type "button" generates buttons that don't do anything unless you add a JavaScript event handler to their click event.
- radio
- Radio button control. All radio buttons that have the same value for the name attribute are grouped together. The value of the value attribute must be unique within the group, though, because the value of the selected radio button is sent to the server and identifies the user's choice. The checked attribute, if present, sets a radio button to the checked state - the proper value for the attribute that must be used for XHTML is "checked".
- checkbox
- Checkbox button control. The same rules apply as for radio buttons.
- file
- File selection control. The size attribute defines the width of the text field (if the browser displays one at all). When a file is transmitted via a form, you must use the POST method and set the form's enctype attribute to the value "multipart/form-data".
- hidden
- A hidden form control whose purpose is to send a pre-determined name/value pair to the server when the form is submitted.
- date, time, datetime, datetime-local, month, week
- Picker control for choosing from a variety of date-related values. The value attribute defines the default value.
- number, range
- Spinner and slider controls to pick a number. Both controls have the min and max attributes to define the available number range to pick from. Both also have the step attribute, which sets the increment and can be set to a fractional value.
- color
- Color picker control. Values are in hex RGB format (#RRGGBB).
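To see several of these input types side by side, here is a small sketch of a sign-up form. The action URL /signup and all name and value attributes are invented for illustration.

```html
<form action="/signup" method="post" enctype="multipart/form-data">
  <input type="text" name="username" minlength="3" maxlength="20">
  <input type="password" name="password">
  <input type="email" name="email">

  <!-- radio buttons: the same name groups them, the value identifies the choice -->
  <input type="radio" name="plan" value="free" checked> Free
  <input type="radio" name="plan" value="paid"> Paid

  <input type="checkbox" name="newsletter" value="yes"> Newsletter

  <!-- file upload: needs method="post" and enctype="multipart/form-data" on the form -->
  <input type="file" name="avatar">

  <!-- hidden: sends a pre-determined name/value pair -->
  <input type="hidden" name="source" value="signup-page">

  <input type="date" name="birthday" value="2000-01-31">
  <input type="number" name="quantity" min="1" max="10">
  <input type="range" name="volume" min="0" max="1" step="0.1">
  <input type="color" name="favourite" value="#ff8800">

  <input type="submit" value="Sign up">
  <input type="reset" value="Clear">
</form>
```

Browsers that do not know one of the newer types (e.g. date or color) are expected to fall back to a plain text field, so the form should remain usable.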
Accessibility features:
- label
- Associates a label with a form control. Text in the label element is displayed to label the form control. If a form control is nested within a label element, the association between the two is implicit. If the form control and the label are placed separately in the document, the id attribute value of the form control must be used as the for attribute value of the label element. In that case the association between the two is explicit.
- fieldset
- Used to place form controls into logical groups. Browsers typically display fieldsets as a group box.
- legend
- Used to give a fieldset a title. The legend element must be nested within its parent fieldset element.
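The two ways of associating a label, together with a fieldset and its legend, might look like the following sketch. The id email-field, the action URL and the other names are hypothetical.

```html
<form action="/contact" method="post">
  <fieldset>
    <legend>Contact details</legend>

    <!-- implicit association: the control is nested within the label -->
    <label>Name <input type="text" name="name"></label>

    <!-- explicit association: the for attribute matches the control's id -->
    <label for="email-field">Email</label>
    <input type="email" id="email-field" name="email">
  </fieldset>
</form>
```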
Generic elements
The following generic elements have no special semantic meaning and are used when no other, more specific element can be used. Generic elements are typically used in conjunction with the class and/or id attributes to provide an anchor for CSS styling.
- div
- This is a generic block element. Content marked up by this is conceptually related in some way. A section is an alternative to div that is slightly less generic.
- span
- This is a generic inline element.
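A tiny sketch of both generic elements used as anchors for CSS styling. The class and id values are made up.

```html
<!-- div: generic block element grouping related content -->
<div class="infobox" id="shipping-note">
  <!-- span: generic inline element marking a run of text -->
  <p>Orders ship within <span class="highlight">two days</span>.</p>
</div>
```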
Other elements
- hr
- Horizontal rule. A logical divider between sections or paragraphs, used on the same level as sections and paragraphs. It indicates a thematic break of some sort. Do not use this to display a horizontal line - it's better to create a border with CSS. This is an empty element.
- pre
- Preformatted text. Text in this element is displayed as-is, i.e. even whitespace is preserved exactly as it appears in the source document. Preformatted text is typically displayed in a constant-width font, making it ideal for source code snippets or ASCII-style diagrams.
- img
- An image. This is an empty phrasing element. The src attribute is used to specify the URL of the image, the alt attribute is used to specify a replacement text that should be displayed when images are not available, and the title attribute is used to specify a tooltip text. Note that the alt attribute must be present for the document to be valid! Images are inline elements, and they are aligned with the baseline of the surrounding text. According to [LearningWebDesign] p. 123, only the GIF, JPEG and PNG image formats are supported by browsers (TODO: is this true?). The same page also says that the file extensions .gif, .jpg and .png must be used, but this is not strictly true - something else can be used as long as the web server supplies the appropriate content type when the browser requests the image. The width and height attributes can be used to specify the image size upfront - doing so may considerably speed up the layout of the page, but at the cost of flexibility (e.g. the server may want to send differently sized images depending on whether the client is a desktop or a mobile device browser). Also, if an image does not match the predefined size, the browser will resize the image to the prescribed size, possibly causing the image to appear blurry.
- picture
- Apparently similar to img but lets you specify multiple image sources. Intended to help with responsive web design, e.g. define a low-res version of an image for mobile and a high-res version for desktop. TODO: Add more details. What about browser support?
- details
- Marks up a part of the document that is hidden by default; the user can expand the section to reveal the additional information. An optional summary element inside the details element provides a short text that the browser displays even when the details section is collapsed. The open attribute, if present, specifies that the details section should be expanded by default. Note that at the time of writing browser support for the details element is still incomplete (no support in Firefox and IE).
- iframe
- Embeds a separate document in the current document inside an inline frame. The frame displays scrollbars if the embedded document is too large to fit the size specified for the iframe element. Similar to an image, the src, width and height attributes are used to describe the inline frame. If a browser does not support inline frames, it will display the content of the element instead (i.e. there's no alt attribute).
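The elements in this list could be used roughly as in the sketch below. All file names and texts are placeholders, and the picture part reflects my current understanding of the element rather than anything verified against the references.

```html
<!-- img: alt is required for validity, width/height announce the size upfront -->
<img src="photo.jpg" alt="A lighthouse at dusk" title="Tooltip text" width="400" height="300">

<!-- picture: the browser picks the first source whose media query matches,
     otherwise it falls back to the nested img -->
<picture>
  <source media="(min-width: 800px)" srcset="photo-large.jpg">
  <img src="photo-small.jpg" alt="A lighthouse at dusk">
</picture>

<!-- details: remove the open attribute to start collapsed -->
<details open>
  <summary>Opening hours</summary>
  <p>Monday to Friday, 9 to 5.</p>
</details>

<hr>

<!-- iframe: the element content is the fallback for browsers without iframe support -->
<iframe src="embedded.html" width="400" height="300">
  Your browser does not support inline frames.
</iframe>

<pre>
  Whitespace   is kept
      exactly as written.
</pre>
```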
Global attributes
- accesskey
- Used to assign a keyboard shortcut to an element, typically a form control. When the user hits the keyboard shortcut the element is "activated", i.e. it gets the focus. Valid shortcuts are single characters. Example: accesskey="c".
- class, id
- Assigns a class or an identifier to an element. These are typically used as anchors for CSS styling. The identifier is sometimes also used to reference one element from another (e.g. labels reference form controls via their id).
- contextmenu
- Associates a context menu with an element. The attribute value is the identifier of the menu element. TODO: What is the menu element?
- draggable, dropzone
- Assign drag & drop capabilities to parts of the document. TODO: How does this work?
- hidden
- If present, an element and its descendants are not rendered.
- lang
- Language of an element's content, typically given as a two-letter language code (ISO 639-1), e.g. "en"
- style
- Semicolon-separated style rules
- tabindex
- Location of an element in the tab order of the document. Value -1 removes the element from the tab order.
- title
- Title of an element, typically displayed as a tooltip
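A short sketch showing several of these global attributes in use. All attribute values are invented for the example.

```html
<!-- id, class, lang, title and style on a single element -->
<p id="intro" class="lead" lang="en" title="Shown as a tooltip"
   style="color: navy; font-weight: bold">
  Introductory paragraph.
</p>

<!-- hidden: the element and its descendants are not rendered -->
<p hidden>This paragraph is not rendered.</p>

<!-- accesskey: single-character keyboard shortcut -->
<button type="button" accesskey="c">Compose</button>

<!-- tabindex="-1": removed from the tab order -->
<input type="text" name="skipped" tabindex="-1">
```

Which modifier key has to be pressed together with the access key depends on the browser and the operating system.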