A dive into the Link header
This note describes my current understanding of the Link header as defined by IETF RFC8288.
The link model
In the words of RFC8288, “[...] a link is a typed connection between two resources [...]”.
The RFC defines the following concepts:
- Link context: The subject of the connection.
- Link target: The object of the connection.
- Link relation type: The type of connection.
- Target attributes: A set of optional target descriptors.
- Target attribute extension: A set of attributes not defined by the RFC.
For example, a resource A (the context) linked to a resource B (the target) with the relation type alternate can be depicted as:
A --alternate--> B
Building on that, let's say that B, the target, has the content type [IETF RFC2046] text/csv:
A --alternate--> B (type: text/csv)
Given that we are in HTTP territory, resources are identified by URLs. The previous example, using URLs, could be something like:
This model can be expressed in many ways: HTML, Atom, Turtle, etc. RFC8288 defines how to express links using the HTTP Link header.
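To make the model concrete, here is a minimal Python sketch of it. The Link class and its field names are my own hypothetical choices. RFC8288 describes the concepts but does not define a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A typed connection between two resources (hypothetical model)."""

    context: str  # the subject of the connection
    target: str   # the object of the connection
    rel: str      # the relation type
    # Target attributes, including extension attributes. A name may carry
    # several values (e.g. hreflang), so values are kept in a list.
    attributes: dict[str, list[str]] = field(default_factory=dict)


# The "A --alternate--> B" example, where B is a CSV document.
link = Link(context="A", target="B", rel="alternate", attributes={"type": ["text/csv"]})
```

Keeping attribute values in lists leaves room for attributes such as hreflang, which may appear more than once on the same link.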
The header syntax
At first, the syntax seems straightforward, but bit by bit we'll see how it gets more intricate. Let's start with a basic example:
<https://example.org/foo.csv>; rel="alternate"; type="text/csv"
The syntax can be expressed in ABNF as:
header = link *(OWS "," OWS link)
link = "<" target ">" *(OWS ";" OWS param)
param = name ["=" value]
target = uriref
name = token
value = token / quoted-string
The above, although not complete, already expresses a few rules that are not obvious from the initial example:
- Whitespace (OWS) is optional in many cases.
- A param does not need to have a value.
- A value may or may not be quoted.
For example, this is equivalent to the first example:
<https://example.org/foo.csv>;rel=alternate;type="text/csv"
Another example with multiple links:
<https://example.org/foo.csv>; rel="alternate"; type="text/csv",
<https://example.org/>; rel=canonical,
<http://other.net>; private; anchor="#foo"
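A parser for the simplified grammar above can be sketched in Python as follows. It is a minimal hand-written scanner, and the names parse_link_header, parse_quoted_string, TOKEN and OWS are hypothetical. It is not a complete RFC8288 parser. Like the ABNF above, for instance, it does not accept whitespace around "=". The scanner walks the header character by character instead of splitting on commas, because commas may legitimately appear inside quoted strings and URIs.

```python
import re

# Character classes from the simplified ABNF above.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
OWS = re.compile(r"[ \t]*")


def parse_quoted_string(s, i):
    """Parse a quoted-string starting at s[i] == '"'; return (value, next index)."""
    out = []
    i += 1  # skip the opening quote
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s):  # quoted-pair: keep the next char as-is
            out.append(s[i + 1])
            i += 2
        elif c == '"':
            return "".join(out), i + 1
        else:
            out.append(c)
            i += 1
    raise ValueError("unterminated quoted-string")


def parse_link_header(header):
    """Return a list of (target, [(name, value or None), ...]) tuples."""
    links, i, n = [], 0, len(header)
    while True:
        i = OWS.match(header, i).end()
        if i >= n or header[i] != "<":
            raise ValueError(f"expected '<' at position {i}")
        end = header.index(">", i)  # raises ValueError if '>' is missing
        target, i = header[i + 1 : end], end + 1
        params = []
        while True:
            i = OWS.match(header, i).end()
            if i >= n or header[i] != ";":
                break
            i = OWS.match(header, i + 1).end()
            m = TOKEN.match(header, i)
            if not m:
                raise ValueError(f"expected a param name at position {i}")
            name, i = m.group().lower(), m.end()  # normalise names to lower case
            value = None  # a param without "=" has no value
            if i < n and header[i] == "=":
                i += 1
                if i < n and header[i] == '"':
                    value, i = parse_quoted_string(header, i)
                else:
                    m = TOKEN.match(header, i)
                    if not m:
                        raise ValueError(f"expected a param value at position {i}")
                    value, i = m.group(), m.end()
            params.append((name, value))
        links.append((target, params))
        if i >= n:
            return links
        if header[i] != ",":
            raise ValueError(f"expected ',' at position {i}")
        i += 1  # skip the comma and parse the next link
```

Parsing <https://example.org/foo.csv>;rel=alternate;type="text/csv" with this sketch gives the same result as parsing the first link of the multi-link example. This matches the equivalence noted earlier. The param without a value, private, comes back with None as its value.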
Let's twist it a bit: when a param name ends in *, its value should follow IETF RFC8187, for example UTF-8'en'An%20example. With that in mind, we can extend the original ABNF as:
header = link *(OWS "," OWS link)
link = "<" target ">" *(OWS ";" OWS (ext-param / param))
param = name ["=" value]
target = uriref
name = token
value = token / quoted-string
ext-param = name "*" "=" ext-value
ext-value = encoding "'" [language] "'" pct-value
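Decoding an ext-value can be sketched with Python's standard library. decode_ext_value is a hypothetical helper. It only accepts UTF-8, the encoding used in the examples of this note, and rejects any other encoding.

```python
from urllib.parse import unquote_to_bytes


def decode_ext_value(raw):
    """Decode an RFC8187 ext-value into (language or None, text).

    Raises ValueError for a malformed value, a non-UTF-8 encoding,
    or percent-encoded bytes that are not valid UTF-8.
    """
    # encoding "'" [language] "'" pct-value; pct-value cannot contain "'".
    encoding, language, pct_value = raw.split("'", 2)
    if encoding.lower() != "utf-8":
        raise ValueError(f"unsupported encoding: {encoding}")
    # Percent-decode to raw bytes first, then decode them as strict UTF-8.
    text = unquote_to_bytes(pct_value).decode("utf-8")
    return (language or None), text
```

For example, decode_ext_value("UTF-8'en'An%20example") returns ("en", "An example").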
The semantics
Once we have extracted the primitive values from the Link header, we need to process them to obtain the list of actual links.
The context
The context is the resource the header comes from. Its identifier is the request URL. For example, the following exchange makes https://example.net/things the context for all links found in the response header.
GET /things HTTP/1.1
Host: example.net
Accept: application/json
HTTP/1.1 200 OK
Content-Type: application/json
Link: </things?p=2>; rel="next"
Of course, things can't be that straightforward: if a link has an anchor param, the context changes. The anchor value must be a valid URI reference, either absolute or relative.
If it's absolute, it replaces the context entirely. For example:
GET /things HTTP/1.1
Host: example.net
Accept: application/json
HTTP/1.1 200 OK
Content-Type: application/json
Link: </>; rel="canonical"; anchor="https://other.org"
This makes https://other.org the context for that link.
If it is relative, it is resolved against the request URL following the reference resolution rules of IETF RFC3986. For example:
GET /things HTTP/1.1
Host: example.net
Accept: application/json
HTTP/1.1 200 OK
Content-Type: application/json
Link: </copyright>; rel="copyright"; anchor="#section_3"
This makes https://example.net/things#section_3 the context for that link.
Any anchor param other than the first one must be ignored.
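The context rules above can be sketched in Python with urllib.parse.urljoin. That function resolves relative references in a way close to IETF RFC3986. I assume this holds for simple cases like these, but it is not guaranteed for every edge case. link_context is a hypothetical helper. It takes the request URL and the link's (name, value) params, for instance as produced by a parser of the ABNF above.

```python
from urllib.parse import urljoin


def link_context(request_url, params):
    """Return the context URI of a link, given its (name, value) params.

    Only the first anchor param counts. A relative anchor is resolved
    against the request URL, and an absolute one replaces it.
    """
    anchors = [value for name, value in params if name == "anchor"]
    if not anchors or anchors[0] is None:
        return request_url  # no usable anchor: default to the request URL
    return urljoin(request_url, anchors[0])
```

Given the request URL https://example.net/things, the anchor "#section_3" yields https://example.net/things#section_3 and the anchor "https://other.org" yields https://other.org. This matches the two exchanges above.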
Finally, the RFC says:
Note that depending on HTTP status code and response headers, the link context might be "anonymous" (i.e., no link context is available). For example, this is the case on a 404 response to a GET request.
The rules and reasoning for that escape my understanding.
The target
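The target is identified by the URI that results from resolving the URI reference found in target (see the ABNF rules) against the request URL, following IETF RFC3986. As I read RFC8288, the anchor param does not change this base; it only changes the context.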
For example, the following exchange makes https://example.net/things?p=2 the link target.
GET /things HTTP/1.1
Host: example.net
Accept: application/json
HTTP/1.1 200 OK
Content-Type: application/json
Link: </things?p=2>; rel="next"
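Resolving the target can be sketched the same way. link_target is a hypothetical helper. As I read RFC8288, the base for a relative target is the request URL, and an anchor param does not change it, so the anchor is not an input here.

```python
from urllib.parse import urljoin


def link_target(request_url, target_ref):
    """Resolve the URI reference found between < and > into the link target.

    The base is the request URL; any anchor param is deliberately ignored.
    """
    return urljoin(request_url, target_ref)
```

With the exchange above, link_target("https://example.net/things", "/things?p=2") gives https://example.net/things?p=2. In the earlier copyright example, the target /copyright resolves to https://example.net/copyright even though that link carries an anchor.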
The relation type
The rel param defines the relation type of a link. Any rel param other than the first one must be ignored.
The RFC requires a link to have a relation type, but the syntax is lax enough to allow a link without one. That makes it somewhat compatible with HTML, which does not require links to have any relation type.
For example, the following link defines https://example.net/ as the canonical resource for the context.
<https://example.net/>; rel="canonical"
You can express multiple relation types at once by separating the types with spaces. For example, the following are equivalent:
<https://example.org/things>; rel="first previous"
<https://example.org/things>; rel="first", <https://example.org/things>; rel="previous"
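Expanding a multi-valued rel into separate links can be sketched like this. split_rel is a hypothetical helper. It keeps only the first rel param and splits its value on whitespace. As a choice of this sketch, it returns no links at all when there is no relation type.

```python
def split_rel(target, params):
    """Expand one parsed link into one (target, relation type) pair per type.

    Only the first rel param counts; later ones are ignored.
    """
    rels = [value for name, value in params if name == "rel"]
    if not rels or not rels[0]:
        return []  # no relation type: this sketch simply drops the link
    return [(target, rel) for rel in rels[0].split()]
```

For example, split_rel("https://example.org/things", [("rel", "first previous")]) produces the same two pairs as the two separate links above.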
Previous versions of this RFC defined a rev param to express a reverse relation type. RFC8288 deprecates it.
The target title
The title param defines the target title attribute. Any title param other than the first one must be ignored. For example:
</spoons/>; rel="chapter"; title="Spoons, spoons and some more spoons"
Now, HTTP headers are not supposed to contain characters outside a subset of US-ASCII, so title values are quite limited. The way around this limitation is the title* param. It expects its value to follow IETF RFC8187. In short, it encodes three values in one: the encoding, the language, and the percent-encoded title.
The syntax looks like:
ext-value = encoding "'" [language] "'" pct-value
Where encoding is UTF-8, the optional language is a valid IETF RFC5646 language tag, and the percent-encoded value is a sequence of UTF-8 bytes. The encoding could be something different, but as far as I can tell RFC8187 requires producers to use UTF-8 and reserves other encodings for future use.
The following link defines both title and title*. As per the RFC, title* takes precedence as long as it can be decoded; otherwise the processor falls back to title.
</spoons/>; rel="chapter"; title="Spoons"; title*=UTF-8'en'Spoons%20%F0%9F%A5%84
So the title of the link above is “Spoons 🥄” but falls back to “Spoons” if the processor can't handle UTF-8 correctly.
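The precedence between title* and title can be sketched in Python. link_title is a hypothetical helper. Like the rest of this note, it only decodes UTF-8 ext-values. Any decoding failure makes it fall back to title.

```python
from urllib.parse import unquote_to_bytes


def link_title(params):
    """Pick the target title from a link's (name, value) params.

    The first title* wins if it decodes; otherwise the first title is used.
    Returns None when neither is usable.
    """
    def first(name):
        # Only the first occurrence of each param counts.
        return next((value for n, value in params if n == name), None)

    star = first("title*")
    if star is not None:
        try:
            encoding, _language, pct_value = star.split("'", 2)
            if encoding.lower() == "utf-8":
                return unquote_to_bytes(pct_value).decode("utf-8")
        except ValueError:  # malformed ext-value or invalid UTF-8 bytes
            pass
    return first("title")
```

With the spoons link above, this returns "Spoons 🥄". If the title* bytes were not valid UTF-8, it would return "Spoons".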
The target content type
The type param hints the target content type attribute. Any type param other than the first one must be ignored. The value is expected to be a media type as defined by IETF RFC6838.
</spoons/>; rel="chapter"; type="text/html"
The target language
The hreflang param hints the target language attribute. Multiple hreflang params hint that the target resource is available in each of those languages. The value is expected to be an IETF RFC5646 language tag.
</spoons/>; rel="chapter"; hreflang="de"; hreflang="en"
The target medium
The media param hints the target medium attribute. Any media param other than the first one must be ignored. The value is expected to be a W3C media query.
</spoons/>; rel="chapter"; media="screen and (color)"
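Collecting these attributes from a link's params can be sketched as follows. target_attributes is a hypothetical helper. It keeps every hreflang value, keeps only the first occurrence of any other attribute, and skips rel and anchor. Applying the first-one-wins rule to extension attributes as well is an assumption of this sketch.

```python
def target_attributes(params):
    """Group a link's target attributes from its (name, value) params.

    hreflang may repeat, so all of its values are kept in a list. For every
    other attribute only the first occurrence is kept (an assumption for
    extension attributes). rel and anchor are not target attributes.
    """
    attrs = {}
    for name, value in params:
        if name in ("rel", "anchor"):
            continue
        if name == "hreflang":
            attrs.setdefault(name, []).append(value)
        elif name not in attrs:
            attrs[name] = value
    return attrs
```

For the hreflang example above, this yields {"hreflang": ["de", "en"]}.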
Other target attributes
Any other param is considered to be a target attribute with unknown semantics. It is suggested that the star rule applied to title and title* also applies to other pairs (e.g. author and author*).
Why would I want to use a link header instead of “x”?
Links can be defined in many ways depending on the capabilities of the serialisation format. For example, HTML has a well-defined way via the <link>, <a> and <area> elements; HAL has the _links property; JSON-LD builds on top of RDF; etc.
But not all serialisation formats are equally expressive. For example, CSV is a rigid tabular format with no room for contextual information. In this case, expressing links via an HTTP header is an interesting possibility. CSV on the Web uses it to link to the metadata with rel="describedby".
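Going the other way, serialising a link into a header value can be sketched in Python. format_link is a hypothetical helper that quotes every value, which the ABNF above allows. It is not suitable for star params such as title*, whose value is an unquoted ext-value in that ABNF. The metadata URL in the usage line below is a made-up example.

```python
def format_link(target, **params):
    """Serialise one link as a Link header value (hypothetical helper).

    Every value is emitted as a quoted-string, escaping backslashes and
    double quotes. Not for star params, whose ext-value must stay unquoted.
    """
    parts = [f"<{target}>"]
    for name, value in params.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "; ".join(parts)
```

For a CSV response, format_link("https://example.org/foo.csv-metadata.json", rel="describedby") produces <https://example.org/foo.csv-metadata.json>; rel="describedby", ready to be sent as the value of a Link header.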
Resources
- IETF RFC2046. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types.
- IETF RFC3986. Uniform Resource Identifier (URI): Generic Syntax.
- IETF RFC5646. Tags for Identifying Languages.
- IETF RFC6838. Media Type Specifications and Registration Procedures.
- IETF RFC8187. Indicating Character Encoding and Language for HTTP Header Field Parameters.
- IETF RFC8288. Web linking.
- IANA relation type registry.
- W3C RDF 1.1.
Licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.