A dive into the `Link` header

Published on June 22, 2019 by Arnau Siches

Table of contents

This note describes my current understanding of the Link header as defined by the IETF RFC8288.

# The link model

In words of the RFC8288, “[...] a link is a typed connection between two resources [...]”.

The RFC defines the following concepts:

Link context : The subject of the connection.
Link target : The object of the connection.
Link relation type : The type of connection.
Target attributes : A set of optional target descriptors.
Target attribute extension : Set of attributes not defined by the RFC.

For example, given a resource A, the context, linked to a resource B, the target, with a relation type alternate it can be depicted as:

Building on that, let's define that B, the target, has a content type [IETF RFC2046] of text/csv :

Given that we are in HTTP territory, resources are identified by URLs. The previous example, using URLs could be something like:

This model can be expressed in many ways: HTML, Atom, Turtle, etc. The RFC8288 defines how to express links using the HTTP Link header.

# The header syntax

At first, the syntax seems straightforward but bit by bit we'll see how it gets more knitted. Let's start with a basic example:

<https://example.org/foo.csv>; rel="alternate"; type="text/csv"

It can be expressed in ABNF as:

header = link *(OWS "," OWS link)
link   = "<" target ">" *(OWS ";" OWS param)
param  = name ["=" value]
target = uriref
name   = token
value  = token / quoted-string

The above, although not complete, already expresses a few rules that are not obvious with the initial example:

Whitespace (OWS) is optional in many cases.
A param may not have a value.
A value may or may not be quoted.

For example, this is equivalent to the first example:

<https://example.org/foo.csv>;rel=alternate;type="text/csv"

Another example with multiple links:

<https://example.org/foo.csv>; rel="alternate"; type="text/csv",
<https://example.org/>; rel=canonical,
<http://other.net>; private; anchor="#foo"

Let's twist it a bit; when the param name ends in * the value should follow the IETF RFC8187, for example UTF-8'en'An%20example. With that in mind we can extend the original ABNF as:

header    = link *(OWS "," OWS link)
link      = "<" target ">" *(OWS ";" OWS (ext-param / param))
param     = name ["=" value]
target    = uriref
name      = token
value     = token / quoted-string
ext-param = name "*" "=" ext-value
ext-value = encoding "'" [language] "'" pct-value

# The semantics

Once we have extracted the primitive values from the Link header, we need to process them to obtain the list of actual links.

# The context

The context is the resource where the header comes from. Its identifier is the request URL. For example, the following HTTP request makes https://example.net/things the context for all links found in the response header.

GET /things HTTP/1.1
Host: example.net
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json
Link: </things?p=2>; rel="next"

Things couldn't be that straightforward; if a link has a param anchor, the context will change. The anchor value must be a valid URI, either absolute or relative.

If it's absolute, it replaces the context entirely. For example:

GET /things HTTP/1.1
Host: example.net
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json
Link: </>; rel="canonical"; anchor="https://other.org"

Makes https://other.org the context for that link.

If it is relative, the rules for joining URI's apply. For example:

GET /things HTTP/1.1
Host: example.net
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json
Link: </copyright>; rel="copyright"; anchor="#section_3"

Makes https://example.net/things#section_3 the context for that link.

Any anchor param other than the first one must be ignored.

Finally, the RFC says:

Note that depending on HTTP status code and response headers, the link context might be "anonymous" (i.e., no link context is available). For example, this is the case on a 404 response to a GET request.

The rules and reasoning for that escape my understanding.

# The target

The target is identified by the URI resulting of joining the URI reference found in target (see ABNF rules) with the context.

For example, the following HTTP request makes https://example.net/things?p=2 the link target.

GET /things HTTP/1.1
Host: example.net
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json
Link: </things?p=2>; rel="next"

# The relation type

The rel param defines the relation type of a link. Any rel param other than the first one must be ignored.

The RFC requires a link to have a relation type but the syntax is lax enough to allow for a link to not have one. That makes it somewhat compatible with HTML which does not require links to have any relation type.

For example, the following link defines https://example.net/ as the canonical resource for the context.

<https://example.net/>; rel="canonical"

You can express multiple relation types at once by separating the types with spaces. For example, the following are equivalent:

<https://example.org/things>; rel="first previous"

<https://example.org/things>; rel="first", <https://example.org/things>; rel="previous"

Previous versions of this RFC define a rev param to define a reverse relation type. RFC8288 deprecates it.

# The target title

The title param defines the target title attribute. Any title param other than the first one must be ignored. For example:

</spoons/>; rel="chapter"; title="Spoons, spoons and some more spoons"

Now, HTTP headers are not supposed to have characters other than a subset of the US-ASCII encoding so title values are quite limited. The way to escape from this limitation is using the title* param. It expects its value to follow the IETF RFC8187. In short, it encodes three values in one: encoding, language and the title encoded with percent-encoding.

The syntax looks like:

ext-value = encoding "'" [language] "'" pct-value

Where encoding is UTF-8, the optional language is a valid IETF RFC5646 language tag and the percent encoded value uses valid UTF-8 codepoints. The encoding could be something different but as far as I can tell only UTF-8 is normative.

The following link defines both title and title*. As per the RFC, title* takes precedence as long as it can be decoded, otherwise it falls back to title.

</spoons/>; rel="chapter"; title="Spoons"; title*=UTF-8'en'Spoons%20%F0%9F%A5%84

So the title of the link above is “Spoons 🥄” but falls back to “Spoons” if the processor can't handle UTF-8 correctly.

# The target content type

The type param hints the target content type attribute. Any type param other than the first one must be ignored.

The value is expected to conform to the IETF RFC6838 media type.

</spoons/>; rel="chapter"; type="text/html"

# The target language

The hreflang param hints the target language attribute. Multiple hreflang param hint that the target resource is available in these languages.

The value is expected to conform to the IETF RFC5646 language tag.

</spoons/>; rel="chapter"; hreflang="de"; hreflang="en"

# The target medium

The media param hints the target medium attribute. Any media param other than the first one must be ignored.

The value is expected to conform to the W3C media queries.

</spoons/>; rel="chapter"; media="screen and (color)"

# Other target attributes

Any other param is considered be a target attribute with unknown semantics. It is suggested that the star rule applied to title and title* is applied to other pairs (e.g. author and author*).

# Why would I want to use a link header instead of “x”?

Links can be defined in many ways depending on the capabilities of the serialisation format. For example, HTML has a well defined way via link, a and area; HAL has the _links property; JSON-LD builds on top of RDF; etc.

But not all serialisation formats have the same expressivity. For example, CSV is a rigid tabular format with no margin for contextual information. In this case, expressing links via a HTTP header is an interesting possibility. CSV on the Web uses it to link to the metadata with a rel="describedby".

# Resources

IETF RFC2046. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types.
IETF RFC3986. Uniform Resource Identifier (URI): Generic Syntax.
IETF RFC5646. Tags for Identifying Languages.
IETF RFC6838. Media Type Specifications and Registration Procedures.
IETF RFC8187. Indicating Character Encoding and Language for HTTP Header Field Parameters.
IETF RFC8288. Web linking.
IANA relation type registry.
W3C RDF 1.1.

A dive into the Link header