1. HTTP (Hypertext Transfer Protocol)

  1. HTTP Clients
    1. Select the HTTP version
    2. Select a method
    3. Select the request URI
    4. Splitting the request-URI
      1. Host
    5. Optional Date header
    6. Connection options
      1. Connection header
    7. List acceptable responses
      1. Accept-Charset
      2. Accept-Encoding
      3. Accept-Language
      4. TE
    8. Authenticating
    9. Proxy Authorization
    10. Page navigation data
      1. Referer
      2. Cookie
    11. User agent information
      1. User-Agent
      2. From
      3. Related headers
    12. Conditional Request Headers
      1. If-Match
      2. If-None-Match
      3. If-Modified-Since
      4. If-Unmodified-Since
      5. If-Range
    13. Encode the payload if any
      1. Content-Type
      2. Content-Encoding
      3. Content-Language
      4. Content-Location
      5. Set Expect header
    14. Lookup request from cache
    15. Parsing the Response
    16. Parse zero or more 1xx responses
    17. Handle status code
    18. Compute the Caching Key for response
    19. Parse Set-Cookie header
      1. Store Session Data
    20. Determine if there is a response-body
      1. If there's a response-body, decode Content-Encoding
      2. Read language of request
      3. Determine Content-Type
    21. Read response body
    22. Overview table

HTTP Clients

This document describes how to implement an HTTP client, from deciding which method and resource to request, to processing the response and overcoming errors.

Select the HTTP version

Absent prior knowledge, use HTTP/1.1. If this is too high for the HTTP server, it will produce an appropriate error, and the request can be retried.

Select a method

An HTTP request either asks the server for information or manipulates state on the server. The exact kind of request is specified by selecting an HTTP method:

GET
Request a representation and metadata of a resource
HEAD
Request only metadata about a resource
POST
Have a resource on the server act on or process a client-provided payload
PUT
Store a resource on the server at a specific URI
DELETE
Delete a specific resource on the server
CONNECT
Open a bi-directional communication channel
OPTIONS
Request a document about communication options, for cases where HEAD is insufficient
TRACE
Have the server send back the request message as it was received; used for debugging proxies/gateways

Select the request URI

All requests in HTTP contain some sort of request URI which identifies a resource on the server to be acted on. Resource URIs are gathered from several places, typically:

  • Another hypermedia document,

  • A server-provided pattern for generating URIs (such as an HTML form or a URI Template),

  • An external source pointing to an entry point, typically the homepage, or

  • A bookmark saved during a previous use.

Splitting the request-URI

For historical reasons, HTTP requires the request URI be split up into several component parts to generate the request.

For background, consult the Host header.

Two lines are used to form the effective Request URI: the request-line (the first line in the request), and the Host header.

HTTP allows requests with an absolute-form URI in the request-line; however, few servers support this in practice, and it is typically used only with proxies. That is, an absolute-form request typically asks the recipient to make a proxied request, rather than an origin request that the server can answer authoritatively.

Most requests will be made with the origin-form:

 origin-form    = absolute-path [ "?" query ]

For example:

GET /request-URI HTTP/1.1
Host: www.example.org

In HTTP/1.0 and HTTP/1.1, it is not possible to explicitly send the URI scheme using the origin-form; however, the server is usually able to detect the scheme being used. If your networked application needs to support responding to arbitrary URI schemes, consider supporting the absolute-form of requests in your server.
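The split described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete implementation: the function name `split_request_uri` is hypothetical, and the sketch assumes an ordinary `http` or `https` URI with a registered name or IPv4 host (IPv6 literals would need their brackets restored).

```python
from urllib.parse import urlsplit


def split_request_uri(uri):
    """Return (request_target, host_header) for an origin-form request."""
    parts = urlsplit(uri)
    # origin-form = absolute-path [ "?" query ]; an empty path becomes "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The Host header carries only host[:port], never userinfo.
    host = parts.hostname or ""
    if parts.port is not None:
        host += f":{parts.port}"
    return target, host


target, host = split_request_uri("http://www.example.org/request-URI")
print(f"GET {target} HTTP/1.1")
print(f"Host: {host}")
```

Note that the scheme is dropped in the process, which is the limitation discussed above.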

Host

The very first header should be the Host header, since it is related to the request-URI and the request-line.

Optional Date header

The Date header is optional for request headers. It can be sent if it would convey useful information to the server.

Connection options

Connection header

Main article: the Connection header.

This is a hop-by-hop header: a header describing the connection details, instead of the message itself. This header, and any headers named by it, must be consumed by the immediate recipient and not forwarded.

List acceptable responses

Clients may specify the kinds of responses they'd prefer to receive, along a variety of dimensions:

Accept-Charset

Main article: Accept-Charset.

The client can list the kinds of character sets it supports, so that the server may send a variation the client is known to understand.

This is typically omitted by Web browsers; most servers will use UTF-8 which is suitable for virtually all uses.

Accept-Encoding

Main article: Accept-Encoding.

The client can list the kinds of content-codings it supports, for example, compressed streams.

Most clients will want to send at least Accept-Encoding: gzip, deflate.

Accept-Language

Main article: Accept-Language.

The client can list the kinds of natural languages the user would prefer to receive.
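Preferences in the Accept family of headers can be ranked with quality values (the `q` parameter). The following sketch builds such a header value from a list ordered by preference. The helper name `format_preferences` and the rule of lowering the weight by 0.1 per step are assumptions made for illustration; any descending weights would do.

```python
def format_preferences(values):
    """Join values into a header value, most preferred first."""
    items = []
    for index, value in enumerate(values):
        # Assumed weighting: each step down the list lowers q by 0.1,
        # never going below 0.1. The first item has the implicit q=1.
        q = max(1.0 - 0.1 * index, 0.1)
        items.append(value if index == 0 else f"{value};q={q:.1f}")
    return ", ".join(items)


print("Accept-Language: " + format_preferences(["en-GB", "en", "fr"]))
```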

TE

The client can list the kinds of Transfer-Encodings it allows (besides "chunked").

Authenticating

If the server requires authentication, the client may add headers to authenticate itself to the server. Additionally, the client may also authenticate at the transport layer, for example with TLS certificates, or Unix domain sockets.
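As one example of header-based authentication, the Basic scheme sends a base64-encoded `username:password` pair in the Authorization header. The sketch below assumes UTF-8 credentials and a hypothetical helper name; Basic provides no confidentiality by itself, so it is normally used only over TLS.

```python
import base64


def basic_authorization(username, password):
    """Build an Authorization header value for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print("Authorization: " + basic_authorization("Aladdin", "open sesame"))
```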

Proxy Authorization

Intermediate proxies may require their own authentication. Many of the HTTP headers available to authenticate to destination servers also have variants for authenticating to proxies; the proxy will accept and remove the header before forwarding the request to the destination server.

Page navigation data

Referer

The Referer header (sic, a misspelling of "Referrer" that happens to save a byte over the wire) specifies the resource where the user-agent found and followed the request-URI for the current request.

Cookie

If the client chooses to, it may send state information to the server in a Cookie header, if any has been sent from a Set-Cookie header in a previous response.

The formatting of the Cookie header is different from that of other HTTP headers, so it must be treated as a special case in code.
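A minimal sketch of the special case: cookie pairs are joined with a semicolon and a space, not the comma used by most list-valued headers. The function name and the dictionary input are assumptions for illustration, and the sketch does not validate or escape names and values.

```python
def format_cookie_header(cookies):
    """Build a Cookie header value from a mapping of names to values."""
    # Pairs are separated by "; " rather than ",".
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


print("Cookie: " + format_cookie_header({"sid": "abc123", "theme": "dark"}))
```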

Be very careful when choosing to relay cookies: doing so makes applications susceptible to ambient authority (confused deputy) attacks if not carefully designed.

User agent information

User-Agent

The User-Agent header describes for the server the software the client is running, for logging, analytics, or other purposes.

Sometimes servers use this header to determine what behavior the client supports. In these cases, you may need to modify the header to cajole the server into doing the right thing, but avoid doing this unless absolutely necessary.

From

The From header field contains an Internet email address for a human user who controls the requesting user agent.

The header is not intended for typical usage. Use it when running an automated script or bot, so server owners can contact the owner in the event of misbehavior.

Related headers

Authorization/authentication headers and cookies can also be used to identify the person making the request; see those respective sections for more information.

Conditional Request Headers

In some cases, the client may want the server to evaluate the request only conditionally: for example, to modify a resource only if it is unchanged on the server, or to download the resource only if it has changed since the last download. In these cases, use conditional headers.
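The two cases above map onto different headers. The sketch below assumes the client kept the ETag and Last-Modified values from an earlier response in a dictionary; the function name, the dictionary keys, and the `for_update` flag are hypothetical.

```python
def conditional_headers(cached, for_update=False):
    """Pick conditional headers from stored validators.

    cached: dict with optional "etag" and "last_modified" strings,
    copied verbatim from an earlier response (an assumed storage format).
    """
    headers = {}
    etag = cached.get("etag")
    last_modified = cached.get("last_modified")
    if for_update:
        # Modify only if the resource is unchanged on the server.
        if etag:
            headers["If-Match"] = etag
        elif last_modified:
            headers["If-Unmodified-Since"] = last_modified
    else:
        # Download only if the resource has changed since the last download.
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified
    return headers


print(conditional_headers({"etag": '"abc"'}, for_update=True))
```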

If-Match

If-None-Match

If-Modified-Since

If-Unmodified-Since

If-Range

Encode the payload if any

Requests can have an attached document called the request message body. The meaning of this document varies with the definition of the method and the resource saved on the server. If a document is submitted, several headers control how to read the body and how to interpret it.

The specific meaning varies with the request method:

GET
Undefined
HEAD
Undefined
POST
Read by the server, frequently used to create a new resource
PUT
The document to be saved to the server
DELETE
Undefined
CONNECT
Undefined
OPTIONS
Undefined
TRACE
Not allowed

The request body is typically only used for the PUT and POST methods.

Content-Type

Select a media type that the server will likely understand. The Content-Type header specifies the media type of the document, and therefore what sort of data can be extracted out of it.

Content-Encoding

The client can additionally encode the document before attaching it into the message body using a content coding; for example, compressing or encrypting the document. The Content-Encoding header specifies how the server will need to decode the body in order to arrive at the final document.
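As a sketch of the compression case, the following helper gzips a document and produces matching headers. The function name is hypothetical, and the sketch assumes the server is known to accept gzip-encoded request bodies, which many servers do not.

```python
import gzip


def encode_payload(document, media_type):
    """Compress a request body and describe it with headers."""
    body = gzip.compress(document)
    headers = {
        "Content-Type": media_type,
        "Content-Encoding": "gzip",
        # Content-Length counts the encoded bytes actually sent.
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = encode_payload(b"hello " * 100, "text/plain")
print(headers)
```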

Content-Language

If the document is known to have a specific language, it can be conveyed in the Content-Language header.

Content-Location

If the document was requested earlier from a particular effective request URI, it can be provided here.

Set Expect header

If the client has reason to believe the upload may be too large for the server to accept, it can send the headers first with an Expect header and wait for the server to confirm before sending the body:

PUT /somewhere/fun HTTP/1.1
Host: origin.example.com
Content-Type: video/h264
Content-Length: 1234567890987
Expect: 100-continue
␍␊

Clients that send this must attach a request-body.

Clients should send the request body after a brief period of time without a response from the server (about a second), in the event of misimplementation or network problems.

If the client receives 417 Expectation Failed instead of 100 Continue, the server (or request path) does not support 100 Continue, and the request should be retried without using the 100-continue feature (remake the request attaching the entire request-body to the request).
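The waiting logic described above can be written as a small decision function. This is only a sketch: the function name, the returned action strings, and the one-second default are assumptions taken from the "about a second" guidance, and real clients would drive this from socket events.

```python
def next_step(status, waited_seconds, timeout=1.0):
    """Decide what to do after sending headers with Expect: 100-continue.

    status is None while no response has arrived yet.
    """
    if status is None:
        # No answer yet: give up waiting after the timeout and send anyway.
        return "send_body" if waited_seconds >= timeout else "wait"
    if status == 100:
        return "send_body"
    if status == 417:
        # Retry the whole request, body attached, without Expect.
        return "retry_without_expect"
    # Any other status is a final response, received before the body was sent.
    return "handle_final_response"


print(next_step(None, 0.2), next_step(100, 0.3), next_step(417, 0.3))
```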

Lookup request from cache

If a similar request has been made, the request might be able to be served from cache.

Several headers specify whether the response is cacheable:

  • Age
  • Cache-Control
  • Expires
  • Pragma
  • Warning
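A rough freshness check based on two of the headers listed above might look like this. It is a simplified sketch: it only understands `max-age`, `no-store`, and `no-cache`, ignores Expires, Pragma, and Warning, does not handle quoted strings containing commas, and assumes the stored response headers are in a plain dictionary with exactly these key spellings.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def is_fresh(headers):
    """Return True if a stored response may be reused without revalidation."""
    directives = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False
    age = headers.get("Age", "0")
    return age.isdigit() and int(age) < int(max_age)


print(is_fresh({"Cache-Control": "public, max-age=60", "Age": "10"}))
```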

Parsing the Response

At this point we wait for the response from the server.

Parse zero or more 1xx responses

HTTP/1.1 and above define the 1xx status code class: informational responses sent before the final response, which can convey information to the client before the final response has been generated or decided on. A client must be prepared to parse any number of them before the final status code arrives.
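A sketch of the loop this implies: keep reading status lines and header blocks until a non-1xx status appears. The function names are hypothetical, the parser is deliberately naive (no validation, no end-of-stream handling), and treating 101 Switching Protocols as final is a choice made here because the connection stops speaking HTTP/1.1 after it.

```python
import io


def read_head(stream):
    """Read one status line and header block; return (status, headers)."""
    status_line = stream.readline().decode("iso-8859-1").rstrip("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = []
    while True:
        line = stream.readline().decode("iso-8859-1").rstrip("\r\n")
        if not line:
            break  # The empty line ends the header block.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return status, headers


def read_final_head(stream):
    """Collect zero or more interim 1xx responses, then the final head."""
    interim = []
    while True:
        status, headers = read_head(stream)
        if 100 <= status < 200 and status != 101:
            interim.append((status, headers))
            continue
        return interim, status, headers


wire = io.BytesIO(
    b"HTTP/1.1 100 Continue\r\n\r\n"
    b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
)
print(read_final_head(wire))
```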

Handle status code

How to handle the response is determined by the status code. In most cases, the server will respond 200 (OK), indicating the response may be used as expected. Other responses may necessitate reporting an error, retrying after some time, or changing and re-issuing the request to the server.
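One way to organize this is to dispatch on the class of the status code first. The categories below follow the first digit of the code; the small set of codes mapped to "retry_later" (408, 429, 503) and the category names themselves are assumptions for illustration, not an exhaustive policy.

```python
def classify_status(status):
    """Map a status code to a coarse handling category."""
    if 100 <= status < 200:
        return "informational"
    if 200 <= status < 300:
        return "success"
    if 300 <= status < 400:
        return "redirect"
    if status in (408, 429, 503):
        # Assumed set of codes that usually mean "try again after a while".
        return "retry_later"
    if 400 <= status < 500:
        return "client_error"
    if 500 <= status < 600:
        return "server_error"
    raise ValueError(f"not a valid HTTP status code: {status}")


for code in (200, 301, 404, 503):
    print(code, classify_status(code))
```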

Compute the Caching Key for response

A caching key can be computed to determine if another future request can be served from a cache, instead of the origin server.

This is derived from an algorithm that takes into account the effective request-URI (see above), and several headers specified in the Vary header of the response.
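A simplified sketch of such a key follows. The function name and the tuple layout are hypothetical, request headers are assumed to be in a dictionary with lowercase names, and header values are compared byte-for-byte without the normalization a real cache might apply.

```python
def cache_key(method, effective_uri, request_headers, vary):
    """Build a cache key from the request and the response's Vary value.

    Returns None when the response cannot be matched to future requests.
    """
    if vary.strip() == "*":
        # "Vary: *" means no later request can be assumed to match.
        return None
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    selected = tuple((name, request_headers.get(name, "")) for name in names)
    return (method, effective_uri, selected)


key = cache_key(
    "GET",
    "http://www.example.org/request-URI",
    {"accept-encoding": "gzip"},
    "Accept-Encoding",
)
print(key)
```

Two requests that differ only in a header not named by Vary produce the same key, so the second can be served from the cache.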

Parse Set-Cookie header

The Set-Cookie header follows a nonstandard syntax: multiple Set-Cookie fields cannot be combined into one comma-separated field the way other repeated headers can. Even if you don't intend to store cookies, note that the Set-Cookie header requires special processing.
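The sketch below shows why the special processing is needed: an Expires attribute contains a comma, so splitting on commas would cut a single cookie in two. The parser is a minimal illustration with a hypothetical name; it splits one Set-Cookie field value on semicolons and does not validate attributes or handle quoted values.

```python
def parse_set_cookie(value):
    """Parse one Set-Cookie field value into (name, value, attributes)."""
    first, *rest = value.split(";")
    name, _, cookie_value = first.strip().partition("=")
    attributes = {}
    for item in rest:
        key, _, attribute_value = item.strip().partition("=")
        if key:
            # Attribute names are case-insensitive; flags get an empty value.
            attributes[key.lower()] = attribute_value
    return name, cookie_value, attributes


field = "sid=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT; Path=/; HttpOnly"
print(parse_set_cookie(field))
print(len(field.split(",")))  # A comma split wrongly yields two pieces.
```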

Store Session Data

The contents of the Set-Cookie header can be stored for subsequent requests.

Determine if there is a response-body

If there's a response-body, decode Content-Encoding

If one or more encodings have been applied to a representation, the sender that applied the encodings MUST generate a Content-Encoding header field that lists the content codings in the order in which they were applied. The recipient therefore removes the codings in the reverse order to recover the representation.
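Because the codings are listed in the order they were applied, decoding walks the list from the end. The sketch below supports only gzip and deflate (treated as the zlib format) from the standard library; the `DECODERS` table and function name are hypothetical, and it decodes the whole body in memory instead of streaming.

```python
import gzip
import zlib

# Assumed table of supported content codings.
DECODERS = {
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,
    "deflate": zlib.decompress,
    "identity": lambda data: data,
}


def decode_body(body, content_encoding):
    """Undo the codings named in a Content-Encoding value, last one first."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding not in DECODERS:
            raise ValueError(f"unsupported content coding: {coding}")
        body = DECODERS[coding](body)
    return body


original = b"hello world"
encoded = gzip.compress(zlib.compress(original))  # deflate first, then gzip
print(decode_body(encoded, "deflate, gzip"))
```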

Read language of request

HTTP allows the sender to identify the primary language of the entity body (request body or response body) using the Content-Language header.

Determine Content-Type

The Content-Type header determines which media type will be used to decode the entity-body.

Generally for every media type, there is a single specification that specifies how to parse and use the document.
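To pick the right parser, a client first separates the media type from its parameters. The sketch below borrows the header parsing in Python's standard `email` package, which uses the same Content-Type syntax; the wrapper name `parse_content_type` is hypothetical.

```python
from email.message import Message


def parse_content_type(value):
    """Return (media_type, charset) from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = value
    # Both results are lowercased; charset is None when absent.
    return message.get_content_type(), message.get_content_charset()


print(parse_content_type("text/html; charset=UTF-8"))
```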

Read response body

In a request, the presence of a message body is signaled by a Content-Length or Transfer-Encoding header field. In a response, whether a body is present also depends on the request method and the status code.

The number of bytes to read depends on the request method, the response code, and headers present:

  1. If the request is a HEAD request, there is no response body. Response length information, if present, describes how long the body of an otherwise identical GET response would have been.
  2. A successful (2xx) response to a CONNECT request turns the connection into a tunnel immediately after the response headers; there is no concept of a response body.
  3. If the outermost applied Transfer-Encoding (if any) is "chunked", the response body ends when the chunked parsing does.
  4. If there is any other Transfer-Encoding, the message can only be ended when the server closes the connection.
  5. If there is a single Content-Length header (or multiple identical values), then read that many bytes.
  6. If this is an HTTP/1.0 request, or if the response specifies Connection: close, read until the server closes the connection. At this step, there is no way to distinguish a connection error from the end of the document.
  7. Otherwise, raise an error that there is no reliable way to determine when the response ends.
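The steps above can be sketched as a single decision function. The function name, the returned `(mode, length)` tuples, and the representation of headers as a list of `(name, value)` pairs are assumptions for illustration. Two further choices are made here: only a 2xx response to CONNECT is treated as a tunnel, and conflicting Content-Length values raise an error.

```python
def body_framing(method, status, headers, response_version="HTTP/1.1"):
    """Decide how the response body is delimited.

    Returns (mode, length) where mode is one of "none", "tunnel",
    "chunked", "until_close", or "length".
    """
    def values(name):
        return [v.strip() for n, v in headers if n.lower() == name]

    if method == "HEAD":
        return ("none", None)
    if method == "CONNECT" and 200 <= status < 300:
        return ("tunnel", None)
    codings = [
        c.strip().lower()
        for v in values("transfer-encoding")
        for c in v.split(",")
        if c.strip()
    ]
    if codings:
        # Only a final "chunked" coding delimits the body by itself.
        return ("chunked", None) if codings[-1] == "chunked" else ("until_close", None)
    lengths = set(values("content-length"))
    if len(lengths) > 1:
        raise ValueError("conflicting Content-Length values")
    if lengths:
        (length,) = lengths
        if not length.isdigit():
            raise ValueError("invalid Content-Length")
        return ("length", int(length))
    connection = {t.strip().lower() for v in values("connection") for t in v.split(",")}
    if response_version == "HTTP/1.0" or "close" in connection:
        return ("until_close", None)
    raise ValueError("no reliable way to determine when the response ends")


print(body_framing("GET", 200, [("Content-Length", "42")]))
```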

Overview table

Specification
RFC 7230: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing §5.5. Effective Request URI