HTTP

= HTTP =

Introduction

 * Gopher was replaced by HTTP.
 * HTTP is used to deliver data (HTML files, image files, query results, etc.) on the World Wide Web.
 * HTTP normally runs over TCP; HTTP/3 is the exception, running over QUIC, which is built on UDP

=Basics=

Source: Tutorialspoint.com

 * HTTP is connectionless:
- The client (browser) opens a connection, sends an HTTP request, and waits for the response
- Once the response has been delivered the connection may be closed; HTTP/1.0 closes it after every exchange, while HTTP/1.1 keeps it open by default

 * HTTP is media independent:
- Any type of data can be sent by HTTP
- Both client and server should know how to handle the data content
- Client and server are required to specify the content type using a MIME type

 * HTTP is stateless:
- Server and client are aware of each other only during the current request
- Afterwards both forget about each other
- Neither the client nor the server retains information between different requests

= Parameters =

HTTP Version
HTTP/1.0 HTTP/1.1

Uniform Resource Identifiers
The following three URIs are equivalent:

http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html

Date/Time Formats
- All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception.
- HTTP applications are allowed to use any of the following three representations of date/time stamps:

Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov  6 08:49:37 1994       ; ANSI C's asctime format
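
A minimal Python sketch of how a recipient might accept all three representations and emit the preferred RFC 1123 form. The function names `parse_http_date` and `format_http_date` are illustrative, not part of any standard API, and the `strptime` patterns assume an English (C) locale for day and month names.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# One strptime pattern per representation listed above.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",   # RFC 822 / RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",   # RFC 850
    "%a %b %d %H:%M:%S %Y",        # ANSI C asctime
)

def parse_http_date(value: str) -> datetime:
    """Try each known format and return a timezone-aware UTC datetime."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
        # HTTP dates are always GMT, so attach UTC explicitly.
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"not a recognised HTTP date: {value!r}")

def format_http_date(moment: datetime) -> str:
    """Render a datetime in the preferred RFC 1123 form."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)
```

All three sample stamps above denote the same instant, so they should parse to equal values.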

Character Sets
- If a value is not specified, the default is US-ASCII.
- The following are examples of valid character sets:

US-ASCII
ISO-8859-1
ISO-8859-7

Content Encodings
- A content-coding value indicates that an encoding algorithm has been used to encode the content before passing it over the network
- Used to allow a document to be compressed or otherwise usefully transformed without losing its identity
- All content-coding values are case-insensitive
- HTTP/1.1 uses content-coding values in the Accept-Encoding and Content-Encoding header fields

Accept-Encoding: gzip
Accept-Encoding: compress
Accept-Encoding: deflate
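
As a rough illustration of content coding, the following Python sketch compresses a body with gzip only when the client lists that coding in Accept-Encoding. The helper name `encode_body` is hypothetical, and the sketch deliberately ignores q-values (so `gzip;q=0` would still be treated as acceptable).

```python
import gzip

def encode_body(body: bytes, accept_encoding: str):
    """Return (payload, headers), gzip-compressing when the client allows it."""
    # Content-coding values are case-insensitive, so normalise to lowercase.
    codings = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in codings:
        payload = gzip.compress(body)
        return payload, {
            "Content-Encoding": "gzip",
            "Content-Length": str(len(payload)),
        }
    # No acceptable coding: send the body unchanged (identity).
    return body, {"Content-Length": str(len(body))}
```

The receiver reverses the transformation with `gzip.decompress` after checking the Content-Encoding header.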

Media Types
- HTTP uses Internet Media Types in the Content-Type and Accept header fields to provide open and extensible data typing and type negotiation
- All media-type values are registered with the Internet Assigned Numbers Authority (IANA)
- The type, subtype, and parameter attribute names are case-insensitive

media-type    = type "/" subtype *( ";" parameter )

Accept: image/gif

Language Tags
- HTTP uses language tags within the Accept-Language and Content-Language fields
- White space is not allowed within the tag, and all tags are case-insensitive
- Any two-letter primary-tag is an ISO-639 language abbreviation and any two-letter initial subtag is an ISO-3166 country code
- A language tag is composed of one or more parts: a primary language tag and a possibly empty series of subtags:

language-tag = primary-tag *( "-" subtag )

en, en-US, en-cockney, i-cherokee, x-pig-latin

= Messages =

- HTTP is based on the client-server architecture model
- A stateless request/response protocol that operates by exchanging messages across a reliable TCP/IP connection
- An HTTP client connects to a server to send one or more HTTP request messages
- An HTTP server accepts connections and serves HTTP requests by sending HTTP response messages
- Uses a URI to identify a given resource and to establish a connection
- HTTP messages are passed in a format similar to Internet mail and Multipurpose Internet Mail Extensions (MIME)

HTTP-message  = Request | Response  ; HTTP/1.1 messages

The generic message format consists of the following four items:
- A Start-line
- Zero or more header fields followed by CRLF
- An empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields
- Optionally a message-body

Message Start-Line
GET /hello.htm HTTP/1.1    (This is the Request-Line sent by the client)
HTTP/1.1 200 OK            (This is the Status-Line sent by the server)

Header Fields

 * HTTP header fields provide required information about the request or response, or about the object sent in the message body

 * There are four types of HTTP message headers:

General-header:  Used for both request and response messages
Request-header:  Used only for request messages
Response-header: Used only for response messages
Entity-header:   Defines meta information about the entity-body or, if no body is present, about the resource identified by the request


 * All headers follow the same generic format.
 * Each header field consists of a name followed by a colon and the field value, as follows:

User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Accept-Ranges: bytes
Content-Length: 51
Vary: Accept-Encoding
Content-Type: text/plain

Message Body
- The body part is optional for an HTTP message
- If present, it carries the entity-body associated with the request or response
- If an entity-body is associated, the Content-Type and Content-Length header lines usually specify its nature
- The body carries the actual HTTP request data (form data, uploaded files, etc.) or the HTTP response data from the server (files, images, etc.)

Hello, World!
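
The four-part message layout described in this section can be sketched in Python as follows. `parse_message` is an illustrative helper, not a library function; it assumes the whole message is already in memory and does not handle repeated header names or folded header lines.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header field is "name: value"; names are case-insensitive.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body
```

The same function works for requests and responses, because only the start-line differs between the two.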

= Requests =

An HTTP request consists of:
- A Request-line
- Zero or more header (General|Request|Entity) fields followed by CRLF
- An empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields
- Optionally a message-body

Request-Line
The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF.

The elements are separated by space SP characters.

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Request Method

 * The method is case-sensitive and should always be mentioned in Uppercase
 * A general-purpose HTTP server must implement the GET and HEAD methods; all other methods, including OPTIONS, are optional
 * Safe request methods are GET, HEAD, OPTIONS, TRACE
 * The following table lists all the supported methods in HTTP/1.1

GET:     Retrieves data
HEAD:    Returns the headers only, without a response body
POST:    Submits data to a database, web forum, etc.
PUT:     Replaces the target resource with the uploaded content
DELETE:  Removes the target resource given by the URI
CONNECT: Used when the client wants to establish a transparent connection to a remote host, usually to facilitate SSL-encrypted communication (HTTPS) through an HTTP proxy
OPTIONS: Returns the HTTP methods that the server supports for the specified URL
TRACE:   Performs a message loop-back test to see what (if any) changes or additions have been made by intermediate servers
PATCH:   Applies partial modifications to the target resource

Request-URI
Identifies the resource upon which to apply the request.

Request-URI = "*" | absoluteURI | abs_path | authority

1. Asterisk * is used when:
- The HTTP request does not apply to a particular resource, but to the server itself
- It is only allowed when the method used does not necessarily apply to a resource

OPTIONS * HTTP/1.1

2. absoluteURI is used when an HTTP request is being made to a proxy

The proxy is requested to forward the request or service from a valid cache, and return the response

GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1

3. The absolute path (abs_path) is the most common form of Request-URI; it identifies a resource on an origin server or gateway

- It cannot be empty
- If none is present in the original URI, "/" (the server root) must be used

A client retrieving a resource directly from origin server would create a TCP connection to port 80 of the host "www.w3.org" and send the following lines:

GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.w3.org
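
A small Python sketch of how such a request could be assembled byte for byte. `build_request` is a hypothetical helper; it only formats the message and leaves opening the TCP connection to the caller.

```python
def build_request(method: str, path: str, host: str,
                  headers=None, body: bytes = b"") -> bytes:
    """Assemble Request-Line, header fields, empty line and optional body."""
    # Request-Line = Method SP Request-URI SP HTTP-Version CRLF
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Header lines are CRLF-separated; an empty line ends the header section.
    return "\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + body
```

Calling `build_request("GET", "/pub/WWW/TheProject.html", "www.w3.org")` yields the two lines shown above followed by the terminating empty line.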

Request Header Fields
Allow the client to pass additional information about the request, and about the client itself, to the server. These fields act as request modifiers.

Accept-Charset
Accept-Encoding
Accept-Language
Authorization
Expect
From
Host
If-Match
If-Modified-Since
If-None-Match
If-Range
If-Unmodified-Since
Max-Forwards
Proxy-Authorization
Range
Referer
TE
User-Agent

Examples
 * An HTTP request to fetch hello.htm:

GET /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

- No request data is sent to the server because we are fetching a plain HTML page
- Connection is a general-header
- The rest of the headers are request headers

 * How to send form data to the server using the request message body:

POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Content-Type: application/x-www-form-urlencoded
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

licenseID=string&content=string&/paramsXML=string

- The given URL /cgi-bin/process.cgi is used to process the passed data and, accordingly, a response is returned
- Content-Type tells the server that the passed data is simple web form data, and length is the actual length of the data put in the message body
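
To make the relationship between the form body and the Content-Length header concrete, here is a minimal Python sketch. The function `encode_form` and the field names used in the usage note are made up for illustration.

```python
from urllib.parse import urlencode

def encode_form(fields: dict):
    """Return (body, headers) for an application/x-www-form-urlencoded POST."""
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    body = urlencode(fields).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        # Content-Length is the byte length of the encoded body.
        "Content-Length": str(len(body)),
    }
    return body, headers
```

For example, `encode_form({"name": "Jane Doe", "comment": "a&b"})` produces the body `name=Jane+Doe&comment=a%26b`.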

 * The following example shows how you can pass plain XML to your web server:

POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Content-Type: text/xml; charset=utf-8
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

 string

= Responses =

An HTTP response consists of:
- A Status-line
- Zero or more header (General|Response|Entity) fields followed by CRLF
- An empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields
- Optionally a message-body

Message Status-Line
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

HTTP Version
HTTP-Version = HTTP/1.1

Status Code

 * A 3-digit integer
 * First digit of the Status-Code defines the class of response
 * Last two digits do not have any categorization role

There are 5 values for the first digit:

1xx: Informational
2xx: Success
3xx: Redirection
4xx: Client Error
5xx: Server Error
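
The role of the first digit can be sketched in a few lines of Python. `status_class` is an illustrative helper; `http.HTTPStatus` is the standard-library enumeration of registered status codes and is used here only to look up reason phrases.

```python
from http import HTTPStatus

# Class names keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code: int) -> str:
    """Map a 3-digit status code to its response class."""
    return STATUS_CLASSES.get(code // 100, "Unknown")

def describe(code: int) -> str:
    """Combine the code, its reason phrase and its class in one string."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"
```

A client that meets an unfamiliar code can fall back on the class alone, which is why the last two digits carry no categorization role.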

 * 301 vs 302 Redirect?

Status 301 means that the resource (page) has moved permanently to a new location. The client/browser should not attempt to request the original location but should use the new location from now on.
Status 302 means that the resource is temporarily located somewhere else, and the client/browser should continue requesting the original URL.


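 * 401 vs 403 Error?

Status 401 (Unauthorized) means that the request lacks valid authentication credentials; the client may retry after authenticating.
Status 403 (Forbidden) means that the server understood the request but refuses to authorize it; authenticating again will not help.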

Response Header Fields
Allow the server to pass additional information about the response which cannot be placed in the Status-Line.

Accept-Ranges
Age
ETag
Location
Proxy-Authenticate
Retry-After
Server
Vary
WWW-Authenticate

Examples
 * An HTTP response for a request to fetch hello.htm:

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
Content-Length: 88
Content-Type: text/html
Connection: close

Hello, World!

 * The following example shows an HTTP response message when the web server could not find the requested page:

HTTP/1.1 404 Not Found
Date: Sun, 18 Oct 2012 10:36:20 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 230
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
404 Not Found
Not Found
The requested URL /t.html was not found on this server.

 * The following example shows the HTTP response message returned when the web server encounters a wrong HTTP version in the request:

HTTP/1.1 400 Bad Request
Date: Sun, 18 Oct 2012 10:36:20 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 230
Content-Type: text/html; charset=iso-8859-1
Connection: close

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
400 Bad Request
Bad Request
Your browser sent a request that this server could not understand.
The request line contained invalid characters following the protocol string.

= Header Fields =

General-header:         Has general applicability for both request and response messages
Client Request-header:  Applies only to request messages
Server Response-header: Applies only to response messages
Entity-header:          Defines meta information about the entity-body or, if no body is present, about the resource identified by the request

Cache-Control
Used to specify directives that MUST be obeyed by all caching systems along the request/response chain.

Cache-Control : cache-request-directive | cache-response-directive

Cache-Control: no-cache

Connection
Connection : connection-token
 * Allows the sender to specify options that are desired for that particular connection and must not be communicated by proxies over further connections
 * HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response:

Connection: close

 * By default, HTTP 1.1 uses persistent connections, where the connection does not automatically close after a transaction.
 * HTTP 1.0 does not have persistent connections by default.
 * If a 1.0 client wishes to use persistent connections, it uses the keep-alive parameter:

Connection: keep-alive

Transfer-Encoding
Transfer-Encoding: chunked
 * Indicates what type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient
 * This is not the same as content-encoding because transfer-encodings are a property of the message, not of the entity-body
 * All transfer-coding values are case-insensitive

Upgrade
Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/x11
 * Allows the client to specify what additional communication protocols it supports and would like to use if the server finds it appropriate to switch protocols


 * The Upgrade header field is intended to provide a simple mechanism for transition from HTTP/1.1 to some other, incompatible protocol.

Via
 * Used by gateways and proxies to indicate the intermediate protocols and recipients
 * A request message could be sent from an HTTP/1.0 user agent to an internal proxy code-named "test", which uses HTTP/1.1 to forward the request to a public proxy at alex.com, which completes the request by forwarding it to the origin server at www.bob.in
 * The request received by www.bob.in would then have the following Via header field:

Via: 1.0 test, 1.1 alex.com (Apache/1.1)

Accept
Accept: text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c
 * Used to specify certain media types which are acceptable for the response
 * Multiple media types can be listed separated by commas and the optional qvalue represents an acceptable quality level for accept types on a scale of 0 to 1


 * Here text/html and text/x-c are the preferred media types; if they do not exist, the server should send the text/x-dvi entity, and if that does not exist, the text/plain entity.
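
The preference order described for the Accept example can be reproduced with a short Python sketch. `parse_accept` is an illustrative name; the sketch ranks by q-value only and ignores the specificity rules for wildcards such as `text/*`.

```python
def parse_accept(header: str):
    """Return (media_type, q) pairs ordered from most to least preferred."""
    preferences = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(value)
        preferences.append((parts[0], quality))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)
```

Applied to the header above, the result lists text/html and text/x-c first, then text/x-dvi, then text/plain.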

Cookie
Cookie: name=value
 * Contains a name/value pair of information stored for that URL
 * Multiple cookies can be specified separated by semicolons as follows:

Cookie: name1=value1;name2=value2;name3=value3

If-Match
If-Match : entity-tag
 * Used with a method to make it conditional
 * This header requests the server to perform the requested method only if the given value in this tag matches the given entity tags represented by ETag


 * An asterisk (*) matches any entity, and the transaction continues only if the entity exists

If-Match: "xyzzy"
If-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-Match: *


 * If none of the entity tags match, or if "*" is given and no current entity exists, the server must not perform the requested method, and must return a 412 (Precondition Failed) response.

If-Modified-Since
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
 * Used with a method to make it conditional
 * If the requested URL has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a 304 (not modified) response will be returned without any message-body.



If-None-Match

 * Used with a method to make it conditional
 * This header requests the server to perform the requested method only if none of the given values in this tag match the entity tags represented by ETag

If-None-Match : entity-tag


 * An asterisk (*) matches any entity, and the transaction continues only if the entity does not exist

If-None-Match: "xyzzy"
If-None-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-None-Match: *
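
A hedged Python sketch of how a server might evaluate If-None-Match for a GET request. The function name `evaluate_if_none_match` is hypothetical, the comparison is the weak one (a leading `W/` is ignored), and the sketch assumes entity tags contain no commas.

```python
def _strip_weak(tag: str) -> str:
    """Drop the weak-validator prefix so W/"x" compares equal to "x"."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag

def evaluate_if_none_match(header_value: str, current_etag) -> int:
    """Return 304 if the client's cached copy is current, otherwise 200.

    current_etag is the resource's ETag, or None if no entity exists.
    """
    if header_value.strip() == "*":
        # "*" matches any existing entity.
        matched = current_etag is not None
    else:
        candidates = {_strip_weak(tag) for tag in header_value.split(",")}
        matched = current_etag is not None and _strip_weak(current_etag) in candidates
    # For GET, a match means the body need not be sent again.
    return 304 if matched else 200
```

For methods other than GET and HEAD, a match would instead lead to 412 (Precondition Failed); that branch is left out here to keep the sketch short.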

If-Range
If-Range: Sat, 29 Oct 1994 19:43:31 GMT
 * Used with a conditional GET to request only the portion of the entity that is missing, if it has not been changed, and the entire entity if it has been changed
 * Either an entity tag or a date can be used to identify the partial entity already received


 * Here if the document has not been modified since the given date, the server returns the byte range given by the Range header, otherwise it returns all of the new document.

If-Unmodified-Since
If-Unmodified-Since: Sat, 29 Oct 1994 19:43:31 GMT
 * If the requested resource has not been modified since the time specified in this field, the server should perform the requested operation as if the If-Unmodified-Since header were not present.


 * If the request results in anything other than a 2xx or 412 status, the If-Unmodified-Since header should be ignored.

Referer
Referer : absoluteURI | relativeURI
 * Allows the client to specify the address (URI) of the resource from which the URL has been requested.

Referer: http://www.tutorialspoint.org/http/index.htm


 * If the field value is a relative URI, it should be interpreted relative to the Request-URI.

TE
TE  : t-codings
 * Indicates what extension transfer-coding it is willing to accept in the response and whether or not it is willing to accept trailer fields in a chunked transfer-coding.

 * The presence of the keyword "trailers" indicates that the client is willing to accept trailer fields in a chunked transfer-coding. The field can be specified in any of the following ways:

TE: deflate
TE:
TE: trailers, deflate;q=0.5

 * If the TE field-value is empty or if no TE field is present, the only acceptable transfer-coding is chunked. A message with no transfer-coding is always acceptable.

Age
Age: 1030
 * The Age response-header field conveys the sender's estimate of the amount of time since the response (or its revalidation) was generated at the origin server; representing time in seconds


 * An HTTP/1.1 server that includes a cache must include an Age header field in every response generated from its own cache.

ETag
ETag : entity-tag
 * Provides the current value of the entity tag for the requested variant

ETag: "xyzzy"
ETag: W/"xyzzy"
ETag: ""

Proxy-Authenticate
Proxy-Authenticate : challenge
 * The Proxy-Authenticate response-header field must be included as a part of a 407 (Proxy Authentication Required) response

Set-Cookie
Set-Cookie: NAME=VALUE; OPTIONS
 * Contains a name/value pair of information to retain for this URL.


 * Set-Cookie response header comprises the token Set-Cookie, followed by a comma-separated list of one or more cookies.


 * 1) Comment=comment: Can be used to specify any comment associated with the cookie.
 * 2) Domain=domain: Specifies the domain for which the cookie is valid.
 * 3) Expires=Date-time: The date the cookie will expire. If it is blank, the cookie will expire when the visitor quits the browser.
 * 4) Path=path: Specifies the subset of URLs to which this cookie applies.
 * 5) Secure: Instructs the user agent to return the cookie only under a secure connection.


 * Example of a simple cookie header generated by the server:

Set-Cookie: name1=value1,name2=value2; Expires=Wed, 09 Jun 2021 10:18:14 GMT
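
Python's standard `http.cookies` module can produce and consume these headers. The sketch below is illustrative only: the wrapper names `build_set_cookie` and `parse_cookie_header` are made up, and it emits one cookie per Set-Cookie line.

```python
from http.cookies import SimpleCookie

def build_set_cookie(name: str, value: str, path: str = "/",
                     secure: bool = True) -> str:
    """Render a single Set-Cookie header line with Path and Secure options."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path
    cookie[name]["secure"] = secure
    return cookie.output(header="Set-Cookie:")

def parse_cookie_header(value: str) -> dict:
    """Turn the value of a Cookie request header into a name/value dict."""
    cookie = SimpleCookie()
    cookie.load(value)
    return {key: morsel.value for key, morsel in cookie.items()}
```

The server side calls `build_set_cookie`, and on later requests reads the browser's Cookie header back with `parse_cookie_header`.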

Allow
Allow: GET, HEAD, PUT
 * Lists the set of methods supported by the resource identified by the Request-URI


 * This field cannot prevent a client from trying other methods

Content-Type

 * Indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent, had the request been a GET.

Content-Type: text/html; charset=ISO-8859-4
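
Because HTTP header syntax follows the Internet mail format, Python's `email.message.Message` class can be borrowed to split a Content-Type value into its media type and charset parameter. This is a convenience sketch with a made-up wrapper name, not the only way to parse the field.

```python
from email.message import Message

def parse_content_type(value: str):
    """Return (media_type, charset) for a Content-Type field value."""
    message = Message()
    message["Content-Type"] = value
    # Both results are lowercased; charset is None when the parameter is absent.
    return message.get_content_type(), message.get_content_charset()
```

For the example above the call returns the media type `text/html` together with the charset `iso-8859-4`.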

Expires
Expires: Thu, 01 Dec 1994 16:00:00 GMT
 * Gives the date/time after which the response is considered stale

Last-Modified
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
 * Indicates the date and time at which the origin server believes the variant was last modified

= Caching =

 * HTTP is typically used for distributed information systems, where performance can be improved by the use of response caches.
 * The HTTP/1.1 protocol includes a number of elements intended to make caching work.
 * The goal of caching is to eliminate the need to send requests in many cases, and to eliminate the need to send full responses in many other cases.
 * The basic cache mechanisms in HTTP/1.1 (server-specified expiration times and validators) are implicit directives to caches.
 * The Cache-Control header is used to give caches explicit directives.
 * The Cache-Control header allows a client or server to transmit a variety of directives in either requests or responses.
 * These directives typically override the default caching algorithms.
 * The caching directives are specified in a comma-separated list:

Cache-Control: no-cache

Request Directives

 * 1) no-cache: A cache must not use the response to satisfy a subsequent request without successful revalidation with the origin server.
 * 2) no-store: The cache should not store anything about the client request or server response.
 * 3) max-age = seconds: Indicates that the client is willing to accept a response whose age is not greater than the specified time in seconds.
 * 4) max-stale [ = seconds ]: Indicates that the client is willing to accept a response that has exceeded its expiration time. If seconds are given, it must not be expired by more than that time.
 * 5) min-fresh = seconds: Indicates that the client is willing to accept a response whose freshness lifetime is not less than its current age plus the specified time in seconds.
 * 6) no-transform: Does not convert the entity-body.
 * 7) only-if-cached: Does not retrieve new data. The cache can send a document only if it is in the cache, and should not contact the origin-server to see if a newer copy exists.

Response Directives

 * 1) public: Indicates that the response may be cached by any cache.
 * 2) private: Indicates that all or part of the response message is intended for a single user and must not be cached by a shared cache.
 * 3) no-cache: A cache must not use the response to satisfy a subsequent request without successful re-validation with the origin server.
 * 4) no-store: The cache should not store anything about the client request or server response.
 * 5) no-transform: Does not convert the entity-body.
 * 6) must-revalidate: The cache must verify the status of a stale document with the origin server before using it; an expired document must not be used without revalidation.
 * 7) proxy-revalidate: The proxy-revalidate directive has the same meaning as the must-revalidate directive, except that it does not apply to non-shared user agent caches.
 * 8) max-age = seconds: Indicates that the response is to be considered stale once its age is greater than the specified time in seconds.
 * 9) s-maxage = seconds: The maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header. The s-maxage directive is always ignored by a private cache.
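
As a rough sketch of how a cache might read these directives, the Python below splits a Cache-Control value and applies a simplified freshness check. Both function names are hypothetical, and the check considers only no-cache, no-store and max-age, not Expires, s-maxage or heuristic freshness.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, separator, argument = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = (
            argument.strip().strip('"') if separator else None
        )
    return directives

def is_fresh(age_seconds: int, directives: dict) -> bool:
    """Simplified check: may a stored response be reused without revalidation?"""
    if "no-cache" in directives or "no-store" in directives:
        return False
    max_age = directives.get("max-age")
    return max_age is not None and age_seconds < int(max_age)
```

A response stored with `public, max-age=3600` and an Age of 1030 seconds would still count as fresh under this check.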

=URL Encoding=


 * HTTP URLs can only be sent over the Internet using the ASCII character-set
 * URLs often contain characters outside the ASCII set
 * These unsafe characters must be replaced with a % followed by two hexadecimal digits.
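
Percent-encoding can be tried out with Python's standard `urllib.parse` module. The wrapper names `encode_path` and `decode_path` and the sample paths are illustrative; non-ASCII characters are first converted to UTF-8 bytes, and each byte is then written as `%` plus two hexadecimal digits.

```python
from urllib.parse import quote, unquote

def encode_path(path: str) -> str:
    """Percent-encode unsafe characters, leaving "/" separators intact."""
    return quote(path, safe="/")

def decode_path(path: str) -> str:
    """Reverse the encoding; hexadecimal digits are case-insensitive."""
    return unquote(path)
```

For instance, `encode_path("/café menu.html")` gives `/caf%C3%A9%20menu.html`, and `decode_path("/%7Esmith/home.html")` gives `/~smith/home.html`, which is why the three URIs listed under Uniform Resource Identifiers are equivalent.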

=Security=

DNS Spoofing

 * Clients using HTTP rely heavily on the Domain Name Service, and are thus generally prone to security attacks based on the deliberate mis-association of IP addresses and DNS names.
 * So clients need to be cautious in assuming the continuing validity of an IP number/DNS name association.


 * If HTTP clients cache the results of host name lookups in order to achieve a performance improvement, they must observe the TTL information reported by the DNS.
 * If HTTP clients do not observe this rule, they could be spoofed when a previously-accessed server's IP address changes.

Proxies and Caching

 * HTTP proxies are men-in-the-middle, and represent an opportunity for man-in-the-middle attacks.
 * Proxies have access to security-related information, personal information about individual users and organizations, and proprietary information belonging to users and content providers.
 * Proxy operators should protect the systems on which proxies run, as they would protect any system that contains or transports sensitive information.
 * Caching proxies provide additional potential vulnerabilities, since the contents of the cache represent an attractive target for malicious exploitation.
 * Therefore, cache contents should be protected as sensitive information.

= HTTP1.0 vs HTTP1.1 =


 * HTTP/1.0:


 * HTTP/1.0 uses a new connection for each request/response exchange
 * HTTP/0.9 or 1.0 closed connections after every request.
 * HTTP 1.0 supports GET, POST, HEAD request methods


 * HTTP/1.1:


 * An HTTP/1.1 connection may be used for one or more request/response exchanges
 * HTTP/1.1 uses persistent connections, which saves bandwidth and reduces latency because the TCP handshake does not have to be repeated for every file download (images, CSS, etc.)
 * HTTP pipelining is a feature of v1.1 in which the client sends multiple requests without waiting for each response.
 * HTTP 1.1 adds the OPTIONS, PUT, DELETE, TRACE, and CONNECT request methods

Pipelining

 * HTTP pipelining is a technique in which multiple HTTP requests are sent on a single TCP connection without waiting for the corresponding responses.
 * The pipelining of requests results in a dramatic improvement in the loading times of HTML pages, especially over high latency connections

Persistent Connections
 * HTTP persistent connection is the idea of using a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair.
 * Under HTTP 1.0, if the client supports keep-alive, it adds an additional header to the request:

Connection: keep-alive

 * Following this, the connection is not dropped, but is instead kept open.
 * When the client sends another request, it uses the same connection.
 * This will continue until either the client or the server decides that the conversation is over, and one of them drops the connection.
 * In HTTP 1.1, all connections are considered persistent unless declared otherwise.

= Cookie =


 * An HTTP cookie is also called a web cookie, Internet cookie, browser cookie, or simply a cookie.
 * It is a small piece of data sent from a website and stored on the user's computer while the user is browsing.
 * It helps websites remember stateful information such as items in a shopping cart, button clicks, logins, and pages visited.
 * It can also hold previously entered form fields such as names, addresses, passwords, and credit card numbers.

Authentication cookie: Used by web servers to know whether the user is logged in and which account they are logged in with. Tells the site whether to send a page containing sensitive information or to require the user to authenticate by logging in.
Session cookie:        Also called in-memory cookie, transient cookie, or non-persistent cookie. Exists only in temporary memory while the user navigates the website. Web browsers normally delete session cookies when the user closes the browser. These do not have an expiration date, which is how the browser knows to treat them as session cookies.
Persistent cookie:     Expires at a specific date or after a specific length of time. Its information is sent to the server every time the user visits the website. Sometimes referred to as tracking cookies, because advertisers use them to record information about a user's web browsing habits. Also used for "legitimate" reasons, such as keeping users logged into their accounts so they avoid re-entering login credentials.
Other types:           Secure cookie, Http-only cookie, Same-site cookie, Third-party cookie, Supercookie, Zombie cookie

