HTTP
Introduction
- Gopher was replaced by HTTP.
- HTTP is used to deliver data (HTML files, image files, query results, etc.) on the World Wide Web.
- HTTP/1.x and HTTP/2 run over TCP; HTTP/3 runs over QUIC, which is built on UDP
Basics
Source: Tutorialspoint.com
- HTTP is connectionless:
  - The client (browser) opens a connection and sends an HTTP request
  - The server processes the request and sends back a response; in HTTP/1.0 the connection is then closed
  - Each request/response exchange is independent, so no connection needs to be held open between requests
- HTTP is media independent:
  - Any type of data can be sent by HTTP
  - Both client and server should know how to handle the data content
  - Client and server specify the content type using a MIME type
- HTTP is stateless:
  - Server and client are aware of each other only during the current request
  - Afterwards both forget about each other
  - Neither client nor server retains information between different requests
Parameters
HTTP Version
HTTP/1.0
HTTP/1.1
Uniform Resource Identifiers
The following three URIs are equivalent:
http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html
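These URIs are equivalent for three reasons. Scheme and host names are case-insensitive. An empty or omitted port means the default port 80. %7E and %7e both decode to the unreserved character "~". The following minimal Python sketch applies only those rules. `normalize_http_uri` is a hypothetical helper, not a standard library function, and it ignores userinfo and fragments:

```python
import re
from urllib.parse import urlsplit

PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
UNRESERVED_MARKS = "-._~"

def _normalize_escape(match: re.Match) -> str:
    char = chr(int(match.group(1), 16))
    if (char.isascii() and char.isalnum()) or char in UNRESERVED_MARKS:
        return char                          # e.g. %7E or %7e -> "~"
    return "%" + match.group(1).upper()      # keep other escapes, uppercase the hex digits

def normalize_http_uri(uri: str) -> str:
    """Normalise an http URI using the equivalence rules described above."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()    # host names are case-insensitive
    port = parts.port                        # None when the port is omitted or empty
    netloc = host if port in (None, 80) else f"{host}:{port}"
    path = PERCENT_ESCAPE.sub(_normalize_escape, parts.path) or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{netloc}{path}{query}"
```

All three example URIs normalise to `http://abc.com/~smith/home.html`.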
Date/Time Formats
- All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception.
- HTTP/1.1 applications MUST accept all three of the following date/time representations, but MUST generate only the first (RFC 1123) format:
Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov  6 08:49:37 1994 ; ANSI C's asctime() format
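The "accept three, generate one" rule can be sketched in Python with the standard `datetime.strptime` and `email.utils.format_datetime`. The function names are illustrative. The sketch assumes the default C locale, so English day and month names are recognised:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# strptime patterns for the three date formats HTTP/1.1 recipients must accept.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",   # RFC 1123 (the only format to generate)
    "%A, %d-%b-%y %H:%M:%S GMT",   # RFC 850 (two-digit year)
    "%a %b %d %H:%M:%S %Y",        # ANSI C asctime(), no zone: always GMT
)

def parse_http_date(value: str) -> datetime:
    """Parse any of the three HTTP date formats into an aware UTC datetime."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognised HTTP date: {value!r}")

def format_http_date(moment: datetime) -> str:
    """Generate an RFC 1123 date string in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)
```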
Character Sets
- Under HTTP/1.1 (RFC 2616), a text/* media type sent without an explicit charset parameter defaults to ISO-8859-1; later revisions (RFC 7231) dropped this default.
- Examples of character sets:
US-ASCII
ISO-8859-1
ISO-8859-7
Content Encodings
- A content-coding value indicates which encoding algorithm has been applied to the content before passing it over the network
- Content codings allow a document to be compressed or otherwise usefully transformed without losing the identity of its underlying media type
- All content-coding values are case-insensitive
- HTTP/1.1 uses content-coding values in the Accept-Encoding and Content-Encoding header fields
Accept-Encoding: gzip
Accept-Encoding: compress
Accept-Encoding: deflate
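The following minimal Python sketch shows a server applying the gzip content coding when the client's Accept-Encoding allows it. `negotiate_gzip` is a hypothetical helper. To keep the sketch short, it ignores q-values (including q=0) and the other codings:

```python
import gzip

def negotiate_gzip(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Gzip the body if the client lists gzip in Accept-Encoding (case-insensitive)."""
    codings = {part.split(";")[0].strip().lower() for part in accept_encoding.split(",")}
    if "gzip" in codings:
        # The media type is unchanged; Content-Encoding tells the client how to undo it.
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}
```

The client reverses the coding with `gzip.decompress` and recovers exactly the original bytes.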
Media Types
- HTTP uses Internet Media Types in the Content-Type and Accept header fields to provide open and extensible data typing and type negotiation.
- Media-type values are registered with the Internet Assigned Numbers Authority (IANA).
- The type, subtype, and parameter attribute names are case-insensitive.
media-type = type "/" subtype *( ";" parameter )
Accept: image/gif
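A Python sketch of splitting a media type according to the grammar above. `parse_media_type` is illustrative. It lowercases the case-insensitive parts (type, subtype, parameter names) but leaves parameter values as sent. It also assumes no semicolons appear inside quoted parameter values:

```python
def parse_media_type(value: str) -> tuple[str, str, dict[str, str]]:
    """Split 'type/subtype; attr=value' into type, subtype and parameters."""
    main, *params = value.split(";")
    mtype, _, subtype = main.strip().partition("/")
    parameters = {}
    for param in params:
        name, _, val = param.strip().partition("=")
        if not name.strip():
            continue  # tolerate a trailing ';'
        parameters[name.strip().lower()] = val.strip().strip('"')
    return mtype.lower(), subtype.lower(), parameters
```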
Language Tags
- HTTP uses language tags within the Accept-Language and Content-Language fields
- White space is not allowed within a tag, and all tags are case-insensitive
- Any two-letter primary-tag is an ISO-639 language abbreviation, and any two-letter initial subtag is an ISO-3166 country code
- A language tag is composed of one or more parts: a primary language tag and a possibly empty series of subtags:
language-tag = primary-tag *( "-" subtag )
en, en-US, en-cockney, i-cherokee, x-pig-latin
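The grammar above (RFC 2616: primary-tag = 1*8ALPHA, subtag = 1*8ALPHA) can be checked with a regular expression in Python. `parse_language_tag` is a hypothetical helper. Newer tags under BCP 47 also allow digits (e.g. "es-419"), and this sketch would reject those:

```python
import re

# primary-tag *( "-" subtag ), each part 1 to 8 letters, no white space allowed
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")

def parse_language_tag(tag: str) -> list[str]:
    """Validate a tag and return its lowercased parts (tags are case-insensitive)."""
    if not LANGUAGE_TAG.fullmatch(tag):
        raise ValueError(f"invalid language tag: {tag!r}")
    return tag.lower().split("-")
```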
Messages
- HTTP is based on the client-server architecture model
- It is a stateless request/response protocol that operates by exchanging messages across a reliable TCP/IP connection
- An HTTP client connects to a server to send one or more HTTP request messages
- An HTTP server accepts connections and serves HTTP requests by sending HTTP response messages
- URIs identify a given resource and are used to establish a connection
- HTTP messages use a format similar to Internet mail and Multipurpose Internet Mail Extensions (MIME)
HTTP-message = <Request> | <Response> ; HTTP/1.1 messages
The generic message format consists of the following four items.
1. A start-line
2. Zero or more header fields, each followed by CRLF
3. An empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields
4. Optionally, a message-body
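The four parts above can be pulled apart with a short Python sketch. `split_message` is a hypothetical helper. It assumes the whole message is already in memory and returns everything after the empty line as the body. A real parser would use Content-Length or chunked framing to find where the body ends:

```python
CRLF = b"\r\n"

def split_message(raw: bytes) -> tuple[str, list[tuple[str, str]], bytes]:
    """Split a raw HTTP/1.x message into start-line, header fields and body."""
    head, sep, body = raw.partition(CRLF + CRLF)  # the empty line ends the headers
    if not sep:
        raise ValueError("no empty line terminating the header fields")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = []
    for line in header_lines:
        name, colon, value = line.partition(":")  # name ":" value
        if not colon:
            raise ValueError(f"malformed header field: {line!r}")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body
```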
Message Start-Line
GET /hello.htm HTTP/1.1 (Request-Line sent by the client)
HTTP/1.1 200 OK (Status-Line sent by the server)
Header Fields
- HTTP header fields provide required information about the request or response, or about the object sent in the message body
- There are four types of HTTP message headers:
  - General-header: used for both request and response messages
  - Request-header: used only for request messages
  - Response-header: used only for response messages
  - Entity-header: defines meta-information about the entity-body or, if no body is present, about the resource identified by the request
- All headers follow the same generic format.
- Each header field consists of a name followed by a colon (:) and the field value, for example:
User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Accept-Ranges: bytes
Content-Length: 51
Vary: Accept-Encoding
Content-Type: text/plain
Message Body
- The body is optional for an HTTP message
- If present, it carries the entity-body associated with the request or response
- If an entity-body is present, the Content-Type and Content-Length header lines usually specify the nature of the body
- The body carries the actual HTTP request data (form data, uploaded files, etc.) and HTTP response data from the server (files, images, etc.)
<html>
<body>
<h1>Hello, World!</h1>
</body>
</html>
Requests
1. A Request-Line
2. Zero or more header (General|Request|Entity) fields, each followed by CRLF
3. An empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields
4. Optionally, a message-body
Request-Line
The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF.
The elements are separated by space (SP) characters.
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
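In Python, splitting a Request-Line on its SP separators can be sketched as follows. `parse_request_line` is an illustrative name, and the sketch assumes the trailing CRLF has already been stripped:

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split 'Method SP Request-URI SP HTTP-Version' into its three elements."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 space-separated elements, got {len(parts)}")
    method, request_uri, version = parts  # method is case-sensitive: kept as sent
    if not version.startswith("HTTP/"):
        raise ValueError(f"bad HTTP-Version: {version!r}")
    return method, request_uri, version
```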
Request Method
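- The method token is case-sensitive; the standard methods are always written in uppercase
- All general-purpose HTTP servers must implement the GET and HEAD methods
- Safe request methods (which only retrieve data and should not change server state) are GET, HEAD, OPTIONS, and TRACE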
- The following table lists the methods defined in HTTP/1.1:
Method | Description |
---|---|
GET | Retrieves data |
HEAD | Same as GET, but returns only the status line and headers, without a response body |
POST | Submits data to be processed by the target resource (e.g., a form submission or a web forum post) |
PUT | Replaces the target resource with the uploaded content |
DELETE | Removes the target resource given by the URI |
CONNECT | Establishes a tunnel to a remote host, usually so that SSL/TLS-encrypted traffic (HTTPS) can pass through an HTTP proxy |
OPTIONS | Returns the HTTP methods that the server supports for the specified URL |
TRACE | Performs a message loop-back test to see what (if any) changes or additions have been made by intermediate servers |
Request-URI
Identifies the resource upon which to apply the request.
Request-URI = "*" | absoluteURI | abs_path | authority
1. An asterisk (*) is used when:
  - The request does not apply to a particular resource, but to the server itself
  - It is only allowed when the method used does not necessarily apply to a resource
OPTIONS * HTTP/1.1
2. absoluteURI is used when an HTTP request is being made to a proxy
The proxy is requested to forward the request or service it from a valid cache, and return the response.
GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1
3. abs_path is the most common form of Request-URI; it identifies a resource on an origin server or gateway
  - The path cannot be empty
  - If none is present in the original URI, "/" (the server root) must be used
A client retrieving a resource directly from origin server would create a TCP connection to port 80 of the host "www.w3.org" and send the following lines:
GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.w3.org
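A Python sketch that classifies a Request-URI into the forms listed above. `request_uri_form` is hypothetical. Two assumptions apply. First, it treats the presence of "://" as the sign of an absoluteURI, which is a simplification. Second, it relies on RFC 2616's rule that the fourth form, authority (host:port), is used only by CONNECT:

```python
def request_uri_form(request_uri: str, method: str) -> str:
    """Classify a Request-URI into one of the four RFC 2616 forms."""
    if request_uri == "*":
        return "asterisk"       # e.g. OPTIONS * HTTP/1.1
    if "://" in request_uri:
        return "absoluteURI"    # request made to a proxy
    if request_uri.startswith("/"):
        return "abs_path"       # request to an origin server, with a Host header
    if method == "CONNECT":
        return "authority"      # host:port, only used by CONNECT
    raise ValueError(f"unrecognised Request-URI: {request_uri!r}")
```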
Request Header Fields
Allow the client to pass additional information about the request, and about the client itself, to the server. These fields act as request modifiers.
Accept-Charset, Accept-Encoding, Accept-Language, Authorization, Expect, From, Host, If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, Referer, TE, User-Agent
Examples
- An HTTP request to fetch hello.htm
  - No request data is sent to the server because we are fetching a plain HTML page
  - Connection is a general-header
  - The rest of the headers are request headers
GET /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
- How to send form data to the server using request message body:
  - The URL /cgi-bin/process.cgi will be used to process the passed data, and a response will be returned accordingly
  - Content-Type tells the server that the passed data is simple web form data, and Content-Length gives the actual length of the data in the message body
POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Content-Type: application/x-www-form-urlencoded
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

licenseID=string&content=string&/paramsXML=string
- The following example shows how you can pass plain XML to your web server:
POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Content-Type: text/xml; charset=utf-8
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://clearforest.com/">string</string>
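The "length" placeholder in these examples is the byte length of the body. The following Python sketch assembles a form POST with that value filled in. `build_form_post` is a hypothetical helper, and the body is encoded with the standard `urllib.parse.urlencode`:

```python
from urllib.parse import urlencode

def build_form_post(host: str, path: str, fields: dict[str, str]) -> bytes:
    """Assemble a raw POST request whose body is application/x-www-form-urlencoded."""
    body = urlencode(fields).encode("ascii")  # spaces become '+', '&' becomes %26
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"    # byte length of the body
        "\r\n"                                # empty line ends the headers
    )
    return head.encode("ascii") + body
```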
Responses
1. A Status-Line
2. Zero or more header (General|Response|Entity) fields, each followed by CRLF
3. An empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields
4. Optionally, a message-body
Message Status-Line
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
HTTP Version
HTTP-Version = HTTP/1.1
Status Code
- A 3-digit integer
- First digit of the Status-Code defines the class of response
- Last two digits do not have any categorization role
There are 5 values for the first digit:
Class | Type | Example codes |
---|---|---|
1XX | Informational | 100 = Continue |
2XX | Successful | 200 = OK; 201 = Created (URL of the new resource is returned); 202 = Accepted (request accepted but not acted upon immediately); 203 = Non-Authoritative Information (header info is from a local or third-party copy, not from the origin server); 204 = No Content (no message body) |
3XX | Redirection | 301 = Moved Permanently; 302 = Found (temporary redirect)[1]; 304 = Not Modified; 305 = Use Proxy (resource must be accessed through the proxy given in the Location header); 307 = Temporary Redirect (requested page has moved temporarily to a new URL) |
4XX | Client Error | 400 = Bad Request; 401 = Unauthorized[1]; 402 = Payment Required; 403 = Forbidden[1]; 404 = Not Found; 405 = Method Not Allowed |
5XX | Server Error | 500 = Internal Server Error; 501 = Not Implemented; 502 = Bad Gateway (invalid response from an upstream server); 503 = Service Unavailable; 504 = Gateway Timeout (no timely response from an upstream server); 505 = HTTP Version Not Supported |
- 301 vs 302 Redirect?
Status 301 means that the resource (page) has moved permanently to a new location. The client/browser should not request the original location again, but should use the new location from now on. Status 302 means that the resource is temporarily located somewhere else, and the client/browser should continue requesting the original URL.
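A small Python sketch of that client-side difference. `follow_redirect` is hypothetical. It treats 307 like 302, because both are temporary. It also resolves the Location value against the original URL with `urllib.parse.urljoin`, in case the server sent a relative URL:

```python
from urllib.parse import urljoin

def follow_redirect(original_url: str, status: int, location: str) -> tuple[str, str]:
    """Return (URL to fetch now, URL to use for future requests)."""
    target = urljoin(original_url, location)  # Location may be relative
    if status == 301:
        return target, target          # permanent: remember the new URL
    if status in (302, 307):
        return target, original_url    # temporary: keep requesting the original URL
    raise ValueError(f"{status} is not handled by this sketch")
```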
- 401 vs 403 Error?
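Status 401 (Unauthorized) means the request lacks valid authentication credentials. The server sends a WWW-Authenticate header with a challenge, and the client may repeat the request with suitable credentials. Status 403 (Forbidden) means the server understood the request but refuses to fulfil it; authenticating (again) will not help.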
Response Header Fields
Allow the server to pass additional information about the response that cannot be placed in the Status-Line.
Accept-Ranges, Age, ETag, Location, Proxy-Authenticate, Retry-After, Server, Vary, WWW-Authenticate
Examples
- An HTTP response for a request to fetch the hello.htm
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
Content-Length: 88
Content-Type: text/html
Connection: close
<html>
<body>
<h1>Hello, World!</h1>
</body>
</html>
- The following example shows an HTTP response message when the web server could not find the requested page:
HTTP/1.1 404 Not Found
Date: Sun, 18 Oct 2012 10:36:20 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 230
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>404 Not Found</title>
</head>
<body>
<h1>Not Found</h1>
<p>The requested URL /t.html was not found on this server.</p>
</body>
</html>
- The following example shows an HTTP response message indicating an error condition when the web server encountered an invalid HTTP version in the request:
HTTP/1.1 400 Bad Request
Date: Sun, 18 Oct 2012 10:36:20 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 230
Content-Type: text/html; charset=iso-8859-1
Connection: close
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>400 Bad Request</title>
</head>
<body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.</p>
<p>The request line contained invalid characters following the protocol string.</p>
</body>
</html>
Header Fields
- General-header: has general applicability for both request and response messages
- Client Request-header: has applicability only for request messages
- Server Response-header: has applicability only for response messages
- Entity-header: defines meta-information about the entity-body or, if no body is present, about the resource identified by the request
General Headers
Cache-Control
Used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain
Cache-Control : cache-request-directive | cache-response-directive
Cache-Control: no-cache
Connection
- Allows the sender to specify options that are desired for that particular connection and must not be communicated by proxies over further connections
Connection : 1#(connection-token)
- HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response
Connection: close
- By default, HTTP 1.1 uses persistent connections, where the connection does not automatically close after a transaction.
- HTTP 1.0 does not have persistent connections by default.
- If a 1.0 client wishes to use persistent connections, it uses the keep-alive parameter:
Connection: keep-alive
Transfer-Encoding
- Indicates what type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient
- This is not the same as content-encoding because transfer-encodings are a property of the message, not of the entity-body
- All transfer-coding values are case-insensitive
Transfer-Encoding: chunked
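With the chunked transfer-coding, each chunk is sent as its size in hexadecimal, CRLF, the chunk data, and another CRLF. The body ends with a zero-size chunk. The following Python sketch shows the encode and decode steps. The function names are illustrative. The decoder drops chunk extensions and ignores any trailer fields:

```python
def encode_chunked(chunks: list[bytes]) -> bytes:
    """Frame each chunk as: size in hex CRLF data CRLF, then a zero-size last chunk."""
    out = bytearray()
    for chunk in chunks:
        if chunk:  # an empty chunk would be read as the terminator
            out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk followed by an empty trailer
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the entity-body from a chunked message body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)  # drop chunk extensions
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the CRLF after the chunk data
```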
Upgrade
- Allows the client to specify what additional communication protocols it supports and would like to use if the server finds it appropriate to switch protocols
Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/x11
- The Upgrade header field is intended to provide a simple mechanism for transition from HTTP/1.1 to some other, incompatible protocol.
Via
- Used by gateways and proxies to indicate the intermediate protocols and recipients
- Intended for tracking message forwards, avoiding request loops, and identifying the protocol capabilities of all senders along the request/response chain
- A request message could be sent from an HTTP/1.0 user agent to an internal proxy code-named "test", which uses HTTP/1.1 to forward the request to a public proxy at alex.com, which completes the request by forwarding it to the origin server at www.bob.in
- The request received by www.bob.in would then have the following Via header field:
Via: 1.0 test, 1.1 alex.com (Apache/1.1)
Client Request Headers
Accept
- Used to specify certain media types which are acceptable for the response
- Multiple media types can be listed separated by commas and the optional qvalue represents an acceptable quality level for accept types on a scale of 0 to 1
Accept: text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c
- Here text/html and text/x-c are the preferred media types; if they do not exist, the server should send the text/x-dvi entity, and if that does not exist, the text/plain entity.
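A Python sketch that orders the Accept example by q-value. `parse_accept` is hypothetical, and it assumes q defaults to 1 when absent. It only sorts by q. The rule that more specific ranges (text/html) override broader ones (text/*) is not implemented:

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return media ranges ordered from most to least preferred by q-value."""
    ranges = []
    for item in header.split(","):
        media_range, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((media_range, q))
    # sorted() is stable, so ties keep the order in which they were listed
    return sorted(ranges, key=lambda r: r[1], reverse=True)
```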
Cookie
- Contains a name/value pair of information stored for that URL
Cookie: name=value
- Multiple cookies are separated by a semicolon and a space:
Cookie: name1=value1; name2=value2; name3=value3
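On the server side, Python's standard `http.cookies.SimpleCookie` can split such a header. `parse_cookie_header` is an illustrative wrapper. Be aware that SimpleCookie silently skips input it cannot parse:

```python
from http.cookies import SimpleCookie

def parse_cookie_header(header: str) -> dict[str, str]:
    """Turn a Cookie request header value into a name -> value dict."""
    jar = SimpleCookie()
    jar.load(header)  # parses the semicolon-separated name=value pairs
    return {name: morsel.value for name, morsel in jar.items()}
```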
If-Match
- Used with a method to make it conditional
- This header requests the server to perform the requested method only if one of the entity tags given in this field matches the current entity tag (ETag) of the resource
If-Match : entity-tag
- An asterisk (*) matches any entity, and the transaction continues only if the entity exists
If-Match: "xyzzy"
If-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-Match: *
- If none of the entity tags match, or if "*" is given and no current entity exists, the server must not perform the requested method, and must return a 412 (Precondition Failed) response.
If-Modified-Since
- Used with a method to make it conditional
- If the requested URL has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a 304 (not modified) response will be returned without any message-body.
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
- If the request would normally result in anything other than a 200 (OK) status, or if the If-Modified-Since date is invalid (a date later than the server's current time is invalid), the response is the same as for a normal GET.
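A Python sketch of this decision for a GET. `status_for_if_modified_since` is hypothetical. It assumes `last_modified` and `now` are timezone-aware UTC datetimes with whole-second precision, since HTTP dates carry no fractions. It parses the header with the standard `email.utils.parsedate_to_datetime`:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def status_for_if_modified_since(last_modified: datetime, header: Optional[str], now: datetime) -> int:
    """Return 304 (Not Modified) or 200 (OK) for a GET carrying If-Modified-Since."""
    if header is None:
        return 200
    try:
        since = parsedate_to_datetime(header)
    except (TypeError, ValueError):   # older Pythons raise TypeError on bad dates
        return 200                    # invalid date: answer as a normal GET
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are always GMT
    if since > now:
        return 200                    # a date later than the server's clock is invalid
    return 304 if last_modified <= since else 200
```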
If-None-Match
- Used with a method to make it conditional
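- This header requests the server to perform the requested method only if none of the entity tags given in this field matches the current entity tag (ETag) of the resource; if one does match, the server returns 304 (Not Modified) for GET and HEAD requests, and 412 (Precondition Failed) for other methods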
If-None-Match : entity-tag
- An asterisk (*) matches any entity, and the transaction continues only if the entity does not exist
If-None-Match: "xyzzy"
If-None-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-None-Match: *
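The two entity-tag preconditions can be sketched in Python as follows. The function names are hypothetical. If-Match uses strong comparison: a weak tag (prefixed W/) never matches. If-None-Match uses weak comparison, which ignores the W/ prefix. The tag splitting assumes no commas inside quoted tags:

```python
from typing import Optional

def _entity_tags(header: str) -> list[str]:
    # Simplification: assumes no commas inside quoted entity tags.
    return [tag.strip() for tag in header.split(",") if tag.strip()]

def _opaque(tag: str) -> str:
    return tag[2:] if tag.startswith("W/") else tag   # drop the weak indicator

def if_match_allows(header: str, current_etag: Optional[str]) -> bool:
    """If-Match: proceed only if a listed tag strongly matches the current ETag."""
    if header.strip() == "*":
        return current_etag is not None               # "*" matches any existing entity
    if current_etag is None or current_etag.startswith("W/"):
        return False                                  # weak tags never match strongly
    return current_etag in _entity_tags(header)

def if_none_match_allows(header: str, current_etag: Optional[str]) -> bool:
    """If-None-Match: proceed only if no listed tag matches (weak comparison)."""
    if header.strip() == "*":
        return current_etag is None                   # "*" proceeds only if no entity exists
    if current_etag is None:
        return True
    return all(_opaque(tag) != _opaque(current_etag) for tag in _entity_tags(header))
```

A server would answer 412 when `if_match_allows` returns False. When `if_none_match_allows` returns False, it would answer 304 for GET/HEAD and 412 otherwise.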
If-Range
- Used with a conditional GET to request only the portion of the entity that is missing, if it has not been changed, and the entire entity if it has been changed
- Either an entity tag or a date can be used to identify the partial entity already received
If-Range: Sat, 29 Oct 1994 19:43:31 GMT
- Here if the document has not been modified since the given date, the server returns the byte range given by the Range header, otherwise it returns all of the new document.
If-Unmodified-Since
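- If the requested resource has not been modified since the time specified in this field, the server should perform the requested operation as if the If-Unmodified-Since header were not present; if it has been modified, the server must not perform the operation and must return 412 (Precondition Failed).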
If-Unmodified-Since: Sat, 29 Oct 1994 19:43:31 GMT
- If the request results in anything other than a 2xx or 412 status, the If-Unmodified-Since header should be ignored.
Referer
- Allows the client to specify the address (URI) of the resource from which the Request-URI was obtained.
Referer : absoluteURI | relativeURI
Referer: http://www.tutorialspoint.org/http/index.htm
- If the field value is a relative URI, it should be interpreted relative to the Request-URI.
TE
- Indicates which extension transfer-codings the client is willing to accept in the response, and whether or not it is willing to accept trailer fields in a chunked transfer-coding.
TE : t-codings
- The presence of the keyword "trailers" indicates that the client is willing to accept trailer fields in a chunked transfer-coding. Valid examples:
TE: deflate
TE:
TE: trailers, deflate;q=0.5
- If the TE field-value is empty or if no TE field is present, the only acceptable transfer-coding is chunked. A message with no transfer-coding is always acceptable.
Server Response Headers
Age
- The Age response-header field conveys the sender's estimate of the amount of time since the response (or its revalidation) was generated at the origin server; representing time in seconds
Age: 1030
- An HTTP/1.1 server that includes a cache must include an Age header field in every response generated from its own cache.
ETag
- Provides the current value of the entity tag for the requested variant
ETag : entity-tag
ETag: "xyzzy"
ETag: W/"xyzzy"
ETag: ""
Proxy-Authenticate
- The Proxy-Authenticate response-header field must be included as a part of a 407 (Proxy Authentication Required) response
Proxy-Authenticate : challenge
Set-Cookie
- Contains a name/value pair of information to retain for this URL.
Set-Cookie: NAME=VALUE; OPTIONS
- Under RFC 6265, each Set-Cookie response header carries the token Set-Cookie followed by a single cookie (NAME=VALUE) and its options; to set several cookies, the server sends several Set-Cookie headers.
- Comment=comment: Can be used to specify any comment associated with the cookie.
- Domain=domain: Specifies the domain for which the cookie is valid.
- Expires=Date-time: The date the cookie will expire. If it is omitted, the cookie expires when the visitor quits the browser.
- Path=path: Specifies the subset of URLs to which this cookie applies.
- Secure: Instructs the user agent to return the cookie only under a secure connection.
- Example of simple cookie headers generated by the server:
Set-Cookie: name1=value1; Expires=Wed, 09 Jun 2021 10:18:14 GMT
Set-Cookie: name2=value2; Expires=Wed, 09 Jun 2021 10:18:14 GMT
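Python's standard `http.cookies.SimpleCookie` emits one Set-Cookie line per cookie, which matches the one-cookie-per-header form. `set_cookie_headers` is an illustrative wrapper, and sharing a single Expires and Path across all cookies is a simplifying assumption:

```python
from http.cookies import SimpleCookie

def set_cookie_headers(cookies: dict[str, str], expires: str, path: str = "/") -> list[str]:
    """Emit one Set-Cookie header line per cookie, sharing Expires and Path."""
    jar = SimpleCookie()
    for name, value in cookies.items():
        jar[name] = value
        jar[name]["expires"] = expires   # passed through as an HTTP date string
        jar[name]["path"] = path
    return jar.output().split("\r\n")    # output() joins the header lines with CRLF
```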
Entity Headers
Allow
- Lists the set of methods supported by the resource identified by the Request-URI
Allow: GET, HEAD, PUT
- This field cannot prevent a client from trying other methods
Content-Type
- Indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent, had the request been a GET.
Content-Type: text/html; charset=ISO-8859-4
Expires
- Gives the date/time after which the response is considered stale
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Last-Modified
- Indicates the date and time at which the origin server believes the variant was last modified
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
Caching
- HTTP is typically used for distributed information systems, where performance can be improved by the use of response caches.
- The HTTP/1.1 protocol includes a number of elements intended to make caching work.
- The goal of caching is to eliminate the need to send requests in many cases, and to eliminate the need to send full responses in many other cases.
- The basic cache mechanisms in HTTP/1.1 are the expiration times and validators specified by the server, which act as implicit directives to caches.
- The Cache-Control header adds explicit directives for this purpose.
- The Cache-Control header allows a client or server to transmit a variety of directives in either requests or responses.
- These directives typically override the default caching algorithms.
- The caching directives are specified in a comma-separated list.
Cache-control: no-cache
Request Directives
- no-cache: A cache must not use the response to satisfy a subsequent request without successful revalidation with the origin server.
- no-store: The cache must not store anything about the client request or server response.
- max-age = seconds: Indicates that the client is willing to accept a response whose age is not greater than the specified time in seconds.
- max-stale [ = seconds ]: Indicates that the client is willing to accept a response that has exceeded its expiration time. If seconds are given, it must not be expired by more than that time.
- min-fresh = seconds: Indicates that the client is willing to accept a response whose freshness lifetime is not less than its current age plus the specified time in seconds.
- no-transform: An intermediate cache or proxy must not transform (e.g., recompress or convert) the entity-body.
- only-if-cached: Does not retrieve new data. The cache can send a document only if it is in the cache, and should not contact the origin-server to see if a newer copy exists.
Response Directives
- public: Indicates that the response may be cached by any cache.
- private: Indicates that all or part of the response message is intended for a single user and must not be cached by a shared cache.
- no-cache: A cache must not use the response to satisfy a subsequent request without successful re-validation with the origin server.
- no-store: The cache must not store anything about the client request or server response.
- no-transform: An intermediate cache or proxy must not transform (e.g., recompress or convert) the entity-body.
- must-revalidate: The cache must revalidate stale responses with the origin server before using them; expired responses must not be served without successful revalidation.
- proxy-revalidate: The proxy-revalidate directive has the same meaning as the must-revalidate directive, except that it does not apply to non-shared user agent caches.
- max-age = seconds: Indicates that the response is considered fresh for at most the specified number of seconds after it was generated.
- s-maxage = seconds: The maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header. The s-maxage directive is always ignored by a private cache.
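A simplified Python sketch of parsing Cache-Control and deciding whether a stored response is still fresh. The function names are hypothetical. The sketch ignores Expires, heuristic freshness, and quoted values that contain commas. It treats a response as fresh while its age is below max-age, or below s-maxage in a shared cache:

```python
from typing import Optional

def parse_cache_control(header: str) -> dict[str, Optional[str]]:
    """Parse 'public, max-age=60' into {'public': None, 'max-age': '60'}."""
    directives = {}
    for part in header.split(","):           # directives are comma-separated
        name, sep, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives

def is_fresh(cache_control: str, age_seconds: int, shared_cache: bool) -> bool:
    """Can a stored response be reused without revalidation? (simplified)"""
    d = parse_cache_control(cache_control)
    if "no-cache" in d or "no-store" in d:
        return False                          # must revalidate, or must not be stored
    if shared_cache and "private" in d:
        return False                          # private responses are not for shared caches
    if shared_cache and d.get("s-maxage"):
        return age_seconds < int(d["s-maxage"])  # s-maxage overrides max-age here
    if d.get("max-age"):
        return age_seconds < int(d["max-age"])
    return False                              # Expires and heuristics are ignored
```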
URL Encoding
- HTTP URLs can only be sent over the Internet using the ASCII character-set
- URLs often contain characters outside the ASCII set
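- These unsafe characters must be percent-encoded: each byte is replaced with a % followed by two hexadecimal digits (characters outside ASCII are first converted to bytes, typically using UTF-8).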
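Python's standard `urllib.parse.quote` and `unquote` perform this encoding. The sketch below assumes UTF-8 for non-ASCII characters and keeps "/" unescaped so that path separators survive. The wrapper names are illustrative:

```python
from urllib.parse import quote, unquote

def encode_path(path: str) -> str:
    """Percent-encode a URL path; non-ASCII characters become UTF-8 byte escapes."""
    return quote(path, safe="/")

def decode_path(encoded: str) -> str:
    """Reverse the encoding, interpreting the escaped bytes as UTF-8."""
    return unquote(encoded, encoding="utf-8")
```

For example, "é" is the UTF-8 byte pair C3 A9, so it becomes `%C3%A9`, and a space becomes `%20`.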
Security
DNS Spoofing
- Clients using HTTP rely heavily on the Domain Name Service, and are thus generally prone to security attacks based on the deliberate mis-association of IP addresses and DNS names.
- So clients need to be cautious in assuming the continuing validity of an IP number/DNS name association.
- If HTTP clients cache the results of host name lookups in order to achieve a performance improvement, they must observe the TTL information reported by the DNS.
- If HTTP clients do not observe this rule, they could be spoofed when a previously-accessed server's IP address changes.
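A Python sketch of a host-lookup cache that respects DNS TTLs. Python's `socket.getaddrinfo` does not expose TTLs, so the sketch assumes a hypothetical resolver callable that returns `(ip_address, ttl_seconds)`. The clock is injectable so that expiry can be tested:

```python
import time
from typing import Callable

# Hypothetical resolver signature: host -> (ip_address, ttl_seconds).
Resolver = Callable[[str], tuple[str, float]]

class TTLDnsCache:
    """Cache host lookups, but only for as long as the DNS TTL allows."""

    def __init__(self, resolver: Resolver, clock: Callable[[], float] = time.monotonic):
        self._resolver = resolver
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # host -> (ip, expiry time)

    def lookup(self, host: str) -> str:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]                 # still within the TTL
        ip, ttl = self._resolver(host)       # missing or expired: ask DNS again
        self._entries[host] = (ip, now + ttl)
        return ip
```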
Proxies and Caching
- HTTP proxies are men-in-the-middle, and represent an opportunity for man-in-the-middle attacks.
- Proxies have access to security-related information, personal information about individual users and organizations, and proprietary information belonging to users and content providers.
- Proxy operators should protect the systems on which proxies run, as they would protect any system that contains or transports sensitive information.
- Caching proxies provide additional potential vulnerabilities, since the contents of the cache represent an attractive target for malicious exploitation.
- Therefore, cache contents should be protected as sensitive information.
HTTP/1.0 vs HTTP/1.1
- HTTP/1.0
  - HTTP/1.0 uses a new connection for each request/response exchange
  - HTTP/0.9 and HTTP/1.0 close the connection after every request
  - HTTP/1.0 supports the GET, POST, and HEAD request methods
- HTTP/1.1
  - An HTTP/1.1 connection may be used for one or more request/response exchanges
  - HTTP/1.1 uses persistent connections, which saves bandwidth and reduces latency because a new TCP handshake is not required for every file download (images, CSS, etc.)
  - HTTP pipelining is a feature of HTTP/1.1 in which the client sends multiple requests without waiting for each response
  - HTTP/1.1 also supports the OPTIONS, PUT, DELETE, TRACE, and CONNECT request methods
Pipelining
- HTTP pipelining is a technique in which multiple HTTP requests are sent on a single TCP connection without waiting for the corresponding responses.
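- Pipelining can noticeably reduce the loading time of HTML pages, especially over high-latency connections, because the requests' round trips overlap. However, responses must be returned in the order the requests were sent, so one slow response delays all the responses behind it (head-of-line blocking).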
Persistent Connections
- HTTP persistent connection is the idea of using a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair.
- Under HTTP 1.0, if the client supports keep-alive, it adds an additional header to the request:
Connection: keep-alive
- Following this, the connection is not dropped, but is instead kept open.
- When the client sends another request, it uses the same connection.
- This will continue until either the client or the server decides that the conversation is over, and one of them drops the connection.
- In HTTP 1.1, all connections are considered persistent unless declared otherwise.
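The following self-contained Python sketch uses the standard `http.server` and `http.client` on localhost to show one persistent HTTP/1.1 connection carrying two requests. The handler class and function names are illustrative. The handler echoes the client's source port. If both responses report the same port, both requests travelled over the same TCP connection:

```python
import http.client
import http.server
import threading

class PortEchoHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # keep connections open between requests

    def do_GET(self):
        body = str(self.client_address[1]).encode()  # client's source port
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # lets the client find the body's end
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def fetch_twice_on_one_connection() -> list[str]:
    server = http.server.HTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        ports = []
        for _ in range(2):
            conn.request("GET", "/")
            response = conn.getresponse()
            ports.append(response.read().decode())  # read fully before reusing the connection
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()
```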
- References