HTTP cookie
An HTTP cookie, or a Web cookie, is a parcel of text sent by a server to a web browser and then sent back unchanged by the browser each time it accesses that server.
HTTP cookies are used for
authenticating, tracking, and maintaining specific information about users, such as site preferences and the contents of their
electronic shopping carts. The term "cookie" is derived from "
magic cookie," a well-known concept in computing which inspired both the idea and the name of HTTP cookies.
Cookies have been of concern for
Internet privacy, since they can be used for tracking browsing behavior. As a result, they have been subject to legislation in various countries such as the
United States and in the
European Union. Cookies have also been criticised because the identification of users they provide is not always accurate and because they could potentially be used for network attacks. Some alternatives to cookies exist, but each has its own drawbacks.
Cookies are also subject to a number of misconceptions, mostly based on the erroneous notion that they are
computer programs. In fact, cookies are simple pieces of data unable to perform any operation by themselves. In particular, they are neither
spyware nor
viruses, despite the detection of cookies from certain sites by many anti-spyware products.
Most modern browsers allow users to decide whether to accept cookies, but rejection makes some
websites unusable. For example, shopping baskets implemented using cookies do not work if cookies are rejected.
Cookies are used by Web servers to differentiate users and to operate in a way that depends on the user. Cookies were invented for realising a virtual
shopping basket: this is a virtual device in which the user can "place" items to purchase, so that users can navigate a site where items are shown, adding or removing items from the shopping basket at any time. Cookies allow for the content of the shopping cart to depend on the user's actions.
Allowing users to log in to a website is another use of cookies. Users typically log in by inserting their credentials into a login page; cookies allow the server to know that the user is already authenticated, and therefore is allowed to access services or perform operations that are restricted to logged-in users.
Several websites also use cookies for
personalization based on users' preferences. Sites that require authentication often use this feature, although it is also present on sites not requiring authentication. Personalization includes presentation and functionality. For example, the
Wikipedia Web site allows authenticated users to choose the webpage
skin they like best; the
Google search engine allows users (even non-registered ones) to decide how many search results per page they want to see.
Cookies are also used to track users across a website. Third-party cookies and
Web bugs, explained below, also allow for tracking across multiple sites. Tracking within a site is typically done with the aim of producing usage statistics, while tracking across sites is typically used by advertising companies to produce anonymous user profiles, which are then used to target advertising (deciding which advertising image to show) based on the user profile.
Figure: A possible interaction between a Web browser and a server holding a Web page, in which the server sends a cookie to the browser and the browser sends it back when requesting another page.
Technically, cookies are arbitrary pieces of data chosen by the
Web server and sent to the browser. The browser returns them unchanged to the server, introducing a
state (memory of previous events) into otherwise stateless HTTP transactions. Without cookies, each retrieval of a
Web page or component of a Web page is an isolated event, mostly unrelated to all other views of the pages of the same site. By returning a cookie to a web server, the browser provides the server a means of connecting the current page view with prior page views. Other than being set by a web server, cookies can also be set by a
script in a language such as
JavaScript, if supported and enabled by the Web browser.
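The exchange described above can be illustrated with a minimal sketch using Python's standard http.cookies module. The cookie name "basket", the value "item42", and the two helper functions are hypothetical and chosen only for illustration; a real server would emit and read these headers as part of full HTTP responses and requests.

```python
from http.cookies import SimpleCookie


def build_set_cookie_header(name: str, value: str) -> str:
    """Server side: format the header line that hands a cookie to the browser."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()  # a single "Set-Cookie: ..." header line


def parse_cookie_header(header_value: str) -> dict:
    """Server side: read the cookies a browser returned in its Cookie header."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


# First response: the server chooses the data and sends it to the browser.
set_cookie = build_set_cookie_header("basket", "item42")

# Later request: the browser returns the same data unchanged, which lets the
# server connect this page view with the earlier one.
returned = parse_cookie_header("basket=item42")
```

The server never relies on the browser interpreting the value; it only expects the same text to come back.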
Cookie specifications [Persistent client state - HTTP cookies - Preliminary specification (Netscape)][RFC 2109 and RFC 2965 - HTTP State Management Mechanism (IETF)] suggest that browsers should support a minimum number of cookies and a minimum amount of memory for storing them. In particular, a browser is expected to be able to store at least 300 cookies of 4 kilobytes each, and at least 20 cookies per server or domain.
The cookie setter can specify a deletion date, in which case the cookie will be removed on that date. A shopping site might want to help potential customers by remembering the items in their shopping basket, even if they quit their browser without making a purchase and return later, so that they do not have to find the products again. In this case, the site sets a deletion date some time in the future, so that the shopping cart contents survive until then. If the cookie setter does not specify a date, the cookie is removed once the user quits the browser. As a result, specifying a date is a way of making a cookie survive across sessions. For this reason, cookies with an expiration date are called persistent.
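As a rough sketch of the difference between the two kinds of cookie, the following Python fragment builds one cookie without a deletion date and one with it, using the standard http.cookies module. The helper name make_cookie, the cookie name "basket", and the seven-day lifetime are assumptions made for the example.

```python
from http.cookies import SimpleCookie

SEVEN_DAYS = 7 * 24 * 60 * 60  # hypothetical lifetime, in seconds


def make_cookie(name, value, lifetime_seconds=None):
    """Return the text of a cookie, optionally with a deletion date."""
    cookie = SimpleCookie()
    cookie[name] = value
    if lifetime_seconds is not None:
        # An integer is interpreted by http.cookies as an offset from now
        # and rendered as an absolute date in the "expires" attribute.
        cookie[name]["expires"] = lifetime_seconds
    return cookie.output(header="").strip()


# No date: the browser discards the cookie when the user quits it.
session_cookie = make_cookie("basket", "item42")

# With a date: the cookie survives across browser sessions until that date.
persistent_cookie = make_cookie("basket", "item42", SEVEN_DAYS)
```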
Since their introduction on the Internet, misconceptions about cookies have circulated on the Internet and in the media.
[Contrary to popular belief, cookies are good for you! (on the Internet)][Keith C. Ivey Untangling the Web Cookies: Just a Little Data Snack. 1998] In 2005,
Jupiter Research published the results of a survey,
[Brian Quinton. Study: Users Don't Understand, Can't Delete Cookies. Direct. May 18 2005] according to which a significant percentage of respondents believed some of the following claims:
* Cookies are like worms and viruses in that they can erase data from the user's hard disks;
* Cookies are a form of spyware in that they can read personal information stored on the user's computer;
* Cookies generate popups;
* Cookies are used for spamming;
* Cookies are only used for advertising.
Cookies are in fact only data, not code: they cannot erase or read information from the user's computer.
[Adam Penenberg. Cookie Monsters. Slate, November 7 2005] However, cookies allow for detecting the Web pages viewed by a user on a given site or set of sites. This information can be collected in a
profile of the user. Such profiles are often anonymous, that is, they do not contain personal information of the user (name, address, etc.). More precisely, they cannot contain personal information unless the user has made it available to some sites. Even if anonymous, these profiles have been the subject of some privacy concerns.
According to the same survey, a large percentage of Internet users do not know how to delete cookies.
Most modern browsers support cookies. However, a user can usually also choose whether cookies should be used or not. The following are common options:
[The unofficial cookie faq] (1) cookies are never accepted, (2) the browser asks the user whether to accept every individual cookie, or (3) cookies are always accepted.
Figure: The Firefox Cookie Manager, showing the details of various cookies by domain.
The browser may also include the possibility of better specifying which cookies have to be accepted or not. In particular, the user can typically choose one or more of the following options: reject cookies from specific domains; disallow third-party cookies (see below); accept cookies as non-persistent (expiring when the browser is closed); and allow a server to set cookies for a different domain. Additionally, browsers may also allow users to view and delete individual cookies.
Most browsers supporting JavaScript allow the user to see the cookies that are active with respect to a given page by typing
javascript:alert("Cookies: "+document.cookie) in the browser
URL field. Some browsers incorporate a cookie manager for the user to see and selectively delete the cookies currently stored in the browser.
The
P3P specification includes the possibility for a server to state a privacy policy, which specifies which kind of information it collects and for which purpose. These policies include (but are not limited to) the use of information gathered using cookies. According to the P3P specification, a browser can accept or reject cookies by comparing the privacy policy with the stored user preferences or ask the user, presenting them the privacy policy as declared by the server.
Cookies have some important implications on the
privacy and
anonymity of Web users. While cookies are only sent to the server setting them or one in the same
Internet domain, a Web page may contain images or other components stored on servers in other domains. Cookies that are set during retrieval of these components are called
third-party cookies.
Figure: In this fictional example, an advertising company has placed banners in two Web sites (which do not show any banner in reality). Hosting the banner images on its servers and using third-party cookies, the advertising company is able to track the browsing of users across these two sites.
Advertising companies use third-party cookies to track a user across multiple sites. In particular, an advertising company can track a user across all pages where it has placed advertising images or
Web bugs. Knowledge of the pages visited by a user allows the advertisement company to target advertisement to the user's presumed preferences.
The possibility of building a profile of users has been considered by some a potential privacy threat, even when the tracking is done on a single domain but especially when tracking is done across multiple domains using third-party cookies. For this reason, some countries have legislation about cookies.
The United States government set strict rules on setting cookies in 2000, after it was disclosed that the White House drug policy office had used cookies to track computer users viewing its online anti-drug advertising to see if they then visited sites about drug making and drug use. In 2002, privacy activist
Daniel Brandt found that the
CIA had been leaving persistent cookies on computers for ten years. When notified it was violating policy, the CIA stated that these cookies were not intentionally set and stopped setting them.
[CBS News. CIA Caught Sneaking Cookies. March 20 2002.] On
December 25 2005, Brandt discovered that the
National Security Agency had been leaving two persistent cookies on visitors' computers due to a software upgrade. After being informed, the National Security Agency immediately disabled the cookies.
[The Associated Press. Spy Agency Removes Illegal Tracking Files. December 29 2005]
The
2002 European Union telecommunication privacy Directive contains rules about the use of cookies. In particular, Article 5, Paragraph 3 of this directive mandates that storing data (like cookies) in a user's computer can only be done if: 1) the user is provided information about how this data is used; and 2) the user is given the possibility of denying this storing operation. However, this article also states that storing data that is necessary for technical reasons is exempted from this rule. This directive was expected to have been applied since October 2003, but a
December 2004 report says (page 38) that this provision was not applied in practice, and that some member countries (
Slovakia,
Latvia,
Greece,
Belgium, and
Luxembourg) did not even transpose it. The same report suggests a thorough analysis of the situation in the Member States.
Besides privacy concerns, there are some other reasons why cookies have been opposed: they do not always accurately identify users, and they can be used for security attacks.
Inaccurate identification
If more than one browser is used on a computer, each has a separate storage area for cookies. Hence cookies do not identify a person, but a combination of a user account, a computer, and a Web browser. Thus, anyone who uses multiple accounts, computers, or browsers has multiple sets of cookies.
Likewise, cookies do not differentiate between multiple users who share a computer and browser, if they do not use different
user accounts.
Cookie theft
During normal operation, cookies are sent back and forth between a server (or a group of servers in the same domain) and the computer of the browsing user. Since cookies may contain sensitive information (user name, a token used for authentication, etc.), their values should not be accessible to other computers. However, cookies sent on ordinary HTTP sessions are visible to all users who can listen in on the network using a
packet sniffer. These cookies should therefore not contain sensitive data. This problem can usually be overcome by using the
https URI scheme, which invokes
Transport Layer Security to encrypt the connection.
Figure: Cookie theft: a cookie that should be only exchanged between a server and a client is sent to another party.
Cross-site scripting allows the value of cookies to be sent to servers that are normally not sent these values. Modern browsers allow execution of pieces of code retrieved from the server. If cookies are accessible during execution, their value may be communicated in some form to servers that should not access them. The process allowing an unauthorised party to receive a cookie is called
cookie theft, and encryption does not help against this attack.
["Can you show me what XSS cookie theft looks like?" (excerpt from the Cgisecurity Cross-Site Scripting FAQ)]
This possibility is typically exploited by attackers on sites that allow users to post
HTML content. By embedding a suitable piece of code in an HTML post, an attacker may receive cookies of other users. Knowledge of these cookies can then be exploited by connecting to the same site using the stolen cookies, thus being recognised as the user whose cookies have been stolen.
Figure: Cookie poisoning: an attacker sends a server an invalid cookie, possibly by modifying a valid cookie sent to it by the server.
Cookie poisoning
While cookies are supposed to be stored and sent back to the server unchanged, an attacker may modify the value of cookies before sending them back to the server. If, for example, a cookie contains the total value a user has to pay for the items in their shopping basket, changing this value exposes the server to the risk of charging the attacker less than the actual price. The process of tampering with the value of cookies is called
cookie poisoning, and is sometimes used after cookie theft to make an attack persistent.
Figure: In cross-site cooking, the attacker exploits a browser bug to send an invalid cookie to a server.
Most websites, however, only store a session identifier — a randomly generated unique number used to identify the user's session — in the cookie itself, while all the other information is stored on the server. In this case, the problem of cookie poisoning is largely eliminated.
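A minimal sketch of this arrangement, in Python, might look as follows. The in-memory SESSIONS dictionary and the functions create_session and add_item are hypothetical; the point is only that the cookie carries a random identifier, while the basket and its total stay on the server, where the user cannot edit them.

```python
import secrets

# Hypothetical in-memory session store; a real site would use a database.
SESSIONS = {}


def create_session() -> str:
    """Create server-side state and return the identifier to put in the cookie."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess identifier
    SESSIONS[session_id] = {"basket": [], "total": 0}
    return session_id


def add_item(session_id: str, item: str, price: int) -> int:
    """Update the basket held on the server for the session named by the cookie."""
    session = SESSIONS.get(session_id)
    if session is None:
        # A forged or altered identifier matches no session and is rejected.
        raise KeyError("unknown session")
    session["basket"].append(item)
    session["total"] += price
    return session["total"]


sid = create_session()          # value the server would send as the cookie
total = add_item(sid, "book", 10)
```

Altering the cookie value in this scheme can at most produce an identifier the server does not recognise; it cannot change the price, because the price never leaves the server.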
Cross-site cooking
Each site is supposed to have its own cookies, so a site like
evil.net should not be able to alter or set cookies for another site, like
good.net.
Cross-site cooking vulnerabilities in web browsers allow malicious sites to break this rule. This is similar to cookie poisoning, but the attacker exploits non-malicious users with vulnerable browsers, instead of attacking the actual site directly. The goal of such attacks may be to perform
session fixation.
Some of the operations that can be realised using cookies can also be realised using other mechanisms. However, these alternatives to cookies have their own drawbacks, which make cookies usually preferred to them in practice. Most of the following alternatives allow for user tracking, even if not as reliably as cookies. As a result, privacy is an issue even if cookies are rejected by the browser or not set by the server.
IP address
An unreliable technique for tracking users is based on storing the
IP addresses of the computers requesting the pages. This technique has been available since the introduction of the World Wide Web, as downloading pages requires the server holding them to know the IP address of the computer running the browser or the
proxy, if any is used. This information is available for the server to be stored regardless of whether cookies are used.
However, these addresses are typically less reliable in identifying a user than cookies because computers and proxies may be shared by several users, and the same computer may be assigned different Internet addresses in different work sessions (this is often the case for
dial-up connections). The reliability of this technique can be improved by using another feature of the HTTP protocol: when a browser requests a page because the user has followed a link, the request that is sent to the server contains the URL of the page where the link is located. If the server stores these URLs, the path of pages viewed by the user can be tracked more precisely. However, these traces are less reliable than the ones provided by cookies, as several users may access the same page from the same computer,
NAT router, or proxy and then follow two different links. Moreover, this technique only allows tracking and cannot replace cookies in their other uses.
Tracking by IP address can be impossible with some systems that are used to retain
Internet anonymity, such as
Tor. With such systems, not only could one browser carry multiple addresses throughout a session, but multiple users could appear to be coming from the same IP address, thus making IP address use for tracking wholly unreliable.
URL (query string)
A more precise technique is based on embedding information into URLs. The
query string part of the
URL is the one that is typically used for this purpose, but other parts can be used as well. The
PHP session mechanism uses this method if cookies are not enabled.
This method consists of the Web server appending query strings to the links of a Web page it holds when sending it to a browser. When the user follows a link, the browser returns the attached query string to the server.
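The link-rewriting step can be sketched in Python with the standard urllib.parse module. The parameter name "sid", the example URL, and the helper names are assumptions made for illustration and are not taken from any particular server implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_session(url: str, session_id: str, param: str = "sid") -> str:
    """Server side: attach the session identifier to a link before sending the page."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_session(url: str, param: str = "sid"):
    """Server side: recover the identifier from the URL the browser requested."""
    return dict(parse_qsl(urlsplit(url).query)).get(param)


link = append_session("http://example.com/cart?page=2", "abc123")
recovered = read_session(link)
```

Because the identifier travels inside the URL, anyone who is given the rewritten link also receives the identifier.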
Query strings used in this way and cookies are very similar, both being arbitrary pieces of information chosen by the server and sent back by the browser. However, there are some differences: since a query string is part of a URL, if that URL is later reused, the same attached piece of information is sent to the server. For example, if the preferences of a user are encoded in the query string of a URL and the user sends this URL to another user by
e-mail, those preferences will be used for that other user as well.
Moreover, even if the same user accesses the same page twice, there is no guarantee that the same query string is used in both views. For example, if the same user arrives at the same page from a page internal to the site the first time and from an external search engine the second time, the respective query strings are typically different, while the cookies would be the same. For more details, see query string.
Other drawbacks of query strings are related to security: storing data that identifies a session in a query string enables or simplifies
session fixation attacks,
referer logging attacks and other
security exploits. Transferring session identifiers as HTTP cookies is more secure.
Another drawback of query strings has to do with the way the Web was designed: URLs should point to resources and be "opaque" (see Representational State Transfer). A URL that includes such a query string is not the actual location of the resource.
HTTP authentication
As for authentication, the HTTP protocol includes mechanisms, such as the
digest access authentication, that allow access to a Web page only when the user has provided the correct username and password. Once these credentials are given, the browser stores them and uses them for accessing subsequent pages, without requiring the user to provide them again. From the point of view of the user, the effect is the same as if cookies were used: username and password are only requested once, and from that point on the user is given access to the site. In the background, the username and password combination is sent to the server in every browser request. This means that someone listening in on this traffic can simply read this information and store it for later use. Session tokens, on the other hand, usually expire after not having been used for a while, and thus effectively become useless (i.e. they cannot be used to retrieve the session in which the user was logged in).
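To make the eavesdropping risk concrete, the sketch below assumes the simpler basic access authentication scheme rather than the digest scheme named above; in the basic scheme the credentials are only Base64-encoded, not encrypted. The username "alice", the password "secret", and the helper names are hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Browser side: the Authorization value repeated on every request."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def recover_credentials(header_value: str):
    """Eavesdropper side: Base64 is an encoding, so it is trivially reversed."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a basic authentication header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("alice", "secret")
stolen = recover_credentials(header)
```

Unlike an expiring session token, the recovered credentials remain valid until the user changes the password.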
Macromedia Flash Local Shared Objects
If a browser includes the
Macromedia Flash Player plugin, its Local Shared Objects function can be used in a way very similar to cookies. Local Shared Objects may be an attractive choice to web developers because a majority of
Windows users have Flash Player installed, the default size limit is 100 KB, and the security controls are distinct from the user controls for cookies, so Local Shared Objects may be enabled when cookies are not.
Client-Side Persistence
Some web browsers support a script-based persistence mechanism that allows the page to store information locally for later retrieval. Internet Explorer, for example, supports persisting information in the browser's history, in favorites, in an XML store, or directly within a Web page saved to disk.
[Introduction to Persistence, MSDN]
JavaScript's window.name
If JavaScript is enabled, the
window.name property of the object
window can be used to persistently store data. This property remains unaltered across the loading and unloading of other web pages. This
hack is little known, and has therefore not been considered a security risk.
[Set the window.name property from website A then check it in website B]
The term "HTTP cookie" derives from "
magic cookie", a packet of data a program receives but only uses for sending it again, possibly to its origin, unchanged. Magic cookies were already used in computing when
Lou Montulli had the idea of using them in Web communications in June 1994
[John Schwartz. Giving the Web a memory cost its users privacy. New York Times. September 4 2001]. At the time, he was an employee of
Netscape Communications, which was developing an
e-commerce application for a customer. Cookies provided a solution to the problem of reliably implementing a virtual shopping cart.
[Jay Kesan and Rajiv Shah. Shaping code. Chapter II.B (Netscape's cookies).][David Kristol. HTTP Cookies: Standards, privacy, and politics. ACM Transactions on Internet Technology, 1(2), 151 - 198, 2001.]
Together with John Giannandrea, Montulli wrote the initial Netscape cookie specification the same year. Version 0.9beta of Netscape, released in September 1994, supported cookies. The first actual use of cookies (out of the labs) was for checking whether visitors to the Netscape Web site had already visited the site. Montulli and Giannandrea applied for a patent for the cookie technology in 1995; it was granted in 1998. Support for cookies was integrated in Internet Explorer in version 2, released in October 1995.
[The history of Internet Explorer]
The introduction of cookies was not widely known to the public at the time. In particular, cookies were accepted by default, and users were not notified of the presence of cookies. Some people were aware of the existence of cookies as early as the first quarter of 1995,
[Roger Clarke. Cookies] but the general public learned about them after the
Financial Times published an article about them on
February 12 1996. In the same year, cookies received a lot of media attention, especially because of potential privacy implications. Cookies were discussed in two
U.S. Federal Trade Commission hearings in 1996 and 1997.
The development of the formal cookie specifications was already ongoing. In particular, the first discussions about a formal specification started in April 1995 on the
www-talk mailing list. A special working group within the
IETF was formed. Two alternative proposals for introducing state in HTTP transactions had been put forward by Brian Behlendorf and David Kristol, respectively, but the group, headed by Kristol himself, soon decided to use the Netscape specification as a starting point. In February 1996, the working group identified third-party cookies as a considerable privacy threat. The specification produced by the group was eventually published as RFC 2109 in February 1997. It specifies that third-party cookies are either not allowed at all, or at least not enabled by default.
At this time, advertising companies were already using third-party cookies. The recommendation about third-party cookies of
RFC 2109 was not followed by Netscape and Internet Explorer.
RFC 2109 was followed by RFC 2965 in October 2000.
Setting a cookie
Transfer of Web pages follows the HyperText Transfer Protocol (HTTP). Regardless of cookies, browsers request a page from web servers by sending them a short text called an HTTP request. For example, to access the page http://www.w3.org/index.html, browsers connect to the server www.w3.org, sending it a request that looks like the following one:
GET /index.html HTTP/1.1
browser → server