Text file
A text file (or plain text file) is a computer file that contains only ordinary textual characters with essentially no formatting. The term 'text file' is typically used in contrast with the term 'binary file', even though any file is fundamentally a sequence of arbitrary bits, and many computer components (for example, all hard disk circuitry and most system software) make no distinction between file types. However, many application programs can understand and use text files in some way, whereas typically only a few programs can understand and use the contents of any particular binary file. Hence the distinction can be useful to computer users.
Text files are files in which most bytes (or short sequences of bytes) represent ordinary readable characters such as letters, digits, and punctuation (including spaces), together with a few control characters such as tabs, line feeds and carriage returns. This simplicity allows a wide variety of programs to display their contents.
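The "most bytes are readable characters" description can be turned into a rough classifier. The following is a minimal sketch, not a standard algorithm: the helper name `looks_like_text`, the 0.95 threshold, and the choice to count only printable ASCII plus tab, line feed and carriage return are all assumptions made for illustration. Text in encodings that use bytes above 0x7E would need a more careful check.

```python
# Hypothetical helper: the threshold and the set of allowed control
# characters are illustrative assumptions, not part of any standard.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def looks_like_text(data: bytes, threshold: float = 0.95) -> bool:
    """Return True if most bytes are printable ASCII or common control characters."""
    if not data:
        return True  # an empty file is trivially text
    readable = sum(
        1 for b in data
        if 0x20 <= b <= 0x7E or b in ALLOWED_CONTROLS
    )
    return readable / len(data) >= threshold
```

For example, `looks_like_text(b"Hello, world!\r\n")` returns `True`, while a buffer containing every byte value from 0 to 255 once is rejected because fewer than half of its bytes are readable.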
The similar term plaintext is most commonly used in a cryptographic context and refers to unencrypted data. The similarity sometimes causes confusion, especially among those new to computers, cryptography, or data communications.
Generally, a text file contains characters in an ASCII-based encoding, or much less commonly an EBCDIC-based encoding, without any embedded information such as font information, hyperlinks or inline images. Text files are often encoded in an extension of ASCII, such as ISO 8859, EUC, a special encoding for Windows, or a special Mac-Roman encoding for Mac OS, or in a Unicode encoding scheme (common on many platforms) such as UTF-8 or UTF-16.
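To make the differences between these encodings concrete, the sketch below encodes the same string with several codecs from Python's standard library. The helper name `encode_all` and the choice of codecs are assumptions for illustration: `cp1252` stands in for the Windows encoding and `latin-1` for ISO 8859, and other members of those families would behave similarly.

```python
# Illustrative codec selection (an assumption, not an exhaustive list).
CODECS = ("ascii", "latin-1", "cp1252", "mac_roman", "utf-8", "utf-16-le")


def encode_all(text: str) -> dict:
    """Encode text with each codec, skipping codecs that cannot represent it."""
    results = {}
    for codec in CODECS:
        try:
            results[codec] = text.encode(codec)
        except UnicodeEncodeError:
            # e.g. plain ASCII cannot represent accented letters
            continue
    return results


for codec, raw in encode_all("café").items():
    print(f"{codec:10} {raw.hex(' ')}")
```

The ASCII letters "caf" produce the same bytes in every ASCII extension, while the accented letter is a single byte in `latin-1`, `cp1252` and `mac_roman` (with a different value in `mac_roman`) and two bytes in UTF-8. UTF-16 uses two bytes even for the ASCII letters.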
Although text files are often meant for humans to read, they are also commonly used for data storage by computer programs. Text files have some advantages even for data storage because they avoid certain problems with binary files, such as endianness, padding bytes, or differences in the number of bytes in a machine word. Further, when data corruption occurs in a file used for data storage, it is far easier for a human to fix if it is a text file. It may also be easier for the program to recover from the error, because text files are verbose while binary files are usually compact (text files are said to have a low entropy rate). Damaging a given number of bytes in a text file destroys little information; damaging the same number of bytes in a binary file destroys more.
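The endianness problem can be illustrated with a short sketch. It assumes a 32-bit unsigned integer field and uses Python's standard `struct` module; the function names are hypothetical and chosen only for this example.

```python
import struct


def binary_forms(value: int) -> tuple:
    """Return the little-endian and big-endian 32-bit encodings of value."""
    return struct.pack("<I", value), struct.pack(">I", value)


def text_form(value: int) -> bytes:
    """Return the decimal text encoding, which has no byte-order variants."""
    return str(value).encode("ascii")


little, big = binary_forms(1025)
# Reading little-endian bytes as if they were big-endian gives a wrong value.
misread = struct.unpack(">I", little)[0]
print(little.hex(" "), big.hex(" "), misread, text_form(1025))
```

Here the two binary forms contain the same four bytes in opposite order, and a reader that assumes the wrong order recovers 17039360 instead of 1025. The text form `b"1025"` is the same on every machine, at the cost of needing parsing and, for large values, more bytes.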
MIME
Text files usually have the MIME type "text/plain", usually with additional information indicating an encoding. Prior to the advent of Mac OS X, the Mac OS system regarded the content of a file (the data fork) as a text file when its resource fork indicated that the type of the file was "TEXT". Under the Windows operating system, a file is regarded as a text file if the suffix of its name (the "extension") is "txt". However, many other suffixes are used for text files with specific purposes. For example, source code for computer programs is usually kept in text files that have file name suffixes indicating the programming language in which the source is written.
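As a rough illustration of suffix-based typing combined with an encoding annotation, the sketch below uses Python's standard `mimetypes` module to map a file name to a MIME type and then appends a `charset` parameter for text types. The helper name `content_type_for`, the UTF-8 default, and the `application/octet-stream` fallback are assumptions for this example, not rules stated in the text above.

```python
import mimetypes


def content_type_for(filename: str, charset: str = "utf-8") -> str:
    """Guess a Content-Type value from the file name suffix alone."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        # Unknown suffix: fall back to a generic binary type (an assumption).
        return "application/octet-stream"
    if mime_type.startswith("text/"):
        # Text types carry the encoding as an extra parameter.
        return f"{mime_type}; charset={charset}"
    return mime_type
```

With these assumptions, `content_type_for("notes.txt")` yields `text/plain; charset=utf-8`. The guess is based purely on the name; the file's contents are never inspected.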
ASCII
The ASCII standard allows ASCII-only plain text files (unlike most other file types) to be freely interchanged and readable on Unix, Macintosh, Microsoft Windows, DOS, and other systems. These systems differ in their preferred line ending convention (see newline) and in their interpretation of values outside the ASCII range (their character encoding).
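Because line ending conventions differ between systems, programs that exchange text files often normalize them. The sketch below is one minimal way to do so, assuming the three common conventions: carriage return plus line feed (DOS and Windows), a lone carriage return (classic Macintosh), and a lone line feed (Unix). The function name is hypothetical.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF and lone CR line endings to a single LF."""
    # Replace the two-byte sequence first so it does not become two line feeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

For instance, `normalize_newlines(b"a\r\nb\rc\n")` returns `b"a\nb\nc\n"`. The order of the two replacements matters: handling the lone carriage return first would turn each CR LF pair into two line breaks.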
Other Formats
Plain text is often used as a readable representation of other data that is not itself purely textual: for example, a formatted webpage is not plain text, but its HTML source is. Similarly, source code for computer programs is usually stored in text files, but is compiled into a binary form for execution.
* Plain text
* Binary file
* Text File Types
* C2: the Power of Plain Text