Gene
This stylized schematic diagram shows a gene in relation to the double helix structure of DNA and to a chromosome (right). Introns are regions often found in eukaryote genes which are removed in the splicing process: only the exons encode the protein. This diagram labels a region of only 40 or so bases as a gene. In reality many genes are much larger, as are introns and exons.
Genes are the units of heredity in living
organisms. They are encoded in the organism's genetic material (usually
DNA or
RNA), and control the physical development and
behavior of the organism. During
reproduction, the genetic material is passed on from the parent(s) to the
offspring. Genetic material can also be passed between unrelated individuals (e.g. via
transfection, or on
viruses). Genes encode the information necessary to construct the
chemicals (
proteins etc.) needed for the organism to function.
The word "gene" was coined in 1909 by
Danish botanist Wilhelm Johannsen for the fundamental physical and functional unit of heredity. The word gene was derived from
Hugo De Vries' term pangen, itself a derivative of the word
pangenesis which
Darwin (1868) had coined. The word pangenesis is made from the
Greek words
pan (a prefix meaning "whole", "encompassing") and
genesis ("birth") or
genos ("origin").
The term "gene" is shared by many disciplines, including
classical genetics,
molecular genetics,
evolutionary biology and
population genetics. Because each discipline models the
biology of
life differently, the usage of the word gene varies between disciplines. It may refer to either material or conceptual entities.
Following the discovery that
DNA is the genetic material, the growth of
biotechnology, and the project to
sequence the human
genome, the common usage of the word "gene" has increasingly reflected its meaning in
molecular biology, namely the segments of DNA which
cells transcribe into
RNA and
translate, at least in part, into proteins. The
Sequence Ontology project defines a gene as: "A locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions".
In common speech, "gene" is often used to refer to the
hereditary cause of a
trait,
disease or condition—as in "the gene for
obesity." Speaking more precisely, a biologist might refer to an
allele or a
mutation that
has been implicated in or
is associated with obesity. This is because biologists know that many factors other than genes decide whether a person is obese or not: eating habits, exercise, prenatal environment, upbringing,
culture and the availability of
food, for example.
Moreover, it is very unlikely that variations within a single gene (or single genetic locus) fully determine one's genetic predisposition for obesity. These aspects of inheritance (the interplay between genes and environment, the influence of many genes) appear to be the norm with regard to many and perhaps most ("complex" or "multifactorial") traits. The term
phenotype refers to the characteristics that result from this interplay (see
genotype-phenotype distinction).
Properties of genes
In
molecular biology, a gene is considered to be the region of DNA (or RNA, in the case of some viruses) that determines the structure of a
protein (the coding sequence), together with the region of DNA that controls when and where the protein will be produced (the regulatory sequence). The
genetic code determines how the coding DNA sequence is converted into a protein sequence (
transcription and
translation). The genetic code is essentially the same for all known life, from
bacteria to
humans.
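As a minimal sketch of the transcription step, the snippet below pairs each base of a DNA template strand with its complementary RNA base. The function name `transcribe` and the example sequence are illustrative assumptions, and the sketch ignores promoters, terminators and strand orientation beyond what the comments state.

```python
# Complementary RNA base for each DNA base on the template strand.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the RNA complementary to a DNA template strand.

    The template is assumed to be written 3'->5', so the returned
    RNA reads 5'->3' (a simplification for illustration).
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_strand.upper())


mrna = transcribe("TACCGGATT")
print(mrna)  # AUGGCCUAA
```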
Through the proteins they encode, genes govern the
cells in which they reside. In multicellular organisms, they control the
development of the individual from the
fertilized egg and the day-to-day functions of the cells that make up
tissues and
organs. The instrumental roles of their protein products range from mechanical support of the cell structure to the transportation and manufacture of other molecules and to the regulation of other proteins' activities.
The genes that exist today are those that have reproduced successfully in the past. Often, many individual organisms share a gene; thus, the death of an individual need not mean the extinction of the gene. Indeed, if the sacrifice of one individual enhances the survivability of other individuals with the same gene, the death of an individual may enhance the overall survival of the gene. This is the basis of the selfish gene view, popularized by
Richard Dawkins. He points out in his book,
The Selfish Gene, that to be successful genes need have no other "purpose" than to
propagate themselves, even at the expense of their host organism's welfare. A human that behaved in such a way would be described as "selfish", although, ironically, a selfish gene may promote
altruistic behaviours. According to Dawkins, the possibly disappointing answer to the question "what is the meaning of life?" may be "the survival and perpetuation of ribonucleic acids and their associated proteins".
Types of genes
Due to rare, spontaneous errors (e.g. in
DNA replication),
mutations in the sequence of a gene may arise. Once propagated to the next generation, this mutation may lead to variations within a species' population. Variants of a single gene are known as
alleles, and differences in alleles may give rise to differences in traits, for example eye colour. A gene's most common allele is called the
wild type allele, and rare alleles are called
mutants.
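The idea of a wild type as the most common allele can be illustrated with a short sketch. The sample below is made-up data, and the allele labels `B` and `b` and both function names are hypothetical; the code simply counts how often each allele appears.

```python
from collections import Counter


def allele_frequencies(observed_alleles):
    """Return the relative frequency of each allele in a sample."""
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def wild_type(observed_alleles):
    """Return the most common allele, i.e. the wild type in the sense used above."""
    return Counter(observed_alleles).most_common(1)[0][0]


# Hypothetical sample of eight gene copies drawn from a population.
sample = ["B", "B", "b", "B", "B", "b", "B", "B"]
freqs = allele_frequencies(sample)
print(freqs)              # {'B': 0.75, 'b': 0.25}
print(wild_type(sample))  # B
```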
In most cases,
RNA is an intermediate product in the process of manufacturing proteins from genes. However, for some gene sequences, the RNA molecules are the actual functional products. For example, RNAs known as
ribozymes are capable of
enzymatic function, and
small interfering RNAs have a regulatory role. The DNA sequences from which such RNAs are transcribed are known as
non-coding RNA genes, or
RNA genes.
Most living organisms carry their genes and transmit them to offspring as DNA, but some
viruses carry only RNA. Because they use RNA, their cellular hosts may synthesize their proteins as soon as they are infected and without the delay in waiting for transcription. On the other hand, RNA
retroviruses, such as
HIV, require the
reverse transcription of their genome from RNA into DNA before their proteins can be synthesized.
Human gene nomenclature
For each known human gene the
HUGO Gene
Nomenclature Committee (
HGNC) approves a gene name and
symbol (short-form
abbreviation). All approved symbols are stored in the
HGNC Database. Each symbol is unique and each gene is given only one approved gene symbol. A unique symbol for each gene is necessary so that people can refer to it unambiguously. This also facilitates
electronic data retrieval from publications. Where possible, each symbol maintains a parallel
construction in different members of a
gene family and can be used in other
species, especially the
mouse.
Typical numbers of genes in an organism
| Organism | Genes | Base pairs |
|---|---|---|
| Plant | <50,000 | <10^11 |
| Human, mouse or rat | 25,000 | 3×10^9 |
| Fruit fly | 13,767 | 1.3×10^8 |
| Honey bee | 15,000 | 3×10^8 |
| Worm | 19,000 | 9.7×10^7 |
| Fungus | 6,000 | 1.3×10^7 |
| Bacterium | 500–6,000 | 5×10^5–10^7 |
| Mycoplasma genitalium | 500 | 580,000 |
| DNA virus | 10–900 | 5,000–800,000 |
| RNA virus | 1–25 | 1,000–23,000 |
| Viroid | 0–1 | ~500 |
The attached table gives typical numbers of genes and genome size for some organisms. Estimates of the number of genes in an organism are somewhat controversial because they depend on the discovery of genes, and no techniques currently exist to prove that a DNA sequence contains no gene. (In early genetics, genes could be identified only if there were mutations, or alleles.) Nonetheless, estimates are made based on current knowledge.
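One way to compare the rows of the table is gene density, here taken as genes per million base pairs. The sketch below uses only rows that give single values rather than ranges; the dictionary and function names are illustrative, and the results are only as reliable as the estimates in the table.

```python
# (genes, base pairs) taken from the single-valued rows of the table.
GENOMES = {
    "Human": (25_000, 3e9),
    "Fruit fly": (13_767, 1.3e8),
    "Worm": (19_000, 9.7e7),
    "Fungus": (6_000, 1.3e7),
    "Mycoplasma genitalium": (500, 580_000),
}


def genes_per_megabase(genes, base_pairs):
    """Return the number of genes per million base pairs."""
    return genes / (base_pairs / 1e6)


density = {name: genes_per_megabase(*row) for name, row in GENOMES.items()}
for name, value in density.items():
    print(f"{name}: {value:.1f} genes per Mb")
```

On these figures the small genomes are far more densely packed with genes than the human genome, which is consistent with the remark below that little eukaryotic DNA encodes proteins.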
Chemical structure of a gene
Four kinds of sequentially linked
nucleotides compose a DNA molecule or strand (more at
DNA). These four nucleotides constitute the genetic alphabet. A sequence of three consecutive nucleotides, called a
codon, is the protein-coding vocabulary. The sequence of codons in a gene specifies the
amino-acid sequence of the protein it encodes.
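The codon vocabulary can be sketched in a few lines. The example below builds the standard genetic code from a compact 64-letter string (one-letter amino-acid codes, `*` for stop, codons ordered with bases T, C, A, G) and reads a coding DNA sequence three bases at a time. The function name `translate` and the input sequences are illustrative assumptions; the sketch starts at the first base and does not search for a start codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a coding DNA sequence codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3].upper()]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```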
In most
eukaryotic species, very little of the DNA in the genome encodes proteins, and the genes may be separated by vast sequences of so-called
junk DNA. Moreover, the genes are often fragmented internally by non-coding sequences called
introns, which can be many times longer than the coding sequence. Introns are removed on the heels of
transcription by
splicing. In the primary molecular sense, they represent parts of a gene, however.
All the genes and intervening DNA together make up the
genome of an organism, which in many species is divided among several
chromosomes and typically present in two or more copies. The location (or
locus) of a gene and the chromosome on which it is situated are in a sense arbitrary. Genes that appear together on the chromosomes of one species, such as humans, may appear on separate chromosomes in another species, such as mice. Two genes positioned near one another on a chromosome may encode proteins that figure in the same cellular process or in completely unrelated processes. As an example of the former, many of the genes involved in spermatogenesis reside together on the
Y chromosome.
Many species carry more than one copy of their genome within each of their
somatic cells. These organisms are called
diploid if they have two copies or
polyploid if they have more than two copies. In such organisms, the copies are practically never identical. With respect to each gene, the copies that an individual possesses are liable to be distinct alleles, which may act synergistically or antagonistically to generate a trait or
phenotype. The ways that gene copies interact are explained by chemical
dominance relationships (more at
genetics,
allele).
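As a rough illustration of how two copies of a gene can combine in a diploid organism, the sketch below assumes the simplest case, complete dominance, in which one copy of the dominant allele is enough to produce the dominant trait. The allele labels `A` and `a` and both function names are hypothetical, and real dominance relationships are often more complicated.

```python
def is_heterozygous(genotype):
    """Return True if the two gene copies are different alleles."""
    return genotype[0] != genotype[1]


def phenotype(genotype, dominant_allele="A"):
    """Return the trait under complete dominance (a simplifying assumption)."""
    return "dominant" if dominant_allele in genotype else "recessive"


for genotype in [("A", "A"), ("A", "a"), ("a", "a")]:
    print(genotype, is_heterozygous(genotype), phenotype(genotype))
```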
Expression of molecular genes
For various reasons, the relationship between a DNA strand and a
phenotype trait is not direct. The same DNA strand in two different individuals may result in different traits because of the effect of other DNA strands or the environment.
* The DNA strand is expressed into a trait only if it is
transcribed to
RNA. Because the transcription starts from a specific base-pair sequence (a
promoter) and stops at another (a
terminator), our DNA strand needs to be correctly placed between the two. If not, it is considered as
junk DNA, and is not expressed.
* Cells regulate the activity of genes in part by increasing or decreasing their rate of transcription. Over the short term, this
regulation occurs through the binding or unbinding of proteins, known as
transcription factors, to specific non-coding DNA sequences called
regulatory elements. Therefore, to be expressed, our DNA strand needs to be properly regulated by other DNA strands.
* The DNA strand may also be
silenced through DNA
methylation or by chemical changes to the protein components of chromosomes (see
histone). This is a long-lasting form of transcriptional regulation.
* The RNA is often edited before its translation into a protein. Eukaryotic cells
splice the transcripts of a gene, by keeping the
exons and removing the
introns. Therefore, the DNA strand needs to be in an exon to be expressed. Because of the complexity of the splicing process, one transcribed RNA may be spliced in alternate ways to produce not one but a variety of proteins (
alternative splicing) from one pre-mRNA. Prokaryotes produce a similar effect by shifting
reading frames during translation.
* The
translation of RNA into a protein also begins at a specific start sequence and ends at a stop sequence.
* Once produced, the protein interacts with the many other proteins in the cell, according to the
cell metabolism. This interaction finally produces the trait.
This complex process helps explain the different meanings of "gene":
* a nucleotide sequence in a DNA strand;
* or the transcribed RNA, prior to splicing;
* or the transcribed RNA after splicing, i.e. without the introns.
The latter meaning of gene refers to more of a "material entity" than the first one.
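The splicing step described in this section can be sketched as joining exon segments of a transcript and discarding the introns between them. The sequences and exon coordinates below are invented for illustration, and the function name `splice` is hypothetical; real splicing relies on sequence signals and cellular machinery that this sketch does not model.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a transcript, dropping everything else.

    `exons` is a list of (start, end) positions, 0-based, end exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Made-up transcript: exon 1, intron, exon 2, intron, exon 3.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "GAUUGG" + "GUACGUCCCAG" + "UUUUAA"
all_exons = [(0, 6), (18, 24), (35, 41)]

# Keeping every exon gives one mature transcript.
full_transcript = splice(pre_mrna, all_exons)
print(full_transcript)  # AUGGCCGAUUGGUUUUAA

# Alternative splicing: skipping exon 2 gives a different transcript
# from the same pre-mRNA.
skipped_transcript = splice(pre_mrna, [all_exons[0], all_exons[2]])
print(skipped_transcript)  # AUGGCCUUUUAA
```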
Mutations and evolution
Just as there are many factors influencing the expression of a particular DNA strand, there are many ways to have genetic mutations.
For example, natural variations within
regulatory sequences appear to underlie many of the heritable characteristics seen in organisms. The influence of such variations on the trajectory of
evolution through
natural selection may be as large as or larger than variation in sequences that encode proteins. Thus, though regulatory elements are often distinguished from genes in molecular biology, in effect they satisfy the shared and historical sense of the word. Indeed, a breeder or geneticist, in following the inheritance pattern of a trait, has no immediate way to know whether this pattern arises from coding sequences or regulatory sequences. Typically, he or she will simply attribute it to variations within a gene.
Errors during
DNA replication may lead to the
duplication of a gene, which may diverge over time. Though the two sequences may remain the same, or be only slightly altered, they are typically regarded as separate genes (i.e. not as
alleles of the same gene). The same is true when duplicate sequences appear in different species. Yet, though the alleles of a gene differ in sequence, nevertheless they are regarded as a single gene (occupying a single locus).
History
The existence of genes was first suggested by
Gregor Mendel, who, in the 1860s, studied inheritance in pea plants and hypothesized a factor that conveys traits from parent to offspring. Although he did not use the term
gene, he explained his results in terms of inherited characteristics. Mendel was also the first to hypothesize
independent assortment, the distinction between
dominant and
recessive traits, the distinction between a
heterozygote and
homozygote, and the difference between what would later be described as
genotype and
phenotype. Mendel's concept was finally named when
Wilhelm Johannsen coined the word
gene in 1909.
In the early 1900s, Mendel's work received renewed attention from scientists. In 1910,
Thomas Hunt Morgan showed that genes reside on specific
chromosomes. He later showed that genes occupy specific locations on the chromosome. With this knowledge, Morgan and his students began the first chromosomal map of the fruit fly
Drosophila. In 1928,
Frederick Griffith showed that genes could be transferred. In what is now known as
Griffith's experiment, injections into a mouse of a deadly strain of bacteria that had been heat-killed transferred genetic information to a safe strain of the same bacteria, killing the mouse.
In 1941,
George Wells Beadle and
Edward Lawrie Tatum showed that mutations in genes caused errors in certain steps in
metabolic pathways. This showed that specific genes code for specific proteins, leading to the "one gene, one enzyme" hypothesis.
Oswald Avery,
Colin MacLeod, and
Maclyn McCarty showed in 1944 that DNA holds the gene's information. In 1953,
James D. Watson and
Francis Crick demonstrated the molecular structure of
DNA. Together, these discoveries established the
central dogma of molecular biology, which states that proteins are translated from
RNA which is transcribed from DNA. This dogma has since been shown to have exceptions, such as
reverse transcription in
retroviruses.
Richard Roberts and
Phillip Sharp discovered in 1977 that genes can be split into segments. This led to the idea that one gene can make several proteins. More recently (as of 2003-2006), biological findings have made the notion of a gene appear more slippery. In particular, genes do not seem to sit side by side on DNA like discrete beads. Instead, regions of the DNA producing distinct proteins may overlap, so that the idea emerges that "genes are one long continuum". (
Pearson, 2006)
George C. Williams first explicitly advocated the
gene-centric view of evolution in his book
Adaptation and Natural Selection. He also proposed an evolutionary concept of the gene, to be used when discussing natural selection favoring some gene. The definition is: "that which segregates and recombines with appreciable frequency." According to this definition, even an asexual genome could be considered a gene, insofar as it has an appreciable permanency through many generations.
The difference is that the molecular gene
is transcribed as a unit, whereas the evolutionary gene
is inherited as a unit.
Richard Dawkins'
The Selfish Gene and
The Extended Phenotype defended the idea that the gene is the only
replicator in living systems. This means that only genes transmit their structure largely intact and are potentially immortal in the form of copies. So, genes should be the
unit of selection. In
River Out of Eden, Dawkins further refined the idea of gene-centric selection by describing life as a river of compatible genes flowing through geological time. Scoop up a bucket of genes from the river of genes, and we have an
organism serving as a temporary body. A river of genes may fork into two branches representing two non-
interbreeding species as a result of geographical separation.