Key words: Genetics: genetic engineering; gene therapy. Molecular biology.
Molecular biology can be defined broadly as the application of molecular approaches (at the level of DNA and RNA) to understand protein function and regulation in normal and abnormal cellular responses. The application of the principles of molecular biology to treat disease or to modify organisms for commercial purposes is generally referred to as genetic engineering. Advances in molecular biology and genetic engineering are beginning to directly affect clinicians in disease prevention, diagnosis, and treatment. Therefore, an understanding of molecular biology is rapidly becoming necessary to fully understand normal physiology and pathophysiology. DNA can now be isolated, purified, amplified, and sequenced routinely and easily. With these advances, clinicians may soon be able to identify, preoperatively, patients with specific genetic diseases such as malignant hyperthermia and Alzheimer's disease, or to quickly identify a specific virus or bacterium in the intensive care unit. Of particular importance to the anesthesiologist, many areas of anesthesiology research (e.g., identification of local anesthetic binding sites in sodium channels, techniques for single-cell pharmacologic testing of clinically relevant ligand-gated channels, and understanding the molecular basis of bleeding disorders) require knowledge of molecular biology. This article is therefore intended as a "primer" on molecular biology and genetic engineering: it begins with a brief review of the basic concepts of DNA, RNA, and protein synthesis, then builds on this knowledge to describe a variety of modern molecular biology techniques of particular interest and use to the clinician.
Although inheritance of traits such as hair and eye color from parents to children has been appreciated for centuries, systematic study of inherited traits did not occur until the early 1860s, when Gregor Mendel, an Augustinian monk in Brno (then part of the Austrian Empire, now in the Czech Republic), analyzed morphologic characteristics of plants. His systematic crosses between plants of dissimilar phenotypes, followed over several generations, led to the concept of dominant and recessive traits passed on by "factors," later called genes. Genes originally were defined as the region of the genome that segregates as a single unit during meiosis and produces a specific phenotypic trait. In the late 1920s, Hermann Muller and Lewis Stadler independently discovered that x-rays cause mutations in genes, giving scientists a wide variety of new mutations to analyze. [2,3] Concurrent with the establishment of classic genetic theories at the turn of the century, chemists investigated the general properties of proteins. In 1905, Emil Fischer, a German chemist, established that proteins were made of individual building blocks called amino acids. In the early 1950s, Frederick Sanger of Cambridge, England, determined the exact sequence of amino acids in insulin, establishing that each protein is composed of a unique sequence of amino acids. [4,5] Protein biochemistry remains an important area of research; studies of protein folding and three-dimensional protein structure provide valuable insight into the function of enzymes and other molecules in the body.
Although classic genetics and protein biochemistry were well underway by the early 1900s, the idea that chromosomes (long DNA molecules with associated proteins) carry genetic information was not yet appreciated. There was wide skepticism toward DNA as the genetic material, especially after Phoebus Levene discovered that DNA consisted of only four types of nucleotides, each containing a phosphate group, a sugar, and one of four bases: adenine, thymine, guanine, or cytosine. He postulated that such a simple molecule was unlikely to hold the vast amount of genetic variation known to exist. Therefore, for the next 20 yr, attention remained on proteins as the potential genetic material, because proteins consist of 20 distinct amino acids that can form an apparently endless number of combinations and thus a wide variety of products. A breakthrough in the debate about whether DNA or proteins carry genetic information came in 1944, when Avery, MacLeod, and McCarty transformed one bacterial strain into another by transferring only purified DNA. Knowledge of DNA then advanced rapidly; soon thereafter, Chargaff demonstrated that, in DNA, the amount of adenine equals the amount of thymine and the amount of cytosine equals the amount of guanine, consistent with adenine pairing only with thymine and cytosine only with guanine. Then, in 1953, Rosalind Franklin and Maurice Wilkins provided x-ray diffraction data of crystallized DNA, [8,9] which ultimately led Watson and Crick to propose the model of DNA as a double helix, with purines (adenine and guanine) paired to pyrimidines (thymine and cytosine) between two backbone strands of phosphate and sugar residues. The discovery that the complementary strands of a double helix could serve as templates for replication led scientists to definitively regard DNA as the elusive genetic material. These important discoveries set the stage for many techniques used in modern molecular biology.
Basic Concepts of DNA Biochemistry
The molecular structure of every protein present in living organisms is encoded by DNA. The backbone of DNA consists of sugar (deoxyribose) residues linked together by phosphodiester bonds: a phosphate moiety links the 3' carbon of one sugar to the 5' carbon of the next. Each sugar has a base attached, either a purine (adenine, A; guanine, G) or a pyrimidine (thymine, T; cytosine, C). Together, the base and sugar are called a nucleoside; a nucleoside with an attached phosphate group is a nucleotide. In a DNA molecule, two strands of nucleotides wind together in an antiparallel fashion (one strand running 5' to 3' and the other 3' to 5') to form a double helix. Adenine pairs with thymine via two hydrogen bonds, whereas cytosine pairs with guanine via three hydrogen bonds (Figure 1). During replication, the hydrogen bonds between A-T and C-G pairs are broken base by base, and DNA polymerase catalyzes the synthesis of a complementary strand on each single-stranded template. This enzyme requires a primer (a short strand of complementary DNA) to initiate replication. DNA polymerase adds a deoxyribonucleotide only if it pairs correctly with the template strand; therefore, the template strand determines which base is added. Because each new DNA molecule contains one original strand and one newly synthesized strand, DNA replication is said to be semi-conservative. Although individual hydrogen bonds are relatively weak, each DNA molecule contains so many base pairs that, under physiologic conditions (other than during replication), complementary DNA strands do not spontaneously separate. In the laboratory, however, base pairing can be disrupted with strong alkali or with temperatures near 100 degrees Celsius, a process called denaturation. Annealing occurs when the temperature is decreased and complementary bases re-pair specifically to re-form the original double helix. DNA molecules are packaged into compact structures by small proteins called histones.
Histones contain a high proportion of positively charged amino acids, which helps them tightly package the negatively charged DNA double helix. Histones are characteristic of eukaryotic cells and highly abundant, with approximately 60 million molecules per cell. Histones and nuclear DNA in eukaryotes are collectively referred to as chromatin.
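The base-pairing rules described above (A with T via two hydrogen bonds, G with C via three, antiparallel strands, semi-conservative copying) can be expressed in a few lines of code. The following is a minimal Python sketch for illustration only; the function names and the sequence are hypothetical and do not come from any library.

```python
# Watson-Crick pairing rules: A pairs with T (two hydrogen bonds), G with C (three).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand: str) -> str:
    """Return the partner of a strand written 5'->3', also written 5'->3'.

    Because the two strands are antiparallel, the partner is the complement
    read in reverse order (the "reverse complement").
    """
    return "".join(PAIR[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding a fully paired duplex together."""
    return sum(H_BONDS[base] for base in strand.upper())


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Semi-conservative replication: each parental strand templates a new partner."""
    return [(parent, complementary_strand(parent)) for parent in duplex]


top = "ATGCGC"                         # hypothetical strand, 5'->3'
print(complementary_strand(top))       # GCGCAT
print(hydrogen_bonds(top))             # 16
print(replicate((top, complementary_strand(top))))
```

Each daughter duplex printed by `replicate` keeps one parental strand. A G-C-rich duplex has more hydrogen bonds per base pair, one reason it generally needs a higher temperature to denature.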
Although most DNA is found in the nucleus, mitochondria contain a separate set of double-stranded DNA. All mitochondrial DNA is maternally inherited and mutates at a rate 10 times greater than nuclear DNA. Although mitochondrial DNA represents only 4% of all DNA in human cells, high mutation rates and the maternal inheritance pattern make this DNA source important in phylogenetic studies.
RNA and Transcription
Whereas DNA encodes genetic information, RNA is the intermediate molecule required for the synthesis of proteins from DNA. RNA differs from DNA in the following ways: 1) its sugar backbone contains ribose instead of deoxyribose, 2) the base uracil replaces thymine, and 3) RNA is single stranded, whereas DNA is double stranded. Three types of RNA are involved in protein synthesis: messenger RNA (mRNA), ribosomal RNA (rRNA), and transfer RNA (tRNA). Before proteins can be synthesized, the genetic information contained in DNA must be copied into complementary bases in mRNA, a process called transcription. Transcription occurs in the nucleus, where mRNA is produced by the enzyme RNA polymerase. Theoretically, any region of the DNA double helix could be copied into two different RNA molecules, one from each DNA strand; the RNA produced is identical in nucleotide sequence to the opposite, nontemplate DNA strand (with uracil in place of thymine). However, a promoter, a specific DNA sequence found approximately 25-200 base pairs upstream (5') of the transcription initiation site, determines which of the two strands will be transcribed by orienting RNA polymerase in a specific direction. Only genes encoding proteins necessary for a cell's structure and function at that moment are transcribed.
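The strand relationships in transcription can be checked with a small sketch. Assuming a simplified model in which RNA polymerase reads the template strand 3' to 5' and builds mRNA 5' to 3', the mRNA should match the nontemplate (coding) strand with U in place of T. The sequences and function names are hypothetical.

```python
# Simplified model: RNA polymerase reads the template strand 3'->5' and builds
# mRNA 5'->3', pairing A->U, T->A, G->C, C->G.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Return the mRNA (5'->3') made from a template strand given 5'->3'."""
    return "".join(RNA_PAIR[base] for base in reversed(template.upper()))


def mrna_from_coding_strand(coding: str) -> str:
    """The mRNA matches the nontemplate (coding) strand, with U in place of T."""
    return coding.upper().replace("T", "U")


coding = "ATGGCCTTTTAA"       # hypothetical nontemplate strand, 5'->3'
template = "TTAAAAGGCCAT"     # its complementary (template) strand, 5'->3'
print(transcribe(template))               # AUGGCCUUUUAA
print(mrna_from_coding_strand(coding))    # AUGGCCUUUUAA
```

Both routes give the same mRNA, which is why sequences are conventionally reported as the coding strand.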
Before being transported to the cytoplasm, mRNA undergoes modification. A methylated guanine is added to the 5' end, a process known as capping, which is important for efficient translation in the cytoplasm. Multiple adenine bases are attached to the 3' end to form a poly-A "tail" that is thought to aid in the transport of mRNA from the nucleus to the cytoplasm. Further modification includes splicing, in which introns (noncoding sequences that interrupt the coding region of a gene) are excised by large RNA-protein complexes called spliceosomes, and the remaining exons (the sequences that code for protein) are joined together. The final, "mature" mRNA is then ready to be transported to the cytoplasm.
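The three processing steps can be modeled as simple string operations. This is a toy Python sketch under stated assumptions: exon coordinates are supplied by hand, the cap is represented only by a text label, and the tail length is arbitrary. The made-up intron follows the common pattern of beginning with GU and ending with AG.

```python
def process_pre_mrna(pre_mrna: str, exons: list[tuple[int, int]], tail_length: int = 20) -> str:
    """Toy model of mRNA processing: splice, cap, and polyadenylate.

    exons: (start, end) coordinates (end exclusive) of each exon, in 5'->3' order;
           everything between them is intronic and is discarded.
    The cap is written as the text label "m7G-", not as a real nucleotide.
    tail_length is arbitrary here; real poly-A tails vary in length.
    """
    spliced = "".join(pre_mrna[start:end] for start, end in exons)
    return "m7G-" + spliced + "A" * tail_length


exon1, intron, exon2 = "AUGGC", "GUAAGUUUUCAG", "UUUAA"   # hypothetical sequences
pre = exon1 + intron + exon2
print(process_pre_mrna(pre, [(0, 5), (17, 22)], tail_length=3))
# m7G-AUGGCUUUAAAAA
```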
Although a wide variety of cell types (from lymphocytes to neurons) are present in a multicellular organism, the DNA contained in all cells is the same; different cells produce different proteins depending on their function. Gene expression may be regulated at a variety of steps, including the following: 1) transcriptional control (how and when a gene is transcribed), 2) RNA processing control (how a primary RNA transcript is spliced), 3) RNA transport control (which mRNAs are moved from the nucleus to the cytoplasm), 4) translational control (which mRNAs are translated), 5) mRNA degradation control (which mRNAs remain stable in the cytoplasm and which are degraded), and 6) protein activity control (selectively activating or inactivating proteins after they are made). In most cells, the majority of regulation occurs at the transcriptional level, which avoids the production of unnecessary RNA intermediates or proteins. Transcriptional regulation occurs when gene regulatory proteins recognize specific DNA sequences in the helix and thereby modulate which of the thousands of genes in a cell will be transcribed. For example, gene regulatory proteins can bind to specific DNA sequences known as enhancers, which may lie far from the promoter region and which activate transcription. When an enhancer is bound by a gene regulatory protein, the DNA between the enhancer and the promoter loops out, allowing the enhancer-bound protein to interact directly with the RNA polymerase. After DNA is transcribed, the cell can also splice the primary transcript in various ways to produce different polypeptide chains from the same gene; this is known as alternative RNA splicing. These and other mechanisms for regulating gene expression provide the basis of cellular differentiation in both structure and function.
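Alternative RNA splicing can be illustrated by enumerating the mRNAs that one gene could yield. The Python sketch below makes a strong simplifying assumption, labeled here and in the code: the first and last exons are always kept and every internal exon is optional. Real splice choices are regulated, not exhaustive. Exon labels are placeholders.

```python
from itertools import combinations


def splice_isoforms(exons: list[str]) -> list[str]:
    """List mRNAs that keep the first and last exons plus any subset of the
    internal exons, always in their original 5'->3' order.

    Assumes every internal exon is optional, which real cells do not do; splicing
    choices are regulated and tissue specific.
    """
    first, *internal, last = exons          # requires at least two exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append(first + "".join(kept) + last)
    return isoforms


print(splice_isoforms(["E1-", "E2-", "E3-", "E4"]))
# ['E1-E4', 'E1-E2-E4', 'E1-E3-E4', 'E1-E2-E3-E4']
```

With n optional internal exons there are 2^n possible isoforms, which shows how a single gene can encode several related polypeptide chains.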
The Genetic Code
The genetic code enables the "message" of DNA (via RNA) to be translated into a specific protein, a process that takes place in the cytoplasm. Proteins contain a specific sequence of amino acids that fold into three-dimensional structures according to the chemical properties of the individual amino acids. The order of amino acids in a given protein is determined by the sequence of nucleotides in mRNA: three sequential mRNA nucleotides, known as a codon, encode each amino acid. Transfer RNA (tRNA) functions as an adapter between mRNA and protein. Each tRNA molecule contains a region that decodes the mRNA (the anticodon loop, which contains three bases complementary to the three bases of a codon) and another region that carries the corresponding amino acid. Therefore, each codon in mRNA ultimately corresponds to a specific amino acid in the resulting peptide or protein (Table 1). This relation is known as the genetic code and is nearly universal among living organisms (minor variants exist, for example, in mitochondria), strongly suggesting that all cells have descended from a single line of primitive cells. Because RNA is constructed from four types of nucleotides, 64 (4 x 4 x 4) combinations of three nucleotides are possible. Three of these sequences signal termination of a polypeptide chain and are called stop codons. The remaining 61 codons specify 20 amino acids, so most amino acids are represented by several different codons; the genetic code is therefore said to be degenerate. Degeneracy implies either that there is more than one tRNA for each amino acid or that a single tRNA molecule can base pair with more than one codon; both situations occur. Some tRNA molecules require accurate base pairing only at the first two nucleotides of the codon and tolerate a mismatch at the third. This is known as "wobble" base pairing, and it economizes on tRNA molecules: only 31 different kinds of tRNA molecules (instead of 61) are required to read all 61 codons that specify the 20 amino acids.
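The counting argument above (64 codons, 3 stop codons, 61 sense codons for 20 amino acids) can be verified directly from the standard genetic code. In this Python sketch the code is stored compactly as one letter per codon in a conventional U, C, A, G ordering; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons ordered
# UUU, UUC, UUA, UUG, UCU, ... GGG; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))                        # 64
print(stop_codons)                             # ['UAA', 'UAG', 'UGA']
print(sum(codons_per_amino_acid.values()))     # 61
print(len(codons_per_amino_acid))              # 20
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])  # 6 1
```

Leucine is encoded by six codons and methionine by only one, a concrete instance of degeneracy.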
Protein Synthesis and Modification
Protein synthesis begins when ribosomes attach to mRNA and move along the molecule in the 5' to 3' direction. Ribosomes bring mRNA codons into position, where they can be recognized by tRNA. This process, in which the "message" of mRNA directs the synthesis of a specific protein, is known as translation. During translation, as each additional amino acid is aligned and added, the enzyme peptidyl transferase forms a peptide bond between neighboring amino acids. Translation ends when a stop codon is reached and the newly formed protein is released. Proteins frequently undergo posttranslational modification, including glycosylation (the attachment of sugar groups to portions of proteins exposed to the extracellular space), palmitoylation (the attachment of fatty acid moieties that help anchor proteins in the membrane), and myristoylation (the addition of myristic acid to an N-terminal glycine). These and other modifications can be essential for a protein's proper localization and function.
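Translation can be sketched as a scan along the mRNA. This Python example assumes, as a simplification, that the ribosome starts at the first AUG it meets (real start-site selection also depends on surrounding sequence) and then reads one codon at a time until a stop codon. The function name and test sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Scan 5'->3' to the first AUG, then add one amino acid per codon until a stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":        # stop codon: the chain is released
            break
        peptide.append(amino_acid)   # each residue after the first adds a peptide bond
    return "".join(peptide)


print(translate("GGAUGGCCUUUUAAGG"))   # MAF
```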
Once basic DNA structure was understood, ways to manipulate DNA easily were sought, to gain further understanding of genes and chromosomes. Throughout the past 20 yr, techniques have been developed to determine the exact nucleotide sequences of genes and to amplify or isolate specific DNA fragments. These new methods to obtain basic information at a cellular level were then applied to clinical medicine. In the next section, we describe these methods and their relevance to clinical practice.
Restriction Enzymes and Mapping
Restriction enzymes provided a method by which scientists could specifically alter DNA. Restriction enzymes are bacterial proteins that recognize and cut DNA at specific nucleotide sequences, usually 4-8 base pairs in length. [11-13] The resulting DNA fragments vary in length depending on the exact nucleotide sequence present. These fragments can be separated by size using gel electrophoresis, which relies on the principle that small DNA fragments move through a gel in an electric field faster than larger fragments. Gel electrophoresis can be performed in various media, two of the most common being agarose and polyacrylamide. Because DNA carries a nearly uniform negative charge per unit length, both media separate DNA fragments mainly by size: agarose gels, with relatively large pores, are used to resolve larger fragments, whereas polyacrylamide gels, with smaller pores, resolve small fragments at higher resolution. DNA fragments can then be visualized with ethidium bromide, a compound that intercalates into DNA and fluoresces when exposed to ultraviolet light. Comparison with DNA fragments of known size (standards) allows the size of each fragment to be estimated. Cutting the same DNA with different restriction enzymes yields a different series of fragment sizes for each enzyme, and combining the information from several enzymes produces a restriction map. A restriction map is analogous to a fingerprint, because each DNA sequence produces a characteristic pattern.
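The logic of a restriction digest and of comparing single and double digests can be sketched in Python. The sketch assumes a linear molecule and uses three well-characterized enzymes (EcoRI, G^AATTC; HindIII, A^AGCTT; BamHI, G^GATCC); the test sequence is hypothetical. Fragment sizes are listed largest first, the order in which bands appear from the top of a gel lane.

```python
import re

# Recognition site and cut offset on the top strand for a few common enzymes:
# EcoRI G^AATTC, HindIII A^AGCTT, BamHI G^GATCC (all palindromic sites).
ENZYMES = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1), "BamHI": ("GGATCC", 1)}


def cut_positions(dna: str, enzyme: str) -> list[int]:
    """Positions at which the enzyme cuts the top strand of a linear molecule."""
    site, offset = ENZYMES[enzyme]
    # The lookahead (?=...) also finds overlapping sites.
    return [m.start() + offset for m in re.finditer(f"(?={site})", dna)]


def digest(dna: str, *enzymes: str) -> list[int]:
    """Fragment sizes (bp), largest first, as bands read from the top of a gel lane."""
    cuts = sorted({pos for enzyme in enzymes for pos in cut_positions(dna, enzyme)})
    bounds = [0, *cuts, len(dna)]
    return sorted((end - start for start, end in zip(bounds, bounds[1:])), reverse=True)


dna = "AAAA" + "GAATTC" + "AAAAAA" + "GGATCC" + "AAA"   # hypothetical 25-bp fragment
print(digest(dna, "EcoRI"))            # [20, 5]
print(digest(dna, "BamHI"))            # [17, 8]
print(digest(dna, "EcoRI", "BamHI"))   # [12, 8, 5]
```

Reasoning from the three lanes together (two single digests and one double digest) is how the relative positions of the two sites would be placed on a restriction map.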
There are many situations in which restriction enzymes are used in medicine. One example is the identification of patients with sickle cell disease, in which the substitution of valine for glutamic acid at the sixth amino acid of the beta-globin chain causes hemoglobin to polymerize when deoxygenated. The responsible single-base change (GAG to GTG in codon 6) abolishes a recognition site for certain restriction enzymes, so restriction analysis can easily differentiate between patients with normal hemoglobin and those with the disease. Restriction enzymes also can be used to create new recombinant DNA molecules. This is possible because many restriction enzymes cut the two strands of double-stranded DNA a few nucleotides apart, creating single-stranded overhangs known as "sticky ends" (Figure 2). If the same restriction enzyme is used on two different pieces of DNA, their complementary "sticky ends" align by hydrogen bonding (a process called annealing) and are joined together by DNA ligase to form a new "designer DNA" sequence.
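The sickle cell example can be made concrete with a short Python sketch. It uses an illustrative fragment assumed to correspond to the start of the beta-globin coding sequence (only codon 6 matters here) and the enzyme DdeI, which recognizes C^TNAG; the choice of enzyme and fragment is the sketch's assumption, not a clinical protocol.

```python
import re

# Illustrative 5' fragment of the beta-globin coding sequence (codons 0-8, ATG = Met).
# Codon 6 is GAG (Glu) in hemoglobin A; the sickle mutation changes it to GTG (Val).
NORMAL = "ATGGTGCATCTGACTCCTGAGGAGAAG"
SICKLE = NORMAL[:19] + "T" + NORMAL[20:]      # single A -> T change in codon 6

DDEI_SITE = "CT.AG"   # DdeI recognizes C^TNAG, where N is any base


def ddei_sites(dna: str) -> list[int]:
    """Start positions of DdeI recognition sites (overlapping matches allowed)."""
    return [m.start() for m in re.finditer(f"(?={DDEI_SITE})", dna)]


print(NORMAL[18:21], SICKLE[18:21])             # GAG GTG
print(ddei_sites(NORMAL), ddei_sites(SICKLE))   # [16] []
```

The normal fragment is cut once and the sickle fragment not at all, so the two alleles give different band patterns after digestion.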
Determining the exact nucleotide sequence of a given DNA fragment is essential in molecular biology. Sequencing is usually performed either by the Maxam-Gilbert chemical modification method or, more commonly, by the Sanger method (Figure 3). The Sanger method uses dideoxynucleotides, which have a normal 5' end (so they can be incorporated into an elongating DNA chain) but lack the 3' hydroxyl group (2',3'-dideoxy), which prevents formation of a phosphodiester bond with the next nucleotide. The DNA strand to be sequenced is incubated with a short DNA sequence complementary to one end of the DNA of interest (called an oligonucleotide primer), an excess of all four deoxynucleotides (one of which is radiolabeled for convenient visualization of the final DNA products), and a low concentration of one chosen dideoxynucleotide. DNA polymerase is added, and DNA is synthesized until a dideoxynucleotide is incorporated. Because thousands of DNA molecules are synthesized simultaneously in a sequencing reaction, dideoxynucleotides are incorporated at random positions, producing a series of DNA fragments of increasing length, each truncated by a dideoxynucleotide that marks the presence of a specific nucleotide at that point in the sequence. Such reactions are performed separately with each of the four dideoxynucleotides. The resulting labeled DNA fragments are separated by size in an electric field on a polyacrylamide gel, and the pattern of bands on the gel reveals the DNA sequence. DNA sequencing has had, and continues to have, enormous importance in the laboratory and in clinical medicine. The Human Genome Project is a collaborative effort to determine the DNA sequence of the entire human genome (approximately 3 billion base pairs). Knowledge of the exact sequence of normal genes should provide a means to identify genetic diseases.
Already, sequences of genes known to be important in various diseases such as Huntington's disease, Alzheimer's disease, and cystic fibrosis (as well as many others) have been determined. [17-19]
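The chain-termination logic of the Sanger method can be simulated in a few lines. This Python sketch assumes an idealized reaction in which every position yields some terminated product and the primer anneals just beyond the 3' end of the template; fragment lengths are counted from the primer. The template and function names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_lanes(template: str) -> dict[str, list[int]]:
    """Lengths of terminated products in the ddA, ddC, ddG, and ddT reactions.

    template: the strand being read, 5'->3'; the primer is assumed to anneal just
    beyond its 3' end, so lengths are counted from the primer (primer excluded).
    With many molecules and little ddNTP, every position where a given base is
    incorporated yields some product that stops there.
    """
    new_strand = "".join(PAIR[b] for b in reversed(template.upper()))  # built 5'->3'
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest (bottom) to longest: the new strand, 5'->3'."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = chain_termination_lanes("GATTACA")   # hypothetical template
print(lanes)            # {'A': [4, 5], 'C': [7], 'G': [2], 'T': [1, 3, 6]}
print(read_gel(lanes))  # TGTAATC
```

The sequence read from the gel is the newly synthesized strand; its reverse complement (GATTACA) is the template.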
Cloning and DNA Libraries
Genes can be defined as discrete DNA sequences that contain both regulatory regions and sequences that are transcribed into RNA and ultimately translated into protein. The process of isolating a gene of interest from all other genes in the genome is called cloning (Figure 4). To isolate a specific DNA segment, either chromosomal DNA or complementary DNA (cDNA; a DNA copy of mRNA) is cut into small pieces by restriction enzymes, and the fragments are inserted into viral or bacterial vectors; these collections of DNA are called "libraries." A viral or bacterial library can be screened for a gene of interest in several ways. If part of the desired DNA sequence is already available, a portion of this sequence can be radiolabeled and allowed to anneal with complementary DNA sequences in the library (a process called hybridization). If the DNA sequence is not known, the protein product of the desired DNA may be identifiable with specific antibodies to the protein. Another method, used to identify related genes, is to probe with a similar (but not identical) DNA sequence under less harsh experimental conditions than are normally used. These less stringent conditions allow similar segments of DNA to hybridize without requiring perfect base pair matching. Finally, genes can be isolated by expression cloning, in which a functional response of the encoded protein is tested and used as a guide to isolate the DNA sequences that encode the protein of interest.
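Screening a library by hybridization, including the effect of lowering stringency, can be approximated by allowing a limited number of mismatches. In this Python sketch, the mismatch count is a crude stand-in for hybridization stringency (real stringency depends on temperature, salt, and probe length), and the clones and probe are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))


def hybridizes(probe: str, clone: str, max_mismatches: int = 0) -> bool:
    """True if the probe can pair with either strand of a double-stranded clone.

    A probe anneals where a strand reads as the probe's reverse complement.
    max_mismatches > 0 mimics less stringent conditions.
    """
    site = reverse_complement(probe)
    for strand in (clone, reverse_complement(clone)):
        for i in range(len(strand) - len(site) + 1):
            window = strand[i:i + len(site)]
            if sum(a != b for a, b in zip(window, site)) <= max_mismatches:
                return True
    return False


def screen_library(library: dict[str, str], probe: str, max_mismatches: int = 0) -> list[str]:
    return [name for name, seq in library.items() if hybridizes(probe, seq, max_mismatches)]


library = {                      # hypothetical clones
    "clone1": "AAAAGATTACAAAA",  # contains the probe sequence exactly
    "clone2": "CCCCGATTCCACCC",  # one mismatch (a related gene)
    "clone3": "GGGGGGGGGGGG",
}
print(screen_library(library, "GATTACA"))                    # ['clone1']
print(screen_library(library, "GATTACA", max_mismatches=1))  # ['clone1', 'clone2']
```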
Once the gene is identified, multiple copies need to be produced; the process of producing multiple copies of DNA is called amplification. One way to amplify a gene is to use small circular pieces of DNA known as plasmids. Both the plasmid and the gene are cut with the same restriction enzyme, enabling the foreign gene to be inserted (ligated) into the plasmid. Many plasmids contain antibiotic resistance genes that make them easy to identify. The newly created plasmid containing the foreign DNA of interest is introduced into bacteria, a process known as transformation. Only bacteria that have successfully taken up the plasmid will grow on nutrient media containing the antibiotic. The transformed bacteria then replicate. Most plasmids used in molecular biology are "relaxed-control" plasmids, meaning that, in addition to replicating with each bacterial cell division, the plasmid also replicates many times within a single cell; the net result is rapid amplification of plasmid DNA. Plasmid DNA is then isolated by rupturing the bacterial cells and purifying the plasmid DNA. The foreign DNA fragment can be recovered from plasmid DNA using the same restriction enzyme used for insertion. Messenger RNA and the encoded protein can be produced efficiently with plasmid expression vectors that contain a highly active promoter region. Bacterial, yeast, or mammalian cells are transformed or transfected with the recombinant DNA, resulting in large quantities of the desired protein being produced. The cells are then lysed, and the protein is purified from other host cell proteins by various methods, one of which is chromatography.
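The insert-and-recover cycle (cut vector and insert with the same enzyme, ligate, then recut to release the insert) can be sketched with strings. This Python example assumes EcoRI, a vector with exactly one site, an insert with no internal site, and ligation in the forward orientation only; real inserts can enter in either orientation. All sequences are hypothetical.

```python
SITE, CUT = "GAATTC", 1   # EcoRI, which cuts G^AATTC (the enzyme choice is illustrative)


def clone_insert(vector: str, insert_core: str) -> str:
    """Ligate a fragment into the single EcoRI site of a plasmid.

    vector: circular plasmid written as a linear string with exactly one site
            (a site spanning the arbitrary start/end of the string is ignored).
    insert_core: the fragment's sequence between its two EcoRI sites, assumed to
            contain no internal site and to ligate in the forward orientation.
    """
    i = vector.find(SITE)
    if i == -1 or vector.find(SITE, i + 1) != -1:
        raise ValueError("vector must contain exactly one EcoRI site")
    # Matching AATT overhangs re-form a complete site at each junction.
    return vector[:i] + SITE + insert_core + SITE + vector[i + len(SITE):]


def excise_insert(recombinant: str) -> str:
    """Recut with the same enzyme; return the released insert's top strand."""
    first = recombinant.find(SITE)
    second = recombinant.find(SITE, first + 1)
    return recombinant[first + CUT:second + CUT]


vector = "TTTTGAATTCCCCC"      # hypothetical plasmid
core = "ATGAAACCC"             # hypothetical insert
recombinant = clone_insert(vector, core)
print(recombinant)                  # TTTTGAATTCATGAAACCCGAATTCCCCC
print(excise_insert(recombinant))   # AATTCATGAAACCCG
```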
Many clinical advances can be attributed to general methods used in cloning. Tissue plasminogen activator is one example of a gene whose encoded protein is now mass produced using recombinant techniques. Genes for many human diseases have been identified and cloned, including those important in hemophilia, Duchenne's muscular dystrophy, and cystic fibrosis. [20-23] Cloning techniques have helped make possible major advances in the study and treatment of many human diseases.
Polymerase Chain Reaction
Insertion of cloned DNA and subsequent amplification in bacterial cells is not the only method available to amplify segments of DNA. In the mid 1980s, Kary Mullis developed the polymerase chain reaction (PCR), which exponentially multiplies specific segments of DNA (Figure 5). To specify the region to be amplified, two short oligonucleotides (primers) must be synthesized, each complementary to one strand at one end of the DNA of interest. To begin the reaction, DNA is denatured by heating, which separates the two complementary DNA strands. The denatured DNA, primers, all four deoxynucleotides, buffer, and DNA polymerase are cooled to 42-65 degrees Celsius, the temperature at which the primers anneal to complementary DNA. The temperature is then raised to 72 degrees Celsius, the optimal temperature for the DNA polymerase. The DNA polymerase used in PCR is unusual in that it is isolated from a thermophilic bacterium (e.g., Taq polymerase from Thermus aquaticus) and is stable at much higher temperatures than other polymerases. DNA polymerase adds the appropriate nucleotides to each template, starting at the primers, thereby forming two new double-stranded DNA segments. The temperature is then raised again to 92-95 degrees Celsius, at which the double-stranded DNA denatures, now providing four templates for the next cycle of the reaction. This cycle of denaturing double-stranded DNA, hybridizing primers, and extending the new strands is repeated 25-30 times. Because the target roughly doubles with each cycle, 30 cycles produce more than one million copies of the targeted DNA segment (in theory, up to 2^30, or about one billion, per starting molecule).
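Two points from the description above can be checked numerically: the primers define the ends of the amplified segment, and the copy number grows exponentially with cycle number. In this Python sketch the template and primers are hypothetical, and the copy-number formula is an idealization that ignores the longer products of the first cycles and the plateau seen in late cycles.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))


def pcr_product(template: str, forward: str, reverse: str) -> str:
    """Sequence (top strand) of the segment bounded by two primers.

    forward: identical to the top strand at the left end (anneals to the bottom strand).
    reverse: written 5'->3'; it anneals to the top strand, so its reverse
             complement appears in the top strand at the right end.
    """
    start = template.find(forward)
    if start == -1:
        return ""
    end = template.find(reverse_complement(reverse), start + len(forward))
    if end == -1:
        return ""
    return template[start:end + len(reverse)]


def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealized copy number; efficiency = 1.0 means perfect doubling each cycle."""
    return start_copies * (1 + efficiency) ** cycles


template = "TTTT" + "ACGTTG" + "C" * 8 + "GAGCTT" + "TTTT"   # hypothetical top strand
print(pcr_product(template, "ACGTTG", "AAGCTC"))  # ACGTTGCCCCCCCCGAGCTT
print(copies_after(20), copies_after(30))          # 1048576.0 1073741824.0
```

Even with imperfect efficiency (for example, 0.9 per cycle), 30 cycles still yield well over a million copies, consistent with the figure given in the text.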
Polymerase chain reaction is a very powerful technique with wide applications. It can be used to provide ample amounts of DNA from a known gene. By slightly modifying primer sequences, mutations can be introduced into genes and the functional result studied. Clinically, PCR amplification of small quantities of DNA can detect infectious agents or identify residual cancer cells. Polymerase chain reaction amplification of DNA followed by restriction enzyme analysis enables diagnosis of diseases such as sickle cell anemia from a single blood sample. Recently, PCR followed by DNA analysis has begun to be used to establish parentage in paternity disputes and to identify perpetrators in rape and murder cases. Because PCR is highly specific and can amplify a segment of DNA even if only one or two copies of the sequence are present in a sample, it is useful in many applications in medicine. However, PCR is not without difficulties, one of which is its high sensitivity. Many genes have slightly different sequences that are of no clinical consequence; such variations in the general population are called polymorphisms (see section on Genetic Testing-Techniques for a more detailed discussion of polymorphisms). Therefore, PCR may detect differences in a specific DNA fragment that are not clinically relevant. A further problem with PCR is the risk of contamination of the study sample; in this case, the amplified DNA might be a contaminant rather than the targeted DNA, potentially leading to misdiagnosis. Nevertheless, PCR remains a valuable tool for molecular biologists and clinicians, being faster and easier than standard cloning methods.
Southern and Northern Blotting
Whereas restriction enzyme analysis, gene cloning, and PCR are used to study specific genes in detail, more general techniques such as Southern and Northern blotting can be applied to study DNA and RNA, respectively (Figure 6). Southern blotting analyzes the structure and location of a gene. Genomic DNA is cut with restriction enzymes, and the resulting fragments are separated by size on an agarose gel. The fragments are then transferred to a solid support (nitrocellulose or nylon) using an electric field or, more slowly, by capillary action in buffer. A labeled DNA probe specific for the gene to be studied is then hybridized to the DNA on the filter. By analyzing the resulting bands, the presence (and relative amount) of a gene can be determined and a physical map of the gene produced. This restriction map can be used to compare the DNA sample with others and to detect differences in genes between individuals. Southern blotting is used to identify major gene rearrangements and deletions and can be used to detect inherited gene abnormalities in a patient or family members. In the process of cloning a gene, Southern blotting provides a convenient method to identify a single gene within a larger DNA fragment, as well as a method to compare genes between species. Northern blotting analyzes the size and expression of specific mRNAs. Total RNA is separated by gel electrophoresis, and a radiolabeled RNA or DNA probe is used to identify and quantitate specific RNA sequences. Northern analysis is frequently used to determine the size of the mRNA for a known gene in various tissues and cells. Northern blotting also can be used to identify an increase in the expression of specific mRNAs in response to various stimuli.
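The core readout of a Southern blot (which restriction fragments carry the probed sequence, and how large they are) can be sketched as follows. This Python example assumes linear DNA, a single enzyme, perfect hybridization, and a probe short enough to fall entirely within one fragment; the genomic sequence and probe are hypothetical.

```python
import re

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))


def southern_bands(genomic: str, site: str, cut_offset: int, probe: str) -> list[int]:
    """Sizes (bp) of restriction fragments a labeled probe would detect.

    Simplifications: linear DNA, a single enzyme, perfect hybridization, and a
    probe short enough to fall entirely within one fragment.
    """
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", genomic)]
    bounds = [0, *cuts, len(genomic)]
    fragments = [genomic[a:b] for a, b in zip(bounds, bounds[1:])]
    target = reverse_complement(probe)     # the probe may match either strand
    return sorted((len(f) for f in fragments if probe in f or target in f), reverse=True)


# Hypothetical genomic DNA with two EcoRI sites (G^AATTC).
genomic = "CCCC" + "GAATTC" + "AAAATGCATGCAAAA" + "GAATTC" + "TTTT"
print(southern_bands(genomic, "GAATTC", 1, "CATGCAAAA"))   # [21]
print(southern_bands(genomic, "GAATTC", 1, "GGGGGG"))      # []
```

Only one of the three fragments (5, 21, and 9 bp) lights up, which is how a probe picks out a single gene among many fragments on the same filter.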
In Situ Hybridization
All of the molecular biology methods described thus far use DNA or RNA isolated from a tissue or from cultured cells. However, none of these techniques maintains tissue architecture, so DNA or RNA cannot be localized to specific cells within a tissue. In situ hybridization determines RNA expression at the cellular level (Figure 7). In this technique, thin slices of tissue (5-20 microns thick) are fixed on slides and then incubated with labeled RNA or DNA probes. In this way, specific cells that contain the RNA of interest are highlighted, which may give insight into tissue function. However, because RNA expression may not equal protein expression, comparison with autoradiographic or immunohistochemical approaches (which label protein) is important. One clinical use of in situ hybridization is to identify virally infected cells (Figure 7).
Genetic diseases traditionally have been diagnosed by clinical criteria or biochemical tests. Clinical diagnosis is often ambiguous because specific features of a disorder may take years to develop. Biochemical tests used to detect the presence or absence of a gene product may give equivocal results; in addition, prenatal testing and the identification of a carrier state are frequently difficult. With the advent of new molecular techniques, specific genes and gene mutations can now be identified long before the appearance of clinical symptoms. Another benefit of new molecular biology-based diagnostic testing is that these tests require only a small sample of DNA (such as found in a single tube of blood) instead of tissue biopsies. Because of these advantages, molecular-based genetic testing has become commonplace.
Genetic testing uses restriction enzyme analysis, DNA and RNA hybridization, and PCR (alone or in combination) to detect subtle point mutations, deletions, or insertions of DNA. Frameshift mutations (in which a nucleotide is added or deleted, shifting the reading frame of the triplet code), mutations causing premature termination of translation (resulting in an abnormally short protein), and insertions of multiple repeated sequences are more likely to be pathogenic than single base substitutions. Mutations that simply change one amino acid for another may or may not have clinical significance. The effect of an amino acid change may be predictable from the structure of the protein, or it may be necessary to reproduce the new phenotype in cell culture or even in an animal model to prove that the substitution is pathogenic. Single amino acid changes without observable biologic consequence are known as benign polymorphisms and are quite common in the general population. Therefore, geneticists must distinguish pathogenic mutations from nonpathogenic polymorphisms, an often difficult task. This differentiation is straightforward in a disease such as sickle cell anemia, which is genetically homogeneous: all patients with the disease have valine substituted for glutamic acid at position 6 of the beta-globin chain of hemoglobin. However, genetic testing is complicated because many inherited diseases result not from a single mutation but from any of multiple mutations, all producing the same phenotype. For example, 70% of Northern Europeans with cystic fibrosis have a three-base-pair deletion that results in the loss of the amino acid phenylalanine from the cystic fibrosis transmembrane conductance regulator. It is possible to screen for this deletion and diagnose patients with this mutation (Figure 8). However, the remaining 30% of people with cystic fibrosis carry more than 200 different mutations that result in similar phenotypes.
This genetic heterogeneity, seen in cystic fibrosis and many other inherited diseases, means that a negative test for one specific mutation does not rule out the disease, only that specific mutation.
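The consequence of a negative single-mutation test can be quantified with Bayes' rule. The sketch below assumes, for illustration, a prior carrier probability of 1 in 25 and a test that detects 70% of disease alleles; both numbers and the function name residual_carrier_risk are assumptions, not data reported in this article.

```python
def residual_carrier_risk(prior: float, detection_rate: float) -> float:
    """Probability of still being a carrier after a negative mutation screen (Bayes' rule).

    prior          -- carrier probability before testing (assumed population value)
    detection_rate -- fraction of disease alleles the test is able to detect
    """
    missed_carriers = prior * (1.0 - detection_rate)  # carriers whose mutation is not tested for
    true_noncarriers = 1.0 - prior                     # noncarriers always test negative
    return missed_carriers / (missed_carriers + true_noncarriers)


# Assumed inputs: 1-in-25 prior carrier probability, 70% of alleles detectable.
risk = residual_carrier_risk(prior=1 / 25, detection_rate=0.70)
print(f"residual risk = {risk:.4f} (about 1 in {round(1 / risk)})")
```

Under these assumed inputs, a negative result lowers the carrier probability from 1 in 25 to about 1 in 81 but does not eliminate it.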
A further problem with current genetic testing is that very small deletions or additions in DNA that cause disease remain extremely difficult to identify. Even large deletions involving long stretches of DNA can sometimes be difficult to detect. Because autosomes (nonsex chromosomes) come in pairs, even a large defect in one copy of a gene may be masked by the normal copy on the homologous chromosome. The normal copy still produces the expected hybridization band on a Southern blot; this band should be less intense than when two normal copies are present, but quantitative analysis by Southern blotting is difficult and only relative at best. These problems help explain why, although many genes and mutations have been implicated in various diseases and genetic testing has become commonplace, relatively few diagnostic tests thus far diagnose disease unequivocally.
Identification of the specific mutation responsible for a given disease facilitates genetic testing. When the specific mutation is not known, however, direct testing is not possible, and alternative techniques such as linkage analysis may prove useful. Linkage analysis can be used in families in which specific DNA sequences (markers) are always found in individuals with the disease but not in those without it. For some diseases, markers within a given family are always inherited with the disease, even though the exact sequence of the DNA markers may vary among individual family members. One difficulty of linkage analysis is that two or more generations of affected family members are required, and the markers for each individual must be determined separately; linkage analysis therefore cannot be performed with a single affected family member or when relatives are uncooperative. In addition, when a gene has only a weak influence on disease expression, more families must be studied to draw meaningful conclusions. Linkage analysis has been used in families to diagnose Duchenne's muscular dystrophy, hemophilia, and spinal muscular atrophy. [28-30] Unfortunately, many diseases, such as schizophrenia, severe obesity, and diabetes, are multifactorial and cannot yet be screened, even with linkage analysis.
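The basic selection criterion described above (a marker found in every affected member and in no unaffected member) can be sketched as a set operation. This is a simplified co-segregation filter applied to a hypothetical pedigree, not a full linkage analysis, which would also model recombination and compute a statistical score. The function name, family members, and marker labels are all invented for illustration.

```python
from typing import List, Set


def cosegregating_markers(family: List[dict]) -> Set[str]:
    """Markers carried by every affected member and by no unaffected member.

    Each family member is a dict with keys "affected" (bool) and "markers" (set of labels).
    """
    affected = [set(m["markers"]) for m in family if m["affected"]]
    unaffected = [set(m["markers"]) for m in family if not m["affected"]]
    if not affected:
        return set()
    shared_by_affected = set.intersection(*affected)   # present in all affected members
    seen_in_unaffected = set().union(*unaffected)       # present in any unaffected member
    return shared_by_affected - seen_in_unaffected


# Hypothetical three-generation pedigree with invented marker labels.
pedigree = [
    {"name": "grandfather", "affected": True, "markers": {"M1", "M3", "M7"}},
    {"name": "grandmother", "affected": False, "markers": {"M2", "M4"}},
    {"name": "mother", "affected": True, "markers": {"M1", "M2", "M7"}},
    {"name": "uncle", "affected": False, "markers": {"M3", "M4"}},
    {"name": "son", "affected": True, "markers": {"M1", "M5", "M7"}},
    {"name": "daughter", "affected": False, "markers": {"M2", "M5", "M7"}},
]
print(cosegregating_markers(pedigree))  # {'M1'}
```

Marker M7 is shared by all affected members but is excluded because an unaffected relative also carries it, which is why several generations and both affected and unaffected relatives are needed.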
Two new methods have been developed to examine the entire genome instead of individual markers. Representational difference analysis identifies sequences that differ between two genomes, whereas genomic mismatch scanning (GMS) identifies sequences that are identical in two samples. Representational difference analysis uses hybridization to pick out regions of DNA that differ between the samples. However, this method requires that almost every fragment hybridize to a complementary DNA fragment, and because the human genome is so large, representational difference analysis can take weeks. Genomic mismatch scanning screens DNA from affected relatives in unrelated families to identify shared identical sequences. The underlying concept is that any one pair of relatives shares many identical regions of the genome, but regions that are consistently identical among affected relatives across unrelated families should be linked to the gene for the disease. These two methods offer new promise in mapping and identifying inherited disease.
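Conceptually, the two genome-wide methods reduce to complementary set operations: representational difference analysis keeps fragments present in one genome but absent from the other, whereas genomic mismatch scanning keeps regions identical in the affected relatives of every family studied. The sketch below models fragments and genomic regions as hypothetical labels and ignores the hybridization chemistry entirely; the function names and all labels are assumptions for illustration.

```python
from typing import Dict, Set


def representational_difference(tester: Set[str], driver: Set[str]) -> Set[str]:
    """RDA concept: fragments in the tester genome with no matching fragment in the driver."""
    return tester - driver


def shared_identical_regions(identical_by_family: Dict[str, Set[str]]) -> Set[str]:
    """GMS concept: regions identical in the affected relative pair of every family studied."""
    region_sets = list(identical_by_family.values())
    if not region_sets:
        return set()
    return set.intersection(*region_sets)


# Hypothetical fragment labels for two genomes.
tester = {"frag01", "frag02", "frag03", "frag09"}
driver = {"frag01", "frag02", "frag03"}
print(representational_difference(tester, driver))  # {'frag09'}

# Hypothetical regions found identical within each family's affected relative pair.
identical_by_family = {
    "family_1": {"chr2_bin04", "chr7_bin12", "chr11_bin01", "chr19_bin08"},
    "family_2": {"chr3_bin09", "chr7_bin12", "chr19_bin08"},
    "family_3": {"chr5_bin02", "chr7_bin12", "chr13_bin06"},
}
print(shared_identical_regions(identical_by_family))  # {'chr7_bin12'}
```

Each family alone shares several regions by descent; only the intersection across unrelated families narrows the candidates to a region likely to be linked to the disease gene.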
The ability to screen for genetic disorders raises many ethical questions, including the psychologic impact of screening on patients, the consequences of results for health insurance benefits and employment, and, ultimately, the decisions made on the basis of information obtained by prenatal screening. To address these issues, 3% of the budget of the Human Genome Project has been allocated to exploring and preparing for the social, legal, and ethical consequences of mapping the human genome. When screening for multiple genetic diseases becomes available, difficult decisions will be required regarding whom to test and whether those individuals want to be tested. Information gained from screening must be used carefully and only in the appropriate context: although molecular testing may identify the presence of a defective gene, this does not prove that the disease will occur, nor does it give any insight into the age of onset, the role the environment may play, or the severity of the disease in a specific individual. All of these issues could affect society in terms of employment, education, insurance benefits, and how we view one another (i.e., as superior or inferior). The unborn may also be affected, with results from prenatal genetic testing influencing whether a given genetic aberration is deemed acceptable, or even enabling parents to choose specific attributes for their offspring (from hair color to intelligence to sex). It therefore remains imperative that, along with advances in our understanding of the human genome, work continues in the fields of ethics and sociology to help ensure that the resulting information is used for the overall good of humanity.
Besides being useful in diagnosing disease, molecular biology techniques are important in disease treatment. Gene therapy can be defined as therapeutic intervention via molecular modification. For gene therapy to be effective, three major areas must be addressed: identification of the specific gene of interest, identification and isolation of target cells for gene delivery, and determination of the method of transfer. Each of these areas is addressed in the following sections.
Understanding the Genetic Defect
Before gene therapy can begin, a comprehensive understanding of the molecular basis of the specific disease process is needed. This is sometimes the most difficult aspect of gene therapy. Once a gene, or set of genes, has been shown to be important in a disease, each gene must be cloned, including important regulatory sequences. If the goal is simply to replace a missing gene or to provide abundant amounts of a normal gene (when disease results from an abnormal native gene whose protein product functions defectively), then only the coding sequence may be required. However, the regulatory sequences that normally surround a gene of interest may be necessary for efficient RNA and protein production once the gene is transferred to a new cell. Promoter sequences that direct gene expression only in certain cells may also be used to target the gene to a specific tissue. Once the gene is cloned, the next step is to identify and isolate target cells for gene delivery.
Determination of a target cell or tissue for gene delivery is an important and complex task. The first cells used in gene therapy were lymphocytes. In these initial experiments, circulating lymphocytes from patients with severe combined immunodeficiency secondary to adenosine deaminase deficiency were removed and infected with retroviruses that contained a normal adenosine deaminase gene. The altered lymphocytes were then reinjected into the patient, restoring partial immunity. However, this therapeutic "correction" of adenosine deaminase deficiency lasted only as long as the lymphocytes lived. When the reengineered lymphocytes matured and died, the entire therapeutic procedure had to be repeated. One way to circumvent this problem would be to target hematopoietic stem cells instead of lymphocytes, potentially curing the genetic disease permanently. However, stem cells constitute only a small proportion of the cells in the bone marrow, are difficult to obtain, and are not readily infected by retroviruses. Therefore, only a small subpopulation of stem cells can be genetically altered, which might not be sufficient to produce the desired clinical effect. [35,36]
Another cell targeted for gene therapy is the hepatocyte. Many genetic diseases involve the liver, including galactosemia, phenylketonuria, and familial hypercholesterolemia. Typically, hepatocytes are removed, cultured, infected with a virus carrying the desired gene, and then reinjected into the portal vein, from which they migrate into the liver. The liver has tremendous regenerative capacity, and the new hepatocytes survive quite well once reintroduced. Currently, however, a large portion of the liver must be removed from the patient to obtain cells and to stimulate hepatocyte regeneration.
In contrast to the earlier examples, not all target cells need to be removed from the body for gene therapy. Patients with cystic fibrosis can inhale aerosols carrying adenoviruses that contain the cystic fibrosis transmembrane conductance regulator gene. [38,39] Epithelial cells are infected in vivo, and these cells have been shown to express the corrected gene for as long as 6 weeks. However, immunogenicity and inflammation with repeated treatment currently limit this therapeutic approach. Intravenous injection of a gene with a promoter that targets the specific tissue or cell of interest would be ideal, but current gene therapy technology is far from this ideal.
Once a gene has been identified as important in a disease and the appropriate cell or tissue has been targeted, the next step is to deliver the foreign DNA into the cell so that it can be integrated into nuclear DNA and ultimately expressed as the desired protein product. DNA can be transferred within the patient (in vivo) or into living cells that are removed from the body (ex vivo) and subsequently returned to the patient. Several approaches have been taken. Transfer of DNA or RNA into cells by direct "physical" approaches (microinjection, calcium phosphate precipitation, lipofection, and electroporation) has been tried, with limited success. Microinjection can be very efficient but is of very limited clinical use because of the small number of cells that can be injected: each cell is injected individually, and effective treatment may require injecting as many as 10⁸ to 10⁹ cells, a daunting task. Electroporation is more efficient; it uses brief, high-voltage electrical pulses to form transient nanometer-sized pores in the plasma membrane, through which DNA enters the cell directly. Electroporation is useful for transient expression of cloned genes and for establishing cell lines with integrated copies of a given gene. Calcium phosphate precipitation is the most widely used method of cell transfection, even though its mechanism of action is not entirely clear; the transfected DNA is thought to enter the cell by endocytosis and then to be transferred to the nucleus. However, even calcium phosphate-mediated transfection has a reported efficiency of at most 20%, although the method is useful when large numbers of cells are required. Direct injection of DNA or RNA into a tissue (e.g., into the fourth ventricle for brain delivery) has also had limited success.
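The scale of the microinjection problem can be checked with simple arithmetic. The sketch below assumes a hypothetical injection rate of 100 cells per hour, sustained around the clock, and uses the 20% upper bound on calcium phosphate transfection efficiency cited above; the function names are illustrative, not part of any established tool.

```python
HOURS_PER_YEAR = 24 * 365


def microinjection_years(cells: float, cells_per_hour: float) -> float:
    """Years of nonstop work needed to inject `cells` one at a time at an assumed rate."""
    return cells / cells_per_hour / HOURS_PER_YEAR


def cells_to_expose(target_modified: float, efficiency: float) -> float:
    """Cells that must be treated so that `target_modified` cells take up DNA at `efficiency`."""
    return target_modified / efficiency


# Hypothetical rate of 100 injected cells per hour.
for cells in (1e8, 1e9):
    print(f"{cells:.0e} cells: about {microinjection_years(cells, 100):,.0f} years")

# At the 20% upper bound, obtaining 10^8 modified cells means exposing about 5 x 10^8 cells.
print(f"{cells_to_expose(1e8, 0.20):.1e}")
```

Even at this optimistic assumed rate, injecting 10⁸ cells would take more than a century, whereas bulk methods trade efficiency per cell for the ability to treat very large numbers of cells at once.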
In general, physical methods of DNA transfer do not result in integration of foreign DNA into the targeted cell's genome, which necessitates reinjection or repeat therapy. In contrast, viruses have proved more efficient in delivering genes and stably incorporating foreign DNA into targeted cells. Advantages and disadvantages of viral transfer of DNA are explained later.
Three types of viral vectors are currently used for gene therapy: retroviruses, adenoviruses, and adeno-associated viruses; a fourth (herpes virus) is also beginning to be examined. Each has distinct advantages and disadvantages for gene therapy. Retroviruses are RNA viruses whose genome is copied into DNA that is incorporated into the host cell genome, so any foreign DNA placed in a retrovirus should be expressed in the cell indefinitely. Retroviruses are easy to manipulate and can infect a wide variety of cell types with high efficiency. Problems with retroviruses include their inability to accommodate large inserts of foreign DNA and the potential for oncogenesis: because the viral DNA is incorporated at random sites in the cell's genome, it may disrupt regulatory DNA sequences required for normal cell growth, and the disrupted genes might then produce protein with abnormal activity. In addition, retroviruses can infect only replicating cells, which limits their use for in vivo treatment because most human cells are not actively replicating.
In contrast to retroviruses, adenoviruses can infect nondividing cells and easily accommodate large fragments of foreign DNA. [41-43] A further advantage of adenoviruses is that their genome remains separate from host DNA, decreasing the likelihood of deleterious mutations and minimizing disruption of normal gene regulation. The major current problem with adenovirus in gene therapy is that expression of adenoviral proteins induces a vigorous host immune response, resulting in inflammation and decreased foreign gene expression. This is the primary explanation for the disappointing results of initial human trials of adenovirus gene therapy in cystic fibrosis. However, very high titers of adenovirus were used in these trials, and newer, more highly infective (and, it is hoped, less immunogenic) adenoviruses have since been produced. Only time will tell whether these newly engineered viruses will be effective in gene therapy.
Adeno-associated virus is a defective human parvovirus that is also capable of infecting various mammalian cells. Preliminary data suggest that this virus, like adenovirus, can infect nondividing cells. A distinct advantage of adeno-associated virus is that it expresses no viral antigens and is therefore nonimmunogenic. It also has never been associated with disease in humans. However, to replicate, adeno-associated virus requires a helper virus, usually adenovirus. Another difficulty with adeno-associated virus is that only small pieces of foreign DNA can be incorporated.
The most direct method of gene transfer is simply to substitute a normal gene for an abnormal or nonfunctional gene by homologous recombination, which is analogous to organ transplantation at the molecular level. Enzymes known as site-specific recombinases catalyze the excision and insertion of genetic material by recognizing specific nucleotide sequences. Inserting a tissue-specific promoter with the new gene provides a way to localize production of the gene product. So far, this process has been successful only in cell lines and embryonic stem cells. [44-46] Another theoretical approach to transferring very large amounts of genetic material (including the desired gene and all its associated control regions) is to create an artificial chromosome. Because the chromosome would function independently and not integrate into the genome, there would be no risk of mutations caused by random incorporation of foreign DNA into host chromosomes, and expression should be stable. A problem limiting the use of artificial chromosomes in gene therapy is the need to identify the sequences required for both centromere and telomere function in mammalian cells. Finally, in some cases, correcting a disease may not require physically transferring foreign DNA into cells. For instance, a defective gene could be bypassed by "turning on" genes with a similar function. One example of this approach would be reactivation of the fetal gamma-globin genes to correct disorders of adult hemoglobin beta-chain synthesis in thalassemia and sickle cell disease.
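The outcome of excision by a site-specific recombinase can be modeled as a string operation. In the well-known Cre/loxP system, for example, recombination between two recognition sites in the same orientation removes the intervening DNA and leaves a single site behind. The sketch below assumes two directly repeated sites and uses an invented recognition sequence; the function name and sequences are hypothetical, and the code illustrates only the result of the reaction, not its biochemistry.

```python
def excise_between_sites(dna: str, site: str) -> str:
    """Model excision between two directly repeated recombinase sites.

    The DNA between the first two copies of `site` is removed along with one copy
    of the site, so a single site remains. If fewer than two copies are present,
    the sequence is returned unchanged.
    """
    first = dna.find(site)
    if first == -1:
        return dna
    second = dna.find(site, first + len(site))
    if second == -1:
        return dna
    # Keep everything before the first site, then the second site and what follows it.
    return dna[:first] + dna[second:]


HYPOTHETICAL_SITE = "CCGGTTAACC"  # invented recognition sequence, not a real recombinase site
construct = "AAAA" + HYPOTHETICAL_SITE + "GGGGGG" + HYPOTHETICAL_SITE + "TTTT"
print(excise_between_sites(construct, HYPOTHETICAL_SITE))  # AAAACCGGTTAACCTTTT
```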
The Recombinant DNA Advisory Committee has approved more than 60 protocols for human gene therapy, examples of which are shown in Table 2. Criteria for clinical trials include a life-threatening disease for which current therapy is inadequate and a disease gene that has been isolated, cloned, and characterized. Along with continued trials to deliver the cystic fibrosis transmembrane conductance regulator gene to respiratory epithelium, trials have begun to treat alpha-1-antitrypsin deficiency and phenylketonuria. [49,50] More than 30 genetic diseases currently corrected by bone marrow transplantation* should, theoretically, be amenable to treatment by gene therapy. Several approaches to cancer therapy are being investigated, including enhancement of the immune response to tumors, insertion into tumor cells of genes that induce cell death, and modification of tumor suppressor genes. [51,52] Many technologic challenges remain to be overcome, and knowledge of the regulation of mammalian genes and of the sequences required for gene stability is still incomplete. It is not yet clear whether it is safe to incorporate genes into nuclear DNA or whether it will ever be possible to produce stable extra chromosomes. The immune response to new gene products also needs to be evaluated. Although somatic gene therapy (which affects only the individual being treated) raises many ethical questions, as discussed previously, genetic modification of germ cells (which affects future generations) enters an entirely new realm of medical ethics.
Application of Genetic Techniques to Malignant Hyperthermia
Although the concepts of gene therapy and genetic testing can be simple, applying these techniques to diagnose and treat human disease may be quite complicated. Malignant hyperthermia is an example of a disease in which even identifying susceptible individuals is difficult. Malignant hyperthermia is a clinical syndrome in which genetically susceptible individuals experience hypotension, tachycardia, skeletal muscle rigidity, metabolic acidosis, fever, and dysrhythmias in response to inhalational anesthetics and depolarizing skeletal muscle relaxants. Malignant hyperthermia-susceptible individuals have no recognizable phenotype, and current diagnostic tests are based on in vitro muscle contraction responses to caffeine and halothane. Difficulties with these tests include the invasiveness of the required muscle biopsy, the expense of surgery and laboratory testing, and high sensitivity but low specificity. A primary goal of malignant hyperthermia research is therefore to develop a noninvasive, inexpensive, and accurate test. The primary genetic defect in malignant hyperthermia-susceptible individuals is an abnormality in the skeletal muscle calcium release channel, the ryanodine receptor, whose gene is located in the q12-13.2 region of chromosome 19. [53,54] The ryanodine receptor protein is a tetramer that acts both as a calcium release channel and as the "foot" structure that bridges the sarcoplasmic reticulum and the t-tubules in skeletal muscle. The most common malignant hyperthermia-susceptible mutation is a point mutation that changes Arg to Cys at position 615 of the ryanodine receptor. Although this mutation is well characterized, only 3-5% of malignant hyperthermia families demonstrate this specific defect.
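Why high sensitivity combined with low specificity is a problem can be shown with the standard predictive-value calculation. The sensitivity, specificity, and prevalence below are hypothetical values chosen for illustration, not published characteristics of the contracture test, and the function name is an assumption.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (positive predictive value, negative predictive value) for a diagnostic test."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical inputs: 99% sensitivity, 80% specificity, 30% prevalence among tested patients.
ppv, npv = predictive_values(sensitivity=0.99, specificity=0.80, prevalence=0.30)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.68, NPV = 0.99
```

With these assumed inputs, a negative result is highly reassuring, but roughly one in three positive results would be a false positive.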
Three other independent mutations (Gly248Arg, Ile403Met, Arg2434His) have also been identified, and a second malignant hyperthermia susceptibility locus has been localized to the q11.2-24 region of chromosome 17. Because known mutations account for only a small fraction of human malignant hyperthermia, the development of diagnostic tests and gene therapy for human malignant hyperthermia will require extensive additional research.
Over the past 20 years, molecular biology has expanded the horizons of clinical medicine in both diagnosis and treatment. The goal of this primer is to provide the clinician with an introduction to a variety of molecular biology techniques and their application in genetic engineering and medicine. This background should help the reader understand the genetic techniques applied in studies presented in the literature and in the daily practice of clinical medicine.
*Hobbs JR: Correction of 34 genetic diseases by displacement bone marrow transplantation. Plasma Therapy and Transfusion Technology 1985; 6:221-46.