Although the nucleus is the largest and most obvious organelle within the cell, the processes within it have proved rather more difficult to study than those in the surrounding cytoplasm. This may have been due to the difficulty of biochemical separation of its constituents. Largely as a result of improved technology over the last decade, we now know that the nucleus is the most spatially organized and probably the most dynamically active part of the entire cell. The production of DNA, RNA and the assembly of ribosomes involves a massive level of interaction with the cytoplasm and also constant repositioning of the nuclear contents. We shall consider nuclear structure from the outside inwards, beginning with the boundary between nucleus and cytoplasm—the nuclear envelope.
The nuclear envelope separates the nuclear contents from the cytoplasm, also controlling a constant and massive molecular interchange between the two compartments. So why do eukaryotes go to this trouble when prokaryotes like bacteria have no such partitioning yet reproduce themselves at such amazing rates? Successful as they are, at least in terms of numbers, bacteria can be considered essentially ‘one trick ponies’. They have reached their limit as simple single cells, despite reproducing prolifically, and retaining sufficient genetic variability to consistently (and unfortunately) produce antibiotic-resistant strains. If the total number of organisms on the planet is equated with success, then bacteria come out on top. Conversely, in terms of biological complexity, they are also the most simple and consequently ‘primitive’ organisms on the planet. Bacteria also are the oldest at some four billion years, and as such provided the raw material for all subsequent life. The largest step in the evolution of living things on earth was the switch from prokaryotic to eukaryotic cell organization, i.e. the acquisition of a nucleus containing genetic material inside an isolating membrane. This has led to the proliferation and immense variation of life as we know it today. Just how the nucleus was acquired is uncertain, but it was probably as a result of phagocytosis of a small bacterium by a larger one. The smaller bacterium then ‘took over’ control of the larger one or endosymbiosis occurred with the partitioning of the DNA inside a membrane. Although cell biology is generally in agreement about the origin of mitochondria and chloroplasts by engulfment, no such consensus exists for the origin of the nucleus.
Enclosing the genetic blueprint of the cell within its own compartment has fostered the diversity we see in both unicellular and multicellular eukaryotic creatures. Each human produces around 150,000 different proteins, not in every cell, but across the various specialized tissues throughout the body. This is possible in spite of the fact that we only have 23,600 genes because the genetic message can be modified inside the nucleus after transcription (the transfer of information from DNA to RNA), and outside the nucleus (by the addition of simple chemical structures such as fats and sugars), thus increasing the overall number of possible protein products. For comparison, the simplest bacterium is probably Mycoplasma genitalium (found on primate genitalia) which has around 500 genes. The common gut bacterium Escherichia coli has 4,300 genes, whereas the smallest flu virus (which needs to hijack the machinery of the cells it infects to reproduce itself) has but 11 genes.
Separation of nuclear contents and cytoplasm has resulted in eukaryotic cells becoming much larger and more complex in comparison to prokaryotes. The circular molecule of DNA in bacteria is tacked on to the inside of the cell membrane at various points and may stretch around the whole cell. This is fine for a DNA genome with 4.6 million nucleotide base pairs (nucleotide sequences carry the genetic code) as is the case for E. coli. With a 1000-fold increase in length for the DNA in human cells, however, accessing a specific gene of around a few hundred base pairs from the overall 3.1 billion base pairs has to be easier when the DNA is concentrated within the nucleus. The restriction of DNA to the nucleus in eukaryote cells also avoids any potential interference with the sophisticated workings of the cytoskeleton and cytoplasmic organelles. There are no such problems in prokaryotes, where the DNA is short (and circular), and there is little if any cytoskeletal structure.
The nuclear envelope and pore complexes
The nuclear envelope consists of two distinct membranes, the outermost being formed by endoplasmic reticulum, which is separated from the inner nuclear membrane by a perinuclear space (Figure 5a). The inner nuclear membrane is lined with a network of fibrous proteins which form a structure known as the nuclear lamina. Both the nuclear membrane and the lamina below it are pierced by nuclear pore complexes, which control the flow of everything into and out of the nucleus, apart from very small molecules which can pass directly through the nuclear envelope. There are around 5,000 pore complexes distributed over the surface of the nucleus in mammalian cells. Nuclear pore complexes are made from 50 proteins (nucleoporins), and are the largest molecular machines in the cell. Nuclear pores connect the inner and outer nuclear membranes, and also project eight cable-like proteins into the cytoplasm, and eight further fibres into the nucleus, forming a structure rather like a basket (Figure 7d, e). Inbound cargoes of molecules attach to the fibres extending outwards from the pore, and are then passed through the membrane channel and out of the basket into the nucleus.
7. The nucleus. (a) Section through an intact nucleus, (b) surface view, with the internal chromatin exposed by removal of part of the pore-covered nuclear envelope, (c) the nucleolus (nuc) and nucleoskeleton after DNA removal, (d), (e) nuclear pores viewed from the outside and inside of the nucleus, and (f) leukaemia cells containing distorted nuclei
A scaled-up analogy of pore traffic activity would involve a short length of drainpipe (as the pore channel) through which a mixture of tennis balls, golf balls, and marbles would pass in both directions at a rate of 1000 journeys per second. The flow is controlled by nucleoporin proteins which project into the channel, sorting and propelling the various molecules in the correct direction.
Each protein cargo is ‘tagged’ by an amino acid sequence that acts like a luggage label to ensure that it finishes on the correct side of the nuclear membrane. The actual passage through the pore requires attachment of ‘chaperone’ proteins called importins or exportins, which accompany the cargo through the pore but are then detached as the cargo exits the pore and reattached to more cargo. Ribosomes are assembled from RNA (made in the nucleus) and proteins (made in the cytoplasm), and consequently generate a high level of pore traffic regardless of other nuclear/cytoplasmic exchange. In a HeLa cell, ten million ribosomes are produced each day. Seven thousand are produced each minute, each having around 80 proteins, requiring the production of half a million proteins per minute in the cytoplasm. These proteins are imported into the nucleus at a rate of 100 per pore per minute, passing (amongst other traffic) three ribosomal subunits on their way out of the nucleus. Certain diseases are directly associated with nucleoporin proteins. In primary biliary cirrhosis, proteins (autoimmune antibodies) are produced that attack nucleoporins, eventually leading to complete cirrhosis of the liver.
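The ribosome traffic figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers given in the text (ten million ribosomes per day, around 80 proteins per ribosome, and around 5,000 pores per nucleus), plus one assumption that is not stated in the text: that each ribosome leaves the nucleus as two subunits, one large and one small. The function and variable names are purely illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def ribosome_traffic(ribosomes_per_day=10_000_000,
                     proteins_per_ribosome=80,
                     pores_per_nucleus=5_000,
                     subunits_per_ribosome=2):  # assumption: not stated in the text
    """Return the per-minute traffic implied by the figures quoted in the text."""
    ribosomes_per_minute = ribosomes_per_day / MINUTES_PER_DAY
    proteins_per_minute = ribosomes_per_minute * proteins_per_ribosome
    return {
        "ribosomes_per_minute": ribosomes_per_minute,
        "proteins_per_minute": proteins_per_minute,
        # ribosomal proteins imported through each pore every minute
        "imports_per_pore_per_minute": proteins_per_minute / pores_per_nucleus,
        # ribosomal subunits exported through each pore every minute
        "exports_per_pore_per_minute":
            ribosomes_per_minute * subunits_per_ribosome / pores_per_nucleus,
    }


traffic = ribosome_traffic()
for name, value in traffic.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the results come out at roughly 6,900 ribosomes and 560,000 proteins per minute, a little over 100 imports per pore per minute, and just under three subunits exported per pore per minute, which agrees with the rounded figures in the text.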
Although the nuclear envelope clearly separates nucleus and cytoplasm, it also physically links them. Proteins called nesprins, which are anchored in the inner nuclear membrane, reach across the perinuclear space, pass through the outer nuclear membrane, and extend for some distance into the cytoplasm, where they attach to the cytoskeleton. Nesprins are some of the largest proteins in the cell. This attachment of molecules from within the inside of the nucleus to cytoskeletal elements (which themselves are linked to the plasma membrane) means that there is a potential molecular linkage directly from the cell surface through to the nucleus, an interesting but as yet unexplained linkage.
The nuclear lamina
The nuclear lamina was originally visualized in the electron microscope as a fibrous matrix on the inside of the inner nuclear membrane. These protein filaments resist stretching and form ‘high-tensile cables’, closely related to the intermediate filaments of the cytoskeleton. Thus, the nuclear lamina protects the nuclear contents from mechanical stress, and also anchors the position of the nucleus in the cell, providing sites for attachment to the cytoskeleton in the cytoplasm. A structure called the centrosome, which is the main microtubule organizing centre of the cell, is also kept close to the nuclear surface by attachments to the nuclear lamina. Besides these mechanical functions, the nuclear lamina also plays a major role in the overall organization of nuclear contents, affecting both gene regulation and the passage of genetic information to the cytoplasm. Gene defects that lead to disruption of the nuclear envelope and lamina result in severe consequences, termed ‘nuclear envelopathies’ or ‘laminopathies’. The conditions are usually inherited, are generally incurable, and include some extremely rare muscular dystrophic conditions. The rarity of genetic conditions resulting from malfunctioning building blocks of any cell component might make them appear trivial. More likely, their rarity reflects the exacting requirements of building an organism: without the full complement of correct parts, development is unlikely to proceed much farther than a few divisions of the zygote.
The genetic constituents of the nucleus
Although the nucleus might have been recognized by Antonie van Leeuwenhoek in the late 17th century, it was not until 1831 that it was reported as a specific structure in orchid epidermal cells by a Scottish botanist, Robert Brown (better known for recognizing ‘Brownian movement’ of pollen grains in water). In 1879, Walther Flemming observed that the nucleus broke down into small fragments at cell division, followed by re-formation of the fragments called chromosomes to make new nuclei in the daughter cells. It was not until 1902 that Walter Sutton and Theodor Boveri independently linked chromosomes directly to mammalian inheritance. Thomas Morgan’s work with fruit flies (Drosophila) at the start of the 20th century showed specific characters positioned along the length of the chromosomes, followed by the realization by Oswald Avery in 1944 that the genetic material was DNA. Some nine years later, James Watson and Francis Crick showed the structure of DNA to be a double helix, for which they shared the Nobel Prize in 1962 with Maurice Wilkins, whose laboratory had provided the evidence that led to the discovery. Rosalind Franklin, whose X-ray diffraction images of DNA from the Wilkins lab had been the key to DNA structure, died of cancer aged 37 in 1958, and Nobel Prizes are not awarded posthumously. Watson and Crick published the classic double helix model in 1953. The final piece in the jigsaw of DNA structure was produced by Watson with the realization that the pairing of the nucleotide bases, adenine with thymine and guanine with cytosine, not only provided the rungs holding the twisting ladder of DNA together, but also provided a code for accurate replication and a template for protein assembly. Crick continued to study and elucidate the base pairing required for coding proteins, and this led to the fundamental ‘dogma’ that ‘DNA makes RNA and RNA makes protein’.
The discovery of DNA structure marked an enormous advance in biology, probably the most significant since Darwin’s publication of On the Origin of Species.
We have a lot of DNA
If the double-stranded DNA in each human nucleus was laid out as a single molecule it would measure around one and a half metres in length. The genetic information it carries is stored in the order of four nucleotide bases—cytosine (C), guanine (G), adenine (A), and thymine (T)—along its length. Groups of three bases encode an amino acid (e.g. TTA encodes the amino acid leucine, TTT—phenylalanine). A single gene may require hundreds or thousands of bases to generate a single protein. In terms of the information stored along this length of DNA, it would take 200 telephone directories to print out the three billion base sequences.
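The triplet code described above can be illustrated with a short sketch. Only the two codons mentioned in the text (TTA for leucine and TTT for phenylalanine) come from the chapter; the other three entries are taken from the standard genetic code and are included purely so that the example can read a complete, if tiny, message. The table is deliberately partial (the full code has 4 × 4 × 4 = 64 triplets), and the names `CODON_TABLE` and `translate` are illustrative only.

```python
# Partial codon table, read directly from the DNA coding strand as in the text.
CODON_TABLE = {
    "TTA": "Leu",   # leucine (from the text)
    "TTT": "Phe",   # phenylalanine (from the text)
    "ATG": "Met",   # methionine, the usual start codon (standard genetic code)
    "GGC": "Gly",   # glycine (standard genetic code)
    "TAA": "Stop",  # a stop signal (standard genetic code)
}


def translate(dna):
    """Read a DNA sequence three bases at a time and list the amino acids."""
    dna = dna.upper()
    amino_acids = []
    usable_length = len(dna) - len(dna) % 3  # ignore any incomplete final triplet
    for start in range(0, usable_length, 3):
        codon = dna[start:start + 3]
        amino_acid = CODON_TABLE.get(codon, "???")  # ??? = not in this partial table
        if amino_acid == "Stop":
            break
        amino_acids.append(amino_acid)
    return amino_acids


message = translate("ATGTTATTTGGCTAA")
print("-".join(message))
```

Reading the fifteen bases in groups of three gives ATG, TTA, TTT, GGC, and TAA, so the sketch yields a chain of four amino acids before the stop signal is reached.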
All the 23,600 human genes, however, fit into around 2 cm of our DNA, which leaves 98.5% unaccounted for. This was originally considered to be ‘junk’ DNA. The term junk is perhaps more indicative of the ignorance of early researchers, and thus ‘non-coding’ (i.e. not coding for genes) is a better description. As it seems unlikely that the cell should go to the trouble of replicating more than nine-tenths of its DNA each time it divides for no reason, we would do best to consider this vast majority of our DNA to have an unknown rather than no function. At least some non-coding DNA is certainly important to the cell, as damage restricted to non-coding regions has been found to be just as effective at causing cell death as that in coding regions. Non-coding DNA contains pseudogenes, sequences that are no longer used to make proteins. These may be the remains of information accumulated over an evolutionary lifetime, which may be silent for millions of years, but can be reactivated and actively transcribed. Some non-coding DNA almost certainly represents the incorporation of viral DNA from past infections. Once infected, individuals rarely completely lose virus DNA. Over evolutionary time scales, these collections could reach significant amounts, estimated at 8% for the human genome.
The genes themselves are complex structures, having a starting code (promoter) built into the beginning of each gene and an exit code (terminator) at the end. Amid the coding sequences (exons) there are intervening non-coding sequences (introns), which need to be removed before use. In general, if a primitive organism has a particular gene, then organisms of increasing complexity will contain a number of related genes in proportion to their position on the evolutionary scale. This suggests that with time, genes are often duplicated and then evolve their sequences separately.
How is DNA packaged?
To accommodate one and a half metres of double-stranded DNA within a spherical nucleus roughly one thirty-thousandth of this length, it is clear that the DNA has to be packed in a fairly sophisticated manner. Packaging such a long molecule must allow for genes to be accessible, and also for the whole of the DNA molecule to be duplicated so that exact copies can be passed to each daughter cell. At cell division, discrete blocks of DNA, which exist within the nucleus but are not visible as distinct bodies, undergo further levels of coiling and supercoiling, a process called condensation. This produces the discrete chromosomes which are the familiar image of our genetic matter (Figure 8c, d, e). During final chromosome condensation, the nuclear envelope is broken down, and the chromosomes are distributed to each daughter cell (see Chapter 4 for more details). Nuclei are then rebuilt in each daughter cell, during which the rigid and rod-like chromosomes appear to lose their individual identity as they decondense and merge back into the overall structure of the daughter cell nuclei. The question of where chromosomes go in non-dividing (interphase) nuclei was answered a century after their initial discovery thanks to a technique called fluorescence in situ hybridization (FISH), which was developed by Joe Gall and Mary-Lou Pardue in 1969. A related technique called chromosome painting incorporates multiple fluorescent probes, allowing individual chromosomes to be recognized within the interphase nucleus. Chromosome painting shows that each chromosome occupies a distinct territory within the nucleus, usually with attachments at the nuclear lamina. Interphase chromosomes occupy about half of the internal nuclear space, the rest being filled by a host of other nuclear components, such as nucleoli and Cajal bodies (see later).
The contents of the nucleus are by no means fixed, and there is a constant flux and movement of all nuclear components over both long and short distances, which requires energy.
8. DNA and chromosomes. (a) Naked DNA and nucleosomes make ‘beads on a string’, (b) chromatin fibres within the nucleus, (c) a set of human chromosomes (known as a karyotype), (d) chromosomes during final condensation, and (e) human metaphase chromosomes
Although it is ‘naked’ in prokaryotes, DNA in eukaryotic cells is always associated with other molecules, and is packaged via a series of stages. Human DNA is first combined with groups of structural proteins called histones. In the first stage of packaging, DNA becomes wrapped twice around groups of eight histone molecules to form a structure known as a nucleosome, leading to a ‘beads on a string’ appearance (Figure 8a). Adjacent nucleosomes then become attached to each other by another histone called H1, in a zig-zag fashion, forming a fibre ten nanometres in diameter. This fibre is then twisted into a solenoid configuration (a hollow tube 30 nanometres in diameter) called chromatin. Chromatin is the standard configuration of eukaryote DNA packaging (Figure 8b), and exists in two forms called heterochromatin and euchromatin. Heterochromatin is more densely packaged, producing darker staining, and tends to be peripherally distributed within the nucleus (Figure 7a). Much of the DNA in heterochromatin has short nucleotide base sequences that are repeated thousands of times (repetitive DNA) and may have a structural rather than genetic function, to anchor DNA within the nucleus. In contrast, euchromatin is much less condensed and not as densely stained, and comprises almost all of the genetically active part of the DNA. When interphase chromosomes undergo final condensation just before division, the euchromatin and heterochromatin form alternating blocks along the length of the chromosomes, which can be stained to produce a consistent banding pattern. This lengthwise series of subdivisions provides a ‘road map’ that has allowed individual genes to be accurately positioned not only on specific chromosomes, but to specific positions along the length of the chromosome.
During final chromosome condensation, the chromatin is further looped, folded, coiled, and supercoiled, massively reducing the overall DNA length to a point where the packing ratio of DNA in a chromosome at division reaches 10,000 to 1 (Figure 8c, d, e). A good analogy to appreciate this amazing organization would be to take a skipping rope the length of a football field and fold it into an overall length of about half an inch. Recent technological advances have shown that the largest of the human chromosomes (chromosome 1) has DNA with 246 million base pairs, and disruptions in their sequence have been linked to over 350 human diseases including cancers and neurological and developmental disorders.
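To get a feel for what a packing ratio of 10,000 to 1 means, the figures in this paragraph can be run through a rough calculation. Two values in the sketch below are assumptions that do not appear in the text: a football field is taken to be about 100 metres long, and each base pair is taken to add about 0.34 nanometres to the length of the double helix (the usual textbook figure for DNA in its common form). The names are illustrative only.

```python
PACKING_RATIO = 10_000           # from the text: 10,000 to 1 at division
NM_PER_BASE_PAIR = 0.34          # assumption: length added per base pair, not from the text
FOOTBALL_FIELD_M = 100.0         # assumption: approximate length of a football field
CHROMOSOME_1_BASE_PAIRS = 246_000_000  # from the text


def condensed_length(extended_length, packing_ratio=PACKING_RATIO):
    """Length after packing, in the same units as the extended length."""
    return extended_length / packing_ratio


# The skipping-rope analogy: a 100 m rope folded 10,000-fold, in centimetres.
rope_folded_cm = condensed_length(FOOTBALL_FIELD_M) * 100

# Chromosome 1: extended length in nanometres, then packed and shown in micrometres.
chromosome_1_extended_nm = CHROMOSOME_1_BASE_PAIRS * NM_PER_BASE_PAIR
chromosome_1_extended_cm = chromosome_1_extended_nm / 1e7       # 1 cm = 10^7 nm
chromosome_1_condensed_um = condensed_length(chromosome_1_extended_nm) / 1000

print(f"Folded rope: {rope_folded_cm:.1f} cm")
print(f"Chromosome 1 DNA, extended: {chromosome_1_extended_cm:.1f} cm")
print(f"Chromosome 1 DNA, condensed: {chromosome_1_condensed_um:.1f} micrometres")
```

On these assumptions the rope folds down to about one centimetre, which is in line with the 'about half an inch' of the analogy, and the roughly eight centimetres of DNA in chromosome 1 pack into a length of around eight thousandths of a millimetre.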
Now the entire genome has been sequenced, one might suspect that the chromosomes themselves might become less relevant because an individual’s DNA can be analysed, compared with normal, and problems diagnosed essentially by computer. It is worth pointing out that at the time of writing only seven individuals on the planet have had their DNA sequenced. These include Craig Venter, pioneer of DNA decoding, James Watson (fittingly), two Koreans, a Chinese, a Yoruban (a member of an ethnic group from Nigeria), and a leukaemia victim. The cost of sequencing the first human genome in 2003 was around $500 million, and the most recent nearer $250,000. For sequencing to be a feasible diagnostic routine, a cost of one thousand dollars is the target, which may be technologically feasible in the near future. However, the main barrier to the medical use of genomes is that diseases such as cancer, diabetes, or Alzheimer’s are invariably caused by many DNA variations, making it difficult to identify clear targets for either drug intervention or diagnostic indicators, and consequently limiting the idea of personalized medicine based on an individual genome—at least for the time being. Recently, the Wellcome Foundation have announced a project to sequence 1000 genomes, from a mixture of healthy people and those suffering from a variety of medical conditions, generating a statistically significant comparison.
The nucleolus
Separate bodies within the nucleus were first recognized by Felice Fontana in 1774 and named ‘nucleoli’. Nucleoli are the largest of the distinct bodies within the nucleus and each nucleus will have up to five, clearly visible by light microscopy without need for specific staining. Nucleoli are formed from a mixture of proteins and nucleic acids, which are shown by electron microscopy (Figure 7c) to be organized into a ‘tripartite’ internal structure of a fibrillar centre, a dense fibrillar component, and the granular component. Nucleoli are devoted to the production of ribosomes, and the tripartite structure reflects the three events that take place there: transcription of ribosomal RNA (see Chapter 4), processing of ribosomal RNA, and ribosomal assembly. The DNA responsible for these processes is situated on five different human chromosomes at division at sites called nucleolar organizing regions (NORs). These regions come together after division to form three or four nucleoli, where the ribosomal genes are transcribed and ribosomal subunits partially assembled, ready for export from the nucleus. This concentration of genes, transcription machinery, processing, and assembly into one site allows amazing rates of production: dividing human cells make ten million ribosomes in less than a day, so the nucleolus is essentially a ribosome factory, with an efficiency that would have been the envy of Henry Ford.
Because of the demands of ribosome production, the nucleolus is also remarkably responsive to any source of stress that a cell may experience. Heat, cold, osmotic stress, and a variety of drugs all change nucleolar structure, as do viral infection and nutrient starvation. In order to see whether a cell is healthy or not, inspect the nucleolus first.
Cajal bodies, snurposomes, and spliceosomes
In 1906, Ramón y Cajal from Madrid and Camillo Golgi from Pavia shared the Nobel Prize for their work on the structure of the nervous system. Golgi discovered the apparatus or complex that took his name, while Cajal found dense staining bodies close to the nucleolus, which he originally called accessory bodies, and which were subsequently named coiled bodies because of the coiled nature of their main protein, coilin. In 1999, Joe Gall suggested they should be called Cajal bodies. Cajal bodies also contain bodies called Gems and Gall’s wonderfully titled spliceosomes and snurposomes (similar to coiled bodies but restricted to amphibian oocyte nuclei), all involved in the processing of RNA in the nucleus after transcription. Snurposomes contain small nuclear ribonucleoproteins (snRNPs, pronounced ‘snurps’), and spliceosomes are sites where splicing of RNA takes place. In the last few years, many other intranuclear bodies have been identified, although our understanding of their biological function is still limited. Just to mention a few, there are PML bodies (also called Kremer bodies), speckles, paraspeckles, and clastosomes.
Organizing the nuclear interior
From relatively recent research, it has become apparent that far from being a mere repository for DNA, the nucleus is just as varied and dynamic as the cytoplasm in terms of content and activity. The segregation of the cytoplasm by membranes into individual organelles allows routine biochemical separation and analysis. Teasing out different components in the nucleus is trickier, as there are no such ‘bordered’ sub-compartments, although the high density of the nucleolus does permit its isolation from nuclear extracts fragmented by a sonic probe. From this starting point, using mass spectrometry, some 700 human nucleolar proteins have been identified so far in a European collaboration led by Angus Lamond at Dundee University.
Interphase chromosomes occupy around half the total nuclear volume, and are separated from each other by the interchromosome space, which is filled with nucleoplasm, a viscous liquid, equivalent to the cytoplasm outside the nucleus. We now know that, as well as having their own ‘domains’, interphase chromosomes move around the nuclear interior. Gene-rich chromosomes (which have more euchromatin and carry the majority of active genes) tend to be found in the central parts of the nucleus, where most transcriptional activity takes place. Consequently the gene-poor chromosomes (which have more heterochromatin) are more peripherally positioned, and adjacent to the inside of the nuclear envelope where the proteins of the nuclear lamina provide a fibrous network ideal for the anchoring of nuclear contents. If the nuclear lamina is defective, then the normally anchored genetically inactive chromatin might stray into a transcriptionally active region of the nucleus and become inappropriately expressed, as happens in some of the diseases called laminopathies, such as Emery-Dreifuss muscular dystrophy.
Although electron microscopy has produced vast amounts of information on the workings of the cytoplasm, it has been relatively less successful for the nucleus. This is due to the intense packaging and fibrous nature of the nuclear contents, which makes it virtually impossible to follow a length of chromatin over any distance in the thin sections required for transmission electron microscopy. Add the sheer size of the nucleus (1000 times the volume of a mitochondrion) and it would require 200–300 serial sections to be cut, collected, and photographed at around 100 images per section before any three-dimensional reconstruction could be attempted, which is currently not a feasible task. Novel approaches such as the selective removal of components can simplify things. A fibrous structure can be seen by scanning EM after the biochemical removal of DNA and chromatin. This network of fibres running through the nucleus in thicker sections is called the nuclear matrix or scaffold (Figure 7c). Although this type of approach was initially controversial because the extensive biochemical protocols during preparation might create new structures (artefacts) rather than revealing the original organization, the idea of a fibrous supporting network running throughout the nuclear interior (the nucleoskeleton) is now generally accepted.