Background

Genomic sequences are now available for many different organisms which, when combined with biocomputing analysis result in the annotation of most of the coding regions that define the protein repertoire of the living creature. Systematic protein purification experiments revealed that proteins act rarely as single entities but are generally associated into well-defined complexes, 80% of which contain between 5 and 12 distinct proteins [1]. Interestingly, several proteins show some degree of infidelity and can be found in distinct complexes. Moreover the documented complexes correspond only to the most stable molecular interactions that resist the harsh protein purification conditions. Many more transient interactions are likely to occur between proteins and protein complexes to build up the intricate and robust molecular interaction network that governs cell fate.

Macromolecular complexes are therefore at the center of most biological processes. They integrate spatially several catalytic or structural activities with built-in regulatory functions. In most of the cases, conformational changes that range from atomic to molecular scale are instrumental to explain the function of these complexes. Altogether these dynamic properties, associated with the size of the particles ranging between 10 and 40 nm substantiates the name of nanomachines often attributed to these complexes. These nanomachines are targeted by most of the currently available drugs used to cure human diseases but for their vast majority the drugs inhibit a catalytic activity carried by a single subunit. Only in rare occasions the intrinsic mechanical properties or the specific protein-protein interaction network of a complex is targeted by drugs. The ribosome is one of such nanomachines, responsible for protein synthesis and for which several examples of drugs targeting the mechanical properties are at hand [2]. Macrolydes and other antibiotics affect the translocation of the ribosome along the mRNA and thus inhibit protein synthesis. Fusidic acid was shown to prevent the dynamic turnover of the elongation factor G and thus affects the interaction of the ribosome with this regulatory factor. Finally antibiotics such as Dalfopristin or Quinopristin were found to bind to the ribosome exit channel and to block mechanically the progression of the nascent polypeptide. Few other examples of drugs targeting so clearly the intrinsic mechanical properties of a complex were described so far. This is related to the poor structural information available to date on complexes since most of the atomic structures deposited in the protein data bank are single polypeptides.

This tutorial aims at describing the molecular organization of the general transcription factor TFIID as a paramount multi-protein complex and to emphasize the role of cryo-electron microscopy (cryo-EM) and digital image analysis to integrate structural and functional information in order to reach a mechanistic model of the complex.

Methods

Cryo-EM of frozen hydrated molecular complexes

Imaging of single particles by electron microscopy and numerical analysis of image datasets have proven invaluable tools to describe the structural organization of large macromolecular assemblies. Since the discovery of negative stain by Brenner and Horne in 1959, single particles embedded in a layer of heavy atom salts can be visualized through the high contrast provided by the electron-dense material that surrounds the biological macromolecule composed of low atomic number elements, which poorly scatter electrons (Figure 1) [3]. Despite its ability to provide high contrast, to reveal fine structural details and to sustain fragile structures, the negative staining approach is limited to the description of surface features and rarely extends beyond 15-20 Å resolution.

Figure 1
figure 1

Preparation of purified molecular complexes for electron microscopy. In negative stain the specimen is adsorbed on a carbon film, embedded in a layer of heavy metal salts and dried. In frozen hydrated conditions the molecules are embedded in a thin layer of vitrified buffer suspended in a hole of the carbon film. Corresponding electron micrographs are shown (right panels). The bar represents 50 nm.

A major breakthrough was achieved by the discovery in the early 1980' of a robust specimen preparation method that preserved specimen hydration in the vacuum of the electron microscope [4, 5]. The method relies on the fast vitrification of a thin aqueous layer containing the specimen by plunging into a liquid ethane slush (Figure 1). This procedure prevents ice crystal formation that segregates particles and ruins image quality. The frozen hydrated sample has to be observed at low temperature, typically close to liquid nitrogen temperature, to prevent phase transitions and special cold stages were developed for cryo-EM observations. This groundbreaking technology opened new horizons for the observation of macromolecular complexes. It allows unconstrained particle conformations in the absence of any crystal contacts and in close to physiological ionic strength and pH conditions. In contrast to crystallized conditions, in which a particular conformation is selected, a flexible particle will be able to adopt all permitted conformations. Conformational flexibility may be detrimental for structure determination since fine structural details may be averaged out, but cryo-EM records conformational intermediates and thus holds the promise to detect and describe particle dynamics. Early electron diffraction experiments showed that in such frozen hydrated conditions, the structure of the specimen is preserved down to atomic resolutions, thus showing for the first time that electron microscopy images have the potential to reveal the same structural information than X-ray diffraction [6].

The resolving power of modern electron microscopes is sufficient to image single atoms. In material sciences, the specimen is very stable and a huge number of electrons can interact with the sample often without affecting its structure. As a result, individual atoms can be detected with a good statistical significance or signal to noise ratio (SNR). Imaging of biological samples fully benefit from the most recent developments in instrumentation such as field emission guns which give a much brighter and more coherent electron beam, detectors and microscope automation. Specific instrumentation is needed to observe frozen hydrated samples which includes cold stages to keep the specimen temperature below -170°C, anti-contamination devices to prevent deposition of traces of water present in the microscope onto the cold specimen as well as low-dose imaging protocols to avoid irradiation of the specimen before data acquisition. Structural damage induced by the electron beam is a strongly limiting factor for biological specimen. Inelastic interactions of incident electrons with sample atoms dissipate energy that can break covalent bonds and generate highly reactive side chains. It is generally accepted that the atomic structure of the specimen is preserved when electron doses are kept below 5 electrons per square angstrom (e-2), however this number varies with the acceleration voltage of the electrons - at 300 kV it can be up to 25 e-2 [7]. In these conditions the molecular images are so noisy that the fine structural details cannot be detected. As a rule of thumb, at an electron dose of 5 e-2, details in the range of 50 Å can be detected with a SNR of two while smaller details are below this detection limit. To reconcile low specimen irradiation which leads to noisy images, with a high SNR objective to detect small details, it is necessary to split the dose required to detect atoms (say 2000 e-2) over several independent particles (in this case 400) to kept the dose below 5 e-2 and to add-up the signal coming from all these images.

The ongoing development of highly sensitive direct detection cameras and single electron counting devices are important to record highly enlarged images of biological complexes with the best quantum detection efficiency and with reduced noise [8]. Automation of cameras and electron microscopes facilitates the recording of several hundreds to thousands of images per day each containing 100-200 particles thus generating huge image datasets which, as we will describe, will be of importance to reach the final spatial resolution [9]. The need for reduced electron irradiation also led to dedicated "low dose" data acquisition strategies in which microscope adjustments such as focus and astigmatism corrections are performed on an area remote of the area of interest to be recorded "blindly".

Single particle image analysis

The objectives of single particle image analysis are dual [10]. The first goal is to improve the SNR of the original images by averaging the signal from independent particles. Assuming a Gaussian distribution of all sources of noise that can affect the original molecular image (statistics of electron-matter interaction, detector noise, cosmic rays, etc...), image averaging will improve the SNR and increase the spatial resolution that can be detected. The second goal of image analysis is to reach a volumetric description of the sample. Standard transmission electron microscopes provide 2-D projections of the 3-D electron density map of the sample, multiplied by a microscope-specific Contrast Transfer Function, which has to be corrected for. The objective is to determine the projection (or viewing) direction of each 2-D image with respect to the 3-D object it originates from and to reconstruct a 3-D model by combining many 2-D views. A brief overview of the image analysis protocol is shown in Figure 2.

Figure 2
figure 2

Schematic representation of the key steps in the analysis of single particle images.

Alignment and clustering

Images of a same particle can be averaged to improve the SNR only if two criteria are met: firstly they have to correspond to the same view or projection of the particle and secondly the images have to be in the same register, or in other words aligned in translation and in rotation one with respect to the others. The spatial resolution that can be reached will depend on the number of images that can be averaged, on how similar the views are, and on the alignment quality. If a tolerance of 10° in viewing direction is accepted, the finest dimension that can be resolved for a globular particle with 10 nm in diameter cannot be smaller than 5.sin10° = 0.8 nm. To reach a resolution of 0.2 nm the variation in viewing direction cannot be larger than 2.2°.

A molecular image is aligned against a reference image by correlating image intensities. The correlation coefficient between two images is a measure of their similarities and all possible translations and rotations will be explored to find the correlation maximum, which will be considered as their best alignment. The quality of the alignment depends on many parameters such as the initial SNR of the image, the size and the shape of the particle.

In a real image data set, the particles have different orientations that lead to different views that need to be separated before calculating an average image. The images need therefore to be clustered into groups containing the most similar images. The image intensity variance should be minimized within the same group, while it should be maximized between different groups. In practical terms the image data set is first subjected to a multivariate statistical analysis (Principal Component Analysis or Correspondence Analysis) to detect the most meaningful trends in the data set and the clustering is then performed on the most significant Eigenvectors using Hierarchical Ascendant Classification schemes.

In an ideal image data set, the particles are randomly oriented which will produce an infinite number of projections. This condition is not always met when particles are adsorbed on a supporting carbon film, which may lead to preferred orientations. Nevertheless, the number of different orientations is very large and it is virtually impossible to find two perfectly identical particle images. It is therefore important to consider an angular projection sector within which we consider the images to be identical at a defined spatial resolution. An image class can to a first approximation be considered as a group of molecular images viewed along the same angular sector. If we consider the above mentioned 10 nm globular particles, a 10° tolerance in projection angle will result in an uncertainty of 0.8 nm. A projection sector of 10° leads to 244 different views and the dataset should be separated in as many classes.

The alignment and clustering steps are highly interdependent and will be used iteratively to improve the quality of the class averages. A better alignment will lead to an improved clustering which will impact the resolution of the class averages. Such high resolution class averages will further improve the alignment of the original images in a multi-reference alignment protocol.

Three-dimensional model building

The class averages correspond to distinct views of the particle but their projection direction is not known a priori. A common-line based method was designed to attribute the relative projection directions of a set of class averages [11], but this method may lead to ambiguous results especially when several conformations of the particle coexist. Two experimental methods, based on the acquisition of tilted images of the same particle have been developed.

In the tomography approach a goniometric electron microscopy stage is used to record tilted views of the same object, typically between +70° and -70° with angular increments of 1 or 2° [12]. After alignment of the images on a common origin, a 3-D model can be calculated for each particle by combining all views for which the exact projection direction is experimentally determined by the position of the tilt axis and the tilt angle. This method suffers from several drawbacks that have been partially addressed. Electron dose and therefore radiolytic damage accumulates during the sequential acquisition of around 140 images of the same particle, but the development of very sensitive low noise cameras restricts the total accumulated dose to 20 to 40 e-2, enough to reach a resolution of 3-4 nm. The data collection scheme produces a wedge of missing projections and this problem can be overcome by turning the grid 90° in plane and by recording a second tilt series. The missing wedge will then be reduced to a missing pyramid and the quality of the reconstructed volume is generally improved. Alternatively independently reconstructed single particle volumes can be aligned in 3-D and averaged. Since for each orientation of the particle the missing information is different, the averaged volume is essentially devoid of missing wedge artifacts.

In the random conical tilt method, the data collection strategy consists in recording first a 45-60° tilted view of an electron microscopy field containing several particles and, in a second exposure, an untilted view of the same field [13]. The untilted images are analyzed as single particles thus producing classes containing several images of similarly oriented particles each differing by their in-plane or azimuthal angle. This angle, the position of the tilt axis and the tilt angle, informs about the viewing direction of each corresponding tilted image and allows calculating a 3-D model for each class of images. With this data collection strategy, irradiation is limited to a single exposure and the missing information is restricted to a cone.

Model refinement

The experimental 3-D models are considered as low resolution "starting models" that will be used to determine the viewing direction of independently determined class averages obtained from a much larger image dataset. The starting models will be computationally "reprojected" along many directions to generate a set of reference images of known projection direction. The subsequent alignment of the class averages, or of the original images, against these reprojections in a process called reference-matching will determine the viewing direction for each high resolution class average and lead to an improved, or refined, 3-D model.

Address the dynamic properties of the complexes

The fast vitrification of the specimen in liquid ethane preserves the hydrated state of the protein complexes, but also cryo-immobilizes their different conformational states. This heterogeneity can hinder high-resolution structure refinement if different conformations are combined in a single class; however it contains essential information about the dynamic properties of the sample. For isolated particles, different conformations can be sorted out computationally when the data set is large enough, thus providing information on mobile parts of the complex. For transient multi-component systems, the relative abundance of the components present in an equilibrium state informs about the interaction constants. It is therefore crucial to detect and separate the conformational states of the specimen both to improve the resolution of each individual state and to describe the dynamics of the examined protein complex. Several methods exist to detect and visualize de novo structural heterogeneities in the specimen [14]. Rough movements of domain can be detected by either single-particle tomography or random conical-tilt experiments. More subtle differences can be tracked by using Eigen-analysis of resampled cryo-EM images [15, 16]. In this method the images dataset has to be aligned to an average reference structure to determine the relative particle orientation. A large number of volumes is built from a randomly created subset of the dataset and these volumes are subjected to multivariate statistical analysis followed by hierarchal classification to identify the structural differences.

Results

The general transcription factor TFIID

Gene expression programs in metazoans are under tight control to achieve growth, development and differentiation of the tissues that make up living organism. A large extend of regulation is performed at the transcriptional level when the information carried by specific DNA sequence (the genes) is transcribed into a messenger RNA molecule (mRNA) by the RNA polymerase II enzyme. Misregulated gene expression underlies many human pathologies, as indicated by germ-line and somatic mutations in transcription regulatory genes that lead to genetic disease [1720], developmental syndromes[21], neurological diseases [22], epigenetic perturbations [23] and cancer [21, 24]. Most intensively studied is the initiation step, which determines which genes are turned on to express a specific piece of genetic information in response to external signaling events. Initiation of transcription is controlled by a large number of multiprotein complexes whose action results in the assembly of a transcription Preinitiation Complex (PIC) on the promoter DNA upstream of the coding sequence. The ultimate goal of the PIC is to position the RNA polymerase II at the transcription start site and to initiate the synthesis of mRNA [25, 26].

The general transcription factor TFIID is a key player in the initiation process since it is the first factor to interact with promoter DNA and directs the following steps that result in the onset of transcription [27]. This 1 MDa TFIID multiprotein complex contains a protein recognizing the TATA-box in the gene promoter (TBP) and 13 T BP A ssociated F actors (TAFs) whose sizes vary between 10 and 250 kDa. To modulate gene expression in response to external signals, small activator or repressor proteins bind upstream of gene promoters and recruit the transcriptional coactivators and the general transcriptional machinery. In this process human TFIID not only recognizes the promoter DNA region of genes but also acts as a transcriptional coactivator by interacting with several such activators like p53, Sp1 and c-Jun [28]. TFIID thus acts as a bridge between transcriptional activator proteins and the PIC.

Structure of TFIID, a hybrid approach

How is gene transcription initiated and what is the role of activators and co-activators in this process? How is the activation signal transmitted from the activator to the general transcription factors and finally to the RNA polymerase? How do cells integrate and respond to regulatory signals? What sets different gene expression levels in specific cell types? Which errors in this process lead to disease? To answer these questions the functions of TFIID have to be explained in mechanistic details by determining its biophysical and structural properties.

A large body of structural data is available at atomic scale for single subunit TAF domains and small TFIID sub-assemblies such as the TATA box binding protein [29], the histone-fold containing TAF heterodimers [3032], the N-terminus of TAF5 [33], The HEAT repeats of TAF6 [34] or the TAF1 bromodomain [35]. Single crystal X-ray diffraction or NMR spectroscopy have determined the atomic structures of parts of TFIID that sum up to about 40% of its total mass but little is known about the organization of these bricks into a functional TFIID assembly that is active in transcription. The full complex is reluctant to crystallization since this large multisubunit complex is difficult to produce in large quantities and with purity suitable for crystal growth. This observation is general to most fields of biology and led to the development of so-called hybrid methods that integrate structural information from different sources. Cryo-electron microscopy is instrumental to this integration since it provides medium (10-20 Å) resolution maps of large complexes into which atomic-scale information obtained by X-ray crystallography or NMR spectroscopy can be fitted [36].

Low-resolution studies by negative-stain [37, 38] and, more recently cryo-EM [39] have revealed the general shape of TFIID and allowed approximate localization of several subunits by means of antibody labeling [40, 41]. Samples used in these studies were prepared from endogenous sources and resulted in spatial resolutions that were seriously hampered by the dynamic properties, heterogeneous nature and the low abundance of material. The lack of recombinant TFIID complexes of suitable quality and quantity for molecular level studies has been an insurmountable bottleneck to date for structural but also for functional studies. Recent developments in recombinant protein production were instrumental for solving the structure of the core-TFIID complex at 11 Å resolution most probably because several sources of heterogeneity arising from TAF isoforms and posttranslational modifications were reduced [42].

To gain insights into the function of TFIID, the interaction of yeast TFIID with the promoter DNA was studied in the presence of TFIIA (a general transcription factor required for specific recognition of the TATA element) and the Rap1 activator [43]. The Cryo-EM results revealed the network of interactions and the conformational changes occurring during complex formation. The path of DNA was detected in the complex and these findings extended our understanding on the DNA recognition modalities by TFIID in the presence of trans-acting factors. The resulting structure has shed new light on the intramolecular communication pathways conveying transcription activation signals through the TFIID coactivator (Figure 3). This study revealed an interaction between TFIIA and Rap1 that form a protein bridge between TBP and the DNA-bound Rap1 which results in a large change in the position of TFIIA and of TBP. Interestingly, the concomitant binding of promoter DNA to TFIID-bound Rap1 and to TBP loops out the intervening DNA, thereby accommodating variable distances between Rap1 binding sites and transcription start site.

Figure 3
figure 3

Structure of TFIID. (A) Structure of TFIID and of its complexes with TFIIA, DNA and the Rap1 activator. (B) Structure of the core-TFIID at 11.4 Å resolution and fitting of the known atomic structures. (C) Position of the core-TFIID within the complete TFIID complex. The bar represents 7.6 nm in (A), 5 nm in (B) and 5.8 nm in (C)

Conclusions

The development of cryo-EM and image analysis software has provided new insights into the structural organization and the dynamic reorganization of large macromolecular complexes. Recent improvements in electron microscopy instrumentation allow for automated processing and recording of large image datasets, with an improved image quality due to more stable cold stages and advanced electron optics. With these developments unprecedented close to atomic resolutions were obtained for highly symmetric biological assemblies such as icosahedral viruses. It can be anticipated that the analysis of large datasets as well as new data acquisition strategies that compensate for particle movement during acquisition, will routinely provide molecular models better than 5A in the near future. The analysis of molecular flexibility still requires algorithmic developments to describe concomitantly the high resolution structure and the continuous conformational space of a macromolecular complex. The unique asset of Cryo-EM however resides in the possibility to record images of single particles which collectively contain both structural and dynamic information.