Proximity-based proteomics reveals the thylakoid lumen proteome in the cyanobacterium Synechococcus sp. PCC 7002

Cyanobacteria possess unique intracellular organization. Many proteomic studies have examined different features of cyanobacteria to learn about the intracellular structures and their respective functions. While these studies have made great progress in understanding cyanobacterial physiology, the conventional fractionation methods used to purify cellular structures have limitations; specifically, certain regions of cells cannot be purified with existing fractionation methods. Proximity-based proteomics techniques were developed to overcome the limitations of biochemical fractionation for proteomics. Proximity-based proteomics relies on spatiotemporal protein labeling followed by mass spectrometry of the labeled proteins to determine the proteome of the region of interest. We performed proximity-based proteomics in the cyanobacterium Synechococcus sp. PCC 7002 with the APEX2 enzyme, an engineered ascorbate peroxidase. We determined the proteome of the thylakoid lumen, a region of the cell that has remained challenging to study with existing methods, using a translational fusion between APEX2 and PsbU, a lumenal subunit of photosystem II. Our results demonstrate the power of APEX2 as a tool to study the cell biology of intracellular features and processes, including photosystem II assembly in cyanobacteria, with enhanced spatiotemporal resolution. Supplementary Information The online version contains supplementary material available at 10.1007/s11120-020-00806-y.


Introduction
The intracellular spatial organization of cyanobacteria is unique among prokaryotes. As Gram-negative bacteria, cyanobacteria possess the typical inner and outer membrane systems enclosing a cell wall composed of peptidoglycan. However, most cyanobacterial species also possess thylakoid membranes, an extra set of intracellular membranes where photosynthesis occurs, as well as carboxysomes, proteinaceous organelles used for carbon fixation. The distinctive intracellular spatial organization and protein complexes found within cyanobacteria have drawn particular interest to the cell biology of these organisms. Furthermore, cyanobacteria can also be used as a model for plant chloroplasts, as they share structural and biochemical similarities and have a common evolutionary ancestor. As a result, many proteomic studies of specific cyanobacterial structures, e.g., thylakoid membranes, have been performed (Agarwal et al. 2010; Baers et al. 2019; Cheregi et al. 2015; Fulda et al. 2000; Gao et al. 2014a; Herranen et al. 2004; Huang et al. 2002, 2004, 2006; Kashino et al. 2002; Kurian et al. 2006a; Li et al. 2012; Liberton et al. 2016; Oliveira et al. 2016; Pisareva et al. 2007, 2011; Rajalahti et al. 2007; Rowland et al. 2010; Sergeyenko and Los 2000; Srivastava et al. 2005; Trautner and Vermaas 2013; Wang et al. 2000; Zhang et al. 2009). These studies have made great progress towards understanding the physiology of cyanobacteria, but lack the spatial resolution necessary to resolve the composition of many intracellular structures resistant to traditional biochemical fractionation and purification methodologies.
Previously, proteomic studies of cyanobacterial components were limited to fractionation and separation techniques, which could introduce artifacts and result in ambiguous cellular localizations. For example, mechanical disruption of cells often leads to cross-contamination between fractions and is, therefore, impractical for nonmembrane-bound regions or complex structures such as the thylakoid lumen. However, a technique termed proximity-based proteomics was recently developed in mammalian cells to allow for proteomic analysis of cellular regions or protein interactomes that could not be purified using existing techniques (Kim and Roux 2016). Proximity-based proteomics relies on targeting a specific enzyme to a region of interest as a protein fusion to a full-length protein or signal sequence. The enzyme then performs chemistry in live cells to label proteins within a small radius (10-20 nm) of itself (Rhee et al. 2013). After cell lysis, the labeled proteins can then be separated from unlabeled proteins and analyzed using mass spectrometry. Several proximity-based proteomics techniques exist, but the most common use enzymes that biotinylate proteins (Kim and Roux 2016). We chose to use APEX2, an engineered ascorbate peroxidase that catalyzes a reaction between biotin-phenol (BP) and hydrogen peroxide (H2O2) to create a BP radical that covalently attaches to proteins (Hung et al. 2016; Lam et al. 2015) (Fig. 1a). The reactivity and short half-life of the BP radical give this technique high spatial specificity. Furthermore, APEX2 has been shown to be catalytically active in multiple cellular compartments and exhibits a short (1 min) labeling time, allowing for high temporal specificity (Hung et al. 2016; Lam et al. 2015).
Here, we demonstrate the feasibility and potential of a proximity-based proteomics technique using APEX2 in Synechococcus sp. PCC 7002 (PCC 7002), a model cyanobacterium and promising chassis for biotechnological applications (Markley et al. 2015; Ruffing et al. 2016; Xu et al. 2011). To showcase the ability of APEX2 to interrogate regions of the cell where proteomics studies have not yet been possible due to limitations of existing biochemical methods, we targeted APEX2 to the thylakoid lumen by fusing it to PsbU, an extrinsic photosystem II (PSII) protein (Nishiyama et al. 1998), and identified the PsbU-associated proteome by mass spectrometry. Determining the thylakoid lumen proteome is vital for understanding the physiological roles of the thylakoid membrane system and the reactions of oxygenic photosynthesis.

Fig. 1 APEX2-dependent labeling specifically biotinylates proteins in PCC 7002. a APEX2 reacts with BP in the presence of H2O2 to produce a BP radical. Biotinylated proteins are generated when the BP radical reacts with peptides, forming a covalent bond. b Cells expressing GFP and GFP-APEX2 (green) imaged using fluorescence microscopy. Scale bars are 2 µm. Chlorophyll channel (red) indicates thylakoid membrane. c 5 µg of protein from cells expressing either GFP or GFP-APEX2 was separated by SDS-PAGE and transferred to a membrane for immunoblot analysis using streptavidin to detect APEX2 activity. Anti-RbcL antibody was used as a loading control, and the same membrane was stripped and re-probed with anti-GFP antibody to check for expression of GFP (28 kDa) or GFP-APEX2 (54 kDa).

Characterization of APEX2 labeling in PCC 7002
To determine if APEX2-dependent labeling of proteins was possible in cyanobacteria, GFP or GFP-APEX2 was incorporated into the genome of PCC 7002. Cytoplasmic localization of GFP and GFP-APEX2 was confirmed using fluorescence microscopy (Fig. 1b). To perform APEX2-dependent biotinylation, cells were incubated with BP for 30 min and then exposed to H2O2 for 1 min. After quenching the reaction, cells were lysed by bead beating and a streptavidin blot confirmed the ability of APEX2 to biotinylate proteins in PCC 7002 (Fig. 1c). Biotin labeling was only detected in the presence of APEX2, BP, and H2O2, demonstrating reaction specificity in vivo. Furthermore, the rapid reaction enables precise temporal control of labeling.

Purification of cytoplasmic APEX2-biotinylated proteins from PCC 7002
Proteins biotinylated in vivo were enriched for further analysis by affinity purification. APEX2-dependent biotinylation was performed in cells expressing GFP or GFP-APEX2 in the cytoplasm. Affinity purification of biotinylated proteins was performed by incubating cellular lysates with streptavidin-coated magnetic beads. The background level of biotinylation was very low as biotinylated protein was only detected in cells expressing GFP-APEX2, but not cells expressing GFP alone (Fig. 2a, b). To confirm cytoplasmic APEX2 labels cytoplasmic proteins, immunoblots using antibodies against expected cytoplasmic proteins were performed (Fig. 2c, d). Since the BP radical reacts with proteins within a 10-20 nm radius of its origin, APEX2 itself is expected to be biotinylated. Biotinylated GFP-APEX2 fusion protein was detected using an anti-GFP antibody, confirming the expected self-reactivity (Fig. 2c). Additionally, the large subunit of rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase), RbcL, an abundant cytoplasmic protein, was only enriched on beads incubated with cells expressing GFP-APEX2 as detected using a specific anti-RbcL antibody (Fig. 2d). The high molecular weight RbcL band in lysates is likely the result of higher-order complexes formed in vivo; RbcL assembles into large protein assemblies to form the carboxysome, a bacterial microcompartment (Cameron et al. 2013). Following the more stringent enrichment and elution process, these complexes have been disrupted and RbcL migrates as expected.

PsbU-APEX2 and cytoplasmic APEX2 label different sets of proteins
APEX2 was fused to a protein localized to the thylakoid lumen to demonstrate the ability of proximity-based proteomics to interrogate subcellular regions that have not been successfully purified using traditional methods. To accomplish this, the localizations of several candidate proteins fused to GFP were examined by fluorescence microscopy. Of these candidates, PsbU, an extrinsic subunit of PSII, exhibited the most promising localization and therefore was selected to target APEX2 to the thylakoid lumen. The PsbU-APEX2 gene fusion is expressed from neutral site 1 in the chromosome under a constitutive promoter. APEX2-dependent labeling and biotinylated protein purification were performed on cells expressing thylakoid lumenal PsbU-APEX2 and cells expressing cytoplasmic GFP-APEX2. A silver stain of purified biotinylated proteins from GFP-APEX2 and PsbU-APEX2 shows different banding patterns, suggesting that a different set of proteins is labeled by the different APEX2 fusions (Fig. 3a). The thylakoid localization of PsbU-GFP was confirmed using fluorescence microscopy (Fig. 3b). The localization of PsbU-GFP was used as a proxy for the localization of PsbU-APEX2, since GFP and APEX2 are both C-terminal tags of a similar size. To identify the proteins labeled by the different APEX2 fusion proteins, biotinylated proteins were purified from two independent samples of both PsbU-APEX2 labeled and GFP-APEX2 labeled cells, and the resulting peptides following tryptic digestion were separated and detected using LC-MS/MS. Protein identification required a minimum of 2 spectral counts and 2 peptides in each sample. 99 proteins were identified exclusively in both PsbU-APEX2 replicates and 297 proteins were identified exclusively in both GFP-APEX2 replicates. 438 proteins were identified in both PsbU-APEX2 and both GFP-APEX2 replicates (Fig. 3c).

Fig. 2 [...] in vivo. Cells expressing GFP or GFP-APEX2 were incubated with BP and exposed to H2O2. Biotinylated proteins were captured from cell lysates on streptavidin-coated magnetic beads. Fractions from each enrichment step were separated by SDS-PAGE and then silver stained for contrast or transferred to a nitrocellulose membrane and probed with specific antibodies. a Silver stain of noted fractions from unlabeled (GFP) or labeled (GFP-APEX2) lysates. b Biotinylated proteins are only detected in fractions containing APEX2 and are enriched on streptavidin beads. c Expected self-labeling (biotinylation) of GFP-APEX2 (54 kDa, marked with *) is confirmed by immunoblotting against GFP. d RbcL (55 kDa), a cytoplasmic protein expected to be labeled by GFP-APEX2, was specifically captured on beads incubated with GFP-APEX2.

Biotinylated proteins enriched in PsbU-APEX2 samples
Mass spectrometry data were further analyzed to determine which proteins were labeled by PsbU-APEX2. PsbU is a lumenal extrinsic subunit of PSII and therefore the majority of PsbU-APEX2 is expected to be localized to the thylakoid membrane or lumen. However, because PsbU-APEX2 is translated in the cytoplasm and then translocated to its final destination in the lumen, we also expected that a small population of PsbU-APEX2 could be present in the cytoplasm, resulting in labeling of cytoplasmic proteins. Therefore, GFP-APEX2 was used as a control instead of a sample lacking APEX2/BP/H2O2, since it would control for the small cytoplasmic population of PsbU-APEX2 in addition to proteins nonspecifically bound to the streptavidin beads and endogenously biotinylated proteins.

Fig. 3 PsbU-APEX2 and cytoplasmic APEX2 label different sets of proteins. a Silver stain of the biotinylated protein purification from PCC 7002 expressing GFP, GFP-APEX2, PsbU, PsbU-GFP, or PsbU-APEX2 after APEX2-dependent biotinylation. b Localization of PsbU-GFP and GFP-APEX2 visualized with fluorescence microscopy (green). Chlorophyll channel (red) indicates thylakoid membrane. Scale bars are 2 µm. c Biotinylated proteins from strains expressing GFP-APEX2 and PsbU-APEX2 identified by mass spectrometry. d Functional categories of the proteins enriched in PsbU-APEX2 samples obtained from quantitative analysis of mass spectrometry data (number of proteins; percentage of 123 total proteins). The proteins used for this analysis are listed in Table 1. (Also see Supplementary Tables 1 and 2)
The mass spectrometry data were analyzed using MaxQuant Label Free Quantitation (LFQ) intensities and normalized spectral counts to determine the identity of proteins specifically enriched with PsbU-APEX2 compared to the GFP-APEX2 control (Old et al. 2005). As part of this analysis, proteins were organized by descending enrichment value (log2(PsbU-APEX2 LFQ intensity/GFP-APEX2 LFQ intensity) or log2(PsbU-APEX2 normalized spectral counts/GFP-APEX2 normalized spectral counts)). A true-positive list was constructed from PCC 7002 proteins homologous to Synechocystis sp. PCC 6803 (PCC 6803) proteins with evidence for thylakoid lumen or thylakoid membrane localization (Agarwal et al. 2010; Aldridge et al. 2008; Baers et al. 2019; Fulda et al. 2002; Heinz et al. 2016; Herranen et al. 2004; Kashino et al. 2002, 2006; Komenda et al. 2006; Liberton et al. 2016; Ohkawa et al. 2002; Pisareva et al. 2011; Rajalahti et al. 2007; Rengstl et al. 2011; Rowland et al. 2010; Sacharz et al. 2015; Schultze et al. 2009; Srivastava et al. 2005; Wang et al. 2000; Xu et al. 2008; Zak et al. 1999, 2001; Zhang et al. 2004). A false-positive list of PCC 7002 proteins was constructed from homologous proteins found in the soluble proteome of PCC 6803 that do not have signal sequences or transmembrane helices, as these proteins are expected to be cytoplasmic (Baers et al. 2019; Choi et al. 2000; Fulda et al. 2006; Fuszard et al. 2013; Gan et al. 2005; Gao et al. 2009, 2014b, 2015; Kurian et al. 2006b; Mata-Cabana et al. 2007; Mehta et al. 2014; Mikkat et al. 2014; Pandhal et al. 2009; Pérez-Pérez et al. 2006; Plohnke et al. 2015; Rowland et al. 2011; Simon et al. 2002; Slabas et al. 2006). As expected, proteins from the true-positive list have significantly higher enrichment values than proteins from the false-positive list (Fig. S2).
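The enrichment value described above can be illustrated with a minimal Python sketch. The protein names and intensities below are hypothetical placeholders, not data from this study, and the handling of zero values (returning an infinite ratio when a protein is absent from one sample) is an assumption made for illustration rather than a description of the analysis pipeline actually used.

```python
import math

def log2_enrichment(psbu_value, gfp_value):
    """Return log2(PsbU-APEX2 / GFP-APEX2) for one protein.

    Assumed zero handling (illustrative only): a protein absent from the
    GFP-APEX2 sample gets +inf, one absent from PsbU-APEX2 gets -inf,
    and a protein absent from both gets NaN.
    """
    if psbu_value == 0 and gfp_value == 0:
        return math.nan
    if gfp_value == 0:
        return math.inf
    if psbu_value == 0:
        return -math.inf
    return math.log2(psbu_value / gfp_value)

# Hypothetical (PsbU-APEX2, GFP-APEX2) LFQ intensities; placeholders only.
proteins = {
    "protein_A": (8.0e7, 1.0e7),
    "protein_B": (2.0e7, 2.0e7),
    "protein_C": (5.0e6, 0.0),
    "protein_D": (1.0e6, 4.0e6),
}

# Organize proteins by descending enrichment value, as in the text.
ranked = sorted(
    ((name, log2_enrichment(u, g)) for name, (u, g) in proteins.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, value in ranked:
    print(f"{name}\t{value:.2f}")
```

The same function applies unchanged to normalized spectral counts, since only the ratio between the two samples enters the calculation.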
Using the true- and false-positive lists, we identified a cutoff value to discriminate between enriched proteins and those that bound to the beads non-specifically or were enriched by GFP-APEX2. This analysis was performed using both enrichment values for both PsbU-APEX2 replicates (Table S1). Therefore, two analyses were performed on each PsbU-APEX2 replicate, one using enrichment values calculated with LFQ intensity values and a second using enrichment values calculated with normalized spectral counts. To be as stringent as possible, only the 123 proteins above the cutoff in all four analyses were reported, which we called PsbU-APEX2-enriched proteins (Table 1). The PsbU-APEX2-enriched proteins include a subset of the 99 proteins exclusive to the PsbU-APEX2 replicates, as well as additional proteins enriched in abundance over the GFP-APEX2 replicates. Major functions of enriched proteins are shown in Fig. 3d.
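The "above the cutoff in all four analyses" rule amounts to a set intersection, sketched below in Python. The analysis labels, cutoff values, and enrichment values are hypothetical, and the use of a separate cutoff per analysis is an assumption; the text does not state whether one cutoff or several were applied.

```python
def enriched_in_all(analyses, cutoffs):
    """Return proteins whose enrichment exceeds the cutoff in every analysis.

    analyses: {analysis_label: {protein: log2 enrichment value}}
    cutoffs:  {analysis_label: cutoff value} (per-analysis cutoffs are assumed)
    """
    passing = [
        {name for name, value in values.items() if value > cutoffs[label]}
        for label, values in analyses.items()
    ]
    return set.intersection(*passing)

# Hypothetical values: two replicates (A, B) x two metrics (LFQ, spectral counts).
analyses = {
    "A_LFQ":    {"p1": 3.2, "p2": 1.1, "p3": 2.5},
    "A_counts": {"p1": 2.8, "p2": 2.4, "p3": 2.1},
    "B_LFQ":    {"p1": 3.0, "p2": 2.6, "p3": 0.4},
    "B_counts": {"p1": 2.2, "p2": 2.3, "p3": 2.9},
}
cutoffs = {"A_LFQ": 2.0, "A_counts": 2.0, "B_LFQ": 2.0, "B_counts": 2.0}

enriched = enriched_in_all(analyses, cutoffs)
print(sorted(enriched))
```

Requiring all four analyses to agree trades sensitivity for stringency: in this toy input, p2 and p3 each fail a single analysis and are therefore excluded.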
The list of 123 PsbU-APEX2-enriched proteins includes many proteins expected to be present within the thylakoid lumen and membrane (Table 1). The majority of proteins (73) have PCC 6803 homologs previously localized to the thylakoid membrane or lumen (see Table 1). Out of the 50 proteins that have not been previously localized to the thylakoid membrane, 17 have no PCC 6803 homolog, 12 have no localization data for specific cellular structures or regions, and 21 have only previously been localized to somewhere other than the thylakoid membrane or lumen, such as the plasma membrane or periplasm. This analysis of previous localizations of homologous proteins in the literature was performed in lieu of experimental validation of the localization of enriched proteins. There is no other method to biochemically separate the thylakoid lumen from other intracellular structures, and while fluorescence microscopy of GFP-tagged proteins could be used to determine if a protein associates with the thylakoid membranes, it does not have the resolution to determine if a protein is on the cytoplasmic or lumenal side of the thylakoid membrane. Previous localizations of homologous proteins were used because most localization studies in cyanobacteria have been done in other species, specifically PCC 6803, and very few have been completed in PCC 7002. To further support the hypothesis that PsbU-APEX2-enriched proteins are part of a cellular compartment and not cytoplasmic, the presence of signal sequences and transmembrane helices was predicted from their protein sequences (see Table S2). The majority (105) of enriched proteins possess either a signal sequence or at least one transmembrane helix.
cytoplasmic side of the thylakoid membrane demonstrates the specificity of PsbU-APEX2 to label proteins within the lumen and thylakoid membrane. The cytoplasmic facing subunits that were enriched in the PsbU-APEX2 samples are PsaC and PsaD. These subunits and PsaE are within the top 15% of proteins ranked by membrane association, and are more tightly associated with the membrane than the phycobilisome proteins and the cytoplasmic subunits of NDH and ATP synthase (Gao et al. 2015). PsbU-APEX2 will be more efficient at labeling cytoplasmic side proteins closely associated with the thylakoid membrane, like PsaC and PsaD, since proteins closely associated with the thylakoid membrane are within the biotinylation radius of lumenal PsbU-APEX2 for more time. Following that same logic, freely diffusing cytoplasmic GFP-APEX2 is likely more efficient at biotinylating freely diffusing cytoplasmic proteins than proteins closely associated with the thylakoid membrane.

Many factors involved in the assembly of PSII were also PsbU-APEX2 enriched (Fig. 5). Proteins both early and late in the assembly process were enriched. SecY and Alb3, proteins involved in inserting PsbA into the membrane, were enriched (Chidgey et al. 2014; Linhartová et al. 2014; Sachelaru et al. 2013). PratA, a protein that is thought to deliver Mn²⁺ to PsbA, and CtpA, which processes the C-terminal tail of PsbA, have previously been localized to the periplasm and plasma membrane, respectively, but were exclusively found in PsbU-APEX2 samples in this study (Anbudurai et al. 1994; Klinkert et al. 2004; Komenda et al. 2006; Schottkowski et al. 2009; Stengel et al. 2012; Zak et al. 2001). PsbP, Ycf48, and Psb27 are PSII assembly factors enriched by PsbU-APEX2 that are thought to be localized within the thylakoid lumen (Heinz 2016). The assembly factors Ycf39 and Psb28, along with the PSII repair factor Psb29, are on the cytoplasmic side of membranes and were not enriched by PsbU-APEX2 (Becková et al. 2017; Heinz 2016). The lumenal proteins YtfC and A2294 (homologous to sll0408 in PCC 6803) are homologs to factors important for PSII assembly in plants that were also PsbU-APEX2 enriched (Heinz 2016). Proteins involved in PSII repair were also enriched by PsbU-APEX2. For example, Psb32, a protein that protects PSII from photodamage and aids in PSII repair, was exclusive to PsbU-APEX2 samples (Wegener et al. 2011). Additionally, FtsH2, a protein involved in the repair of damaged PSII, was also enriched in PsbU-APEX2 samples (Komenda et al. 2006, 2010). PsbQ, a protein present in the most active PSII fraction that is thought to define the complete assembly of PSII, was also exclusive to PsbU-APEX2 (Roose et al. 2007). The variety of early and late assembly factors enriched by PsbU-APEX2 demonstrates the ability of APEX2-based proximity-based proteomics to capture assembly intermediates of protein complexes of low abundance. In the future, this technique could be used to gain novel insights into low abundance assembly intermediates of protein complexes in other processes.

Fig. 4 Enrichment of Protein Complex Subunits in the Thylakoid Membrane. The protein complexes present in the thylakoid membrane are color-coded by their enrichment; the key is located on the right side of the figure. Light and dark green subunits are both enriched in the PsbU-APEX2 samples over the GFP-APEX2 samples; the dark green subunits were unique to the PsbU-APEX2 samples, while the light green subunits were also identified in the GFP-APEX2 samples. Yellow subunits represent proteins identified in both PsbU-APEX2 and GFP-APEX2 samples but not enriched in PsbU-APEX2. Red subunits are proteins unique to the GFP-APEX2 samples. Gray proteins were not identified by mass spectrometry in this study. The identity of each protein complex is given either above or below the complex. The proteins associated with a specific complex are named with the following prefixes followed by the letter or number the protein is labeled with: Psb for PSII, Pet for cyt b6f, Psa for PSI, Ndh for NADH dehydrogenase, and Atp for ATP synthase. The exceptions to this are Fd (PetF) and FNR (PetH). Note: there are two different proteins both called AtpG; the yellow subunit refers to A0733 and the light green subunit refers to A0737.
Many proteins involved in other cellular processes were localized to the thylakoid membrane and lumen in this study. At least ten proteases were enriched in PsbU-APEX2, including the thylakoid signal peptidase LepB (Zhbanko et al. 2005). PsbU-APEX2 enriched proteins also include proteins involved in transport of numerous different known and unknown substrates. Many of the proteins involved in transport and protein trafficking, assembly, and processing have previously been localized to the periplasm or the plasma membrane, and have not been localized to the thylakoid membrane. Furthermore, many other proteins enriched by PsbU-APEX2 have been localized to the plasma membrane and/or the periplasm in addition to the thylakoid membrane. The biological relevance of the plasma membrane and periplasmic proteins enriched by PsbU-APEX2 is unclear. It is possibly an artifact of overexpression of PsbU-APEX2. However, the cyanobacterium Gloeobacter violaceus does not contain a thylakoid membrane (Mareš et al. 2013) and instead performs oxygenic photosynthesis in the inner membrane. If the thylakoid membrane and lumen originated from the plasma membrane and periplasmic space, respectively, perhaps it is not surprising that some proteins are found in both cellular fractions. Furthermore, ultrastructural studies of PCC 6803 using cryo-electron tomography identified sites of contact between the thylakoid and plasma membrane (Rast et al. 2019). Additional possibilities include dual localization of proteins, low fidelity of the sorting mechanism of translocated proteins into the lumen and the periplasm, and post-translocation sorting of proteins into their final localization. Further experiments are needed to determine the biological relevance of the periplasmic and inner membrane proteins observed.
In addition to large protein complexes involved in energy metabolism, PSII assembly factors, and proteases, the PsbU-APEX2-enriched proteins include proteins with other functions. For example, several thioredoxins, including the thylakoid specific thioredoxin A2695, were enriched (Zhu et al. 2016). A beta-carotene desaturase (A1248) was also identified. Proteins involved in maintaining the cell wall (A0339 and A0578) and S-layer proteins (A2605 and A1020) were also enriched. Another protein (A1522) with homology to biotin carboxylases was also enriched by PsbU-APEX2 in this study. Additionally, there are several proteins that have not been previously localized and have unknown functions (A1127, A1207, A1664, A2166, A2439, A2578, A2847, and G0157). These proteins could be the subject of future research.
The experiments performed here demonstrate the potential of APEX2 to interrogate the proteome of regions of cyanobacteria that have not been previously biochemically purified, like the thylakoid lumen. They also demonstrate the ability of APEX2 to capture low abundance protein complex assembly intermediates. In the future, this technique can be used to monitor the proteomes of specific regions of the cell under different environmental conditions. Additionally, APEX2 can be used to determine the topology of membrane proteins and identify candidates for protein-protein interactions (Lee et al. 2016; Lobingier et al. 2017; Mavylutov et al. 2018; Paek et al. 2017). Proximity-based proteomics using APEX2 has the potential to be a powerful tool in the pursuit of understanding the physiology of photosynthetic organisms.

Fig. 5 Enrichment of PSII assembly factors. The PSII assembly and repair components known in PCC 6803 are shown. The proteins are color-coded by their enrichment; the key is located on the top. Light and dark green subunits are both enriched in the PsbU-APEX2 samples over the GFP-APEX2 samples; the dark green subunits were unique to the PsbU-APEX2 samples, while the light green subunits were also identified in the GFP-APEX2 samples. Yellow subunits represent proteins identified in both PsbU-APEX2 and GFP-APEX2 samples but not enriched in PsbU-APEX2. Red subunits are proteins unique to the GFP-APEX2 samples. Gray proteins were not identified by mass spectrometry in this study. The prefix "Psb" should be added to any proteins labeled with only a letter or number to obtain the name of the protein.

Creation of PCC 7002 strains
The psbU gene (SynPCC7002_A0322) was amplified from PCC 7002 while APEX2 was amplified from a plasmid gifted to us by Alice Ting (Addgene plasmid #72558; http://n2t.net/addgene:72558; RRID:Addgene_72558). Plasmids were assembled using Gibson Assembly (Gibson et al. 2009) with neutral site 1 as the homology arms, pccmK2 as the promoter (Cameron et al. 2013; Ruffing et al. 2016), and kanamycin resistance for selection. The Gibson reactions were transformed into DH5α E. coli, and minipreps of liquid cultures started from single colonies were performed to collect plasmid. Plasmid was transformed into PCC 7002 (Stevens and Porter 1980) and colonies containing the desired insert were serially passaged in the presence of antibiotic until segregated.

Biotinylation of proteins by APEX2 in PCC 7002
Biotinylation of proteins was performed using a protocol modified from Hung et al. (2016) and Hwang and Espenshade and optimized for PCC 7002. Briefly, 50 mL cultures of PCC 7002 strains were grown in A+ medium (Stevens et al. 1973) in air at 37 °C with a light intensity of 185 µmol photons m⁻² s⁻¹ for 2 days to an OD730 of about 0.5. Several µL of culture were saved to image on the microscope. The culture was pelleted at 4300×g for 10 min at 4 °C. The supernatant was poured off and cells were resuspended in 4 mL A+ medium with 2.5 mM BP and transferred to a six-well plate. Six-well plates were incubated shaking in air at 37 °C with a light intensity of 185 µmol photons m⁻² s⁻¹ for 30 min. Samples were then pelleted in a 1.5 mL tube and resuspended in 1 mL phosphate-buffered saline (PBS; Bio-Rad), pH 7.8. H2O2 (10 µL of a 100 mM stock) was added, and the tubes were inverted for 30 s before the cells were pelleted for 30 s. Supernatant was removed and cells were resuspended in quencher solution (PBS with 10 mM sodium ascorbate, 5 mM Trolox and 10 mM sodium azide) and pelleted. This step was repeated two additional times. The supernatant was removed and the cell pellets were frozen at −80 °C for storage and to facilitate cell lysis.
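As a quick check on the labeling step, the final H2O2 concentration implied by the stated volumes can be estimated with a short Python calculation. This is only a sketch: it assumes the resuspension volume is exactly 1 mL and ignores the volume of the cell pellet, neither of which is specified beyond what the protocol states.

```python
def final_concentration_mM(stock_mM, stock_uL, sample_uL):
    """Concentration after adding stock_uL of stock to sample_uL of sample."""
    total_uL = stock_uL + sample_uL
    return stock_mM * stock_uL / total_uL

# 10 µL of 100 mM H2O2 added to cells resuspended in 1 mL (1000 µL) PBS.
h2o2_mM = final_concentration_mM(stock_mM=100, stock_uL=10, sample_uL=1000)
print(f"Final H2O2 concentration: {h2o2_mM:.2f} mM")
```

Under these assumptions the cells are exposed to roughly 1 mM H2O2 during the labeling window.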

Cell lysis
The cell pellet was resuspended in RIPA lysis buffer with quenchers (50 mM Tris pH 7.4, 150 mM NaCl, 0.1% (w/v) SDS, 0.5% (w/v) sodium deoxycholate, 1% (v/v) Triton X-100, 10 mM sodium ascorbate, 5 mM Trolox, 10 mM sodium azide, 1 mM PMSF). Cells were lysed using bead beating, with 30 cycles of 20 s on and 20 s off on ice. The lysate and beads were pelleted at 2000×g and the supernatant was collected. The supernatant was then pelleted for 5 min at 15000×g and the supernatant was collected and flash frozen.

Protein concentration measurement
The protein concentration of cell lysate was quantified using the Pierce 660 nm Protein Assay (Thermo Fisher).

Purification of biotinylated proteins
Streptavidin magnetic beads (Pierce) were washed twice in RIPA lysis buffer (50 mM Tris pH 7.4, 150 mM NaCl, 0.1% (w/v) SDS, 0.5% (w/v) sodium deoxycholate, 1% (v/v) Triton X-100) and the supernatant was removed. 800 µL of RIPA lysis buffer with quenchers containing 50 µg of protein for every 50 µL of streptavidin magnetic beads was added. Beads were incubated with protein for 1 h at room temperature on a rotator. The beads were then washed twice with RIPA lysis buffer, once with 1 M KCl, once with 0.1 M Na2CO3, once with 8 M urea in 10 mM Tris pH 7.5, and once again with RIPA lysis buffer.

Elution of biotinylated proteins for gels and blots
Beads were denatured at 98 °C for 10 min in 30 µL of elution buffer (3X Laemmli buffer, 2 mM biotin, 20 mM DTT) to elute biotinylated proteins. The eluate was collected and diluted with 60 µL of water to run on gels.

Preparation for mass spectrometry
Beads were washed an additional 5 times with 50 mM NH4HCO3 containing 0.2% (w/v) sodium deoxycholate. The supernatant was removed and beads were resuspended in 50 µL 10 mM TCEP and 40 mM chloroacetamide and incubated at 37 °C for 30 min to reduce and alkylate the proteins. 150 µL water containing 0.225% (w/v) sodium deoxycholate and 0.2 µg Promega sequencing grade modified trypsin was added. An on-bead digestion was performed overnight on a rotator at 37 °C. Beads were pelleted and the supernatant was collected. Formic acid was added to 2% (w/v) to stop digestion. Sodium deoxycholate was removed using 3 phase transfers with ethyl acetate. The samples were desalted using in-house STAGE tips with 3M Empore SDB-RPS membrane and dried by vacuum centrifugation.

Silver stain protocol
Proteins were separated on a 10% SDS-PAGE gel and stained using the short silver nitrate staining protocol described by Chevallet et al. (2006).

Immunoblotting
Proteins were separated on a 10% SDS-PAGE gel and immunoblots were performed following the protocol from Green and Sambrook (2012). Protein was transferred to a nitrocellulose membrane, or a polyvinylidene fluoride (PVDF) membrane if fluorescent secondary antibodies were used. After blocking membranes overnight, membranes were incubated with GFP (Invitrogen, cat. no. A6455) or RbcL (Agrisera, cat. no. AS03037) antibodies, or streptavidin-HRP (Life Technologies, cat. no. R960-25). Membranes probed for GFP or RbcL were then incubated with a secondary antibody conjugated to HRP (Thermo Fisher, cat. no. 31460) or AlexaFluor 488 (Thermo Fisher, cat. no. A-11008). Membranes were visualized using chemiluminescence after exposure to the Clarity Western ECL substrate (Bio-Rad) or using fluorescence. If necessary, blots were stripped using ReBlot Plus Mild Solution (Millipore).

Fluorescence microscopy
Cells were spotted onto an agar pad (A+ with 1% agar) and placed onto a microscope slide. Cells were imaged on a customized Nikon TiE inverted wide-field microscope with a Near-IR-based Perfect Focus system. Images were acquired with an ORCA-Flash4.0 V2+ Digital sCMOS camera (Hamamatsu) using a Nikon CFI60 Plan Apochromat Lambda 100× oil immersion objective (1.45 N.A.). Chlorophyll fluorescence of thylakoid membranes was imaged using a 640 nm LED light source (SpectraX) for excitation and a standard Cy5 emission filter. GFP localization was imaged using a 470 nm LED light source (SpectraX) for excitation and a standard GFP emission filter.

LC-MS/MS data analysis
MaxQuant/Andromeda (version 1.6.1.10) was used to process raw files from the Q Exactive HF-X and search the peak lists against a database consisting of the UniProt PCC 7002 proteome (UP000001688, 3,179 entries in total, downloaded on 6/22/2019). The search allowed trypsin specificity with a maximum of two missed cleavages, with carbamidomethylation of cysteine set as a fixed modification and protein N-terminal acetylation and oxidation of methionine set as variable modifications. MaxQuant used a 4.5 ppm main search tolerance for precursor ions and a 20 ppm MS/MS match tolerance, searching the top 12 peaks per 100 Da. The false discovery rate was set to 0.01 for both proteins and peptides, with a minimum peptide length of seven amino acids. Label-free quantification was enabled with a minimum of 2 LFQ ratio counts and the fast LFQ option. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD021787 (Perez-Riverol et al. 2019).
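To give a sense of scale for the ppm tolerances quoted above, the following minimal sketch converts a relative tolerance into an absolute m/z window. It uses the generic ppm definition only; the function names are hypothetical, and it is not a description of how MaxQuant applies its tolerances internally.

```python
from typing import Tuple


def ppm_window(mz: float, ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * ppm / 1e6  # absolute half-width of the window in m/z units
    return mz - delta, mz + delta


def within_tolerance(observed: float, theoretical: float, ppm: float) -> bool:
    """True if the observed m/z deviates from the theoretical m/z by <= ppm."""
    error_ppm = abs(observed - theoretical) / theoretical * 1e6
    return error_ppm <= ppm


# Illustrative values only: a precursor at m/z 1000 with the 4.5 ppm
# main search tolerance, and a fragment checked against 20 ppm.
low, high = ppm_window(1000.0, 4.5)
fragment_ok = within_tolerance(500.002, 500.0, 20.0)
```

At m/z 1000, a 4.5 ppm tolerance corresponds to a half-width of 0.0045 m/z units, so the tolerance window widens in absolute terms as m/z increases.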
Only proteins with at least two unique peptides and two spectral counts were considered identified in an individual sample. PCC 7002 proteins identified in both GFP-APEX2 replicates and/or both PsbU-APEX2 replicates were retained for further analysis, including the PsbU-APEX2 enriched protein analysis and the Venn diagram (Table S1). A presence/absence Venn diagram was constructed (Fig. 3c). A protein had to be identified in both replicates of a sample to appear in the Venn diagram. Proteins identified in both replicates of one sample and only one replicate of the other sample (176 proteins) were not added to the Venn diagram because their localization was unclear.
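The identification and presence/absence rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the input layout (a dictionary per replicate mapping a protein ID to its unique peptide count and spectral count) and all function names are assumptions made for the example.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: protein ID -> (unique peptides, spectral counts)
Sample = Dict[str, Tuple[int, int]]


def identified(sample: Sample, min_peptides: int = 2, min_counts: int = 2) -> Set[str]:
    """Proteins meeting the per-sample identification thresholds."""
    return {
        protein
        for protein, (peptides, counts) in sample.items()
        if peptides >= min_peptides and counts >= min_counts
    }


def venn_sets(gfp_a: Sample, gfp_b: Sample,
              psbu_a: Sample, psbu_b: Sample) -> Dict[str, Set[str]]:
    """Split retained proteins into Venn regions and an ambiguous group."""
    g_a, g_b = identified(gfp_a), identified(gfp_b)
    u_a, u_b = identified(psbu_a), identified(psbu_b)
    gfp_both, gfp_any = g_a & g_b, g_a | g_b
    psbu_both, psbu_any = u_a & u_b, u_a | u_b
    # Both replicates of one sample but exactly one replicate of the other:
    # excluded from the Venn diagram because localization is unclear.
    ambiguous = (gfp_both & (psbu_any - psbu_both)) | (psbu_both & (gfp_any - gfp_both))
    return {
        "retained": gfp_both | psbu_both,
        "gfp_only": gfp_both - psbu_any,
        "psbu_only": psbu_both - gfp_any,
        "shared": gfp_both & psbu_both,
        "ambiguous": ambiguous,
    }


# Toy data (made up for illustration, not taken from the study).
regions = venn_sets(
    {"P1": (3, 5), "P2": (2, 2), "P3": (1, 4), "P4": (2, 3)},
    {"P1": (2, 2), "P2": (4, 6), "P4": (2, 2)},
    {"P2": (2, 2), "P4": (5, 5), "P5": (3, 3)},
    {"P2": (2, 3), "P5": (2, 2)},
)
```

In the toy data, P3 fails the two-peptide threshold, and P4 is found in both GFP-APEX2 replicates but only one PsbU-APEX2 replicate, so it falls into the ambiguous group rather than a Venn region.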

PsbU-APEX2 enriched protein analysis
The log2 ratio of the MaxQuant LFQ intensities and the log2 ratio of normalized spectral counts were used as metrics to determine enrichment in the PsbU-APEX2 A and PsbU-APEX2 B samples over the GFP-APEX2 B sample (log2(U/G)) (Old et al. 2005). If a protein was not identified in a sample, the LFQ intensity was set to zero. To determine the cutoff for proteins enriched in PsbU-APEX2 samples, identified proteins were cross-referenced with true positive (TP) and false positive (FP) lists. The TP list was assembled using localization data from studies of the thylakoid lumen or thylakoid membrane in PCC 6803. All proteins experimentally localized or predicted to localize to the thylakoid lumen in any study were included in the TP list (Aldridge et al. 2008; Fulda et al. 2002; Heinz et al. 2016; Kashino et al. 2006; Rajalahti et al. 2007). To include integral thylakoid membrane proteins, proteins localized to the thylakoid membrane in at least 4 studies that had at least 1 predicted transmembrane helix were also added to the TP list (Agarwal et al. 2010; Baers et al. 2019; Herranen et al. 2004; Kashino et al. 2002; Komenda et al. 2006; Liberton et al. 2016; Ohkawa et al. 2002; Pisareva et al. 2011; Rengstl et al. 2011; Rowland et al. 2010; Sacharz et al. 2015; Schultze et al. 2009; Srivastava et al. 2005; Wang et al. 2000; Xu et al. 2008; Zak et al. 1999, 2001; Zhang et al. 2004). The FP list was assembled using data from studies of the soluble proteome of PCC 6803. The FP list contained proteins that were found in the soluble proteome in at least 4 studies, had no predicted signal sequence or transmembrane helix, and were found in at most 1 study of the thylakoid membrane (Baers et al. 2019; Choi et al. 2000; Fulda et al. 2006; Fuszard et al. 2013; Gan et al. 2005; Gao et al. 2014b, 2015, 2009; Kurian et al. 2006b; Mata-Cabana et al. 2007; Mehta et al. 2014; Mikkat et al. 2014; Pandhal et al. 2009; Pérez-Pérez et al. 2006; Plohnke et al. 2015; Rowland et al. 2011; Simon et al. 2002; Slabas et al. 2006). The TP and FP lists are in Supplementary Table 3. A total of four analyses were performed, one for each enrichment metric (log2(U/G) using LFQ intensity and log2(U/G) using normalized spectral counts) in each PsbU-APEX2 sample. For each protein in every analysis, the true positive rate (TPR) and the false positive rate (FPR) were calculated. The TPR for a specific protein was the number of TP proteins with an enrichment greater than or equal to the enrichment of the specific protein divided by the total number of TP proteins found in the experiment. The FPR for a specific protein was the number of FP proteins with an enrichment greater than or equal to the enrichment of the specific protein divided by the total number of FP proteins. The cutoff for each analysis was the enrichment with the greatest difference between the TPR and FPR values. The proteins above the cutoff in all 4 analyses are reported in Table 1 and were used to make Fig. 3d.
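The cutoff selection described above can be illustrated with a short Python sketch. It is a minimal reading of the text rather than the authors' implementation: the function names are hypothetical, the handling of zero intensities (mapping the ratio to plus or minus infinity) is an assumption, and treating "above the cutoff" as greater than or equal to the cutoff is also an assumption chosen to match the TPR and FPR definitions.

```python
import math
from typing import Dict, List, Set, Tuple


def log2_ratio(u: float, g: float) -> float:
    """log2(U/G). Zero handling here is an assumption of this sketch."""
    if u == 0 and g == 0:
        return math.nan   # not quantified in either sample
    if g == 0:
        return math.inf   # detected only in the PsbU-APEX2 sample
    if u == 0:
        return -math.inf  # detected only in the GFP-APEX2 sample
    return math.log2(u / g)


def find_cutoff(enrichment: Dict[str, float],
                tp: Set[str], fp: Set[str]) -> Tuple[float, float]:
    """Return (cutoff, TPR - FPR) at the enrichment maximizing TPR - FPR."""
    scores = {p: e for p, e in enrichment.items() if not math.isnan(e)}
    tp_vals = [scores[p] for p in tp if p in scores]
    fp_vals = [scores[p] for p in fp if p in scores]
    if not tp_vals or not fp_vals:
        raise ValueError("need at least one TP and one FP protein in the data")
    best_cut, best_diff = math.nan, -math.inf
    # Evaluate each observed enrichment as a candidate cutoff, high to low;
    # on ties the highest (most stringent) candidate is kept.
    for e in sorted(set(scores.values()), reverse=True):
        tpr = sum(v >= e for v in tp_vals) / len(tp_vals)
        fpr = sum(v >= e for v in fp_vals) / len(fp_vals)
        if tpr - fpr > best_diff:
            best_cut, best_diff = e, tpr - fpr
    return best_cut, best_diff


def enriched_in_all(analyses: List[Tuple[Dict[str, float], float]]) -> Set[str]:
    """Proteins at or above the cutoff in every (enrichment, cutoff) analysis."""
    passing = [{p for p, e in enr.items() if e >= cut} for enr, cut in analyses]
    return set.intersection(*passing)


# Toy example (values are invented for illustration).
toy = {"T1": 4.0, "T2": 3.0, "T3": 0.5, "F1": 1.0, "F2": -1.0, "X": 2.0}
cutoff, diff = find_cutoff(toy, tp={"T1", "T2", "T3"}, fp={"F1", "F2"})
hits = enriched_in_all([(toy, cutoff)])
```

Maximizing TPR minus FPR is equivalent to choosing the point on a receiver operating characteristic curve farthest above the diagonal. In practice `enriched_in_all` would be given four (enrichment, cutoff) pairs, one per metric and PsbU-APEX2 sample, and would return only the proteins passing all of them.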

Signal sequence prediction
To predict whether a protein had a signal sequence, and the cleavage site at which the signal sequence is removed, all proteins in the UniProt reference proteome for PCC 7002 were analyzed with SignalP-5.0 using both the Gram-positive and Gram-negative bacterial options.

Transmembrane helices prediction
To predict whether a protein had transmembrane helices, all proteins in the UniProt reference proteome for PCC 7002 were analyzed using the TMHMM Server v. 2.0.