Review Article - (2016) Volume 8, Issue 2
Protein expression using bacterial systems has advanced substantially over the past few decades, but Escherichia coli is still the most commonly utilized expression host, despite issues related to protein solubility. Several solutions, such as different host strains, different vectors, and incubation with co-chaperones, have been developed to minimize protein aggregation and ensure high-quality protein production. Here, we review commonly used methods to increase protein solubility, with a focus on the Clp/Hsp100 family and pneumococcal ClpL, a novel member of the Clp/Hsp100 family that is highly induced in Streptococcus pneumoniae during heat shock. Unlike the DnaK system, which requires an additional co-chaperone system to reinstate the natural conformation of denatured target proteins, pneumococcal ClpL is able to disaggregate denatured proteins independently, without requiring a co-chaperone system. Accordingly, ClpL could be a useful chaperone system to solubilize foreign proteins during protein overexpression.
Keywords: Protein aggregation; ClpL; Chaperone; Recombinant protein expression; Inclusion bodies
Many systems are used to produce heterologous proteins in bacteria, but Escherichia coli is frequently used since it is easy to manipulate and has a short life cycle. Additionally, extensive genetic tools have been developed to facilitate the production of recombinant proteins. It is also easy to scale-up and costs relatively little to culture compared to other systems. Thus, E. coli has been the first choice for the production of recombinant proteins [1]. However, the use of E. coli as a host has a few limitations: 1) recombinant proteins may fail to form an appropriate conformational structure and 2) proteins that are produced in a large quantity tend to aggregate to form an insoluble inclusion body, which lacks structure and function. Although there are no universal solutions for these problems, many different approaches have been developed and these methods have substantially improved protein production. Methods for the production of high-quality, high-quantity proteins have largely been established by trial and error by changing various parameters, such as host strain, expression vector, protein sequences (without affecting the functional domain), chemical additions, and incubation with chaperones. Recently, while characterizing heat shock proteins (HSPs) of Streptococcus pneumoniae, we found that pneumococcal ClpL, a member of the Clp/Hsp100 family and a ClpB homologue of E. coli, is able to disaggregate denatured proteins without co-chaperone systems, such as the DnaK system. In this review, we summarize several strategies commonly used to increase protein solubility with a focus on the effects of incubation with chaperones, especially pneumococcal ClpL, to improve protein production. Thus, we suggest that it is highly feasible to produce recombinant proteins in E. coli by co-expressing ClpL.
Strategies for recombinant protein expression
Vector systems: Since the emergence of recombinant technologies, many different types of vectors have been introduced to the market. When a recombinant protein is induced in high quantities with a strong promoter or a high concentration of an inducer, it tends to be degraded or to aggregate into an inclusion body, and accordingly it does not regain its original conformation. It is therefore important to choose an appropriate vector system to overexpress a recombinant protein. The promoter and fusion tag appear to be the most important factors when selecting a vector system because they affect the solubilization of the recombinant protein. One of the most commonly used vector systems is the pET vector from Novagen (Madison, WI, USA) and Clontech (Mountain View, CA, USA) because it contains a His-tag to enable the easy isolation of target proteins using a nickel column. The His-tag is relatively small compared to other fusion proteins, such as glutathione-S-transferase (GST) or maltose-binding protein (MBP); accordingly, it is less likely to affect the function of a target protein. However, when a target protein aggregates and forms inclusion bodies, the target gene can be cloned in other vectors containing fusion proteins, e.g., GST [2], MBP [3], NusA [4], SUMO [5], and Trx [6]. Representative vectors include the pGEX system produced by GE Healthcare (Little Chalfont, UK) and the pMal system produced by New England Biolabs (Ipswich, MA, USA). Fusion proteins are somewhat large and may interfere with the activity of target proteins if the tagging protein is not removed from recombinant proteins. Thus, multiple protease cleavage sites have been incorporated into vectors to facilitate the cleavage of fusion tags after overexpression and purification.
Proteases that are commonly used include enterokinase (cleavage site DDDDK↓) [7], factor Xa (cleavage site IEGR↓) [8], SUMO protease (C-terminal end of the conserved Gly-Gly sequence in SUMO) [5], and thrombin (cleavage site LVPR↓GS) [9-11].
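As an illustration of how the linear recognition motifs listed above could be located in a fusion construct, the following minimal Python sketch scans a protein sequence for the enterokinase, factor Xa, and thrombin sites. The function name `find_cleavage_sites`, the dictionary `CLEAVAGE_SITES`, and the example sequence are hypothetical and are not part of any published tool. SUMO protease is omitted here because the text describes its site only relative to the SUMO Gly-Gly sequence rather than as a standalone motif.

```python
import re

# Recognition motifs taken from the text. The second value is the offset
# within the motif after which the protease cuts (the arrow in the text).
# These names are illustrative assumptions, not an established API.
CLEAVAGE_SITES = {
    "enterokinase": ("DDDDK", 5),   # DDDDK|
    "factor Xa": ("IEGR", 4),       # IEGR|
    "thrombin": ("LVPRGS", 4),      # LVPR|GS
}

def find_cleavage_sites(protein):
    """Return, per protease, the 0-based positions at which the chain is cut.

    A position p means the chain is split into protein[:p] and protein[p:].
    """
    protein = protein.upper()
    hits = {}
    for name, (motif, offset) in CLEAVAGE_SITES.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        pattern = "(?=" + motif + ")"
        hits[name] = [m.start() + offset for m in re.finditer(pattern, protein)]
    return hits

if __name__ == "__main__":
    # Hypothetical His-tagged construct with a thrombin site before the target.
    construct = "MHHHHHHLVPRGSMKT"
    sites = find_cleavage_sites(construct)
    cut = sites["thrombin"][0]
    print(construct[:cut], construct[cut:])
```

Such a scan is also useful in the opposite direction: a motif found inside the target protein itself indicates that the corresponding protease would cut the product as well as the tag.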
Host strains: Many different E. coli strains are available that are genetically manipulated to produce target proteins. For example, E. coli BL21 and its derivatives are widely used. This strain has deletions of the Lon and OmpT proteases; therefore, heterologous proteins can be expressed more stably [12]. Additionally, its derivatives, including BL21 (DE3) pLysS and BL21 (DE3) pLysE, produce T7 lysozyme, which is an inhibitor of T7 polymerase. Thus, the lysozyme suppresses the basal (leaky) activity of T7 polymerase in these strains, resulting in higher host cell survival and increased target protein production [13]. Because target genes often contain codons that are rarely used in E. coli, the host may be unable to produce the target protein efficiently. To resolve this issue, E. coli hosts such as CodonPlus E. coli developed by Stratagene (La Jolla, CA, USA) and the Rosetta strain developed by Novagen carry extra tRNAs for rare codons (AUA, AGG, AGA, CUA, CCC and GGA) [14,15]. When E. coli hosts produce target proteins at a high level, inclusion bodies tend to form. Reducing the incubation temperature helps bacteria produce more soluble proteins, but chaperone activity decreases drastically at lower temperatures [16]. To address this issue, the ArcticExpress E. coli strain developed by Agilent Technologies (Santa Clara, CA, USA) has a modified chaperone, which has protein folding activity at 4-10°C [17].
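To judge whether a target gene might benefit from a host supplying extra tRNAs, the rare codons named above can simply be counted in the coding sequence. The sketch below is a minimal illustration under stated assumptions: the function name `count_rare_codons` and the example sequence are hypothetical, the codon set is the one listed in the text, and no threshold for "too many" rare codons is implied.

```python
# Rare codons as listed in the text (RNA alphabet).
RARE_CODONS = {"AUA", "AGG", "AGA", "CUA", "CCC", "GGA"}

def count_rare_codons(cds):
    """Count rare codons in an in-frame coding sequence (DNA or RNA).

    Returns a tuple (counts_per_rare_codon, total_number_of_codons).
    Trailing bases that do not form a complete codon are ignored.
    """
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    counts = {}
    for codon in codons:
        if codon in RARE_CODONS:
            counts[codon] = counts.get(codon, 0) + 1
    return counts, len(codons)

if __name__ == "__main__":
    # Hypothetical short open reading frame: ATG AGA AGG CTA TAA
    counts, total = count_rare_codons("ATGAGAAGGCTATAA")
    print(counts, total)
```

A high fraction of rare codons, or clusters of them, would be one reason to try a strain such as CodonPlus or Rosetta before changing the vector or fusion tag.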
Chaperone co-expression: Heterologous proteins sometimes fail to reach their proper structural conformation, especially when the protein concentration in the cytoplasm of E. coli is too high. When the macromolecular concentration exceeds 300–400 mg/ml [18,19], appropriate protein folding is remarkably challenging. Chaperones help nascent polypeptide chains form their native conformations (Figure 1). In a crowded environment, the availability of chaperones might be limited and over-titrated, leading to protein instability and aggregation [20,21]. Thus, the addition of individual chaperones or chaperone sets may boost the availability of chaperones and help solubilize target proteins during overexpression [22,23]. The chaperone systems that are commonly used in E. coli are the DnaK, GroELS, and Clp/Hsp100 systems (Figure 1); commercially available chaperones are listed in Table 1.
Name | Chaperone | Promoter | Company |
---|---|---|---|
pG-KJE8 | DnaK, DnaJ, GrpE (DnaK system), GroELS | araB and Pzt-1 | Takara |
pGro7 | GroELS | araB | Takara |
pKJE7 | DnaK, DnaJ, GrpE (DnaK system) | araB | Takara |
pG-Tf2 | Tig (trigger factor), GroELS | Pzt-1 | Takara |
pTf16 | Tig (trigger factor) | araB | Takara |
pBB530 | GrpE (DnaK system) | PA1/lacO1 | Addgene |
pBB535 | DnaK, DnaJ (DnaK system) | PA1/lacO1 | Addgene |
pBB540 | GrpE (DnaK system), ClpB (HSP 100 family) | PA1/lacO1 | Addgene |
pBB542 | DnaK, DnaJ, GroESL(Large amounts) | PA1/lacO1 | Addgene |
pBB550 | DnaK, DnaJ, GroESL(Small amounts) | Plac/lacO1 | Addgene |
pBB872 | ibpB, ibpA (small HSPs) | PA1/lacO1 | Addgene |
pColdI-IV | None (Cold shock protein promoter) | CspA | Takara |
Table 1: Chaperone co-expression systems.
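When several chaperone sets are to be screened, it can help to hold Table 1 in a small data structure and query it by chaperone. The following Python sketch transcribes the table rows; the names `TABLE_1` and `plasmids_with` are hypothetical, and "GroESL" in the table is normalized to "GroELS" here on the assumption that both spellings refer to the same system.

```python
# Rows transcribed from Table 1: (plasmid, chaperones, promoter, supplier).
# "GroESL" is written as "GroELS" throughout (assumed to be the same system).
TABLE_1 = [
    ("pG-KJE8", ("DnaK", "DnaJ", "GrpE", "GroELS"), "araB and Pzt-1", "Takara"),
    ("pGro7", ("GroELS",), "araB", "Takara"),
    ("pKJE7", ("DnaK", "DnaJ", "GrpE"), "araB", "Takara"),
    ("pG-Tf2", ("Tig", "GroELS"), "Pzt-1", "Takara"),
    ("pTf16", ("Tig",), "araB", "Takara"),
    ("pBB530", ("GrpE",), "PA1/lacO1", "Addgene"),
    ("pBB535", ("DnaK", "DnaJ"), "PA1/lacO1", "Addgene"),
    ("pBB540", ("GrpE", "ClpB"), "PA1/lacO1", "Addgene"),
    ("pBB542", ("DnaK", "DnaJ", "GroELS"), "PA1/lacO1", "Addgene"),
    ("pBB550", ("DnaK", "DnaJ", "GroELS"), "Plac/lacO1", "Addgene"),
    ("pBB872", ("ibpB", "ibpA"), "PA1/lacO1", "Addgene"),
    ("pColdI-IV", (), "CspA", "Takara"),
]

def plasmids_with(chaperone):
    """Return the plasmid names in Table 1 that encode the given chaperone."""
    return [name for name, chaperones, _, _ in TABLE_1 if chaperone in chaperones]

if __name__ == "__main__":
    print(plasmids_with("GroELS"))
    print(plasmids_with("ClpB"))
```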
DnaK system and GroELS
Living organisms face hostile environments, such as oxidative stress, nutrient starvation, and heat shock, and they cope with these environments by inducing stress-related proteins. HSPs are a group of well-studied stress proteins involved in many cellular processes, including cell division, translocation, thermo-tolerance, oxidative resistance, and chemical stress resistance. In prokaryotes and eukaryotes, many different types of HSPs have been identified; they have similar functions and molecular weights. HSPs are generally classified into DnaK, GroELS, Clp/HSP100, HtpG or small HSPs in prokaryotes depending on their molecular weights. The DnaK system, consisting of the DnaK (HSP 70), DnaJ (HSP 40), and GrpE (NEFs; nucleotide exchange factors), plays essential roles in protein folding and environmental stress resistance [24]. An important aspect of the DnaK system is that it works together with the ClpB chaperone, a member of the Clp/HSP100 family, during protein disaggregation (Figure 1) [25]. GroEL is an oligomer of over 800 kDa that is assembled as two homoheptameric rings. One ring of GroEL possesses chaperone activity to fold misfolded or nonnative proteins via ATP hydrolysis with or without co-chaperonin [26-28].
The Clp/Hsp100 family of the AAA+ ATPase superfamily
Caseinolytic protease (Clp)/heat shock protein 100 belongs to the AAA+ (ATPase associated with various cellular activities) superfamily. This group of proteins refolds denatured proteins into their native form, acting as chaperones, and is also involved in the proteolysis of non-native forms of proteins, thereby decreasing damaged and denatured proteins in an ATP-dependent manner [29]. Clp/Hsp100 family proteins contain an N-terminal domain and one or two AAA+ nucleotide-binding domains (NBD1 and NBD2); where two NBDs are present, they are separated by a coiled-coil middle domain (M-domain) composed of alpha-helices. The Clp/Hsp100 proteins are classified into two groups depending on the number of NBDs [30]. Class I proteins include ClpA, B, C, D, E, and L and contain two NBDs (NBD1 and NBD2). Class II proteins, such as ClpM, N, X, and Y, have only one NBD. The NBDs are responsible for ATP binding and hydrolysis.
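The classification rule described above depends only on the number of NBDs, so it can be written down compactly. The sketch below encodes the family members named in the text; the names `NBD_COUNT` and `clp_class` are hypothetical and serve only to illustrate the rule.

```python
# Number of nucleotide-binding domains per family member, as stated in the text.
NBD_COUNT = {
    "ClpA": 2, "ClpB": 2, "ClpC": 2, "ClpD": 2, "ClpE": 2, "ClpL": 2,
    "ClpM": 1, "ClpN": 1, "ClpX": 1, "ClpY": 1,
}

def clp_class(name):
    """Return "I" for proteins with two NBDs and "II" for those with one."""
    return "I" if NBD_COUNT[name] == 2 else "II"

if __name__ == "__main__":
    for protein in ("ClpL", "ClpB", "ClpX"):
        print(protein, "class", clp_class(protein))
```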
The Clp/Hsp100 protein family is involved in protein disaggregation and protein degradation and quality control [30]. ClpB/Hsp104 and ClpL mediate protein disaggregation, whereas other Clp/Hsp100 proteins, including ClpA and X, promote protein degradation in collaboration with the ClpP protease [30-32]. In the Clp/Hsp100 family, the M-domain influences many activities, including ATPase activity and disaggregation as well as interactions with co-chaperones, such as DnaK, Hsp70, and MecA [33,34]. Though some Clp/Hsp100 proteins, including ClpB, E, and L, contain an M domain, ClpC and ClpA have a shorter M domain and no M domain, respectively. Additionally, the sequence of the M domain in the Clp/Hsp100 family varies considerably. The importance of the M domain is well described for the mutant form of ClpB. When the M domain is deleted, ClpB loses its disaggregation activity, resulting in impaired thermo-tolerance [35,36]. Partial deletions or point mutations of the M domain in ClpB result in a similar or less stringent phenotype as compared to the complete deletion [36-40]. However, unlike bacterial ClpB and yeast HSP104, even though ClpA does not contain an M domain, it can disaggregate proteins without a co-chaperone system, such as the DnaK system, suggesting that disaggregation reactions do not require the DnaK system [41].
The function of Clp/Hsp100 proteins is modulated by a group of proteins termed adaptor proteins, which are required for chaperone activity as well as substrate proteolysis [42]. Examples of adaptor proteins include MecA, ClpS, and YpbH. ClpC and ClpA require the adaptor proteins MecA and ClpS, respectively, for chaperone activity [43]. The adaptor protein ClpS in E. coli delivers substrate proteins to ClpAP for degradation. ClpA is positioned at either or both ends of the proteolytic core, where it guides substrates into the catalytic chamber of ClpP through the narrow central pore of ClpA [32].
ClpL of Streptococcus pneumoniae
Clp/Hsp100 proteins are found in a wide range of taxa; ClpA is found in gram-negative eubacteria, such as E. coli, ClpB is found in most eubacteria and eukaryotes, ClpC is found in cyanobacteria, plants, and most gram-positive eubacteria, ClpD is found exclusively in plants, and ClpE is found in certain gram-positive eubacteria [44]. S. pneumoniae, a causative agent of otitis media, pneumonia, and meningitis, encounters heat stress conditions and undergoes heat shock responses when it penetrates, colonizes, and invades the blood and brain. S. pneumoniae induces an array of heat shock proteins, such as the DnaK system, GroELS, and Clp/Hsp100, to cope with these hostile environments. These HSPs are highly conserved in both prokaryotes and eukaryotes, and are utilized against other environmental stresses, such as exposure to ethanol, antibiotics, and heavy metals, to enhance the survival rate of bacteria. The Clp/Hsp100 class chaperones of S. pneumoniae are ClpC, E, L and X [45], and an ATP-dependent protease of S. pneumoniae is ClpP, which plays an important role in stress responses, competence, and virulence [46]. ClpC has several functions, such as the control of growth inhibition at high temperatures, autolysis, adherence, and transformation [47], and ClpE is required for growth at high temperatures and virulence [48]. ClpX is crucial for bacterial survival [45].
Previously, we showed that ClpL is one of four major HSPs in S. pneumoniae that are induced by heat shock, and that it mediates thermotolerance and protein disaggregation [31,45,49,50]. ClpL homologs have also been identified in many gram-positive pathogens, including Staphylococcus aureus, Listeria monocytogenes, Streptococcus pyogenes, S. agalactiae, S. mutans, and S. sanguinis, as well as in the trypanosomatid protozoan responsible for leishmaniasis (Leishmania major) and in lactic acid bacteria [31]. We and other groups have found that recombinant ClpL can refold denatured rhodanese and CtsR [49,51]. ClpL mutants are susceptible to penicillin and show thinner cell walls than the wild-type strains. Conversely, a pneumococcal strain overexpressing ClpL develops higher resistance to penicillin and a thicker cell wall. Consistent with this finding, when S. pneumoniae is exposed to penicillin, the bacterium exhibits upregulation of ClpL and penicillin-binding protein 2 (Pbp2X), a major cell wall synthesis enzyme, which improves survival in extreme conditions [50]. Recently, the antibiotic acyldepsipeptide (ADEP) was shown to activate ClpP, which results in complete eradication of S. aureus in combination with other antibiotics [52]. Likewise, ClpL in S. pneumoniae might be a highly promising target for antibiotic therapy.
ClpL functions as a molecular chaperone without a cooperative chaperone system
Pneumococcal ClpL and ClpP are highly induced after heat shock and are implicated in the regulation of a wide array of virulence-associated genes, including cbpA, cps2A, ply, and psaA [49]. Although ClpL homologs are present in several different organisms, they have not been characterized at the molecular and biochemical levels. Recently, we conducted a biochemical study of ClpL of S. pneumoniae and found several interesting characteristics, which are distinct from those of other members of the Clp/Hsp100 family [31,50,53]. Recombinant ClpL displayed several functions, including nucleotide hydrolysis, refolding, holdase activity, and disaggregation, and used either Mg2+ or Mn2+ as a cofactor for optimal activity. Mg2+ and Mn2+ are divalent cations that are commonly found in biological systems, and Mg2+ is mainly utilized by AAA+ ATPase chaperones. For example, ATPase activity of ClpB, the closest E. coli homologue of ClpL, is enhanced by Mg2+, but not by Mn2+. In the case of E. coli ClpA, Mn2+ actually inhibits ATPase activity. However, in other systems, such as eukaryotic Hsp104 and the GroELS system of E. coli, the activity is enhanced in the presence of Mn2+ [54]. Thus, although ClpL of S. pneumoniae belongs to the bacterial Clp/Hsp100 family, it is unique because ATPase activity is enhanced by Mn2+ [31].
Interestingly, ClpL does not require co-chaperones, such as the DnaK system, for chaperone activity and hexamerization (Figure 2). In contrast, ClpB requires the DnaK system for chaperone activity, and ClpA and ClpC collaborate with ClpP to provide protease activity [32]. However, pneumococcal ClpL does not collaborate with other co-chaperone systems or ClpP; it has the capacity to disaggregate denatured proteins without co-chaperones and does not have a ClpP binding loop, a domain that is known to mediate the interaction between the Clp chaperone and the protease complex [55]. Using Clustal Omega (http://www.uniprot.org/align/), we compared the protein sequences of the Clp/Hsp100 family to identify differences between ClpL and other family members (Figure 3). Clp/Hsp100 family proteins contain an N-terminal domain, NBDs, and an M-domain, which mediates the interaction with auxiliary factors, such as DnaK, Hsp70, and MecA. The specific functions of each domain in the Clp/Hsp100 family are summarized in Table 2. The M-domains of ClpC and ClpB mediate interactions with co-chaperones or auxiliary factors (ClpB: DnaK and ClpC: MecA), but the M-domain of ClpL is much shorter than those of Hsp104 and ClpB, indicating that the M-domain of ClpL may have different functions from those of other Clp/Hsp100 family members [31]. The N-terminal domains of ClpA and ClpC mediate interactions with auxiliary factors such as ClpS and MecA, whereas that of ClpB interacts with aggregated proteins (Table 2). Mutational analysis at the Walker A motifs (K127A/T128A and K458A/T459A) revealed that both NBDs are essential for chaperone activity, ATP hydrolase activity, and hexamerization.
Name | N | NBDs | MD |
---|---|---|---|
ClpL (S. pneumoniae) | ND | Oligomerization and ATPase activity [31] | ND |
ClpA (E. coli) | Interaction with ClpS [56] | Oligomerization and ATPase activity [57] | None |
ClpC (B. subtilis) | Interaction with MecA and oligomerization [58] | Oligomerization and ATPase activity [58] | Interaction with MecA and oligomerization [58] |
ClpB (E. coli) | Interaction with aggregated proteins [59, 60] | Oligomerization and ATPase activity [35] | Interaction with DnaK [61] |
Table 2: Functions of each domain in Clp/Hsp100 family proteins.
Currently, ClpL has not been used as a model for optimization of protein expression in E. coli or other systems. However, because recombinant ClpL is known to refold denatured rhodanese and CtsR [49,51], it is highly feasible that ClpL may become a good system for protein expression. Although ClpL is found in S. pneumoniae, S. pneumoniae per se would not be an ideal host, since it is difficult to transform owing to its thick capsule and it rapidly undergoes autolysis upon reaching stationary phase. It might be necessary to generate mutants of genes involved in capsule synthesis or autolysis before exploiting S. pneumoniae as a host.
Many systems for heterologous protein overexpression and purification have been developed, but E. coli is currently used most widely. Owing to advances in recombinant technology, it is becoming more convenient to produce soluble proteins. However, despite many reports of successful heterologous protein expression, the use of bacterial co-chaperones, such as the DnaK and GroELS systems, to facilitate solubilization is not ideal. During chaperone co-expression, undesired side effects, such as growth retardation and proteolysis, have been reported to reduce yield and specific activity. Moreover, the chaperone system per se is complex and sometimes induces protein aggregation. The lack of success in some cases does not mean that this approach cannot be improved to produce functional proteins. For instance, pneumococcal ClpL is a very promising chaperone system that does not require co-chaperones, but the chaperone function of ClpL has not yet been tested in vivo. In addition, the study of chaperones may shed light on the virulence factors of pathogens, given that chaperones have been associated with the invasion, infection, and survival of virulent bacteria. For example, pneumococcal ClpL is induced by both cold shock and heat shock stress to protect bacteria from extreme conditions encountered by S. pneumoniae during infection. Thus, more extensive research on bacterial chaperones is likely to provide useful information regarding protein overexpression as well as bacterial virulence.