Gene Cloning and Protein Expression

For the current PSI-2 target ORFs, boundaries are defined by domain family analysis, and several constructs are generated for each target to maximize the chance of a successful outcome. Our permutation strategy extends the N- or C-terminus based on secondary structure, length restrictions, or experimental data. When possible, however, we attempt cloning and expression of the original full-length target sequence. The majority of targets represent the MCSG-assigned Pfam groups, and approximately 25% of the target group consists of biomedical targets from pathogens. For the expressed target group, we observed an overall success rate of ~25% for generating an expression clone that produced enough soluble protein for crystallographic studies.
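The permutation strategy can be pictured as enumerating candidate start/end pairs around predicted domain boundaries. The sketch below is a minimal illustration under stated assumptions, not the MCSG construct-design software: the function name `boundary_variants`, the `min_len` cutoff, and the example positions are hypothetical. It always lists the full-length construct first, reflecting the preference for cloning the original full-length target when possible.

```python
from itertools import product

def boundary_variants(seq_len, n_starts, c_ends, min_len=50):
    """List (start, end) construct boundaries, 1-based and inclusive.

    n_starts and c_ends are candidate positions, e.g. from domain-family
    alignments or secondary-structure predictions (hypothetical inputs).
    The full-length construct is always listed first.
    """
    variants = [(1, seq_len)]
    for start, end in product(sorted(set(n_starts)), sorted(set(c_ends))):
        too_short = end - start + 1 < min_len
        if 1 <= start < end <= seq_len and not too_short and (start, end) not in variants:
            variants.append((start, end))
    return variants
```

For a 411-residue target with candidate starts at residues 1, 12 and 25 and candidate ends at 380 and 411, the function returns six constructs: full length first, then each remaining start/end pair.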

To meet the production goals for PSI-2, we apply a comprehensive 96-well-plate high-throughput (HTP) technology to generate clones and express soluble proteins. Our pipelines use pMCSG7 as the primary expression vector and a maltose-binding protein (MBP) fusion vector in a “salvage” strategy for proteins that express well in pMCSG7 but show low solubility. Expression clones that produce insoluble proteins are directed to Level 2 processing (see Figure). The developmental goal is to address solubility problems using HTP approaches. Criteria for entry into the salvage loop will include lack of a soluble orthologue, poor diffraction quality of crystals, or high target priority due to biomedical impact. This tiered strategy leverages our efficient, cost-effective parallel processes designed for mass production of proteins and protein fragments in E. coli.
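One way to read the tiered triage described above is as a small routing rule. This is a hypothetical sketch: the `Target` fields, the route labels, and the choice to treat any single salvage criterion as sufficient are assumptions for illustration, not the center's actual decision logic.

```python
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    soluble: bool                   # soluble product in the pMCSG7 expression test
    soluble_orthologue: bool        # a soluble orthologue is already in hand
    poor_diffraction: bool = False  # crystals obtained but diffraction is poor
    high_priority: bool = False     # e.g. high biomedical impact

def enters_salvage_loop(t: Target) -> bool:
    # Assumption: meeting any one listed criterion is sufficient.
    return (not t.soluble_orthologue) or t.poor_diffraction or t.high_priority

def route(t: Target) -> str:
    if t.soluble and not t.poor_diffraction:
        return "level1"            # standard purification and crystallization
    if enters_salvage_loop(t):
        return "level2-salvage"    # e.g. re-clone as an MBP fusion
    return "level2-deprioritized"
```

An insoluble target with no soluble orthologue is sent to the salvage loop, while an insoluble target whose orthologue is already soluble is simply deprioritized.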

Coding regions are amplified using primers designed with the Express Primer tool or domain-specific primer design tools. All primers contain ligation-independent cloning sites compatible with multiple vectors. Affinity tags with a TEV (tobacco etch virus) protease cleavage site are fused to all proteins to facilitate their purification or capture. The primary steps of the process, PCR gene amplification and testing for protein expression and solubility, are conducted in 96-well-plate format. Denaturing PAGE analysis of proteins is carried out in a high-density gel format.
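Because amplification and expression tests run in 96-well plates, every target needs a plate and well address. A minimal sketch, assuming row-major filling (A1 to A12, then B1, and so on) and a hypothetical `plate_layout` helper; a real pipeline may reserve wells for controls or fill column-wise.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]                        # plate rows A-H
COLS = range(1, 13)                               # plate columns 1-12
WELLS = [f"{r}{c}" for r in ROWS for c in COLS]   # row-major: A1..A12, B1..H12

def plate_layout(targets):
    """Map each target to (plate number, well), filling plates in order."""
    layout = {}
    for i, target in enumerate(targets):
        plate, idx = divmod(i, len(WELLS))
        layout[target] = (plate + 1, WELLS[idx])
    return layout
```

With 100 targets, the 96th lands in H12 of plate 1 and the 97th starts plate 2 at A1.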

When the PSI pilot centers were formed, ligation-independent cloning (LIC) offered an attractive technology adaptable for robotic cloning, but existing vectors were not suitable for automated purification of proteins for crystallization. We developed a set of LIC vectors tailored specifically for this purpose. The vector pMCSG7 encodes a His6 tag followed by a spacer and a TEV protease cleavage site that overlaps with the LIC site. This design places the TEV site close to the start of the cloned native protein, so only the three-amino-acid sequence Ser-Asn-Ala (SNA) is added to the protein after protease cleavage.
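The effect of placing the TEV site immediately upstream of the insert can be checked on the N-terminal region of the example pMCSG7 fusion sequence listed later on this page. TEV protease recognizes ENLYFQ and cuts after the Gln when the next residue is Ser or Gly; the pattern below encodes only this canonical motif (a simplification, since TEV tolerates some variant sites), and the helper name `tev_cleave` is hypothetical.

```python
import re

# N-terminal region of the example pMCSG7 fusion (His6, spacer, TEV site, SNA).
PMCSG7_NTERM = "MHHHHHHSSGVDLGTENLYFQSNA"

# Canonical TEV recognition motif ENLYFQ; cleavage occurs after Q when the
# next (P1') residue is S or G. Variant sites are not modelled.
TEV_SITE = re.compile(r"ENLYFQ(?=[SG])")

def tev_cleave(protein: str) -> tuple[str, str]:
    """Split a fusion at the first TEV site into (removed tag, released product)."""
    m = TEV_SITE.search(protein)
    if m is None:
        raise ValueError("no TEV site found")
    return protein[:m.end()], protein[m.end():]
```

For example, `tev_cleave(PMCSG7_NTERM + "MKPIDR")` returns `("MHHHHHHSSGVDLGTENLYFQ", "SNAMKPIDR")`: only SNA precedes the native initiator Met.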

For more information on vectors please see the vector summary page.

TEV protease is highly specific, and we have yet to observe substantial target degradation. We also constructed a series of pMCSG7 derivatives that fuse helper peptides or proteins, such as MBP, to the N-termini of proteins, or that introduce these elements into vectors with different origins of replication to allow co-expression of proteins. Four additional vectors improve tandem purification of complexes, aid robotic screening protocols, and improve robotic protein purification. The pMCSG21 vector creates a bridge to Gateway (Invitrogen) vectors, offering easy access to vectors designed to express proteins in alternative hosts. As this avenue of protein production becomes more important, some Gateway vectors will be redesigned to make them compatible with the existing protein production pipelines. Gene expression in all the MCSG vectors is driven by the T7 promoter and controlled by the lac repressor, and all vectors accept the same PCR products.

Name          Base          Antibiotic   Tag & sites
pET30Xa/LIC   pET30a        kan          N-His-Thrmb-S-Xa-LICb-MCS-His-C
pET21a        pET           amp          N-T7tag-MCS-His-C
pMCSG1        pET30Xa/LIC   kan          N-His-Xa-LICb-MCS-His-C
pMCSG3        pET21a        amp          N-His-Xa-LICb-MCS-His-C
pMCSG6        pMCSG3        amp          N-His-TEV-LICb-MCS-His-C
pMCSG7        pMCSG6        amp          N-His-TEV-LIC-MCS-His-C
pMCSG8        pMCSG7        amp          N-His-Sloop-TEV-LIC-MCS-His-C
pMCSG9        pMCSG7        amp          N-His-MBP-TEV-LIC-MCS-His-C
pMCSG10       pMCSG7        amp          N-His-GST-TEV-LIC-MCS-His-C
pMCSG11       pACYCDuet     Cm           N-His-TEV-LIC-MCS-His-C
pMCSG12       pACYCDuet     Cm           N-His-Sloop-TEV-LIC-MCS-His-C
pMCSG13       pACYCDuet     Cm           N-His-MBP-TEV-LIC-MCS-His-C
pMCSG14       pACYCDuet     Cm           N-His-GST-TEV-LIC-MCS-His-C
pMCSG15       pMCSG7        amp          N-LIC-GS-TEV-AviTag-His-C
pMCSG16       pMCSG7        amp          N-His-AviTag-GS-TEV-LIC-MCS-His-C
pMCSG17       pMCSG7        amp          N-Stag-TEV-LIC-MCS-His-C
pMCSG18       pMCSG7        amp          N-His-TEV-LIC-GFP-MCS-His-C
pMCSG19       pMCSG7        amp          N-MBP-TVMV-His-TEV-LIC-MCS-His-C
pMCSG19B      pMCSG7        amp          N-MBP-TVMV-His-TEV-LIC-MCS-His-C (TVMV)
pMCSG19C      pMCSG7        amp          N-MBP-TVMV-His-TEV-LIC-MCS-His-C (TVMV)
pMCSG20       pMCSG17       amp          N-Stag-GST-TEV-LIC-MCS-His-C
pMCSG21       pCDFDuet      spec         N-His-TEV-LIC-MCS-His-C
pMCSG22       pMCSG21       spec         N-His-Sloop-TEV-LIC-MCS-His-C
pMCSG23       pMCSG21       spec         N-His-MBP-TEV-LIC-MCS-His-C
pMCSG24       pMCSG21       spec         N-His-GST-TEV-LIC-MCS-His-C
pMCSG25       pMCSG7        amp          N-Protease-His-TEV-LIC-MCS-His-C
pMCSG26       pMCSG7        amp          N-LIC3-His6-C
pMCSG27       pMCSG26       amp          N-LIC3-His10-TVMV-MBP-C
pMCSG28       pMCSG26       amp          N-LIC4-TEV-His6-C
pMCSG29       pMCSG27       amp          N-LIC4-TEV-His10-TVMV-MBP-C
pMCSG30       pMCSG7        amp          N-His-TEV-MBP-LIC2-SacB-LIC2-C
pMCSG31       pASKIBA3+     amp          N-MBP-TVMV-His6-TEV-LIC (Tet-inducible TVMV)
pMCSG32       pMCSG28       amp          N-MBP-TVMV-LIC-TEV-His6-C
pMCSG33       pMCSG28       amp          N-His10-MBP-TEV-LIC-His6-C
pMCSG34       pMCSG32       amp          N-MBP-TVMV-LIC-TEV-His6-C (TVMV)
pMCSG34B      pMCSG32       amp          N-MBP-TVMV-LIC-TEV-His6-C (TVMV anti-clock)
pMCSG35       pMCSG28       amp          N-His10-MBP-TEV-LIC-His6-C (TVMV)
pMCSG35b      pMCSG28       amp          N-His10-MBP-TEV-LIC-His6-C (TVMV anti-clock)
pMCSG37       pMCSG9        amp          N-His10-MBP-His6-TEV-LIC-C (TVMV)
pMCSG38       pMCSG7        amp          N-His6-TEV-LIC-C (TVMV anti-clock)
pMCSG38C      pMCSG7        amp          N-His6-TEV-LIC-C (TVMV)
pMCSG39       pMCSG7        amp          N-MBP-TVMV-His10-TEV-LIC-C
pMCSG40       pMCSG19       amp          N-OsmY-TVMV-His10-TEV-LIC-C (DsbA)
pMCSG41       pMCSG28       amp          N-LIC-TEV-His6-C (TVMV)
pMCSG42       pMCSG32       amp          N-MBP-TVMV-LIC-TEV-His6-C (TVMV)
pMCSG43       pMCSG32       amp          N-OsmY-TVMV-LIC-TEV-His6-C (DsbA)
pMCSG44       pMCSG7        amp          N-HN6-SUMO-TEV-LIC-C
pMCSG45       pMCSG7        amp          N-HN6-MBP-TEV-LIC-C
pMCSG46       pMCSG7        amp          N-HN6-NusA-TEV-LIC-C
pMCSG47       pMCSG7        amp          N-HN6-Halo-TEV-LIC-C
pMCSG48       pMCSG7        amp          N-His8-NusA-TEV-LIC-C
pMCSG49       pMCSG7        amp          N-His-TEV-LIC-C (1kb removed)
pMCSG50       pMCSG60       amp          N-His6-AviTag-TEV-LICs-BirA-C
pMCSG51       pMCSG50       amp          N-His6-AviTag-TEV-LICs-BirA-C
pMCSG52       pMCSG7        amp          N-His-TEV-LIC-C (Rare: argU-ileX)
pMCSG53       pMCSG52       amp          N-His-TEV-LIC-C (Rare: argU-ileX; 1kb removed)
pMCSG54       pMCSG49       amp          N-His-FLAG-TEV-LIC-C
pMCSG55       pMCSG49       amp          N-His-HA-TEV-LIC-C
pMCSG56       pMCSG49       amp          N-His-HA-HA-TEV-LIC-C
pMCSG57       pMCSG49       amp          N-His-AviTag-TEV-LIC-C
pMCSG58       pMCSG53       amp          N-LIC-His-C (Rare: argU-ileX; 1kb removed)
pMCSG59       pMCSG53       amp          N-LIC-TEV-His-C (Rare: argU-ileX; 1kb removed)
pMCSG60       pMCSG7        amp          N-His6-TEV-LIC-LIC4-C
pMCSG61       pMCSG30       amp          N-His6-TEV-MBP-LIC-C
pMCSG62       pMCSG53       amp          N-His6-AviTag-TEV-LIC-BirA-C (Rare: argU-ileX; 1kb removed)
pMCSG63       pMCSG53       amp          N-His6-TEV-LIC-LIC-C (Rare: argU-ileX; 1kb removed)
pMCSG64       pMCSG60       amp          N-His-FLAG-TEV-LIC-LIC4-C
pMCSG65       pMCSG60       amp          N-His-HA-TEV-LIC-LIC4-C
pMCSG66       pMCSG60       amp          N-His-HA-HA-TEV-LIC-LIC4-C
pMCSG67       pMCSG60       amp          N-His-STRII-TEV-LIC-LIC4-C
pMCSG68       pMCSG53       amp          N-His-STRII-TEV-LIC-C (Rare: argU-ileX; 1kb removed)
pMCSG69       pMCSG53       amp          N-MBP-TVMV-His-TEV-LIC-C (Rare: argU-ileX; 1kb removed)
pMCSG70       pMCSG53       amp          N-LIC-TEV-His10-TVMV-MBP-C (Rare: argU-ileX; 1kb removed)
pMCSG71       pMCSG53       amp          N-MBP-TVMV-LIC-TEV-His-C (Rare: argU-ileX; 1kb removed)
pMCentr1      pDONR/Zeo     zeocin       attL1-TEV-LIC-attL2
pMCentr2      pDONR221      kan          attL1-TEV-LIC-attL2
pMCentr3      pDONR221      kan          attL1-TVMV-LIC-attL2
pMCdest19     pMCSG19       amp, Cm      attR1-ccdB-CmR-attR2
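Co-expression from two or more plasmids in one cell generally requires that each carry a different resistance marker and a compatible origin of replication. The table lists markers but not origins, so the origin values in this sketch are assumptions inferred from the parental backbones (pET: pMB1/ColE1-type; pACYCDuet: P15A; pCDFDuet: CloDF13) and should be checked against the vector maps. The `compatible` helper and the choice of a six-vector subset are hypothetical.

```python
from itertools import combinations

# Subset of the vector table. Markers come from the table; origins are NOT
# in the table and are assumed from the parental backbone.
VECTORS = {
    "pMCSG7":  ("amp",  "ColE1-type"),
    "pMCSG9":  ("amp",  "ColE1-type"),
    "pMCSG11": ("Cm",   "P15A"),
    "pMCSG13": ("Cm",   "P15A"),
    "pMCSG21": ("spec", "CloDF13"),
    "pMCSG23": ("spec", "CloDF13"),
}

def compatible(names):
    """True if every pair of vectors differs in both marker and origin."""
    for a, b in combinations(names, 2):
        marker_a, ori_a = VECTORS[a]
        marker_b, ori_b = VECTORS[b]
        if marker_a == marker_b or ori_a == ori_b:
            return False
    return True
```

Under these assumptions, pMCSG7, pMCSG11 and pMCSG21 can be combined for three-plasmid co-expression, whereas pMCSG7 and pMCSG9 cannot, since they share both marker and backbone.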

The insect cell expression system developed in the Fremont laboratory at Washington University is particularly well suited for targets that must be handled separately because they require correct disulfide bonds and other posttranslational modifications to produce properly folded proteins. The approach takes advantage of the fact that very few proteins are secreted from insect cells during baculovirus infection. Methods for the efficient recovery of secreted proteins from insect cell supernatants based on a His6 affinity tag have been developed. In addition, the fusion tag allows for easy monitoring of the infection and purification steps as it is easily detected on western blots using anti-His6 antiserum. To greatly shorten this process, the following modifications were implemented:

The transfer vector was modified to allow for ligation-independent cloning (LIC) of PCR fragments. The baculovirus transfer vector pAcUW51 was altered to contain a honeybee melittin signal sequence after the polyhedrin promoter. The honeybee melittin signal sequence has been shown to enhance the secretion of numerous foreign proteins from insect cells. Also, a C-terminal His6 tag removable by thrombin is included downstream of the cloning site.
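Ligation-independent cloning relies on the 3' to 5' exonuclease activity of T4 DNA polymerase: in the presence of a single dNTP, the enzyme removes nucleotides from each 3' end until it reaches a base matching that dNTP, where excision and re-addition cancel out, leaving long single-stranded overhangs on insert and vector. The sketch below computes the 5' overhang exposed on one strand. The example tail is an assumption modelled on a pMCSG7-style forward primer (it encodes Tyr-Phe-Gln-Ser-Asn-Ala, consistent with the TEV/SNA junction described above) and should be checked against the actual primer design; `lic_overhang` is a hypothetical helper.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def lic_overhang(top: str, dntp: str) -> str:
    """5' single-stranded overhang left on `top` (written 5'->3') after
    T4 DNA polymerase chews back the complementary strand from its 3' end
    in the presence of only `dntp` (given as a base letter, e.g. "C").

    The complementary strand stalls at its first base equal to `dntp`,
    i.e. where the top strand carries COMPLEMENT[dntp].
    """
    stop_base = COMPLEMENT[dntp.upper()]
    i = top.upper().find(stop_base)
    if i == -1:
        raise ValueError("no stalling base: the whole strand would be degraded")
    return top[:i]
```

With the assumed tail `TACTTCCAATCCAATGCC` followed by an insert, treatment with dCTP stalls at the first G of the top strand and exposes a 15-nucleotide overhang, `TACTTCCAATCCAAT`.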

We have succeeded in developing high-throughput bacterial inclusion-body refolding protocols, with particular emphasis on folding disulfide-bonded proteins. For a typical target, we first PCR-amplify the DNA corresponding to the mature secreted protein, without the predicted leader sequence, transmembrane or intracellular regions, and then insert it into a tagless pET-23b expression construct. For protein production we use BL21-CodonPlus (DE3)-RIL cells and induce expression with IPTG. Induced cell pellets are collected by centrifugation and lysed by sonication. The target proteins are recovered as inclusion bodies, purified, denatured, reduced, and then refolded by dilution under oxidative conditions. We have found small-molecule additives such as L-arginine and NDSB (non-detergent sulfobetaines) to be extremely useful in optimizing refolding efficiency. We next concentrate the refolded material and subject it to size-exclusion chromatography. Further purification is usually performed by ion-exchange chromatography, and protein identity and disulfide bond formation are checked by mass spectrometry. For proteins with known ligands, we confirm correct folding of the recombinant preparations by testing their functional properties, for instance with surface plasmon resonance binding assays. For proteins of unknown function, we judge folding by biophysical parameters that correlate with it, including monodisperse profiles on size-exclusion chromatography and significant secondary structure as measured by circular dichroism spectroscopy.
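Refolding by dilution is a mass-balance problem: the final volume must be large enough both to bring the protein below a concentration at which aggregation competes strongly with folding and to lower the denaturant below a level that permits folding. The numbers used here (10 mg/mL protein in 6 M guanidine, a 0.1 mg/mL protein limit and a 0.1 M residual denaturant limit) are illustrative assumptions, not the lab's protocol, and `dilution_plan` is a hypothetical helper that assumes denaturant-free refolding buffer.

```python
def dilution_plan(protein_mg_ml, denaturant_m, volume_ml,
                  max_protein_mg_ml, max_denaturant_m):
    """One-step dilution into denaturant-free refolding buffer.

    Returns (buffer_volume_ml, final_protein_mg_ml, final_denaturant_m).
    """
    # Each limit requires C0 * V0 / Vf <= C_max, i.e. Vf >= C0 * V0 / C_max;
    # the final volume is the largest of these requirements.
    v_final = max(volume_ml * protein_mg_ml / max_protein_mg_ml,
                  volume_ml * denaturant_m / max_denaturant_m,
                  volume_ml)
    return (v_final - volume_ml,
            protein_mg_ml * volume_ml / v_final,
            denaturant_m * volume_ml / v_final)
```

For 1 mL of the illustrative stock, the protein limit dominates: about 99 mL of buffer is needed, which leaves about 0.06 M residual guanidine, comfortably under the 0.1 M limit.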

We are developing a salvage pathway for proteins that express well but fail in crystallization trials. Crystallization of such proteins may be inhibited by unfolded or disordered regions. We therefore seek to define the stable, folded domains of these proteins by limited proteolysis. Target proteins are digested with various proteases under native conditions, and the protease-resistant fragments that remain are analyzed by electrospray ionization mass spectrometry (ESI MS) to determine their intact mass and by tandem mass spectrometry (ESI MS/MS) to determine their amino acid sequence. Bioinformatic secondary-structure predictions, together with the proteolysis data, guide the design of truncated constructs that are then fed back through the cloning and crystallization pipeline.

1   MHHHHHHSSG VDLGTENLYF QSNAMKPIDR FSYLKNNRVS QDTSSLVQCY
51  LPIIGQEALS LYLYTISFWD NGRKEYLFSS ILNHLNFGMD RLIKSLKILS
101 AFNLLTLYQK GDVYQLALHA PLSSQDFLGH PVYRRLLEKK IGDVAVEDLK
151 VESADGEEIP VSLNQVFPEL AELGSQEDLG LKKKVANDFD LEHFRQLMAR
201 DGLRFADEQS DVLNLFAIAE EKKWTWFETY QLAKSTAVSQ VISTKRMREK
251 IAQKPVSSDF SLKEATIIKE AKSKTALQFL AEIKQTRKGT ITQTERELLQ
301 QMAGLGLLDE VINIILLLTF NKVDSANINE KYAMKVANDY AYQKIHSAEE
351 AVLRIRDRGQ KAKTQKQNQT APEKTNVPKW SNPEYKNETS EETRLELERK
401 KQELLARLEK G
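In the limited-proteolysis salvage pathway, an intact mass from ESI MS has to be mapped back onto the construct sequence. A minimal sketch, assuming deconvoluted neutral average masses, standard average residue masses, and no modifications: it scans every contiguous fragment of the example sequence above and reports those within a mass tolerance. `match_fragments`, the tolerance, and the minimum fragment length are hypothetical choices; in practice MS/MS sequence data would then discriminate among candidates.

```python
# Standard average residue masses (Da), i.e. amino acid minus water.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once per peptide for the free termini

LISTING = """
1   MHHHHHHSSG VDLGTENLYF QSNAMKPIDR FSYLKNNRVS QDTSSLVQCY
51  LPIIGQEALS LYLYTISFWD NGRKEYLFSS ILNHLNFGMD RLIKSLKILS
101 AFNLLTLYQK GDVYQLALHA PLSSQDFLGH PVYRRLLEKK IGDVAVEDLK
151 VESADGEEIP VSLNQVFPEL AELGSQEDLG LKKKVANDFD LEHFRQLMAR
201 DGLRFADEQS DVLNLFAIAE EKKWTWFETY QLAKSTAVSQ VISTKRMREK
251 IAQKPVSSDF SLKEATIIKE AKSKTALQFL AEIKQTRKGT ITQTERELLQ
301 QMAGLGLLDE VINIILLLTF NKVDSANINE KYAMKVANDY AYQKIHSAEE
351 AVLRIRDRGQ KAKTQKQNQT APEKTNVPKW SNPEYKNETS EETRLELERK
401 KQELLARLEK G
"""
# Keep only residue letters; drop position numbers and spacing.
SEQ = "".join(ch for ch in LISTING if ch.isalpha())

def avg_mass(peptide: str) -> float:
    """Average neutral mass of an unmodified linear peptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in peptide) + WATER

def match_fragments(seq: str, observed: float, tol: float = 2.0, min_len: int = 30):
    """Return (start, end, mass) for contiguous fragments (1-based, inclusive)
    whose average mass lies within `tol` Da of `observed`."""
    hits = []
    for i in range(len(seq)):
        mass = WATER
        for j in range(i, len(seq)):
            mass += AVG_RESIDUE_MASS[seq[j]]
            if mass > observed + tol:
                break  # masses only grow as the fragment is extended
            if j - i + 1 >= min_len and abs(mass - observed) <= tol:
                hits.append((i + 1, j + 1, round(mass, 2)))
    return hits
```

For example, `match_fragments(SEQ, avg_mass(SEQ[24:124]), tol=0.5)` includes a hit spanning residues 25 to 124, the first 100 residues after the SNA remnant; other fragments of similar composition may also fall within the tolerance.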
