
How can I produce milligram quantities of an isotope-labeled DNA oligomer?


I'd like to produce a specific DNA sequence on a milligram-scale and 13C15N-label it. The sequence is around 35 nucleotides long, so chemical synthesis is out due to the exorbitant costs.

I'm also only interested in the single-stranded DNA, so a method that produces double-stranded DNA without any way to easily separate the strands would also not be useful.

What methods are there to produce labelled DNA that fulfill these requirements?


You can also design a plasmid that carries your 35-mer and grow the bacteria on isotopically labeled carbon and nitrogen sources. 13C acetate and 15N ammonium sulfate will work with E. coli in a minimal medium.

This is not cheap, but the cheapest available sources of the isotopes can be used.


I found a couple of methods using M13 phage growing on E coli:

  • Reddy P, McKenney K. Improved method for the production of M13 phage and single-stranded DNA for DNA sequencing. Biotechniques. 1996 May;20(5):854-6, 858, 860.

  • Jupin I, Gronenborn B. Abundant, easy and reproducible production of single-stranded DNA from phagemids using helper phage-infected competent cells. Nucleic Acids Res. 1995 Feb 11;23(3):535-536.

You could easily adapt these protocols to grow on 13C, 15N-containing medium. Then you could use a restriction enzyme able to cut ssDNA and purify your fragment on a polyacrylamide gel. Clontech's gigaprep kit (up to 50 mg of DNA) will cost you $546 (in the US, before any quote).

Disclaimer: I've never done any of this!


If the whole oligomer does not need to be labeled, the cheapest way to make that quantity of DNA would be to produce the first 34 nucleotides by chemical synthesis. The 35th nucleotide can then be labeled and ligated to the rest of the strand. This should reduce the cost of producing the oligomer, since the most expensive stage, the labeling, is limited to just the final nucleotide.


What about PCR using labeled nucleotides? You might have to run several reactions to get the milligram quantities you need, but 35 nucleotides seems really small for growth in bacteria, and purification would be extremely difficult. Then again, 35 bases might even be too small for PCR, since that is hardly bigger than your primers.

If you do need to grow in E. coli, it might make sense to create a plasmid with multiple repeats of the 35-base sequence, each separated by the same restriction site. You can grow E. coli in 13C, 15N media, but it's expensive and you have to "train" the cells over several generations, because the heavier isotopes create differences in chemical kinetics that make the labeled nutrients hard for enzymes to use. I've seen people do this to make proteins for NMR.

I bet you need this labeled DNA for NMR, don't you?

Have you considered phosphorus NMR? There is no need to label for it, since 31P is already NMR active. If it must be 13C, 15N, I highly recommend having it synthesized. It will be expensive, but the 35-base length and the requirement for single-strandedness are ideal for chemical synthesis.
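To put the tandem-repeat idea from this answer into rough numbers, here is a minimal Python sketch. It assumes an average mass of about 330 g/mol per nucleotide (a rule of thumb, slightly higher for 13C/15N-labeled material), 100% recovery, and hypothetical values for the repeat count and plasmid size. None of these figures come from the thread. It also ignores the restriction-site remnants left on each excised fragment.

```python
# Rough mass bookkeeping for a tandem-repeat plasmid (illustrative values only).
AVG_NT_MASS = 330.0  # g/mol per nucleotide; rule-of-thumb assumption, not exact


def oligo_nmol_per_mg(length_nt: int) -> float:
    """Approximate nanomoles of a single-stranded oligo contained in 1 mg."""
    return 1e-3 / (length_nt * AVG_NT_MASS) * 1e9


def build_tandem_insert(unit: str, site: str, copies: int) -> str:
    """Return site-unit-site-unit-...-site so every unit is flanked by the site."""
    return site + (unit + site) * copies


def plasmid_mg_needed(target_mg: float, unit_len: int, copies: int, plasmid_bp: int) -> float:
    """Plasmid mass that contains target_mg of ONE strand of the repeat unit.

    plasmid_bp is the total plasmid size including the insert; a
    double-stranded plasmid contains 2 * plasmid_bp nucleotides.
    """
    fraction = (copies * unit_len) / (2 * plasmid_bp)
    return target_mg / fraction


if __name__ == "__main__":
    print(f"1 mg of a 35-mer is roughly {oligo_nmol_per_mg(35):.0f} nmol")
    # Hypothetical construct: 20 repeats in a 4000 bp plasmid.
    print(f"plasmid needed for 1 mg: {plasmid_mg_needed(1.0, 35, 20, 4000):.1f} mg")
```

Under these assumptions, more repeats per plasmid directly lowers the amount of labeled plasmid (and labeled medium) needed, which is the point of the multi-repeat design.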


Purification and characterization of transcribed RNAs using gel filtration chromatography

RNA synthesis using in vitro transcription by phage T7 RNA polymerase allows preparation of milligram quantities of RNA for biochemical, biophysical and structural investigations. Previous purification approaches relied on gel electrophoretic or gravity-flow chromatography methods. We present here a protocol for the in vitro transcription of RNAs and subsequent purification using fast-performance liquid chromatography. This protocol greatly facilitates production of RNA in a single day from transcription to purification.


Scientific Fundamentals of Biotechnology

1.06.11 Particular Features of Oligonucleotides’ Cloning

The cloning of oligonucleotides is common in modern genetic engineering. Typical oligonucleotide-generated plasmid inserts include the polylinkers, sites for site-specific recombination, and coding sequences for small protein domains.

Double-stranded oligonucleotides are produced by annealing single-stranded oligonucleotides, which are commercially synthesized by solid-phase phosphoramidite chemistry. Current technology allows for the chemical synthesis of oligonucleotides with a length up to about 110–120 nt (Sigma-Genosys, Invitrogen). Longer oligonucleotides are normally purified from the contaminating abortive forms by high-pressure liquid chromatography or polyacrylamide gel electrophoresis. Because oligonucleotides are relatively short and, therefore, are unlikely to contain restriction recognition sites, the selection of the recombinant plasmids can be conveniently accomplished by restriction endonuclease digestion of the ligation mixture. After the extraction of plasmid DNA from transformants and the excision of the insert, the excised oligonucleotide insert can be analyzed in a 1.0–2.0% agarose gel or a polyacrylamide gel. However, it is often easier to confirm a successful oligonucleotide insertion by the resultant gain or loss of the relevant restriction recognition sites within the whole plasmid. Such restriction sites can be incorporated into or excluded from the oligonucleotide sequence with the specific purpose of simplifying the clone screening.

The cloning of oligonucleotides benefits from the substantial available quantities of oligonucleotide DNA, its purity from undesired DNA fragments, the possibility to build cohesive ends of choice, the possibility to create restriction nuclease-resistant sites by changing the nucleotide next to the cohesive end, and the possibility to introduce extra restriction sites if required. In addition, nonphosphorylated DNA is simple to obtain as oligonucleotides are supplied in the nonphosphorylated form by default. However, there are some oligonucleotide-specific pitfalls: longer oligonucleotides might require a particularly thorough purification, and the cloning of oligonucleotides with a pronounced secondary structure might be complicated. Therefore, it is always advisable to check for potential stem–loop structures with software like DNAsis and to block these unwanted structures with some minor changes in the sequence of the designed oligonucleotide.


We have significant experience and expertise in the large-scale production of plasmid DNA for various uses.

We manufacture GMP plasmid DNA starting material to be used in the production of viral vectors (e.g. lentivirus and AAV) or in vitro transcribed RNA (IVT-RNA) (e.g. mRNA, long RNA).

We also produce plasmid DNA API for use as DNA vaccines or in therapeutic non-viral plasmid DNA-mediated gene therapy applications.

For human clinical trials and commercialization

All GMP material is produced in accordance with FDA 21 CFR Part 210 & 211, EU 2003/94/EC, Eudralex Vol 4, and the relevant ICH regulatory requirements for sterile injectable products intended for human clinical trials and commercialization.

GMP Plasmid DNA to kg-scale

We manufacture custom Plasmid DNA vaccines for gene therapy.

Our 2200L fermenter combined with our purification process allows the production of plasmid DNA at the kilo-scale for commercial uses.


ORIGINAL RESEARCH article

Nadide Altincekic 1,2 † , Sophie Marianne Korn 2,3 † , Nusrat Shahin Qureshi 1,2 † , Marie Dujardin 4 † , Martí Ninot-Pedrosa 4 † , Rupert Abele 5 , Marie Jose Abi Saad 6 , Caterina Alfano 7 , Fabio C. L. Almeida 8,9 , Islam Alshamleh 1,2 , Gisele Cardoso de Amorim 8,10 , Thomas K. Anderson 11 , Cristiane D. Anobom 8,12 , Chelsea Anorma 13 , Jasleen Kaur Bains 1,2 , Adriaan Bax 14 , Martin Blackledge 15 , Julius Blechar 1,2 , Anja Böckmann 4 * ‡ , Louis Brigandat 4 , Anna Bula 16 , Matthias Bütikofer 6 , Aldo R. Camacho-Zarco 15 , Teresa Carlomagno 17,18 , Icaro Putinhon Caruso 8,9,19 , Betül Ceylan 1,2 , Apirat Chaikuad 20,21 , Feixia Chu 22 , Laura Cole 4 , Marquise G. Crosby 23 , Vanessa de Jesus 1,2 , Karthikeyan Dhamotharan 2,3 , Isabella C. Felli 24,25 , Jan Ferner 1,2 , Yanick Fleischmann 6 , Marie-Laure Fogeron 4 , Nikolaos K. Fourkiotis 26 , Christin Fuks 1 , Boris Fürtig 1,2 , Angelo Gallo 26 , Santosh L. Gande 1,2 , Juan Atilio Gerez 6 , Dhiman Ghosh 6 , Francisco Gomes-Neto 8,27 , Oksana Gorbatyuk 28 , Serafima Guseva 15 , Carolin Hacker 29 , Sabine Häfner 30 , Bing Hao 28 , Bruno Hargittay 1,2 , K. Henzler-Wildman 11 , Jeffrey C. Hoch 28 , Katharina F. Hohmann 1,2 , Marie T. Hutchison 1,2 , Kristaps Jaudzems 16 , Katarina Jović 22 , Janina Kaderli 6 , Gints Kalniņš 31 , Iveta Kaņepe 16 , Robert N. Kirchdoerfer 11 , John Kirkpatrick 17,18 , Stefan Knapp 20,21 , Robin Krishnathas 1,2 , Felicitas Kutz 1,2 , Susanne zur Lage 18 , Roderick Lambertz 3 , Andras Lang 30 , Douglas Laurents 32 , Lauriane Lecoq 4 , Verena Linhard 1,2 , Frank Löhr 2,33 , Anas Malki 15 , Luiza Mamigonian Bessa 15 , Rachel W. Martin 13,23 , Tobias Matzel 1,2 , Damien Maurin 15 , Seth W. McNutt 22 , Nathane Cunha Mebus-Antunes 8,9 , Beat H. Meier 6 , Nathalie Meiser 1 , Miguel Mompeán 32 , Elisa Monaca 7 , Roland Montserret 4 , Laura Mariño Perez 15 , Celine Moser 34 , Claudia Muhle-Goll 34 , Thais Cristtina Neves-Martins 8,9 , Xiamonin Ni 20,21 , Brenna Norton-Baker 13 , Roberta Pierattelli 24,25 , Letizia Pontoriero 24,25 , Yulia Pustovalova 28 , Oliver Ohlenschläger 30 , Julien Orts 6 , Andrea T. Da Poian 9 , Dennis J. Pyper 1,2 , Christian Richter 1,2 , Roland Riek 6 , Chad M. Rienstra 35 , Angus Robertson 14 , Anderson S. Pinheiro 8,12 , Raffaele Sabbatella 7 , Nicola Salvi 15 , Krishna Saxena 1,2 , Linda Schulte 1,2 , Marco Schiavina 24,25 , Harald Schwalbe 1,2 * ‡ , Mara Silber 34 , Marcius da Silva Almeida 8,9 , Marc A. Sprague-Piercy 23 , Georgios A. Spyroulias 26 , Sridhar Sreeramulu 1,2 , Jan-Niklas Tants 2,3 , Kaspars Tārs 31 , Felix Torres 6 , Sabrina Töws 3 , Miguel Á. Treviño 32 , Sven Trucks 1 , Aikaterini C. Tsika 26 , Krisztina Varga 22 , Ying Wang 17 , Marco E. Weber 6 , Julia E. Weigand 36 , Christoph Wiedemann 37 , Julia Wirmer-Bartoschek 1,2 , Maria Alexandra Wirtz Martin 1,2 , Johannes Zehnder 6 , Martin Hengesbach 1 * ‡ and Andreas Schlundt 2,3 * ‡
  • 1 Institute for Organic Chemistry and Chemical Biology, Goethe University Frankfurt, Frankfurt am Main, Germany
  • 2 Center of Biomolecular Magnetic Resonance (BMRZ), Goethe University Frankfurt, Frankfurt am Main, Germany
  • 3 Institute for Molecular Biosciences, Goethe University Frankfurt, Frankfurt am Main, Germany
  • 4 Molecular Microbiology and Structural Biochemistry, UMR 5086, CNRS/Lyon University, Lyon, France
  • 5 Institute for Biochemistry, Goethe University Frankfurt, Frankfurt am Main, Germany
  • 6 Swiss Federal Institute of Technology, Laboratory of Physical Chemistry, ETH Zurich, Zurich, Switzerland
  • 7 Structural Biology and Biophysics Unit, Fondazione Ri.MED, Palermo, Italy
  • 8 National Center of Nuclear Magnetic Resonance (CNRMN, CENABIO), Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  • 9 Institute of Medical Biochemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  • 10 Multidisciplinary Center for Research in Biology (NUMPEX), Campus Duque de Caxias Federal University of Rio de Janeiro, Duque de Caxias, Brazil
  • 11 Institute for Molecular Virology, University of Wisconsin-Madison, Madison, WI, United States
  • 12 Institute of Chemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  • 13 Department of Chemistry, University of California, Irvine, CA, United States
  • 14 LCP, NIDDK, NIH, Bethesda, MD, United States
  • 15 Univ. Grenoble Alpes, CNRS, CEA, IBS, Grenoble, France
  • 16 Latvian Institute of Organic Synthesis, Riga, Latvia
  • 17 BMWZ and Institute of Organic Chemistry, Leibniz University Hannover, Hannover, Germany
  • 18 Group of NMR-Based Structural Chemistry, Helmholtz Centre for Infection Research, Braunschweig, Germany
  • 19 Multiuser Center for Biomolecular Innovation (CMIB), Department of Physics, São Paulo State University (UNESP), São José do Rio Preto, Brazil
  • 20 Institute of Pharmaceutical Chemistry, Goethe University Frankfurt, Frankfurt am Main, Germany
  • 21 Structural Genomics Consortium, Buchmann Institute for Molecular Life Sciences, Frankfurt am Main, Germany
  • 22 Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH, United States
  • 23 Department of Molecular Biology and Biochemistry, University of California, Irvine, CA, United States
  • 24 Magnetic Resonance Centre (CERM), University of Florence, Sesto Fiorentino, Italy
  • 25 Department of Chemistry “Ugo Schiff”, University of Florence, Sesto Fiorentino, Italy
  • 26 Department of Pharmacy, University of Patras, Patras, Greece
  • 27 Laboratory of Toxinology, Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, Brazil
  • 28 Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, United States
  • 29 Signals GmbH & Co. KG, Frankfurt am Main, Germany
  • 30 Leibniz Institute on Aging – Fritz Lipmann Institute (FLI), Jena, Germany
  • 31 Latvian Biomedical Research and Study Centre, Riga, Latvia
  • 32 “Rocasolano” Institute for Physical Chemistry (IQFR), Spanish National Research Council (CSIC), Madrid, Spain
  • 33 Institute of Biophysical Chemistry, Goethe University Frankfurt, Frankfurt am Main, Germany
  • 34 IBG-4, Karlsruhe Institute of Technology, Karlsruhe, Germany
  • 35 Department of Biochemistry and National Magnetic Resonance Facility at Madison, University of Wisconsin-Madison, Madison, WI, United States
  • 36 Department of Biology, Technical University of Darmstadt, Darmstadt, Germany
  • 37 Institute of Biochemistry and Biotechnology, Charles Tanford Protein Centre, Martin Luther University Halle-Wittenberg, Halle/Saale, Germany

The highly infectious disease COVID-19 caused by the Betacoronavirus SARS-CoV-2 poses a severe threat to humanity and demands the redirection of scientific efforts and criteria to organized research projects. The international COVID19-NMR consortium seeks to provide such new approaches by gathering scientific expertise worldwide. In particular, making available viral proteins and RNAs will pave the way to understanding the SARS-CoV-2 molecular components in detail. The research in COVID19-NMR and the resources provided through the consortium are fully disclosed to accelerate access and exploitation. NMR investigations of the viral molecular components are designated to provide the essential basis for further work, including macromolecular interaction studies and high-throughput drug screening. Here, we present the extensive catalog of a holistic SARS-CoV-2 protein preparation approach based on the consortium’s collective efforts. We provide protocols for the large-scale production of more than 80% of all SARS-CoV-2 proteins or essential parts of them. Several of the proteins were produced in more than one laboratory, demonstrating the high interoperability between NMR groups worldwide. For the majority of proteins, we can produce isotope-labeled samples of HSQC-grade. Together with several NMR chemical shift assignments made publicly available on covid19-nmr.com, we here provide highly valuable resources for the production of SARS-CoV-2 proteins in isotope-labeled form.


MATERIALS AND METHODS

Materials.

Random DNA oligonucleotides and primers were from the University of Michigan DNA Core. Taq DNA polymerase was prepared as described (18). T4 polynucleotide kinase was obtained from New England Biolabs. XbaI and SalI were from Boehringer Mannheim. γ-³²P-ATP was from New England Nuclear and α-³²P-ATP was from Amersham. Cellulose was purchased from W&R Balston, United Kingdom. Cellobiose was from Pfanstiehl Chemicals. Gentiobiose, lactose, and maltose were from Sigma. Cellotriose, cellotetraose, and cellopentaose were purchased from Seikagaku, Tokyo.

In Vitro Selection and Amplification.

A pool of 86-nucleotide DNA oligomers containing 40 central nucleotides of random sequence flanked by defined primer-binding sites (Fig. 1) was synthesized. This resulted in an initial pool with an estimated complexity of 10¹⁴–10¹⁶ different sequences: (5′-ATAGGAGTCGACCGACCAGAA [N]40 TATGTGCGTCTACATCTAGACTCAT).

Selection of anticellulose DNA aptamers. Selection began with a library of 10¹⁵ DNA molecules containing known 5′ and 3′ sequences (for PCR priming) and 40 central nucleotide positions at which the sequence had been randomized during synthesis. In the first round, the library was bound to cellulose, washed extensively, and eluted first with method A (50 mM cellobiose competitor, 2.5 mM EDTA) and then with method B (50% formamide). DNA eluted from the first round of binding under these two conditions was kept separate in the subsequent 13 rounds of binding selection and amplification by PCR, eluting only with the original method. Pools of DNA aptamers from the 14th round of selection by each method were converted to double-stranded DNA and ligated into plasmid vectors to produce individual clones. DNA from more than 200 clones was reamplified to produce “monoclonal” aptamers, of which 80% were found to bind tightly to cellulose.

Short DNA oligonucleotides for amplifying selected sequences were: 5′-primer, 5′-ATAGGAGTCGACCGACCAGAA; 3′-primer, 5′-ATGAGTCTAGATGTAGACGCACATA.
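The library layout and primers given above can be cross-checked with a few lines of Python. This is only a consistency sketch: the sequences are copied from the text, while the SalI (GTCGAC) and XbaI (TCTAGA) recognition sequences are the standard ones for those enzymes and are not spelled out in the source. Function and variable names are illustrative.

```python
# Sequences as given in the text.
FLANK_5 = "ATAGGAGTCGACCGACCAGAA"
FLANK_3 = "TATGTGCGTCTACATCTAGACTCAT"
PRIMER_5 = "ATAGGAGTCGACCGACCAGAA"
PRIMER_3 = "ATGAGTCTAGATGTAGACGCACATA"
N_RANDOM = 40

# Standard recognition sequences (not quoted in the source text).
SALI_SITE = "GTCGAC"
XBAI_SITE = "TCTAGA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_library() -> dict:
    """Collect simple consistency checks on the library design."""
    return {
        "total_length": len(FLANK_5) + N_RANDOM + len(FLANK_3),
        "primer5_equals_5_flank": PRIMER_5 == FLANK_5,
        "primer3_is_revcomp_of_3_flank": PRIMER_3 == reverse_complement(FLANK_3),
        "sali_site_in_5_flank": SALI_SITE in FLANK_5,
        "xbai_site_in_3_flank": XBAI_SITE in FLANK_3,
    }


if __name__ == "__main__":
    for name, value in check_library().items():
        print(f"{name}: {value}")
```

The checks confirm that the two flanks plus 40 random positions add up to the stated 86 nucleotides, and that the 3′-primer anneals to the 3′ flank as its reverse complement.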

The selection method is shown in Fig. 1. For the first-round selection, 1 mg of DNA library was passed through a 50-mg cellulose column 10 times. The column was washed with 4 × 400 μl of binding buffer (20 mM Tris, pH 7.5/100 mM NaCl/5 mM MgCl2), and sequentially eluted first with elution method A (50 mM cellobiose, 2.5 mM EDTA) and then with method B (50% formamide). Eluted aptamers were precipitated with ethanol and resuspended in 50 μl H2O. Asymmetric PCRs were used to amplify the recovered DNA sequences from 40 μl of the resuspended DNA eluate and to recover single-stranded DNA. The 400-μl asymmetric PCR consisted of: 1× PCR buffer (10 mM Tris, pH 9.1/50 mM KCl/0.5 mM MgCl2), 200 μM dNTPs, 18.5 μg 5′-primer, 0.64 μg 3′-primer, and 2.5 units Taq DNA polymerase. The cycling protocol was 30 cycles of: 94°C for 45 sec, 60°C for 1 min, 72°C for 2 min. A 350-μl aliquot of the PCR product was bound to the next 50-μl packed cellulose column, and the elution/amplification process was repeated for 13 rounds. After the first round of elution by method A, then method B, rounds 2–14 kept the DNA selected by the two methods separate.
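Two pieces of arithmetic sit behind this protocol: how many molecules 1 mg of an 86-mer represents compared with the 4⁴⁰ possible random regions, and how lopsided the primer ratio in the asymmetric PCR is. The sketch below estimates both, assuming an average of about 330 g/mol per nucleotide (a rule of thumb that is not taken from the paper). The function names are illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
AVG_NT_MASS = 330.0       # g/mol per nucleotide; rule-of-thumb assumption


def molecules_in_mass(mass_g: float, length_nt: int) -> float:
    """Approximate number of ssDNA molecules of a given length in mass_g grams."""
    return mass_g / (length_nt * AVG_NT_MASS) * AVOGADRO


def sequence_space(n_random: int) -> int:
    """Number of distinct sequences possible at n_random fully random positions."""
    return 4 ** n_random


def primer_molar_ratio(mass_a_ug: float, len_a: int, mass_b_ug: float, len_b: int) -> float:
    """Molar ratio of primer A to primer B from their masses and lengths.

    The per-nucleotide mass cancels, so only mass/length matters.
    """
    return (mass_a_ug / len_a) / (mass_b_ug / len_b)


if __name__ == "__main__":
    pool = molecules_in_mass(1e-3, 86)  # 1 mg of the 86-nt library
    print(f"molecules in 1 mg of library: {pool:.2e}")
    print(f"possible 40-nt random regions: {sequence_space(40):.2e}")
    ratio = primer_molar_ratio(18.5, 21, 0.64, 25)  # 5'-primer vs 3'-primer
    print(f"5'-primer : 3'-primer molar ratio is about {ratio:.0f} : 1")
```

With these assumptions, the starting pool samples only a tiny fraction of the possible sequences, and the large excess of 5′-primer is what drives accumulation of one strand in the asymmetric PCR.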

Obtaining Monoclonal Aptamers from Selected Pools.

Double-stranded DNA fragments were prepared from round 14 of the method A and method B pools, cleaved at the terminal XbaI and SalI sites, and ligated into those sites in plasmid pUC19. Transformation into Escherichia coli strain DH5α was followed by direct PCR screening of plasmid-containing bacterial colonies. The difference between the colony PCR and the PCR described above was that the cell samples were initially denatured at 94°C for 5–10 min, followed by 30–35 cycles using equal amounts of 5′-primer and 3′-primer (0.5 μg) to produce double-stranded DNA (dsDNA) corresponding to unique sequences from the selected pool. Individual single-stranded aptamer DNAs were derived from an aliquot of this dsDNA stock by asymmetric PCR of the dsDNA amplification product.

Cellulose Binding Assay.

α-³²P-dATP was incorporated into single-stranded DNA by using asymmetric PCR, and the DNA was isolated after electrophoretic separation in 10% denaturing polyacrylamide gels. ³²P-labeled single-stranded DNA was preincubated with binding buffer or 50 mM cellobiose (dissolved in binding buffer) for 10 min at room temperature. Next, 200 μl of the mixture was bound to 12–15 mg cellulose with gentle mixing for 20 min. The unbound fraction was saved, and the cellulose was washed with 200 μl of binding buffer. Bound aptamer was eluted with 200 μl of elution method A or method B, depending on the conditions used in selection of that aptamer. Aliquots of 7.5 μl from the unbound, wash, or eluted fractions were analyzed by electrophoresis on a denaturing 10% polyacrylamide gel and exposed to x-ray film.

DNA Sequencing.

Thirty-seven cellulose-binding aptamer clones were sequenced. They included all clones where binding was inhibited by cellobiose. Plasmid DNA was isolated by using Qiagen QIAprep-spin plasmid miniprep. DNA sequencing was performed by using primers flanking the plasmid insertion site and reagents from United States Biochemicals.

Hypochromicity Assay.

To quantitatively determine binding properties of three aptamers, milligram quantities were synthesized chemically. Only the internal variable regions of three aptamers were used: the 41 mer of Cel#16, the 40 mer of Cel#183, and the 36 mer of Cel#202 (see Table 1). A 41-mer control oligonucleotide also was used to show that the disaccharides generally did not cause hyperchromicity. The sequence of the 41 mer was 2-fold degenerate at four positions: 5′-CCGAATTCTGGAA(C or G)(A or T)CCC(T or A)(A or C)GCTTTCCTGATGAGTCCGTGA. To analyze binding of the DNAs to soluble disaccharides, hypochromicity was measured at 260 nm as a reflection of conformational change on binding the sugar. DNA aptamers at concentrations between 0.0006 and 0.0007 mM were preincubated with 0.00003 mM to 10 mM sugars (cellobiose, cellotetraose, gentiobiose, lactose, and maltose) at room temperature for 1 hr before reading absorbance at 260 nm. Higher concentrations of several sugars led to aberrantly large drops in absorbance, presumably caused by aggregation, and concentrations higher than 1 mM were not included in interpretations. Cellotriose and cellopentaose (not shown) gave data similar to that for cellotetraose. Readings were taken in triplicate, correcting for minor absorbance by the sugars, and each experiment was repeated three or more times. Hypochromicity is represented as the % change in absorbance from no sugar added (see Fig. 3); errors in the triplicate readings from one experiment are represented by bars.
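The hypochromicity readout described above reduces to a percent change relative to the no-sugar absorbance, after subtracting the sugar's own small absorbance. A minimal Python sketch follows. The sign convention (negative values mean a drop in A260), the use of the sample standard deviation for the error bars, and all names are assumptions rather than details given in the text.

```python
from statistics import mean, stdev


def percent_change(a260_no_sugar, a260_with_sugar, a260_sugar_blank=0.0):
    """Percent change in A260 relative to the no-sugar reading.

    a260_no_sugar:    replicate readings of DNA alone
    a260_with_sugar:  replicate readings of DNA plus sugar
    a260_sugar_blank: absorbance of the sugar alone, subtracted as a correction
    Returns (mean percent change, sample standard deviation across replicates).
    Negative values indicate hypochromicity (assumed sign convention).
    """
    baseline = mean(a260_no_sugar)
    corrected = [a - a260_sugar_blank for a in a260_with_sugar]
    changes = [100.0 * (a - baseline) / baseline for a in corrected]
    return mean(changes), stdev(changes)


if __name__ == "__main__":
    # Made-up triplicate readings, purely for illustration.
    change, error = percent_change([0.250, 0.252, 0.248], [0.236, 0.238, 0.240], 0.002)
    print(f"change in A260: {change:.1f}% (+/- {error:.1f})")
```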

Sequences of cellulose-binding DNA aptamers


Discussion

Our current working model for the activation of DNA cleavage by SgrAI involves the assembly of SgrAI/DNA complexes into a run-on oligomer that stabilizes an activated conformation of the SgrAI enzyme [10]. This activated conformation has an accelerated rate of DNA cleavage of both primary and secondary site sequences as compared to the low activity, dimeric form of SgrAI. The oligomer is a “run-on” oligomer due to its ability to add additional SgrAI/DNA complexes at one or both ends of the oligomer (a left-handed helix) in a potentially unending manner (Fig 3A). The model predicts that formation of the run-on oligomer is favored only when SgrAI is bound to its primary recognition sequence, but also that the run-on oligomer is capable of incorporating, and thereby activating, SgrAI bound to secondary site sequences. This model for SgrAI activation has been derived from several previous studies of DNA cleavage by SgrAI [5,6,24], as well as structural studies including high resolution structures of the low activity form [8,11,12] and an 8.6 Å cryo-EM structure of the activated form (the run-on oligomer) [10]. Herein we sought both to test predictions made based on the cryo-EM model using site-directed mutagenesis and to expand our understanding of various other aspects of SgrAI activation (see below).

Our standard assay to investigate the activation of SgrAI utilizes a low concentration of reporter DNA containing the SgrAI recognition site (³²P-labeled), high concentrations of the SgrAI enzyme, and varied concentrations of unlabeled activator DNA. Typically the activator DNA is a cleaved version of the primary site sequence (i.e. PC DNA, Fig 1); however, we have also tested other versions of the primary site including the intact primary site sequence (40–1, Fig 1) and an uncleavable version of the intact primary site sequence (22-1-3’S, Fig 1), as well as a secondary site sequence (40-2A, Fig 1). The single turnover DNA cleavage conditions (i.e. the large excess of SgrAI concentration over reporter DNA concentration) allow for the measurement of a first order rate constant for DNA cleavage (kobs, Table 1). We find that this rate constant varies with the concentration of activator DNA (“Added Unlabeled Activator DNA”, Table 1), becoming faster with higher concentrations of activator DNA (see “Relative Acceleration”, Table 1). Previously we have discussed our interpretation of this phenomenon, namely that kobs captures, in part or in whole, the rate of run-on oligomer formation (which would depend on the concentration of SgrAI bound to activator DNA) [5]. In the current study, we have filled in several missing aspects of the SgrAI activation model. First, we have found that use of a reporter DNA molecule with a greater number of base pairs flanking the SgrAI recognition site (i.e. 40–1 vs. 18–1, Fig 1) increases the relative acceleration of DNA cleavage at intermediate concentrations of activator DNA (compare at 100 nM PC DNA, Table 1, Fig 2). Previously we have shown that the flanking DNA was important in the activator DNA, where a greater number of flanking base pairs results in a greater degree of SgrAI activation [10]. In either case, whether considering the reporter or the activator DNA, these results are readily explained by our model of the activated form of SgrAI. In the cryo-EM model of the run-on oligomer, the residues from one SgrAI/DNA complex appear close enough to make ionic, hydrogen bonding, and/or van der Waals interactions with flanking DNA (magenta, Fig 3A) of the neighboring SgrAI/DNA complex (Fig 3A and 3F). In the DNA cleavage reactions, the run-on oligomer would be composed largely of SgrAI bound to PC DNA, stabilized by interactions between the flanking DNA from one SgrAI/PC DNA complex and SgrAI residues of the neighboring SgrAI/PC DNA complex (as well as by protein-protein interactions between neighboring SgrAI). Similarly, the SgrAI bound to the reporter DNA (i.e. 18–1 or 40–1) would be activated by being assimilated into this run-on oligomer and making the same types of interactions. The flanking DNA highlighted in magenta (Fig 3A, and pink and magenta, Fig 3F) is present in 40–1, but not 18–1. Therefore, SgrAI bound to 40–1 would be expected to bind with a higher affinity to the run-on oligomer and therefore be activated at lower concentrations of the run-on oligomer.
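For readers less familiar with single-turnover kinetics, the sketch below shows one simple way a first-order rate constant could be extracted from a time course of fraction cleaved, and how a relative acceleration could then be computed. It assumes a single-exponential model, f(t) = 1 − exp(−k·t), fitted by linearized least squares through the origin. It also assumes that relative acceleration is the ratio of kobs with activator to kobs without. Neither the fitting procedure nor that definition is stated in this excerpt, and the function names are illustrative.

```python
import math


def fit_kobs(times, fraction_cleaved):
    """Estimate a first-order rate constant from single-turnover data.

    Model: f(t) = 1 - exp(-k t), so -ln(1 - f) = k t. The slope of a
    least-squares line through the origin is k = sum(t*y) / sum(t*t).
    Points with t <= 0 or f outside [0, 1) are skipped.
    """
    num = 0.0
    den = 0.0
    for t, f in zip(times, fraction_cleaved):
        if t <= 0 or not (0.0 <= f < 1.0):
            continue
        y = -math.log(1.0 - f)
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("no usable data points")
    return num / den


def relative_acceleration(k_with_activator, k_without_activator):
    """Ratio of rate constants with and without activator DNA (assumed definition)."""
    return k_with_activator / k_without_activator


if __name__ == "__main__":
    # Synthetic, noise-free data generated with k = 0.5 per minute.
    times = [1.0, 2.0, 4.0, 8.0]
    data = [1.0 - math.exp(-0.5 * t) for t in times]
    k = fit_kobs(times, data)
    print(f"fitted kobs: {k:.3f} per minute")
    print(f"relative acceleration vs. a basal 0.001 per minute: {relative_acceleration(k, 0.001):.0f}")
```

The linearized fit weights late time points heavily and is sensitive to noise near complete cleavage, so a nonlinear fit would usually be preferred for real data.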

Similarly, measurements with a secondary site containing more flanking base pairs (40-2A vs. 18–2, Fig 1) result in considerably greater degrees of activation (up to 7000 fold in Relative Acceleration, Table 1) (compare 40-2A and 18–2, Fig 2). However, the measured rate constants for DNA cleavage of the secondary site at the highest concentration of activator DNA tested still fall short of those measured for the cleavage of the primary site (compare 40-2A and 40-2B with 40–1 and 18–1 at 1000 nM PC DNA, Fig 2). The dependence of the rate constant of DNA cleavage on the concentration of activator DNA also differs between the secondary site (40-2A and 40-2B, Fig 2) and the primary site with the same number of flanking base pairs (40–1, Fig 2), requiring more activator DNA to achieve similar levels of activation for the cleavage of the secondary site (Fig 2). This cannot merely be explained by a lower affinity of SgrAI for the secondary site, since measured affinities were found to be in the 10–30 nM range (note: these were measured in the presence of Ca²⁺, but measurements with the catalytically relevant Mg²⁺ indicate only a 5–6 fold weaker KD, Table 2). Instead we interpret this result as a lowered affinity of the run-on oligomer for the complex containing SgrAI bound to the secondary site. This is consistent with our model for the inability of secondary site DNA to activate SgrAI, namely that binding to the secondary site by SgrAI favors the low activity conformation, which would have a lower affinity for the run-on oligomer, since the run-on oligomer preferentially binds (and thereby stabilizes) the activated conformation [10].

The results with single mutations in SgrAI, designed to disrupt the run-on oligomer without disrupting SgrAI structure or DNA binding, also support our model for SgrAI activation (Tables 3–5, Fig 3). These substitutions were located in either of two loops making close contact to the flanking DNA of a neighboring SgrAI/DNA complex in the run-on oligomer structure (Fig 3A and 3F), or in one case, distal from any contacts (E301W, Fig 3A and Fig 3F). The substitutions with the greatest effect were those introducing a negative charge (S56E, A57E, Table 3), or those removing a positive charge (R131A and R134A, Table 3), consistent with charge repulsion or alternatively loss of charge attraction to DNA. However, these substitutions did not significantly affect DNA binding by SgrAI (Table 4) or the basal rate of DNA cleavage by SgrAI in the absence of activator DNA (compare 0 nM concentration of PC DNA, Table 3, to that of wild type SgrAI with ³²P-labeled 18–1, Table 1), indicating minimal disruption of the low activity form (i.e. the dimeric DNA bound SgrAI). The results are therefore consistent with the perturbation of interactions between flanking DNA (magenta, Fig 3A, pink and magenta, Fig 3F) and neighboring SgrAI in the run-on oligomer. In contrast, the E301W substituted SgrAI behaved as wild type in the DNA cleavage assays (Table 5), consistent with the position of E301 distal from any interfaces in the SgrAI/DNA complex or in the run-on oligomer (Fig 3). Finally, the last piece we present here supporting our run-on oligomer model of SgrAI derives from two previous independent structural investigations, one using mass spectrometry (ion mobility mass spectrometry [9]) and the other the cryo-EM structural study [10], which both provide consistent measures of the size of the run-on oligomer containing 1–6 copies of the DNA bound SgrAI dimer (DBD) (S4 Fig).

In addition to the studies probing the structural model of the run-on oligomer described above, we present the results of several new studies probing the parameters of SgrAI activation. First, we tested whether the uncleaved primary site could activate SgrAI to the same extent as the cleaved. Comparing 1 μM 40–1 to 2 μM PC DNA (since each SgrAI dimer binds one 40–1 or two PC DNA [9]), we found that PC DNA appears to activate SgrAI to twice the value of the observed rate constant (using 40–1 as the reporter DNA, Table 1). Since the uncleaved DNA could in principle be cleaved by SgrAI in these assays, we also tested the allosteric activating properties of an uncleavable version of the primary site (22-1-3’S, Fig 1), and found it capable of activating SgrAI by 16 fold (Table 1). Therefore, uncleaved primary site is in fact capable of activating SgrAI, however, the cleaved version may actually provide more robust activation.

Regarding the two types of SgrAI secondary site, those with the substitution in the 2nd/7th nucleotide position (i.e. CRCCGGGG) and those with the substitution in the 1st/8th position (i.e. CRCCGGY(A/C/T)) (the primary site sequence is CRCCGGYG), we found qualitative and quantitative differences in cleavage by SgrAI (40-2A is of the first type, 40-2B is of the second type, Figs 1–2, Table 1). First, cleavage of the first type requires lower concentrations of activator DNA to accelerate its cleavage by SgrAI (Fig 2). Second, in the absence of activation the cleavage of the second type (40-2B, Table 1) is much slower in one strand than the other (slower in the bottom strand, Table 1). This may be because on this strand the cleavage site is closer to the substitution in the sequence (the difference in the sequence compared to the primary site sequence, Fig 1), hence potentially resulting in greater structural disruption of the cleavage active site, a phenomenon we have seen before with a different restriction endonuclease [25,26]. It is interesting that the differences in the rates of cleavage of the two strands disappear with higher concentrations of activator DNA (Table 1). This suggests that an asymmetry in the low activity conformation is absent in the activated conformation. The mechanism for this change awaits the high resolution structures of SgrAI (activated and low activity forms) bound to this type of secondary site DNA. To date, all structures of SgrAI bound to the secondary site have been with the first type and in the low activity conformation [12], and the atomic detail of the structure of the activated state of SgrAI is limited by the resolution of the cryo-EM study (8.6 Å) and is with primary site DNA [10].

Finally, we investigated the potential for activated cleavage of a noncognate sequence, one that differs from primary at two base pairs, and is essentially a symmetrized version of a secondary site (i.e. CTCCGGAG, 40-NCTA, Fig 1). We found that SgrAI cleaved only a small percentage of this DNA, and with very slow kobs (Table 1). Unlike with secondary site DNA, the activator DNA was not able to activate SgrAI to cleave this sequence (Table 1). The lack of cleavage was not due to lack of binding to this sequence, as the affinity for the DNA was measured to be in the low nanomolar range, with either Ca2+ or Mg2+ (Table 2). This result suggests that the ability of SgrAI to cleave secondary site sequences is not merely loss of recognition to the outer two base pairs, otherwise 40-NCTA should also be cleaved by SgrAI. We suggest that binding to the noncognate DNA may result in blocking SgrAI from attaining the activated conformation, and therefore from binding to the run-on oligomer.

The unusual mechanism of activation and modulation of substrate specificity exhibited by SgrAI may actually be shared to a greater or lesser extent by a growing list of enzymes. Run-on filament formation that affects enzyme activity has been shown by a handful of other enzymes, including IRE1 [2] (a kinase/RNase involved in the unfolded protein response), acetyl-CoA carboxylase [3] (ACC), CTP synthase [4,27,28] (CTPS), and RIP1/RIP3 kinases [29] (involved in programmed necrosis). The growing body of knowledge regarding the mechanisms of these enzymes allows for a comparison of mechanistic details with those of SgrAI. First, SgrAI is activated from a low activity state to one that is 200–1000 fold more active in the run-on oligomer; IRE1, ACC and RIP1/RIP3 are also activated in their oligomeric/filamentous forms, while CTPS is inhibited. The actual degree of activation can vary in those activated by oligomerization, being 200–1000 fold for SgrAI, 60 fold for ACC [3], and over 100,000 fold in the case of the RNase activity of IRE1 [2] (that for RIP1/RIP3 has not been quantitated). Filament formation is stimulated by binding of substrate only in the case of SgrAI, while IRE1, ACC, RIP1/RIP3, and CTPS form filaments in response to binding to activators (unfolded proteins in the case of IRE1, and the allosteric effector citrate in the case of ACC), products (CTP, which is the product of CTPS), or phosphorylation (RIP1/RIP3). Phosphorylation also appears to be involved in further oligomerization of IRE1, once initial oligomerization has begun [2,7]. In addition, the substrates of CTPS can induce its depolymerization to produce the active form of the enzyme [27]. Modulation of substrate specificity upon run-on oligomerization or filament formation also occurs in the SgrAI system. Only IRE1 appears to display a similar property, as it cleaves other RNA molecules in addition to its target mRNA when in its longest filaments [7].
Some apparently unique features in the SgrAI system include the fact that it binds tightly to both types of sites, primary and secondary, in its low activity form but only cleaves primary in that state (secondary site DNA is cleaved significantly by SgrAI only in the run-on oligomer). In addition, both the substrate (uncleaved primary) and product (cleaved primary DNA) stabilize the run-on oligomeric form of SgrAI. Another characteristic that differentiates these systems is the potential for rapid association and dissociation kinetics, which are likely in all cases except the RIP1/RIP3 system, which appears to form an irreversible amyloid [29]. Finally, the proposed biological role for run-on oligomer/filament formation includes rapid activation (or deactivation in the case of CTPS) in all cases, and increased substrate binding (through a larger interface in the case of IRE1 and also possibly RIP1/RIP3). In the case of CTPS, the filament has been proposed to be a further mechanism of fine tuning the enzyme’s function to the environmental concentrations of substrate and product by creating a readily activatable pool of inactive enzymes [27]. Only the run-on oligomer formed by SgrAI has the proposed role of sequestration of activated enzymes [5,10]. We postulate that this mechanism evolved due to the relatively long genome of Streptomyces griseus, from which SgrAI derives; the longer genome would result in a greater potential number of DNA cleavage sites, which must be protected by the cognate methyltransferase to prevent damage to the host DNA by the SgrAI endonuclease. If the activity of the methyltransferase is limited, perhaps due to cofactor availability, the system could respond evolutionarily by reducing the activity of the endonuclease. The unusually long recognition sequence of SgrAI (8 bp vs. 4–6 bp), and low DNA cleavage activity in the absence of activation both reduce SgrAI activity, but also concurrently diminish the enzyme’s effectiveness against invading phage DNA. However, the ability of SgrAI to both activate its cleavage activity more than 200 fold, and also expand its sequence specificity (from 3 to 17 different sequences) in the presence of unmethylated primary sites (e.g. as expected in phage DNA) recovers its activity against phage. Run-on oligomer formation may have evolved to sequester activated SgrAI on the phage and away from the host DNA, since cleavage at presumably unmethylated secondary sites in the host DNA would be damaging to the host. In addition, the run-on oligomer could also serve to rapidly activate many SgrAI by assimilating a potentially unlimited number of DNA bound SgrAI into a growing oligomer and/or to block replication and transcription of the phage DNA. In conclusion, SgrAI appears to be a member of a small (but perhaps growing [30–34]) group of enzymes known to be modulated by the formation of run-on oligomers or filaments. Yet, SgrAI maintains several unique characteristics as well. Further work will be needed to determine how generalizable the mechanisms of SgrAI and the other similarly behaving enzymes are in nature.


Calculating Concentrations for PCR

i) Oligonucleotide primers are generally supplied as "so many OD units/ml" - but what does this mean, in terms of mg/ml, or mmol/ml, etc?

Given: a primer is Y nucleotides (nt) long

Given: the MW of ssDNA is (330 daltons per nt) x (length in nt) (Sambrook et al., 1989 p. C.1)

Given: the concentration of primer (=ssDNA) producing an OD of 1 at 254 nm in a 1 cm cuvette, is 37 ug/ml

Then: the MW of the primer is 330 × Y daltons

And: X OD/ml = 37 × X ug/ml = 37 × X mg/l = (37 × X)/(330 × Y) mM = (37 × X × 1000)/(330 × Y) uM

For example:

B 88/77 primer - a 17-mer oligodeoxynucleotide - as supplied is 12.6 OD units/ml. We need to make a 5 uM stock solution for PCR.

MW: 17 x 330 = 5610

Concentration: 12.6 OD x 37 ug/ml = 466 ug/ml = 466 mg/l = 0.466 g/l

Molarity: 0.466/5610 = 0.000083 Molar = 83 uM

Therefore: we need 5 ul of oligo stock solution made up to 83 ul (i.e. plus 78 ul water) to make a 5 uM solution (since 1 ul in 83 ul gives a 1 uM solution).

ii) Calculation of amounts for PCR reactions: if we need a final concentration of 0.5 uM oligo in the PCR reaction mix (final volume 50 ul), we add 5 ul of 5 uM stock to the reaction mix (1/10 final dilution).
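
The arithmetic above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the constants (37 ug/ml per OD unit, 330 daltons per nucleotide) are the rule-of-thumb values quoted in the text, not measured values for any particular primer.

```python
def oligo_micromolar(od_per_ml, length_nt, ug_per_od=37.0, da_per_nt=330.0):
    """Convert OD units/ml to uM using the rule-of-thumb constants from the text."""
    mg_per_l = od_per_ml * ug_per_od      # ug/ml is numerically equal to mg/l
    mw = da_per_nt * length_nt            # approximate MW of the ssDNA primer
    return mg_per_l / mw * 1000.0         # mg/l divided by g/mol gives mM; x1000 gives uM


def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock volume, diluent volume) for a simple C1*V1 = C2*V2 dilution."""
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# The 17-mer example: 12.6 OD units/ml as supplied.
stock_uM = oligo_micromolar(12.6, 17)
print(f"stock is about {stock_uM:.0f} uM")

# 5 uM working stock from the (rounded) 83 uM stock, made up to 83 ul.
print(dilution_volumes(83, 5, 83))

# 0.5 uM final in a 50 ul PCR from the 5 uM working stock.
print(dilution_volumes(5, 0.5, 50))
```

Both dilution steps use the same function because each is a plain C1V1 = C2V2 calculation; only the units (uM and ul) need to be kept consistent.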

B) Nucleotides:

Stocks of nucleotides for PCR (or other procedures) are NEARLY ALWAYS dNTPs (deoxynucleotides), and the concentration is almost always given for EACH dNTP: that is, the given concentration is that of EACH nucleotide in the mix, NOT the total concentration. This means that a 2.5 mM dNTP mix for PCR contains 2.5 mM of EACH dNTP, and 10 mM TOTAL dNTPs.

Example:

i) Make up a 2.5 mM stock solution of dNTPs from stock 100 mM individual dNTPs, supplied by Promega:

  • FIRST mix equal volumes of each nucleotide (eg: 50 ul): this gives you 200 ul of 25 mM mixed dNTPs (Remember: concn. expressed in EACH dNTP).
  • THEN dilute this (or aliquot) 1/10 with WATER - aliquot into 100 ul amounts and freeze.

ii) Prepare a 1 mM stock of dNTPs with dTTP substituted to 10% (w/w) by digoxigenin-11-dUTP (DIG-dUTP) for use as a labelling mix for PCR labelling of PCR products:

  • DIG-dUTP is supplied (by Boehringer Mannheim) at 25 nmol/25 ul = 1 umol/ml = 1 mM. The final concentration of DIG-dUTP must be 1/10th that of the other nucleotides, and [DIG-dUTP] + [dTTP] must = [any other dNTP]. Therefore to get a 1 mM dNTP stock one must dilute the DIG-dUTP stock 1/10.
  • FIRST dilute separate 100 mM dNTP stocks to 10 mM (eg. 5 ul to 50 ul, in water).
  • THEN mix equal volumes (eg. 10 ul) of 10 mM dCTP, dGTP and dATP stock, and 9/10ths volume of dTTP (9 ul). Add an equal volume (eg. 10 ul) of 1 mM DIG-dUTP.
  • THEN add water to 10 vol (= 100 ul; add 51 ul): final concentration of each dNTP = 1 mM, final concn of DIG-dUTP = 0.1 mM, and of dTTP = 0.9 mM.
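
As a quick check on the two recipes above, the final concentration of each component is just its stock concentration times its volume, divided by the final volume. The sketch below assumes nothing beyond the volumes and concentrations stated in the text; the function and variable names are hypothetical.

```python
def final_concentrations(components, final_volume_ul):
    """components maps name -> (stock concentration in mM, volume in ul).

    Returns name -> final concentration in mM after making up to final_volume_ul.
    """
    return {
        name: conc_mM * vol_ul / final_volume_ul
        for name, (conc_mM, vol_ul) in components.items()
    }


def water_to_add(components, final_volume_ul):
    """Volume of water needed to reach the final volume."""
    return final_volume_ul - sum(vol for _, vol in components.values())


# Recipe i): equal volumes of four 100 mM stocks, no water added yet.
plain_mix = {n: (100, 50) for n in ("dATP", "dCTP", "dGTP", "dTTP")}
print(final_concentrations(plain_mix, 200))

# Recipe ii): DIG labelling mix made up to 100 ul.
dig_mix = {
    "dATP": (10, 10),
    "dCTP": (10, 10),
    "dGTP": (10, 10),
    "dTTP": (10, 9),
    "DIG-dUTP": (1, 10),
}
print(water_to_add(dig_mix, 100))
print(final_concentrations(dig_mix, 100))
```

Note that dTTP and DIG-dUTP together come to the same concentration as each of the other dNTPs, which is the constraint the recipe is built around.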

iii) USE mix made above at 50 uM each dNTP in a PCR reaction mix, final volume 25ul:


This is a division of application Ser. No. 08/383,766 filed Feb. 2, 1995, which is a continuation in part of application Ser. No. 08/180,863 filed Jan. 13, 1994, now abandoned, which is a continuation in part of application Ser. No. 08/092,863 filed Jul. 16, 1993, now abandoned.

1. A pre-encoded substrate useful for being linked to a single oligomer structure whereby said pre-encoded substrate bears a unique identifier, said tag being a pre-encoded microchip identifying said oligomer structure.

2. An encodable substrate useful for being linked to a single oligomer structure whereby said encodable substrate has a unique identifier, said tag being a encodable microchip identifying said oligomer structure.

3. One or a plurality of the pre-encoded substrate of claim 1 useful in a method of preparing a labeled synthetic oligomer library comprising a plurality of different members, each member comprising a distinct pre-encoded substrate whereby said one or plurality of pre-encoded substrate is linked to a oligomer structure said method comprising the steps of:

a) apportioning said pre-encoded substrates among a plurality of reaction vessels

b) exposing said pre-encoded substrates in each reaction vessel to one or a plurality of transformation events

c) detecting and recording identifier information for each of said identifier tags in each of said reaction vessels

d) apportioning said pre-encoded substrates among a plurality of reaction vessels and

e) repeating steps a) through c) from at least one to about twenty times wherein said preencoded substrate being a pre-encoded microchip.

4. One or a plurality of the encodable substrate of claim 2 useful in a method of preparing a labeled synthetic oligomer library comprising a plurality of different members, each member comprising a distinct encodable substrate whereby said one or plurality of pre-encoded substrate a oligomer structure is linked to a oligomer structure said method comprising the steps of:

a) apportioning said encodable substrates among a plurality of reaction vessels

b) exposing said encodable substrates in each reaction vessels to one or a plurality of transformation events

c) adding first identifier information to said encodable substrates

d) apportioning said encodable substrates among a plurality of reaction vessels and

e) repeating steps a) through c) from at least one to about twenty times wherein said encodable substrate being a encodable microchip.

5. One or a plurality of the pre-encoded substrate of claim 1 useful in a method of preparing a labeled synthetic oligomer library comprising a plurality of different members, each member comprising a distinct pre-encoded substrate whereby said one or plurality of pre-encoded substrate is linked to a oligomer structure said method comprising the steps of:

a) apportioning said pre-encoded substrates among a plurality of reaction vessels

b) exposing said pre-encoded substrates in each reaction vessel to one or a plurality of monomers

c) detecting and recording identifier information for each of said identifier tags in each of said reaction vessels

d) apportioning said pre-encoded substrates among a plurality of reaction vessels and

e) repeating steps a) through c) from at least one to about twenty times wherein said pre-encoded substrate being a pre-encoded microchip.

6. One or a plurality of the encodable substrate of claim 2 useful in a method of preparing a labeled synthetic oligomer library comprising a plurality of different members, each member comprising a distinct encodable substrate whereby said one or plurality of pre-encoded substrate a oligomer structure is linked to a oligomer structure said method comprising the steps of:

a) apportioning said encodable substrates among a plurality of reaction vessels

b) exposing said encodable substrates in each reaction vessels to one or a plurality of monomers

c) adding first identifier information to said encodable substrates

d) apportioning said encodable substrates among a plurality of reaction vessels and

e) repeating steps a) through c) from at least one to about twenty times wherein said encodable substrate being a encodable microchip.

CROSS REFERENCE TO RELATED APPLICATIONS

BACKGROUND OF THE INVENTION

BRIEF DESCRIPTION OF THE FIGURES

DETAILED DESCRIPTION OF THE INVENTION

I. Labeled Oligomer Libraries

II. Methods for Generating Labeled Oligomer Libraries

III. Identifying the Sequence of Any Oligomer

IV. Types of Identifier Tags

V. Linking the Oligomers to the Identifier Tags

VI. Encoding the Identifier Tag Information

VII. Recovering and Decoding the Identifier Tag Information

VIII. Screening Receptors with Labeled Synthetic Oligomer Libraries

EXAMPLE I. SYNTHESIS OF ONE-HUNDRED AMIDES

EXAMPLE II. SYNTHESIS ON ELAMS™ OF FOUR PENTAPEPTIDES

A. Derivatization of ELAMS™

B. Preparation of Boc-Gly-L-Phe-L-Leu-OH

C. Preparation of Gly-L-Phe-L-Leu ELAMS™

D. Preparation of Gly-Gly-L-Phe-L-Leu (SEQ ID NO: 5) ELAMS™

E. Preparation of L-Pro-Gly-L-Phe-L-Leu (SEQ ID NO: 6) ELAMS™

F. Preparation of Tyr-Gly-Gly-L-Phe-L-Leu (SEQ ID NO: 1) and Tyr-Pro-Gly-L-Phe-L-Leu (SEQ ID NO: 2) ELAMS™

G. Preparation of Pro-L-Pro-Gly-L-Phe-L-Leu (SEQ ID NO: 3) and Pro-Gly-Gly-L-Phe-L-Leu (SEQ ID NO: 4) ELAMS™

H. Selection of ELAMS™ Containing Peptide Ligands for Monoclonal Antibody 3E7

EXAMPLE III. PARALLEL SYNTHESIS OF PEPTIDES ON ELAMS™

A. Derivatizing Amino ELAMS™ with a Linker

B. Parallel Synthesis of Peptides

EXAMPLE IV. PARALLEL SYNTHESIS OF OLIGONUCLEOTIDE OCTAMERS

A. Preparation of Hydroxyl ELAMS™

C. Attachment of Synthesis Linker

D. Preparation of Fluoresceinylated Probe

E. Parallel Synthesis of Octanucleotides

EXAMPLE V. SEQUENCE SPECIFIC TARGET HYBRIDIZATION

The present invention relates to labeled combinatorial synthesis libraries and methods and apparatus for labeling individual library members of a combinatorial synthesis library with unique identification tags that facilitate elucidation of the structures of the individual library members synthesized.

BACKGROUND OF THE INVENTION

The relationship between structure and function of molecules is a fundamental issue in the study of biological systems. Structure-function relationships are important in understanding, for example, the function of enzymes, cellular communication, and cellular control and feedback mechanisms. Certain macromolecules are known to interact and bind to other molecules having a specific three-dimensional spatial and electronic distribution. Any macromolecule having such specificity can be considered a receptor, whether the macromolecule is an enzyme, a protein, a glycoprotein, an antibody, an oligonucleotide sequence of DNA, RNA or the like. The various molecules that receptors bind are known as ligands.

Pharmaceutical drug discovery is one type of research that relies on the study of structure-function relationships.

Most contemporary drug discovery involves discovering novel ligands with desirable patterns of specificity for biologically important receptors. Thus, the time necessary to bring new drugs to market could be greatly reduced by the discovery of novel methods which allow rapid screening of large numbers of potential ligands.

Since the introduction of solid phase synthesis methods for peptides and polynucleotides new methods employing solid phase strategies have been developed that are capable of generating thousands, and in some cases even millions, of individual peptide or nucleic acid polymers using automated or manual techniques. These synthesis strategies, which generate families or libraries of compounds, are generally referred to as "combinatorial chemistry" or "combinatorial synthesis" strategies.

Combinatorial chemistry strategies can be a powerful tool for rapidly elucidating novel ligands to receptors of interest. These methods show particular promise for identifying new therapeutics. See generally, Gordon et al., "Applications of Combinatorial Technologies to Drug Discovery: II. Combinatorial Organic Synthesis, Library Screening Strategies, and Future Directions," J. Med. Chem 37: 1385-401 (1994) and Gallop et al., "Applications of Combinatorial Technologies to Drug Discovery: I. Background and Peptide Combinatorial Libraries," J. Med. Chem 37: 1233-51 (1994). For example, combinatorial libraries have been used to identify nucleic acid aptamers, Latham et al., "The Application of a Modified Nucleotide in Aptamer Selection: Novel Thrombin Aptamers Containing 5-(1-Pentynyl)-2'-Deoxy Uridine," Nucl. Acids Res. 22: 2817-2822 (1994); to identify RNA ligands to reverse transcriptase, Chen & Gold, "Selection of High-Affinity RNA Ligands to Reverse Transcriptase: Inhibition of cDNA Synthesis and RNase H Activity," Biochemistry 33: 8746-56 (1994); and to identify catalytic antibodies specific to a particular reaction transition state, Posner et al., "Catalytic Antibodies: Perusing Combinatorial Libraries," Trends. Biochem. Sci. 19: 145-50 (1994).

The diversity of libraries generated using combinatorial strategies is impressive. For example, these methods have been used to generate a library containing four trillion decapeptides, Pinilla et al., "Investigation of Antigen-Antibody Interactions Using a Soluble, Non-Support-Bound Synthetic Decapeptide Library Composed of Four Trillion (4×10^12) Sequences," Biochem. J. 301: 847-53 (1994); 1,4-benzodiazepine libraries, Bunin et al., "The Combinatorial Synthesis and Chemical and Biological Evaluation of a 1,4-Benzodiazepine Library," Proc. Natl. Acad. Sci. 91: 4708-12 (1994) and U.S. Pat. No. 5,288,514, entitled "Solid Phase and Combinatorial Synthesis of Benzodiazepine Compounds on a Solid Support," issued Feb. 22, 1994; libraries containing multiple small ligands tied together in the same molecules, Wallace et al., "A Multimeric Synthetic Peptide Combinatorial Library," Pept. Res. 7: 27-31 (1994); libraries of small organics, Chen et al., "`Analogous` Organic Synthesis of Compound Libraries: Validation of Combinatorial Chemistry in Small-Molecule Synthesis," J. Am. Chem. Soc. 116: 2661-2662 (1994); libraries of peptidosteroidal receptors, Boyce & Nestler, "Peptidosteroidal Receptors for Opioid Peptides: Sequence-Selective Binding Using a Synthetic Receptor Library," J. Am. Chem. Soc. 116: 7955-7956 (1994); and peptide libraries containing non-natural amino acids, Kerr et al., "Encoded Combinatorial Peptide Libraries Containing Non-Natural Amino Acids," J. Am. Chem. Soc. 115: 2529-31 (1993).

To date, three general strategies for generating combinatorial libraries have emerged: "spatially-addressable," "split-bead" and recombinant strategies. These methods differ in one or more of the following aspects: reaction vessel design, polymer type and composition, control of physical constants such as time, temperature and atmosphere, isolation of products, solid-phase or solution-phase methods of assay, simple or complex mixtures, and method for elucidating the structure of the individual library members.

Of these general strategies, several sub-strategies have been developed. One spatially-addressable strategy that has emerged involves the generation of peptide libraries on immobilized pins that fit the dimensions of standard microtitre plates. See PCT Publication Nos. 91/17271 and 91/19818, each of which is incorporated herein by reference. This method has been used to identify peptides which mimic discontinuous epitopes, Geysen et al., BioMed. Chem. Lett. 3: 391-404 (1993), and to generate benzodiazepine libraries, U.S. Pat. No. 5,288,514, entitled "Solid Phase and Combinatorial Synthesis of Benzodiazepine Compounds on a Solid Support," issued Feb. 22, 1994 and Bunin et al., "The Combinatorial Synthesis and Chemical and Biological Evaluation of a 1,4-Benzodiazepine Library," Proc. Natl. Acad. Sci. 91: 4708-12 (1994). The structures of the individual library members can be decoded by analyzing the pin location in conjunction with the sequence of reaction steps used during the synthesis.

A second, related spatially-addressable strategy that has emerged involves solid-phase synthesis of polymers in individual reaction vessels, where the individual vessels are arranged into a single reaction unit. An illustrative example of such a reaction unit is a standard 96-well microtitre plate; the entire plate comprises the reaction unit and each well corresponds to a single reaction vessel. This approach is an extrapolation of traditional single-column solid-phase synthesis.

As is exemplified by the 96-well plate reaction unit, each reaction vessel is spatially defined by a two-dimensional matrix. Thus, the structures of individual library members can be decoded by analyzing the sequence of reactions to which each well was subjected.

Another spatially-addressable strategy employs "tea bags" to hold the synthesis resin. The reaction sequence to which each tea bag is subject is recorded, which determines the structure of the oligomer synthesized in each tea bag. See for example, Lam et al., "A New Type of Synthetic Peptide Library for Identifying Ligand-Binding Activity," Nature 354: 82-84 (1991); Houghten et al., "Generation and Use of Synthetic Peptide Combinatorial Libraries for Basic Research and Drug Discovery," Nature 354: 84-86 (1991); Houghten, "General Method for the Rapid Solid-Phase Synthesis of Large Numbers of Peptides: Specificity of Antigen-Antibody Interaction at the Level of Individual Amino Acids," Proc. Natl. Acad. Sci. 82: 5131-5135 (1985); and Jung et al., Angew. Chem. Int. Ed. Engl. 31: 367-383 (1992), each of which is incorporated herein by reference.

In another recent development, scientists combined the techniques of photolithography, chemistry and biology to create large collections of oligomers and other compounds on the surface of a substrate (this method is called "VLSIPS™"). See, for example, U.S. Pat. No. 5,143,854; PCT Publication No. 90/15070; PCT Publication No. 92/10092 entitled "Very Large Scale Immobilized Polymer Synthesis," Jun. 25, 1992; Fodor et al., "Light-Directed Spatially Addressable Parallel Chemical Synthesis," Science 251: 767-773 (1991); Pease et al., "Light-Directed Oligonucleotide Arrays for Rapid DNA Sequence Analysis," Proc. Natl. Acad. Sci. 91: 5022-5026 (1994); and Jacobs & Fodor, "Combinatorial Chemistry: Applications of Light-Directed Chemical Synthesis," Trends. Biotechnology 12(1): 19-26 (1994), each of which is incorporated herein by reference.

Others have developed recombinant methods for preparing collections of oligomers. See, for example, PCT Publication No. 91/17271; PCT Publication No. 91/19818; Scott, "Discovering Peptide Ligands Using Epitope Libraries," TIBS 17: 241-245 (1992); Cwirla et al., "Peptides on Phage: A Vast Library of Peptides for Identifying Ligands," Proc. Natl. Acad. Sci. 87: 6378-6382 (1990); Devlin et al., "Random Peptide Libraries: A Source of Specific Protein Binding Molecules," Science 249: 404-406 (1990); and Scott & Smith, "Searching for Peptide Ligands with an Epitope Library," Science 249: 386-390 (1990). Using these methods, one can identify each oligomer in the library by determining the coding sequences in the recombinant organism or phage. However, since the library members are generated in vivo, recombinant methods are limited to polymers whose synthesis is mediated in the cell. Thus, these methods typically have been restricted to constructing peptide libraries.

A third general strategy that has emerged involves the use of "split-bead" combinatorial synthesis strategies. See, for example, Furka et al., Int. J. Pept. Protein Res. 37: 487-493 (1991), which is incorporated herein by reference. In this method synthesis supports are apportioned into aliquots, each aliquot exposed to a monomer, and the beads pooled. The beads are then mixed, reapportioned into aliquots, and exposed to a second monomer. The process is repeated until the desired library is generated.
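
The split, couple and pool cycle described above can be illustrated with a small simulation. This is a toy sketch under simplifying assumptions (one aliquot per monomer, ideal coupling, beads represented as strings); the function name and parameters are hypothetical and do not come from the cited reference.

```python
import random


def split_and_pool(n_beads, monomers, cycles, seed=0):
    """Simulate split-bead synthesis; returns the sequence carried by each bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads                      # every bead starts with no monomers
    for _ in range(cycles):
        rng.shuffle(beads)                      # mix the pooled beads
        # Apportion the beads into one aliquot per monomer.
        aliquots = [beads[i::len(monomers)] for i in range(len(monomers))]
        # Expose each aliquot to its monomer, then pool everything again.
        beads = [seq + m for m, aliquot in zip(monomers, aliquots) for seq in aliquot]
    return beads


library = split_and_pool(n_beads=1000, monomers="ACGT", cycles=3)
print(len(library), "beads carrying", len(set(library)), "distinct sequences")
```

Each bead carries a single sequence, but nothing in the procedure records which bead went through which aliquot, which is why the structure has to be determined afterwards by analysis or by a tag.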

Since the polymer libraries generated with the split-bead method are not spatially-addressable, the structures of the individual library members cannot be elucidated by analyzing the reaction histogram. Rather, structures must be determined by analyzing the polymers directly. Thus, one limitation of the split-bead approach is the requisite for an available means to analyze the polymer composition. While sequencing techniques are available for peptides and nucleic acids, sequencing reactions for polymers of other composition, such as for example carbohydrates, organics, peptide nucleic acids or mixed polymers may not be readily known.

Variations on the "split-bead" scheme have emerged that obviate the need to sequence the library member directly. These methods utilize chemicals to tag the growing polymers with a unique identification tag ("co-synthesis" strategies). See, for example, PCT Publication No. WO 94/08051 entitled "Complex Combinatorial Chemical Libraries Encoded with Tags," Apr. 14, 1994; Nestler et al., "A General Method for Molecular Tagging of Encoded Combinatorial Chemistry Libraries," J. Org. Chem. 59: 4723-4724 (1994); PCT Publication No. WO 93/06121 entitled "Method of Synthesizing Diverse Collections of Oligomers," Apr. 1, 1993; Needels et al., Proc. Natl. Acad. Sci. 90: 10700-10704 (1993); Kerr et al., "Encoded Combinatorial Peptide Libraries Containing Non-Natural Amino Acids," J. Amer. Chem. Soc. 115: 2529-2531 (1993); and Brenner & Lerner, "Encoded Combinatorial Chemistry," Proc. Natl. Acad. Sci. 89: 5381-5383 (1992), each of which is incorporated herein by reference.

Encoding library members with chemical tags occurs in such a fashion that unique identifiers of the chemical structures of the individual library members are constructed in parallel, or are co-synthesized, with the library members. Typically, in a linear three component synthesis containing building blocks A, B and C in the process of generating library member ABC, an encoding tag is introduced at each stage such that the tags TA, TB and TC would encode for individual inputs in the library. The synthesis would proceed as follows: (a) Chemical A is coupled onto a synthesis bead, immediately followed by coupling tag TA to the bead; (b) The bead is subject to deprotection conditions which remove the protecting group selectively from A, leaving TA protected. Chemical B is coupled to the bead, generating the sequence AB. The bead is then subject to deprotection which selectively removes the protecting group from TA, and TB is coupled to the bead, generating tag sequence TA TB; (c) The third component C and concomitant tag TC is added to the bead in the manner described above, generating library sequence ABC and tag sequence TA TB TC.
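
The bookkeeping behind this co-synthesis scheme can be sketched as follows. The sketch models only the pairing of building blocks with tags, not the protection and deprotection chemistry; the function names and the dictionary layout are hypothetical.

```python
def co_synthesize(building_blocks, tag_for):
    """Couple each building block, then its tag, one cycle at a time."""
    member, tag_sequence = [], []
    for block in building_blocks:
        member.append(block)                  # e.g. couple chemical A to the bead
        tag_sequence.append(tag_for[block])   # then couple the matching tag, e.g. TA
    return "".join(member), tag_sequence


def decode(tag_sequence, tag_for):
    """Recover the library member's structure from the tag sequence alone."""
    block_for = {tag: block for block, tag in tag_for.items()}
    return "".join(block_for[tag] for tag in tag_sequence)


tag_for = {"A": "TA", "B": "TB", "C": "TC"}
member, tags = co_synthesize("ABC", tag_for)
print(member, tags, decode(tags, tag_for))
```

Decoding works only because the mapping from building block to tag is one-to-one, which corresponds to the requirement that each chemical input has its own unique tag.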

For large libraries containing three chemical inputs, the chemical tagging sequence is the same. Thus, to generate a large library containing the complete set of three-input, one hundred unit length polymers, or 100^3 = 10^6 library members, unique identifying tags are introduced such that there is a unique identifier tag for each different chemical structure. Theoretically, this method is applicable to libraries of any complexity as long as tagging sequences can be developed that have at least the same number of identification tags as there are numbers of unique chemical structures in the library.

While combinatorial synthesis strategies provide a powerful means for rapidly identifying target molecules, substantial problems remain. For example, since members of spatially addressable libraries must be synthesized in spatially segregated arrays, only relatively small libraries can be constructed. The position of each reaction vessel in a spatially-addressable library is defined by an XY coordinate pair such that the entire library is defined by a two-dimensional matrix. As the size of the library increases, the dimensions of the two-dimensional matrix increase. In addition, as the number of different transformation events used to construct the library increases linearly, the library size increases exponentially. Thus, while generating the complete set of linear tetramers comprised of four different inputs requires only a 16×16 matrix (4^4 = 256 library members), generating the complete set of linear octamers composed of four different inputs requires a 256×256 matrix (4^8 = 65,536 library members), and generating the complete set of linear tetramers composed of twenty different inputs requires a 400×400 matrix (20^4 = 160,000 library members). Therefore, not only does the physical size of the library matrix quickly become unwieldy (constructing the complete set of linear tetramers composed of twenty different inputs using spatially-addressable techniques requires 1667 microtitre plates), delivering reagents to each reaction vessel in the matrix requires either tedious, time-consuming manual manipulations, or complex, expensive automated equipment.
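
The figures quoted above follow from simple counting, which the short sketch below reproduces. It assumes a square matrix and standard 96-well plates; the function names are hypothetical.

```python
import math


def library_size(n_inputs, length):
    """Number of distinct linear oligomers of a given length."""
    return n_inputs ** length


def matrix_side(n_members):
    """Side of the smallest square matrix that holds every library member."""
    side = math.isqrt(n_members)
    return side if side * side == n_members else side + 1


def plates_needed(n_members, wells_per_plate=96):
    """Number of microtitre plates needed at one member per well."""
    return math.ceil(n_members / wells_per_plate)


for n_inputs, length in [(4, 4), (4, 8), (20, 4)]:
    size = library_size(n_inputs, length)
    print(f"{n_inputs}^{length} = {size}: "
          f"{matrix_side(size)}x{matrix_side(size)} matrix, "
          f"{plates_needed(size)} plates")
```

The last case gives 160,000 members, and 160,000 / 96 rounds up to the 1667 plates mentioned in the text.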

While the VLSIPS™ method attempts to overcome this limitation through miniaturization, VLSIPS™ requires specialized photoblocking chemistry, expensive, specialized synthesis equipment and expensive, specialized assay equipment. Thus, the VLSIPS™ method is not readily and economically adaptable to emerging solid phase chemistries and assay methodologies.

Split bead methods also suffer severe limitations. Although large libraries can theoretically be constructed using split-bead methods, the identity of library members displaying a desirable property must be determined by analytical chemistry. Accordingly, split-bead methods can only be employed to synthesize compounds that can be readily elucidated by microscale sequencing, such as polypeptides and polynucleotides.

Co-synthesis strategies have attempted to solve this structure elucidation problem. However, these methods also suffer limitations. For example, the tagging structures may be incompatible with synthetic organic chemistry reagents and conditions. Additional limitations follow from the necessity for compatible protecting groups which allow the alternating co-synthesis of tag and library member, and assay confusion that may arise from the tags selectively binding to the assay receptor.

Finally, since methods such as the preceding typically require the addition of like moieties, there is substantial interest in discovering methods for producing labeled libraries of compounds which are not limited to sequential addition of like moieties, and which are amenable to any chemistries now known or that will be later developed to generate chemical libraries. Such methods would find application, for example, in the modification of steroids, sugars, co-enzymes, enzyme inhibitors, ligands and the like, which frequently involve a multi-stage synthesis in which one would wish to vary the reagents and/or conditions to provide a variety of compounds.

In such methods the reagents may be organic or inorganic reagents, where functionalities or side groups may be introduced, removed or modified, rings opened or closed, stereochemistry changed, and the like.

From the above, one can recognize that there is substantial interest in developing improved methods and apparatus for the synthesis of complex labeled combinatorial chemical libraries which readily permit the construction of libraries of virtually any composition and which readily permit accurate structural determination of individual compounds within the library that are identified as being of interest. Many of the disadvantages of the previously-described methods, as well as many of the needs not met by them, are addressed by the present invention, which, as described more fully hereinafter, provides myriad advantages over these previously-described methods.

The following terms are intended to have the following general meanings as they are used herein:

Labeled Synthetic Oligomer Library: A "labeled synthetic oligomer library" is a collection of random synthetic oligomers wherein each member of such a library is labeled with a unique identifier tag from which the structure or sequence of each oligomer can be deduced.

Identifier Tag: An "identifier tag" is any detectable attribute that provides a means whereby one can elucidate the structure of an individual oligomer in a labeled synthetic oligomer library. Thus, an identifier tag identifies which transformation events an individual oligomer has experienced in the synthesis of a labeled synthetic oligomer library, and at which reaction cycle in a series of synthesis cycles each transformation event was experienced.

An identifier tag may be any detectable feature, including, for example: a differential absorbance or emission of light; magnetic or electronically pre-encoded information; or any other distinctive mark with the required information. An identifier tag may be pre-encoded with unique identifier information prior to synthesis of a labeled synthetic oligomer library, or may be encoded with identifier information concomitantly with synthesis of a labeled synthetic oligomer library.

In this latter embodiment, the identifier information added at each synthesis cycle is preferably added in a sequential fashion, such as, for example, digital information, with the identifier information identifying the transformation event of synthesis cycle two being appended onto the identifier information identifying the transformation event of synthesis cycle one, and so forth.
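The sequential appending described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EncodableTag` class, the four events A to D, and the 2-bit code per cycle are hypothetical and are not taken from the disclosure, which does not specify a data format.

```python
# Hypothetical model of an encodable identifier tag.
# Assumption: four possible transformation events, each stored as a 2-bit code.
EVENT_CODES = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}
CODE_TO_EVENT = {code: event for event, code in EVENT_CODES.items()}
BITS_PER_CYCLE = 2


class EncodableTag:
    def __init__(self):
        self.bits = 0      # digital information accumulated so far
        self.cycles = 0    # number of synthesis cycles recorded

    def append(self, event):
        """Append this cycle's code after the codes of all earlier cycles."""
        self.bits = (self.bits << BITS_PER_CYCLE) | EVENT_CODES[event]
        self.cycles += 1

    def history(self):
        """Read the events back in the order written (cycle one first)."""
        events = []
        for cycle in range(self.cycles):
            shift = BITS_PER_CYCLE * (self.cycles - 1 - cycle)
            events.append(CODE_TO_EVENT[(self.bits >> shift) & 0b11])
        return events


tag = EncodableTag()
for event in ["C", "A", "D"]:   # events experienced in cycles one, two, three
    tag.append(event)

print(tag.history())  # ['C', 'A', 'D']
```

Because each cycle's code is appended rather than overwritten, the stored information records both which event occurred and at which cycle it occurred.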

Preferably, an identifier tag is impervious to the reaction conditions used to construct the labeled synthetic oligomer library.

A preferred example of an identifier tag is a microchip that is pre-encoded or encodable with information, which information is related back to a detector when the microchip is pulsed with electromagnetic radiation.

Pre-Encoded Identifier Tag: A "pre-encoded identifier tag" is an identifier tag that is pre-encoded with unique identifier information prior to synthesis of a labeled synthetic oligomer library. A preferred example of such a pre-encoded identifier tag is a microchip that is pre-encoded with information, which information is related back to a detector when the microchip is pulsed with electromagnetic radiation.

Encodable Identifier Tag: An "encodable identifier tag" is an identifier tag that is capable of receiving identifier information from time to time. An encodable identifier tag may or may not be pre-encoded with partial or complete identifier information prior to synthesis of a labeled synthetic oligomer library. A preferred example of such an encodable identifier tag is a microchip that is capable of receiving and storing information from time to time, which information is related back to a detector when the microchip is pulsed with electromagnetic radiation.

Transformation Event: As used herein, a "transformation event" is any event that results in a change of chemical structure of an oligomer or polymer. A "transformation event" may be mediated by physical, chemical, enzymatic, biological or other means, or a combination of means, including but not limited to: photo, chemical, enzymatic or biologically mediated isomerization or cleavage; photo, chemical, enzymatic or biologically mediated side group or functional group addition, removal or modification; changes in temperature; changes in pressure; and the like. Thus, "transformation event" includes, but is not limited to: events that result in an increase in molecular weight of an oligomer or polymer, such as, for example, addition of one or a plurality of monomers, addition of solvent or gas, or coordination of metal or other inorganic substrates such as, for example, zeolites; events that result in a decrease in molecular weight of an oligomer or polymer, such as, for example, dehydrogenation of an alcohol to form an alkene or enzymatic hydrolysis of an ester or amide; events that result in no net change in molecular weight of an oligomer or polymer, such as, for example, stereochemistry changes at one or a plurality of chiral centers, Claisen rearrangement, or Cope rearrangement; and other events as will become apparent to those skilled in the art upon review of this disclosure. See, for example, application Ser. No. 08/180,863 filed Jan. 13, 1994, which is assigned to the assignee of the present invention and PCT Publication WO 94/08051 entitled "Complex Combinatorial Libraries Encoded with Tags," Apr. 14 (1994), each of which is incorporated herein by reference.

Monomer: As used herein, a "monomer" has the same meaning as defined below.

Oligomer or Polymer: As used herein, an "oligomer" or "polymer" is any chemical structure that can be synthesized using the combinatorial library methods of this invention, including, for example: amides, esters, thioethers, ketones, ethers, sulfoxides, sulfonamides, sulfones, phosphates, alcohols, aldehydes, alkenes, alkynes, aromatics, polyaromatics, heterocyclic compounds containing one or more of the atoms of: nitrogen, sulfur, oxygen, and phosphorus, and the like; chemical entities having a common core structure such as, for example, terpenes, steroids, β-lactams, benzodiazepines, xanthates, indoles, indolones, lactones, lactams, hydantoins, quinones, hydroquinones, and the like; chains of repeating monomer units such as polysaccharides, phospholipids, polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polyimides, polyacetates, polypeptides, polynucleotides, and the like; or other oligomers or polymers as will be readily apparent to one skilled in the art upon review of this disclosure. Thus, an "oligomer" and "polymer" of the present invention may be linear, branched, cyclic, or assume various other forms as will be apparent to those skilled in the art.

Concerted: As used herein, "concerted" means synchronous or asynchronous formation of one or more chemical bonds in a single reaction step.

Substrate: As used herein, a "substrate" is a synthesis means linked to an identifier tag. By way of example and not limitation, a "substrate" may be: an identifier tag functionalized with one or a plurality of groups or linkers suitable for synthesis; a glass or polymer encased identifier tag, which glass or polymer is functionalized with one or a plurality of groups or linkers suitable for synthesis; an identifier tag that is coated with one or a plurality of synthesis supports; an identifier tag retained within a frame or housing, which frame or housing is functionalized with one or a plurality of groups or linkers suitable for synthesis; an identifier tag retained within a frame or housing, which frame or housing also retains one or a plurality of synthesis supports; and the like.

Synthesis Means: A "synthesis means" is any means for carrying out synthesis of a labeled synthetic oligomer library. Thus, "synthesis means" may comprise: reaction vessels, columns, capillaries, frames, housings, and the like, suitable for carrying out synthesis reactions; one or a plurality of synthesis supports suitable for carrying out synthesis reactions; or functional groups or linkers attached to an identifier tag suitable for carrying out synthesis reactions.

"Synthesis means" may be constructed such that they are capable of retaining identifier tags and/or synthesis supports.

In a preferred embodiment a "synthesis means" is one or a plurality of synthesis supports.

Synthesis Support: A "synthesis support" is a material having a rigid or semi-rigid surface and having functional groups or linkers, or that is capable of being derivatized with functional groups or linkers, that are suitable for carrying out synthesis reactions.

Such materials will preferably take the form of small beads, pellets, disks, capillaries, hollow fibers, needles, solid fibers, cellulose beads, pore-glass beads, silica gels, polystyrene beads optionally cross-linked with polyethylene glycol divinylbenzene, grafted co-poly beads, polyacrylamide beads, latex beads, dimethylacrylamide beads optionally cross-linked with N,N'-bis-acryloyl ethylene diamine, glass particles coated with a hydrophobic polymer, or other convenient forms.

"Synthesis supports" may be constructed such that they are capable of retaining identifier tags.

Linker: A "linker" is a moiety, molecule, or group of molecules attached to a synthesis support or substrate and spacing a synthesized polymer or oligomer from the synthesis support or substrate. A "linker" can also be a moiety, molecule, or group of molecules attached to a substrate and spacing a synthesis support from the substrate.

Typically a linker will be bi-functional, wherein said linker has a functional group at one end capable of attaching to a monomer, oligomer, synthesis support or substrate, a series of spacer residues, and a functional group at another end capable of attaching to a monomer, oligomer, synthesis support or substrate. The functional groups may be, but need not be, identical.

Spacer residues: "Spacer residues" are atoms or molecules positioned between the functional groups of a bifunctional linker, or between a functional group of a linker and the moiety to which the linker is attached. "Spacer residues" may be atoms capable of forming at least two covalent bonds such as carbon, silicon, oxygen, sulfur, phosphorus, and the like, or may be molecules capable of forming at least two covalent bonds such as amino acids, peptides, nucleosides, nucleotides, sugars, carbohydrates, aromatic rings, hydrocarbon rings, linear and branched hydrocarbons, and the like.

Linked together the spacer residues may be rigid, semi-rigid or flexible. Linked spacer residues may be, but need not be, identical.

Pre-encoded Substrate: A "pre-encoded substrate" is a substrate wherein the identifier tag is a pre-encoded identifier tag.

Encodable Substrate: An "encodable substrate" is a substrate wherein the identifier tag is an encodable identifier tag.

Synthetic: A compound is "synthetic" when produced by in vitro chemical or enzymatic synthesis.

Oligomer or Polymer Sequence: As used herein "oligomer sequence" or "polymer sequence" refers to the chemical structure of an oligomer or polymer.

The present invention relates to labeled libraries of random oligomers. Each library member is labeled with a unique identifier tag from which the structure of the library member can be readily ascertained.

The present invention also relates to methods and apparatus for synthesizing labeled libraries of random oligomers. The random oligomers are generally synthesized on synthesis supports, but may be cleaved from these supports or synthesized in solution phase to provide a soluble library. In a preferred embodiment the oligomers are composed of a set of monomers, the monomers being any member of the set of atoms or molecules that can be joined together to form an oligomer or polymer. The library is then screened to isolate individual oligomers that bind to a receptor or possess some desired property. In a preferred embodiment, each oligomer structure in the library is unique.

The identifier tag is used to identify the structures of oligomers contained in the labeled synthetic oligomer library. The identifier tag, which may be linked to the oligomer in a variety of fashions, may be any detectable feature that in some way carries the required information, and that is decipherable. Preferably, the identifier tag is impervious to the chemical reagents used to synthesize the library.

In a preferred embodiment the identifier tag relates a signal to a detector upon excitation with electromagnetic radiation. Suitable identifier tags may be, by way of example and not limitation, bar codes that can be scanned with a laser, chemical moieties that differentially emit or absorb light, such as chromophores, fluorescent, and phosphorescent moieties, or microchips that are pre-encoded or are encodable with a unique radiofrequency "fingerprint" that emit a detectable signal when pulsed with electromagnetic radiation.

In a further preferred embodiment the identification tags are encased in glass or a polymeric material that can be coated with synthesis supports or derivatized with functional groups or linkers suitable for synthesis. Preferably, the identifier tags can be sorted with automatic sorting equipment. Such polymeric materials and sorting equipment are widely known in the art.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 schematically illustrates the synthesis of the complete set of linear trimers composed of four different monomer inputs using pre-encoded identifier tags.

FIG. 2 schematically illustrates the assembly of the complete set of linear trimers composed of four different monomer inputs wherein the identifier tags are encoded with identifier information in parallel with oligomer synthesis.

FIG. 3 schematically illustrates the synthesis of a labeled library of oligomers composed of four different monomer inputs and having a common core structure using pre-encoded identifier tags.

FIG. 4 schematically illustrates the synthesis of a labeled library of oligomers composed of four different monomer inputs and having a common core structure wherein the identifier tags are encoded with identifier information in parallel with oligomer synthesis.

FIG. 5 schematically illustrates the synthesis of a labeled library of oligomers constructed using a multiple cycle synthesis series with a plurality of different transformation events using pre-encoded identifier tags.

FIG. 6 schematically illustrates the synthesis of a labeled library of oligomers constructed using a multiple cycle synthesis series with a plurality of different transformation events wherein the identifier tags are encoded in parallel with oligomer synthesis.

FIG. 7 schematically illustrates several ways in which an identifier tag can be linked to an oligomer library member.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides labeled synthetic libraries of random oligomers and methods and apparatus for generating labeled synthetic oligomer libraries. Each member of such a library is labeled with a unique identifier tag that specifies the structure or sequence of the oligomer. In a preferred embodiment of the present invention the identifier tag is a microchip that is pre-encoded or encodable with information that is related back to a detector when the identifier tag is pulsed with electromagnetic radiation.

I. Labeled Oligomer Libraries

The present invention relates to labeled libraries of random oligomers. Each member of a random oligomer library is linked to an identifier tag such that the structure of the oligomer library member can be readily ascertained. The random oligomer library generally comprises a highly diverse collection of oligomers, wherein each member of such library comprises a single oligomer structure (e.g., a benzodiazepine). The oligomers may be soluble or may be bound to a synthesis support or substrate.

The library members may be linked to an identifier tag in a variety of fashions. See, for example, FIG. 7. For example, an oligomer library member may be attached to a synthesis support, which synthesis support is retained within a reaction vessel, frame or housing that also retains an identifier tag. As another example, a library member may be attached to a synthesis support which is in turn attached to an identifier tag. The library member may be attached directly to a functional group on the synthesis support, but will usually be attached by means of a linker. The linker will generally be a bi-functional linker, which bi-functional linker comprises a functional group capable of attaching to a monomer, oligomer, synthesis support or substrate on one end, a series of spacer residues, and a functional group capable of attaching to a monomer, oligomer or synthesis support or substrate at another end.

Attachment of a synthesis support to an identifier tag can be mediated by a variety of means. For example, an identifier tag may be coated with one or a plurality of synthesis supports, which synthesis supports are attached to the identifier tag by physical means such as glue or magnetic attraction. In one embodiment a synthesis support may be a polymer capable of being functionalized with reactive groups or linkers, which synthesis support is molded into a frame or housing that retains an identifier tag.

Alternatively, one or a plurality of synthesis supports may be covalently attached to an identifier tag. Covalent attachment may be directly to a functional group on the identifier tag, or may be mediated by a linker as described above.

In another embodiment, a library member may be attached directly to a functional group on an identifier tag, or to a linker which is attached to a functional group on an identifier tag.

Synthesis supports and substrates may have a plurality of functional groups or linkers, such that each synthesis support or substrate may have a plurality of oligomer library members of identical sequence attached thereto. The quantity synthesized of each library member comprising the labeled oligomer library can be varied by varying the number of synthesis supports, functional groups or linkers on synthesis supports, or functional groups or linkers on substrates. Thus, the labeled oligomer library of the present invention may comprise milligram quantities of each library member structure, thereby providing a sufficient quantity of each library member for multiple assays or other analytical experiments.

The labeled oligomer libraries of the present invention generally comprise a highly diverse collection of oligomers. Such a library may contain, for example, all N^X different oligomers, wherein each oligomer is synthesized in a series of X synthesis cycles using N different transformation events. As a specific example, a library may contain all combinations of N^X different oligomers, which oligomers are composed of N different monomers assembled in X synthesis cycles.
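As a small illustration of the N^X count, the following Python sketch enumerates every linear oligomer for N = 4 monomers and X = 3 cycles. The function name `enumerate_library` and the single-letter monomer labels are illustrative assumptions only.

```python
from itertools import product


def enumerate_library(monomers, cycles):
    """Return every linear oligomer of length `cycles` built from `monomers`."""
    return ["".join(combo) for combo in product(monomers, repeat=cycles)]


# Assumed inputs: N = 4 monomers (A, B, C, D), X = 3 synthesis cycles.
library = enumerate_library("ABCD", 3)

print(len(library))   # 64, i.e. 4**3
print(library[:5])    # ['AAA', 'AAB', 'AAC', 'AAD', 'ABA']
```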

The library may also contain oligomers having been synthesized with different transformation events at, for example, only one or a small number of cycles in the synthesis series, while having identical transformation events at all other cycles. As a specific example, a library may contain oligomers having different monomers at only one or a small number of positions while having identical monomers at all other positions.

Oligomers or polymers of the present invention are formed from a stepwise or concerted series of transformation events. A transformation event is any event that results in a change of chemical structure of an oligomer or polymer. A transformation event may be mediated by physical, chemical, enzymatic, biological or other means, or a combination of means, including but not limited to: photo, chemical, enzymatic or biologically mediated isomerization or cleavage; photo, chemical, enzymatic or biologically mediated side group or functional group addition, removal or modification; changes in temperature; changes in pressure; and the like. Thus, transformation events include, but are not limited to: events that result in an increase in molecular weight of an oligomer or polymer, such as, for example, addition of one or a plurality of monomers, addition of solvent or gas, or coordination of metal or other inorganic substrates such as, for example, zeolites; events that result in a decrease in molecular weight of an oligomer or polymer, such as, for example, dehydrogenation of an alcohol to form an alkene, or enzymatic hydrolysis of an ester or amide; events that result in no net change in molecular weight of an oligomer or polymer, such as, for example, stereochemistry changes at one or a plurality of chiral centers, Claisen rearrangement, or Cope rearrangement; and other events as will become apparent to those skilled in the art upon review of this disclosure. See, for example, application Ser. No. 08/180,863 filed Jan. 13, 1994, which is assigned to the assignee of the present invention and PCT Publication WO 94/08051 entitled "Complex Combinatorial Libraries Encoded with Tags," Apr. 14 (1994), each of which is incorporated herein by reference.

In a preferred embodiment, at least one transformation event in the generation of a labeled synthetic oligomer library is the stepwise or concerted enzymatic or chemical addition of one or a plurality of monomers.

In another preferred embodiment, each transformation event in the generation of a labeled synthetic oligomer library is the stepwise or concerted enzymatic or chemical addition of one or a plurality of monomers.

A monomer is any atom or molecule capable of forming at least one chemical bond. Thus, a monomer is any member of the set of atoms or molecules that can be joined together as single units in a multiple of sequential or concerted chemical or enzymatic reaction steps to form an oligomer or polymer. The set of monomers useful in the present invention includes, but is not restricted to: alkyl and aryl amines; alkyl and aryl mercaptans; alkyl and aryl ketones; alkyl and aryl carboxylic acids; alkyl and aryl esters; alkyl and aryl ethers; alkyl and aryl sulfoxides; alkyl and aryl sulfones; alkyl and aryl sulfonamides; phenols; alkyl alcohols; alkyl and aryl alkenes; alkyl and aryl lactams; alkyl and aryl lactones; alkyl and aryl di- and polyenes; alkyl and aryl alkynes; alkyl and aryl unsaturated ketones; aldehydes; sulfoxides; sulfones; heteroatomic compounds containing one or more of the atoms of: nitrogen, sulfur, phosphorus, oxygen, and other polyfunctional molecules containing one or more of the above functional groups; L-amino acids; D-amino acids; deoxyribonucleosides; deoxyribonucleotides; ribonucleosides; ribonucleotides; sugars; benzodiazepines; β-lactams; hydantoins; quinones; hydroquinones; terpenes; and the like.

The monomers of the present invention may have groups protecting the functional groups within the monomer. Suitable protecting groups will depend on the functionality and particular chemistry used to construct the library. Examples of suitable functional protecting groups will be readily apparent to skilled artisans, and are described, for example, in Greene and Wuts, Protective Groups in Organic Synthesis, 2d ed., John Wiley & Sons, N.Y. (1991), which is incorporated herein by reference.

As used herein, monomer refers to any member of a basis set for synthesis of an oligomer. For example, dimers of L-amino acids form a basis set of 400 "monomers" for synthesis of polypeptides. Different basis sets of monomers may be used at successive steps in the synthesis of a polymer. Thus, as the skilled artisan will appreciate, the oligomer or polymer library members generated by practicing the present invention may serve as monomers in the synthesis of labeled synthetic oligomer libraries.
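The figure of 400 follows from pairing each of the 20 standard L-amino acids with each of the 20 (20 × 20 = 400). A minimal Python sketch, assuming the standard one-letter amino acid codes as labels (the variable names are illustrative):

```python
from itertools import product

# One-letter codes for the 20 standard amino acids (labels used for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Every ordered pair of amino acids is one "monomer" of the dimer basis set.
dimer_basis = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

print(len(dimer_basis))   # 400
print(dimer_basis[:3])    # ['AA', 'AC', 'AD']
```

Coupling two members of this basis set in successive cycles would give 400 × 400 = 160,000 tetrapeptides, the same count as four cycles with the 20 single amino acids.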

Accordingly, oligomers or polymers of the present invention comprise any chemical structure that can be synthesized using the combinatorial library methods of this invention, including, for example: amides, esters, thioethers, ketones, ethers, sulfoxides, sulfonamides, sulfones, phosphates, alcohols, aldehydes, alkenes, alkynes, aromatics, polyaromatics, heterocyclic compounds containing one or more of the atoms of: nitrogen, sulfur, oxygen, and phosphorus, and the like; chemical entities having a common core structure such as, for example, terpenes, steroids, β-lactams, benzodiazepines, xanthates, indoles, indolones, lactones, lactams, hydantoins, quinones, hydroquinones, and the like; chains of repeating monomer units such as polysaccharides, phospholipids, polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polyimides, polyacetates, polypeptides, polynucleotides, and the like; or other oligomers or polymers as will be readily apparent to one skilled in the art upon review of this disclosure. Thus, an "oligomer" and "polymer" of the present invention may be linear, branched, cyclic, or assume various other forms as will be apparent to those skilled in the art.

A labeled oligomer library of the present invention may comprise virtually any level of complexity and is limited in size only by the physical size of a substrate. An oligomer library will typically comprise from about 10 to about 5000 library members, preferably from about 1000 to about 250,000 library members, and more preferably from about 50,000 to about 10^6 library members.

The labeled synthetic oligomer libraries of the present invention have a wide variety of uses. By way of example and not limitation, labeled synthetic oligomer libraries can be used to: identify peptide, nucleic acid, carbohydrate and/or other structures that bind to proteins, enzymes, antibodies, receptors and the like; identify sequence-specific binding drugs; identify epitopes recognized by antibodies; evaluate a variety of drugs for clinical diagnostic applications; identify materials that exhibit specific properties, such as, for example, ceramics; identify elements comprising superconducting compositions; combinations of the above; and other uses that will be apparent to those skilled in the art.

II. Methods for Generating Labeled Oligomer Libraries

The present invention also provides methods and apparatus for generating labeled oligomer libraries. The general methods typically involve synthesizing the oligomers in a random combinatorial fashion by a stepwise or concerted series of transformation events. A labeled oligomer library may be produced by synthesizing on each of a plurality of identifier tags linked to a synthesis means ("substrates") a single oligomer structure, the oligomer structure being different for different substrates.

Substrates used for practicing the methods of the present invention include, but are not limited to: an identifier tag functionalized with one or a plurality of groups or linkers suitable for synthesis; a glass or polymer encased identifier tag, which glass or polymer is functionalized with one or a plurality of groups or linkers suitable for synthesis; an identifier tag that is coated with one or a plurality of synthesis supports; an identifier tag retained within a frame or housing, which frame or housing is functionalized with one or a plurality of groups or linkers suitable for synthesis; an identifier tag retained within a frame or housing, which frame or housing also retains one or a plurality of synthesis supports; and the like.

In a preferred embodiment a substrate comprises an identifier tag retained within a frame or housing, which frame or housing also retains one or a plurality of synthesis supports.

In another preferred embodiment a substrate comprises an identifier tag retained within a frame or housing, which frame or housing is functionalized with one or a plurality of groups or linkers suitable for synthesis.

In yet another preferred embodiment a substrate comprises an identifier tag, optionally encased in a glass or polymeric coating, which identifier tag is functionalized with one or a plurality of groups or linkers suitable for synthesis.

In one embodiment of the methods of the present invention a labeled synthetic oligomer library is generated that employs "birth-to-death" identifier tags. A "birth-to-death" identifier tag is a tag whose information content does not change during the course of synthesis. A labeled synthetic oligomer library is synthesized in a process comprising the steps of: (a) apportioning a plurality of substrates, each of which is pre-encoded with a unique identifier tag ("pre-encoded substrates"), among a plurality of reaction vessels; (b) exposing the pre-encoded substrates in each reaction vessel to one or a plurality of transformation events; (c) detecting and recording the identifier tag information from each pre-encoded substrate in each reaction vessel; (d) apportioning the pre-encoded substrates among a plurality of reaction vessels; and (e) repeating steps (a) through (c) from at least one to about twenty times.

A capping step wherein unreacted functional groups following a transformation event are "capped" with a highly reactive chemical moiety specific for the functional group(s) desired to be capped may be used after each transformation event. Suitable chemical capping moieties are well known in the art.

Typically, substantially equal numbers of substrates will be apportioned into each reaction vessel. The substrates may be apportioned in a stochastic manner at each step, but preferably will be apportioned in a non-stochastic fashion.

In a preferred embodiment, at least one transformation event is the stepwise or concerted chemical or enzymatic addition of one or a plurality of monomer units.

In an even more preferred embodiment, each transformation event is the stepwise or concerted chemical or enzymatic addition of one or a plurality of monomer units. For this preferred embodiment, a labeled synthetic oligomer library is synthesized in a process comprising the steps of: (a) apportioning a plurality of pre-encoded substrates among a plurality of reaction vessels; (b) exposing the pre-encoded substrates in each reaction vessel to one or a plurality of monomer units; (c) detecting and recording the identifier tag information from each pre-encoded substrate in each reaction vessel; (d) apportioning the pre-encoded substrates among a plurality of reaction vessels; and (e) repeating steps (a) through (c) from at least one to about twenty times.

A capping step wherein unreacted functional groups following addition of one or a plurality of monomers are "capped" with a highly reactive chemical moiety specific for the functional group(s) desired to be capped may be used after each reaction cycle. Suitable chemical capping moieties are well known in the art.

As a specific example of the method, one may consider the synthesis of the set of linear oligomers three monomer residues in length, assembled from a set of four monomers A, B, C, D. See FIG. 1. The first monomer is coupled to four different aliquots of pre-encoded substrates, each different monomer in a different aliquot. The identifier information from each pre-encoded substrate is detected and recorded for each different aliquot. The pre-encoded substrates from all the aliquots are then redistributed for a second round of monomer addition.

The pre-encoded substrates may be redistributed in a stochastic fashion. For this method the pre-encoded substrates from all the aliquots are pooled, which pool now contains approximately equal numbers of four different types of pre-encoded substrates, each of which is characterized by the monomer in the first residue position, and redistributed into the separate monomer reaction vessels containing A, B, C or D as the monomer. Alternatively, the pre-encoded substrates may be sorted and redistributed in a non-stochastic fashion into the separate monomer reaction vessels containing A, B, C or D as the monomer.

Following re-distribution a second monomer is coupled and the identifier information from each pre-encoded substrate again detected and recorded for each substrate in each reaction vessel. Each vessel now has substrates with four different monomers in position one and the monomer contained in each particular second reaction vessel in position two. The pre-encoded substrates from all reaction vessels are again redistributed among each of the four reaction vessels, and the coupling, detecting and recording process repeated. The process of sequential coupling and apportioning yields pre-encoded substrates that have passed through all the possible reaction pathways, and the collection of pre-encoded substrates displays all possible trimers composed of the four monomer inputs A, B, C and D (4^3 = 64 trimers).
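The three rounds of coupling and redistribution can be simulated with a short Python sketch. It assumes the non-stochastic variant with exactly 64 pre-encoded substrates. The tag signals "000" to "063" and the base-4 sorting rule are hypothetical choices made for illustration, picked so that each substrate follows a distinct reaction pathway.

```python
MONOMERS = "ABCD"
CYCLES = 3

# 64 pre-encoded substrates with hypothetical tag signals "000" .. "063".
tags = [f"{i:03d}" for i in range(len(MONOMERS) ** CYCLES)]
oligomers = {tag: "" for tag in tags}
records = []   # (cycle, vessel, tag) triples, as detected and recorded

for cycle in range(CYCLES):
    # Non-stochastic apportioning (assumed rule): the base-4 digit of the
    # substrate index for this cycle decides which vessel it is sorted into.
    vessels = {monomer: [] for monomer in MONOMERS}
    for index, tag in enumerate(tags):
        digit = (index // len(MONOMERS) ** cycle) % len(MONOMERS)
        vessels[MONOMERS[digit]].append(tag)

    # Couple the vessel's monomer, then detect and record each tag.
    for monomer, contents in vessels.items():
        for tag in contents:
            oligomers[tag] += monomer
            records.append((cycle + 1, monomer, tag))

print(len(set(oligomers.values())))   # 64 distinct trimers
print(oligomers["027"])               # DCB
```

With stochastic pooling the same loop applies, but more than 64 substrates would generally be needed before every trimer is likely to be represented.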

The sequential detection and recording steps have provided a detailed list of the stepwise monomer additions to which each pre-encoded substrate was subjected ("reaction histogram"). For example, if the trimer sequence ABC was synthesized on a pre-encoded substrate bearing identifier tag signal "001", the recorded reaction histogram would reveal that in the first reaction step substrate 001 was contained in the reaction vessel containing monomer A, in the second reaction step substrate 001 was contained in the reaction vessel containing monomer B, and in the third step in the vessel containing monomer C. Thus, determining in which reaction vessel a particular pre-encoded substrate was contained at each reaction step reveals the polymer structure or sequence synthesized on the particular pre-encoded substrate. Thus, it can be appreciated that the number of unique identifier tags needed to label the library is dictated by the number of members in the library being generated.
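The tracking scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag codes "001" to "064", the function names, and the shuffle-and-deal model of stochastic redistribution are hypothetical and are not taken from the disclosure.

```python
import random

MONOMERS = ["A", "B", "C", "D"]  # one reaction vessel per monomer
CYCLES = 3

def synthesize_pre_encoded(tag_ids, rng):
    """Return {tag_id: [monomer seen at cycle 1, cycle 2, ...]}."""
    history = {tag: [] for tag in tag_ids}
    for _ in range(CYCLES):
        # Assumed model of stochastic redistribution: pool, shuffle,
        # then deal the substrates into equal aliquots.
        pooled = list(tag_ids)
        rng.shuffle(pooled)
        for position, tag in enumerate(pooled):
            vessel = MONOMERS[position % len(MONOMERS)]
            # Detect the tag in this vessel and record the monomer coupled.
            history[tag].append(vessel)
    return history

def decode(history, tag):
    """Read the oligomer sequence back from the recorded vessel history."""
    return "".join(history[tag])

tags = [f"{i:03d}" for i in range(1, 65)]  # hypothetical pre-encoded codes
history = synthesize_pre_encoded(tags, random.Random(0))
print("001 carries", decode(history, "001"))
```

Because the record is keyed by tag, the sequence on any substrate is recovered by lookup alone, without analyzing the oligomer itself.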

In another embodiment of the present invention the identifier tags are encoded with information in parallel with generating the oligomer library ("encodable substrates"). The encodable substrates may be pre-encoded with partial identifier information prior to synthesis or may be blank. A labeled synthetic oligomer library is synthesized in a process comprising the steps of: (a) apportioning a plurality of encodable substrates among a plurality of reaction vessels; (b) exposing the substrates in each reaction vessel to one or a plurality of transformation events; (c) adding identifier information to each identifier tag in each reaction vessel; (d) apportioning the encodable substrates among a plurality of reaction vessels; and (e) repeating steps (a) through (c) from at least one to about twenty times.

A capping step wherein unreacted functional groups following a transformation event are "capped" with a highly reactive chemical moiety specific for the functional group(s) desired to be capped may be used after each transformation event. Suitable chemical capping moieties are well known in the art.

In a preferred embodiment, at least one transformation event is the stepwise or concerted chemical or enzymatic addition of one or a plurality of monomer units.

In an even more preferred embodiment, each transformation event is the stepwise or concerted chemical or enzymatic addition of one or a plurality of monomer units. For this preferred embodiment, a labeled synthetic oligomer library is synthesized in a process comprising the steps of: (a) apportioning a plurality of encodable substrates among a plurality of reaction vessels; (b) exposing the substrates in each reaction vessel to one or a plurality of monomer units; (c) adding identifier information to each identifier tag in each reaction vessel; (d) apportioning the encodable substrates among a plurality of reaction vessels; and (e) repeating steps (a) through (c) from at least one to about twenty times.

A capping step wherein unreacted functional groups following addition of one or a plurality of monomer units are "capped" with a highly reactive chemical moiety specific for the functional group(s) desired to be capped may be used after each reaction cycle. Suitable chemical capping moieties are well known in the art.

Typically, substantially equal numbers of substrates will be apportioned into each reaction vessel. As discussed above, the redistribution process may be stochastic, but is preferably non-stochastic.

As a specific example of the method, one may again consider the synthesis of the set of oligomers three residues in length, assembled from a set of monomers A, B, C, D. See FIG. 2. The first monomer is coupled to four different aliquots of encodable substrates, each different monomer in a different aliquot. Identifier information is added to the identifier tags in each aliquot, with the identifier information being unique for each aliquot. Thus, each encodable substrate is characterized by the identity of the monomer in the first residue position. The encodable substrates are then redistributed among the separate monomer reaction vessels containing A, B, C, or D as the monomer.

The second residue is coupled and identifier information unique to each aliquot added to the encodable substrates in each reaction vessel. Following this reaction, each vessel now has encodable substrates with four different monomers in position one and the monomer contained in each particular second reaction vessel in position two. The encodable substrates from all reaction vessels are again redistributed among each of the four reaction vessels, the third monomer coupled and identifier information added. The process of sequential re-distributing and coupling yields substrates that have passed through all the possible reaction pathways, and the collection of substrates displays all possible trimers composed of the four monomer inputs A, B, C, and D (4^3 = 64 trimers).

Each identifier tag is now labeled with sequential information that identifies the monomers to which each encodable substrate was exposed. For example, if one assigned the four monomers A, B, C and D identification labels according to a binary code such that A=00, B=01, C=10 and D=11, the encodable substrate containing the sequence ABC will contain an identifier tag that reads 000110.
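The binary labeling in this example can be checked with a short Python sketch. The two-bit labels are those given in the text; the function names `encode` and `decode` are hypothetical.

```python
# Two-bit identification labels from the example in the text.
LABELS = {"A": "00", "B": "01", "C": "10", "D": "11"}
MONOMER_FOR = {bits: monomer for monomer, bits in LABELS.items()}

def encode(sequence):
    """Append one label per coupling step, in synthesis order."""
    return "".join(LABELS[monomer] for monomer in sequence)

def decode(tag):
    """Read the tag two bits at a time to recover the sequence."""
    return "".join(MONOMER_FOR[tag[i:i + 2]] for i in range(0, len(tag), 2))

print(encode("ABC"))     # 000110, as stated in the text
print(decode("000110"))  # ABC
```

The position of each label within the tag carries the cycle number, so the same four labels can be reused at every cycle.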

It will be appreciated that the identifier tag "grows" with the growing oligomer, and thus the number of unique identification labels, which identification labels uniquely identify particular transformation events, is dictated by the number of transformation events used to generate the oligomer library. Accordingly, a unique identifier tag is generated for each oligomer in the library by the sequential addition of identification labels identifying the transformation events used to construct the library member.

As will be readily appreciated by those skilled in the art, the method of assembling oligomers from a stepwise or concerted series of transformation events can utilize any chemical, physical, enzymatic or biological means, or combinations thereof, that can effect a change in the structure of an oligomer or polymer. The oligomers can be synthesized by introducing, modifying, or removing functional groups or side chains, opening and/or closing rings, changing stereo chemistry, and the like. Accordingly, the resulting oligomers can be linear, branched, cyclic, or assume various other conformations that will be apparent to those of ordinary skill in the art. See FIGS. 3, 4, 5 and 6.

In addition, because the substrates are apportioned amongst a number of reaction vessels, transformation events using different physical, chemical, enzymatic or biological chemistries, or combinations thereof, can be used to assemble the oligomers. Examples of the plethora of transformation events that can be used with the present invention are described in, for example, application Ser. No. 08/180,863 filed Jan. 13, 1994, which is assigned to the assignee of the present invention and PCT Publication WO 94/08057 entitled, "Complex Combinatorial Libraries Encoded with Tags" Apr. 14 (1994), each of which is incorporated herein by reference.

Thus, those skilled in the art will appreciate that the methods of the present invention can be used to synthesize labeled libraries of virtually any chemical composition including, but not limited to, amides, esters, thioethers, ketones, ethers, sulfoxides, sulfonamides, sulfones, phosphates, alcohols, aldehydes, alkenes, alkynes, aromatics, polyaromatics, heterocyclic compounds containing one or more of the atoms of: nitrogen, sulfur, oxygen, and phosphorus, and the like; chemical entities having a common core structure such as, for example, terpenes, steroids, β-lactams, benzodiazepines, xanthates, indoles, indolones, lactones, lactams, hydantoins, quinones, hydroquinones, and the like; chains of repeating monomer units such as polysaccharides, phospholipids, polyurethanes, polyesters, polycarbonates, poly ureas, polyamides, polyethyleneimines, poly arylene sulfides, polyimides, polyacetates, polypeptides, polynucleotides, and the like; or other oligomers or polymers as will be readily apparent to those skilled in the art.

In a preferred embodiment, at least one transformation event in the generation of a labeled synthetic oligomer library is the stepwise or concerted enzymatic or chemical addition of one or a plurality of monomers.

In another preferred embodiment, each transformation event in the generation of a labeled synthetic oligomer library is the stepwise or concerted enzymatic or chemical addition of one or a plurality of monomers.

In these preferred modes, the functionalities connecting monomers need not be identical. Thus, polymers composed of non-identical interlinkages are contemplated by the preferred embodiments. Also contemplated are oligomers that are composed of non-uniform monomer composition. As one example, an oligomer may be composed of aryl or alkyl hydroxyl, aryl or alkyl carboxylic acid, and aryl or alkyl amine monomers. As another example, an oligomer may be composed of deoxyribonucleoside, carbohydrate, and amino acid monomer units.

Although typically the present invention will utilize solid phase synthesis strategies, the present invention also contemplates solution phase chemistries. Techniques for solid phase synthesis of peptides are described, for example, in Atherton and Sheppard, Solid Phase Peptide Synthesis: A Practical Approach, IRL Press at Oxford University Press, Oxford, England (1989), and for oligonucleotides in, for example, Gait, Oligonucleotide Synthesis: A Practical Approach, IRL Press at Oxford University Press, Oxford, England (1984), each of which is incorporated herein by reference.

Techniques for solution and solid phase multiple component combinatorial array synthesis strategies are described in, for example, U.S. patent application Ser. No. 08/092,862 filed Jan. 13, 1994, which is assigned to the assignee of the present invention, and which is incorporated herein by reference.

Other synthetic strategies that may be employed by the present invention are described in, for example, Bunin et al., "The Combinatorial Synthesis and Chemical and Biological Evaluation of a 1,4-Benzodiazepine Library," Proc. Natl. Acad. Sci. 91: 4708-12 (1994); U.S. Patent No. 5,288,514, entitled "Solid Phase and Combinatorial Synthesis of Benzodiazepine Compounds on a Solid Support," issued Feb. 22, 1994; and Chen et al., "'Analogous' Organic Synthesis of Compound Libraries: Validation of Combinatorial Chemistry in Small-Molecule Synthesis," J. Am. Chem. Soc. 116: 2661-2662 (1994).

Thus, as those of skill in the art will appreciate, the methods of the present invention may be used with virtually any synthesis strategy, be it chemical, biological or otherwise, that is now known or will be later developed, to generate libraries of oligomers or polymers.

The representation of each library member within the library depends on apportioning the substrates into the proper reaction vessels at each reaction cycle in the synthesis series. In one embodiment the substrates can be pooled at each step, mixed and stochastically re-apportioned into reaction vessels for subsequent reaction cycles. For stochastic mixing and apportioning, increasing the number of substrates upon which a single oligomer sequence will be synthesized increases the likelihood that each oligomer sequence will be represented in the library.

In a preferred embodiment the substrates are apportioned in a non-stochastic manner at each reaction cycle. Non-stochastic distribution has two distinct advantages. First, it ensures that each oligomer sequence is represented in the synthesis library, even if only a single substrate is employed for each oligomer sequence. Second, it increases the likelihood that all oligomer sequences are represented in substantially equal quantities in a synthesized library.
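The difference between the two apportioning modes can be made concrete with a small calculation. The sketch below uses a simplifying assumption that is not in the disclosure: under stochastic mixing, each substrate independently ends up carrying a uniformly random sequence. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def expected_distinct(n_sequences, n_substrates):
    """Expected number of different sequences present after stochastic mixing."""
    miss = (1 - 1 / n_sequences) ** n_substrates  # chance one sequence is absent
    return n_sequences * (1 - miss)

def prob_all_represented(n_sequences, n_substrates):
    """Probability that every sequence appears on at least one substrate."""
    # Inclusion-exclusion count of onto assignments, kept in exact integers.
    onto = sum(
        (-1) ** k * comb(n_sequences, k) * (n_sequences - k) ** n_substrates
        for k in range(n_sequences + 1)
    )
    return float(Fraction(onto, n_sequences ** n_substrates))

for n in (64, 640):
    print(n, expected_distinct(64, n), prob_all_represented(64, n))
```

Under this model, 64 substrates for 64 trimers would be expected to yield only about 40 distinct sequences, whereas non-stochastic apportioning yields all 64 with a single substrate per sequence.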

The non-stochastic redistribution process at each reaction cycle will be dictated by the composition of the library. Generally, the composition of any library can be described as a series of sequential matrix calculations. The number of different transformation events at each synthesis cycle is represented by a horizontal "chemical input" matrix. The composition of the library at the beginning of each cycle is defined by a "library matrix". The composition of the library at the completion of any cycle is the product of the library matrix (from the beginning of the cycle) and the chemical input matrix for the cycle just completed.

The matrix notation can be best illustrated by way of specific example. If one desires to construct the complete set of linear oligomer trimers composed of four different monomer inputs A, B, C, and D (4^3 = 64 library members), the chemical input matrix at each cycle is [A, B, C, D]. Thus, at the end of the first monomer addition step, the library matrix is the vertical matrix ##EQU1## where the subscript denotes the reaction cycle number in the series of synthesis reaction cycles.

The composition of the library following the second round of transformation events (here monomer addition reactions) is obtained by taking the product of the chemical input matrix for cycle two and the library matrix from cycle one. Here, the composition of the library at the end of the second reaction cycle is given by: ##EQU2##

The composition of the complete set of trimers (i.e. at the end of the third reaction cycle) is given by: ##EQU3##

This matrix notation illustrates the redistribution process that must take place at each reaction cycle to generate a library of a particular composition. Specifically, at each reaction cycle each set of substrates in a particular reaction vessel must be divided into subsets, where the number of subsets is equal to the number of different transformation events at that cycle. The exact distribution process will depend on the composition of the library, and will be readily apparent to those skilled in the art upon review of this disclosure.
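As an illustration of how the library composition grows from cycle to cycle, the following Python sketch forms every pairing of an existing library member with a chemical input. It is a sketch of the general idea only and does not reproduce the layout of the equations; the function name `next_library` and the string notation (monomer letter followed by cycle number) are assumptions.

```python
def next_library(library, chemical_inputs, cycle):
    """Pair every current library member with every input of this cycle."""
    return [member + f"{monomer}{cycle}"
            for member in library
            for monomer in chemical_inputs]

inputs = ["A", "B", "C", "D"]  # chemical input matrix, the same at each cycle
library = [""]                 # bare substrates before the first cycle
sizes = []
for cycle in (1, 2, 3):
    library = next_library(library, inputs, cycle)
    sizes.append(len(library))

print(sizes)        # [4, 16, 64]
print(library[:5])  # ['A1A2A3', 'A1A2B3', 'A1A2C3', 'A1A2D3', 'A1B2A3']
```

Each member of the cycle-two library appears four times in the cycle-three library, once per input, which is the subdivision that the redistribution step has to realize physically.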

In one preferred embodiment the substrates can be manually sorted and reapportioned at each reaction cycle. This can be illustrated by way of specific example for a library comprising the complete set of Xn^N oligomers composed of Xn monomer inputs assembled in N reaction cycles. For the first monomer addition reaction Xn^N substrates are divided into Xn reaction vessels, Xn^N/Xn substrates per vessel. Integer multiples of Xn^N substrates may also be used. After addition of the first monomer inputs X1, X2, ..., Xn, the substrates in each of the Xn vessels are divided into Xn aliquots and reapportioned into the Xn vessels, one aliquot per vessel. Following the second reaction cycle each library member can be represented as NXn1 Xn2, where Xn1 represents the first monomer added to the substrate and Xn2 represents the second monomer added to the substrate.

For the third monomer addition step, each subset of substrates NXn1 Xn2 is divided into Xn aliquots and reapportioned into the Xn reaction vessels, one aliquot per vessel. Repeating this process N times generates the complete set of oligomers comprised of Xn monomer inputs.
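The divide-and-reapportion procedure can be simulated to confirm that it yields every sequence exactly once. The sketch below assumes x monomers, N cycles, and x^N substrates (one per library member); the function name and the grouping of substrates by shared history are illustrative choices.

```python
from collections import Counter

def directed_split_synthesis(monomers, cycles):
    """Simulate non-stochastic splitting; return the sequence on each substrate."""
    x = len(monomers)
    groups = [[""] * (x ** cycles)]  # substrates grouped by shared history
    for _ in range(cycles):
        vessels = {m: [] for m in monomers}
        for group in groups:
            size = len(group) // x
            # Divide each group into x equal aliquots, one aliquot per vessel.
            for i, m in enumerate(monomers):
                vessels[m].extend(group[i * size:(i + 1) * size])
        groups = []
        for m, contents in vessels.items():
            # Couple monomer m, then regroup substrates by their new history.
            by_history = {}
            for seq in contents:
                by_history.setdefault(seq + m, []).append(seq + m)
            groups.extend(by_history.values())
    return [seq for group in groups for seq in group]

result = directed_split_synthesis(["A", "B", "C", "D"], 3)
counts = Counter(result)
print(len(counts), max(counts.values()))  # 64 1
```

Starting from an integer multiple of x^N substrates would give the same set of sequences, each represented that many times.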

Modifications of this exemplary approach are also possible. For example, one may use any series of transformation events at any reaction cycle. The set of transformation events may be expanded or contracted from cycle to cycle or the set of transformation events could be changed completely for the next cycle (e.g. couple a monomer in one cycle, rearrange stereochemistry in another cycle). Additionally, one can fix certain transformation events at some cycles while varying other transformation events, to construct oligomer frameworks wherein certain residues or regions within oligomers are altered to provide diversity.

In another preferred embodiment the substrates are sorted and re-apportioned at each cycle using automated sorting equipment. Substrates are placed in a mechanical hopper which introduces them into a vibratory sorter apparatus such as are commonly employed in the manufacturing industry to sort small objects. The vibratory sorter introduces the substrates one at a time into a delivery chute. Attached to the chute is a detector which can scan the identifier tag and receive a unique identifier code. When the code is received the substrate is released from the chute and drops into a reaction vessel. A conveyor system may be used to position one of a series of reaction vessels under the sorter chute for receipt of the substrate. After the identifier tag has been read the conveyor system may then be positioned such that the correct reaction vessel is positioned under the chute to sort individually or in groups as desired.
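The control logic of such a sorter can be sketched as a lookup from the scanned code to a target vessel. Everything specific here is an assumption made for illustration: the three-digit codes, the `PLAN` routing table that assigns one planned sequence to each tag, and the function names.

```python
import itertools

MONOMERS = ["A", "B", "C", "D"]
CYCLES = 3

# Hypothetical routing table: tag code -> sequence planned for that substrate.
PLAN = {
    f"{i:03d}": "".join(seq)
    for i, seq in enumerate(itertools.product(MONOMERS, repeat=CYCLES), start=1)
}

def route(tag_code, cycle):
    """Vessel (named by its monomer) for this tag at a 1-based cycle."""
    return PLAN[tag_code][cycle - 1]

def sort_into_vessels(tag_codes, cycle):
    """Process substrates one at a time, as they leave the delivery chute."""
    vessels = {m: [] for m in MONOMERS}
    for code in tag_codes:
        vessels[route(code, cycle)].append(code)
    return vessels

for cycle in range(1, CYCLES + 1):
    vessels = sort_into_vessels(PLAN.keys(), cycle)
    print(cycle, {m: len(tags) for m, tags in vessels.items()})
```

With this plan every vessel receives 16 substrates at each cycle, so the library is complete and evenly represented.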

III. Identifying the Sequence of Any Oligomer

The present invention provides methods for identifying the structure of any of the oligomers in the library. By tracking the synthesis pathway that each oligomer has taken, one can deduce the sequence of any oligomer in a given library. The method involves linking an identifier tag to an oligomer that indicates the series of transformation events, and corresponding synthesis cycles in which those transformation events were experienced, that an oligomer has experienced during construction of a labeled synthetic oligomer library. In one embodiment, after a series of synthesis cycles and concurrent identifier tag detections, one tracks the transformation events to which a particular identifier tag, and thus oligomer, was subjected at each synthesis cycle to determine the oligomer structure. In another embodiment, after a series of synthetic cycles and concurrent identifier tag additions, one "reads" the identifier tag associated with an oligomer to determine the structure of the particular oligomer.

The identifier tags therefore identify each transformation event that an individual oligomer library member has experienced. In addition, a record of the synthesis cycle in the synthesis series at which each transformation event was experienced is generated ("reaction histogram"). As described above, the identifier tags may be pre-encoded with unique identifier information prior to synthesis, or can be encoded with information immediately before, during, or after each transformation event. Methods employing pre-encoded identifier tags, and thus wherein a unique identifier tag is assigned to each library member prior to synthesis, require a number of unique identifier tags equal to the number of library members. Methods employing encodable identifier tags and wherein identifier information is added at each transformation event require only as many unique identification labels, which labels uniquely identify particular transformation events, as there are different transformation events in the synthesis cycle. Unique identifier tags identifying the structure of each oligomer in the library are generated concomitant with synthesis as identification labels identifying each transformation event are added, preferably in a sequential fashion, to the encodable identifier tags at each synthesis cycle. In this latter embodiment the identifier tags preserve a sequential record of which particular transformation events a substrate experienced at each synthesis cycle.
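The difference in tag requirements between the two approaches is simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes the same number of transformation events at every cycle.

```python
def pre_encoded_tags_needed(events_per_cycle, n_cycles):
    """One unique pre-encoded tag per library member."""
    return events_per_cycle ** n_cycles

def encodable_labels_needed(events_per_cycle):
    """One identification label per transformation event; labels are reused
    each cycle because their order on the tag records the cycle."""
    return events_per_cycle

print(pre_encoded_tags_needed(4, 3), encodable_labels_needed(4))   # 64 4
print(pre_encoded_tags_needed(4, 20), encodable_labels_needed(4))  # 1099511627776 4
```

The number of pre-encoded tags grows exponentially with the number of cycles, while the number of labels needed for encodable tags stays fixed.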

IV. Types of Identifier Tags

The identifier tags of the instant invention may be any detectable feature that permits elucidation of the structure of each oligomer synthesized in a given labeled library. Thus, identifier tags may be, for example: microscopically distinguishable in shape, size, color, optical density, etc.; differentially absorbing or emitting of light; chemically reactive; pre-encoded with optical, magnetic or electronic information; or in some other way distinctively marked with the required decipherable information.

In a preferred embodiment of the invention, the identifier tags relate information back to a detector when pulsed with electromagnetic radiation.

In a more preferred embodiment the identifier tags are microchips that are either pre-encoded or encodable with a unique radiofrequency "fingerprint" that can be detected by pulsing the identifier tag with electromagnetic radiation.

The radiofrequency fingerprint may be a single radiofrequency band, or a combination of radiofrequency bands. When pulsed with electromagnetic radiation, the identifier tags emit a radiofrequency fingerprint that is detected by an electromagnetic radiation detector. Therefore it can be appreciated that the number of unique radiofrequency identifier tags, or "fingerprints," that are available is virtually limitless.

The identifier tags of the present invention can be pre-encoded with unique identifier information prior to synthesis of a labeled synthetic oligomer library. For example, an identifier tag may be a bar code strip that corresponds to, say, the number 001, or it may be a radiofrequency fingerprint comprised of one or a plurality of frequency bands. Each library member is thus labeled with a unique, static identifier tag throughout the combinatorial synthesis, or in a "birth-to-death" fashion.

The identifier tags may also be encodable with new information from time to time. For example, the identifier tag may be a bar code strip that can receive sequential information from time to time or a microchip that can be "downloaded" with digital information from time to time. The encodable identifier tags may be either blank or pre-encoded with partial or complete identifier information prior to synthesis of a labeled synthetic oligomer library.

Each transformation event in a series of reaction cycles in the synthesis of a labeled synthetic oligomer library is assigned a unique identification label, which label is added to the encodable identifier tag either concomitant with, or close in time to, performance of a particular transformation event. At the termination of a synthesis comprising an unlimited number of reaction cycles and transformation events, a unique sequential signal has been downloaded onto the microchip such that the history of transformation events to which the encodable substrate was subjected is recorded in a sequential fashion on the microchip. The oligomer sequence can therefore be deduced by detecting the unique sequential identification information contained in the identifier tag.

In another preferred embodiment the identifier tags of the present invention are encased in a glass or polymeric material. Such glass or polymeric material may be readily attached to synthesis supports, directly derivatized with one or a plurality of functional groups suitable for synthesis, or directly derivatized with one or a plurality of linkers bearing one or a plurality of functional groups suitable for synthesis. It can be readily appreciated that the identity of such functional groups will be dictated by the composition of the desired oligomers. Suitable groups will be readily apparent to those skilled in the art and include, but are not restricted to, amino, carboxyl, sulfhydryl, hydroxyl, and the like.

Any glass or polymeric material capable of encasing an identifier tag can be used in the present invention. In one mode such polymeric material is capable of being derivatized with one or a plurality of functional groups or linkers suitable for synthesis. Polymers such as polyethylene glycol, polystyrene-divinyl benzene, polyethylene grafted polystyrene, polyacrylamide-kieselguhr composites and glass have all been commercialized with functional groups suitable for derivatization with various linkers and monomers. Methods for derivatizing such polymers are well known in the art. See, e.g., Atherton and Sheppard, Solid Phase Peptide Synthesis: A Practical Approach, IRL Press at Oxford University Press, Oxford, England (1989), and Gait, Oligonucleotide Synthesis: A Practical Approach, IRL Press at Oxford University Press, Oxford, England (1984), each of which is incorporated herein by reference.

Suitable preferred identifier tags are well known in the art and are described, for example, in U.S. Pat. No. 5,252,962, entitled "System Monitoring Programmable Implantable Transponder," issued on Oct. 12, 1993, to Bio Medic Data Systems, Inc. and U.S. Pat. No. 5,351,052, entitled "Transponder System for Automatic Identification Purposes," issued on Sep. 27, 1994, to Texas Instruments, Inc., each of which is incorporated herein by reference. Commercially available examples include ELAMS™ (Electronic Laboratory Animal Monitoring Systems), manufactured by Biomedic Data Systems, 225 West Spring Valley Ave., Maywood, N.J. 07607, and TIRIS™ (Texas Instruments Registration and Identification System), manufactured by Texas Instruments, 12501 Research Blvd., Mailstop 2243, Austin, Tex. 78759.

ELAMS™, which are widely used to tag and identify laboratory mice via subcutaneous injection of the ELAM™, comprise glass-encased microchips that are pre-tuned to emit a unique radiofrequency fingerprint when pulsed with electromagnetic radiation. TIRIS™, which are currently used for security cards and to track and identify livestock and automobiles, comprise glass-encased microchips that can be downloaded with digital information from time to time.

ELAMS™ and TIRIS™ possess a variety of features that make them ideally suited for use as combinatorial chemistry library identifier tags. For example, ELAMS™ and TIRIS™ can be readily sorted using currently available automated sorter technology. In addition, the encoded information can be scanned and stored in a microcomputer. Furthermore, ELAMS™ and TIRIS™ are compatible with virtually any chemistry now known or that will be later developed to generate oligomer or polymer libraries. Thus, large labeled libraries of virtually any chemical composition can be generated in an automated fashion, thereby increasing the diversity of labeled libraries available while decreasing the time and effort necessary to generate such libraries.

V. Linking the Oligomers to the Identifier Tags

An oligomer of the present invention may be linked to an identifier tag in a variety of fashions. See, for example, FIG. 7. One or a plurality of oligomers of identical sequence can be attached directly to one or a plurality of functional groups on an identifier tag, to one or a plurality of linkers that are attached to an identifier tag, or to one or a plurality of synthesis supports that are attached to an identifier tag. An identifier tag may also be retained within a frame or housing to which one or a plurality of oligomers of identical structure are attached. Additionally, an identifier tag may be retained within a frame or housing that also retains one or a plurality of synthesis supports having attached thereto one or a plurality of oligomers of identical structure.

In one preferred embodiment one or a plurality of oligomers of identical structure are attached directly to one or a plurality of functional groups on an identifier tag. Typically, the identifier tag will be encased in a glass or polymeric material that can be derivatized with one or a plurality of functional groups suitable for synthesis. Any polymeric material capable of providing functional groups suitable for attachment can be utilized. It can be readily appreciated that the identity of the functional groups will be dictated by the composition of the desired oligomers. Suitable groups will be readily apparent to those skilled in the art and include, but are not limited to, amino, sulfhydryl, hydroxyl, and the like.

Several polymeric materials have been commercialized with suitable functional groups such as, for example, polystyrene-divinyl benzene, polyethylene grafted polystyrene, polyacrylamide kieselguhr composite and controlled pore glass. Methods for derivatizing such polymers are well known in the art. See, e.g., Atherton and Sheppard, Solid Phase Peptide Synthesis: A Practical Approach, IRL Press at Oxford University Press, Oxford, England (1989), and Gait, Oligonucleotide Synthesis: A Practical Approach, IRL Press at Oxford University Press, Oxford, England (1984), each of which is incorporated herein by reference.

Alternatively, one or a plurality of oligomers of identical structure may be attached to the functional groups on an identifier tag by means of a linker. A linker is generally a moiety, molecule, or group of molecules attached to a synthesis support or substrate and spacing a synthesized polymer or oligomer from a synthesis support or substrate.

Typically a linker will be bi-functional, wherein said linker has a functional group at one end capable of attaching to a monomer, oligomer, synthesis support or substrate, a series of spacer residues, and a functional group at another end capable of attaching to a monomer, oligomer, synthesis support or substrate.

The functional groups of a bifunctional linker need not be identical, thereby allowing the linker to act as a "functional group adapter." Thus, bifunctional linkers provide a means whereby the functional group displayed on a substrate or synthesis support, say an amino group, can be converted into a different functional group, say a hydroxyl group. The use of bifunctional linkers can therefore greatly increase the repertoire of chemistries that can take advantage of solid phase synthesis strategies.

The composition and length of the linker will depend in large part upon the application of the library. The degree of binding between an immobilized library member and its binding partner may in some embodiments depend on the accessibility of the immobilized library member to its binding partner. The accessibility in turn may depend on the length and/or chemical composition of the linker moiety.

The composition of the linker moiety will also depend on the desired properties of the linker, and in large part will be dictated by the physical conditions and/or biological or chemical reagents to which the linker will be exposed during synthesis. For example, one may desire a rigid linker, such as, for example, poly-proline or poly-allyl, or one may desire a flexible linker, such as, for example, polyalanine or saturated hydrocarbons.

It is desirable that the linker be stable to the reaction conditions used to construct the library. Linkers of suitable composition will be readily apparent to those skilled in the art, or may be later developed.

The number of spacer residues that comprise the linker may also vary. Typically, a linker will comprise about 1-100 spacer residues, preferably about 1-20 spacer residues, and usually about 5-15 spacer residues.

Spacer residues may be atoms capable of forming at least two covalent bonds, such as carbon, silicon, oxygen, nitrogen, sulfur, phosphorus, and the like. Spacer residues may also be molecules capable of forming at least two covalent bonds such as amino acids, nucleosides, nucleotides, sugars, aromatic rings, hydrocarbon rings, carbohydrates, branched or linear hydrocarbons, and the like. Thus, the interlinkages comprising the linker include, but are not limited to, amides, ethers, esters, ureas, phosphoesters, thioesters, thioethers, and the like. The interlinkages connecting spacer residues may be, but need not be, identical.

Linked together, the spacer residues may form linear, cyclic, branched, or other types of structures. Thus, a linker may provide a plurality of functional groups to which oligomers may be attached, thereby increasing the quantity of oligomer synthesized. Linked together the spacer residues may be rigid, semi-rigid or flexible. The spacer residues comprising the linker may be, but need not be, identical.

Such linker moieties may be capable of later releasing the synthesized molecules by some specific, regulatable mechanism. Such regulatable mechanisms include but are not restricted to thermal, photochemical, electrochemical, acid, base, oxidative and reductive reactions. Several linkers which provide a variety of functional group coupling and cleavage strategies are commercially available.

As will be readily apparent to the skilled artisan upon review of this disclosure, the labeled combinatorial synthesis methods and apparatus described herein can be used to optimize linker composition and length.

In another preferred embodiment, one or a plurality of oligomers of identical structure may be attached to one or a plurality of synthesis supports that are attached to an identifier tag. An oligomer may be covalently attached directly to a functional group on the synthesis support, or may be attached to a synthesis support by means of a linker as described above. In one preferred embodiment one or a plurality of synthesis supports are attached to an identifier tag by physical means such as glue or magnetic attraction. Virtually any physical means that is stable to the reaction conditions employed to synthesize the library may be used.

In another preferred embodiment, one or a plurality of synthesis supports is covalently attached to an identifier tag (optionally encased in a glass or polymeric coating). Such covalent attachment can be either directly to one or a plurality of functional groups on the identifier tag, or by means of a linker as described above.

In yet another preferred embodiment an identifier tag may be retained within a frame or housing, which frame or housing also retains one or a plurality of oligomers of identical structure or one or a plurality of synthesis supports having attached thereto one or a plurality of oligomers of identical structure. Such oligomer attachment may be either directly to a functional group on the synthesis support, or may be mediated by a linker as described above.

In still another preferred embodiment an identifier tag may be retained in a frame or housing, which frame or housing has attached thereto one or a plurality of oligomers of identical structure. Such oligomer attachment may be either directly to a functional group on the frame or housing, or may be mediated by a linker as described above.

VI. Encoding the Identifier Tag Information

A variety of types of information may be encoded on the identifier tags of the present invention. For example, the identifier tags may be pre-encoded with a bar code strip, moieties which differentially absorb or emit light, magnetic or optical information, or any other uniquely identifiable and detectable mark. Thus, the means employed for encoding the identifier tag information is dictated by the means employed for detecting the encoded identification signal.

In a preferred embodiment, an identifier tag comprises a microchip that is imprinted with a unique radiofrequency "fingerprint" that is transmitted back to a detector when the identifier tag is pulsed with electromagnetic radiation. In this embodiment, a microchip is exposed to a beam comprising one or a plurality of radiofrequency bands. The microchip draws power from the beam. The microchip has an electronic fingerprint. Thus, microchips of different fingerprints will emit different signals when pulsed with electromagnetic radiation. Accordingly, each of a plurality of microchips can be pre-encoded with a unique "fingerprint" or identification label. The skilled artisan will appreciate that the number of microchips that can be imprinted with a unique identification label is virtually without limit. Such microchips are well known in the art, and are described in, for example, U.S. Pat. No. 5,148,404, entitled "Transponder and Method for the Production Thereof," issued on Sep. 15, 1992, to Texas Instruments, Inc., and which is incorporated herein by reference.

In another preferred embodiment an identifier tag comprises a microchip that can be imprinted with digital or sequential information at each reaction cycle in a series of reaction cycles. The identifier tag can be either pre-encoded with partial or complete identifier information or blank prior to synthesis of a labeled synthetic oligomer library. Concomitant with, or at a point close in time to, each reaction cycle, a new multi-digit code identifying the transformation event that a particular encodable substrate experienced is downloaded to the identifier tag, such that a history of transformation events to which the identifier tag was subjected is recorded. Since the identifier information at each cycle is added sequentially to the microchip, each oligomer structure can be elucidated by detecting the sequential digital information contained in the identifier tag linked to that particular oligomer. Suitable digitally encodable substrates and encoding methods are well known or will be apparent to those skilled in the art, and are described in, for example, U.S. Pat. No. 5,351,052, entitled "Transponder Systems for Automatic Identification Purposes," issued on Sep. 27, 1994 to Texas Instruments, Inc. and U.S. Pat. No. 5,252,962, entitled "System Monitoring Programmable Implantable Transponder," issued on Oct. 12, 1993, to Bio Medic Data Systems, Inc., each of which is incorporated herein by reference.
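The sequential encoding described above can be pictured as an append-only log carried by each tag. The following is only a minimal sketch: the `IdentifierTag` class, the three-digit codes and the codebook are hypothetical illustrations and do not describe any actual transponder interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IdentifierTag:
    """Hypothetical in-memory stand-in for a writable microchip tag."""
    tag_id: str
    history: List[str] = field(default_factory=list)

    def record(self, event_code: str) -> None:
        # Append the code for the transformation performed in this cycle.
        self.history.append(event_code)

    def decode(self, codebook: Dict[str, str]) -> List[str]:
        # Translate the stored codes back into building-block names, in order.
        return [codebook[code] for code in self.history]


# Illustrative codebook: multi-digit code -> monomer added in that cycle.
CODEBOOK = {"001": "Leu", "002": "Phe", "003": "Gly", "004": "Tyr"}

tag = IdentifierTag("tag-A")
for code in ["001", "002", "003", "003", "004"]:
    tag.record(code)

sequence = tag.decode(CODEBOOK)
print(sequence)
```

Reading the log in order gives the order in which monomers were coupled, which is what allows the oligomer structure to be inferred from the tag alone.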

VII. Recovering and Decoding the Identifier Tag Information

When specific library members are isolated in a receptor screening experiment, the substrates can be segregated individually by a number of means, including: micromanipulation, magnetic attraction, sorting, or fluorescence activated cell sorting (FACS), although with respect to the present invention FACS is more accurately "fluorescence activated oligomer sorting." See Methods in Cell Biology, Vol. 33, Darzynkiewicz, Z. and Crissman, H. A. eds., Academic Press and Dangl and Herzenberg, J. Immunol. Methods 52: 1-14 (1982), both of which are incorporated herein by reference.

Once the desired substrates have been isolated, the identity of the identifier tag must be ascertained to obtain the structure of the oligomer on the substrate. The method of identification will depend on the type of identifier tag used to encode the library. For example, bar code identifier tags can be scanned with laser devices commonly employed in the art to read bar codes. Fluorescent identifier tags can be read by obtaining a fluorescence spectrum.

In preferred embodiments employing microchip identifier tags, the identifier tag information can be read using detection devices commonly employed in the art to scan and store identifier information from such tags. Typically such detectors can scan, display, transmit and store identifier information received from an identifier tag. Such detectors are well known in the art and are described, for example, in U.S. Pat. No. 5,262,772, entitled "Transponder Scanner" issued Nov. 16, 1993, to Bio Medic Data Systems, Inc., which is incorporated herein by reference. Such systems that read, display, transmit and store identifier information and that can be interfaced with a microcomputer are commercially available, including Bio Medic Data Systems models DAS-4004EM, DAS-40020A and DAS-4001.

In preferred embodiments, the detection system employed will be interfaced with a computer to automate identifier tag information storage. In even more preferred embodiments the detection equipment will be interfaced with a computer and automated sorting equipment.

VIII. Screening Receptors with Labeled Synthetic Oligomer Libraries

The labeled synthetic oligomer libraries of the present invention will have a wide variety of uses. By way of example and not limitation, labeled synthetic oligomer libraries can be used to: identify peptide, nucleic acid, carbohydrate and/or other structures that bind to proteins, enzymes, antibodies, receptors and the like; identify sequence-specific binding drugs; identify epitopes recognized by antibodies; evaluate a variety of drugs for clinical diagnostic applications; identify materials that exhibit specific properties, such as, for example, ceramics; identify elements comprising superconducting compositions; and combinations of the above and other uses that will be apparent to those skilled in the art.

Synthetic oligomers displayed on substrates can be screened for the ability to bind to a receptor. The receptor may be contacted with the library of synthetic oligomers, forming a bound member between a receptor and the oligomer capable of binding the receptor. The bound member may then be identified. As one example, the receptor may be an antibody.

The techniques for selection of individual substrates displaying ligands on their surface are analogous to FACS methods for cloning mammalian cells expressing cell surface antigens and receptors. Therefore, methods for selecting and sorting substrates will be readily apparent to those skilled in the art of cell sorting. For example, a receptor can be labelled with a fluorescent tag and then incubated with the mixture of substrates displaying the oligomers. After washing away un-bound and non-specifically bound receptors, one can then use FACS to sort the beads and to identify and isolate physically individual beads showing high fluorescence. Alternatively, if the physical size of the substrates permits, one can manually identify and sort the substrates showing high fluorescence.

Alternatively, the present invention can be used to generate libraries of soluble labeled oligomers, which can be used with a variety of screening methods. For instance, the substrates can be sorted and placed in individual compartments or wells, such as, for example, the wells of a 96-well microtitre plate. The oligomers are cleaved from the substrates and remain contained within the well along with the identifier tag. The library members may then be assayed in solution by a variety of techniques that will be readily apparent to those skilled in the art of immunology, one example of which is described below.

In one embodiment the bottom surface of each well is coated with the receptor. After addition of the binding buffer and a known ligand for that receptor that is fluorescently labelled, one effectively has a solution phase competitive assay for novel ligands of the receptor. The binding of the fluorescently labelled ligand to the receptor is estimated by confocal imaging of the monolayer of immobilized receptor. Wells showing decreased fluorescence on the receptor surface indicate that the released oligomer competes with the labelled ligand. The substrates in the wells showing competition are recovered, and the identifier tag decoded to reveal the sequence of the oligomer.

SYNTHESIS OF ONE-HUNDRED AMIDES

One hundred unique identifier tags containing Rink polymer are subdivided into ten groups of ten, and each group of ten is introduced into a separate 250 mL reaction vessel charged with 100 mL of methanol solvent. To each reaction vessel is then added 10 mL of a solution containing an aldehyde dissolved in methanol, a different aldehyde added to each reaction vessel.

The reaction is stirred for six hrs at room temperature, or until completion of the reaction. The reaction may be monitored using standard techniques for the monitoring of solid phase reactions. After completion of the reaction, the solvent and excess reagents are removed from each of the ten reaction vessels independently, and the polymer in each vessel washed three times with methanol and dichloromethane and allowed to dry using reduced pressure.

The unique identifier tags are then recorded by removing the contents of each reaction vessel and passing the unique identifier tag by a detector. The unique identifier tag for each oligomer is thus recorded and cross-referenced to the reaction vessel from which it was removed.

The unique identifier tags associated with the Rink polymer are then randomly recombined and again subdivided into ten groups of ten, and each group of ten is placed into a different reaction vessel (250 mL) containing 100 mL dichloromethane. Each of the ten reaction vessels is charged with base, and ten unique acid chlorides are introduced into the reaction vessels, one acid chloride per vessel.

The reactions are allowed to proceed to completion. The reactions may be monitored using standard methods. After completion, the solvent is removed by filtration and the oligomer in each reaction vessel is washed independently with three washes each of methanol and dichloromethane. Each of the ten identifier tags is then removed from each vessel and passed by a detector to record their unique identification numbers.

Thus, each identification number is associated with a specific reagent utilized in the first step of the synthesis (an aldehyde) and the second step of the synthesis (an acid chloride), providing a unique "reaction histogram" for each of the one hundred unique identifier tags.

After completion of the reactions, the identifier tags associated with each polymer are separately deblocked using trifluoroacetic acid and introduced into a microtiter plate such that each well of the microtiter plate contains only a single polymer. After removal of solvent and evaporation to dryness, each well contains a unique structure.

The decoding of said structure can be accomplished by comparing the individual identifier tag with the histogram for that particular tag. That is, the identifier tag will be associated with a specific structure for the aldehyde monomer input and a specific structure for the acid chloride monomer input. The structure of the polymer contained in each well is thus known unequivocally.
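The two-step split, react, record and recombine procedure above can be illustrated with a short simulation. This is a sketch under stated assumptions: the reagent names are placeholders, the helper `split_react_record` is hypothetical, and a seeded random shuffle stands in for physically recombining the tags.

```python
import random

ALDEHYDES = [f"aldehyde_{i}" for i in range(1, 11)]            # placeholder names
ACID_CHLORIDES = [f"acid_chloride_{i}" for i in range(1, 11)]  # placeholder names


def split_react_record(tags, reagents, history, rng):
    """Shuffle tags, split them into equal groups, and log the reagent each group sees."""
    shuffled = tags[:]
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(reagents)
    for i, reagent in enumerate(reagents):
        for tag in shuffled[i * group_size:(i + 1) * group_size]:
            history[tag].append(reagent)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tags = [f"tag_{n:03d}" for n in range(100)]
history = {tag: [] for tag in tags}

split_react_record(tags, ALDEHYDES, history, rng)       # step 1: one aldehyde per vessel
split_react_record(tags, ACID_CHLORIDES, history, rng)  # step 2: one acid chloride per vessel

# Decoding: look up any tag to recover both building blocks.
print(history["tag_042"])
print(len(set(map(tuple, history.values()))), "distinct aldehyde/acid chloride pairs")
```

In this sketch a purely random redistribution need not produce all one hundred possible pairs; the recorded history nonetheless identifies the product on every tag.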

SYNTHESIS ON ELAMS™ OF FOUR PENTAPEPTIDES

A. Derivatization of ELAMS™

Four (or multiples thereof) ELAMS™ (Bio Medic Data Systems), each having a unique identifier tag, are washed with refluxing aqueous HNO3 for 20 min. The ELAMS™ are pelleted and washed with distilled water (5×) and methanol (3×) and dried at 125° C. for about 12 hours. The ELAMS™ are then vortexed with a solution of 5% (v/v) aminopropyltriethoxysilane in acetone for ten hours, washed with acetone (2×), ethanol (5×) and methylene chloride (2×) and dried at 125° C. for 45 min.

The ELAMS™ are suspended in anhydrous DMF (1 mL) containing diisopropylethylamine (DIEA) (17 μL, 100 μmoles) and a solution of Fmoc-β-alanine pentafluorophenyl ester (200 mg, 420 μmoles, Peninsula Labs) in distilled water (1.5 mL) is added. After treatment with shaking for about 12 hours the ELAMS™ can be collected and washed with DMF (3×) and methylene chloride (2×). ELAMS™ are treated with a solution of 10% acetic anhydride in DMF containing 0.05 mol of 4-dimethylaminopyridine to cap uncoupled aminopropyl groups and then washed with DMF (2×) and methylene chloride (2×). ELAMS™ are then vortexed with a solution of 20% piperidine in DMF to release the Fmoc protecting group. The Fmoc-piperidine adduct can be quantitated by monitoring the absorbance spectrum of the supernatant at 302 nm (ε302 = 7800 M⁻¹ cm⁻¹) to estimate the degree of substitution of amino groups per quantity of ELAMS™. Finally, the ELAMS™ are washed with ethanol (5×) and methylene chloride (2×) and dried at 85° C. for about 12 hours.
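The absorbance-based estimate of amino group substitution follows from the Beer-Lambert law, A = ε·c·l. The sketch below assumes a 1 cm path length and uses an illustrative absorbance reading and supernatant volume; only the extinction coefficient (7800 M⁻¹ cm⁻¹ at 302 nm) comes from the text, and the function name is hypothetical.

```python
def fmoc_released_micromoles(absorbance, volume_ml, epsilon=7800.0,
                             path_cm=1.0, dilution=1.0):
    """Micromoles of Fmoc-piperidine adduct from A302 via Beer-Lambert (A = epsilon * c * l)."""
    conc_molar = absorbance * dilution / (epsilon * path_cm)  # mol/L in the undiluted supernatant
    return conc_molar * volume_ml * 1e-3 * 1e6                # mL -> L, then mol -> umol


# Illustrative reading: A302 = 0.39 in 1.0 mL of undiluted piperidine/DMF supernatant.
released = fmoc_released_micromoles(absorbance=0.39, volume_ml=1.0)
print(f"{released:.3f} umol Fmoc released")
```

Dividing the result by the mass or number of supports treated gives the degree of substitution per quantity of support.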

B. Preparation of Boc-Gly-L-Phe-L-Leu-OH

Glycyl-L-phenylalanyl-L-leucine (552 mg, 1.5 mmol, Bachem) is dissolved in a solution containing distilled water (10 mL) and 1M NaOH (1.5 mL). The solution is cooled in an ice bath and treated with a solution of di-tert-butyl pyrocarbonate (337 mg, 1.5 mmol) in p-dioxane (12 mL). The solution is stirred for 4 hours at room temperature, after which the solution is concentrated to dryness in vacuo, the residue taken up in water (5 mL) and the pH adjusted to 2.5 by the addition of 1M KHSO4. The aqueous suspension is extracted with ethyl acetate (2×, 15 mL), the organic layer separated and dried over magnesium sulfate. After removal of the solvent in vacuo the residue can be triturated with hexane to yield Boc-Gly-L-Phe-L-Leu-OH as a white solid.

C. Preparation of Gly-L-Phe-L-Leu ELAMS™

Boc-Gly-L-Phe-L-Leu-OH (44 mg, 0.1 mmol) and benzotriazol-1-yloxytris(dimethylamino)phosphonium hexafluorophosphate (14 mg, 0.104 mmol) are dissolved in dry DMF (1 mL). DIEA (20 μL, 0.115 mmol) is added and about 0.5-1.0 mL of this solution is transferred to a test tube containing amino derivatized ELAMS™. The tube is sealed, vortexed for about 3.5-4 hours and the ELAMS™ pelleted and washed with DMF (3×) and methylene chloride (2×). The ELAMS™ are then deprotected with a solution of 50% trifluoroacetic acid (TFA) in methylene chloride for 30 min., washed with methylene chloride (2×), ethanol (2×) and methylene chloride (2×), and dried at 55° C. for about 1 hour. The identifier tag from each ELAMS™ is detected and recorded.

D. Preparation of Gly-Gly-L-Phe-L-Leu (SEQ ID NO: 5) ELAMS™

Fmoc-glycine pentafluorophenyl ester (46 mg, 0.1 mmol) is dissolved in anhydrous DMF (1 mL) containing DIEA (17 μL, 0.1 mmol). About 0.5-1.0 mL of this solution is added to Gly-L-Phe-L-Leu ELAMS™ in a test tube and the tube vortexed for about 3 hours. The ELAMS™ are pelleted and washed with DMF (4×) and methylene chloride (2×). Deprotection can be effected by treatment with a solution of 20% piperidine in DMF for 30 min. The ELAMS™ are then washed with DMF (2×), ethanol (2×) and methylene chloride (2×) and dried at 60° C. for 4 hours. The identifier tag for each ELAMS™ is detected and recorded.

E. Preparation of L-Pro-Gly-L-Phe-L-Leu (SEQ ID NO: 6) ELAMS™

Fmoc-L-proline pentafluorophenyl ester (50 mg, 0.1 mmol) is dissolved in anhydrous DMF (1 mL) containing DIEA (17 μL, 0.1 mmol). About 0.5-1.0 mL of this solution is added to Gly-L-Phe-L-Leu ELAMS™ in a test tube and the tube vortexed for about 3 hours. The ELAMS™ are pelleted and washed with DMF (4×) and methylene chloride (2×). Deprotection can be effected by treatment with a solution of 20% piperidine in DMF for 30 min. The ELAMS™ are then washed with DMF (2×), ethanol (2×) and methylene chloride (2×) and dried at 60° C. for 4 hours. The identifier tag for each ELAMS™ is detected and recorded.

F. Preparation of Tyr-Gly-Gly-L-Phe-L-Leu (SEQ ID NO: 1) and Tyr-Pro-Gly-L-Phe-L-Leu (SEQ ID NO: 2) ELAMS™

Fmoc-O-t-butyl-L-tyrosine pentafluorophenyl ester (63 mg, 0.1 mmol) is dissolved in anhydrous DMF (1 mL) containing DIEA (17 μL, 0.1 mmol). About 0.5-1.0 mL of this solution is added to Gly-Gly-L-Phe-L-Leu (SEQ ID NO: 5) and Pro-Gly-L-Phe-L-Leu (SEQ ID NO: 6) ELAMS™ in a test tube and the tube vortexed for about 3 hours. The ELAMS™ are pelleted and washed with DMF (4×) and methylene chloride (2×). Deprotection can be effected by treatment with a solution of 20% piperidine in DMF for 30 min, followed by treatment with a solution of 50% TFA in methylene chloride for 30 min. The ELAMS™ are then washed with DMF (2×), ethanol (2×) and methylene chloride (2×) and dried at 60° C. for 4 hours. The identifier tag for each ELAMS™ is detected and recorded.

G. Preparation of Pro-L-Pro-Gly-L-Phe-L-Leu (SEQ ID NO: 3) and Pro-Gly-Gly-L-Phe-L-Leu (SEQ ID NO: 4) ELAMS™

Fmoc-L-proline pentafluorophenyl ester (50 mg, 0.1 mmol) is dissolved in anhydrous DMF (1 mL) containing DIEA (17 μL, 0.1 mmol). About 0.5-1.0 mL of this solution is added to Gly-Gly-L-Phe-L-Leu (SEQ ID NO: 5) and Pro-Gly-L-Phe-L-Leu (SEQ ID NO: 6) ELAMS™ in a test tube and the tube vortexed for about 3 hours. The ELAMS™ are pelleted and washed with DMF (4×) and methylene chloride (2×). Deprotection can be effected by treatment with a solution of 20% piperidine in DMF for 30 min. The ELAMS™ are then washed with DMF (2×), ethanol (2×) and methylene chloride (2×) and dried at 60° C. for 4 hours. The identifier tag for each ELAMS™ is detected and recorded.

H. Selection of ELAMS™ Containing Peptide Ligands for Monoclonal Antibody 3E7

Monoclonal antibody 3E7 can be raised against opioid peptide beta-endorphin. The binding specificity of MAb 3E7 has been well characterized by solution assays with chemically synthesized peptides. The equilibrium binding constants (Kd) of the peptides considered here are as follows: YGGFL (SEQ ID NO: 1) is 6.6 nM, and YPGFL (SEQ ID NO: 2), PPGFL (SEQ ID NO: 3), and PGGFL (SEQ ID NO: 4) are each >1 mM. Thus, only peptide YGGFL (SEQ ID NO: 1) shows appreciable affinity for the antibody.
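To see how strongly these Kd values separate the four peptides, one can estimate equilibrium occupancy with a simple one-site binding model, θ = [R] / ([R] + Kd). This is only a rough sketch: the free antibody concentration of 10 nM is an assumed value, the ">1 mM" constants are entered as exactly 1 mM (so the computed occupancies are upper bounds), and binding to surface-displayed peptides by a bivalent antibody may deviate from this idealized solution model.

```python
def fraction_bound(free_receptor_nM, kd_nM):
    """One-site equilibrium occupancy: theta = [R] / ([R] + Kd)."""
    return free_receptor_nM / (free_receptor_nM + kd_nM)


# Kd values from the text, in nM; ">1 mM" is entered as 1e6 nM (a lower bound on Kd).
KD_NM = {"YGGFL": 6.6, "YPGFL": 1e6, "PPGFL": 1e6, "PGGFL": 1e6}

ANTIBODY_NM = 10.0  # assumed free antibody concentration, for illustration only

occupancy = {pep: fraction_bound(ANTIBODY_NM, kd) for pep, kd in KD_NM.items()}
for pep, theta in occupancy.items():
    print(f"{pep}: {theta:.2e}")
```

Under these assumptions YGGFL sites are mostly occupied while the other three peptides remain essentially unbound, which is the basis for the selection described next.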

A mixture of ELAMS™ containing either YGGFL (SEQ ID NO: 1), YPGFL (SEQ ID NO: 2), PGGFL (SEQ ID NO: 4), or PPGFL (SEQ ID NO: 3) is added to phosphate buffered saline (PBS) containing monoclonal antibody 3E7 that has been previously conjugated to colloidal superparamagnetic microbeads (Miltenyi Biotec, West Germany). After a 16 hour incubation at 4° C., beads which bind the 3E7 antibody can be selected using a high strength magnet. The identifier information of the selected beads is then analyzed with a model DAS-4001EM or DAS-4001 detector (Bio Medic Data Systems). Analysis will reveal that only ELAMS™ upon which YGGFL (SEQ ID NO: 1) was synthesized are selected by the 3E7 antibody.

Alternatively, the ELAMS™ can be incubated with 3E7 antibody that has been previously conjugated with a fluorophore such as fluorescein or rhodamine, and peptide-antibody binding detected with a fluorimeter or epifluorescence microscope using the appropriate wavelength of light.

PARALLEL SYNTHESIS OF PEPTIDES ON ELAMS™

A. Derivatizing Amino ELAMS™ with a Linker

ELAMS™ containing amino groups and each having a unique identifier tag are prepared as described in Example II.A., above. The ELAMS™ are treated with a mixture of 4-Fmoc-aminobutyric acid N-hydroxysuccinimide ester (1 mmol), HBTU (1 mmol), HOBt (1 mmol) and DIEA (1 mmol) in 9:1 methylene chloride:DMF (10 mL). After vortex treatment for 30 minutes, the reaction mixture is diluted with DMF (10 mL), the ELAMS™ pelleted, and the supernatant decanted. The ELAMS™ are washed with DMF (3×10 mL). The coupling procedure may be repeated with fresh reagents and the ELAMS™ pelleted and washed as described above.

B. Parallel Synthesis of Peptides

The parallel assembly of linear oligomers is shown schematically in FIGS. 1 and 2. The general method for parallel assembly of polypeptides can be illustrated by way of specific example. Twenty linker-derivatized ELAMS™, each having a unique identifier tag (Example III.A.) are placed in a reaction vessel and the sequence GGFL (SEQ ID NO: 5) synthesized on each of the twenty ELAMS™ using standard Fmoc peptide synthesis reagents and chemistry as described in Atherton & Sheppard, Solid Phase Peptide Synthesis: A Practical Approach, IRL Press, Oxford, England (1989).

Following removal of the Fmoc groups by treatment with 30% piperidine in DMF for 60 min., the ELAMS™ are then apportioned into twenty reaction vessels, one ELAM™ per vessel. Each vessel is then charged with a solution containing an amino acid monomer (0.1M), HBTU (0.1M), HOBt (0.1M) and DIEA (0.1M) in 9:1 methylene chloride:DMF for 30 min., a different amino acid monomer per vessel. The coupling may be repeated with fresh reagents for a further 30 min. The ELAMS™ are then washed with DMF (3×) and then with acetonitrile (3×). The identifier tag information is detected and recorded for each ELAMS™ in each reaction vessel, along with the identity of the monomer added. Side chain protecting groups are removed using standard deprotection chemistry (see Atherton & Sheppard, Solid Phase Peptide Synthesis: A Practical Approach, IRL Press, Oxford, England (1989)), and the library is assayed as described in Example II.F.

For libraries with larger diversity, successive rounds of coupling and identifier tag scanning and recording can be performed.

PARALLEL SYNTHESIS OF OLIGONUCLEOTIDE OCTAMERS

A. Preparation of Hydroxyl ELAMS™

Sixteen ELAMS™, or multiples thereof, each having a unique identifier tag, are cleaned in concentrated NaOH, followed by exhaustive rinsing in water. The ELAMS™ are derivatized for 2 hr with a solution of 10% (v/v) bis(2-hydroxyethyl)aminopropyltriethoxysilane (Petrarch Chemicals, Bristol, Pa.) in 95% ethanol, rinsed thoroughly with ethanol (2×) and ether (2×), dried in vacuo at 40° C., and heated at 100° C. for 15 min.

B. Preparation of Synthesis Linker

A synthesis linker, 4,4-dimethoxytrityl-hexaethyloxy-β-cyanoethyl phosphoramidite, can be prepared using 1,6-dihydroxyhexane as starting material according to the method of Beaucage and Caruthers, Atkinson and Smith, "Solid Phase Synthesis of Oligodeoxyribonucleotides by the Phosphite-Triester Method," in Gait, Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford, England (1984) using 2-cyanoethyl N,N-diisopropylchlorophosphoramidite (Sigma, St. Louis, Mo.) as the phosphitylating reagent.

C. Attachment of Synthesis Linker

Synthesis linkers can be attached to ELAMS™ by reacting hydroxylated ELAMS™ (described in Example IV.A., above) with 4,4-dimethoxytrityl-hexaethyloxy-β-cyanoethyl phosphoramidite using standard phosphoramidite chemistry as described in Gait, Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford, England (1984). Typical reaction conditions are 0.1M phosphoramidite, 0.25M tetrazole in anhydrous acetonitrile for 1-3 min. The ELAMS™ are rinsed with acetonitrile (3×).

Following coupling, any unreacted hydroxyl groups can be capped if desired. ELAMS™ are added to fresh capping solution which is prepared as follows: 3 volumes of a solution of 6.5% (w/v) 4-dimethylaminopyridine (DMAP) in anhydrous tetrahydrofuran (THF) are mixed with 1 volume of a solution of 40% (v/v) acetic anhydride in 2,6-lutidine. The reaction is allowed to proceed for 1-3 min., after which the ELAMS™ are rinsed with methylene chloride (2×) and acetonitrile (3×).

After capping, the phosphite triester bond is oxidized to a phosphotriester by treating the ELAMS™ with 0.1M iodine solution prepared by dissolving 2.6 g iodine in a mixture containing 80 mL THF, 20 mL 2,6-lutidine and 2 mL water for about 1 min. The ELAMS™ are rinsed with acetonitrile until the effluent is colorless.

The dimethoxytrityl groups protecting the hydroxyls can be removed by treatment with 2% (v/v) dichloroacetic acid (DCA) in methylene chloride for about 1 min. followed by rinsing (3×) with methylene chloride. The number of hydroxyl groups per ELAMS™ (i.e. the loading capacity) can be determined by taking the absorbance of the dimethoxytrityl cation effluent at 498 nm (ε498 = 14,300 M⁻¹ cm⁻¹).

D. Preparation of Fluoresceinylated Probe

A target probe of sequence 5'-GCGCGGGC-fluorescein can be prepared using 3'-Amine-ON™ controlled pore glass (CPG) (Clontech, Palo Alto, Calif.) and standard DNA synthesis reagents (Applied Biosystems, Foster City, Calif.). The 3'-amine can be labeled with fluorescein isothiocyanate to generate a 3'-fluorescein labeled oligomer according to the manufacturer's instructions supplied with 3'-Amine-ON™ CPG.

E. Parallel Synthesis of Octanucleotides

Target oligomer sequences, represented by the matrix 3'-CGC(A+T+C+G)₂CCG, can be prepared by synthesizing on each of sixteen linker-derivatized ELAMS™, each having a unique identifier tag (described in Example IV.C.), the polynucleotide sequence 3'-CGC using standard base-labile DNA synthesis reagents and chemistry (Applied Biosystems, Foster City, Calif.). Following the coupling cycle, the ELAMS™ are distributed into four reaction vessels, four ELAMS™ per vessel. A single nucleotide monomer (as the protected phosphoramidite) is coupled to the ELAMS™ in each reaction vessel using standard base-labile DNA synthesis reagents and chemistry, a different nucleotide monomer per vessel, and the capping, oxidation, and DMT removal steps completed.

The identifier tag from each ELAMS™ in each reaction vessel is detected and recorded using, for example, a model DAS-4001EM or model DAS-4001 scanner (Bio Medic Data Systems, Maywood, N.J.), along with the identity of each monomer added in each vessel. The ELAMS™ are then distributed by placing one ELAMS™ from each current reaction vessel into each of four new reaction vessels, and a second nucleotide monomer added as described above, a different monomer per vessel. The identifier information is detected and recorded for each ELAMS™ in each reaction vessel, along with the identity of the monomer added in each vessel.

The ELAMS™ are then pooled into a single reaction vessel and the sequence 3'-CCG added to each ELAMS™ using standard DNA synthesis reagents and chemistry. The exocyclic amine protecting groups are removed by treatment with conc. ammonia according to the manufacturer's instruction for base-labile nucleotide phosphoramidites.
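The redistribution scheme described above (four vessels of four, then one tag from each vessel into each of four new vessels) yields every dinucleotide combination exactly once. The following sketch checks this bookkeeping; the tag names and the fixed base order per vessel are illustrative assumptions, and sequences are written 3'→5' to match the order of synthesis.

```python
BASES = ["A", "T", "C", "G"]

# Sixteen tags, each starting from the common 3'-CGC stem (written 3'->5').
tags = [f"tag_{n:02d}" for n in range(16)]
sequences = {tag: "CGC" for tag in tags}

# Round 1: four vessels, four tags each, a different base coupled in each vessel.
round1 = [tags[i * 4:(i + 1) * 4] for i in range(4)]
for base, vessel in zip(BASES, round1):
    for tag in vessel:
        sequences[tag] += base

# Round 2: new vessel j receives the j-th tag from every round-1 vessel.
round2 = [[vessel[j] for vessel in round1] for j in range(4)]
for base, vessel in zip(BASES, round2):
    for tag in vessel:
        sequences[tag] += base

# Pool all tags and append the common CCG block.
for tag in tags:
    sequences[tag] += "CCG"

for tag in tags:
    print(tag, "3'-" + sequences[tag])
```

Because each round-2 vessel holds exactly one tag from each round-1 vessel, the sixteen tags end up carrying sixteen distinct octamers, and the recorded vessel assignments are sufficient to name each one.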

SEQUENCE SPECIFIC TARGET HYBRIDIZATION

The deprotected ELAMS™ are incubated with the fluoresceinylated probe under conditions conducive to sequence specific hybridization as described in Hames and Higgins, Nucleic Acid Hybridization: A Practical Approach, IRL Press, Oxford, England (1985). Following rinse cycles the ELAMS™ can be interrogated for hybridization using a fluorimeter or epifluorescence microscope (488-nm argon ion excitation). The ELAMS™ displaying the highest photon counts are isolated and the identifier tag scanned and compared to the reaction histogram for that particular identifier tag, revealing that the sequence 3'-CGCGCCCG was synthesized on the ELAM™ displaying the highest photon count.
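The expected outcome can be checked by base-pairing rules alone: the probe 5'-GCGCGGGC pairs antiparallel with a target that, read 3'→5', is its base-by-base complement. The sketch below enumerates the sixteen library members and picks out the perfect match; it ignores mismatch hybridization and thermodynamics, which in practice determine how cleanly the signals separate.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

probe_5to3 = "GCGCGGGC"

# Antiparallel pairing: base i of the probe (5'->3') pairs with base i of the
# target when the target is written 3'->5'.
target_3to5 = "".join(COMPLEMENT[base] for base in probe_5to3)

# All sixteen library members, written 3'->5': CGC + two variable bases + CCG.
library = ["CGC" + a + b + "CCG" for a, b in product("ATCG", repeat=2)]

matches = [seq for seq in library if seq == target_3to5]
print("perfect complement: 3'-" + target_3to5)
print("library members matching:", matches)
```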

While the invention of this patent application is disclosed by reference to specific examples, it is understood that the present invention can be applied to all chemistries that are amenable to combinatorial strategies and to all identifier tags that relate information to a detector when pulsed with electromagnetic radiation. Further, the present invention is intended to be applicable to all future developed solid phase and multi-component combinatorial array syntheses, and to all future identifier tags that relate information to a detector when pulsed with electromagnetic information.


MATERIALS AND METHODS

Monoalkyne solid support and bromophosphorodiamidites for click oligomers

Preparation of the monoalkyne solid support and bromophosphorodiamidites was carried out according to Lietard et al. (37). The only difference was that one more step was added to obtain the modified support in which a succinic group was incorporated in 1-Propargyl-2-[(4,4′-dimethoxytrityl)oxymethyl]-2-methylpropane-1,3-diol and then the LCAA-CPG support was coupled.

DNA/RNA synthesis and purification

Oligonucleotides were chemically synthesized in an Applied Biosystems DNA/RNA synthesizer, using the cyanoethyl phosphoramidite chemistry or purchased from Future Synthesis. All nucleotides and hexaethylene glycol phosphoramidites with 2′-O-tertbutyldimethylsilyl were purchased from Glen Research, Azco, Proligo. The stilbene diether linker was synthesized using a custom synthesis service. Before the deprotection step, the click oligomers were treated with LiN3 (300 eq) in dimethylformamide (DMF) for 2 h at 65°C. RNA oligomers were cleaved from the solid support using ammonium hydroxide/ethanol (3:1 v/v) at 55°C for 16 h. Deprotection was carried out with triethylamine trihydrofluoride at 55°C for 3 h followed by n-butanol precipitation for 1 h at 4°C. Oligomers were desalted on illustra NAP-25 columns (GE Healthcare) and purified by gel electrophoresis in denaturing conditions. The DNA oligomer was cleaved and deprotected with ammonium hydroxide at 55°C for 16 h, desalted and purified by denaturing gel electrophoresis.

T4 RNA ligase 1 expression and purification

The plasmid containing the T4 RNA ligase gene was a generous gift from Prof. Peter J. Unrau from Simon Fraser University in Canada (38). The T4 RNA ligase open reading frame was re-cloned into pMCSG9 vector (Midwest Center for Structural Genomics) containing an N-terminal His6 tag. The construct with pMCSG9 T4 RNA ligase gene was obtained by ligase-independent cloning (39). The identity of the clone was confirmed by sequencing. The enzyme was produced in Escherichia coli BL21 Magic cells in Luria-Bertani medium (Midwest Center for Structural Genomics). The expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) and carried out for 16 h at 19°C. The cells were harvested by centrifugation (4000 g, 15 min, 4°C) and resuspended in the lysis buffer (50 mM Tris–HCl pH 8.0, 0.5 M NaCl, 20 mM imidazole, 10 mM Na4P2O7, 5% glycerol, 1 mM tris(2-carboxyethyl)phosphine (TCEP) and a protease inhibitor cocktail). The cells were lysed by sonication: 2 s pulse followed by 10 s pause (25 min at 4°C). Cell debris was removed by centrifugation (15 000 g, 30 min, 4°C). The supernatant was mixed with 6 ml of Ni Sepharose High Performance (GE Healthcare) equilibrated with 50 mM Tris–HCl pH 8.0, 0.5 M NaCl, 10 mM Na4P2O7 and 5% glycerol. Sepharose beads were washed with 50 ml of buffer I (50 mM Tris–HCl pH 8.0, 0.5 M NaCl, 40 mM imidazole, 10 mM Na4P2O7, 5% glycerol, 1 mM TCEP) and 200 ml of buffer II (50 mM Tris–HCl pH 8.0, 0.5 M NaCl, 40 mM imidazole, 5% glycerol, 1 mM TCEP). The protein was eluted with 2 × 10 ml of buffer III (50 mM Tris–HCl pH 8.0, 0.5 M NaCl, 300 mM imidazole, 5% glycerol, 1 mM TCEP) and dialyzed into 50 mM Tris–HCl pH 8.0, 0.5 M NaCl, 20% glycerol and 1 mM TCEP and next into the storage buffer (20 mM HEPES–KOH pH 7.5, 50 mM KCl, 1 mM dithiothreitol (DTT) and 50% glycerol). The purity of the recombinant protein was assessed by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE).

Circularization of RNA oligomers using T4 RNA ligase 1

RNA oligomers (20–50 μM) were denatured at 95°C for 2 min in 50 mM KCl and slowly cooled to 25 or 37°C. The ligation reaction was carried out using 0.25–0.40 mg/ml T4 RNA ligase 1 in buffer (50 mM Tris pH 7.5, 10 mM MgCl2 and 10 mM DTT) with 0.5 mM ATP at 37°C for 4 h. The solution was desalted using Bio-Gel polyacrylamide P-6 columns (Bio-Rad) and resolved on 6–10% denaturing gels. Gels were stained using toluidine blue or SYBR Green II. For the UV melting measurements, circularized oligomers were eluted by the crush-and-soak method, desalted and concentrated using centrifugal filters.

RNA dephosphorylation and 32P-labeling

The dephosphorylation of linear or circular RNA was performed using alkaline phosphatase (Thermo) at 37°C for 10 min in a buffer containing 100 mM Tris–HCl pH 8.0, 50 mM MgCl2, 1 M KCl, 0.2% Triton X-100 and 1 mg/ml bovine serum albumin. The enzyme was deactivated by heating at 75°C for 10 min. The RNA was either ethanol precipitated or loaded directly on the gel.

RNA oligomers were 5΄-end labeled with [γ-32P]ATP (Hartmann-Analytic) using T4 Polynucleotide Kinase (Thermo). Prior to the reaction, the RNA was denatured for 5 min at 90°C and incubated on ice for 10 min. The reaction was carried out in a buffer containing 50 mM Tris–HCl pH 7.6, 10 mM MgCl2, 5 mM DTT, 0.1 mM spermidine supplemented with 1 μl of [γ-32P]ATP (0.02 mCi) and 10 U of enzyme. The reaction was incubated for 30 min at 37°C. Labeled RNAs were purified by denaturing gel electrophoresis and quantified using a scintillation counter.

Alkaline RNA hydrolysis

One volume of 5΄-32P-labeled RNA was mixed with five volumes of 10 mM MgCl2 in formamide and incubated at 100°C for 10 min. The reaction was stopped by adding one volume of loading buffer (formamide, 0.02% xylene cyanol) and freezing on dry ice. The reaction products were separated on denaturing gels and visualized using autoradiography with an intensifying screen on Fluor Imager FLA-5100 (FujiFilm).

UV melting of oligonucleotides

UV thermal melting studies were performed according to Pasternak and Wengel (2011) (40) on a DU-640 spectrometer with a thermoprogrammer (Beckman). RNA oligomers were dissolved in a buffer containing 10 mM sodium chloride, 20 mM sodium cacodylate and 0.5 mM disodium ethylenediaminetetraacetate (EDTA), pH 7.0 or 5.2. Each oligomer was prepared in nine different concentrations in the range 10⁻⁵–10⁻⁶ M. Concentrations of single-stranded oligomers were calculated from the high temperature (>80°C) absorbance and single strand extinction coefficients approximated by the nearest-neighbor model. The UV absorption versus temperature was measured at 260 nm at the heating rate of 1°C/min in the range of 20–90°C. The melting curves were analyzed and the thermodynamic parameters calculated using MeltWin 3.5.
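The concentration step described above follows the Beer-Lambert relation, c = A / (ε · l). The snippet below is a minimal sketch of that arithmetic only; the function name, the absorbance reading, the extinction coefficient and the 1 cm path length are illustrative assumptions, not values from the protocol.

```python
def strand_concentration(absorbance_260, extinction_coeff, path_length_cm=1.0):
    """Molar strand concentration from A260 via Beer-Lambert: c = A / (epsilon * l).

    extinction_coeff is in M^-1 cm^-1 (for example, a nearest-neighbor estimate).
    """
    if extinction_coeff <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance_260 / (extinction_coeff * path_length_cm)


# Hypothetical example values (assumptions, not taken from the text):
# A260 = 0.50 read above 80 °C, epsilon = 250 000 M^-1 cm^-1, 1 cm cuvette.
conc = strand_concentration(0.50, 250_000)
print(f"{conc:.2e} M")
```

With these assumed numbers the result is 2 μM, which falls inside the 10⁻⁵–10⁻⁶ M range quoted for the melting series.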

Click reaction

Prior to the reaction, 1 nmol of DNA or RNA oligomer was denatured in water for 2 min at 95°C and placed on ice for 10 min. The reaction was carried out in 200 mM NaCl, 500 μM copper(II)-tris(benzyltriazolylmethyl)amine (Cu(II)-TBTA) complex and freshly prepared 5 mM ascorbic acid. The click reaction was incubated at 37°C for 4 h. The reaction was desalted using Bio-Gel polyacrylamide P-6 columns and analyzed on 10–15% denaturing polyacrylamide gels.

HpaII digestion

Linear or circular DNA oligomer (10 μg) was digested in 1× Tango buffer with 40 U of HpaII enzyme at 37°C overnight. The enzyme was deactivated by heating at 65°C for 20 min. The reaction mixture was desalted using Bio-Gel polyacrylamide P-6 columns or precipitated using lithium chloride and five volumes of EtOH/acetone (1:1 v/v) mix. The reaction products were separated on a 20% denaturing polyacrylamide gel and visualized by UV shadowing.


Methods

Genetic constructs and mutagenesis

pRSF-oRibo-Q1-oGST-CaM1TAG 19, containing an orthogonal ribosome under an IPTG-inducible promoter and the protein of interest under a constitutively active promoter, and pKW1 21, containing the orthogonal aaRS and tRNA for amber suppression, were a kind gift from the Chin Lab (MRC LMB, Cambridge, UK). The PR65 template was available as a thrombin-cleavable GST-PR65-H6 fusion protein in a pRSETa backbone. All primers are listed in Table S2 (Supplementary Information). The correct length and sequence of all constructs were verified by Sanger sequencing and restriction digests.

For amber suppression, constructs were created by first introducing the TAG codon at the positions of D5/L588 and E277 into the GST-PR65-H6 fusion protein using Round-the-Horn site-directed mutagenesis (RTH-SDM) 24. Different end-to-end attachment sites were trialled initially, but D5/L588 was the first clone to work and hence was carried forward. 100 μM primers were phosphorylated using polynucleotide kinase (ThermoFisher) and 2–3 mM ATP according to the manufacturer's protocol. The enzyme was heat-inactivated at 85 °C for 10–15 min and phosphorylated primers were stored at −20 °C until used. PCRs were performed using these primers and Phusion High-Fidelity DNA polymerase (NEB). PCR products were gel-purified and 50–100 μg of DNA material was added to 1 μl Anza™ T4 DNA Ligase Master Mix (ThermoFisher) in a total volume of 4 μl, incubated for 10–20 min at room temperature and transformed into in-house produced, chemically competent DH5α E. coli cells.

The resulting constructs were then transferred into pRSF-oRibo-Q1-oGST by using the GST-internal SwaI and post-H6 SpeI restriction sites and In-Fusion Cloning (Takara Bio). pRSF-oRibo-Q1-oGST was digested using SwaI and SpeI, while the PR65 insert was obtained by PCR with primers that had a 15 bp overlap with these restriction sites and the vector backbone. Both vector and insert were gel purified and 1 μl of each was mixed with 0.5 μl 5X In-Fusion HD Enzyme Premix on ice. The reaction was incubated for 15 min at 50 °C in a pre-heated thermal cycler and placed back on ice immediately afterwards. 2–4 μl of the reaction was transformed into high-efficiency DH5α E. coli (NEB).

N- and C-terminal ybbR-tags (DSLEFIASKLA) were introduced between the thrombin cleavage site and M1, and between A589 and the stop codon, using RTH-SDM with primers bearing the tag sequence in the 5′ overhang.

Protein expression and purification

PR65 WT and the ybbR-tagged version (yPR65y) were transformed into chemically competent C41 E. coli (Komander laboratory, MRC-LMB, Cambridge). Suspension cultures were grown at 37 °C in 2xYT media containing 50 μg/ml Ampicillin, shaking at 200 rpm until an OD600 of 0.6 to 0.8 was reached. Protein expression was induced with 250 μM isopropyl β-D-1-thiogalactopyranoside (IPTG, Generon) at 25 °C overnight. Cells were harvested by centrifugation at 4000 × g for 10 min at 4 °C, before re-suspending in lysis buffer (50 mM Tris-HCl pH 7.5, 500 mM NaCl, 2 mM DTT) supplemented with EDTA-free protease inhibitor cocktail (Sigma Aldrich), and DNase I (Sigma Aldrich). The cells were lysed by passing the suspension through an Emulsiflex-C5 (AVESTIN) at pressures between 10000 and 15000 psi. Soluble protein was separated from cell debris and other insoluble fractions by centrifugation at 35000 × g for 35 min at 4 °C. The soluble protein fraction was applied to glutathione resin (Amintra Affinity Resins, Expedeon) equilibrated in lysis buffer. Resin was incubated at 4 °C for 1–2 hours with rotation and cleaned using wash buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 2 mM DTT), followed by on-matrix cleavage of the protein using 50 units of bovine thrombin (Sigma) per litre of culture at 4 °C overnight. Cleaved protein was removed from the resin using wash buffer. All fractions containing protein were pooled, diluted such that the NaCl concentration was <50 mM and applied to a Mono Q 10/100 GL (GE Healthcare) equilibrated in 100 mM Tris-HCl pH 8.0, 2 mM EDTA, 0.5 g/l EGTA, 2 mM DTT. After washing the column, the protein was eluted using a 20 column volume salt gradient from 0 to 1 M NaCl. If necessary, MonoQ fractions containing the protein were concentrated before application to a HiLoad 26/600 Superdex 200 pg (GE Healthcare) equilibrated in PBS pH 7.4, 2 mM DTT. Fractions containing pure protein were pooled and concentrated using a Vivaspin® centrifugal concentrator.
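The dilution before anion exchange can be checked with a short calculation. The sketch below assumes the pooled fractions are in the 150 mM NaCl wash buffer and that the diluent contains no NaCl; the function names, the 10 ml pool volume and the 4-fold factor are hypothetical choices for illustration.

```python
def min_dilution_factor(start_mM, target_mM):
    """Smallest fold-dilution that brings start_mM down to target_mM (salt-free diluent assumed)."""
    if start_mM <= 0 or target_mM <= 0:
        raise ValueError("concentrations must be positive")
    return start_mM / target_mM


def diluent_volume_ml(sample_ml, fold):
    """Volume of salt-free diluent to add to sample_ml for a given fold-dilution."""
    if fold < 1:
        raise ValueError("fold-dilution must be at least 1")
    return sample_ml * (fold - 1)


def diluted_conc_mM(start_mM, fold):
    """Concentration after a fold-dilution with salt-free diluent."""
    return start_mM / fold


# Wash buffer is 150 mM NaCl; the protocol asks for < 50 mM before Mono Q.
threshold = min_dilution_factor(150, 50)   # the dilution must exceed this factor
# Hypothetical: 10 ml of pooled fractions diluted 4-fold.
print(threshold, diluent_volume_ml(10, 4), diluted_conc_mM(150, 4))
```

Under these assumptions any dilution greater than 3-fold satisfies the <50 mM requirement; a 4-fold dilution leaves 37.5 mM NaCl.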

Expression and purification of amber suppression constructs were adapted from a protocol described by Sachdeva et al. 20. GST-fusion proteins of PR655/588TAG and yPR65277TAG were expressed in electro-competent MDS42 ΔrecA E. coli cells, grown at 37 °C in 2xYT containing 25 μg/ml Kanamycin and 37.5 μg/ml Spectinomycin. Expression was induced at 37 °C for 5 hrs when OD600 = 0.5–0.6, using 1 mM IPTG and 2 mM of either N-ε-(Prop-2-ynyloxycarbonyl)-L-lysine (Iris Biotech GmbH), N-ε-((2-Azidoethoxy)carbonyl)-L-lysine (Iris Biotech GmbH) or N-ε-[[(2-methyl-2-cyclopropene-1-yl) methoxy] carbonyl]-L-lysine (Sirius Fine Chemicals), which were dissolved in 0.2 M NaOH, diluted 1:3 using 1 M HEPES pH 7.4 and adjusted to the pH of the cell culture. Proteins containing azides and cyclopropene derivatives were purified in buffers without reducing agent. All proteins were first purified by glutathione pull-down and thrombin cleavage at 4 °C as described above. The cleavage product was applied to 1 ml of Ni-NTA resin per litre of culture (Amintra Affinity Resins, Expedeon) or a 1 ml HisTrap Excel column (GE Healthcare) equilibrated in wash buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, optionally with 2 mM DTT). The resin was incubated at 4 °C for 1–2 hours with rotation and cleaned with wash buffer, whereas column-bound protein was washed with 20 column volumes of wash buffer. Bound protein was recovered using elution buffer (100 mM Tris-HCl pH 8.0, 2 mM EDTA, 300 mM imidazole, 2 mM DTT). After analysis of the elution fractions obtained from either method by SDS-PAGE, the elutions were concentrated and buffer exchanged into PBS pH 7.4 with 1 mM DTT (alkyne) or without DTT (azide, cyclopropene) using Zeba Spin Desalting Columns (ThermoFisher Scientific).

For the expression of Sfp-synthase please refer to the Supplementary Information. All proteins were flash-frozen in liquid N2 and stored at −80 °C.

Chemically modified DNA oligomers

DNA oligos for DNA-protein conjugations (Table S2) bearing 3′-end modifications with co-enzyme A or azide were acquired from Biomers and Integrated DNA Technologies, respectively. The modified oligos were resuspended in MilliQ H2O, aliquoted and stored at −20 °C (azide, amine) or −80 °C (CoA).

The protocol for chemical modification of oligos with DBCO and tetrazine functionalities was adapted from Nojima et al. 9. DBCO-PEG4-NHS-ester (Sigma) and 6-methyl-tetrazine-PEG5-NHS-ester (Jena Bioscience) were conjugated to 3′-amine modified DNA oligos (Integrated DNA Technologies) in 50 μl Bicine-KOH pH 8.0 containing 100 μM amine and 5 mM NHS-ester. Because DBCO is hydrophobic, reactions containing it were performed in 25% DMSO (Sigma) to ensure full solubility of the compound. The reaction was incubated at 37 °C for 2–3 hours on an orbital shaker and loaded onto an anion-exchange column (1 ml DEAE FF, GE Healthcare) equilibrated in 50 mM Tris-HCl pH 7.4. Bound oligo was eluted in one step using 50 mM Tris-HCl pH 7.4, 1 M NaCl. The fractions containing oligo were isolated, split into 500 μl aliquots, combined with 1 ml ice cold, absolute ethanol and incubated at −80 °C for at least 1 hour. The precipitate was pelleted for 30 min by centrifugation at 0 °C and 20,000 × g. The supernatant was carefully aspirated and discarded. The pellets were washed using 1 ml of room temperature, 95% (v/v) ethanol and collected by renewed centrifugation for 10 min at 4 °C and 20,000 × g. The supernatant was discarded, pellets were air dried and then resuspended in MilliQ H2O to be stored at −20 °C.
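The reagent amounts implied by the NHS-ester coupling conditions (50 μl containing 100 μM amine oligo and 5 mM NHS-ester) can be worked out as follows. This is a small sketch of the unit conversion only; the helper name `amount_nmol` is hypothetical.

```python
def amount_nmol(volume_ul, conc_uM):
    """Amount of substance in nmol: μl × μmol/l gives pmol, divided by 1000 gives nmol."""
    return volume_ul * conc_uM / 1000.0


reaction_volume_ul = 50
amine_nmol = amount_nmol(reaction_volume_ul, 100)       # 100 μM amine oligo
nhs_ester_nmol = amount_nmol(reaction_volume_ul, 5000)  # 5 mM NHS-ester = 5000 μM
molar_excess = nhs_ester_nmol / amine_nmol

print(amine_nmol, nhs_ester_nmol, molar_excess)
```

The stated concentrations correspond to 5 nmol of oligo and 250 nmol of NHS-ester per reaction, a 50-fold molar excess of the ester.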

Conjugating DNA and protein

For optimization purposes, CuAAC was performed in 20 μl volumes of PBS with 5 μM of alkyne-bearing protein, which was reacted with 100 μM azide using a range of catalyst concentrations. Copper sulfate (CuSO4), sodium ascorbate (NaAsc) and THPTA were pre-mixed into a “click mix” (CM). The 100X CM as defined by Sachdeva et al. 20 contains 10 mM CuSO4, 25 mM NaAsc, 50 mM THPTA and was used at a final concentration of 1X. A 100X click mix with ten times the amount of NaAsc (CM-A) contains 10 mM CuSO4, 250 mM NaAsc and 50 mM THPTA 25. Samples were incubated at 25 °C for 0.5–2 hrs, or overnight. To stop the reaction, the sample was either buffer exchanged or mixed directly with SDS-PAGE sample buffer. For proof of concept experiments and optimization, alkyne-bearing PR65 (alkPR65alk) was reacted with 5-FAM-azide (Lumiprobe). To produce protein-DNA chimeras, azide-functionalised DNA oligomers (Integrated DNA Technologies) were reacted with alkPR65alk. To test the functionality of the purchased azide-oligo, 20 μM of oligo was labelled with 20 μM 5-FAM alkyne dye (Lumiprobe) and increasing amounts of CM-A, sampling final CuSO4 concentrations of 20 μM to 1 mM. These optimized conditions were then used to react 20 μM of protein with 100 μM azide oligo in a 20 μl volume using 10X CM-A.
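The final catalyst concentrations that follow from the 100X stock compositions can be tabulated with a few lines of arithmetic. The sketch below takes the stock values from the text; the dictionary and function names are hypothetical, and it assumes simple linear dilution of the 100X stock into the reaction volume.

```python
# Stock compositions of the 100X click mixes, in mM (values from the text).
CM = {"CuSO4": 10.0, "NaAsc": 25.0, "THPTA": 50.0}
CM_A = {"CuSO4": 10.0, "NaAsc": 250.0, "THPTA": 50.0}


def final_conc_uM(stock_mM, x_final, x_stock=100):
    """Final concentration (μM) of each component when an x_stock-fold stock is used at x_final."""
    return {name: mM * 1000.0 * x_final / x_stock for name, mM in stock_mM.items()}


def stock_volume_ul(reaction_ul, x_final, x_stock=100):
    """Volume of stock (μl) needed in a reaction of reaction_ul to reach x_final."""
    return reaction_ul * x_final / x_stock


print(final_conc_uM(CM, 1))        # CM used at 1X
print(final_conc_uM(CM_A, 10))     # CM-A used at 10X
print(stock_volume_ul(20, 10))     # μl of 100X CM-A in a 20 μl reaction at 10X
```

At 1X, CM gives 100 μM CuSO4, 250 μM NaAsc and 500 μM THPTA. At 10X, CM-A gives 1 mM CuSO4, which matches the upper end of the CuSO4 range sampled in the optimization, and corresponds to 2 μl of stock in a 20 μl reaction.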

SPAAC and IED-DA trial reactions were performed in 10 and 20 μl volumes of PBS containing 5 μM proteins and 10–20 μM modified oligo respectively. Control reactions for azide-bearing proteins were performed using TAMRA-DBCO (Jena Bioscience). Reaction mixtures were incubated for varying durations (0.5 hours to overnight) at room temperature in an orbital shaker.

For Sfp-synthase mediated DNA-protein conjugation, the reaction conditions as described by Yin et al. 7 (100 μl of 50 mM HEPES pH 7.5, 10 mM MgCl2, 5 μM ybbR-tagged protein, 5 μM biotin-CoA and 0.1 μM Sfp-enzyme) did not yield detectable DNA-protein conjugation. Various conditions were screened and the Sfp concentration was found to be the limiting factor. When the enzyme was included in at least a stoichiometric amount relative to the ybbR-tag, reactions yielded the desired product. When combining SPAAC and Sfp-mediated attachments, 5 μM of PR65 constructs containing one ybbR-tag and one azide functionality were reacted with 10 μM of each CoA-oligo and DBCO-oligo in 10 μl of 50 mM HEPES pH 7.5, 10 mM MgCl2. Control reactions were performed by omitting one oligo at a time.

Reaction products of oligo-dye couplings were analysed by electrophoresis using 1% unstained agarose gels. Protein-dye and protein-oligo reactions were analysed by SDS-PAGE. Before polyacrylamide gels were stained with Coomassie Blue, fluorescent bands were imaged under UV using a trans-illuminator (UVP, LLC).

Successful reactions were scaled up to 50 μl, containing 10 μM protein and a 1- or 2-fold excess (depending on availability) of the modified oligo with respect to the number of ybbR-tags and/or UAAs. After overnight incubation, these reactions were purified by size exclusion chromatography using either a Superdex 200 10/300 GL (GE Healthcare) or a YMC-Pack Diol-300 (Yamamura Chemical Research). Fractions were analysed by SDS-PAGE and those containing the majority of protein conjugated to two DNA oligos were hybridized to DNA handles and analysed by agarose gel electrophoresis.

Equilibrium denaturations

WT and ybbR-tagged PR65 were buffer exchanged into 50 mM MES pH 6.5, 1 mM DTT. Samples of a total volume of 150 μl were prepared in black 96-well plates (Corning, low-binding) with urea gradients of 0 M to 8 M. The final protein concentrations were approximately 1 μM. Samples were incubated on an orbital shaker at 25 °C for 2 h. Tryptophans were excited at 295± nm and the fluorescence was monitored at 340 ± 10 nm using a CLARIOStar microplate reader (BMG Labtech). The data from 4 reads were averaged, then normalised and fitted to a three-state equation:

where FN and FU are the fluorescence of the folded and denatured states, respectively, and can be described by