
Day 12 Afternoon Lecture Notes

Steve Williams, Smith College

June 17, 2004

Cloning a protein whose sequence is unknown

How do we clone the gene for a protein when we don't know its sequence and the rabbit fails to raise antibodies against it? Commercial vendors will sequence the amino acids at the ends of proteins. Typically they can read 12 amino acids at the amino terminus of the protein and 12 internally from a tryptic digest. A primer can then be designed by reverse lookup in the genetic codon table.

Consider, for example, a protein amino terminus with the sequence lys-phe-pro-tyr-his-ile-met. Lys can be encoded by 2 different codons, phe by 2, pro by 4, tyr by 2, his by 2, ile by 3 and met by 1, for a total of 2 x 2 x 4 x 2 x 2 x 3 x 1 = 192 distinct possible coding sequences. If we knew the correct sequence, we could make an oligo probe. For about $100, vendors will prepare all 192 in a "mixed-synthesis oligo" product that can be used to screen a cDNA library. (You wouldn't want to use mixed-synthesis oligos to screen a genomic library, as there would be too many false positives.) Of the 192 oligos, more than one may well bind.
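The degeneracy count above can be checked with a short script. This is a minimal sketch, not a vendor tool: the dictionary name CODONS and the helper names degeneracy and all_oligos are made up for illustration, the table covers only the seven amino acids in the example, and the sequences are written as coding-strand DNA (an actual probe might be synthesized as the complementary strand).

```python
from itertools import product

# Standard genetic code, restricted to the amino acids in the example peptide.
# Codons are written as coding-strand DNA (T instead of U).
CODONS = {
    "lys": ["AAA", "AAG"],
    "phe": ["TTT", "TTC"],
    "pro": ["CCT", "CCC", "CCA", "CCG"],
    "tyr": ["TAT", "TAC"],
    "his": ["CAT", "CAC"],
    "ile": ["ATT", "ATC", "ATA"],
    "met": ["ATG"],
}

def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    count = 1
    for amino_acid in peptide:
        count *= len(CODONS[amino_acid])
    return count

def all_oligos(peptide):
    """Enumerate every possible coding sequence (the mixed-synthesis pool)."""
    choices = [CODONS[amino_acid] for amino_acid in peptide]
    return ["".join(codons) for codons in product(*choices)]

peptide = "lys-phe-pro-tyr-his-ile-met".split("-")
pool = all_oligos(peptide)
print(degeneracy(peptide), len(pool), len(pool[0]))
```

Each oligo in the pool is 21 nt long (7 codons), and the pool size matches the product of the per-residue codon counts.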

Now take 7 amino acids from the tryptic digest, design a second set of mixed-synthesis oligos, and hybridize them to another lift from the same plate. Any clone that hybridizes with both sets of mixed-synthesis oligos is likely to encode the protein of interest. Of course, the positive clones may contain only part of the gene.
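The two-lift logic amounts to taking the intersection of the two sets of positives. A minimal sketch, assuming each positive is recorded as a plate position; the clone labels and the function name double_positives are hypothetical and not data from the lecture.

```python
def double_positives(lift_a, lift_b):
    """Return clones that hybridized on both lifts from the same plate."""
    return sorted(set(lift_a) & set(lift_b))

# Hypothetical plate positions scored as positive on each lift.
amino_terminal_hits = ["A3", "B7", "C1", "D9"]   # probe from the amino terminus
internal_hits = ["B7", "D9", "E2"]               # probe from the tryptic fragment

candidates = double_positives(amino_terminal_hits, internal_hits)
print(candidates)
```

Clones that light up with only one probe set are more likely to be chance matches to one of the many oligos in that pool.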

There is a protein BLAST similar to the program for genomes. Gene expression products should be checked against this database. It's very difficult to sequence a full-length protein, even with mass spectrometry or 2D gels, so most sequences in the protein database are derived from genome sequence. There's also a BLASTX program that finds proteins which match a section of genomic DNA.
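BLASTX compares a nucleotide query against protein sequences by first translating the DNA in all six reading frames. The sketch below illustrates only that translation step under the standard genetic code; it is not BLASTX itself and does no database searching or alignment. The function names are made up for illustration, and the input is assumed to contain only A, C, G and T.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons starting at position 0 (one-letter codes)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]

# One of the 192 possible coding sequences for lys-phe-pro-tyr-his-ile-met.
frames = six_frames("AAATTTCCTTATCATATTATG")
print(frames[0])
```

The first forward frame recovers the peptide in one-letter code (KFPYHIM); the other five frames show what a protein search would have to rule out.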

Screening a cDNA library

Note that only the DNA probe works with a genomic library, which is why cDNA libraries are often made first.

Two common methods of studying gene expression are reverse-transcriptase PCR and quantitative PCR. Another possibility is primer extension.

The LINE1 repeat is a "molecular parasite" that reproduces itself. Only 2% of DNA encodes proteins. RNA splicing eliminates intron loops to turn pre-mRNA into mRNA. "Alternative splicing" can give rise to different mRNAs. Commonly alternative splicing hooks different modular domains together in different configurations.

In humans a gene is about 1300 nt on average with 9 exons. 2/3 of genes have "splice variants," a higher fraction than in other organisms. Humans and other mammals get more protein diversity from the same domains than simpler organisms do. 40% of human genes have an unknown function. A large fraction of all identified genes are housekeeping genes. The genes unique to humans are primarily for immunology, neural development, blood clotting, transcription factors and regulatory proteins. The human proteome has perhaps 10^5 proteins.
