Day 12 Evening Lecture Notes
Bart Slatko, New England Biolabs
June 17, 2004
The LINE1 repeat is a "molecular parasite" that reproduces itself. Only 2% of DNA encodes proteins. RNA splicing removes introns as loops to turn pre-mRNA into mRNA. "Alternative splicing" can give rise to different mRNAs from the same gene. Commonly, alternative splicing joins different modular domains together in different configurations.
In humans a gene is about 1300 nt on average with 9 exons. 2/3 of genes have "splice variants," a higher fraction than in other organisms. Humans and other mammals have more protein diversity from the same domains than simpler organisms. 40% of human genes have an unknown function. A large fraction of all identified genes are housekeeping genes. The genes unique to humans are primarily for immunology, neural development, blood clotting, transcription factors and regulatory proteins. The human proteome has perhaps 10^5 proteins.
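To make the combinatorics of alternative splicing concrete, here is a minimal sketch that enumerates isoforms for a gene with the average 9 exons quoted above. It assumes a deliberately simplified model that is not from the lecture: only exon skipping is considered, the first and last exons are always retained, and every internal exon can be kept or skipped independently. The function name `exon_skipping_isoforms` and the exon labels are hypothetical. Real splicing is regulated, so only a small fraction of these combinations would be expected in a cell.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """Yield isoforms that keep the first and last exon and any
    subset of the internal exons, preserving genomic order.

    Simplified model (assumption): exon skipping only, each internal
    exon is independently kept or skipped. Needs at least 2 exons.
    """
    first, *internal, last = exons
    for n_kept in range(len(internal) + 1):
        # combinations() preserves the input order of internal exons
        for kept in combinations(internal, n_kept):
            yield (first, *kept, last)


# Hypothetical gene with 9 exons, labelled E1..E9
exons = [f"E{i}" for i in range(1, 10)]
isoforms = list(exon_skipping_isoforms(exons))

# 7 internal exons, each kept or skipped: 2**7 = 128 combinations
print(len(isoforms))
print("-".join(isoforms[0]))   # shortest isoform: E1-E9
print("-".join(isoforms[-1]))  # full-length isoform: E1-...-E9
```

Under these assumptions the count is an upper bound for this one splicing mode; it only illustrates how a modest number of exons can, in principle, yield many distinct mRNAs.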
In humans the Y chromosome mutation rate is twice that of X. The human founder population was small, perhaps about 1000 individuals. Humans have about 1.2 x 10^6 single-nucleotide polymorphisms (SNPs). The worldwide SNP Consortium studies them. SNPs have many uses, among them:
Since people differ only at the level of 1 part per thousand, any sequencing technology must be very accurate. Right now the cost per person is about $1.5 x 10^6. Possible methods include:
The players in nanoPCR are Amersham/Pharmacia/GE with its 384-capillary sequencers; 454 Corporation, a spin-off of CuraGen, which has picoliter plates with thousands of wells/plate; and VisiGen, which makes novel polymerases and dyes.
The players are Solexa Total Genotyping, Perlegen and Affymetrix.
Research efforts are ongoing at US Genomics and Genovoxx. Optical mapping is also under consideration.
Major efforts are underway at UCSC (where 50 bases can be read at one time) and Lynx MPSS (massively parallel signature sequencing). The leading candidate is Polony, a Harvard spin-off which is using "microPCR colonies."
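The accuracy requirement noted above (people differ at only about 1 part per thousand) can be checked with a rough calculation. The sketch below assumes a haploid genome of about 3 x 10^9 bases, a figure not given in these notes, and treats every sequencing error as one false difference call, which ignores coverage depth and error correction. The function name `expected_calls` is hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumption: approximate haploid human genome size
DIFFERENCE_RATE = 1e-3          # from the notes: about 1 part per thousand


def expected_calls(error_rate, genome_size=GENOME_SIZE_BP,
                   diff_rate=DIFFERENCE_RATE):
    """Return (true differences, false calls, fraction of calls that are false).

    Simplification (assumption): each per-base error yields exactly one
    false difference call, with no redundancy to correct it.
    """
    true_differences = genome_size * diff_rate
    false_calls = genome_size * error_rate
    fraction_false = false_calls / (true_differences + false_calls)
    return true_differences, false_calls, fraction_false


for error_rate in (1e-2, 1e-3, 1e-4, 1e-5):
    true_d, false_c, frac = expected_calls(error_rate)
    print(f"error rate {error_rate:.0e}: "
          f"{true_d:.2e} true differences, "
          f"{false_c:.2e} false calls, "
          f"{frac:.1%} of calls false")
```

With a per-base error rate equal to the difference rate, half of all apparent differences would be errors under this model, which is why the error rate has to sit well below 1 in 1000 before individual genomes can be compared meaningfully.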