Sung-Hou Kim, University of California, Berkeley, "Demography and Phylogeny of Organisms Based on Whole Genome Sequences"
Feb 9 2010, 11:00 am
Distinguished Lecture Series Guest Speaker:
Professor of Chemistry, University of California, Berkeley
Faculty Senior Scientist, Physical Biosciences Division, Lawrence Berkeley National Laboratory
Date & Time: Tuesday, February 9, 2010
The current understanding of all organisms, including mammals, is primarily "gene-centric." As a result, comparisons among organisms have been gene-based, and highly conserved genes are preferentially compared to infer groupings (demography) and evolutionary relationships (phylogeny) among organisms. However, the coding portion of a genome (coding for proteins, ribosomal RNAs, transfer RNAs, and other functional RNAs) varies tremendously, from close to 100% in prokaryotes to as little as 1-3% in mammals, and highly conserved genes account for only a very small fraction of all genes. The function of much of the non-coding sequence (the remaining 97-99% of mammalian genomes) is unknown, yet much of it is in fact transcribed. Recently, the ENCODE project showed that at least 93% of the analyzed human genome nucleotides were transcribed into RNA when all cell types were considered. Similarly, transcriptional analysis of human chromosomes demonstrated that transcripts originating from non-genic regions make up the largest fraction of the transcriptional output of the human genome. It is therefore debatable whether the demography and phylogeny of species inferred from a small, alignable fraction of the whole genome are reliable.

Whole genome sequences of over 1000 organisms are now available, thanks to dramatic improvements in sequencing technology during the last two decades, and whole genome sequences of most representative organisms are expected to be known within one or two decades. However, tools to compare whole genomes/proteomes have not yet been fully developed. Inspired by text comparison methods, we have developed an alignment-free method that uses the Feature Frequency Profile (FFP) of each whole genome/proteome. The method allows us to group similar organisms and discover evolutionary relationships among groups of organisms using whole genome/proteome information rather than information from a few selected genes.
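The abstract does not spell out how an FFP is built or compared, so the following is only a minimal sketch under stated assumptions, not the speaker's implementation. It assumes that "features" are overlapping substrings of a fixed length l (l-mers) of a DNA or protein sequence, that a profile is the set of their relative frequencies (much like word frequencies in text comparison), and that two profiles are compared with the Jensen-Shannon divergence. The function names `feature_frequency_profile`, `jensen_shannon_divergence`, and `distance_matrix` and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def feature_frequency_profile(seq: str, l: int) -> dict[str, float]:
    """Assumed FFP: count every overlapping l-mer in seq and normalize counts to frequencies."""
    seq = seq.upper()
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def jensen_shannon_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Symmetric divergence (base 2, range 0 to 1) between two frequency profiles.

    JSD = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m is the average of p and q.
    Features missing from a profile are treated as frequency 0.
    """
    jsd = 0.0
    for f in set(p) | set(q):
        pf, qf = p.get(f, 0.0), q.get(f, 0.0)
        mf = (pf + qf) / 2
        if pf > 0:
            jsd += 0.5 * pf * math.log2(pf / mf)
        if qf > 0:
            jsd += 0.5 * qf * math.log2(qf / mf)
    return jsd


def distance_matrix(genomes: dict[str, str], l: int) -> dict[tuple[str, str], float]:
    """Pairwise JSD between the profiles of all named sequences (no alignment needed)."""
    profiles = {name: feature_frequency_profile(seq, l) for name, seq in genomes.items()}
    return {
        (a, b): jensen_shannon_divergence(profiles[a], profiles[b])
        for a, b in combinations(sorted(genomes), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy sequences: "A" and "B" differ by one base, "C" is unrelated.
    toy = {
        "A": "ACGTACGTACGTTGCA",
        "B": "ACGTACGTACGTTGCC",
        "C": "GGGCCCGGGCCCAAAT",
    }
    for pair, d in sorted(distance_matrix(toy, l=3).items(), key=lambda kv: kv[1]):
        print(pair, round(d, 3))
```

Because no alignment is required, every position of each sequence contributes to its profile, which is the point the abstract makes about using whole genome/proteome information. In this sketch the feature length l is a free parameter whose choice is left open, and the resulting distances could then be handed to a standard tree-building method (for example, neighbor-joining) to explore groupings; that step is not shown.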