ERIC Newsletter
In this issue...
Comparative Genomics Using Mauve 2.0
Nicole Perna, Ph.D.
Nicole Perna is head of the Genome Evolution Laboratory at the University of Wisconsin-Madison and one of the three co-investigators for the ERIC-BRC project.
Multiple genome alignments are now available for each species of enterobacteria represented in ERIC-BRC. These alignments were constructed using a newly released version of a progressive alignment tool that dramatically improves alignment in regions conserved among subsets of genomes, a particularly important feature for recognition of genomic islands. Coupled with improved visualization and navigational tools, Mauve 2.0 provides a powerful new mechanism for comparative genomics of enterobacteria. You can access the software and alignments through ERIC-BRC to use in your own research and to browse the ERIC-ASAP database from a comparative perspective. These pre-computed genome alignments are currently available:
New Functionality in Mauve 2.0
Figure 1. Partial screenshot of the ERIC-BRC Mauve 2.0 six E. coli alignment. This region shows the junction between a segment conserved among the two O157:H7 genomes, the well-characterized LEE pathogenicity island (blue), and one found in all six aligned E. coli genomes (pink). The popup menu showing links to ERIC-ASAP and NCBI appears when users click on an annotated gene (white blocks).
Mauve 2.0 also facilitates visualization of these important lineage-specific regions by providing an option to colorize an alignment according to "multiplicity" to show the distribution of homologous segments across genomes. The original color scheme gave a different color to each collinear segment, to facilitate visualizing global genome rearrangements. The new scheme makes it easy to see local transitions between shared and lineage-specific segments. Users can zoom all the way in to the nucleotide sequence alignment, or all the way out to see large-scale events like inversions (see Figure 2 below). Mauve 2.0 also includes an integrated search function allowing users to query ERIC's annotations and jump to the corresponding region of the alignment.
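To make the idea of coloring by multiplicity concrete, here is a minimal sketch in Python. It is not Mauve's code; the genome labels, function names, and category names are hypothetical and chosen only for illustration. The sketch assumes each aligned segment is described by the set of genomes that contain it, encodes that set as a bitmask (so every distinct subset of genomes could be given its own color), and separates segments found in all genomes from those found in only some.

```python
from typing import Iterable

# Hypothetical short labels for the six aligned E. coli genomes.
GENOMES = ["O157_a", "O157_b", "UPEC_a", "UPEC_b", "K12_a", "K12_b"]


def multiplicity_mask(present_in: Iterable[str]) -> int:
    """Encode the set of genomes containing a segment as a bitmask.

    Bit i is set when GENOMES[i] contains the segment, so each distinct
    subset of genomes maps to a distinct integer (and so could be
    assigned a distinct color).
    """
    present = set(present_in)
    mask = 0
    for i, genome in enumerate(GENOMES):
        if genome in present:
            mask |= 1 << i
    return mask


def classify(present_in: Iterable[str]) -> str:
    """Label a segment by how many of the aligned genomes contain it."""
    count = len(set(present_in) & set(GENOMES))
    if count == len(GENOMES):
        return "backbone"   # conserved in every aligned genome
    if count == 0:
        return "absent"     # none of the known genomes contain it
    if count == 1:
        return "unique"     # island specific to a single genome
    return "subset"         # shared by some, but not all, genomes
```

With these assumed labels, a segment present only in the two O157:H7 genomes gets the mask `0b000011` and is classified as a subset segment, while a segment present in all six genomes is classified as backbone.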
Figure 2. Two views of the same region of the ERIC-BRC Mauve 2.0 alignment of 6 E. coli genomes. The visualization on the left uses the default color scheme based on homologous segments. For example, the blue regions are collinear blocks that contain homologous sequence. Importantly, these collinear blocks can include islands unique to a single genome or collinear islands common to a subset of genomes. The visualization of the same aligned region shown on the right is colorized by multiplicity. Here, pink (mauve) blocks indicate that the region is conserved across all six genomes. Other colors mark regions found in only a subset of genomes.
Accessing Mauve 2.0 Alignments through ERIC-BRC
Figure 3. Mauve 2.0 Alignment. This alignment includes six E. coli genomes - two O157:H7 genomes (top two), two uropathogens (middle two) and two K-12 strains (bottom two). Each genome is represented as a tier with colored bars showing the level of sequence similarity on top and annotated genes below. The rRNA-encoding genes (red) are in the middle of a region homologous across all genomes. It is easy to distinguish these "backbone" or conserved regions (pink) in the display colored by multiplicity. Other colors mark regions found in a subset of genomes. The vertical bars show the cursor position and corresponding location in other genomes (one example circled in red). Here, the cursor marks the resumption of homology across all genomes (pink) following lineage-specific segments (various colors). The blue region found in both O157:H7 genomes corresponds to a cryptic prophage. The orange segment encodes unknown proteins common to both K-12 strains. Each uropathogen genome has a (distinct) prophage in this area. At any time, users can re-center the display, visually synchronizing it on the homologous points in the other genomes. This interactive aspect of the display greatly facilitates examination of areas that are flanked by homologous sequence but differ in the middle, such as this hotspot for phage insertions.
Building Your Own Mauve Alignments
Training
ERIC-BRC Yersinia Genome Updates
Bradley Anderson, Ph.D.
Bradley Anderson is a genome annotator and curator on the ERIC-BRC Team at the University of Wisconsin-Madison.
Yersinia genomics emerged in 2002 with the publication of the CO92 (2) and KIM (3) Yersinia pestis genomes. Since then, complete and draft genome sequences for additional Yersinia strains and species have been generated and annotated by diverse groups. ERIC-BRC currently contains genome data from seven Y. pestis and two Y. pseudotuberculosis strains. The styles and content of the original GenBank deposits for these genomes vary considerably, and the oldest genome annotations are now out of date. In the interest of creating a fresh standard for comparative annotation of Yersinia genomes, ERIC-BRC selected Y. pestis CO92 as a focal point for a substantial annotation update. Over the next few months, these updates will be propagated across orthologous genes from all Yersinia genomes, and relayed to NCBI for integration. In the interim, we invite you to access the CO92 updates already available through ERIC-BRC.
In brief, this update reflects several synergistic efforts. ERIC-BRC curators reviewed Yersinia primary literature published since 2002, incorporating new names and product descriptions with links to the corresponding papers in PubMed. Other substantial updates were derived from comparisons with the 2006 re-annotation of the E. coli K-12 genome (4). Database cross-references were added to various other resources, including UniProt. Insertion Sequence (IS) element and pseudogene boundaries and annotations were also revised. In total, the original deposit contained approximately 41,000 annotation records, of which just over 5,000 were removed or restructured by curators, who also added over 21,000 additional annotations.
A more complete description of these updates will appear as a chapter in an issue of Advances in Experimental Medicine and Biology (5) detailing the proceedings of the October 2006 ASM Yersinia conference. Of course, keeping annotations current is an ongoing task and we are seeking your assistance. If you publish a paper that contains information that should be reflected in one or more ERIC-BRC genome annotations, there are several things you can do to ensure that updates happen in a timely way. ERIC-BRC is designed for direct community input, and we are enthusiastic about signing you up and training you to make annotation updates in the ERIC-BRC database yourself. Alternatively, we urge you to communicate your findings to our designated curators. Contact us to pursue either option at info@ericbrc.org.
Annotation of the EPEC plasmid pMAR7 in ERIC-ASAP - A Model for Collaboration in Genomics Projects
Valerie Burland, Ph.D.
Project Background
Role of pMAR7 in Virulence
Features Specific to pMAR7
Figure 4. pMAR7 plasmid schematic. Two copies of Insertion Sequence (IS) element ISEc13 flank the pMAR7 tra region to create a transposon-like structure that could potentially mobilize the tra genes, leading to their deletion by recombination between the homologous elements. Insertion sequences (detected using IS Finder) accounted for >18% of the plasmid sequence, but the tra region is IS-free.
Microarray Analysis in ERIC with mAdb
John Greene, Ph.D.
John Greene is the Principal Investigator for mAdb at SRA International and has spent a decade as a bioinformatics scientist, over five years of which were spent training scientists on the mAdb system at NIH.
Recently, we added the mAdb microarray database and analysis system as a component of ERIC. mAdb was developed by NIH's Center for Information Technology, with assistance from ERIC's prime contractor SRA International, for the intramural program of the National Cancer Institute. In the nearly eight years of that project, mAdb has proven scalable and reliable enough to handle over 67,000 microarray experiments and has over 1,500 users at the NIH campus, as well as their collaborators world-wide. mAdb is a completely web-based system - all you need is a browser and a good Internet connection. Unlike GEO or ArrayExpress, mAdb is not just a MIAME-compliant repository for array data. The system incorporates a wide variety of analysis tools alongside the database, and it is possible to share data with collaborators through mAdb. mAdb can handle Affymetrix data as well as spotted arrays, and can use the quantitation and composite image files from a number of microarray scanners.
The central concept for mAdb is that of creating filtered, reusable datasets for analyzing microarray data. Once the raw data is processed and placed in ERIC's relational database, a user can filter the data for quality using filters for spot size, signal/background ratio, and a number of other quantitative metrics, and can exclude spots marked as Bad or Not Found by the scanner software. Normalization can be done either on the raw data or only on those spots which pass the spot quality filters set by the user. This creates a parent filtered dataset, which can then be filtered in other ways (by expression ratios, by genes (rows), or by arrays (columns)), or used directly in the analysis tools.
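The filter-then-normalize workflow described above can be sketched in a few lines of Python. This is an illustration only, not mAdb's implementation: the `Spot` fields, the default threshold of 2.0, and the choice of median-centering log2 ratios as the normalization step are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import log2
from statistics import median
from typing import Dict, List


@dataclass
class Spot:
    """One spot on a two-channel array (hypothetical field names)."""
    gene: str
    signal: float       # foreground intensity, experimental channel
    reference: float    # foreground intensity, reference channel
    background: float   # local background intensity
    flag: str = "OK"    # scanner software flag, e.g. "Bad" or "Not Found"


def passes_quality(spot: Spot, min_sb_ratio: float = 2.0) -> bool:
    """Apply the spot quality filters: scanner flag and signal/background."""
    if spot.flag in ("Bad", "Not Found"):
        return False
    if spot.background <= 0 or spot.signal <= 0 or spot.reference <= 0:
        return False
    return spot.signal / spot.background >= min_sb_ratio


def parent_dataset(spots: List[Spot], min_sb_ratio: float = 2.0) -> Dict[str, float]:
    """Return median-centered log2 ratios for spots that pass the filters.

    Normalization here uses only the spots that passed the quality
    filters, which is one of the two options described in the text.
    """
    kept = [s for s in spots if passes_quality(s, min_sb_ratio)]
    if not kept:
        return {}
    ratios = {s.gene: log2(s.signal / s.reference) for s in kept}
    center = median(ratios.values())
    return {gene: ratio - center for gene, ratio in ratios.items()}
```

Calling `parent_dataset` again with a different `min_sb_ratio` yields a second parent dataset from the same raw spots, which mirrors the idea of trying several quality filter settings.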
mAdb's flexibility allows you to try multiple quality filter settings to create multiple parent datasets if you choose, since there are few absolutes to date in microarray analysis. mAdb's main display page for filtered datasets is highly customizable, allowing you to show only the data you need. Spot images can be viewed for human quality control (so you can spot a hair or smudge on a spot that the quantitation might not directly reveal). You can view Gene Ontology terms, adjust the color contrast, and decide whether to store a dataset transiently (for 24 hours), temporarily (30 days since last access), or permanently. Online help for each tool is available by clicking on the bee image shown on each page.
The analysis tools allow hierarchical, K-means, and self-organizing map (Kohonen) clustering of the data by a number of metrics and linkage methods, as well as other related visualization techniques such as scatter plotting, Principal Components Analysis, and Multidimensional Scaling. One can partition an experiment into subset groups that can be compared against each other by a number of techniques, including averaging the groups; producing the mean, median, and standard deviation for each gene by group; or comparing the groups by statistical tests such as t-tests or Wilcoxon rank tests for two groups, or by ANOVA or the Kruskal-Wallis test for multiple groups. You can perform Boolean comparisons between two or three datasets - and create yet another subset from the desired results. Finally, you can perform PAM (Prediction Analysis for Microarrays), which allows class prediction, or SAM (Significance Analysis of Microarrays), which is used to identify differentially expressed genes in single-, two-, or multiple-class comparisons, and to estimate the False Discovery Rate (FDR).
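As a rough illustration of the per-gene group comparisons mentioned above, the following Python sketch computes the mean, median, and standard deviation for one gene within a group, and a two-group t statistic. It is not mAdb's code: the function names are hypothetical, and the unequal-variance (Welch) form of the t statistic is an assumption, since the text does not say which variant of the t-test is used. The sketch stops at the statistic and does not compute a p-value.

```python
from math import sqrt
from statistics import mean, median, stdev, variance
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Mean, median, and sample standard deviation for one gene in one group."""
    return {"mean": mean(values), "median": median(values), "sd": stdev(values)}


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic for one gene measured in two groups of arrays.

    Each group needs at least two values so that a sample variance exists.
    """
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    denom = sqrt(se_a + se_b)
    if denom == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    return (mean(group_a) - mean(group_b)) / denom
```

In practice one would apply `welch_t` to every gene (row) in a filtered dataset, using the arrays (columns) assigned to each group, and then account for the large number of tests, for example through the false discovery rate estimate that SAM provides.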
Although this may seem complex, once you grasp the idea of creating one or a few filtered datasets, and you understand which tools are available that pertain to your experimental design, mAdb is quite easy to use. Each dataset has a history associated with it, so you can see how it was derived. Also, at any stage in an analysis, if you derive a filtered dataset you wish to work with extensively, you can save it as a new dataset, in effect making the child set into a new parent dataset. We will be offering both online and in-person training on the mAdb component of ERIC, and there is a set of slides posted on the ERIC portal to assist you in teaching yourself. We hope your microarray research in enteropathogens will benefit from the use of this flexible and powerful microarray analysis system.
Literature Cited in this Issue
If you would like to be added to the mailing list, please send us an email.