WebMCS
WebMCS is a tool for detecting MCSs (Multi-species Conserved Sequences) in multiple sequence alignments that you provide. The method is described in E.H. Margulies, M. Blanchette, NISC Comparative Sequencing Program, D. Haussler, and E.D. Green, "Identification and Characterization of Multi-Species Conserved Sequences", Genome Research (13:2507-2518).
Submitting Data for MCS Analysis
Submit Data for MCS Analysis. Or first read a brief overview of the submission requirements:
A wide variety of changes have been made to the WebMCS package. The input can now be either individual FASTA-formatted sequences or a multiple alignment. Users can now supply a phylogenetic tree if the species in the query are not in the default list of species. The MCS output is now formatted for graphical viewing and for automatic upload into the UCSC Genome Browser for further analysis. Read below for more details on the new functionality.
Input
At a minimum, you must provide WebMCS with your email address and either a multiple sequence alignment or multiple FASTA-formatted sequences. WebMCS accepts MultiPipMaker alignments, multiple FASTA alignments, and MAF alignments. WebMCS also accepts a collection of FASTA sequences, which it will align using TBA. In addition, if you submit an annotation file and matching UCSC coordinates, WebMCS will determine the amount of overlap with annotated coding regions and UTRs. Currently, WebMCS accepts exon information in the GTF annotation format or extracts this information from the UCSC Genome Browser using valid genome coordinates. Phylogenetic tree information is computed using a
Output
WebMCS provides the results of detected MCSs in the form of UCSC Custom Tracks, as well as a number of ancillary files including a summary of detected MCSs, the computed conservation scores, identified genomic features, and others. The Custom Tracks file allows you to display your results within the UCSC Genome Browser.
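To make the Custom Tracks idea concrete, here is a minimal Python sketch that writes a list of intervals as a BED-style UCSC custom track. It is an illustration only: the function name `mcs_to_bed_track`, the track name, the `MCS_n` labels, and the choice of BED layout are assumptions made for this example, and the files WebMCS actually produces may be laid out differently.

```python
from typing import Iterable, Tuple


def mcs_to_bed_track(
    chrom: str,
    offset: int,
    segments: Iterable[Tuple[int, int]],
    name: str = "WebMCS_example",  # hypothetical track name
) -> str:
    """Format intervals as a BED custom track (0-based start, exclusive end).

    `segments` holds (start, end) pairs relative to the reference sequence;
    `offset` is the genome coordinate of the first reference base.
    """
    lines = [f'track name="{name}" description="Example MCS intervals"']
    for n, (start, end) in enumerate(segments, start=1):
        # BED columns: chrom, chromStart, chromEnd, name (tab-separated).
        lines.append(f"{chrom}\t{offset + start}\t{offset + end}\tMCS_{n}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(mcs_to_bed_track("chr7", 1000, [(1, 3), (4, 5)]), end="")
```

The resulting text can be pasted into the Genome Browser's custom track form. BED uses 0-based, half-open coordinates, so the intervals passed in must follow the same convention.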
Overview
The multiple alignment is generated using TBA, which outputs a MAF-formatted multiple alignment that is then converted to MFA format. The pairwise alignments are stored in a local directory. WebMCS uses a binomial-based algorithm to determine a conservation score for each base position in the reference sequence, and the score is then normalized to a percentile. The Conservation Score weights the relative contribution of each species' sequence by accounting for its baseline neutral substitution rate (relative to the human reference sequence). Under this weighting scheme, conserved sequences from more diverged species make a greater relative contribution to the Conservation Score than those from less diverged species. WebMCS currently measures baseline neutral substitution rates at 4-fold degenerate positions: third codon positions at which any of the four bases encodes the same amino acid. The normalized phylogenetic scores and the regions determined to be MCSs are written to a wiggle file that can be viewed in the UCSC Genome Browser. An MCS is defined as a segment of contiguous sequence in which each base exceeds a defined conservation score threshold. WebMCS selects the threshold such that MCSs cover the user-defined percentile of the reference sequence.
Comments, suggestions and problems to Elliott Margulies
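The thresholding step described in the Overview can be sketched in a few lines of Python. This is not the WebMCS implementation: the function names `percentile_threshold` and `call_mcs` are hypothetical, the per-base scores are taken as already computed, and a base is treated as passing when its score is at or above the cutoff (the cutoff being the lowest score inside the requested fraction).

```python
from typing import List, Sequence, Tuple


def percentile_threshold(scores: Sequence[float], fraction: float) -> float:
    """Return a cutoff such that about `fraction` of positions score at or above it."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    keep = max(1, int(round(fraction * len(ranked))))
    return ranked[keep - 1]


def call_mcs(scores: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) runs where every score >= threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            segments.append((start, i))  # the run ended at the previous base
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # run extends to the last base
    return segments


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.2, 0.95, 0.3, 0.1, 0.1, 0.1, 0.1]
    cutoff = percentile_threshold(scores, 0.3)
    print(cutoff, call_mcs(scores, cutoff))
```

With tied scores at the cutoff, the selected bases can cover slightly more than the requested fraction, so a real implementation would need an explicit tie-breaking rule.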