
May 10, 2006

 

Dear Colleagues,

 

We are writing to bring you up to date with a status report on the comparative Drosophila genome sequencing and analysis plans.

 

Sequencing and Assembly

The finalization of assembly freezes was delayed by the opportunity to improve several of the initial assemblies using elements of alternative assemblies.  This "reconciliation" process was carried out by a group at the University of Maryland (Jim Yorke, PI), starting with Arachne, Celera and/or PCAP assemblies.  Reconciliation is now complete.  The final reference assemblies, selected by the sequencing centers as the assemblies that will be submitted to GenBank, are listed below.  The assemblies are all available through the AAA site, and BLAST access is provided through FlyBase (see "Accessing Drosophila Genome Resources" section below).

 

 

Dros. species    Sequencing & Assembly Status                 Seq. Center

virilis          U. MD. Reconciled Arachne/Celera Assembly    Agencourt
ananassae        U. MD. Reconciled Arachne/Celera Assembly    Agencourt
mojavensis       U. MD. Reconciled Arachne/Celera Assembly    Agencourt
erecta           U. MD. Reconciled Arachne/Celera Assembly    Agencourt
grimshawi        U. MD. Reconciled Arachne/Celera Assembly    Agencourt
willistoni       U. MD. Reconciled Arachne/Celera Assembly    JCVI
persimilis       Arachne Assembly                             Broad Inst.
sechellia        Arachne Assembly                             Broad Inst.
yakuba           PCAP Assembly                                Wash U.
simulans         Mosaic PCAP Assembly                         Wash U.

 

Most species have been sequenced to deep WGS coverage levels.  The persimilis and sechellia projects have been sequenced to low WGS coverage (~3-4X) with the core assemblies derived independently and then enhanced by synteny to related species.  For simulans, there is 2.8X coverage of one strain (w501) and 1X coverage of 6 other strains.  The mosaic simulans assembly layers reads from these other strains onto the core w501 assembly.

 

For Drosophila melanogaster, the finished Release_4 arms (BDGP) and draft Release_3.2b heterochromatin (DHGP) will be used as the reference assembly.  For Drosophila pseudoobscura, the Release_2 Atlas Assembly (Baylor) will be the reference assembly.

 

Other Assemblies

While it is necessary to know which assembly will be treated as reference with regard to the eventual submission of annotated reference genomes to GenBank, it may be worthwhile for annotation groups to compare predictions that emerge from alternative assemblies.  For this reason, some additional assemblies are posted on AAA.

 

Timetables for Submissions of Annotation Sets

The goal is to establish a consensus annotation set for each genome.  This will occur in two phases:

•        By April 30, we ask all annotation groups to submit their annotation sets in GFF3 format to AAA, based on the final frozen assemblies.  Thom Kaufman has organized a workshop on Saturday evening, April 1, 2006 (9:30-11:30pm) at the Drosophila Research Conference in Houston to discuss the status of the analysis of the assemblies, the production of the annotation sets and the plans for publication.

•        By mid-June, consensus annotation sets that can be used for downstream analysis will be produced.  Early in May, we hope to convene a meeting of the annotation groups to discuss the status of the annotations and the process by which consensus sets will be produced.  We are exploring holding this meeting during the Cold Spring Harbor genome conference.
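
For groups preparing submissions, it may help to recall that a GFF3 feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes).  The following is only a minimal sketch of reading one such line in Python; the scaffold, predictor and gene names in the example are hypothetical, and it is not a validator for AAA submissions.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict.

    Returns None for blank lines and for comment/directive lines
    (those starting with '#').
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based and inclusive in GFF3.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated tag=value pairs, URL-escaped.
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if pair:
                key, _, value = pair.partition("=")
                attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


# Hypothetical feature line, for illustration only.
example = ("scaffold_1\texample_predictor\tgene\t1000\t2500\t.\t+\t.\t"
           "ID=gene0001;Name=hypothetical_gene")
rec = parse_gff3_line(example)
```

A check of this kind, run over a whole file before submission, would catch lines with a wrong column count or non-numeric coordinates.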

 

Downstream Analyses & Initial Publications

If we achieve these targets, our publication goal will be to submit the core manuscripts describing the basic assemblies, annotation sets and overall comparative descriptions by September 1, 2006.  So as not to constrain the research of individual laboratories, we propose to coordinate downstream analyses only in two ways: by acting as liaisons with one or more journals to coordinate publication of the manuscripts together at an agreed-upon date, and by providing information on who is planning to do what analyses.  It will be up to the individual participating groups to make sure that they have completed their analyses and submitted their manuscripts in time for peer review and publication.  The editors of Nature and Nature Genetics have expressed a strong interest in publishing the results (including a main paper summarizing the sequencing and major findings and a collection of papers going into more depth on various aspects of the analysis).  Genome Research and Genetics would also be interested in publishing some of the in-depth analyses in an issue timed to come out at about the same time as the main paper.

 

Mapping Supercontigs onto the Chromosome Arms

There are two approaches being undertaken to create assemblies that approximate the extent of the euchromatin of the chromosome arms of each species. 

 

Using synteny to build arm-sized sequence maps: Bill Gelbart's group (gelbart@morgan.harvard.edu) is evaluating the feasibility of aligning supercontigs into chromosome arm-sized units (ultracontigs) using syntenic information.
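
To illustrate the general idea only (this is not a description of the method under evaluation), the Python sketch below assigns each supercontig to the reference arm that holds most of its anchor genes and then orders supercontigs along that arm by the median reference position of those anchors.  The function name, the input layout and the supercontig names are all hypothetical, and the sketch ignores orientation and the rearrangements within arms that any real procedure would have to handle.

```python
from collections import Counter, defaultdict
from statistics import median


def order_supercontigs(anchors):
    """Group supercontigs by reference arm and order them along it.

    anchors: iterable of (supercontig, ref_arm, ref_position) tuples,
             one per anchor gene placed on the reference genome.
    Returns a dict mapping each reference arm to a list of supercontigs,
    sorted by the median reference position of their anchors.
    """
    hits_by_contig = defaultdict(list)
    for contig, arm, position in anchors:
        hits_by_contig[contig].append((arm, position))

    placed = defaultdict(list)
    for contig, hits in hits_by_contig.items():
        # Majority vote: the arm carrying the most anchors wins.
        arm, _count = Counter(a for a, _ in hits).most_common(1)[0]
        positions = [p for a, p in hits if a == arm]
        placed[arm].append((median(positions), contig))

    return {arm: [contig for _, contig in sorted(items)]
            for arm, items in placed.items()}


# Hypothetical anchors: (supercontig, reference arm, reference position).
anchors = [
    ("sc_B", "2L", 5_000_000), ("sc_B", "2L", 5_200_000), ("sc_B", "3R", 100_000),
    ("sc_A", "2L", 1_000_000), ("sc_A", "2L", 1_100_000),
    ("sc_C", "3R", 900_000),
]
ultracontigs = order_supercontigs(anchors)
```

In this toy input, sc_B carries one stray anchor on 3R but is still placed on 2L by majority vote, after sc_A.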

 

Association of sequence maps with genetic maps:  Sequence tagged genetic markers (e.g., recombinationally mapped cloned genes, microsatellite markers, SNPs) will be used to associate the supercontigs and/or ultracontigs with the linkage map of each species.  Polytene in situ hybridization using markers from anchor points on the superscaffolds and/or ultracontigs will be used to associate the sequence maps with the cytogenetic map of each chromosome arm.  Thom Kaufman (kaufman@bio.indiana.edu), Bryant McAllister (bryant-mcallister@uiowa.edu) and Teri Markow (tmarkow@public.arl.arizona.edu) have organized this effort and identified people to take the lead in organizing their species community to establish the map associations for each species:

o       melanogaster species group (simulans, yakuba, sechellia and erecta): Michael Ashburner (ma11@gen.cam.ac.uk) and Thom Kaufman (kaufman@bio.indiana.edu)

o       ananassae: Muneo Matsuda (matsudam@kyorin-u.ac.jp) and Kiyohito Yoshida (majin@ees.hokudai.ac.jp)

o       pseudoobscura: Steve Schaeffer (swschaeffer@psu.edu)

o       persimilis: Mohamed Noor (noor@duke.edu)

o       willistoni: Claudia Rohde (claudiarohde@yahoo.com)

o       virilis: Bryant McAllister (bryant-mcallister@uiowa.edu) and Jorge Vieira (jbvieira@ibmc.up.pt)

o       mojavensis: Teri Markow (tmarkow@public.arl.arizona.edu)

o       grimshawi: Patrick O'Grady (pogrady@nature.berkeley.edu)

 

Whole Genome Alignments

Our discussions with alignment groups led us to conclude that it does not make sense to strive for a single set of DNA-based alignments, because they all differ somewhat and people have their own preferences about which ones are most useful for their particular downstream analyses.  We would, however, like to make sure that we end up with alignments similar in quality to those being produced for the human ENCODE regions.  As it stands now, it looks as though we will end up with four different sets of alignments: MERCATOR/MAVID (Pachter), Multiz (UCSC), LAGAN (Sidow/Batzoglou) and TBA (Webb Miller/Karro).

 

While we do not wish to constrain any other group's work in any other area of research on these Drosophila genomes, we are happy to invite others to contribute work that might be appropriate for the main paper on the assembly, annotation and analysis of these species.  Please contact Doug Smith (douglas.smith@agencourt.com), Bill Gelbart (gelbart@morgan.harvard.edu) or Thom Kaufman (kaufman@indiana.edu) regarding any such contributions.

 

Accessing Drosophila Genome Resources

 

 

Sincerely,

 

Doug Smith, Agencourt Inc.  (douglas.smith@agencourt.com)

Bill Gelbart, Harvard U.  (gelbart@morgan.harvard.edu)

Thom Kaufman, Indiana U. (kaufman@bio.indiana.edu)

Michael Eisen, UC Berkeley & LBNL (mbeisen@lbl.gov)

 


Send comments to us at flybase-help AT morgan.harvard.edu