May 10, 2006
Dear Colleagues,
We are writing to bring you up to date with a status report on the comparative Drosophila genome sequencing and our analysis plans.
Sequencing and Assembly
Finalization of the assembly freezes was delayed so that several of the initial assemblies could be improved using elements of alternative assemblies. This "reconciliation" process was carried out by a group at the University of Maryland (Jim Yorke, PI), starting from Arachne, Celera and/or PCAP assemblies. Reconciliation is now complete.
The final reference assemblies, selected by the sequencing centers as the assemblies that will be submitted to GenBank, are listed below. All of the assemblies are available through the AAA site, and BLAST access is provided through FlyBase (see the "Accessing Drosophila Genome Resources" section below).
| Drosophila species | Sequencing & assembly status | Sequencing center |
| virilis | U. MD. Reconciled Arachne/Celera Assembly | Agencourt |
| ananassae | U. MD. Reconciled Arachne/Celera Assembly | Agencourt |
| mojavensis | U. MD. Reconciled Arachne/Celera Assembly | Agencourt |
| erecta | U. MD. Reconciled Arachne/Celera Assembly | Agencourt |
| grimshawi | U. MD. Reconciled Arachne/Celera Assembly | Agencourt |
| willistoni | U. MD. Reconciled Arachne/Celera Assembly | JCVI |
| persimilis | Arachne Assembly | Broad Inst. |
| sechellia | Arachne Assembly | Broad Inst. |
| yakuba | PCAP Assembly | Wash U. |
| simulans | Mosaic PCAP Assembly | Wash U. |
Most species have been sequenced to deep WGS coverage. The persimilis and sechellia projects have been sequenced to low WGS coverage (~3-4X), with the core assemblies derived independently and then enhanced using synteny with related species. For simulans, there is 2.8X coverage of one strain (w501) and 1X coverage of 6 other strains. The mosaic simulans assembly layers reads from these other strains onto the core w501 assembly.
For Drosophila melanogaster, the finished Release_4 arms (BDGP) and draft Release_3.2b heterochromatin (DHGP) will be used as the reference assembly. For Drosophila pseudoobscura, the Release_2 Atlas Assembly (Baylor) will be the reference assembly.
Other Assemblies
While it is necessary to know which assembly will be treated as the reference for the eventual submission of annotated reference genomes to GenBank, annotation groups may find it worthwhile to compare the predictions that emerge from alternative assemblies. For this reason, some additional assemblies are posted on AAA.
Timetables for Submissions of Annotation Sets
The goal is to establish a consensus
annotation set for each genome.
This will occur in two phases:
• By April 30, we ask all annotation groups to submit their annotation sets, based on the final frozen assemblies, to AAA in GFF3 format. Thom Kaufman has organized a workshop on Saturday evening, April 1, 2006 (9:30-11:30pm), at the Drosophila Research Conference in Houston to discuss the status of the analysis of the assemblies, the production of the annotation sets and the plans for publication.
• By mid-June, consensus annotation sets that can be used for downstream analysis will be produced. Early in May, we hope to convene a meeting of the annotation groups to discuss the status of the annotations and the process by which the consensus sets will be produced. We are exploring holding this meeting during the Cold Spring Harbor genome conference.
Downstream Analyses & Initial Publications
If we achieve these targets, our publication goal will be to submit the core manuscripts describing the basic assemblies, annotation sets and overall comparative descriptions by September 1, 2006. So as not to constrain the research of individual laboratories, we propose to coordinate downstream analyses only insofar as we will act as liaisons with one or more journals, so that the manuscripts are published together on an agreed-upon date, and provide information on who is planning which analyses. It will be up to the individual participating groups to make sure that they have completed their analyses and submitted their manuscripts in time for peer review and publication. The editors of Nature and Nature Genetics have expressed a strong interest in publishing the results (including a main paper summarizing the sequencing and major findings, and a collection of papers going into more depth on various aspects of the analysis). The journals Genome Research and Genetics would also be interested in publishing some of the in-depth analyses in an issue timed to appear at about the same time as the main paper.
Mapping Supercontigs onto the Chromosome Arms
There are two approaches being
undertaken to create assemblies that approximate the extent of the euchromatin
of the chromosome arms of each species.
Using synteny to build arm-sized sequence maps: Bill Gelbart's group (gelbart@morgan.harvard.edu) is evaluating the feasibility of aligning supercontigs into chromosome arm-sized units (ultracontigs) using syntenic information.
Association of sequence maps with genetic maps: Sequence-tagged genetic markers (e.g., recombinationally mapped cloned genes, microsatellite markers, SNPs) will be used to associate the supercontigs and/or ultracontigs with the linkage map of each species. Polytene in situ hybridization using markers from anchor points on the supercontigs and/or ultracontigs will be used to associate the sequence maps with the cytogenetic map of each chromosome arm. Thom Kaufman (kaufman@bio.indiana.edu), Bryant McAllister (bryant-mcallister@uiowa.edu) and Teri Markow (tmarkow@public.arl.arizona.edu) have organized this effort and identified people to take the lead in organizing each species community to establish the map associations for that species:
o melanogaster species group (simulans, yakuba, sechellia and erecta): Michael Ashburner (ma11@gen.cam.ac.uk) and Thom Kaufman (kaufman@bio.indiana.edu)
o ananassae: Muneo Matsuda (matsudam@kyorin-u.ac.jp) and Kiyohito Yoshida (majin@ees.hokudai.ac.jp)
o pseudoobscura: Steve Schaeffer (swschaeffer@psu.edu)
o persimilis: Mohamed Noor (noor@duke.edu)
o willistoni: Claudia Rohde (claudiarohde@yahoo.com)
o virilis: Bryant McAllister (bryant-mcallister@uiowa.edu) and Jorge Vieira (jbvieira@ibmc.up.pt)
o mojavensis: Teri Markow (tmarkow@public.arl.arizona.edu)
o grimshawi: Patrick O'Grady (pogrady@nature.berkeley.edu)
Whole Genome Alignments
Our discussions with alignment groups led us to conclude that it does not make sense to strive for a single set of DNA-based alignments, because they all differ somewhat and people have their own preferences about which ones are most useful for their particular downstream analyses. We would, however, like to make sure that we end up with alignments of similar quality to those being produced for the human ENCODE regions. At present, it appears that we will end up with four different sets of alignments: MERCATOR/MAVID (Pachter), Multiz (UCSC), LAGAN (Sidow/Batzoglou) and TBA (Webb Miller/Karro).
While we do not wish to constrain other groups' work in any area of research on these Drosophila genomes, we welcome contributions from others that might be appropriate for the main paper on the assembly, annotation and analysis of these species. Please contact Doug Smith (douglas.smith@agencourt.com), Bill Gelbart (gelbart@morgan.harvard.edu) or Thom Kaufman (kaufman@bio.indiana.edu) regarding any such contributions.
Accessing Drosophila Genome Resources
Sincerely,
Doug Smith, Agencourt Inc. (douglas.smith@agencourt.com)
Bill Gelbart, Harvard U. (gelbart@morgan.harvard.edu)
Thom Kaufman, Indiana U. (kaufman@bio.indiana.edu)
Michael Eisen, UC Berkeley & LBNL (mbeisen@lbl.gov)