As part of the integration of FlyBase services, the Gadfly annotation database and Berkeley Fly BLAST are being retired. Equivalent services are now available from the FlyBase server at Indiana University. This is now the primary public server at http://flybase.net/
We welcome your input. Please e-mail flybase-help AT morgan.harvard.edu with suggestions or questions.
FlyBase-NG includes more changes 'under the hood' than you see on its public web pages. These are part of an evolution to a next generation of genome databases and information systems, and will be ongoing for some time. As FlyBase moves into its second decade, we want to ensure that the best of new genome database methods from the collective wisdom of bioinformatics are used, without sacrificing the best parts, which have taken a decade of work to match to the needs of the Drosophila, genomics and biosciences research communities.
FlyBase got its name in part from SyBase: commercial relational database software used since early days in the project. We are moving to a more completely publicly copyable and usable database, which is shared with other genome database projects and which can be used without commercial software.
This new database is named Chado (after "the Way of Tea", a Japanese tea ceremony, a pleasant name to be home to all those colorful Drosophila gene names). It includes a new design or schema for structuring Drosophila or any other genome information, which is still being worked out. It works with new database software, the freely usable PostgreSQL system (we hope also with other database software). It includes a new data exchange format (Chado XML). And significantly, it involves a much larger group of bioinformaticians sharing the efforts of developing and using these parts. The Chado database parts are all available from the Generic Model Organism Database group, with web site http://www.GMOD.org/ .
Over the coming year more of Drosophila and other genome data will be moving to a home in Chado, and more options for database searching and data mining will become available.
Current FlyBase services straddle older and newer methods, and will do so for a while to come. The best of these methods will come into operation in steps, as we test and work on them.
The initial work with the Chado database has been a migration of genome annotation data from the Gadfly database (MySQL based, with GAME XML data exchange format). At the close of December 2003, we provided the first public use of Chado data in an update of these annotations. This data set has the same genome sequence and annotation locations as the last Gadfly release 3.1, but it contains updated IDs and gene names. It now forms the basis of annotation data searching and reporting at FlyBase.
Some statistics comparing this Chado release 3.1 (called r3.1.0_12182003) with the prior Gadfly database release 3.1, as well as with the incipient update release 3.2, are listed at http://flybase.net/annot/prerelease/
FlyBase-NG can be copied, run on your local computer or informatics center, and is designed for automatic updates to keep it current. It works on popular Unix systems including MacOSX, Linux and Solaris.
We welcome help from bioinformatics centers, including industry and governmental, who wish to provide a copy of FlyBase for local and regional users.
The new genome database/web server infrastructure is called Argos, and is fully open-source, copyable and reusable. See Argos Server documents and installation information (http://flybase.net:8081/) and GMOD project (http://www.gmod.org/argos/)
Argos can be used for other organism genome databases. FlyBase-specific parts are separate from common genome web database parts, which include BLAST, GBrowse, Web server, database and informatics middleware. The euGenes multi-eukaryote genome database, a new Daphnia genome database and others are available using Argos infrastructure.
This Argos underpinning for the new FlyBase server provides a general method for making it robust to high volume usage: the server is automatically clonable, and compute intensive calls can be passed on to any of these clones, transparent to users who see only the main server URL. This is now done with data reports and BLAST calls. You will see at the bottom of many FlyBase web pages: "Run on computer xxxx". Some of our genome web/database software needs to be re-engineered to be used this way, an ongoing task. In time most of FlyBase's web database tasks will be distributed among several computers.
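The idea of passing compute-intensive calls on to clone servers can be sketched roughly as follows. This is only an illustration, not the Argos implementation: the `CloneDispatcher` class, the round-robin choice of clone, and the host names are all assumptions made for the example.

```python
from itertools import cycle


class CloneDispatcher:
    """Hand requests to clone servers in round-robin order (illustrative only)."""

    def __init__(self, clones):
        if not clones:
            raise ValueError("at least one clone server is required")
        self._clones = cycle(clones)

    def dispatch(self, request_path):
        """Pick the next clone and describe where the request would run."""
        clone = next(self._clones)
        return {
            "path": request_path,
            "clone": clone,
            # Mirrors the note shown at the bottom of many pages.
            "footer": f"Run on computer {clone}",
        }


# Hypothetical clone host names, not real FlyBase machines.
dispatcher = CloneDispatcher(["clone-a", "clone-b", "clone-c"])
assignments = [dispatcher.dispatch("/blast")["clone"] for _ in range(4)]
print(assignments)
```

Users would still see only the main server URL; the footer is the one visible hint of which clone did the work. A real dispatcher would likely also weigh current load and skip clones that are down.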
One obvious benefit of this is that many of the computed web pages appear much faster, and as usage increases these will be kept running fast by adding more clone servers. E.g., you now get gene reports about 5 times faster than before (1.2 seconds now versus 6.5 seconds for flybase-old).
Some folks want old data, since it is common practice in science and industry to go back and check on your old work. A new service at FlyBase is the maintenance of archives. We will periodically create frozen copies of the FlyBase database/server, and continue to provide these for public use. You can find archived FlyBase servers with older data at http://flybase.net:8081/flybase-archive/
Some folks want the newest data, even if it hasn't yet passed all quality checks. We have extended our long-running method for adding daily updates to include previews of major new releases of data that is still in the works. Find preview data servers at http://flybase.net:8081/flybase-preview/
Hits per Day 52,000 (ave) 98,000 (max) Usage groups: 23% commercial, 41% unresolved, 14% edu
FlyBase usage a year ago (Jan 2003):
Hits per Day 33,000 (ave) 58,000 (max) Usage groups: 12% commercial, 16% unresolved, 30% edu
Statistics message: daily hits have grown by roughly 1.6x to 1.7x over the year, with many more commercial users, robots, data miners and other high-volume users. FlyBase is not yet as busy as the NASA Mars Rover landing day, by a large factor, but as it grows we are ready to use similar methods of distributing usage among as many servers as needed.
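For readers who want to check the arithmetic, the growth can be worked out from the hit counts quoted above. This is a small sketch, not part of any FlyBase tooling. The estimate of commercial hits assumes that the usage-group percentages apply to the average daily hit count, which the figures themselves do not state.

```python
# Hit counts and commercial share as quoted in the usage statistics.
jan_2003 = {"average": 33_000, "maximum": 58_000, "commercial_percent": 12}
jan_2004 = {"average": 52_000, "maximum": 98_000, "commercial_percent": 23}


def growth_factor(before, after):
    """Ratio of the newer value to the older one."""
    return after / before


def commercial_hits(stats):
    """Assumption: the commercial share applies to average daily hits."""
    return stats["average"] * stats["commercial_percent"] // 100


average_growth = round(growth_factor(jan_2003["average"], jan_2004["average"]), 2)
maximum_growth = round(growth_factor(jan_2003["maximum"], jan_2004["maximum"]), 2)
commercial_growth = round(
    growth_factor(commercial_hits(jan_2003), commercial_hits(jan_2004)), 2
)

print(average_growth, maximum_growth, commercial_growth)  # 1.58 1.69 3.02
```

Under that assumption, overall hits grew by somewhat less than 2x, while estimated commercial hits roughly tripled.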
It is annoying to find the main FlyBase web server taking coffee breaks in the middle of the morning. Most of the recent server outages have been caused by misbehaving, over-eager robots and data-miners, and Microsoft Explorer web-archive-everything calls. These are just a small percent of clients, but when a single robot misbehaves it can drive a web server to its knees.
We switched in December from ancient Apache web software to the most widely used web server, and added complexity to handle a higher volume of compute intensive programs (blast, etc.). This came with the usual problems of "newer is better" software: slower, more complex, more memory intensive, etc. As well, this new web server has a greater tendency to tie up the entire computer when something goes wrong, due to a traffic-jam style backup of problems. The cure for this is attention to failure details, and adding various checks and blocks to keep it stable under a wide range of web client uses, including those valued data miners in biology who want lots of data right away.
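One kind of check that keeps a single over-eager robot from tying up a server is a per-client request limit. The sketch below is an assumption about how such a block might look, not a description of the checks actually installed on the FlyBase server. The `RequestLimiter` class, the sliding-window approach, and the limits used are all hypothetical.

```python
from collections import defaultdict, deque


class RequestLimiter:
    """Allow each client at most max_requests within a sliding time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # client -> times of allowed requests

    def allow(self, client, now):
        """Return True if the client may make a request at time `now` (seconds)."""
        recent = self._recent[client]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: block this request
        recent.append(now)
        return True


# Hypothetical limit: 3 requests per 10 seconds per client.
limiter = RequestLimiter(max_requests=3, window_seconds=10.0)
results = [limiter.allow("robot", now=t) for t in (0.0, 1.0, 2.0, 3.0, 12.0)]
print(results)  # [True, True, True, False, True]
```

Because limits are tracked per client, a well-behaved data miner is unaffected when some other robot is being blocked.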
---------- Don Gilbert ; 17 January 2004