This is a quick update on the status of the DOE-JGI E. grandis genome project and the associated genome paper. There have been some important developments that will affect the genome analysis, the schedule of the genome paper and the content of the companion papers.
UPDATED ANNOTATION (V1.1)
The JGI has produced a new filtered annotation set (V1.1) of the initial 8X mapped Eucalyptus grandis BRASUZ1 genome assembly. V1.1 is a subset of the initial annotation (V1.0) and was released together with Phytozome 8.0 (Jan 16, 2012). It was produced by filtering out 8,620 low-confidence gene models from the original V1.0 annotation, using criteria described on Phytozome (http://www.phytozome.net/eucalyptus.php). Most of the removed gene models are short, incomplete models with little or no EST support, and most form singletons in the Phytozome gene cluster database (i.e. they do not cluster with any other known proteins). In short, we consider this subset (36,376 primary protein-coding loci) to be high confidence and will report it as such in the main genome paper. Some contributors have already indicated that they have expression evidence for some of the removed gene models, or that they can find full-length versions of these models using their own gene predictions. It is therefore likely that some legitimate gene models are included in the filtered (removed) set, which is why V1.0 will remain available in Phytozome (as a browser track and searchable). In my opinion, the filtered set is itself an important part of the genome biology of Eucalyptus: we see many pseudogenes, and it is possible that the genome is undergoing gene loss following whole-genome duplication and high rates of tandem duplication. The V1.1 annotation files are available for download at ftp://ftp.jgi-psf.org/pub/JGI_data/phytozome/v8.0/Egrandis/annotation/ and the low-confidence filtered set is also available there.
We have decided to report gene numbers for the initial V1.0 annotation in the main genome paper and to report the V1.1 subset as a "high confidence" subset. This means that collaborators working on companion papers can keep V1.0 models in figures and tables, and then provide summary data for the high-confidence models in their papers and supplementary materials as needed. We would, however, prefer that people highlight the high-confidence subset rather than the low-confidence filtered set.
GENOME PAPER DRAFT
I am at JGI this week, working with Jerry Tuskan on the first full draft of the main genome paper. By the end of the week, we will have a much better idea of which contributed analyses, figures and tables can be included. Some of the analyses done for the genome paper (e.g. the whole-genome duplication analysis) will have to be rerun with V1.1, and in some cases we will also need both a short and a detailed description of the methods used to produce the contributed results (for the Online Methods and Supplementary Notes sections of the paper). We will contact the people concerned directly, so please be ready to provide updated results (V1.1 numbers) and methods for the genome paper draft. At this stage, we aim to produce a final draft by the end of March or early April, and we will circulate the first draft to core collaborators as soon as possible.
We have a list of draft companion paper abstracts that we will use to make the case for a special issue of New Phytologist. Anybody who wants to submit additional abstracts for consideration can email them to me by the end of this week. An updated list of submitted abstracts will be uploaded to the Eucalyptus Genome Wiki on BOGAS (http://bioinformatics.psb.ugent.be/webtools/bogas/).
At this stage, we would like companion papers to be in final draft form by mid-April. Papers that are not included in the New Phytologist special issue can be submitted to other relevant journals such as BMC Genomics and Tree Genetics and Genomes.
E. CAMALDULENSIS GENOME
The Kazusa DNA Research Institute in Japan has decided to release the E. camaldulensis genome sequence data it has produced, via the genome database on its website (http://www.kazusa.or.jp/eucaly/). Details of the database construction will be published on the Japanese Society for Plant Cell and Molecular Biology website (http://www.jspcmb.jp/english/pbcontents/index.html). We are delighted about this development. The E. camaldulensis genome data, together with the E. globulus genome data produced by JGI and by collaborators in Australia, will be very valuable for comparative genomics among E. grandis, E. camaldulensis and E. globulus, representatives of three main sections of the subgenus Symphyomyrtus, which includes most of the commercially grown eucalypts. We encourage collaborators to make use of the E. camaldulensis genome resource.
I hope that you find this information useful. Please email me if you have any questions or suggestions for the main genome paper or companion papers.