MarFERReT, an open-source, version-controlled reference library of marine microbial eukaryote functional genes

Title: MarFERReT, an open-source, version-controlled reference library of marine microbial eukaryote functional genes
Publication Type: Journal Article
Year of Publication: 2023
Authors: Groussman R.D., Blaskowski S., Coesel S.N., Armbrust E.V.
Journal: Scientific Data
Volume: 10
Pagination: 926
ISSN: 2052-4463
Keywords: Classification and taxonomy, Microbial genetics, Transcriptomics
Abstract

Metatranscriptomics generates large volumes of sequence data about transcribed genes in natural environments. Taxonomic annotation of these datasets depends on availability of curated reference sequences. For marine microbial eukaryotes, current reference libraries are limited by gaps in sequenced organism diversity and barriers to updating libraries with new sequence data, resulting in taxonomic annotation of about half of eukaryotic environmental transcripts. Here, we introduce Marine Functional EukaRyotic Reference Taxa (MarFERReT), a marine microbial eukaryotic sequence library designed for use with taxonomic annotation of eukaryotic metatranscriptomes. We gathered 902 publicly accessible marine eukaryote genomes and transcriptomes and assessed their sequence quality and cross-contamination issues, selecting 800 validated entries for inclusion in MarFERReT. Version 1.1 of MarFERReT contains reference sequences from 800 marine eukaryotic genomes and transcriptomes, covering 453 species- and strain-level taxa, totaling nearly 28 million protein sequences with associated NCBI and PR2 Taxonomy identifiers and Pfam functional annotations. The MarFERReT project repository hosts containerized build scripts, documentation on installation and use case examples, and information on new versions of MarFERReT.

URL: https://www.nature.com/articles/s41597-023-02842-4
DOI: 10.1038/s41597-023-02842-4