QTLTableMiner++.
Search open-access articles for genes associated with traits and make these data increasingly FAIR.
QTL TableMiner++ (QTM)
Description
A significant amount of experimental information about Quantitative Trait Locus (QTL) studies is described in (heterogeneous) tables of scientific articles. Briefly, a QTL is a genomic region that correlates with a trait of interest (phenotype). QTM is a command-line tool to retrieve and semantically annotate results obtained from QTL mapping experiments. It takes full-text articles from the Europe PMC repository as input and writes the extracted QTLs to a relational database (SQLite) and a text file (CSV).
Requirements
- Java 1.7 or later
- Apache Maven 3.x
- SQLite 3.x
- Apache Solr 6.x with domain-specific vocabularies and ontologies (Solr cores):
  - Gene Ontology (GO)
  - Plant Trait Ontology (TO)
  - Phenotypic quality ontology (PATO)
  - Solanaceae Phenotype Ontology (SPTO)
  - STATistics Ontology (STATO)
  - Chemical Entities of Biological Interest (ChEBI)
- access to full-text articles (in XML) from Europe PMC
Installation
git clone https://github.com/PBR/QTM.git
cd QTM
mvn install
solr/install_solr.sh
Example use
- input: articles.txt with PMCIDs (one per line)
- output: qtl.csv and qtl.db (see the database model or Entity-Relationship diagram here)
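Because the input is a plain list of PMCIDs, it can be useful to check the file before running QTM. The following is a minimal sketch, not part of QTM: the helper name `read_pmcids` is hypothetical, and it assumes that a PMCID is the prefix `PMC` followed by digits and that blank lines and duplicates should be dropped. How QTM itself treats such lines is not documented here.

```python
import re

# Assumption: a PMCID is "PMC" followed by one or more digits.
PMCID_PATTERN = re.compile(r"^PMC\d+$")


def read_pmcids(lines):
    """Return the unique, well-formed PMCIDs in their original order.

    Blank lines are skipped; any other malformed line raises ValueError.
    """
    seen = set()
    pmcids = []
    for number, raw in enumerate(lines, start=1):
        pmcid = raw.strip()
        if not pmcid:
            continue  # skip blank lines
        if not PMCID_PATTERN.match(pmcid):
            raise ValueError(f"line {number}: not a PMCID: {pmcid!r}")
        if pmcid not in seen:
            seen.add(pmcid)
            pmcids.append(pmcid)
    return pmcids
```

A file can be checked with `read_pmcids(open("articles.txt"))`, since iterating over an open text file yields its lines.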
./QTM articles.txt
./QTM -h
```
...
USAGE
  QTM [-v|-h]
  QTM [-o FILE_PREFIX] FILE

ARGUMENTS
  FILE                      List of full-text articles from Europe PMC.
                            Enter one PMCID per line.

OPTIONS
  -o, --output FILE_PREFIX  Output files in SQLite/CSV formats.
                            (default: qtl.{db,csv})
  -v, --version             Print software version.
  -h, --help                Print this help message.
```
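After a run, the two output files can be inspected without knowing the database model in advance. The sketch below is an illustration only: the helper names `list_tables` and `csv_header` are hypothetical, and the code makes no assumption about QTM's table or column names. It reads the table names from SQLite's built-in `sqlite_master` catalogue and the first row of the CSV file.

```python
import csv
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    # Note: sqlite3.connect() creates an empty file if db_path does not exist.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def csv_header(csv_path):
    """Return the first row of a CSV file, or an empty list if the file is empty."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])
```

With the default output prefix, `list_tables("qtl.db")` and `csv_header("qtl.csv")` give a quick overview of what was extracted.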
Note: The example I/O files are provided in the data directory. In case you don't have Internet access or the Europe PMC API does not work, please copy the articles (.xml) from this directory to the root of this repository.
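For articles that are not in the data directory, the full-text XML can be fetched ahead of time while a connection is available. This is a hedged sketch and not QTM's own download code: the helper names are hypothetical, it uses the Europe PMC RESTful `fullTextXML` endpoint, and it assumes the file should be saved as `<PMCID>.xml`. Check the files in the data directory for the naming QTM actually expects.

```python
import urllib.request
from pathlib import Path

# Europe PMC RESTful web service base URL.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest"


def fulltext_url(pmcid):
    """Build the full-text XML URL for a PMCID such as 'PMC1234567'."""
    return f"{BASE_URL}/{pmcid}/fullTextXML"


def download_fulltext(pmcid, target_dir="."):
    """Download one article and save it as <target_dir>/<PMCID>.xml.

    The file name is an assumption, not a documented QTM convention.
    """
    target = Path(target_dir) / f"{pmcid}.xml"
    with urllib.request.urlopen(fulltext_url(pmcid), timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Calling `download_fulltext` once per line of articles.txt would place the files in the current directory, which matches the note above when run from the root of the repository.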