The MRPrimerV database compiled 152,380,247 high-quality PCR primer pairs for detection of 1,818 viruses, covering 2,963 viral genomes or segments and 7,144 CDSs, representing 100% of the RNA viruses in the most up-to-date NCBI RefSeq database. Due to rigorous and large-scale homology testing against all 101,684 human sequences and all viral sequences, every primer pair in MRPrimerV is highly target-specific. In addition, because MRPrimerV ranks CDSs by the penalty scores of their best primer pairs, users need only pick and use the first primer pair for a single-phase PCR experiment, or the first two primer pairs for two-phase PCR experiments.
To obtain all feasible and valid primers for RNA virus detection, we applied multiple filtering constraints and performed rigorous, large-scale homology testing against all human gene sequences using the MRPrimer technology (see Publications), which returns all feasible and valid primer pairs in the RNA virus sequences. Built on the distributed MapReduce framework, MRPrimer simultaneously checks the filtering constraints and performs homology testing on all possible subsequences in a given database, which yields very high-quality primers.
We used the entire set of RNA virus sequences in the most up-to-date RefSeq (NCBI Reference Sequence) database that have at least one coding sequence (CDS). The total number of such viruses is 1,818: 1,400 have non-segmented genomes and 418 have segmented genomes. The 418 segmented genomes comprise 1,572 segments in total. Some viral genomes or segments have multiple CDSs, and the total number of CDSs across the 1,818 viruses is 7,144. The RefSeq database provides a comprehensive, integrated, non-redundant, well-annotated set of sequences and reference standards for multiple purposes, including genome annotation, gene identification, and comparative analyses. However, the RefSeq database contains sequences for only the main genotypes of each virus. For example, it contains sequences for most of the conserved genomic regions of HIV-1, such as the gag-pol CDS, but not sequences for other subtypes such as HIV-1 group M subtype B. Thus, it would be difficult to use primers designed from the RefSeq database to detect viruses belonging to such subtypes.
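The counts in the paragraph above can be tallied as follows. This is only a small arithmetic check; the variable names are illustrative and are not part of MRPrimerV.

```python
# Counts quoted in the text above.
viruses_non_segmented = 1_400   # viruses with a non-segmented genome
viruses_segmented = 418         # viruses with a segmented genome
segments_total = 1_572          # segments across the 418 segmented genomes

# Every virus is either non-segmented or segmented.
total_viruses = viruses_non_segmented + viruses_segmented

# Each non-segmented genome counts once; each segment of a segmented
# genome counts separately as a potential target sequence.
total_genomes_or_segments = viruses_non_segmented + segments_total

print(total_viruses)              # 1818
print(total_genomes_or_segments)  # 2972
```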
MRPrimer takes as input a DNA sequence database and several filtering constraints, and yields as output sorted primer pairs that pass both homology testing and the following filtering constraints. For filtering constraints, MRPrimerV considers eight parameters for each primer and five parameters for each pair.
|Single filtering||primer length||19~23 bp||19~23 bp|
|melting temperature (TM)||58~62℃||57~62℃|
|self-complementarity||< 5-mer||< 9-mer|
|3’ self-complementarity||< 4-mer||< 4-mer|
|contiguous residue||< 6-mer||< 6-mer|
|End stability (∆G)||>= -9 kcal/mol||>= -9 kcal/mol|
|Hairpin||< 4||< 9|
|Pair filtering||length difference||<= 3-mer||<= 5-mer|
|TM difference||<= 5℃||<= 5℃|
|product size||100~500 bp||70~500 bp|
|pair-complementarity||< 5-mer||< 9-mer|
|3’ pair-complementarity||< 4-mer||< 4-mer|
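As a rough illustration of how some of these constraints could be applied, the sketch below checks a few of them using the thresholds from the first value column of the table. It is not the MRPrimer implementation: the `Primer` class, the function names, the example sequences, and the Tm values are all hypothetical. Melting temperature is taken as a precomputed input, and "contiguous residue" is assumed here to mean the longest run of a single repeated base.

```python
from dataclasses import dataclass


@dataclass
class Primer:
    seq: str   # primer sequence, 5'->3' (illustrative only)
    tm: float  # melting temperature in Celsius, assumed to be computed elsewhere


def longest_run(seq: str) -> int:
    """Return the length of the longest run of one repeated base."""
    best = run = 0
    prev = ""
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def passes_single(p: Primer) -> bool:
    """Check three of the single-primer constraints from the table."""
    return (
        19 <= len(p.seq) <= 23       # primer length 19~23 bp
        and 58.0 <= p.tm <= 62.0     # melting temperature 58~62 C
        and longest_run(p.seq) < 6   # contiguous residue < 6-mer (assumed meaning)
    )


def passes_pair(fwd: Primer, rev: Primer, product_size: int) -> bool:
    """Check three of the pair constraints from the table."""
    return (
        abs(len(fwd.seq) - len(rev.seq)) <= 3   # length difference <= 3-mer
        and abs(fwd.tm - rev.tm) <= 5.0         # Tm difference <= 5 C
        and 100 <= product_size <= 500          # product size 100~500 bp
    )


# Made-up example primers; the sequences and Tm values are placeholders.
fwd = Primer("ATGCGTACGTTAGCCTAGGA", 59.5)
rev = Primer("TCCGATTGACCGTAAGCTGACT", 60.8)
poly_a = Primer("ATGAAAAAAGTACGTTAGCC", 59.0)  # contains a run of six A's

ok = passes_single(fwd) and passes_single(rev) and passes_pair(fwd, rev, 240)
print(ok)                     # True
print(passes_single(poly_a))  # False
```

The complementarity, end-stability, and hairpin checks are omitted because they require alignment and thermodynamic calculations that the text does not specify.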
The MRPrimerV database consists of nine key-value tables: one table for PCR primers, one table for TaqMan probes, five partial annotation tables for the five query types, one full annotation table for viral genomes, and one full annotation table for viral coding sequences. The database is physically stored in Redis, an in-memory key-value store that supports a variety of data structures as values.
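The nine-table layout can be pictured with the minimal sketch below, which uses plain Python dictionaries as a stand-in for the key-value store. The table names, key formats, and record fields are all hypothetical, since the actual MRPrimerV schema is not described here. The five partial annotation tables are assumed to correspond to the five query inputs accepted by the search interface (organism, keywords, GenBank accession, NCBI gene symbol, NCBI Gene ID).

```python
# Each dictionary plays the role of one key-value table.
# All names, keys, and values are placeholders for illustration.
tables = {
    "primer": {},             # CDS id -> list of primer pair records
    "probe": {},              # CDS id -> list of TaqMan probe records
    "by_organism": {},        # partial annotation tables: query term -> CDS ids
    "by_keyword": {},
    "by_accession": {},
    "by_gene_symbol": {},
    "by_gene_id": {},
    "genome_annotation": {},  # genome or segment id -> full annotation
    "cds_annotation": {},     # CDS id -> full annotation
}

# Populate with placeholder data.
tables["by_accession"]["ACC_PLACEHOLDER"] = ["CDS_1"]
tables["primer"]["CDS_1"] = [
    {"forward": "FWD_SEQ", "reverse": "REV_SEQ", "penalty": 0.8},
    {"forward": "FWD_SEQ_2", "reverse": "REV_SEQ_2", "penalty": 1.4},
]


def search(query_type: str, term: str) -> dict:
    """Resolve a query term to CDS ids, then fetch primer pairs for each CDS."""
    cds_ids = tables[f"by_{query_type}"].get(term, [])
    return {cds: tables["primer"].get(cds, []) for cds in cds_ids}


result = search("accession", "ACC_PLACEHOLDER")
print(len(tables))          # 9
print(list(result.keys()))  # ['CDS_1']
```

A two-step lookup of this kind (a small index table per query type, then the primer table) keeps each read a direct key access, which suits an in-memory key-value store.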
MRPrimerV provides two interfaces for finding primer pairs for a target RNA virus: simple search and glossary. In the simple search interface, users enter a target RNA virus (as organism, keywords, GenBank accession, NCBI gene symbol, or NCBI Gene ID) and click the search button. MRPrimerV then immediately outputs the best primer pairs for each coding sequence of the target RNA virus. On the glossary page, RNA viruses are listed alphabetically, so users can browse them and get the primer pairs for a specific virus by clicking on it.
|Total number of non-segmented genomes or segments||2,972 (100%)||2,972 (100%)|
|Number of non-segmented genomes or segments covered by CDS-specific primers||2,944 (99.1%)||2,960 (99.6%)|
|Number of non-segmented genomes or segments covered by both CDS- and virus-specific primers||2,955 (99.4%)||2,963 (99.7%)|
|Total number of RNA viruses||1,818 (100%)||1,818 (100%)|
|Number of viruses covered by MRPrimerV||1,817 (99.9%)||1,818 (100%)|
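The percentages in the table above follow directly from the counts. The short check below recomputes them; the function name is illustrative.

```python
def coverage(covered: int, total: int) -> float:
    """Return coverage as a percentage rounded to one decimal place."""
    return round(100 * covered / total, 1)


total_genomes_or_segments = 2_972
total_viruses = 1_818

# First and second value columns of the table, row by row.
print(coverage(2_944, total_genomes_or_segments))  # 99.1
print(coverage(2_960, total_genomes_or_segments))  # 99.6
print(coverage(2_955, total_genomes_or_segments))  # 99.4
print(coverage(2_963, total_genomes_or_segments))  # 99.7
print(coverage(1_817, total_viruses))              # 99.9
print(coverage(1_818, total_viruses))              # 100.0
```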
MRPrimerV supports the following browsers:
Users are welcome to send their own experimental validation results obtained using MRPrimerV. Please include the following information with the validation results.
Send email with questions or comments about this web site.