Interpreting the cluster pages
This page explains the different parts of the cluster pages. Please refer to the VirMic
introduction page for an overview of the project and its goals.
Overview
The VirMic clusters represent groups of genes that are likely to be of viral origin and to share the same function. The clusters are generated largely automatically and may suffer from redundancy (several clusters representing the same function) or errors (wrong annotations, different functions within the same cluster, etc.). The VirMic cluster pages provide information that helps you determine whether a cluster is authentic and find its relations to other clusters. Specifically, you will find information on the following types of analysis: Line Islands recruitment, sequence similarity statistics, and neighboring clusters on the same scaffold.
Line Islands recruitment
Each scaffold was BLASTed against combined datasets of three viral and three microbial samples from three of the four Northern Line Islands (one viral and one microbial sample from each location). Samples from Christmas Island were excluded because a large portion of microbial sequences was discovered in the viral samples, which are therefore probably not authentic. Recruitment is defined as the percentage of positions on the scaffold that are covered by at least one recruited Line Islands read, and therefore ranges from 0 to 100%.
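The recruitment definition above can be sketched as a small coverage calculation. This is only an illustration under stated assumptions, not the VirMic pipeline itself: the function name `recruitment_percent` and the input format (1-based, inclusive alignment coordinates on the scaffold, as typically reported by BLAST) are hypothetical.

```python
def recruitment_percent(scaffold_length, alignments):
    """Percent of scaffold positions covered by at least one recruited read.

    alignments: iterable of (start, end) scaffold coordinates, 1-based and
    inclusive. start may be larger than end (reverse-strand hits).
    The input format is an assumption made for this sketch.
    """
    if scaffold_length <= 0:
        raise ValueError("scaffold_length must be positive")

    # Normalize orientation and clip every alignment to the scaffold.
    spans = []
    for start, end in alignments:
        lo, hi = min(start, end), max(start, end)
        lo, hi = max(lo, 1), min(hi, scaffold_length)
        if lo <= hi:
            spans.append((lo, hi))
    spans.sort()

    # Sweep left to right, counting each position only once.
    covered = 0
    last_counted = 0  # rightmost position already counted
    for lo, hi in spans:
        if hi <= last_counted:
            continue  # fully inside a region counted earlier
        covered += hi - max(lo, last_counted + 1) + 1
        last_counted = hi

    return 100.0 * covered / scaffold_length
```

For example, on a scaffold of length 100, reads aligned at positions 1-10, 5-20 and 50-59 cover 30 distinct positions, so the recruitment is 30%. Overlapping reads do not increase the value, which is why it cannot exceed 100%.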
High recruitment from the viral samples together with low recruitment from the microbial ones probably indicates a scaffold of viral origin. Zero or near-zero recruitment from both sample types probably means that the sequence is not abundant in the Line Islands, and therefore no conclusion can be drawn about its origin.
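The two rules of thumb above can be written down as a simple decision function. This is a minimal sketch: the text does not give numeric cutoffs, so the thresholds `high`, `low` and `near_zero` below are purely hypothetical placeholders, as is the function name.

```python
def interpret_recruitment(viral_pct, microbial_pct,
                          high=50.0, low=10.0, near_zero=1.0):
    """Apply the two interpretation rules to a pair of recruitment values.

    All three thresholds are illustrative assumptions, not VirMic values.
    """
    # Rule 2: (near-)zero recruitment from both sample types.
    if viral_pct <= near_zero and microbial_pct <= near_zero:
        return "not abundant in the Line Islands; origin undetermined"
    # Rule 1: high viral recruitment together with low microbial recruitment.
    if viral_pct >= high and microbial_pct <= low:
        return "probably viral"
    # The text gives no rule for the remaining combinations.
    return "no rule applies"
```

With these placeholder cutoffs, `interpret_recruitment(80, 5)` yields "probably viral", while `interpret_recruitment(0, 0.5)` reports that the origin cannot be determined.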
The figure below summarizes recruitment for all VirMic scaffolds (blue dots) against control GOS scaffolds carrying a fragment of the 16S rDNA gene (red dots). As can be observed, the VirMic scaffolds usually show higher recruitment from the viral samples (X axis), in contrast to the 16S scaffolds, whose microbial recruitment is much higher than their viral recruitment.
Sequence similarity statistics
Sequence similarity analysis provides statistics on the best hits obtained when cluster members are BLASTed against
For VirMic:Microbial clusters, the results may be interpreted as follows:
Note that for small clusters, the number of alignments against VirMic is expected to be low.
RefSeq hits
This table lists all RefSeq hits associated with this cluster. "Occurences" is the number of cluster members that each RefSeq hit represents (with the percentage of all cluster members), and "Description" is the protein annotation as it appears in refseq-microbial or viral.
When the descriptions of the majority of RefSeq hits are consistent, the cluster is probably a true one. Otherwise (many inconsistent RefSeq hits), the cluster may be wrong and should be validated further using other methods.
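The occurrence counts and the majority check described above can be illustrated as follows. This is a sketch under assumptions: the input format (one best hit per cluster member, or `None` when a member has no hit), the function names, and the use of exact string matching to decide that two descriptions are "consistent" are all hypothetical simplifications.

```python
from collections import Counter


def occurrence_table(best_hits):
    """Build rows of (hit_id, description, occurrences, percent).

    best_hits: one entry per cluster member, either a
    (hit_id, description) tuple or None (assumed input format).
    The percentage is taken over all cluster members.
    """
    total = len(best_hits)
    counts = Counter(hit for hit in best_hits if hit is not None)
    return [
        (hit_id, description, n, 100.0 * n / total)
        for (hit_id, description), n in counts.most_common()
    ]


def majority_description(table):
    """Return the description shared by more than half of the hits, else None.

    Descriptions are compared as exact strings, which is a simplification.
    """
    per_description = Counter()
    for _hit_id, description, n, _percent in table:
        per_description[description] += n
    if not per_description:
        return None
    description, n = per_description.most_common(1)[0]
    return description if 2 * n > sum(per_description.values()) else None
```

For a five-member cluster where two members hit `hit_A` and one hits `hit_B` (both described as "terminase"), one hits `hit_C` ("hypothetical protein") and one has no hit, `hit_A` gets 2 occurrences (40%), and "terminase" is returned as the majority description. In practice, differently worded but equivalent annotations would need manual inspection.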
COGs
This list contains the best-hit statistics from BLASTing all cluster members against the COG database. As with the RefSeq hits, our confidence in the cluster annotation and makeup increases when its members are associated with one or a few consistent COGs.
Neighboring clusters on the same scaffolds
This list shows other clusters that were found on the same scaffolds as this cluster. It can be used for additional validation of the tagging of cluster members, and for finding related clusters that occur in the same context as this one.
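A neighbor list of this kind can be derived from a mapping of scaffolds to the clusters found on them. The sketch below assumes such a mapping is available; the function name, the input format and the cluster and scaffold identifiers are hypothetical.

```python
from collections import Counter


def neighboring_clusters(scaffold_to_clusters, cluster_id):
    """List clusters that share at least one scaffold with cluster_id.

    scaffold_to_clusters: dict mapping a scaffold name to the cluster IDs
    found on that scaffold (assumed input format).
    Returns (neighbor_id, shared_scaffold_count) pairs, most frequent first.
    """
    neighbors = Counter()
    for clusters in scaffold_to_clusters.values():
        present = set(clusters)  # count each cluster once per scaffold
        if cluster_id in present:
            neighbors.update(present - {cluster_id})
    return neighbors.most_common()


# Hypothetical example data.
example = {
    "scaffold_1": ["c1", "c2", "c3"],
    "scaffold_2": ["c1", "c2"],
    "scaffold_3": ["c4"],
}
print(neighboring_clusters(example, "c1"))
```

Neighbors that recur on many scaffolds are the most useful for validation, since a single shared scaffold may be coincidental.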