Manual inspection of microbial clusters

Manual inspection was carried out in two stages. First, all microbial clusters were scanned quickly to remove clearly false clusters. Next, the remaining clusters were inspected thoroughly to verify both the origin of the cluster members and the cluster annotation. The following criteria were used:

Cluster annotation

  • % identity of cluster members to proxy proteins
  • Agreement between the start/end coordinates of the proxy proteins in the alignments and their true start and end positions. For example, for a gene that is not located at the edge of a scaffold or next to a run of N's, we would expect an alignment that spans the full length of its proxy protein. In other cases, we would expect an alignment to either the beginning or the end of the proxy protein, but not to an internal portion.
  • Possible alternative annotations (assessed by running BLAST on cluster members against nr and examining the best hits)
  • Possible alignment due to low-complexity regions
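
The coordinate criterion above can be sketched in code. The inspection described here was manual, so the following Python is only an illustration of the reasoning, not the procedure actually used. The `Alignment` fields, the function names, and the 10-residue `tolerance` are all hypothetical. The sketch also assumes that a gene truncated by a scaffold edge or a run of N's may still align to the full proxy protein.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one gene-to-proxy-protein alignment."""
    proxy_start: int             # 1-based start of the aligned region on the proxy
    proxy_end: int               # 1-based inclusive end of the aligned region
    proxy_length: int            # full length of the proxy protein
    gene_at_scaffold_edge: bool  # gene lies at a scaffold edge or next to N's


def classify_alignment(aln: Alignment, tolerance: int = 10) -> str:
    """Label the aligned region as 'full', 'terminal', or 'internal'.

    `tolerance` (an assumed value) is the number of unaligned residues
    allowed at each end of the proxy protein.
    """
    covers_start = aln.proxy_start <= 1 + tolerance
    covers_end = aln.proxy_end >= aln.proxy_length - tolerance
    if covers_start and covers_end:
        return "full"
    if covers_start or covers_end:
        return "terminal"
    return "internal"


def is_consistent(aln: Alignment, tolerance: int = 10) -> bool:
    """Check the alignment against the expectation stated in the text."""
    kind = classify_alignment(aln, tolerance)
    if aln.gene_at_scaffold_edge:
        # A truncated gene should match the beginning or end of the proxy,
        # never only an internal portion.
        return kind in ("full", "terminal")
    # A complete gene should align along the full length of its proxy.
    return kind == "full"
```

Under these assumptions, a gene in the middle of a scaffold that aligns only to residues 120-300 of a 300-residue proxy would be flagged for closer inspection, whereas the same alignment for a gene at a scaffold edge would be accepted.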

Origin

  • Support from Northern Line Islands recruitment for the affiliation of the scaffolds containing the cluster genes
  • % of viral genes on scaffolds containing cluster genes (higher is better)
  • Origin of viral genes on cluster scaffolds (the same origin for most viral genes is stronger evidence than varying origins)
  • Alignment quality of viral genes to their RefSeq hits
  • Possible alternative non-viral affiliation for viral genes. This was checked using BLAST against nr in cases where the viral affiliation of a scaffold was uncertain
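
Two of the origin criteria are simple quantities: the fraction of viral genes on a scaffold, and how consistent the origins of those viral genes are. They could be summarized per scaffold as in the sketch below. This is an illustration only. The function name, the `(is_viral, origin)` input format, and the origin labels in the usage example are hypothetical, and no acceptance thresholds are implied because the text gives none.

```python
from collections import Counter
from typing import List, Optional, Tuple


def scaffold_viral_summary(
    genes: List[Tuple[bool, Optional[str]]],
) -> Tuple[float, Optional[str], float]:
    """Summarize the viral evidence for one scaffold.

    `genes` holds one (is_viral, origin) pair per gene on the scaffold;
    `origin` is an arbitrary label for the source of the viral hit and is
    None for non-viral genes.

    Returns (viral_fraction, dominant_origin, origin_consistency), where
    origin_consistency is the share of viral genes with the dominant origin.
    """
    total = len(genes)
    viral_origins = [origin for is_viral, origin in genes if is_viral]
    viral_fraction = len(viral_origins) / total if total else 0.0
    if not viral_origins:
        return viral_fraction, None, 0.0
    dominant_origin, count = Counter(viral_origins).most_common(1)[0]
    return viral_fraction, dominant_origin, count / len(viral_origins)


# Usage with made-up labels: three viral genes out of four, two of them
# sharing the same origin.
example = [
    (True, "origin_A"),
    (True, "origin_A"),
    (True, "origin_B"),
    (False, None),
]
fraction, dominant, consistency = scaffold_viral_summary(example)
```

A higher viral fraction and a higher origin consistency both count as stronger evidence under the criteria listed above. The remaining criteria (alignment quality and alternative non-viral affiliations) still require case-by-case inspection.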