Investigating the viral dark matter of the rumen ecosystem using a global virome database

Jun 18, 2023

Nature Communications volume 14, Article number: 5254 (2023)

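The diverse rumen virome can modulate the rumen microbiome, but it remains largely unexplored. Here, we mine 975 published rumen metagenomes for viral sequences, create a global rumen virome database (RVD), and analyze the rumen virome for diversity, virus-host linkages, and potential roles in affecting rumen functions. Containing 397,180 species-level viral operational taxonomic units (vOTUs), RVD substantially increases the detection rate of rumen viruses from metagenomes compared with IMG/VR V3. Most of the classified vOTUs belong to Caudovirales and differ from those found in the human gut. The rumen virome is predicted to infect the core rumen microbiome, including fiber degraders and methanogens, and it carries diverse auxiliary metabolic genes; it therefore likely affects the rumen ecosystem in both a top-down and a bottom-up manner. RVD and the findings provide useful resources and a baseline framework for future research into how viruses may affect the rumen ecosystem and digestive physiology.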

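Recent virus-focused metagenomic studies have generated very large catalogs and databases of viral genomes for several ecosystems, including the oceans [1,2], the human gut [3,4,5], and soil [6]. They have revealed highly diverse viromes, identified numerous auxiliary metabolic genes (AMGs), and shed new light on the ecological impact of viruses. Moreover, studies focused on model systems have begun to reveal how viruses can reprogram the metabolism of their prokaryotic hosts by forming distinct virocells that alter the hosts' ecological fitness and metabolism [7]. Emerging evidence supports the potential impact of viruses on ocean biogeochemistry [1,8], human physiology [4], and disease states [9]. No comparable studies exist for rumen viruses, and there is no rumen-specific virus database.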

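The rumen harbors a diverse multi-kingdom ecosystem comprising bacteria, archaea, fungi, protozoa, and viruses. Collectively, the rumen microbiome digests and ferments otherwise indigestible feed and supplies most of the energy (as volatile fatty acids) and metabolizable nitrogen (as microbial protein) that ruminants need to grow and to produce meat and milk. Strong associations of rumen bacteria, archaea, and protozoa with feed efficiency, methane (CH4) emissions, and animal health are well documented [10], but the abundant rumen viruses remain poorly understood despite virus-focused studies that have contributed to characterizing the rumen virome [11,12]. Early studies using electron microscopy documented morphologically diverse bacteriophages and found that tailed phages predominated. Early culture-dependent studies found bacteriophages capable of infecting a broad range of species or lineages of rumen bacteria, including prevalent species of Prevotella, Ruminococcus, and Streptococcus, and classified most of these phages by morphology into the families Myoviridae, Siphoviridae, Podoviridae, and Inoviridae (reviewed by Gilbert and Klieve [15]). These studies provided valuable information on rumen viruses, but phage morphology alone does not permit reliable taxonomic classification, and the International Committee on Taxonomy of Viruses (ICTV: https://ictv.global/taxonomy) no longer recognizes morphology-based viral classification.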

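Genomics, metagenomics, and metatranscriptomics have become the principal technologies for studying viruses, including the rumen virome. Recently, culture-dependent whole-genome sequencing identified 10 phages that infect Prevotella ruminicola, Ruminococcus albus, Streptococcus bovis, and Butyrivibrio fibrisolvens [16,17], species that play important roles in feed digestion and fermentation. These phage genomes show modular genome organization, conserved viral genes, and both lytic and lysogenic potential [17]. Rumen viruses have also been studied using metagenomes of virus-like particles (VLPs) (reviewed in [11]). However, the reference genome databases used underrepresent rumen viruses, which limits the identification and classification of rumen viruses and the prediction of their hosts. For example, rumen viruses of diverse genotypes have been found, but most remained unclassified for lack of matches to reference viral sequences. Miller et al. [18] found clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) elements in some rumen microbial genomes and metagenomes but found few spacer sequences that matched rumen viral sequences for host prediction. Characterizing the rumen virome, especially with respect to novel viruses, has therefore been difficult.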

[...] 12-fold) and IMG/VR V3 and improving the identification of viral sequences based on rumen metagenomics, RVD will be useful as a new community resource and will provide new insights for future studies on the rumen virome and its implication in feed digestion, microbial protein synthesis, feed efficiency, and CH4 emissions.

[...] >5 kb each and clustered them into 411,125 vOTUs. After validation with VIBRANT [23], we constructed a rumen virome database (RVD, download available at https://zenodo.org/record/7412085#.ZDsE2XbMK5c) representing 397,180 vOTUs (Supplementary Fig. 1), with 193,327 vOTUs of >10 kb. Checking with CheckV [21] revealed 4400 complete vOTUs, 4396 high-quality vOTUs, and 32,942 medium-quality vOTUs. The completeness and quality of the RVD vOTUs were probably underestimated because CheckV is database dependent, and the databases used are primarily derived from other ecosystems. All the vOTUs in RVD meet the Minimum Information about an Uncultivated Virus Genome (MIUViG) standards [25].

[...] >50% completeness of the current study and the two largest human gut virome databases (MGV [4] and GPD [5]). For better visualization, only one representative vOTU (the longest and most complete) was included for each genus-level vOTU (714 in total). The branches were color-coded: green, the Caudovirales lineages exclusively found in the human virome; red, the lineages exclusively found in the rumen virome of the current study; blue, the lineages found in both the rumen and the human viromes. Lysogeny rates (proportion) were calculated with VIBRANT and shown as the inner ring. The number of vOTUs representing each lineage was shown as a bar plot (red for rumen viruses, and black for human viruses). d Proportion of lineages of Caudovirales viruses unique to the human intestine, unique to the rumen, and shared. e A rarefaction curve of the vOTUs identified in the rumen virome. The upward trend of the rarefaction curve indicates that more rumen viruses remain to be identified at the species level.

[...] >1 phage per host genome.
The percentage of lysogenic viruses varied among the host genera, and it was low for most host genera (Fig. 3c). Most ciliate single-cell amplified genomes (SAGs) presented multiple endogenous viral elements (EVEs), among which all five SAGs of Isotricha sp. YL-2021b and Dasytricha ruminantium presented the greatest number of EVEs (>50) per SAG (Supplementary Fig. 5). Little is known about viruses infecting ciliates, and no EVEs have been reported even for model ciliate species (e.g., Tetrahymena thermophila). However, EVEs have recently been found in Entamoeba and Giardia in human stool metagenomes [32]. Therefore, rumen ciliates probably carry EVEs. The large number of EVEs per ciliate SAG may correspond to the high polyploidy and the enormous numbers of chromosomes found in many rumen ciliates (e.g., >10,000 in Entodinium caudatum [33]).

[...] >12-fold). Based on the gene-sharing network, most rumen vOTUs were clustered into four groups (Fig. 3b). Groups I (the largest) and IV (the smallest) contained more classified vOTUs than groups II and III. Groups I and IV had a broader host range among bacterial phyla, including both gram-positive and gram-negative bacteria with different niches and capacities, but few of their genera or families were predominant in the rumen. Groups II and III mainly infected Bacteroidota and Methanobacteriota, respectively (Fig. 3c), and most viruses of these two groups could not be classified with any of the current virome databases; thus, they represent new viral lineages. The narrow host range (a single phylum) of groups II and III supports the notion that phages with a high degree of gene sharing generally infect phylogenetically related hosts.

[...] >2400) and bacteriophages (>40,000) down to the species level, and many of the host species are known to play important roles in feed digestion, fermentation, and methane emissions. Advancement in the prediction of hosts and virus-host linkages will aid in understanding the ecological roles of rumen viruses.
Such information will be especially useful when both the rumen metagenome and virome are investigated for their association with major rumen functions. Among the rumen vOTUs with a predicted host match, 99.5% were inferred to infect prokaryotes primarily found in the rumen, even though most of the reference prokaryote genomes that were used came from prokaryotes in other environments, demonstrating the rigor and low false positive rate of our host prediction pipeline.

[...] >5 kb were verified using VirSorter2 [22] (option: --min-score 0.5), and the contigs that passed the verification procedure were input to CheckV [21] to trim off host sequences flanking prophages. We only chose viral contigs >5 kb because the currently available bioinformatics tools show a relatively high false positive rate when identifying viral contigs <5 kb [30]. Only the contigs falling into categories Keep1 and Keep2 were retained as putative viral contigs (708,580 in total) for further analyses.

[...] >10 kb to genus-level viral taxa based on a gene-sharing network using vConTACT2 [26], which uses NCBI RefSeq Viral (release 88) as reference genomes. The vOTUs that could be clustered with the reference genomes of a viral genus were assigned to that genus according to the vConTACT2 workflow. We assigned the vOTUs that failed to be assigned to a viral genus and those <10 kb to family-level viral taxa using the majority rule, as applied previously [4]. Briefly, we predicted the ORFs of each vOTU using Prodigal [56] and then aligned the ORF sequences with those of NCBI RefSeq Viral using BLASTp with a bit score of ≥50. The vOTUs that were aligned with the NCBI RefSeq Viral genomes of a viral family with >50% of their protein sequences were assigned to that family.
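The majority rule described above can be illustrated with a short Python sketch. It is not the authors' code: the function name, the input layout (one best BLASTp hit per ORF, already labeled with a viral family), and the choice to count only the single most frequent family are assumptions made for illustration. The two thresholds (bit score ≥50, >50% of a vOTU's proteins) are taken from the text.

```python
from collections import Counter


def assign_family(orf_hits, n_orfs, min_bitscore=50.0, min_fraction=0.5):
    """Assign a vOTU to a viral family by majority rule (illustrative sketch).

    orf_hits: dict mapping ORF id -> (family, bit score) of its best BLASTp
              hit against the reference viral proteins (hypothetical layout);
              ORFs without any hit are simply absent from the dict.
    n_orfs:   total number of ORFs predicted for the vOTU.
    Returns the family name, or None if no family reaches the majority.
    """
    if n_orfs <= 0:
        return None
    # Keep only hits that meet the bit-score cutoff, then count per family.
    counts = Counter(
        family
        for family, bitscore in orf_hits.values()
        if bitscore >= min_bitscore
    )
    if not counts:
        return None
    family, n_supporting = counts.most_common(1)[0]
    # The vOTU is assigned only if more than half of ALL its ORFs support it.
    return family if n_supporting / n_orfs > min_fraction else None


hits = {
    "orf1": ("Siphoviridae", 120.0),
    "orf2": ("Siphoviridae", 80.0),
    "orf3": ("Myoviridae", 60.0),
    "orf4": ("Siphoviridae", 30.0),  # below the bit-score cutoff, ignored
}
print(assign_family(hits, n_orfs=3))
print(assign_family(hits, n_orfs=4))
```

Dividing by the total number of ORFs, rather than by the number of ORFs with a hit, keeps vOTUs with only a few annotated proteins from being assigned on thin evidence; whether the original pipeline used the same denominator is not stated in the text.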
We identified crAss-like phages using BLASTn against 2,478 crAss-like phage genomes identified from previous studies [57,58,59], with a threshold of ≥80% sequence identity along ≥50% of the length of previously identified crAss-like vOTUs.

[...] >50% were included in the search. We then aligned each of the marker genes from the three databases using MAFFT [62], sliced out the positions with >50% gaps using trimAl [63], concatenated each aligned marker gene, and filled the gap where a marker gene was absent. Only the concatenated marker genes that each showed >3 marker genes and were found in >5% of all the aligned concatemers were retained, resulting in 10,203 Caudovirales marker gene concatemers, each with 13,573 alignment columns. These marker gene concatemers were clustered into genus-level vOTUs as described previously [5], where benchmarking was performed to achieve high taxonomic homogeneity using NCBI RefSeq Viral genomes. We built a phylogenetic tree of Caudovirales viruses using FastTree v.2.1.9 (option: -mlacc 2 -slownni -wag) [64] from the aligned concatenated marker genes of the representative sequences of all the genus-level vOTUs with genome completeness >50% (based on CheckV analysis). The Caudovirales tree was visualized using iTOL [65]. The vOTUs identified as prophages or encoding an integrase were considered lysogenic. The lysogenic rate (%) was calculated based on the VIBRANT results as the percentage of lysogenic viruses among all the viruses for each genus of their probable hosts.

[...] >2,500 bp of a host genome or MAG matched a vOTU sequence at >90% sequence identity over 75% of the vOTU sequence length [4]. We predicted probable protozoal hosts of the rumen viruses by searching the 52 high-quality ciliate SAGs [68] for EVEs using BLASTn and the above criteria.

[...] >10 kb (5912 in total) for AMG identification using the criteria recommended in a benchmarking paper [30].
The selected vMAGs were then subjected to AMG identification and genome annotation using DRAMv [72] after processing with VirSorter2 with the option "--prep-for-dramv" applied. Second, the AMG-carrying vMAGs were removed if the AMGs were at an end of the vMAGs or if the AMGs were not flanked by both one viral hallmark gene and one viral-like gene or by two viral hallmark genes (category 1 and category 2 as determined by DRAMv). Third, the remaining vMAGs were further manually curated based on the criteria specified in the VirSorter2 SOP (https://doi.org/10.17504/protocols.io.bwm5pc86; also see https://github.com/yan1365/RVD/blob/main/vmags_check_helper/readme.txt). We eventually obtained 1,880 vMAGs. To further minimize false identification, we manually checked the genomic context of these vMAGs and found that some of them were still possible genomic islands. Therefore, we filtered the 1,880 vMAGs based on the criteria established by Sun and Pratama et al. (unpublished data). Briefly, vMAGs with only integrases/transposases, tail fiber genes, or any nonviral genes were removed. The remaining vMAGs were filtered again to remove those that did not have at least one of the viral structural genes (i.e., capsid protein, portal protein, phage coat protein, baseplate, head protein, tail protein, virion structural protein, and terminase) and those containing genes encoding an endonuclease, plasmid stability protein, lipopolysaccharide biosynthesis enzyme, glycosyltransferase (GT) families 11 and 25, nucleotidyltransferase, carbohydrate kinase, or nucleotide sugar epimerase. We eventually obtained 504 vMAGs free of genomic islands. To benchmark our curation pipeline, 100 of the vMAGs were randomly selected for detailed manual curation based on their genomic context. According to the benchmarking results, we were confident that we retained only complete vMAGs for AMG prediction.
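The second genomic-island filter described above (keep a vMAG only if it has at least one viral structural gene and none of the excluded gene types) can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline: matching on lowercase keywords in free-text annotations, the exact keyword strings (especially those for GT families 11 and 25), and the function names are all assumptions, and real DRAMv annotations would need more careful parsing.

```python
# Gene categories taken from the text; the exact strings are hypothetical.
STRUCTURAL_KEYWORDS = (
    "capsid protein", "portal protein", "phage coat protein", "baseplate",
    "head protein", "tail protein", "virion structural protein", "terminase",
)
EXCLUDED_KEYWORDS = (
    "endonuclease", "plasmid stability protein",
    "lipopolysaccharide biosynthesis",
    "glycosyltransferase family 11", "glycosyltransferase family 25",
    "nucleotidyltransferase", "carbohydrate kinase",
    "nucleotide sugar epimerase",
)


def mentions_any(annotation, keywords):
    """True if the annotation text contains any keyword (case-insensitive)."""
    text = annotation.lower()
    return any(keyword in text for keyword in keywords)


def passes_island_filter(annotations):
    """Keep a vMAG only if it carries at least one structural gene and no
    gene from the exclusion list. `annotations` is a list of gene
    annotation strings for one vMAG (hypothetical input layout)."""
    has_structural = any(mentions_any(a, STRUCTURAL_KEYWORDS) for a in annotations)
    has_excluded = any(mentions_any(a, EXCLUDED_KEYWORDS) for a in annotations)
    return has_structural and not has_excluded


vmags = {
    "vMAG_A": ["Terminase large subunit", "hypothetical protein"],
    "vMAG_B": ["major capsid protein", "HNH endonuclease"],
    "vMAG_C": ["integrase", "hypothetical protein"],
}
kept = [name for name, genes in vmags.items() if passes_island_filter(genes)]
print(kept)
```

Plain keyword matching is deliberately strict: any annotation that mentions an excluded term removes the whole vMAG, which mirrors the conservative intent of the filter but may discard genuine phages in practice.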
Detailed results of each curation step, the full annotation of the final vMAGs, and the annotation of the identified AMGs are presented in Supplementary Data 4. We compared the AMGs identified in the rumen virome to the previously identified AMGs from other viromes, which are available in an expert-curated AMG database (https://github.com/WrightonLabCSU/DRAM/blob/master/data/amg_database.tsv). For the newly identified AMGs, we double-checked the annotations and searched the literature to ensure that they were truly AMGs.

[...] >50% concentrate). First, we transformed the raw abundance table into a binary matrix (presence or absence). Then, the prevalence of each vOTU in each sample was calculated. A vOTU was included in the core rumen virome if its prevalence exceeded 50% of the prevalence for each concentrate level or all cattle. Based on prevalence, the vOTUs were categorized as individualized (observed in only one sample), one concentrate level (observed in more than one sample but exclusively from a single concentrate level), two concentrate levels (observed in animals from two concentrate levels), and three concentrate levels (observed in all three concentrate levels). The numbers of vOTUs shared by the core viromes among the three concentrate levels were visualized with a Venn diagram in R. We examined whether animals on the same diet or of the same breed share more vOTUs than animals fed different diets or of different breeds, using subsets of data from Stewart et al. [78] and Li et al. [79], respectively. The Kruskal–Wallis test was used to compare the numbers of shared vOTUs in different groups in R.

[...] >12 metagenomes were retained for the analysis. The number of vOTUs shared by two studies was compared for every study pair, and the results were subjected to hierarchical clustering. The hierarchical clustering results were visualized in R with the ComplexHeatmap package [81] and annotated according to the metadata.
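The presence/absence transformation and the prevalence-based categories described above for the core virome analysis can be sketched in Python. The original analysis was done in R, so this is only an illustration under stated assumptions: the nested-dict input layout, the function names, and the reading of the core criterion as "present in more than 50% of the samples of a group" are assumptions, not details confirmed by the text.

```python
def categorize_votus(abundance, sample_level):
    """Label each vOTU by how widely it is observed.

    abundance:    dict vOTU -> dict sample -> raw abundance (hypothetical layout)
    sample_level: dict sample -> concentrate level label
    """
    labels = {1: "one concentrate level",
              2: "two concentrate levels",
              3: "three concentrate levels"}
    categories = {}
    for votu, per_sample in abundance.items():
        # Binary presence/absence: any abundance above zero counts as present.
        present = [s for s, value in per_sample.items() if value > 0]
        if not present:
            continue  # never observed; left uncategorized
        if len(present) == 1:
            categories[votu] = "individualized"
            continue
        n_levels = len({sample_level[s] for s in present})
        categories[votu] = labels.get(n_levels, f"{n_levels} concentrate levels")
    return categories


def core_votus(abundance, samples, min_prevalence=0.5):
    """vOTUs present in more than `min_prevalence` of the given samples
    (assumed interpretation of the core-virome criterion)."""
    core = set()
    for votu, per_sample in abundance.items():
        n_present = sum(1 for s in samples if per_sample.get(s, 0) > 0)
        if samples and n_present / len(samples) > min_prevalence:
            core.add(votu)
    return core


abundance = {
    "v1": {"a": 3, "b": 0, "c": 0, "d": 0},
    "v2": {"a": 1, "b": 2, "c": 0, "d": 0},
    "v3": {"a": 1, "b": 0, "c": 5, "d": 2},
}
sample_level = {"a": "low", "b": "low", "c": "medium", "d": "high"}
print(categorize_votus(abundance, sample_level))
print(core_votus(abundance, ["a", "b", "c", "d"]))
```

Calling `core_votus` once per concentrate level and once for all cattle yields the sets whose overlaps a Venn diagram would then display.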