Over the last decade, emerging research methods such as comparative genomic analysis and phylogenetic study have yielded new insights into the relationship between genotype and phenotype among closely related bacterial strains. Several findings have revealed that genome structure variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains. Investigating genes located in structurally varied regions may therefore extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria.

In this work, we introduce a ‘Genome Topology Network’ (GTN) method, based on gene homology relationships and gene locations, to analyze genome structure variations and perform phylogenetic analysis. Furthermore, we propose the concept of the ‘unfixed ortholog’, whose members vary in genome topology among closely related species. To improve the precision of ‘unfixed ortholog’ recognition, we apply a strategy that detects annotation differences and completes the un-annotated genes among closely related genomes. To illustrate the features of the GTN method, a set of thirteen complete M. tuberculosis genomes is analyzed as a case study. We build GTNs with two different gene homology-assignment methods, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and construct two phylogenetic trees, which may provide additional insights into whole genome-based phylogenetic analysis. As a result, we obtain the 24 most unfixable COG groups, most of which relate to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and members of the TetR transcriptional regulator gene family (COG1309).

The tool allows re-annotating the ‘lost’ genes among closely related genomes, analyzing genes located in structurally varied genome regions, and performing phylogenetic analysis. With this tool, we find that many immunogenicity-related and drug-resistance genes are located in structurally varied regions of M. tuberculosis genomes. We believe that the GTN method is suitable for exploring genome structure variations in connection with the biological features of bacterial strains, and that GTN-based phylogenetic analysis can provide additional insights into whole genome-based phylogenetic analysis.
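The network idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GTN implementation: each genome is given as an ordered list of gene family labels, the chromosome is treated as circular, and a family is called "unfixed" when its set of neighbouring families differs between genomes. The function names and the toy strains are hypothetical.

```python
from collections import defaultdict


def adjacency_edges(families, circular=True):
    """Return the unordered family pairs that sit next to each other."""
    edges = set()
    n = len(families)
    last = n if circular else n - 1
    for i in range(last):
        a, b = families[i], families[(i + 1) % n]
        if a != b:  # ignore tandem copies of the same family
            edges.add(frozenset((a, b)))
    return edges


def neighbor_sets(families, circular=True):
    """Map each family to the set of families adjacent to it."""
    neighbors = defaultdict(set)
    for edge in adjacency_edges(families, circular):
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def unfixed_families(genomes):
    """Families whose neighbour set differs between at least two genomes.

    This definition of "unfixed" is an assumption made for the sketch.
    """
    per_genome = {name: neighbor_sets(fams) for name, fams in genomes.items()}
    all_families = set().union(*(set(f) for f in genomes.values()))
    unfixed = set()
    for fam in all_families:
        seen = {frozenset(nb.get(fam, set())) for nb in per_genome.values()}
        if len(seen) > 1:
            unfixed.add(fam)
    return unfixed


# Hypothetical toy data: strainB carries an inversion of F3 and F4.
toy_genomes = {
    "strainA": ["F1", "F2", "F3", "F4", "F5"],
    "strainB": ["F1", "F2", "F4", "F3", "F5"],
}
print(sorted(unfixed_families(toy_genomes)))
```

In this toy case only F1 keeps the same neighbours in both strains, so the other four families are reported.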

The process of GTN:

GTN process

See the detailed process (Process).

Latest release:

Download GTN tools (GTN-v1.1.0.zip).

Manual (GTN manual v1.1.0).

Old versions:

Download GTN tools (GTN-v1.zip) and manual (GTN manual v1.0.1).

Updates:

2014/09/02: Version V1.1.0, which adds a bootstrap test to the phylogenetic analysis, is coming soon!

2015/02/19: The article has been published in BMC Genomics: Jiang, J., Gu, J., Zhang, L., Zhang, C., Deng, X., Dou, T., … & Zhou, Y. (2015). Comparing Mycobacterium tuberculosis genomes using genome topology networks. BMC Genomics, 16(1), 85.

2015/03/10: V1.1.0 has been released. The R package ape is required in this version.

If you have any questions, please feel free to contact jiangjian_123@163.com.

 

2018/3/30: GTN version 2.0 has been released.

1. Draft genomes can now be analyzed.

Run GTN with “gtn_CompleteOnly.pl” if your data consist of complete genomes only; otherwise use “gtn_WithDraft.pl”. If draft data are present, GTN first locates the synteny blocks in the genomes.

2. The Markov Cluster Algorithm is introduced to assign the gene families.

The Markov Cluster (MCL) Algorithm is introduced in this version. This algorithm is also used in many clustering tools, such as orthoMCL.
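For readers unfamiliar with MCL, the core loop alternates an expansion step (squaring a column-stochastic matrix) with an inflation step (raising entries to a power and renormalizing). The pure-Python sketch below illustrates that loop on a small similarity matrix. It is a teaching example only: how GTN builds its similarity graph and which inflation value it uses are not stated here, so the matrix, the `inflation=2.0` default and the function names are assumptions.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=50, tol=1e-9):
    """Toy Markov clustering; returns a set of frozensets of node indices."""
    n = len(adjacency)
    if n == 0:
        return set()
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walks
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )  # inflation: strengthen strong flows, weaken weak ones
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical similarity graph: two separate triangles of genes.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(sorted(sorted(c) for c in mcl(toy_graph)))
```

With two disconnected triangles, each triangle ends up as its own cluster. Real analyses would normally use an established MCL implementation rather than this sketch.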

3. Unique gene connection information is written to a result file.

GTN is an approach that studies closely related bacterial genomes by analyzing the adjacent gene family pairs in each genome. The evolutionary distance calculation is essentially based on differences in gene connections. In other words, the gene connections determine how genomes are separated in the phylogenetic tree. Unique gene connections reflect evolutionary events such as gene duplication and gene insertion.

Hence, GTN provides the unique gene connections, and the genes involved in them, to users for further research, such as functional enrichment.
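The two ideas above, a distance derived from differing gene connections and a list of connections unique to one genome, can be illustrated as follows. This is a hedged sketch, not the formula GTN uses: the Jaccard distance is an assumed stand-in for the actual distance measure, gene order is treated as linear, and all names and data are hypothetical.

```python
def connections(families):
    """Unordered pairs of adjacent gene families in a linear gene order."""
    return {frozenset(pair) for pair in zip(families, families[1:])
            if pair[0] != pair[1]}


def connection_distance(conn_a, conn_b):
    """Jaccard distance between two connection sets (assumed metric)."""
    union = conn_a | conn_b
    if not union:
        return 0.0
    return 1.0 - len(conn_a & conn_b) / len(union)


def unique_connections(genomes):
    """For each genome, the connections found in no other genome."""
    conn = {name: connections(fams) for name, fams in genomes.items()}
    unique = {}
    for name, own in conn.items():
        others = set().union(
            *(c for other, c in conn.items() if other != name)
        )
        unique[name] = own - others
    return unique


# Hypothetical example: strainC carries an insertion of family F5.
example_genomes = {
    "strainA": ["F1", "F2", "F3", "F4"],
    "strainB": ["F1", "F2", "F3", "F4"],
    "strainC": ["F1", "F2", "F5", "F3", "F4"],
}
example_unique = unique_connections(example_genomes)
for name in sorted(example_unique):
    print(name, sorted(sorted(pair) for pair in example_unique[name]))
```

Here the insertion in strainC produces two connections (F2 with F5, and F5 with F3) that occur in no other strain, which is the kind of signal that could then be passed on to a functional enrichment analysis. A pairwise distance matrix built with `connection_distance` could in turn feed a tree-building method.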

 

The new workflow:

 

Download gtn2.0 here.

Usage instructions for GTN and a description of the result files can be found in the “README” in GTN or here.

The test data can be downloaded here:

complete_testdata_1

complete_testdata_2

complete_testdata_3

complete_testdata_4

draft_testdata_1

 
