Mycobacterium tuberculosis (MTB), the causal organism of tuberculosis, one of the oldest infectious diseases, is a leading cause of morbidity and mortality worldwide. This pathogen has evolved into a variety of strains with diverse genotypic, phenotypic and pathogenic properties: the MTB H37Rv and CDC1551 strains are virulent, MTB H37Ra is an avirulent strain, and the MTB KZN strain is resistant to several antituberculosis drugs. Owing to advances in genome sequencing and molecular biology, the whole genomes of different MTB strains have been completely sequenced. Genomic as well as proteomic comparison among the sequenced strains will help in understanding the differences between virulent, avirulent and resistant organisms. This article reviews the information available on completely sequenced MTB strains and presents the studies reported by researchers on genomic and proteomic comparison of various MTB strains.
Keywords: Virulence; Drug resistant; tuberculosis; Genomics; Proteomics
Tuberculosis (TB) is one of the oldest infectious diseases. It is caused by the bacterium Mycobacterium tuberculosis (MTB), which most commonly infects the lungs, and it is transmitted from person to person via droplets from the throat and lungs of people with active respiratory disease. Even decades after the discovery of MTB, TB remains a major cause of morbidity and mortality. One in three of the world’s population is considered to be infected with MTB, and an estimated 9.6 million people had active TB in 2014, of whom 12% were HIV-positive. Further, of the 480,000 cases of multidrug-resistant TB (MDR-TB) estimated to have occurred in 2014, only about a quarter (123,000) were detected and reported. Outbreaks of extensively drug-resistant (XDR) tuberculosis have also been an increasing threat in certain regions of the world. Although MTB is highly virulent, no simple explanation for its virulence has been found so far. Over time, this pathogen has evolved into different resistant strains with diverse genotypic, phenotypic and pathogenic properties. However, recent studies suggest that there is very little diversity in its genomic sequence [3,4].
All strains of MTB have developed several mechanisms to survive inside the host environment. First, MTB enters macrophages via cell surface molecules, including those of the integrin family and the CR1 and CR3 complement receptors. After engulfment by macrophages, most tubercle bacilli are directed to phagolysosomes. The bacilli then bud out from the fused phagolysosomes into vacuoles that fail to fuse with secondary lysosomes, and thus escape lysosomal killing. Through this mechanism, MTB bacilli are able to reside temporarily in the phagolysosome, which stimulates a response to the intracellular environment that supports their long-term survival and reproduction. The dormancy or latency of MTB allows the bacterium to escape the activated immune system of the host. Although the detailed mechanisms by which MTB enters the host cell, circumvents host defenses and reaches neighboring cells are not fully understood, it has developed several effective strategies for surviving in the host environment. These include (a) the inhibition of phagosome-lysosome fusion; (b) the inhibition of phagosome acidification; (c) the recruitment and retention of tryptophan-aspartate-containing coat protein on phagosomes to prevent their delivery to lysosomes; and (d) the expression of members of the host-induced repetitive glycine-rich protein family.
Further, many MTB genes are reported to be associated with pathogenesis and virulence. The protein encoded by the mce gene of MTB confers the ability to invade HeLa cells and survive within host macrophages, which suggests that this gene is involved in the invasion of host tissues. The MTB erp gene, which encodes a secretory protein, helps the microorganism survive in host macrophages. The iron-regulated genes of Gram-negative bacteria are reported to be important virulence factors, and MTB also synthesizes two distinct iron-regulated siderophores, which help in growth and survival during the course of infection. Further, the fadD33 gene, encoding an acyl-coenzyme A synthase, plays a vital role in MTB virulence.
The complete genome sequences [14,15] of different MTB strains have provided a valuable understanding of its biology. The accessibility of genomic and proteomic information on MTB, combined with high-throughput technologies, might open a new landscape for the development of novel diagnostic techniques, better vaccines and drugs against TB. With the decreasing cost of genome sequencing and advances in functional genomics and molecular biology, whole-genome sequences of different MTB strains have been released and are available in the public domain. As of June 2016, complete genome sequences of several clinical and laboratory strains of MTB are available at the National Center for Biotechnology Information (NCBI) (Table 1).
|Organism Name||Strain||Size (Mb)||GC%||Genes||Proteins|
|Mycobacterium tuberculosis H37Rv||H37Rv||4.41153||65.6||4008||3906|
|Mycobacterium tuberculosis CDC1551||CDC1551||4.40384||65.6||4113||3964|
|Mycobacterium tuberculosis H37Ra||H37Ra; ATCC 25177||4.41998||65.6||4153||4069|
|Mycobacterium tuberculosis F11||F11||4.42443||65.6||4139||4043|
|Mycobacterium tuberculosis KZN 1435||KZN 1435||4.39825||65.6||4118||4014|
|Mycobacterium tuberculosis str. Haarlem||Haarlem||4.40822||65.6||4112||4015|
|Mycobacterium tuberculosis KZN 4207||KZN 4207||4.39499||65.6||4115||4028|
|Mycobacterium tuberculosis KZN 605||KZN 605||4.39912||65.6||4122||4016|
|Mycobacterium tuberculosis str. Erdman = ATCC 35801||Erdman (ATCC 35801)||4.39235||65.6||4129||4010|
|Mycobacterium tuberculosis str. Beijing/NITR203||Beijing/NITR203||4.41113||65.6||4141||3937|
|Mycobacterium tuberculosis str. Kurono||Kurono||4.41508||65.6||4139||4054|
|Mycobacterium tuberculosis H37Rv||H37Rv; TMC 102||4.39612||65.6||4135||4023|
|Mycobacterium tuberculosis||SCAID 187.0||4.37951||65.6||4099||3994|
|Mycobacterium tuberculosis str. Haarlem/NITR202||Haarlem/NITR202||4.40479||65.6||3730||3681|
Table 1: Different strains of Mycobacterium tuberculosis completely sequenced so far.
The phylogenetic tree (Figure 1), obtained from the whole genomes of these strains using the MAFFT multiple sequence alignment software (version 7), reveals the genetic divergence among them. Although these strains differ from each other phenotypically and genotypically and vary in virulence, they cause the same disease in humans.
The MTB H37Rv and CDC1551 strains are virulent causative agents of tuberculosis and are susceptible to most antituberculosis drugs, while MTB H37Ra is an avirulent strain and the MTB KZN (KwaZulu-Natal, South Africa) strain is resistant to drugs such as isoniazid, ofloxacin, rifampicin, kanamycin, pyrazinamide and ethambutol. Further, the F11 strain of MTB is reported to be predominant in the South African epidemic, while CCDC5180 is a multidrug-resistant clinical isolate. Three strains of MTB KZN have been completely sequenced so far: KZN 1435 is a multidrug-resistant strain, KZN 605 is an extensively drug-resistant clinical isolate, and KZN 4207 is a drug-susceptible clinical isolate. These differences among strains may be due to gene mutations that give rise to mutated proteins. There is therefore a need for genomic as well as proteomic analysis of different MTB strains to understand the mechanisms of variation among them. Genomic and proteomic analysis of virulent, avirulent and drug-resistant strains will help in developing more effective and safer drugs for TB control.
In 1905, Edward R. Baldwin isolated H37 from a nineteen-year-old male patient with pulmonary tuberculosis. The MTB H37Rv strain was derived from the original human-lung H37 isolate in 1934, and it has since been used widely in biomedical research worldwide. MTB H37Rv retains its full virulence in animal models and is susceptible to antitubercular drugs. The whole genome of this pathogenic strain was sequenced in 1998. The genome consists of 4,411,532 base pairs with a guanine+cytosine (GC) content of 65.6%. It contains about 4000 protein-coding genes, giving a gene density of roughly one gene per kilobase. Genes are evenly distributed between the forward and reverse strands. Almost one half of the coding sequences are attributed to gene duplication and domain shuffling.
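The two summary statistics quoted here, GC content and gene density, are simple ratios. The following is a minimal sketch of how they could be computed; the function names are illustrative assumptions, the short test sequence is a toy example, and the gene count of 4008 is the H37Rv figure from Table 1.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gene_density_per_kb(n_genes: int, genome_bp: int) -> float:
    """Return the number of genes per kilobase of genome."""
    return n_genes / (genome_bp / 1000)


# H37Rv figures: 4,411,532 bp (text) and 4008 genes (Table 1).
h37rv_density = gene_density_per_kb(4008, 4_411_532)
print(f"H37Rv gene density: {h37rv_density:.2f} genes per kb")

# Toy sequence only; a real calculation would read the genome from a FASTA file.
print(f"GC fraction of toy sequence: {gc_content('GGCCAT'):.3f}")
```

With these inputs the density comes out slightly below one gene per kilobase, which agrees with the rounded figure given in the text.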
MTB H37Ra is an avirulent strain derived from H37 that has several properties distinguishing it from MTB H37Rv. These include lack of cord formation, reduced survival inside macrophages or under anaerobic conditions, a “raised colony morphology”, lack of neutral red dye binding, and decreased virulence in mice and guinea pigs. Despite several genetic and biochemical studies over the past 70 years, the molecular mechanism responsible for the reduced virulence of MTB H37Ra is still under study. The whole genome of this avirulent strain was sequenced by the Chinese National Human Genome Center at Shanghai. It has a genome length of 4,419,977 base pairs with a G+C content of 65.6%. Of its 4084 genes, 4034 are protein coding, 45 encode tRNA, 3 encode rRNA and 2 encode other RNAs.
The MTB CDC1551 strain, also nicknamed "Oshkosh", is a recent clinical isolate obtained from a large outbreak of tuberculosis that occurred from 1994 to 1996 around Tennessee and Kentucky in the US. The CDC1551 strain appears to be highly infectious in humans and is comparable in virulence to strain H37Rv in animal models. However, this strain has not caused epidemics in humans and is sensitive to a wide range of drugs. It is also highly virulent in a mouse lung model, producing several orders of magnitude more bacteria than the H37Rv strain when inoculated. The Mycobacterium tuberculosis CDC1551 genome was sequenced by TIGR and has a total length of 4,403,837 base pairs and 4113 genes.
The MTB F11 strain is an aerobic, nonmotile, chemoorganotrophic, rod-shaped, non-sporulating human pathogen. It was isolated from tuberculosis patients during a TB epidemic in the Western Cape of South Africa in the 1990s. The MTB F11 genome was sequenced by The Broad Institute and has a total length of 4,424,435 base pairs with 4139 genes. Isolates of F11 are not only a major contributor to the TB epidemic in South Africa but are also present on four different continents and in at least 25 other countries.
Three strains of MTB isolated from patients in KwaZulu-Natal, South Africa have been sequenced using both Solexa and Sanger sequencing technology. These three strains were selected from among other strains in the KZN region because they represent a range of important drug resistance phenotypes, from fully drug-sensitive (DS) through multidrug-resistant (MDR) to extensively drug-resistant (XDR): KZN 4207 is the DS strain, KZN 1435 the MDR strain and KZN 605 the XDR strain. Genomic features of these strains are tabulated in Table 1.
Owing to advances in genomics and associated novel technologies, vast data sets are being generated, which provide new opportunities for understanding and combating both genetic and infectious diseases in humans. Comparative genomic analysis of different mycobacterial strains is also helpful in identifying the genetic basis of varying phenotypes, which may give new insights for the development of novel drugs and vaccines. Comparative genomics is a powerful tool for revealing microbial evolution and identifying genes that might encode novel drug targets. Such comparison has revealed that all members of the MTB complex share 99.9% identity in their DNA sequence and have identical 16S rRNA [15,33].
A genomic approach was first used by Brosch et al. to identify the variations between MTB H37Ra and MTB H37Rv at the genetic level. Their study revealed two polymorphisms between these strains: a fragment of 480 kilobases in MTB H37Rv was replaced by two segments of 260 and 220 kilobases in MTB H37Ra, and a DraI segment of 7900 bases was present in MTB H37Ra but absent in MTB H37Rv. The 7900-base polymorphism corresponded to the RvD2 region, which is absent from MTB H37Rv. Three IS6110 deletions from the MTB H37Rv genome (RvD3 to RvD5) were also found in MTB H37Ra. The authors described the occurrence and mechanisms of these genomic differences between MTB H37Rv and MTB H37Ra, but the role of this variation in the attenuation of MTB H37Ra remained unclear.
Genomic comparison between MTB H37Rv and H37Ra also revealed that the genome of MTB H37Rv is very similar to that of MTB H37Ra and is 8,445 base pairs smaller. Between H37Ra and H37Rv, only 198 single nucleotide variations (SNVs) were identified. Of these, 119 were identical between MTB CDC1551 and MTB H37Ra and three were due to MTB H37Rv variation, leaving only 76 H37Ra-specific SNVs, which affect only 32 genes.
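The bookkeeping behind this kind of tally, in which variants between two strains are sorted by whether a third strain shares the allele, can be sketched as follows. This is a toy illustration under stated assumptions: the three sequences are invented, they are assumed to be already aligned and of equal length, and the function name and category labels are hypothetical rather than taken from the cited study.

```python
def partition_snvs(ra: str, rv: str, third: str) -> dict:
    """Sort SNVs between two aligned sequences by the allele in a third one.

    Positions are 1-based. Alignment columns containing a gap are skipped.
    """
    if not (len(ra) == len(rv) == len(third)):
        raise ValueError("sequences must be aligned to the same length")
    result = {"shared_with_third": [], "ra_specific": [], "unresolved": []}
    for pos, (a, v, t) in enumerate(zip(ra, rv, third), start=1):
        if a == v or "-" in (a, v, t):
            continue  # not an SNV between the first two strains, or a gap
        if a == t:
            result["shared_with_third"].append(pos)  # third strain matches ra
        elif v == t:
            result["ra_specific"].append(pos)  # only ra differs
        else:
            result["unresolved"].append(pos)  # all three alleles differ
    return result


# Invented toy alignment, not real MTB sequence.
h37ra = "ACGTTGCA"
h37rv = "ACGATGCC"
cdc1551 = "ACGTTGCC"
print(partition_snvs(h37ra, h37rv, cdc1551))
```

In this toy alignment position 4 is shared with the third strain and position 8 is specific to the first sequence; real analyses would work from whole-genome alignments and would also need to handle indels.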
An in silico analysis of the PE/PPE family of MTB H37Ra and MTB H37Rv revealed genetic variation between these two strains in the form of numerous SNVs along with some deletions and insertions. As a result of this variation, changes were also observed in their physico-chemical properties, protein-protein interaction domains and phosphorylation sites, which can be correlated with differences in their virulence and pathogenesis.
A link between the avirulence of MTB H37Ra and a single amino acid substitution in the PhoP protein was observed by Gonzalo-Asensio et al. Their study focused on the phoP gene, which was found to have a significant role in MTB virulence. This gene is completely conserved in all members of the MTB complex, including MTB H37Rv, with the exception of MTB H37Ra. In MTB H37Ra, a point mutation in phoP yields an altered protein with a single amino acid change: the polar residue Ser219 is replaced by the nonpolar residue Leu.
Malen et al. compared the membrane proteins of MTB H37Rv with those of its avirulent sister strain MTB H37Ra and identified more than seventeen hundred proteins. The majority of these proteins had comparable abundance in both strains. Twenty-nine membrane-associated proteins showed a fivefold or greater difference in relative abundance between the two strains: nineteen membrane proteins and lipoproteins were more abundant in MTB H37Rv, and ten other proteins were more abundant in MTB H37Ra.
An in silico comparative proteomic analysis of MTB H37Rv and MTB H37Ra revealed 3759 proteins that are identical in the two strains, while 244 proteins of MTB H37Rv and 260 of MTB H37Ra were non-identical. Among these non-identical proteins, 172 were identified with mutations (insertions, deletions or substitutions) in MTB H37Ra, whereas 53 proteins of MTB H37Rv and 85 proteins of MTB H37Ra were found to be distinct. The non-identical proteins may play an important role in the difference in pathogenicity between these two strains. Further, of the 244 non-identical proteins of MTB H37Rv, 19 with reported important biological functions showed mutations in MTB H37Ra, and 40 proteins were identified with a single amino acid variation in MTB H37Ra. Different mutation analysis systems and online bioinformatics resources were used to study the effect of these variations, and five proteins of MTB H37Rv were observed to have lost their normal function in MTB H37Ra through a single amino acid mutation. These proteins may contribute substantially to the pathogenicity of the MTB H37Rv strain.
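The classification described above (identical proteins, non-identical proteins, and the subset differing by a single amino acid) can be illustrated with a short sketch. It is not the pipeline used in the cited study: the protein names and sequences are invented, proteins are compared by exact sequence match only, and homologous pairs are assumed to share a name, which real locus tags for the two strains do not.

```python
def compare_proteomes(a: dict, b: dict):
    """Split two proteomes (name -> sequence) into shared and unshared parts.

    Returns the set of sequences found in both proteomes, plus the names of
    proteins in each proteome whose sequence has no exact match in the other.
    """
    identical = set(a.values()) & set(b.values())
    only_a = {name for name, seq in a.items() if seq not in identical}
    only_b = {name for name, seq in b.items() if seq not in identical}
    return identical, only_a, only_b


def is_single_substitution(p: str, q: str) -> bool:
    """True if two equal-length sequences differ at exactly one residue."""
    return len(p) == len(q) and sum(x != y for x, y in zip(p, q)) == 1


# Invented toy proteomes, not real MTB proteins.
rv = {"protA": "MKTAY", "protB": "MSLLG", "protC": "MPPQR"}
ra = {"protA": "MKTAY", "protB": "MSLLV", "protD": "MGGHW"}

identical, rv_only, ra_only = compare_proteomes(rv, ra)
print(len(identical), sorted(rv_only), sorted(ra_only))

# Among same-named non-identical pairs, flag single amino acid variants.
variants = [n for n in rv_only & ra_only if is_single_substitution(rv[n], ra[n])]
print(variants)
```

Exact matching is a deliberately strict choice here; studies of this kind typically rely on sequence alignment to pair homologous proteins before classifying the differences.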
Comparative genomics has revealed two tandem duplications of 29 and 36 kb in the chromosome of the Mycobacterium bovis BCG Pasteur strain. Whole-genome comparison among different strains of the MTB complex has revealed the roles of mutation (insertion, deletion and substitution), gene duplication and selection in the evolution of MTB strains.
The CDC1551 strain, known to have caused a TB outbreak in the United States in the 1990s, was observed to be comparatively less virulent than MTB H37Rv. Genomic comparison of MTB H37Rv and MTB CDC1551 revealed 86 InDels and 1075 single nucleotide polymorphisms (SNPs), of which 579 were nonsynonymous, pointing to an association between genotypic changes and phenotypic variation.
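Whether a SNP counts as synonymous or nonsynonymous depends on whether the affected codon still encodes the same amino acid. A minimal sketch of that test follows, using the standard genetic code; the function name is an assumption, and the example codons are illustrative rather than taken from any MTB gene (the first shows a serine-to-leucine change of the kind reported for PhoP earlier in this review).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon: str, alt_codon: str):
    """Classify a single-base codon change as synonymous or nonsynonymous."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    if sum(r != a for r, a in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


print(classify_snp("TCG", "TTG"))  # serine codon changed to a leucine codon
print(classify_snp("GGT", "GGC"))  # both codons encode glycine
```

A genome-wide count would apply the same test to every SNP that falls inside a coding sequence, after placing it in the correct reading frame and strand.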
The genome sequence of Mycobacterium bovis (which affects cattle) was found to be 99.95% identical to the genomes of MTB CDC1551 and MTB H37Rv, but with a slightly smaller genome size. Comparison of 2504 coding sequences (CDS) among these three genomes revealed 1600 CDS of M. bovis that are identical to those of MTB H37Rv and MTB CDC1551. There were 2400 SNPs identified between the two MTB strains and M. bovis [6,15]. The genome of Mycobacterium leprae (M. leprae), the causative agent of leprosy, has undergone enormous gene loss, leaving only 1604 functional protein-coding genes in the bacillus. Of the 1439 genes common to MTB and M. leprae, a set of 219 genes was found to be unique to mycobacteria through in silico comparative analysis. Arnold et al. revealed, through whole-genome comparison, the existence of short sequence repeats in MTB that can be used for genotyping schemes. Comparative genomics also provides an efficient way of identifying the genetic variation responsible for differences in phenotype, pathogenicity and host range among mycobacterial species and strains. Current advances in comparative and functional genomics have also improved our understanding of genetic diversity within the MTB complex. Diaz et al. explored and identified genetic variability among different MTB strains using DNA microarray technology.
Genome comparison of M. bovis BCG Pasteur 1173P2 (BCG Pasteur) with MTB H37Rv, MTB CDC1551 and M. bovis AF2122/97 revealed large sequence polymorphisms (LSPs) that led to the loss of 133 genes in BCG Pasteur [45,46]. In silico analysis of MTB proteomes identified two novel protein families, PE and PPE. In a proteomic comparison of two non-virulent M. bovis BCG strains (Chicago and Copenhagen) with two virulent MTB strains (Erdman and H37Rv), Jungblut et al. identified distinct proteins by mass spectrometry. Twenty-seven proteins specific to MTB were identified in a proteomic comparison of culture supernatants from MTB H37Rv and an M. bovis BCG strain. Miallau et al. identified RelBE-like toxin-antitoxin complexes associated with the lethality of MTB.
Whole-genome sequencing of the KwaZulu-Natal MDR and XDR outbreak strains revealed that the MDR strain (KZN 1435) has 150 short indels (<100 bp) and 46 large indels (>100 bp) relative to the wild-type strain, and that the XDR sequence has 162 short indels and 37 large indels. The MDR and XDR strains contain typical mutations in gyrA, rpoB, rrs, katG and the promoter of inhA that explain resistance to fluoroquinolones, rifampicin, kanamycin and isoniazid. Further, 22 novel mutations were identified that were unique to the XDR genome or shared only by the MDR and XDR genomes and were not already known to be associated with drug resistance.
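The short versus large distinction used for these indel counts is a simple length threshold, which can be sketched as below. The indel lengths are invented for illustration, the function names are assumptions, and because the text does not say how an indel of exactly 100 bp is treated, this sketch assigns it to the large class.

```python
from collections import Counter


def classify_indel(length_bp: int, threshold: int = 100) -> str:
    """Label an indel as 'short' (< threshold) or 'large' (>= threshold).

    Assigning a length equal to the threshold to 'large' is an assumption.
    """
    if length_bp <= 0:
        raise ValueError("indel length must be positive")
    return "short" if length_bp < threshold else "large"


def tally_indels(lengths) -> Counter:
    """Count short and large indels in an iterable of lengths (in bp)."""
    return Counter(classify_indel(n) for n in lengths)


# Invented lengths, not data from the KZN genomes.
example_lengths = [1, 3, 12, 99, 100, 1358]
print(tally_indels(example_lengths))
```

Applied to a full list of indels called against a reference genome, the same tally would yield per-strain counts of the kind reported in the paragraph above.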
A computational genomics and proteomics analysis of 21 mycobacterial genomes reported that 1250 Mycobacterium gene families are conserved across all species. Further, a recent study by Viale et al. describes the contribution of genomics and functional genomics to studies of the evolution, virulence, epidemiology and diagnosis of Mycobacterium bovis and Mycobacterium avium subspecies paratuberculosis. Genomic and transcriptomic analysis of the streptomycin-dependent MTB strain 18b provides insight into both the evolution of tubercle bacilli and the functioning of the ribosome. Another comparative genomic and proteomic analysis of four non-tuberculous Mycobacterium spp. and the MTB complex reported the occurrence of ESX-1, ESX-3 and ESX-4 regions (thought to encode immunogenic proteins) in the genomes of most mycobacteria.
Comparative genomics has revealed the genomic diversity among different MTB strains and, specifically, has identified particular genes that differ between virulent, avirulent or attenuated, and drug-resistant MTB strains. Although most comparative genomics studies have been carried out on MTB H37Rv, MTB H37Ra, MTB Erdman, MTB CDC1551 and Mycobacterium bovis BCG, comparative genomics and proteomics studies are yet to be done for many recently sequenced MTB strains. Therefore, the development of in silico technology to study gene and protein variation among different MTB strains will enhance understanding of their virulence and drug resistance properties. This may give insights into the molecular mechanisms of pathogenicity and drug resistance and provide a new direction for the development of new drugs against TB.