Review Article - (2017) Volume 8, Issue 2

Beyond SNPs and CNV: Pharmacogenomics of Polymorphic Tandem Repeats

Evgeny Krynetskiy*
School of Pharmacy, Temple University, PA, USA
*Corresponding Author: Evgeny Krynetskiy, School of Pharmacy, Temple University, PA 19140, USA, Fax: 215-707-5620 Email:


Polymorphic Short Tandem Repeats (STR) have emerged as a separate class of genetic mutations which, together with Single Nucleotide Polymorphisms (SNPs) and Copy Number Variations (CNVs), can explain variability in response to pharmacotherapy. STR draw interest in pharmacogenomics research because of their prevalence in the human genome and their putative functional role as regulators of gene expression. Depending on the search algorithm, there are approximately 700,000–1,000,000 STR loci with 2-6 bp long motifs in the human reference genome. STR are non-randomly distributed across Untranslated Regions (UTRs), protein-coding sequences, and introns, and are overrepresented in the promoter regions of human genes. The functional role of STR has been demonstrated by effects on gene expression, splicing, protein sequence, and association with pathogenic effects. An intrinsic property of STR is the high rate of mutation by expansion or contraction in the number of repeat units. Variation in the length of STR plays an important role in modulating gene expression, and STR are likely to be general regulatory elements which attenuate expression of multiple genes. Elucidating the effects of STR on gene expression may in part explain variability in drug response, something that cannot be achieved by focusing analysis exclusively on SNPs or CNV. This review summarizes the role of polymorphic STR in clinical manifestations including response to pharmacotherapy.

Keywords: Short Tandem Repeats (STR); Pharmacogenomics; Pharmacogenetics; Gene expression; Microsatellite; Human genome


Polymorphic tandem repeats draw interest in pharmacogenomics research because of their prevalence in the human genome, and their putative functional role. Polymorphic Short Tandem Repeats (STR) emerged as a separate class of genetic mutations, which together with Single Nucleotide Polymorphisms (SNPs), Copy Number Variations (CNVs), and biallelic indels can explain variability in response to pharmacotherapy.

Repetitive DNA sequences may comprise over two-thirds of the human genome [1]. Depending on the length of the repeated motif, tandem repeats are categorized as microsatellites (short tandem repeats of DNA motifs 1 to 6 bp long, or STR), minisatellites (tandem repeats of moderate motifs 10-100 bp long), and macrosatellites with motifs longer than 100 bp. In humans, STR account for up to 3% of the total genomic DNA, a share that exceeds the protein-coding part of the human genome [2].
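The size categories above can be expressed as a small lookup. The following is a minimal sketch, assuming only the motif-length ranges quoted in this paragraph; the function name `classify_tandem_repeat` is hypothetical, and motifs of 7-9 bp are returned as unassigned because the quoted ranges do not cover them.

```python
def classify_tandem_repeat(motif_length_bp: int) -> str:
    """Return the repeat class for a motif length, using the ranges quoted in the text."""
    if motif_length_bp < 1:
        raise ValueError("motif length must be a positive integer")
    if motif_length_bp <= 6:
        return "microsatellite (STR)"
    if 10 <= motif_length_bp <= 100:
        return "minisatellite"
    if motif_length_bp > 100:
        return "macrosatellite"
    # The quoted ranges leave 7-9 bp motifs without a category.
    return "unassigned"


for length in (2, 8, 50, 150):
    print(length, classify_tandem_repeat(length))
```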

Depending on the search algorithm, there are approximately 700,000–1,000,000 STR loci with 2-6 bp long motifs in the human reference genome [3,4]. Di- and tetra-nucleotide STR constitute about 75% of STR, with the remaining loci containing tri-, penta-, and hexa-nucleotide repeats. The overall STR density in the human genome is comparable across chromosomes (mean ± SD = 13,613 ± 1,887 bp/Mb), with chromosome 19 showing the highest STR density (20,351 bp/Mb) [5]. Within genes, microsatellite repeats are non-randomly distributed across protein-coding sequences, untranslated regions (UTRs), and introns. In the coding regions of genes, repeats predominantly have either trimeric or hexameric repeat units, likely as a result of selection against frameshift mutations [4,6]. STR containing dinucleotide repeat units are much more abundant in the regulatory or UTR regions than in other genomic regions [2].
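The dependence of the locus count on the search algorithm can be illustrated with a deliberately naive search. The sketch below is not the method used in the cited studies: it finds only perfect repeats with 2-6 bp motifs, and the function name `find_strs` and the `min_repeats` cutoff are assumptions made for illustration. Changing the cutoff, or allowing imperfect repeats, changes how many loci are reported.

```python
import re


def find_strs(sequence, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats with 2-6 bp motifs."""
    # Lazy quantifier: the shortest motif that satisfies the repeat count is tried first.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA").
        if motif in (motif + motif)[1:-1]:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical test sequence: an (AC)6 tract, a poly-A run, and a (CAG)4 tract.
example = "GGT" + "AC" * 6 + "TTG" + "A" * 10 + "CAG" * 4 + "T"
print(find_strs(example))
```

The poly-A run is skipped because a 1 bp motif falls outside the 2-6 bp range used for the counts quoted above.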

Initially labeled as nonfunctional (junk) DNA, STR is now considered to have biological functions. Microsatellite repeats are concentrated at the start of human genes, where they are highly conserved near transcription start sites [6,7]. About 19% of human genes contain at least one STR in their upstream regulatory region [6,8]. Recent studies demonstrated that STR in the human genome contributes to variation in gene expression [9]. Statistical analysis of tandem repeat distribution showed a sharp increase around Transcription Start Site (TSS), spanning several kilobases up- and downstream from the TSS [7].

The exact mechanism of expression modulation by STR remains a matter of discussion, and may vary for different STR motifs. For example, poly-A STR (microsatellites with A/T motif), the most frequent microsatellites in the human genome, are common elements of the promoters in the human genome [7]. Poly-A STR are hypothesized to participate in the regulation of gene activity because they disrupt nucleosome binding which could be a molecular mechanism for modulating gene expression [10]. Another example of STR overrepresented in the promoter regions of multiple genes is AC/GT dinucleotide tandem repeat [7]. The AC/GT tandem repeat sequence composed of alternating purine-pyrimidine bases facilitates formation of Z-DNA, and could prevent nucleosome binding leading to chromatin opening. Formation of H-DNA triplex structure composed of CT/AG poly-purine/poly-pyrimidine mirror repeats potentially leads to modified chromatin structure and transcriptional activation.

An intrinsic property of STR is the high rate of mutation. STR mutate by expansion or contraction in the number of repeat units, and are often described as a Variable Number of Tandem Repeats (VNTR). The frequency of STR mutations depends on the length of the repeat unit, the number of the repeats, and the match to the consensus sequence (purity) of the repeat tract [11,12]. The mutation rates of STR often lie between 10⁻³ and 10⁻⁶ per cell generation, which is 10- to 10⁵-fold higher than the average mutation rates observed in non-repeated regions of the genome [8,12,13]. Apparently, short repeats consisting of two or three repeat units are the starting point for microsatellite expansion, either through base substitution or through duplication of an adjacent sequence. Once the critical number of repeat units is formed (10 repeat units for A/T and 5-6 repeats for AC/GT), the locus becomes hyper-variable [14,15]. Based on this mechanism, a true microsatellite can be defined as a repeat containing the minimal number of units required for the production of indel mutations at a frequency greater than the average frequency of indel mutations within the genome. Importantly, repeats in the coding regions are significantly less variable than repeats in introns, intergenic regions, 3’-UTR, and non-coding RNA [16].
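The critical repeat numbers quoted above can be turned into a simple check. This is a minimal sketch under stated assumptions: the names `CRITICAL_UNITS`, `canonical_motif`, and `is_hypervariable` are hypothetical, the 5-6 range for AC/GT is taken as 6, and motifs for which the text gives no threshold return `None`.

```python
# Critical repeat counts quoted in the text; the 5-6 range for AC/GT is taken as 6 (assumption).
CRITICAL_UNITS = {"A": 10, "AC": 6}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Collapse rotations and reverse complements of a motif to one representative."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))]
    return min(rotations)


def is_hypervariable(motif, repeat_units):
    """True or False when a threshold is quoted for the motif, otherwise None."""
    threshold = CRITICAL_UNITS.get(canonical_motif(motif))
    return None if threshold is None else repeat_units >= threshold


print(is_hypervariable("GT", 12), is_hypervariable("T", 9), is_hypervariable("CAG", 20))
```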

There are currently two major models that describe the mechanisms by which STR expand or contract: strand-slippage replication and recombination. Strand-slippage replication, also known as slipped-strand mispairing or DNA slippage, occurs during replication of the tandem repeats. After the newly synthesized DNA strand denatures from the template strand during the synthesis of a tandem repeat region, it may shift its position along the tandem repeat in the process of renaturation ([8], and references therein). Recombination events, including unequal crossing over and gene conversion, can also lead to contraction and expansion of STR sequences. The recombination mechanism is hypothesized to be activated in response to double-strand breaks formed during DNA replication. The repair of double-strand breaks by recombination results in either addition or loss of repeat units. The importance of DNA strand breakage in the mutation of tandem repeat tracts is supported by studies demonstrating involvement of the double-strand break repair pathway in tandem repeat expansion and contraction [17].
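The gain or loss of repeat units through slippage can be pictured with a toy stepwise model. The sketch below is an illustration only and is not taken from the cited work: it assumes that each cell generation carries a fixed probability `slip_rate` of adding or removing exactly one repeat unit, with both directions equally likely. The function name and all parameter values are hypothetical.

```python
import random


def simulate_slippage(start_units, generations, slip_rate, seed=0, min_units=1):
    """Track the repeat-unit count of one tract under a toy one-step slippage model."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    units = start_units
    history = [units]
    for _ in range(generations):
        if rng.random() < slip_rate:
            # One unit is gained or lost; the tract never drops below min_units.
            units = max(min_units, units + rng.choice((-1, 1)))
        history.append(units)
    return history


trajectory = simulate_slippage(start_units=20, generations=10_000, slip_rate=1e-3)
print(min(trajectory), max(trajectory), trajectory[-1])
```

A fuller model would let the slippage rate depend on tract length and purity, as described in the preceding paragraph.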

While the causal relation with the variable numbers of repeats in microsatellites was convincingly documented for many diseases, importance of STR for drug response remains largely unexplored. Elucidating the effects of STR on expression of the genes involved in drug metabolism, drug transport, and drug-target interaction may help explain variability in efficacy and adverse reactions to pharmacotherapy, thus complementing analysis of SNPs, CNV, and biallelic indels in individual genomes. This review summarizes the role polymorphic STR play in clinical manifestations including their potential importance for pharmacogenetic analysis.

Polymorphic Tandem Repeats Modify Gene Expression

Several lines of evidence support the role of STR in regulation of gene expression. First, microsatellite tandem repeats are overrepresented in the vicinity of transcription start points within the promoter regions of many genes in the human genome [6,7]. Next, polymorphic promoter STR are associated with increased variance in local gene expression and DNA methylation, suggesting a functional role for STR [18]. Finally, recent studies revealed a contribution of STR alleles to gene expression levels and phenotypes [9]. The functional role of STR has been demonstrated by effects on gene expression, splicing, protein sequence, and association with pathogenic effects [18]. Inherently hypervariable microsatellites modulate gene expression through several possible mechanisms, e.g. direct addition of functional DNA motifs, modification of local DNA or RNA structure, epigenetic modification of the local region, altered spacing or orientation of regulatory molecules, and alteration of nucleosome positioning [9,18].

The mechanisms by which microsatellite repeats affect gene expression are likely to be region-specific rather than site-specific because within the gene, regulatory microsatellite sequences are found in proximal or distal promoter regions, 5’-UTR, and introns.

Polymorphic microsatellites with putative regulatory functions, the corresponding genes, and the effect of microsatellite length variability on gene expression are exemplified by fifteen STR in Table 1 whose effects on gene expression were comprehensively documented. The corresponding genes encode proteins with different functions including receptors (ESR1, IFNAR1, GRIN2A, TLR2), growth factor (IGF1), cytokine (TNF), enzymes (HMOX1, MMP9, UGT1A1, UGT1A8/9), and transcription factors (FOXA2, FOXP3, STAT6). To find a correlation between transcription level and a particular STR allele, several in vitro, in vivo, and ex vivo experimental techniques were used. Using transient transfection of cultured cells, the effect of STR on promoter activity can be estimated using reporter constructs where a specific promoter region drives expression of a reporter gene. Despite the simple experimental design, this method has certain limitations, particularly because important regulatory elements might be missing in the interrogated promoter region. Analysis of mRNA level in tissues or Peripheral Blood Mononuclear Cells (PBMC) obtained from individuals with a certain genotype often provides more relevant data on gene regulation by STR allelic variants. The analysis of the gene product (protein level, or enzymatic activity), or clinical phenotype (disease susceptibility, or drug response) is also used to find a correlation with the specific STR allele. Finally, bioinformatics analysis of multiple genomes followed by an association study can provide data on correlation between STR length and expression rate [19]. This last approach may provide a wealth of information for correlative studies, though methodological problems still remain to be resolved [9,18].
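The simplest form of the correlation between STR length and expression mentioned above is a Pearson coefficient computed across individuals. The sketch below is not the pipeline of the cited studies; the function name `pearson_r` is hypothetical, and the repeat counts and expression values are invented for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: repeat count per individual and relative mRNA level.
repeat_counts = [16, 20, 23, 25, 29, 32, 38]
expression = [1.9, 1.7, 1.4, 1.5, 1.0, 0.9, 0.6]
print(round(pearson_r(repeat_counts, expression), 3))
```

A negative coefficient would correspond to the "longer alleles, decreased expression" pattern listed for several loci in Table 1; a real analysis would also need to handle two alleles per individual and non-linear effects.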

S. No. | Motif | Gene ID | Gene region | Position | Chr | Reporter | | | Longer alleles associated with | Reference
1 | (GT)n, n=12-19 | FOXP3 | Promoter | Exon 0 (-150 – -177) | X | Yes | Yes | n/a | Decreased expression in three cell lines (HeLa, COS-7, and Jurkat T) | [35,36,38]
2 | (GT)n, | GRIN2A | Promoter | Upstream of TSS (-721 – | 16 | Yes | n/a | Yes | Decreased expression in cell culture | [33,34]
3 | (GT)n, n=10-43 | HMOX1 | Promoter | Upstream of TSS (-172) | 22 | Yes | Yes | Yes | Decreased expression | [23,24,27-29,31]
4 | (CA)n, | EGFR | Intron 1 | Upstream of enhancer 2 (+1,788 to +2,318) | 7 | Yes | Yes | Yes | Decreased expression | [40,42,43,77]
5 | (TA)n, n=5-8 | UGT1A1 | Promoter TATA box | Upstream of TSS (-23-38) | 2 | Yes | n/a | Yes | Decreased expression | [87-89]
6 | (GT)n, | STAT6 | 5’-UTR | Downstream from TSS (+94) (rs71802646); three more STR upstream of TSS (-801); (-734); | 12 | Yes | n/a | n/a | Decreased expression in HMC-1, BEAS-2B, and Jurkat cells | [90]
7 | (CGG)n, n=5 ≥ 200 | FMR1 | 5’-UTR | Downstream from TSS (+85) | X | Yes | Yes | Yes | Decreased expression | [69-72]
8 | (AC)n, n=14-27; (GT)n, | COL1A2 | Promoter; Intron 1 | AC repeat (-1457-1374); GT repeat (+1413 -+1480) | 7 | Yes | n/a | n/a | Expression non-linearly depends on both repeats | [39]
9 | (CCT)n, n=13-19 | FOXA2 | Intron 1 | (-415) upstream of TSS in alternative exon 2 | 20 | Yes | Yes | n/a | Non-linear (allele with n=14 shows max expression) | [46,71]
10 | (GT)n, n=15-27 | VWF | Promoter | Upstream of TSS (-2144 -2105) | 12 | Yes | n/a | n/a | Increased expression | [67,91]
11 | (CA)n, n=12-28 | MMP9 | Promoter | Upstream of TSS (-90) | 20 | Yes | n/a | n/a | Increased expression | [92-94]
12 | (GT)n, n=10-17 | IFNG | Intron 1 | Downstream from TSS (+994 bp) (rs3138557) | 12 | Yes | n/a | Yes | Increased expression in cultured cells (HepG2, Jurkat); decreased expression ex vivo | [95-97]
13 | (GT)n, n=12-28 | TLR2 | Intron 2 | (-100 upstream of the start codon) | 4 | Yes | Yes | n/a | Increased expression in K562 cells; decreased expression ex vivo | [45,98]
14 | (T)n, n=9-19 | UGT1A8 UGT1A9 | Promoter | Upstream of TSS (-120) | 2 | Yes | n/a | n/a | Increased expression in Caco 2 cells | [99]
15 | (GAG)n, n=4-10 | GCLC | 5’-UTR | Upstream of the start codon (-10) | 6 | Yes | n/a | Yes | Increased expression | [100,101]

Table 1: Polymorphic microsatellite sequences in the regulatory regions of the human genes, and their effect on gene expression.

Heme oxygenase 1 encoded by the HMOX1 gene is an important stress response, anti-inflammatory, and antioxidant protein inducible in response to several types of stress and drug therapy. The GT repeat in HMOX1 gene is one of the most extensively characterized examples of regulatory polymorphic microsatellites, and detailed mechanistic study of HMOX1 gene expression helped to elucidate the role of polymorphic STR in its promoter. HMOX1 gene contains a microsatellite sequence in the promoter region with a variable number of 10-43 GT repeats [20]. The effect of polymorphic STR in the HMOX1 promoter on the gene expression was extensively studied using reporter constructs, gene expression experiments in patients’ samples, and enzymatic activity. Transfection of rat aortic smooth muscle cells with reporter constructs [21], HMOX1 mRNA level quantification in three malignant melanoma cell lines [22], and analysis of HO-1 protein in six urothelial cancer cell lines indicated that shorter GT repeats (n<25) were associated with higher level of basal expression of HO-1, and with higher induction of this protein [23,24]. Inducible promoter activity of 5’-flanking regions in the HMOX1 gene was estimated by transfection of A549 and Hep3B cancer cells containing different number of GT repeats, with reporter constructs [23]. Exposure to H2O2 induced the transcription of the reporter constructs with short (GT16 and GT20) but not long (GT29 or GT38) microsatellites. Similarly, umbilical endothelial cells (HUVEC) with short GT repeats produced more HO-1 upon induction with H2O2 [25]. In contrast to these findings, baseline levels of HMOX1 mRNA were found lower (and protein level higher) in PBMC from healthy subjects [26], while both carriers and non-carriers of L allele (n>32) showed similar HMOX1 mRNA expression. 
Importantly, upon induction with heat or hemin, non-L carriers manifested 1.9-fold higher increase in mRNA level in response to heat, and after hemin stimulation the median HMOX1 mRNA in non-L carriers was 3.9 fold higher than in L carriers (p=0.0028) [27]. Studies of transcription in cell lines and blood cells collected from patients [27-29], protein synthesis and enzymatic activity [23,28,30], and clinical effects [27,30] demonstrated significant association with the number of GT-repeats in this STR. HO-1 mRNA expression and HO activity were significantly higher in lymphoblastoid cell lines with SS genotype compared with those with LL genotype [28]. Inducible regulation of HMOX1 gene is under control of a complex regulatory mechanism [31], and the phenotypic manifestation of STR polymorphism may therefore be obscured [26]. The level of HO-1 protein determined by flow cytometry in PBMC from healthy individual carriers of S alleles (number of repeats ≤25) and L alleles (number of repeats >25) was significantly different: following LPS stimulation, monocytes from individuals with SS genotype showed a significantly higher HO-1 expression compared to LL homozygous individuals [30]. In a separate study, PBMC isolated from patients with suspected coronary atherosclerosis were treated with hemin. Assessment of HMOX1 mRNA revealed significantly higher hemin-stimulated mRNA expression in SS genotypes compared to SL and LL genotypes (S allele, number of repeats <26, and L allele, number of repeats >26) [29]. Analysis of the microsatellite effect is often complicated by the presence of alternative TSS or splicing. A novel exon 1a was found in the HMOX1 gene placing a (GT) microsatellite in intronic position within the 5’-untranslated region [31]. The quantitative outcome of alternative splicing within the 5’-untranslated region was affected by (GT)n microsatellite polymorphism.
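The S/L genotype labels used in these studies follow from a repeat-number cutoff applied to each allele. The sketch below assumes the cutoff reported in [30] (S allele ≤25 repeats, L allele >25 repeats) as a default; the function name `hmox1_genotype` and the parameter `short_max` are hypothetical. Other studies cited above place the boundary elsewhere, which is why the cutoff is left adjustable.

```python
def hmox1_genotype(allele1_repeats, allele2_repeats, short_max=25):
    """Return 'SS', 'SL', or 'LL' from the GT repeat counts of the two alleles."""
    calls = ["S" if n <= short_max else "L" for n in (allele1_repeats, allele2_repeats)]
    # Sort so that the heterozygous genotype is always written 'SL'.
    return "".join(sorted(calls, reverse=True))


for alleles in ((16, 20), (29, 20), (29, 38)):
    print(alleles, hmox1_genotype(*alleles))
```

Because the boundary between S and L alleles differs between studies, the same individual can be assigned to different genotype groups, which complicates direct comparison of the reported associations.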

A similar relation between the number of GT repeats in the promoter region and gene expression activity was detected in the GRIN2A gene (Table 1), which encodes the NR2A subunit of the NMDA receptor expressed in neurons [32]. Reporter constructs with the GRIN2A promoter coupled to the luc gene demonstrated increased luc expression with shortening of the STR in the GRIN2A promoter [33,34]. The promoter activity of the construct with 25-42 GT repeats was 50-61% lower than that with no GT repeats. The receptor binding assay in postmortem brains indicated reduction of GRIN2A expression in the carriers of longer GT repeats [33].

The FOXP3 gene is located on the X chromosome and mediates functional activity of T regulatory cells (Table 1). An association between FOXP3 microsatellite polymorphism in a region with promoter/enhancer activity and gene expression has been found in several cell lines (HeLa, COS-7, and Jurkat T) transfected with reporter constructs [35,36]. In addition, an association was detected between the number of GT repeats in the FOXP3 gene and several conditions including type 1 diabetes, development of severe acute graft versus host disease in patients transplanted from donors harboring short alleles, and renal allograft survival [35-37]. The type 1 diabetes–associated allele (GT)15 in the FOXP3 promoter demonstrated a higher activity in reporter gene experiments with three cell lines, supporting a direct effect of STR on FOXP3 expression [35,36]. On the other hand, there was no significant difference in FOXP3 expression level between the groups of asthmatic patients stratified by (GT)n genotype [38]. This latter finding may indicate that this microsatellite polymorphism does not directly regulate the expression of the chromosomal gene.

Transcription driven by the COL1A2 promoter (Table 1) is enhanced by the presence of two dinucleotide repeats located in the 5’-flanking region of the gene and in the first intron of the gene, at a distance of about 1.4 kb [39]. When tested in human skin fibroblasts transfected with reporter constructs, these repeats were found essential for the transcriptional stimulation of the COL1A2 gene. Importantly, the stimulating effect was due to the presence of both repeats, rather than either one of them alone. Moreover, the transcriptional activity was modulated by the combination of alleles that differed in the number of dinucleotide repeats.

In addition to microsatellites in the promoter region, AC/GT repeats with recognized gene modulating activity were detected in intronic sequences (Table 1). An important example of a functional microsatellite residing outside the promoter was found in the gene coding for Epidermal Growth Factor Receptor (EGFR). The human EGFR gene is located on chromosome 7, and regulated by a single promoter and two enhancer regions. The length of a highly polymorphic CA/TG microsatellite in the intron 1 of EGFR correlates with expression of EGFR both in vitro and in vivo [40,41]. The longer CA21 allele exhibits 80% lower transcription than CA16 allele [42]. Cells with shorter CA repeats manifested higher transcription level of EGFR mRNA, higher level of EGFR protein, and were more sensitive to tyrosine kinase inhibitor erlotinib in 12 head and neck cancer cell lines [43]. Importantly, the length of CA repeats modulates EGFR transcription in NSCLC patients, and affects protein expression and response to therapy [43,44].

Finally, a GT microsatellite repeat in the second intron of the TLR2 gene was demonstrated to affect promoter activity. Expression of TLR2 protein in PBMC from carriers of short alleles (n≤16) was higher than in non-carriers of short alleles [45]. A CCT trinucleotide repeat in the alternative promoter (212 bp upstream of TSS.1) modulates expression of the FoxA2 transcription factor, with the highest expression from the allele containing 14 repeats [46]. Transfection experiments with polymorphic repeats in the reporter constructs confirmed the data on the transcriptional activity of this allele in HepG2 cells.

In many cases, shorter GT repeats show increased promoter activity, as demonstrated by promoter assays. Table 1 summarizes examples of microsatellite polymorphisms in the regulatory regions of human genes, and available data on promoter activity. For instance, reporter constructs with shorter AC/GT repeats demonstrated increased activity with COL1A2, FOXP3, GRIN2A, HMOX1, MMP9, and STAT6 promoters (Table 1). This correlation also holds true for GT repeats in intronic sequences of EGFR and TLR2, as well as trinucleotide repeats of the FOXA2 (intron 1) and FMR1 (5’-UTR) genes. On the other hand, such correlation was not found for GT repeat-containing promoters in the VWF and IFNG genes, the mononucleotide (T)n STR in UGT1A8/9, and the trinucleotide STR in the 5’-UTR of the GCLC gene.

STR and Clinical Phenotype

Because of high variability, microsatellite loci are often used in forensics, population genetics, and genetic genealogy [47]. Significant associations were demonstrated between microsatellite variants and genetic diseases including some neurological conditions such as Huntington’s disease, Parkinson’s disease, autism, amyotrophic lateral sclerosis, and certain types of ataxia [48,49]. Such relationship supports the premise about phenotypic manifestation of microsatellite polymorphisms. Multiple studies evidenced the clinical impact of GTn microsatellite polymorphism in pulmonary disease, cardiovascular disease, renal transplantation, obstetrics, neurological disease, and hematological/serological disorders [21,22,50-59]. Clinical manifestations of microsatellite polymorphisms are exemplified by twenty STR with clearly defined phenotypes shown in Table 2.

S. No. | Gene | Protein function | Motif | Associated condition | Potential drug interactions | Ref
1 | ESR1 | Receptor | (GT)n, n=11-18 (TA)n, | Breast cancer; lone atrial fibrillation; postpartum depression; harm avoidance score; osteoporosis | Anastrozole, atorvastatin, cisplatin, conjugated estrogens, dexamethasone, exemestane, glibenclamide, leflunomide, letrozole, medroxyprogesterone, methamphetamine, raloxifene, tamoxifen | [102-109]
2 | IFNAR1 | Receptor | (GT)n, | Depressive symptoms during IFN-alpha therapy for hepatitis C; response to interferon therapy of hepatitis C | Glatiramer acetate | [107-109]
3 | IGF1 | Insulin-like growth factor | (CA)n, | Endometrial cancer; age at natural menopause; disease onset in HNPCC; colorectal cancer | No known drug association | [110-114]
4 | TIGR/MYOC | Cytoskeletal | n=13-14 and n=15-16 | Juvenile-onset primary open-angle glaucoma; glaucoma | No known drug association | [115,116]
5 | TNF | Cytokine | (GT)n, | Myocardial infarction; systemic lupus erythematosus; dengue; gastric and hepatocellular cancer | Adalimumab, atorvastatin, carbamazepine, clozapine, cyclosporine, etanercept, ethambutol, infliximab, isoniazid, lansoprazole, mycophenolate mofetil, omeprazole, pyrazinamide, rabeprazole, rifampin, rituximab, sirolimus, sorafenib, stavudine, tumor necrosis factor alpha (TNF-alpha) inhibitors | [117-121]
6 | COL1A2 | The fibrillar collagen | (AC)n, n=14-27; (GT)n, | Bone mineral density; systemic sclerosis | Daunorubicin, doxorubicin | [39,74,122-124]
7 | EGFR | Target | (CA)n, | Thymoma aggressiveness; response to TKI therapy in NSCLC patients; response to 5-FU therapy | Afatinib, alkylating agents, carboplatin, cetuximab, docetaxel, erlotinib, fluorouracil, gefitinib, geldanamycin, gemcitabine, irinotecan, leucovorin, paclitaxel, panitumumab, tegafur, topoisomerase I inhibitors | [40,42,78,125-128]
8 | FMR1 | Cognitive development | (CGG)n, n=5 ≥ 200 | Premature ovarian failure; primary ovarian insufficiency and tremor-ataxia syndrome | No drug interaction | [68,71,72,129,130]
9 | FOXA2 | Forkhead box A2 | (CCT)n, n=13-19 | Type 2 diabetes; CYP3A4 expression | Drugs metabolized by CYP3A4 | [46,131]
10 | FOXP3 | Transcription factor | (GT)n, | Type 1 diabetes; survival of renal transplant patients; graft versus host disease | Tacrolimus | [35-38]
11 | GCLC | Target | (GAG)n, | Schizophrenia; Type 1 diabetes | Sulphamethoxazole | [100,101,132-134]
12 | GRIN2A | NMDA receptor | (GT)n, | Schizophrenia, alcoholism; hippocampal and amygdala volumes; concussion recovery | Methylphenidate | [33,34,60-64]
13 | HMOX1 | Stress response, anti-inflammatory, anti-oxidant, anti-proliferative | (GT)n, | Melanoma; inhibitory Ab to F8 in severe hemophilia A; Type 2 diabetes mellitus; coronary atherosclerosis; emphysema; severe malaria; pulmonary disease, cardiovascular disease, renal transplantation, obstetrics, neurological disease, hematological/serological disorders | Aspirin, statins, mimetic peptides, probucol, losartan, paclitaxel, rapamycin, cyclosporin, curcumin, resveratrol | [22-29,53,56,82,135]
14 | IFNG | Interferon-gamma | (GT)n, | Generalized vitiligo; malaria; response to immunosuppressive treatment; sporadic breast cancer | Infliximab, adalimumab, etanercept | [96,97,109,136]
15 | MMP9 | Matrix metalloproteinase | (CA)n, | Diabetic end-stage renal disease; diabetic nephropathy; bladder cancer invasiveness; multiple sclerosis; age-related macular degeneration; carotid atherosclerosis | Hydralazine, nifedipine, methyldopa | [92-94,137]
16 | STAT6 | Transcription factor | (GT)n, | Bronchial asthma, atopic dermatitis, food-related anaphylaxis, asthma; eosinophil cell count | No known drug interactions | [90,138-140]
17 | TLR2 | Receptor | (GT)n, | Colorectal cancer; leprosy; TB; acute pancreatitis; spontaneous bacterial peritonitis; rheumatoid arthritis | TNF alpha inhibitors | [45,141-146]
18 | UGT1A1 | DME | (TA)n, | Gilbert's syndrome; transient familial neonatal hyperbilirubinemia; Crigler-Najjar syndrome | Acetaminophen, antivirals for treatment of HIV infections, atazanavir, belinostat, bevacizumab, bilirubin, cisplatin, deferasirox, dolutegravir, fluorouracil, gepirone hydrochloride, irinotecan, leucovorin, nilotinib, olanzapine, oxaliplatin, pazopanib, peginterferon alfa-2b, raloxifene, raltegravir, ribavirin, ritonavir, SN-38, sorafenib | [87,147,148]
19 | UGT1A8 UGT1A9 | DME | (T)n, | Pericholangitis | UGT1A8: ABT-751, allopurinol, anthracyclines and related substances, atazanavir, cyclosporine, febuxostat, irinotecan, lamivudine, mycophenolate mofetil, mycophenolic acid, sirolimus, tacrolimus, tipifarnib, valproic acid, zidovudine. UGT1A9: acetaminophen, allopurinol, anthracyclines and related substances, aspirin, atazanavir, cisplatin, entacapone, febuxostat, irinotecan, labetalol, lamivudine, microsatellite, mycophenolate mofetil, mycophenolic acid, oxcarbazepine, propofol, raltegravir, ritonavir, simvastatin, SN-38, sorafenib, sulfinpyrazone, tipifarnib, tolcapone, valproic acid, zidovudine | [99]
20 | VWF | An antihemophilic factor carrier | (GT)n, | Circulating level of VWF; cortisol-dependent increase of VWF | No known drug interactions | [65,91]

Table 2: Association of polymorphic microsatellite sequences in the regulatory regions of the human genes with clinical phenotypes, and potential effect on drug response.

The functional role of GT repeat polymorphism in the HMOX1 promoter in human disease was reviewed by Exner et al. [50]. Daenen et al. performed a systematic review and a meta-analysis on the association of GT microsatellite polymorphism in the HMOX1 promoter and cardiovascular disease, with the cutoff of the short allele set to 25-27 repeats [59]. The results from 41 selected studies revealed that the proportion of the short SS genotype was lower in the cardiovascular disease (CVD) patient group compared with the non-CVD group (13.3% vs. 18.9%, P<0.0001). The odds ratio in LL vs. SS genotype was 1.769 (95% Confidence Interval [CI], 1.594-1.963). Another systematic review and meta-analysis of 5 studies of the association between the microsatellite polymorphism in the HMOX1 promoter and type 2 diabetes contained data on 1751 cases and 2902 controls. The odds ratio for type 2 diabetes in persons with LL genotype was significantly increased compared with the SS genotype (OR=1.25, 95% CI: 1.04, 1.50; P=0.02). Statistical analysis showed that carriers of longer GT repeats (≥25-27 repeats) in the HMOX1 promoter had higher risk of type 2 diabetes [53]. Analysis of GT repeat distribution in 942 children with sickle cell disease demonstrated that children with two short alleles (≤25 repeats) had a lower rate of hospitalization for acute chest syndrome (incidence rate ratio 0.28, 95% CI, 0.10-0.81) [54]. The GT repeat polymorphism in the HMOX1 promoter was associated with severe disease and death in Gambian children with malaria [27].
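For readers less familiar with the statistics quoted above, an odds ratio and its 95% confidence interval can be computed from a single 2×2 genotype-by-disease table as sketched below. This is a minimal illustration using the standard Wald interval on the log scale; it is not the pooling procedure used in the cited meta-analyses, the function name `odds_ratio_ci` is hypothetical, and the counts are invented.

```python
from math import exp, log, sqrt


def odds_ratio_ci(ll_cases, ll_controls, ss_cases, ss_controls, z=1.96):
    """Odds ratio (LL vs. SS) with a Wald confidence interval from one 2x2 table."""
    counts = (ll_cases, ll_controls, ss_cases, ss_controls)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (ll_cases * ss_controls) / (ll_controls * ss_cases)
    # Standard error of ln(OR) is the square root of the summed reciprocal counts.
    se_log_or = sqrt(sum(1.0 / c for c in counts))
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the cited studies.
print(odds_ratio_ci(ll_cases=20, ll_controls=80, ss_cases=10, ss_controls=90))
```

An interval that excludes 1, as in both meta-analyses quoted above, indicates an association that is significant at the corresponding level.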

Polymorphism of microsatellite sequences was also associated with psychiatric disorders, and concussion recovery rate. Clinical studies indicated that the longer alleles of GRIN2A were overrepresented in schizophrenics, and the score of symptom severity correlated with repeat length. This study was later expanded to 672 schizophrenics vs. 686 controls, and confirmed the significant association between GT-repeat polymorphism and disease [60]. Similarly, association of the GT-repeat polymorphism with schizophrenia was demonstrated in 122 Chinese sib-pair families [61]. Clinical studies demonstrated association of GT-polymorphism in GRIN2A promoter and hippocampal and amygdala volumes [62], alcoholism [63], and D-serine level in schizophrenics [34]. In a study of 87 athletes suffering with a concussion, homozygous carriers of the longer alleles were six times more likely to experience longer recovery, compared with homozygous carriers of short (<25 repeats) alleles [32,64].

The GT repeat element is a part of the promoter of the VWF gene, which codes for von Willebrand factor, an essential plasma glycoprotein that mediates platelet adhesion and aggregation at the sites of vascular injury. Plasma concentration of VWF protein is under control of several factors including genetic polymorphism, and the dinucleotide tandem repeat (GT)n was hypothesized to influence the VWF level. An association of variable number of tandem GT repeats with the level of VWF was demonstrated in sixty-nine Cushing’s syndrome patients [65]. Short repeats (n=15-19) were found more frequently in a group with high VWF induced by glucocorticoid excess, while long repeats (n=20-24) were predominant in a group with normal VWF. Risk of cortisol-induced increase of VWF was three times higher for alleles with 15-19 GT repeats (GT15-19) than for alleles with 20-24 repeats (GT20-24), and 13-fold higher for non-carriers of (GT20-24) alleles compared to non-carriers of (GT15-19) alleles. In a larger group of samples (1115 European male and female healthy controls), VWF:Ag values were lower in homozygous carriers of the short allele (<20 repeats) when compared to heterozygous carriers of short and long alleles, or homozygous carriers of the long (≥20 repeats) allele. It should be mentioned that in a separate study in a group of 394 healthy individuals, the number of GT repeats did not correlate with VWF level [66]. These results were in contrast to greater VWF promoter activity under shear stress conditions when long GT repeats were present [67]. In this study, bovine aortic endothelial cells (BAEC) were transfected with reporter constructs prepared with different combinations of SNPs and GT repeats in the VWF promoter governing expression of a reporter gene. The upstream SNP haplotype did not affect promoter activation after shear stress. Rather, the promoters were more active when they contained 23 repeats vs. 17 repeats. Importantly, the absence of GT repeats did not change the basal VWF promoter activity, but the repeats were essential for shear stress-induced promoter activation [67]. One possible explanation for this discrepancy is that VWF upregulation after shear stress and after cortisol induction proceeds through different mechanisms [65].

An increased level of FMR1 transcription was observed among carriers of the premutation, an expansion of the CGG repeat in the 5’-untranslated region of the FMR1 gene to 61-200 repeats. The premutation form is highly unstable when transmitted from parent to child. An increase in the number of CGG repeats leads to an increase in the level of FMR1 transcript [68,69]. Interestingly, FMR protein levels decrease, likely due to poor initiation of translation at the downstream initiation codon [70,71]. A large study in 238 individuals confirmed a significant linear relationship between transcript level and CGG repeat size within the premutation size alleles [72]. Expansion of the repeat number beyond 200 results in hypermethylation and silencing of the gene, which is phenotypically manifested as Fragile X Syndrome (FXS) [73].
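The repeat-length ranges quoted above can be written as a small classification rule. The following Python sketch is illustrative only: the function name and the category labels are hypothetical, and it uses just the two thresholds stated in the text (61-200 repeats for the premutation, more than 200 for the expansion associated with gene silencing). Alleles with fewer than 61 repeats are not subdivided, because the text gives no cutoffs for them.

```python
def classify_fmr1_allele(cgg_repeats: int) -> str:
    """Classify an FMR1 allele by its CGG repeat count.

    Only the thresholds quoted in the text are used; the labels
    themselves are illustrative assumptions.
    """
    if cgg_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cgg_repeats > 200:
        # Beyond 200 repeats: hypermethylation and silencing of the gene
        return "full mutation"
    if cgg_repeats >= 61:
        # 61-200 repeats: premutation, associated with increased transcript level
        return "premutation"
    # Shorter alleles are not subdivided here (no cutoffs given in the text)
    return "below premutation range"


for n in (30, 61, 200, 201):
    print(n, classify_fmr1_allele(n))
```

Note that the rule describes the transcript-level categories only; within the premutation range the text reports a continuous, linear relationship between repeat size and transcript level, which a categorical rule does not capture.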

An association between the microsatellite GT polymorphism in intron 1 of the COL1A2 gene and bone mineral density was found in a Chinese population after analysis of 388 nuclear families with a total of 1220 individuals [74]. In addition, an association was found between the number of GT repeats in the FOXP3 gene and several conditions including type 1 diabetes, development of severe acute graft versus host disease in patients transplanted from donors harboring short alleles, and renal allograft survival [35-37]. Importantly, no difference in the mean expression of FOXP3 mRNA was detected between asthmatic (n=49) and healthy (n=7) subjects grouped by GT repeat genotypes [38].

Pharmacogenomics of STR

Phenotypic manifestation of variability in the length of STR remains poorly understood despite the fact that their association with neuromuscular and neurodegenerative diseases, several complex disorders, and several types of cancer has been convincingly demonstrated [6]. While genome-wide association studies of complex phenotypes are complicated, pharmacogenomic analysis of drug therapy provides a simpler model for deducing macro scale characteristics of a human organism based on his/her genome. Drug response is often determined by a variant of one or a few genes, for example those involved in drug metabolism, drug transport, or drug-target interaction [75-129]. Pharmacogenomic datasets include large populations of patients who have been medically characterized, have been treated with a standardized chemical stimulus (drug therapy), have records on the medical outcome, and for whom the genome sequence is often available.

Information about the effect of genetic variants on drug response is summarized in the PharmGKB database. This database contains manually curated information about genetic variant-drug pairs based on individual PubMed publications. Because the PharmGKB database collects information about all kinds of genetic variants (mostly SNPs), it is impossible at this time to focus this analysis on microsatellite length variations. Table 2 lists clinically used medications that may give observable clinical effects of STR polymorphism. The examples shown in Table 2 substantiate future studies of the relationship between microsatellite variants and drug response, and can be used as a starting point for a search of STR important for pharmacogenetic analysis.

Variable UGT1A1 expression due to microsatellite polymorphism in the TATA box of its promoter has been extensively studied, and the UGT1A1*28 allele is an important pharmacogenetic polymorphism [76-148]. The strong effect of additional TA repeats upstream of the TSS is likely explained by the deviation of the TATA box from the canonical sequence. Expression analysis of the (TA)7 sequence in the UGT1A1*28 allele provides mechanistic insights into the effect of polymorphic STR on transcription.

Reduced expression of EGFR mRNA associated with elongation of the CA/TG STR located in intron 1 upstream of enhancer 2 of the EGFR gene is another well-documented example of STR effect on gene expression. Longer alleles for the CA SSR I repeat were associated with significantly lower EGFR expression, and predicted poor outcome of chemotherapy [77]. Evaluation of CA repeats in 62 EGFR somatic mutation-positive patients with advanced NSCLC treated with erlotinib demonstrated a significantly longer median progression-free survival (HR=0.39, 0.22-0.70; p=0.002) and overall survival (HR=0.43, 0.23-0.78; p=0.006) in patients harboring short CA repeat alleles (n≤16 repeats in any allele) compared to those with long alleles (n>16 in both alleles) [78]. The length of CA repeats was also a significant predictor for clinical outcome in 84 advanced NSCLC patients treated with gefitinib. The response rate of the short CA repeat genotype was significantly higher (88.5% vs. 48.3%, p<0.001), and a combination of the shorter CA repeat genotype with rs2293347GG had pronounced clinical benefit. More than 90% of patients with rs2293347GG and short CA repeat genotypes responded to gefitinib therapy, vs. 26% response in patients who carried longer CA repeats along with at least one rs2293347A allele [79]. Finally, the EGFR intron 1 CA repeat polymorphism was associated with survival of 38 advanced gastric cancer patients treated with cetuximab (a monoclonal antibody targeting EGFR). Among these 38 patients, 21 had short repeats (sum of both alleles ≤37), and 17 carried longer alleles (sum ≥38). The first category had longer progression-free survival (HR=0.42, 0.19-0.96; p=0.040) and overall survival (HR=0.40, 0.16-0.99; p=0.048) compared to the second category of patients. EGFR expression in the tumor tissues was higher in patients with short CA repeats [80].
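The studies cited above define "short" and "long" CA repeat genotypes in different ways: the erlotinib study uses a per-allele cutoff (at least one allele with ≤16 repeats), whereas the cetuximab study uses the sum of both alleles (≤37). The Python sketch below encodes both rules to show that they are not interchangeable. The function names are hypothetical, and the code is an illustration of the published cutoffs rather than the procedure used in either study.

```python
def classify_by_any_allele(allele1: int, allele2: int, cutoff: int = 16) -> str:
    """Per-allele rule: 'short' if either allele has <= cutoff CA repeats,
    'long' only if both alleles exceed the cutoff."""
    return "short" if min(allele1, allele2) <= cutoff else "long"


def classify_by_allele_sum(allele1: int, allele2: int, max_short_sum: int = 37) -> str:
    """Sum rule: 'short' if the combined repeat count of both alleles
    is <= max_short_sum, otherwise 'long'."""
    return "short" if allele1 + allele2 <= max_short_sum else "long"


# A hypothetical genotype with 17 and 20 CA repeats
genotype = (17, 20)
print(classify_by_any_allele(*genotype))   # long  (both alleles exceed 16)
print(classify_by_allele_sum(*genotype))   # short (17 + 20 = 37)
```

The same genotype can therefore fall into different categories depending on the definition, which should be kept in mind when comparing hazard ratios or response rates across these studies.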

In patients with colorectal cancer, 84% of patients with a sum of alleles of <35 developed an acneiform rash, compared with 33% of those with a sum of alleles of ≥35 (p=0.04) [43]. Association between the length of the CA SSR I and the response of locally advanced rectal cancer following adjuvant or neoadjuvant chemoradiation therapy was related to the additive effect of the EGFR R497K polymorphism and the length of CA SSR I. In addition, carriers of <20 CA repeats were more likely to show disease progression than were patients with ≥20 repeats (P<0.05) when treated with 5-FU/oxaliplatin chemotherapy. These results suggest that the short CA SSR I alleles and, consequently, higher EGFR expression may predict a worse outcome after conventional therapy [81].

N-acetylcysteine (NAC) is used to improve the lung function of patients with COPD, and to reduce the risk of re-hospitalization. To explore a relationship between the effectiveness of oral NAC and the HMOX1 promoter polymorphism in COPD patients, a total of 386 patients were genotyped and allocated to standard therapy plus NAC [57]. The non-carriers of the L allele (>32 GT repeats) manifested improvements in forced expiratory volume in 1 second (FEV1) from 1.44 ± 0.37 to 1.58 ± 0.38 (P=0.04), and in FEV1% predicted (from 56.6 ± 19.2 to 59.7 ± 17.2, P=0.03). The number of yearly COPD exacerbations in non-carriers of the L allele was lower when compared with carriers of the L allele (1.5 ± 0.66 vs. 2.1 ± 0.53, P<0.01). The improvement in the outcome of the 6-min walking distance test was greater in non-carriers than in carriers of the L allele [57].

Variation in HMOX1 expression due to genetic polymorphism is hypothesized to affect drug response, and therefore is a pharmacogenetic factor. Considering the significant interest in potential modulators of HO-1 activity [82-86], detailed analysis of this genetic polymorphism is warranted. It is quite conceivable that genetic variants with different levels of HO-1 expression may change the therapeutic effect of several drug inducers of HO-1 including aspirin, statins, mimetic peptides, probucol, losartan, paclitaxel, rapamycin, and cyclosporine [82].


Variations in the STR length play an important role in modulating gene expression, and STR are likely to be general regulatory elements which attenuate expression of multiple genes. Moreover, regulatory STR manifest significant polymorphism because of their high intrinsic mutation rate. Because many genes with regulatory STR manifest variability in the expression level, it is now possible to assess effects of STR on gene expression by mining the existing databases. Several technical problems still remain to be addressed, such as relatively short reads generated by next-generation sequencing technologies, “stuttering” of DNA polymerase on STR sequences, insufficient accuracy of alignment through the monotonous repeats, and duplicating and compressing the sequencing data. Despite these hurdles, correlative analysis of gene expression versus STR allelic variants is quite possible. The STR catalogs have been generated using various approaches, and have become useful tools to elucidate the role of STR in genome variability and evolution. Several important mechanistic questions remain unanswered. First, does the length of microsatellites have a general effect on gene expression, or is it relevant only for certain types of promoters? Does this effect depend on the microsatellite orientation (e.g., CA repeat versus TG repeat)? What is the role of the adjacent sequences surrounding the microsatellite? Why does length variation in some STR enhance gene expression while in others it has the opposite effect? Finally, what protein factors facilitate the regulatory functions of STR?

Several facts substantiate further analysis of STR as potentially important candidates for pharmacogenetic analysis. First, these genetic elements are widely spread across the human genome, where they are overrepresented in the promoter regions at the beginning of genes. Second, a significant proportion of microsatellites manifest genetic polymorphism, mostly as a variable number of tandem repeats. The regulatory functions of repeats at the level of transcription, translation, biological activity, and clinical manifestation have been convincingly demonstrated for multiple genes. Finally, many studies demonstrated an association between the number of repeats and the clinical effect, e.g. drug response. The number of human genes with CA/TG repeats (n ≥ 25) within the 1 kb region upstream from the transcription start site exceeds 700, including ABC and SLC transporters, drug metabolizing enzymes, and drug targets. Elucidating the effects of STR on gene expression may help explain variability in drug response, something that is not achieved by focusing exclusively on SNPs or CNV. This will be the next step toward deducing the macro characteristics of an organism from its genome, a problem brilliantly solved by myriads of fertilized egg cells.
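The criterion used above (a CA/TG repeat with n ≥ 25 within 1 kb upstream of the transcription start site) can be expressed as a simple sequence scan. The Python sketch below is a minimal illustration, not the method used to obtain the gene count cited in the text. It assumes that the 1 kb upstream sequence has already been extracted as a plain string; the function name and the toy sequence are hypothetical. Only perfect, uninterrupted repeats are detected, and both the CA and TG orientations on the given strand are reported.

```python
import re


def find_ca_tg_repeats(upstream_seq: str, min_units: int = 25):
    """Return (start, motif, unit_count) for every uninterrupted CA or TG
    run of at least min_units repeat units in the given sequence."""
    pattern = re.compile(r"(?:CA){%d,}|(?:TG){%d,}" % (min_units, min_units))
    hits = []
    for match in pattern.finditer(upstream_seq.upper()):
        run = match.group(0)
        # The first two bases give the motif; the run length gives the unit count
        hits.append((match.start(), run[:2], len(run) // 2))
    return hits


# Toy sequence: a (CA)26 run followed by a (TG)10 run
toy_upstream = "GGG" + "CA" * 26 + "TTT" + "TG" * 10 + "AAA"
print(find_ca_tg_repeats(toy_upstream))                 # [(3, 'CA', 26)]
print(find_ca_tg_repeats(toy_upstream, min_units=10))   # [(3, 'CA', 26), (58, 'TG', 10)]
```

A scan of this kind reports the repeat length in the reference sequence only. Because STR are polymorphic, the allele lengths in an individual genome would have to be determined by genotyping, which is subject to the technical limitations of sequencing through repeats discussed above.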


Citation: Krynetskiy E (2017) Beyond SNPs and CNV: Pharmacogenomics of Polymorphic Tandem Repeats. J Pharmacogenomics Pharmacoproteomics 8:170.

Copyright: © 2017 Krynetskiy E. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.