Genome Mining of Rice (Oryza sativa subsp. indica) for Detection and Characterization of Long Palindromic Sequences

Elmira Katanchi Kheiavi, Asadollah Ahmadikhah and Ali Mohammadian Mosammam

Because the rice genome has been fully sequenced, genome-wide searches for specific features are highly important for studying genome evolution and for subsequent applications. Palindromic sequences are important DNA motifs that are involved in regulating various cellular processes and are a potential source of genetic instability. A genome mining approach was applied to detect and characterize long palindromic sequences in the rice genome. All palindromes, defined as identical inverted repeats separated by spacer DNA, were analyzed and sorted by frequency, size, GC content, compact index and other features. The results showed that palindromes are frequent in the rice genome (nearly 51,000 palindromes) and together cover 41.4% of the rice nuclear genome; chromosomes 1 and 12 carry the highest and lowest numbers of palindromes, respectively. Palindrome number explained rice chromosome expansion well (R² > 92%). The average GC content of the palindromic sequences is 42.1%, indicating that they are AT-rich and hence of low complexity. Compact indices of palindromes also differed among chromosomes, ranging from 34.5 per cM on chromosome 3 (lowest) to 43.2 per cM on chromosome 8 (highest). Co-location analysis showed that more than 20% of rice genes overlap palindromic regions, mainly on the chromosome arms. Based on these results, it can be concluded that the rice genome is rich in long palindromic sequences, which triggered most of its variation during evolution. In general, both parts of the palindromic sequences, the stems and the loops, are AT-rich, indicating that these regions lie in low-complexity segments of the rice chromosomes.
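The palindrome definition used above (identical inverted repeats separated by spacer DNA) can be illustrated with a small brute-force scan, along with the GC-content measure applied to stems and loops. This is a minimal sketch, not the authors' genome mining pipeline: the functions `find_palindromes`, `split_palindrome` and `gc_content` and the parameters `min_stem` and `max_spacer` are hypothetical, stems are reported at a fixed length without being extended, and a naive scan like this is only practical for short sequences, not whole chromosomes.

```python
# Hypothetical sketch: naive detection of identical inverted repeats
# (stem + spacer/loop + reverse-complement stem). Not the authors' method.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq: str, min_stem: int = 10, max_spacer: int = 100):
    """Return (start, stem_len, spacer_len) for every fixed-length stem whose
    reverse complement occurs downstream after a spacer of 0..max_spacer bp.
    min_stem and max_spacer are illustrative (hypothetical) thresholds."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * min_stem + 1):
        left = seq[i:i + min_stem]
        if "N" in left:  # skip stems containing unknown bases
            continue
        target = revcomp(left)  # the right arm must equal this string
        for spacer in range(max_spacer + 1):
            j = i + min_stem + spacer  # start of the candidate right arm
            if j + min_stem > n:
                break
            if seq[j:j + min_stem] == target:
                hits.append((i, min_stem, spacer))
    return hits


def split_palindrome(seq: str, hit):
    """Split a hit into (left stem, loop/spacer, right stem)."""
    seq = seq.upper()
    start, stem, spacer = hit
    left = seq[start:start + stem]
    loop = seq[start + stem:start + stem + spacer]
    right = seq[start + stem + spacer:start + 2 * stem + spacer]
    return left, loop, right


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
```

For example, `find_palindromes("ACGTTTTTACGT", min_stem=4, max_spacer=10)` reports a single hit, `(0, 4, 4)`, which `split_palindrome` breaks into the stem `ACGT`, the AT-rich loop `TTTT` and the right arm `ACGT`; `gc_content` can then be applied to each part separately, as in the stem and loop comparison above.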