Abstracts
Browsing Abstracts by Subject "bioinformatics"
Item Open Access
IDENTIFICATION OF KAZAKH SPECIFIC GENOMIC VARIANTS USING COMPARATIVE GENOMICS ANALYSIS
(International conference "MODERN PERSPECTIVES FOR BIOMEDICAL SCIENCES: FROM BENCH TO BEDSIDE"; National Laboratory Astana, 2020)
Molkenov, A.; Daniyarov, A.; Sharip, A.; Seisenova, A.; Karabayev, D.; Kairov, U.
Introduction: Modern high-throughput genomic technologies open up new possibilities for studying the human genome. Large-scale genomic research generates huge amounts of data, and the active development of bioinformatics, together with modern analysis methods and approaches, makes it possible to build detailed databases and study genomic data comprehensively. One contemporary task is to identify population-specific genomic variants by comparing whole-genome and whole-exome data in detail against open large-scale population datasets.
Materials and methods: The study materials are 14 whole genomes and 125 whole exomes of Kazakhstani individuals. Our dataset was supplemented with data from large whole-genome population datasets (SGDP, PRJEB26349, HGDP and 1000 Genomes) for comparative population genomics and for the identification of specific genomic variants. The raw data were mapped and aligned to a single reference genome, hg19; genomic variants were then called, and a map of the detected variants was produced for each dataset in VCF format. For the supplementary datasets, a combined map of all variants was built, and these variants were then excluded from the full set of variants found in the Kazakh sample in order to find specific genomic variants. The remaining filtered variants were then annotated and interpreted.
Results: In the Kazakh whole exomes, 9 heterozygous or mutant variants were found that are unique among the assembled genomic databases: 7 variants are located in intronic regions, 1 in an upstream region, and the last variant is a frameshift deletion in an exonic region.
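The subtraction step described under Materials and methods above (build a combined map of variants from the reference population datasets, then exclude those variants from the Kazakh call set) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `parse_vcf_lines` and `cohort_specific` are hypothetical, the records are invented toy data, and variants are assumed to match only when chromosome, position, reference allele and alternate allele are identical. A real analysis would also need to normalize indel representation before comparing call sets.

```python
from typing import Iterable, Set, Tuple

# A variant is identified by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from tab-separated VCF text lines.

    Header lines (starting with '#') and blank lines are skipped.
    Multi-allelic records are split into one key per ALT allele.
    """
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants


def cohort_specific(cohort: Set[Variant], *references: Set[Variant]) -> Set[Variant]:
    """Return cohort variants that are absent from every reference set."""
    known = set().union(*references)  # combined map of all reference variants
    return cohort - known


# Toy data, invented for illustration only.
cohort_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT,G",
    "chr2\t300\t.\tGA\tG",
]
reference_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
]

specific = cohort_specific(parse_vcf_lines(cohort_vcf), parse_vcf_lines(reference_vcf))
print(sorted(specific))
```

In this toy example, the two variants left after subtraction are the ones that appear in the cohort but in none of the reference sets. Matching on exact allele strings is the simplest possible criterion and is an assumption of this sketch.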
For the Kazakh whole genomes, 4732 heterozygous or mutant variants were found; 517 variants were present in all Kazakh samples, and 144 variants were completely mutant. Only 8 variants are located in exonic regions: 4 synonymous SNVs, 3 nonsynonymous SNVs, and 1 frameshift deletion.
Conclusion: We have discovered several unique genomic variants that are, for now, specific to Kazakh individuals. These results can serve as a basis for creating a Kazakh reference genome, for subsequent research, and for comparative analysis of Kazakh individuals with various populations of the world.
Grant references: AP05135430; MES RK.

Item Open Access
LAUNCH OF Q-SYMPHONY BIOINFORMATICS COMPUTING SYSTEM: A HIGH-PERFORMANCE CLUSTER FOR ANALYSIS OF LARGE-SCALE GENOMIC DATASETS
(International conference "MODERN PERSPECTIVES FOR BIOMEDICAL SCIENCES: FROM BENCH TO BEDSIDE"; National Laboratory Astana, 2020)
Molkenov, A.; Daniyarov, A.; Sharip, A.; Seisenova, A.; Karabayev, D.; Kairov, U.
Introduction: One whole human genome produced by next-generation sequencing platforms takes 20 to 50 GB in raw format. In the course of bioinformatics analysis, the data volume grows to 300-500 GB per genome, and the occupied volume grows further as the number of samples increases. Analyzing whole genomes at this scale demands substantial computing power in the form of servers and data warehouses combined into clusters. We at the Laboratory of Bioinformatics and Systems Biology have developed and launched the Q-Symphony ("Qazaq Symphony of Bioinformatics") bioinformatics computing system for the analysis of large-scale genomic datasets.
Materials and methods: The Q-Symphony bioinformatics computing system consists of 12 high-performance HPE servers: 1 control node, 8 compute nodes, 1 fat-memory compute node, and 2 storage nodes. The system runs on Red Hat Enterprise Linux.
The management node controls access to user profiles, the data warehouse and Moab Workload Manager. The system has 172 processing cores, 3072 GB of RAM, and 198 TB of storage in total, with a peak performance of 7.3 TFlops. All nodes use high-speed InfiniBand network connections, which allow data exchange between nodes at 100 Gbps. The computational capabilities of the Q-Symphony system allow us to distribute resources evenly across tasks, monitor processor and memory load in real time, and queue and sequentially execute large lists of tasks.
Results: Benchmark measurements on the Q-Symphony system showed subtask execution 15 to 54 times faster than standard solutions built on similar processors.
Conclusion: Q-Symphony, together with well-established and proven bioinformatics methods, will make it possible to analyze large-scale human genomic data, determine structural genomic variants, and carry out complex comparative and population analyses.
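As a rough illustration of what the figures quoted in the abstract imply, the short calculation below estimates how many genomes' worth of analysis data the stated storage could hold, and how much RAM is available per core on average. It is a back-of-the-envelope sketch and not part of the abstract: it assumes 1 TB = 1000 GB, assumes the full 198 TB is usable for analysis data (ignoring redundancy, filesystem overhead and other users), and the function name `genomes_that_fit` is hypothetical.

```python
# Figures taken from the abstract.
STORAGE_TB = 198            # total storage capacity
TOTAL_RAM_GB = 3072         # total RAM
TOTAL_CORES = 172           # total processing cores
PER_GENOME_GB = (300, 500)  # data volume per genome during analysis

# Assumption of this sketch: decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1000


def genomes_that_fit(storage_tb: int, per_genome_gb: int, gb_per_tb: int = GB_PER_TB) -> int:
    """Whole number of genomes whose analysis data fit in the given storage."""
    return (storage_tb * gb_per_tb) // per_genome_gb


# Pessimistic case uses the larger per-genome footprint, optimistic the smaller.
fewest = genomes_that_fit(STORAGE_TB, max(PER_GENOME_GB))
most = genomes_that_fit(STORAGE_TB, min(PER_GENOME_GB))

# Average RAM per core across the whole system (nodes may differ individually).
ram_per_core_gb = TOTAL_RAM_GB / TOTAL_CORES

print(f"Genomes that fit in storage: {fewest} to {most}")
print(f"Average RAM per core: {ram_per_core_gb:.1f} GB")
```

Under these assumptions the storage would hold the analysis data of roughly 400 to 660 genomes at once, which shows why the per-genome footprint, rather than the raw 20 to 50 GB read data, drives capacity planning.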