A USER-FRIENDLY TOOL FOR SIMPLIFIED GENOMICS DATA MINING FROM LARGE VCF FILES
dc.contributor.author | Karabayev, Daniyar | |
dc.contributor.author | Molkenov, Askhat | |
dc.contributor.author | Yerulanuly, Kaiyrgali | |
dc.contributor.author | Daniyarov, Asset | |
dc.contributor.author | Sharip, Aigul | |
dc.contributor.author | Seisenova, Ainur | |
dc.contributor.author | Zhumadilov, Zhaxybay | |
dc.contributor.author | Kairov, Ulykbek | |
dc.date.accessioned | 2020-11-25T08:41:40Z | |
dc.date.available | 2020-11-25T08:41:40Z | |
dc.date.issued | 2020 | |
dc.description.abstract | Introduction: High-throughput sequencing platforms generate massive, high-dimensional genomic datasets that are available for analysis. Modern, user-friendly bioinformatics tools for the analysis and interpretation of genomics data have become essential for working with sequencing data. Variant Call Format (VCF) is a standard format containing genomic information and variants of sequenced samples. Existing tools for processing VCF files usually lack an intuitive graphical interface and offer only a command-line interface, which may be challenging to use for the broader biomedical community interested in genomics data analysis. We present re-Searcher, a new bioinformatics application with a user-friendly GUI developed to simplify genomic data mining from VCF files. Methods: The re-Searcher application was written in Python 3. The Pandas library is used to handle large VCF files by pre-processing them in chunks rather than loading the whole file into RAM. A simple and intuitive GUI was built using the Tkinter library. Results: The generalized workflow of re-Searcher consists of several steps: selecting an input file, setting up the necessary filtering parameters, data processing, and exporting a filtered output VCF file. re-Searcher browses and opens VCF files with the extensions .txt or .vcf and then offers the following filtering and extraction options: header extraction, keyword search, sample extraction, and genotype format conversion. Conclusion: Exploring and analyzing VCF files generated by the bioinformatics processing of sequencing data is one of the important steps performed by researchers during analysis and meta-analysis of genotype/phenotype associations.
We have developed and introduced an easy-to-use bioinformatics tool, re-Searcher, with several unique features for mining big VCF files, implemented with a simple graphical user interface that makes it accessible to clinicians and researchers without any computational skills. The software is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher) | en_US |
dc.identifier.uri | http://nur.nu.edu.kz/handle/123456789/5129 | |
dc.language.iso | en | en_US |
dc.publisher | International conference "MODERN PERSPECTIVES FOR BIOMEDICAL SCIENCES: FROM BENCH TO BEDSIDE"; National Laboratory Astana | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/us/ | * |
dc.subject | Research Subject Categories::MEDICINE | en_US |
dc.subject | VCF | en_US |
dc.subject | Variant Call Format | en_US |
dc.title | A USER-FRIENDLY TOOL FOR SIMPLIFIED GENOMICS DATA MINING FROM LARGE VCF FILES | en_US |
dc.type | Abstract | en_US |
workflow.import.source | science |
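
The Methods part of the abstract says that large VCF files are pre-processed in chunks with Pandas rather than loaded into RAM at once, and the Results part lists keyword search as one of the filtering options. The following is a minimal sketch of how those two ideas might be combined. It is not taken from the re-Searcher source code: the function names (count_meta_lines, filter_vcf_by_keyword), the chunk size, and the demo data are all assumptions made for illustration.

```python
import io

import pandas as pd

# Tiny illustrative VCF (hypothetical data, not from the abstract).
DEMO_VCF = (
    "##fileformat=VCFv4.2\n"
    "##source=demo\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\trs1\tA\tG\t50\tPASS\tGENE=BRCA1\n"
    "1\t200\trs2\tC\tT\t40\tPASS\tGENE=TP53\n"
    "2\t300\trs3\tG\tA\t60\tPASS\tGENE=BRCA1\n"
)


def count_meta_lines(handle):
    """Count the leading '##' meta-information lines, then rewind the handle."""
    n_meta = 0
    for line in handle:
        if not line.startswith("##"):
            break
        n_meta += 1
    handle.seek(0)
    return n_meta


def filter_vcf_by_keyword(handle, keyword, chunksize=10_000):
    """Yield DataFrames holding only the rows that contain `keyword`.

    The file is read `chunksize` rows at a time, so memory use is bounded
    by the chunk size rather than by the size of the whole file.
    """
    n_meta = count_meta_lines(handle)
    reader = pd.read_csv(
        handle,
        sep="\t",
        skiprows=n_meta,        # skip '##' lines; '#CHROM ...' becomes the header
        dtype=str,              # keep every field as text for substring search
        keep_default_na=False,  # do not turn empty fields into NaN
        chunksize=chunksize,
    )
    for chunk in reader:
        # Plain substring match (no regex) in every column, combined per row.
        mask = chunk.apply(
            lambda col: col.str.contains(keyword, regex=False, na=False)
        ).any(axis=1)
        yield chunk[mask]
```

A possible use is `pd.concat(filter_vcf_by_keyword(open("input.vcf"), "BRCA1"))`, where input.vcf is a placeholder path. For very large result sets, each yielded chunk could instead be appended to an output file so that the matches never need to fit in memory either.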