A USER-FRIENDLY TOOL FOR SIMPLIFIED GENOMICS DATA MINING FROM LARGE VCF FILES

dc.contributor.author: Karabayev, Daniyar
dc.contributor.author: Molkenov, Askhat
dc.contributor.author: Yerulanuly, Kaiyrgali
dc.contributor.author: Daniyarov, Asset
dc.contributor.author: Sharip, Aigul
dc.contributor.author: Seisenova, Ainur
dc.contributor.author: Zhumadilov, Zhaxybay
dc.contributor.author: Kairov, Ulykbek
dc.date.accessioned: 2020-11-25T08:41:40Z
dc.date.available: 2020-11-25T08:41:40Z
dc.date.issued: 2020
dc.description.abstract:
Introduction: High-throughput sequencing platforms generate massive, high-dimensional genomic datasets. Modern, user-friendly bioinformatics tools for analyzing and interpreting genomics data have therefore become essential. Variant Call Format (VCF) is a standard format for storing the genomic variants of sequenced samples. Existing tools for processing VCF files usually lack an intuitive graphical interface and offer only a command-line interface, which can be challenging for the broader biomedical community interested in genomics data analysis. We present re-Searcher, a new bioinformatics application with a user-friendly GUI developed to simplify genomic data mining from VCF files.
Methods: re-Searcher was written in Python 3. It uses the Pandas library to handle large VCF files by processing them in chunks rather than loading the whole file into RAM. A simple and intuitive GUI was built with the Tkinter library.
Results: The generalized re-Searcher workflow consists of several steps: selecting an input file, setting the necessary filtering parameters, processing the data, and exporting a filtered output VCF file. re-Searcher browses and opens VCF files with the extension .txt or .vcf and then offers the following filtering and extraction options: header extraction, keyword search, sample extraction, and genotype format conversion.
Conclusion: Exploring and analyzing the VCF files produced by bioinformatics processing of sequencing data is an important step in the analysis and meta-analysis of genotype/phenotype associations. We have developed and introduced an easy-to-use bioinformatics tool, re-Searcher, which offers several unique features for mining large VCF files and a simple graphical user interface that makes it accessible to clinicians and researchers without computational skills. The software is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher). [en_US]
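The chunked processing and the header extraction, keyword search, and sample extraction steps described in the abstract can be illustrated with a minimal sketch. This is not the re-Searcher source code: it uses only the Python standard library instead of Pandas, and the function names (`read_header`, `iter_chunks`, `filter_vcf`), the chunk size, and the demo records are all hypothetical. It assumes a tab-separated VCF with the nine standard fixed columns followed by one column per sample.

```python
import io
from itertools import islice

FIXED_COLUMNS = 9  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT


def read_header(handle):
    """Consume the header; return ('##' meta lines, column names)."""
    meta = []
    for line in handle:
        if line.startswith("##"):
            meta.append(line.rstrip("\n"))
        elif line.startswith("#CHROM"):
            return meta, line[1:].rstrip("\n").split("\t")
    raise ValueError("no #CHROM column line found")


def iter_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size data lines, never the whole file."""
    while True:
        chunk = list(islice(handle, chunk_size))
        if not chunk:
            return
        yield chunk


def filter_vcf(handle, keyword, samples, chunk_size=1000):
    """Yield the column row, then every record containing keyword,
    reduced to the fixed columns plus the requested sample columns."""
    _, columns = read_header(handle)
    keep = list(range(FIXED_COLUMNS)) + [
        i for i, name in enumerate(columns)
        if i >= FIXED_COLUMNS and name in samples
    ]
    yield [columns[i] for i in keep]
    for chunk in iter_chunks(handle, chunk_size):
        for line in chunk:
            if keyword in line:  # plain substring match on the raw record
                fields = line.rstrip("\n").split("\t")
                yield [fields[i] for i in keep]


# Hypothetical three-record VCF used only for demonstration.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t100\trs1\tA\tG\t50\tPASS\tGENE=BRCA1\tGT\t0/1\t1/1\n"
    "1\t200\trs2\tC\tT\t60\tPASS\tGENE=TP53\tGT\t0/0\t0/1\n"
    "2\t300\trs3\tG\tA\t70\tPASS\tGENE=BRCA1\tGT\t1/1\t0/0\n"
)
rows = list(filter_vcf(demo, keyword="BRCA1", samples={"S2"}, chunk_size=2))
for row in rows:
    print("\t".join(row))
```

Because only one chunk of lines is held in memory at a time, peak memory depends on `chunk_size` rather than on file size. With Pandas, the same pattern would typically rely on `pandas.read_csv(..., sep="\t", chunksize=...)`, which returns an iterator of DataFrames; whether re-Searcher uses exactly that call is an assumption.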
dc.identifier.uri: http://nur.nu.edu.kz/handle/123456789/5129
dc.language.iso: en [en_US]
dc.publisher: International conference "MODERN PERSPECTIVES FOR BIOMEDICAL SCIENCES: FROM BENCH TO BEDSIDE"; National Laboratory Astana [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/ [*]
dc.subject: Research Subject Categories::MEDICINE [en_US]
dc.subject: VCF [en_US]
dc.subject: Variant Call Format [en_US]
dc.title: A USER-FRIENDLY TOOL FOR SIMPLIFIED GENOMICS DATA MINING FROM LARGE VCF FILES [en_US]
dc.type: Abstract [en_US]
workflow.import.source: science

Files

Original bundle
Name: Abstract 34.pdf
Size: 215.29 KB
Format: Adobe Portable Document Format
Description: Abstract

License bundle
Name: license.txt
Size: 6.28 KB
Format: Item-specific license agreed upon to submission