Learning the pattern-based CRF for prediction of a protein local structure

dc.contributor.author: Mukanov Zhalgas
dc.contributor.author: Takhanov Rustem
dc.date.accessioned: 2025-08-27T04:57:01Z
dc.date.available: 2025-08-27T04:57:01Z
dc.date.issued: 2022-09-05
dc.description.abstract: Prediction of protein conformation from its amino acid sequence is widely acknowledged as one of the most important problems in computational biology and is considered a rich source of challenging problem formulations for machine learning. In this field, methods of supervised learning coexist alongside statistical physics and information theory. According to the classical results of Anfinsen, the conformational structure of a protein is fully determined by its primary structure (i.e., its amino acid sequence), and the energy landscape theory states that the native state of a protein corresponds to the minimum of its free energy [2]. There are two dominant approaches to protein structure prediction. The first is based on minimizing physics-based free energies with certain unknown parameters, while the second is a knowledge-based approach that does not necessarily rely on the notion of free energy but aims to achieve high prediction accuracy [14]. In comparison with these two approaches, there is a lack of intermediate methods where the goal is to find knowledge-based parameterizations of free energy that approximate real free energy for certain protein families, while maintaining predictive accuracy comparable to purely knowledge-based models. According to M. Gromov, if energy landscape theory holds, then “probably, free energy can be encoded with a reasonable accuracy by something like 10⁴–10⁶ bits of information,” and the main mathematical difficulty lies in the absence of “general mathematical parameter-fitting methods which, when applied to proteins, could provide (an effective version of) the total inter-residue interaction energies” [10]. In this paper, we introduce a probabilistic model based on a specific parameterization of free energy, which we expect to be useful both for predicting protein dihedral angles and for investigating the structure of the energy landscape. This model is founded on the idea that free energy is largely determined by pairwise interactions between amino acids that are close to each other on the protein sequence. Although this approach may not fully capture the complexity of general proteins, we expect it to approximate the energy landscape of all-alpha proteins with reasonable accuracy. [en]
dc.identifier.citation: Mukanov Zhalgas; Takhanov Rustem. (2022). Learning the pattern-based CRF for prediction of a protein local structure. Informatica. https://doi.org/10.31449/inf.v46i6.3787 [en]
dc.identifier.doi: 10.31449/inf.v46i6.3787
dc.identifier.uri: https://doi.org/10.31449/inf.v46i6.3787
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/10466
dc.language.iso: en
dc.publisher: Slovenian Association Informatika
dc.source: (2022) [en]
dc.subject: pattern-based CRFs, sequence labeling, protein conformation prediction, energy landscape, structural SVM [en]
dc.title: Learning the pattern-based CRF for prediction of a protein local structure [en]
dc.type: article [en]

Files

Original bundle

Name: 10.31449_inf.v46i6.3787.pdf
Size: 330.44 KB
Format: Adobe Portable Document Format
