Avinash Yaganapu, Sai Phani Parsa, and Mingon Kang (all Computer Science) published a paper, "Prediction of bacterial Cytochrome P450-compound interactions based on positive-unlabeled deep learning," in Bioinformatics.
Many biological interaction datasets (e.g., protein-compound interactions) suffer from a fundamental problem: we often observe only positive examples. Reliable negative samples are rarely available, which makes it difficult to train conventional machine learning models. In our new work, we address this challenge by developing BIN-PU, a novel positive–unlabeled learning framework for predicting bacterial protein–compound interactions. Instead of requiring negative samples, our approach generates reliable pseudo-labels and allows deep learning models to learn effectively from positive-only datasets. On bacterial cytochrome P450 datasets, the framework shows substantial improvements over existing approaches and generalizes well across datasets.
More broadly, this work highlights how AI methods can unlock biological insights even from incomplete datasets, which are common in many areas of computational biology.
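For readers unfamiliar with positive-unlabeled learning, the general idea of training from positives plus pseudo-labels can be illustrated with a small toy sketch. This is not BIN-PU and does not reproduce the paper's method: it is a generic two-step heuristic (pick the unlabeled points farthest from the positives as pseudo-negatives, then fit a simple logistic classifier) run on synthetic 2-D points. All names, such as `pick_pseudo_negatives` and `train_logistic`, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def centroid(points):
    """Mean vector of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_pseudo_negatives(positives, unlabeled, fraction=0.3):
    """Step 1 (illustrative heuristic): the unlabeled points farthest from
    the positive centroid are treated as pseudo-negatives."""
    center = centroid(positives)
    ranked = sorted(unlabeled, key=lambda p: distance(p, center), reverse=True)
    count = max(1, int(len(ranked) * fraction))
    return ranked[:count]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, epochs=200, lr=0.1):
    """Step 2: fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Predicted probability that x is a positive."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def make_cluster(rng, center, n, spread=0.5):
    """Synthetic Gaussian cluster around `center` (toy data, not real assays)."""
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]


rng = random.Random(0)
positives = make_cluster(rng, (2.0, 2.0), 40)     # labeled positives
hidden_pos = make_cluster(rng, (2.0, 2.0), 20)    # positives hidden among the unlabeled
hidden_neg = make_cluster(rng, (-2.0, -2.0), 60)  # true negatives, never labeled
unlabeled = hidden_pos + hidden_neg

# No true negative labels are used anywhere below.
pseudo_neg = pick_pseudo_negatives(positives, unlabeled, fraction=0.3)
samples = positives + pseudo_neg
labels = [1] * len(positives) + [0] * len(pseudo_neg)
weights, bias = train_logistic(samples, labels)

score_pos = predict(weights, bias, [2.0, 2.0])
score_neg = predict(weights, bias, [-2.0, -2.0])
print(f"score near positives: {score_pos:.3f}, score far from positives: {score_neg:.3f}")
```

The toy clusters are deliberately well separated, so the distance heuristic is enough here. Real interaction data is far harder, which is why selecting reliable pseudo-labels is the central problem a framework like the one described above has to solve.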