349: Predicting Protein Gel Hardness Using Machine Learning and Deep Learning Models

Monday, July 14, 2025 10:00 AM to Wednesday, July 16, 2025 3:00 PM · 2 days 5 hr. (America/Chicago)

Exhibit Hall A - Posters

Expo Only · Total Access Registration

Information

Introduction

Protein gelation properties play a crucial role in the development of protein-rich products and plant-based meat alternatives. Accurate prediction of these properties could significantly shorten the time needed for ingredient development and selection. In collaboration with AI Bobby SAS (a pioneer in the use of AI models to optimize protein ingredient development), we present a generative AI-based approach that compares different machine learning algorithms with gradient tree-based algorithms for the prediction of gel hardness (GH).

Methods

A preliminary dataset of over 1000 data points was compiled by manual extraction from published literature. It included data on GH, protein type, protein concentration, pH, salt and its ionic strength, pre-treatments, gelation conditions, etc. Protein gels were categorized as soft, firm or rigid based on GH. The data were then used for predictive modeling with various machine learning algorithms, such as Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), LightGBM, Decision Tree and Random Forest. For each model, classification was performed, the confusion matrix was evaluated, and feature importance was obtained.
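The labeling and evaluation steps can be illustrated with a minimal Python sketch. The abstract does not report the GH cut-offs used to define soft, firm and rigid gels, so the thresholds below (`SOFT_MAX_N`, `FIRM_MAX_N`), the helper names and all sample values are hypothetical and serve only to show the mechanics of binning GH and tabulating a confusion matrix.

```python
from collections import Counter

# Hypothetical cut-offs (in newtons); the abstract does not state the real ones.
SOFT_MAX_N = 1.0
FIRM_MAX_N = 5.0
CLASSES = ("soft", "firm", "rigid")


def categorize_gel(hardness_n):
    """Map a gel hardness value to one of the three gel classes."""
    if hardness_n < SOFT_MAX_N:
        return "soft"
    if hardness_n < FIRM_MAX_N:
        return "firm"
    return "rigid"


def confusion_matrix(y_true, y_pred, labels=CLASSES):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up hardness values and made-up model predictions, for illustration only.
hardness_values = [0.3, 0.8, 2.5, 4.9, 7.2, 12.0]
y_true = [categorize_gel(h) for h in hardness_values]
y_pred = ["soft", "soft", "firm", "rigid", "rigid", "rigid"]

matrix = confusion_matrix(y_true, y_pred)
for label, row in zip(CLASSES, matrix):
    print(f"{label:>5}: {row}")
```

In this toy example one firm gel is misclassified as rigid, which appears as an off-diagonal count in the "firm" row of the matrix.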

Results

The traditional machine learning models showed lower performance (F1 scores of 0.942 and 0.945) than the tree-based models (F1 scores of 0.995 and 0.991). In addition, a Voting Classifier model was trained, which combines key features of the higher-performing models. The Voting Classifier showed an F1 score of 0.955 and was selected as the optimal choice due to its better strength, higher accuracy, robust performance and excellent scalability. The classification accuracy of all four models was compared. The XGBoost model showed 99.5% accuracy for soft gel, 60.9% for firm gel and 100% for rigid gel, whereas the Voting Classifier showed 98.9% accuracy for soft gel, 65.0% for firm gel and 100.0% for rigid gel. The Voting Classifier performed the best in predicting the gelation properties of proteins.
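As a rough illustration of how a voting ensemble and the reported metrics fit together, the sketch below combines three sets of predictions by majority vote and then computes per-class accuracy and an F1 score. Several points are assumptions, since the abstract does not specify them: hard (majority) voting rather than soft voting, per-class accuracy interpreted as the fraction of true members of a class that are predicted correctly, and macro averaging of F1. The model names and all predictions are made up.

```python
from collections import Counter

CLASSES = ("soft", "firm", "rigid")


def hard_vote(predictions_per_model):
    """Majority vote per sample; ties go to the earliest model's label."""
    voted = []
    for sample_preds in zip(*predictions_per_model):
        counts = Counter(sample_preds)
        top = max(counts.values())
        voted.append(next(p for p in sample_preds if counts[p] == top))
    return voted


def per_class_accuracy(y_true, y_pred, labels=CLASSES):
    """Fraction of true members of each class that were predicted correctly."""
    result = {}
    for label in labels:
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        result[label] = hits / total if total else 0.0
    return result


def macro_f1(y_true, y_pred, labels=CLASSES):
    """Unweighted mean of per-class F1 scores, using F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels and predictions from three hypothetical models.
y_true = ["soft", "soft", "firm", "firm", "rigid", "rigid"]
model_a = ["soft", "soft", "firm", "rigid", "rigid", "rigid"]
model_b = ["soft", "firm", "firm", "firm", "rigid", "rigid"]
model_c = ["soft", "soft", "rigid", "rigid", "rigid", "rigid"]

voted = hard_vote([model_a, model_b, model_c])
accuracy_by_class = per_class_accuracy(y_true, voted)
f1 = macro_f1(y_true, voted)
print(voted, accuracy_by_class, round(f1, 3))
```

In this toy data the firm class is the one the ensemble gets wrong most often, so the per-class figures show more than a single aggregate score would.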

Significance

This project lays the foundation and provides a proof of concept for developing predictive models for various protein functionalities. The predictive power of such models will help design and develop high-quality protein ingredients much more efficiently.

Authors: Meryem El Alaoui Hassani, Harrison Ndiba, Ali Raza, Mozhgan Esmaeelian, Rabiul Alam Roni, David A. Hecht, Dominik Grabinski, Jing Zhao

Short Description

The project developed and compared various deep learning and machine learning models for the prediction of protein gel strength, as a proof of concept for the prediction of various protein functionalities. The voting classifier model performed the best in predicting the protein gel strength and showed promising results.

Track

Protein