Estimating Phenylalanine of Commercial Foods: A Comparison Between a Mathematical Approach and a Machine Learning Approach

Amruthavarshini Talikoti, Purdue University

Abstract

Phenylketonuria (PKU) is an inherited metabolic disorder affecting 1 in every 10,000 to 15,000 newborns in the United States. Caused by a genetic mutation, PKU results in an excessive buildup of the amino acid phenylalanine (Phe) in the body, leading to symptoms including, but not limited to, intellectual disability, hyperactivity, psychiatric disorders and seizures. Most PKU patients must follow a strict diet limited in Phe. The aim of this research study is to formulate, implement and compare techniques for Phe estimation in commercial foods using the information on the food label (Nutrition Facts label and ordered ingredient list). Ideally, the techniques should be both accurate and amenable to a user-friendly implementation as a Phe calculator that would help PKU patients monitor their dietary Phe intake. The first approach is a mathematical one that comprises three steps. The three steps were separately proposed as methods by Jieun Kim in her dissertation. It was assumed that the third method, which is the most computationally expensive, was also the most accurate. However, by performing the three methods one after another as three steps and combining the results, we obtained better results than by using the third method alone. The first step makes use of the protein content of the food and Phe:protein multipliers. The second step enumerates all the ingredients in the food and uses the minimum and maximum Phe:protein multipliers of the ingredients along with the protein content. The third step uses the fact that the ingredients are listed in decreasing order of weight, which gives rise to inequality constraints. These constraints hold assuming that there is no loss in the preparation process. The problem under these inequality constraints is solved numerically in two phases. The first phase estimates the nutrient content by approximating the ingredient amounts. The second phase refines these estimates using the Simplex algorithm.
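To make the second step concrete, the following is a minimal Python sketch of one way such a bound could be computed; it is not taken from the thesis. It assumes that all of the protein on the label comes from the listed ingredients, so the Phe of the food lies between the protein content multiplied by the smallest ingredient multiplier and the protein content multiplied by the largest. The function name `phe_bounds_mg` and the multiplier values in the usage line are hypothetical and chosen only for illustration.

```python
def phe_bounds_mg(protein_g, multipliers_mg_per_g):
    """Bound the Phe content (mg) of a food from its protein content (g).

    Assumption (not from the thesis): every gram of protein comes from one
    of the listed ingredients, so the food's overall Phe:protein ratio is a
    weighted average of the ingredient multipliers and therefore lies
    between their minimum and maximum.
    """
    if protein_g < 0:
        raise ValueError("protein content must be non-negative")
    if not multipliers_mg_per_g:
        raise ValueError("at least one ingredient multiplier is required")
    lower = protein_g * min(multipliers_mg_per_g)
    upper = protein_g * max(multipliers_mg_per_g)
    return lower, upper


# Hypothetical multipliers (mg Phe per g protein) for three ingredients.
print(phe_bounds_mg(2.0, [30.0, 50.0, 40.0]))
```

The bound is only as tight as the spread of the ingredient multipliers, which is one reason a range from this step may still benefit from being combined with ranges from other steps.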
The final Phe range is obtained by intersecting the intervals produced by the three steps. We implemented all three steps as web applications. Our proposed three-step method yields a high accuracy of Phe estimation (error ≤ ±13.04 mg Phe per serving for 90% of foods). The above mathematical procedure is contrasted with a machine learning approach that uses an existing database as training data to infer the Phe in any given food. Specifically, we use the K-Nearest Neighbors (K-NN) classification method with a feature vector containing the (rounded) nutrient data. In other words, the Phe content of the test food is a weighted average of the Phe values of the neighbors closest to it, with the nutrient values as attributes. A four-fold cross-validation is carried out to determine the hyper-parameters, and the training is performed using the United States Department of Agriculture (USDA) food nutrient database. Our tests indicate that this approach is not very accurate for general foods (error ≤ ±50 mg Phe per 100 g in about 38% of the foods tested). However, for low-protein foods, which are typically consumed by PKU patients, the accuracy increases significantly (error ≤ ±50 mg Phe per 100 g in over 77% of foods).
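As an illustration of the weighted-average idea behind the K-NN estimate, here is a minimal Python sketch under stated assumptions; it is not the implementation used in the thesis. The abstract does not specify the distance metric or the weights, so Euclidean distance and inverse-distance weights are assumed here. The function name `knn_phe_estimate`, the parameter `eps`, and the toy training rows are hypothetical.

```python
import math


def knn_phe_estimate(query, training, k=3, eps=1e-9):
    """Estimate Phe as a weighted average over the k nearest training foods.

    query    -- tuple of nutrient values for the food to estimate
    training -- list of (nutrient_tuple, phe_value) pairs
    Assumptions (not from the thesis): Euclidean distance between nutrient
    vectors and inverse-distance weights; eps avoids division by zero when
    the query coincides with a training food.
    """
    if k < 1 or not training:
        raise ValueError("need k >= 1 and a non-empty training set")
    # Sort training foods by distance to the query and keep the k closest.
    by_distance = sorted(
        ((math.dist(query, features), phe) for features, phe in training),
        key=lambda pair: pair[0],
    )
    nearest = by_distance[:k]
    weights = [1.0 / (distance + eps) for distance, _ in nearest]
    weighted_sum = sum(w * phe for w, (_, phe) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Toy data: (protein g, fat g) per 100 g -> Phe mg per 100 g, made up.
toy_training = [((0.0, 0.0), 100.0), ((2.0, 0.0), 200.0), ((10.0, 0.0), 900.0)]
print(knn_phe_estimate((1.0, 0.0), toy_training, k=2))
```

In practice the value of `k` and the choice of weights would be the kind of hyper-parameters selected by cross-validation, and the nutrient features would usually need to be put on comparable scales before distances are computed.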

Degree

M.Sc.

Advisors

Boutin, Purdue University.

Subject Area

Artificial intelligence|Disability studies|Mental health|Nutrition
