Start Date

March 7, 2024, 10:30 AM

Abstract

The genetic perturbations that spaceflight causes in biological systems tend to have system-wide effects that are often difficult to deconvolute into individual signals with specific points of origin. Single-cell multi-omic data can profile these perturbational effects but do not necessarily indicate the initial point of interference within the network. The objective of this project is to use large-scale, genome-wide perturbational datasets to train and tune a machine learning (ML) model capable of predicting the effects of unseen perturbations in new data. Perturb-Seq datasets are large libraries of single-cell multi-omic data collected from extensive knockout models in cultured cell lines, typically CRISPR-driven. The advantage of such synthetically generated perturbational datasets is that they provide a systematic analogue for labeled training data. The latest generation of generative ML algorithms, particularly transformers, makes this an ideal time to reassess large-scale data libraries in order to capture cell-wide and even organism-wide gene expression motifs. By tailoring an algorithm to target features centered on the catalogued perturbations, we intend to create a model capable of predicting the combined effects of multiple perturbations, locating the points of origin of perturbations in new datasets, predicting the effects of known perturbations in new datasets, and annotating large-scale network motifs. Experimentally determined perturbational behaviors can also be used to further tune the model's predictive capabilities. The NASA spaceflight multi-omic datasets therefore provide an ideal testbed for bolstering previous findings and confirming new ones.
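The abstract does not specify the model architecture, so the sketch below is only a hedged illustration of two of the stated goals using a deliberately simple, non-learned baseline rather than the transformer approach the project proposes. It assumes a cells-by-genes expression matrix with one perturbation label per cell (unperturbed cells labeled `"control"`), summarizes each knockout as its mean expression shift relative to control, predicts a combination of knockouts by adding their shifts, and ranks candidate points of origin for a new observed shift by cosine similarity to the catalogued signatures. All function names, the label convention, and the data are hypothetical; the data are synthetic and generated with NumPy.

```python
import numpy as np


def perturbation_signatures(expr, labels, control="control"):
    """Mean expression shift of each perturbation relative to control cells.

    expr:   (cells, genes) array of normalized expression values (assumed).
    labels: per-cell perturbation names; unperturbed cells carry `control`.
    """
    labels = np.asarray(labels)
    baseline = expr[labels == control].mean(axis=0)
    return {
        str(name): expr[labels == name].mean(axis=0) - baseline
        for name in np.unique(labels)
        if name != control
    }


def additive_combination(signatures, names):
    """Naive baseline: predict a multi-knockout shift as the sum of single shifts."""
    return np.sum([signatures[n] for n in names], axis=0)


def rank_origins(observed_shift, signatures):
    """Rank catalogued perturbations by cosine similarity to an observed shift."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    scores = {name: cosine(observed_shift, sig) for name, sig in signatures.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# --- Usage on synthetic data (hypothetical, for illustration only) ---
rng = np.random.default_rng(0)
n_genes, cells_per_group = 50, 200
# One random "true" shift vector per hypothetical knockout.
true_effects = {f"KO_{i}": rng.normal(0.0, 1.0, n_genes) for i in range(5)}

blocks, labels = [], []
for name in ["control", *true_effects]:
    shift = true_effects.get(name, np.zeros(n_genes))
    blocks.append(rng.normal(0.0, 0.5, (cells_per_group, n_genes)) + shift)
    labels += [name] * cells_per_group
expr = np.vstack(blocks)

sigs = perturbation_signatures(expr, labels)
# A new observed shift whose hidden point of origin is KO_3.
observed = true_effects["KO_3"] + rng.normal(0.0, 0.3, n_genes)
print(rank_origins(observed, sigs)[:3])
print(additive_combination(sigs, ["KO_0", "KO_1"])[:5])
```

The additive baseline ignores genetic interactions between knockouts, which is precisely the behavior a learned model would need to capture; a baseline like this is mainly useful as a reference point when judging whether a trained model adds predictive value.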


A Machine Learning Model of Perturb-Seq Data for Use in Space Flight Gene Expression Profile Analysis