Senior Honors Thesis Presentation: "Dataset Evaluation for Data Trading Using Expected Loss and Homomorphic Encryption"

Speaker: Michael Joo, Washington University in St. Louis

Abstract: Supervised machine learning suffers from the "garbage in, garbage out" phenomenon: the performance of a model is limited by the quality of its data. Although vast amounts of data are collected every second, there is no general, rigorous method for evaluating the quality of a given dataset. This hinders the fair pricing of data when a buyer seeks to purchase data for use in machine learning. In this work, I propose using the expected loss corresponding to a dataset as a measure of its quality, relying on Bayesian methods for uncertainty quantification. Furthermore, I present a secure multi-party computation protocol based on homomorphic encryption, assuming semi-honest parties, that allows the buyer and the seller to compute the expected loss without compromising the data. With experimental results, I show the promise of this approach as well as its current limitations in real-world feasibility.

Host: Netanel Raviv