Sayan Shaw CS 410 Project Video
From Sayan Shaw
Data scientists who build machine learning models must identify and assemble the best datasets for training, given their specific use case, and this is very costly in research and development time. To address this, I propose building a tool that uses LLMs themselves to determine the best datasets for training a model, collect them, and store them, making the data easy to use and speeding up the data science and engineering work required to build these models.
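
The determine, collect, and store flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual implementation: the names `DatasetCandidate`, `recommend_datasets`, `collect_and_store`, `fake_llm`, and `fake_fetch` are hypothetical, the LLM is assumed to reply with a JSON array, and both the LLM and the downloader are passed in as plain callables so that no particular provider or API is implied.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class DatasetCandidate:
    name: str    # short identifier, also used as the stored file name
    source: str  # where the dataset can be fetched from
    reason: str  # the LLM's justification for recommending it


def recommend_datasets(use_case: str, ask_llm: Callable[[str], str]) -> List[DatasetCandidate]:
    """Ask an LLM (any callable mapping prompt text to reply text) for datasets."""
    prompt = (
        "List datasets suitable for training a model for this use case. "
        "Reply with a JSON array of objects with keys name, source, reason.\n"
        f"Use case: {use_case}"
    )
    # Assumption: the reply is valid JSON in the requested shape.
    return [DatasetCandidate(**item) for item in json.loads(ask_llm(prompt))]


def collect_and_store(
    candidates: List[DatasetCandidate],
    fetch: Callable[[str], bytes],
    out_dir: Path,
) -> List[Path]:
    """Fetch each recommended dataset and write it under out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for candidate in candidates:
        target = out_dir / f"{candidate.name}.bin"
        target.write_bytes(fetch(candidate.source))
        saved.append(target)
    return saved


# Hypothetical stand-ins so the sketch runs offline; a real tool would
# call an actual LLM and an actual downloader here.
def fake_llm(prompt: str) -> str:
    return json.dumps([
        {
            "name": "example-reviews",
            "source": "placeholder://example-reviews",
            "reason": "Matches the sentiment use case.",
        }
    ])


def fake_fetch(source: str) -> bytes:
    return f"contents of {source}".encode()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        picks = recommend_datasets("sentiment analysis of product reviews", fake_llm)
        for path in collect_and_store(picks, fake_fetch, Path(tmp)):
            print(path.name)
```

Keeping the LLM and the fetcher as injected callables means the recommendation logic can be tested without network access and swapped between providers later.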