Computer vision: Identifying Manufactured Home Parks in North Carolina
I identified 8,460 mobile home parks in North Carolina by applying computer vision to 15 cm resolution aerial imagery. A poster summarizing the details is available here. I led the project from start to finish.
Large Language Models: Curating LLM fine-tuning data for high-fidelity domain adaptation
We created a post-training dataset from the FineWeb dataset for high-fidelity domain adaptation of an open-weight LLM (Google Flan). Parameter-efficient fine-tuning through prompt tuning substantially improved perplexity scores and showed that the tuned model could generalize from information in the tuning dataset.
Slides are available here.
Data dashboard: CarolinaTracker
This project tracked North Carolina's post-pandemic economic recovery in 2020-2021 using a range of economic and social datasets, and communicated the recovery through data stories and a data dashboard. My role included co-designing the data pipeline and schema, web scraping, and data storytelling.