Projects

Lexically Constrained Decoding of Transformers (2025.02-2025.03) [paper] [code] [slides]

In a class project, we adapted the constrained decoding algorithm Grid Beam Search (GBS) to impose lexical constraints on transformers. GBS was originally applied to stateful neural machine translation models for seq2seq translation. We developed a new pipeline that generates structured outputs from a given prompt and a set of lexical constraints, and that supports any pretrained autoregressive language model; we chose GPT-2. We then fine-tuned GPT-2 on a corpus of Chekhov’s stories. Our subjective analysis showed that GBS with the fine-tuned GPT-2 gave more interesting and meaningful domain-specific results than GBS with the base GPT-2.

Synthesizing Composite Hierarchical Structure from Symbolic Music Corpora (2024.03-2025.01) [paper] [code] [slides]

I introduced a unified, hierarchical meta-representation of sequence data structure called the structural temporal graph (STG), a k-partite directed acyclic graph, and applied it to symbolic music. Then, I used simulated annealing to develop a measure of structural distance between two music pieces rooted in graph isomorphism. Finally, I combined the formal guarantees of SMT solvers with nested simulated annealing over structural distances to frame and solve the dually NP-hard combinatorial optimization problem of music structure summarization as an extension of the Generalized Median Graph problem.

The Impact of GitHub Copilot on Test-First Development (2024.09-2024.12) [paper] [code]

In a class project, we examined the impact of GitHub Copilot on Test-First Development, a software development paradigm where tests are written before implementation. We conducted a between-subjects (no Copilot vs Copilot) pilot study, consisting of coding tasks and pre- and post-task surveys. Participants iterated between writing a comprehensive API and test suite for a problem, with or without Copilot. We found that while Copilot enhanced coding speed, it resulted in superficial problem comprehension and decreased scope of the test suites.

pgen-rs: LLM-Aided Genomic Data Wrangling (2024.03-2024.05) [paper] [code] [slides]

To perform genomic data analyses, bioinformaticians are frequently forced to do cumbersome pre-processing of their data. To address this challenge, in a class project, we developed pgen-rs, a tool that lets end-users write their data wrangling requirements in natural language, enhanced by LLM suggestions. pgen-rs converts natural language to a DSL that is executed by our Rust-based high-performance genomic data processor enabled by the PLINK file format.

Continuous Enumeration for Just-In-Time Bottom-Up Synthesis (2024.01-2024.03) [paper] [code]

In a class project, we developed ProCon, a tool implementing continuous, rule-based enumeration for just-in-time bottom-up search in SyGuS problems. Programs are enumerated in order of continuous, unrounded weights determined by a probabilistic weighting function. ProCon thus leverages the full power of the probabilistic model, since it does not round the probabilities to discrete weights as size-based enumeration does.
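The effect of rounding can be illustrated with a minimal sketch. It assumes that a rule's weight is its negative log-probability and that a program's weight is the sum of the weights of its rules; the rule names, the probabilities, and both helper functions are hypothetical and are not taken from ProCon.

```python
import math

# Hypothetical rule probabilities for a tiny arithmetic grammar (illustrative only).
RULE_PROBS = {"x": 0.4, "1": 0.3, "add": 0.2, "mul": 0.1}


def continuous_cost(program):
    """Sum of unrounded negative log-probabilities of the rules used."""
    return sum(-math.log2(RULE_PROBS[rule]) for rule in program)


def rounded_cost(program):
    """Same sum, but each rule weight is rounded to an integer first."""
    return sum(round(-math.log2(RULE_PROBS[rule])) for rule in program)


# Two programs written as the sequence of rules in their derivations.
add_x_1 = ("add", "x", "1")  # probability 0.2 * 0.4 * 0.3 = 0.024
mul_x_x = ("mul", "x", "x")  # probability 0.1 * 0.4 * 0.4 = 0.016

for prog in (add_x_1, mul_x_x):
    print(prog, rounded_cost(prog), round(continuous_cost(prog), 3))
```

In this toy setup the two programs receive the same rounded weight, so an enumerator that uses discrete weights cannot order them, while the continuous weights place the more probable program first.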

MusAssist: Domain Specific Language for Music (2022.01-2022.05) [paper] [thesis] [demo] [code]

Under the mentorship of Dr. Ben Wiedermann at Harvey Mudd College, I created MusAssist, a DSL for music notation that bridges the abstraction gap between music theory and composition. MusAssist gives end-users the novel ability to describe musical templates, such as cadences, at precisely the same abstraction level as the music-theoretic structures they represent, which models what a composer would organically conceive when composing by hand. I designed MusAssist’s syntax using a participatory design approach, and then wrote a Haskell-based compiler that translates MusAssist to MusicXML, a lower-level language accepted by most music notation software, for further manual editing.

Markov Chains for Computer Music Generation (2020.09-2020.12) [paper] [code]

Under the mentorship of Dr. Mark Huber at Claremont McKenna College, I created a novel system of Markov chains using inverse transform sampling to rapidly generate musical sketches, giving end-users the ability to create custom-length music in the style of a desired piece.
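As a rough illustration of the idea, the sketch below trains a first-order Markov chain on a sequence of note names and draws each next note by inverse transform sampling: a uniform random number is looked up in the cumulative distribution of the current note's transitions. The note representation, the function names, and the dead-end handling are assumptions made for this example and do not describe the original system.

```python
import bisect
import itertools
import random
from collections import Counter, defaultdict


def build_transitions(notes):
    """Map each note to (sorted successors, cumulative transition probabilities)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(notes, notes[1:]):
        counts[cur][nxt] += 1
    table = {}
    for cur, ctr in counts.items():
        symbols = sorted(ctr)
        total = sum(ctr.values())
        cdf = list(itertools.accumulate(ctr[s] / total for s in symbols))
        table[cur] = (symbols, cdf)
    return table


def sample_next(table, state, rng):
    """Inverse transform sampling: pick the first successor whose CDF value exceeds u."""
    symbols, cdf = table[state]
    u = rng.random()  # uniform in [0, 1)
    idx = bisect.bisect_right(cdf, u)
    # Clamp in case floating-point error leaves the last CDF value just below 1.
    return symbols[min(idx, len(symbols) - 1)]


def generate(table, start, length, rng):
    """Generate `length` notes; `start` must have at least one observed successor."""
    out = [start]
    while len(out) < length:
        state = out[-1]
        if state not in table:  # note with no observed successor: fall back to start
            state = start
        out.append(sample_next(table, state, rng))
    return out


melody = ["C", "D", "E", "C", "D", "G", "C"]
table = build_transitions(melody)
sketch = generate(table, "C", 16, random.Random(0))
print(sketch)
```

Because the output length is a parameter, the same trained table can produce a sketch of any requested length.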

Virtual Ensemble Assembly: Musicality in Separation (2020.08-2020.12) [paper] [code]

In an independent study during my undergrad, I assisted Dr. Christopher Raphael at Indiana University Bloomington on a project investigating audio algorithms for music score alignment. We developed a novel algorithm to synchronously assemble remotely recorded audio tracks without click tracks in order to address the need for remote media collaboration induced by the ongoing COVID pandemic.

DNA to Music (MIDI) Translation (2019.01-2019.05) [paper] [code]

In a class research project during my first year of undergrad, I created an original Python-based model to sonify genetic material by translating DNA to MIDI piano chords. By mapping nucleotides and codons to musical chords, my work introduced a novel and straightforward means of conceptualizing both the structure of a gene and the processes of biological splicing and translation that is accessible to users of all scientific backgrounds.
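A minimal sketch of this kind of mapping is shown below. The chord table is invented for illustration (each nucleotide is assigned an arbitrary triad, given as MIDI note numbers with middle C at 60) and is not the mapping used in the project; the function names are likewise hypothetical. The sketch only produces note numbers and does not write a MIDI file.

```python
# Hypothetical nucleotide-to-chord table (MIDI note numbers, middle C = 60).
NUCLEOTIDE_CHORDS = {
    "A": (57, 60, 64),  # A minor: A3, C4, E4
    "C": (60, 64, 67),  # C major: C4, E4, G4
    "G": (55, 59, 62),  # G major: G3, B3, D4
    "T": (62, 65, 69),  # D minor: D4, F4, A4
}


def codons(seq):
    """Split a DNA string into complete three-base codons, dropping any remainder."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def sonify(seq):
    """Return one list of three chords per codon, in reading order."""
    return [[NUCLEOTIDE_CHORDS[base] for base in codon] for codon in codons(seq)]


for codon, chords in zip(codons("ATGGCCA"), sonify("ATGGCCA")):
    print(codon, chords)
```

Grouping the chords by codon keeps the reading frame audible, which is one way a listener could follow the three-base structure that translation depends on.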

Ilana Shapiro