Code for the paper GRPO is Secretly a Process Reward Model. Implements λ-GRPO: a custom, PRM-sensitive GRPO variant informed by the theoretical results in the paper.
Code for the paper Procedural Environment Generation for Tool-Use Agents. RandomWorld is a pipeline for the procedural generation of interactive tools and compositional tool-use environments for RL (and SFT) fine-tuning.
Code for the paper Exploring Graph Representations of Logical Forms for Language Modeling (and also a large chunk of my dissertation). GFoLDS is a custom-built transformer architecture that takes graph representations of logical forms as input, which allows it to exceed the performance of comparably sized standard transformer models while using 6.5x less data.
Code for the experiments I ran in the paper It is not True that Transformers are Inductive Learners: Probing NLI Models with External Negation (which was also my Master's capstone project).
A Python implementation of the Modified Adsorption algorithm from Talukdar and Crammer (2009). Just for fun :)