Projects - Matt McManus

Empirical Scaling Harness

Scaling law research

Jan 2025

A controlled experimentation framework for training families of transformer models across multiple sizes and fitting scaling curves to quantify performance vs compute. It's designed to test hypotheses like 'SwiGLU changes scaling behavior vs GeLU' using parameter-matched, apples-to-apples runs and validation on held-out sizes. Net effect: evidence-driven architecture and budget decisions instead of vibes.
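As a rough illustration of the curve-fitting step, here is a minimal sketch that fits a pure power law, loss = a * compute^(-b), by least squares in log-log space and then checks the fit against a held-out compute budget. The function names and the synthetic data points are assumptions made for this example, not taken from the project, and a real fit would likely also model an irreducible-loss term.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b) in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(loss) = log(a) - b * log(compute)
    return math.exp(intercept), -slope


def predict(a, b, compute):
    """Loss predicted by the fitted power law at a given compute budget."""
    return a * compute ** (-b)


# Synthetic, noise-free points generated from a known law (hypothetical values,
# not measured results).
train_compute = [1e15, 1e16, 1e17, 1e18]
train_loss = [predict(20.0, 0.05, c) for c in train_compute]

a, b = fit_power_law(train_compute, train_loss)

# Validate on a compute budget that was not used for fitting.
held_out_compute = 1e19
held_out_loss = predict(20.0, 0.05, held_out_compute)
rel_err = abs(predict(a, b, held_out_compute) - held_out_loss) / held_out_loss
```

Comparing two architectures would then amount to fitting one curve per model family and comparing the fitted exponents and held-out errors.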

Source Code

Elastic Training Harness

Fault-tolerant distributed training

Jan 2025

A distributed PyTorch training harness built to survive node failures and cluster resizing without losing training progress or corrupting data order. It combines fast, layered checkpointing with deterministic data/token sharding so runs can resume cleanly even when the world size changes. Net effect: fewer dead runs, less wasted GPU spend, and faster recovery when hardware or networking flakes.
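One way to keep data order stable across a resize is to make the global sample order depend only on a seed and an epoch, and to have each rank take a slice of a fixed-size global batch. The sketch below illustrates that idea in plain Python; the function names, the fixed global batch size, and the contiguous per-rank split are assumptions for the example rather than the harness's actual design.

```python
import random


def global_order(num_samples, seed, epoch):
    """Shuffled sample order that depends only on (seed, epoch), not on world size."""
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


def rank_batch(order, step, global_batch, rank, world_size):
    """Samples that `rank` processes at `step` for a fixed global batch size."""
    if global_batch % world_size != 0:
        raise ValueError("global_batch must be divisible by world_size")
    start = step * global_batch
    batch = order[start:start + global_batch]
    per_rank = global_batch // world_size
    return batch[rank * per_rank:(rank + 1) * per_rank]


order = global_order(num_samples=64, seed=1234, epoch=0)

# Steps 0-1 run on 4 ranks; the job then shrinks to 2 ranks for steps 2-3.
consumed = []
for step, world_size in [(0, 4), (1, 4), (2, 2), (3, 2)]:
    for rank in range(world_size):
        consumed.extend(rank_batch(order, step, 8, rank, world_size))
```

Because only the step counter needs to be checkpointed, a resumed run with a different world size picks up at the same position in the same global order, with no skipped or repeated samples.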

Source Code

RLHF Canary

Regression detection for training pipelines

Jan 2025

An automated test suite for SFT/DPO/PPO-style training pipelines that flags performance slowdowns, instability (NaNs/divergence), and correctness issues when code changes. It's meant to run in CI with tiers (smoke/perf/nightly) so regressions get caught before they burn days of GPU time. Net effect: safer iteration speed and fewer 'we broke training' surprises.
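A check of this kind can be sketched as a function that compares one run's metrics against a stored baseline. The thresholds, parameter names, and failure labels below are hypothetical choices for illustration, not the project's actual configuration.

```python
import math


def check_run(losses, tokens_per_sec, baseline_tokens_per_sec,
              max_slowdown=0.10, divergence_factor=2.0):
    """Return a list of failure labels for one training run (empty means pass)."""
    failures = []

    # Instability: any NaN or infinite loss value.
    if any(not math.isfinite(x) for x in losses):
        failures.append("non-finite loss")
    # Divergence: final loss far above the best loss seen during the run.
    elif losses and losses[-1] > divergence_factor * min(losses):
        failures.append("loss divergence")

    # Performance: throughput more than max_slowdown below the baseline.
    if tokens_per_sec < (1.0 - max_slowdown) * baseline_tokens_per_sec:
        failures.append("throughput regression")

    return failures


healthy = check_run([2.0, 1.5, 1.2], tokens_per_sec=1000, baseline_tokens_per_sec=1000)
broken = check_run([2.0, 1.0, 5.0], tokens_per_sec=800, baseline_tokens_per_sec=1000)
```

A smoke tier could run this on a few dozen steps per commit, while perf and nightly tiers would use longer runs and tighter tolerances.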

Source Code

Torch-Velocity

Adaptive speculative decoding

Jan 2025

An inference engine that speeds up LLM generation by using a small draft model to propose multiple tokens and a larger target model to verify them in parallel, preserving the target model's output distribution via rejection sampling. It adapts the lookahead length based on real-time acceptance rates and entropy, and manages KV cache rollback efficiently. Net effect: higher throughput and lower serving cost without changing model weights.
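The verification step can be illustrated with the standard speculative-sampling acceptance rule: accept a drafted token x with probability min(1, p(x)/q(x)), where p is the target distribution and q the draft distribution, and on rejection resample from the normalized positive part of p - q. The sketch below works on toy probability lists; the function names are assumptions for this example, and it leaves out the adaptive lookahead, the KV cache handling, and the bonus token usually sampled when every draft is accepted.

```python
import random


def residual_distribution(p, q):
    """Normalized max(0, p - q), used to resample after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p and q are identical, so a rejection cannot occur; fall back to p.
        return list(p)
    return [d / total for d in diff]


def verify_token(token, p, q, rng):
    """Return (accepted, token) for one drafted token that was sampled from q."""
    accept_prob = min(1.0, p[token] / q[token])
    if rng.random() < accept_prob:
        return True, token
    residual = residual_distribution(p, q)
    return False, rng.choices(range(len(p)), weights=residual)[0]


def speculative_step(draft_tokens, target_probs, draft_probs, rng):
    """Verify drafted tokens left to right, stopping at the first rejection."""
    out = []
    for token, p, q in zip(draft_tokens, target_probs, draft_probs):
        accepted, tok = verify_token(token, p, q, rng)
        out.append(tok)
        if not accepted:
            break
    return out


p = [0.5, 0.3, 0.2]  # toy target distribution
q = [0.2, 0.5, 0.3]  # toy draft distribution
tokens = speculative_step([1, 0], [p, p], [q, q], random.Random(0))
```

The accepted mass plus the resampled mass adds up to exactly p for every token, which is why the output matches what the target model alone would have produced.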

Source Code

Resy Bot

Automated restaurant reservations

Jan 2024

An async Rust bot for snagging hard-to-get restaurant reservations on Resy. Built with tokio for concurrent HTTP requests, reqwest for API communication, and chrono for timezone-aware scheduling. It queries available slots, retrieves booking tokens, and submits reservations automatically.

MIT Pokerbots 2023

Poker bot competition infrastructure

Jan 2023

Core game engine for MIT's 6.176 poker bot programming competition during IAP. Supports multi-language skeleton bots (Python, Java, C++) with a socket-based communication protocol, configurable betting structures, and automated tournament bracket management.

Source Code

MIT Pokerbots 2022

Poker bot competition infrastructure

Jan 2022

Core game engine for MIT's 6.176 poker bot programming competition during IAP. Supports multi-language skeleton bots (Python, Java, C++) with a socket-based communication protocol, configurable betting structures, and automated tournament bracket management.

Source Code

MIT Pokerbots 2021

Poker bot competition infrastructure

Jan 2021

Core game engine for MIT's 6.176 poker bot programming competition during IAP. Supports multi-language skeleton bots (Python, Java, C++) with a socket-based communication protocol, configurable betting structures, and automated tournament bracket management.

Source Code