Problem Statement
Many AI-powered commit message generators upload your diffs to cloud APIs. For proprietary codebases, this is a non-starter: your code changes, variable names, and business logic are exposed to external services.
I wanted a tool that runs entirely on-device: analyze the diff locally, generate the message locally, never transmit code off the machine.
Demo
$ python commit_gen.py
# Analyzes staged + unstaged diffs, extracts ticket ID from branch name
[PROJ-142] fix: Add retry logic with exponential backoff
Introduces a retry mechanism with exponential backoff to handle
transient connection errors during payment processing. Max retries
set to 3 with 1s base delay.
Files: services/payment.py, tests/test_payment.py
Breaking: No
Architecture
The schema constrains the LLM output to valid conventional commit format:
from typing import List

# generable and guide come from the Apple Foundation Models SDK;
# its import line is not shown here.
@generable("A structured conventional git commit message")
class CommitMessage:
    type: str = guide("Commit type",
                      anyOf=["fix", "feat", "refactor", "test", "docs", "chore"])
    ticket_id: str = guide("Ticket ID from branch name")
    title: str = guide("Imperative commit title, max 50 chars")
    body: str = guide("What changed and why")
    breaking: bool = guide("Breaking change flag")
    files_changed: List[str] = guide("Key files modified", max_items=10)
Tech Stack
- Python — CLI tool and LLM orchestration
- Apple Foundation Models SDK — on-device ~3B parameter model
- Guided generation — typed schema with constraints ensures valid output
- Git — diff parsing, branch name extraction, recent commit style reference
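As a rough illustration of the Git side of the stack, the sketch below gathers the three inputs mentioned above: the staged and unstaged diffs, the current branch name, and recent commit subjects. The function names (run_git, collect_diff, current_branch, recent_subjects) are hypothetical and not taken from commit_gen.py; only standard git subcommands are used.

```python
import subprocess
from typing import Callable, List

# A runner takes git arguments and returns stdout as text.
# Hypothetical abstraction so the helpers can be exercised without a repository.
GitRunner = Callable[[List[str]], str]


def run_git(args: List[str]) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # diffs may contain bytes that are not valid UTF-8
        check=True,        # raise CalledProcessError if git exits non-zero
    )
    return result.stdout


def collect_diff(git: GitRunner = run_git) -> str:
    """Concatenate staged and unstaged changes, staged first."""
    staged = git(["diff", "--cached"])
    unstaged = git(["diff"])
    return staged + unstaged


def current_branch(git: GitRunner = run_git) -> str:
    """Return the checked-out branch name, e.g. feature/PROJ-142-add-retry."""
    return git(["rev-parse", "--abbrev-ref", "HEAD"]).strip()


def recent_subjects(count: int = 5, git: GitRunner = run_git) -> List[str]:
    """Return the subject lines of the most recent commits."""
    out = git(["log", "-n", str(count), "--pretty=format:%s"])
    return [line for line in out.splitlines() if line.strip()]
```

Passing the runner as a parameter is a design choice for this sketch: it lets the helpers be tested with a stub instead of a real repository. Note that git log exits with an error in a repository that has no commits yet, so a real tool would probably want to handle that case.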
Lessons Learned
- Truncate diffs to ~3KB. Large diffs can exceed the context window. The first 3KB usually contains enough signal for a good commit message: file headers, function signatures, and key changes.
- Extract ticket IDs from branch names. Branch naming conventions (e.g., feature/PROJ-142-add-retry) are reliable ticket ID sources. Regex extraction beats asking the LLM to guess.
- Recent commits as style reference. Feeding the last 3-5 commit messages as context helps the model match the repository's existing commit style (conventional commits, imperative mood, etc.).
- Guided generation eliminates post-processing. With anyOf constraints on the type field and a typed schema, the output always conforms to the schema. No parsing, no regex cleanup, no "sometimes it adds a prefix" bugs.
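The lessons above can be sketched in a few small functions. This is a minimal illustration, not the code from commit_gen.py: the ticket pattern (uppercase project key, hyphen, digits), the prompt wording, the CommitFields stand-in for the generated schema object, and the "Yes" label for breaking changes are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

MAX_DIFF_BYTES = 3 * 1024  # the "~3KB" budget from the first lesson

# Assumed ticket format: uppercase project key, hyphen, digits (e.g. PROJ-142).
TICKET_RE = re.compile(r"[A-Z][A-Z0-9]+-\d+")


def truncate_diff(diff: str, limit: int = MAX_DIFF_BYTES) -> str:
    """Keep at most `limit` UTF-8 bytes without splitting a character."""
    data = diff.encode("utf-8")
    if len(data) <= limit:
        return diff
    # errors="ignore" drops a partial multi-byte character at the cut point.
    return data[:limit].decode("utf-8", errors="ignore")


def extract_ticket_id(branch: str) -> Optional[str]:
    """Pull the ticket ID out of the branch name instead of asking the model."""
    match = TICKET_RE.search(branch)
    return match.group(0) if match else None


def build_prompt(diff: str, ticket_id: Optional[str], recent: List[str]) -> str:
    """Assemble the prompt; at most five recent subjects act as style reference."""
    style = "\n".join(f"- {subject}" for subject in recent[:5])
    return (
        "Write a conventional commit message for the diff below.\n"
        f"Ticket ID: {ticket_id or 'none'}\n"
        f"Recent commit messages (match this style):\n{style}\n"
        f"Diff:\n{truncate_diff(diff)}"
    )


@dataclass
class CommitFields:
    """Plain stand-in for the generated CommitMessage (no SDK required)."""
    type: str
    ticket_id: str
    title: str
    body: str
    breaking: bool
    files_changed: List[str] = field(default_factory=list)


def render(msg: CommitFields) -> str:
    """Format typed fields directly; no parsing or regex cleanup of model text."""
    lines = [
        f"[{msg.ticket_id}] {msg.type}: {msg.title}",
        msg.body,
        f"Files: {', '.join(msg.files_changed)}",
        f"Breaking: {'Yes' if msg.breaking else 'No'}",
    ]
    return "\n".join(lines)
```

Because the model returns typed fields, render only reads attributes. The regex runs on the branch name before generation, so the ticket ID can be handed to the model as a fact in the prompt.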