(Strict) Pre-commit Hooks: Shifting AI Feedback Left

I’ve been working with AI-assisted coding for a long while now, and I’ve gotten to experience both the strengths and the weaknesses of agentic coding. My current setup mitigates the weaknesses with strict quality gates at commit time. The goal? Never commit “bad” code!

At first it appears that creating software is a solved problem: we can spin up multiple agents on multiple projects and autonomously churn out software. Hurray! We’re truly limited only by our ability to generate interesting ideas! Quickly, though, we all come to realize that we aren’t at that point quite yet. Code is haphazardly thrown anywhere, architecture be damned, and the agent just does whatever it wants. Translating human language into deterministic software features is hard, a realization that is absolutely not unique to the dawn of AI coding.

So we do the next best thing: waterfall, er, I mean spec-driven development (SDD)! What if we write out exactly what we’re looking for from the software? We collaborate with the agent, making sure we clarify any vague spots. Ta-da, we’ve created all this fantastic documentation and plans for how to create the software. Surely any model driving the agent session can turn these fantastic docs into working software. Heck, maybe SDD is how we can “one shot” an agent to produce the working software we’re after!

No. Turns out there were hidden places of vagueness and contradictions… Now you have two artifacts to maintain: the software itself (both production code and test code) and the “spec” documentation. Practitioners may say, “Don’t look at the code, fix the documentation. The AI agent is just the compiler that turns your docs into software,” which I’m sure can work, but it bothers me. Shouldn’t the specs of the software ideally be represented by the different test suites?

Where I am now is centering on the idea of pushing feedback to the agent “left”: instead of having it author a set of changes and push them up only to fail a CI pipeline, run those checks as early as possible. I’ve landed on using git pre-commit hooks to run a set of software quality gates. My agent follows the rule to commit once it can show a feature is complete, and naturally it will fix anything that comes up while committing. Fantastic!
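
To make the idea concrete, here is a minimal sketch of what such a hook could look like as a Python script. This is not my exact setup: the gate list, the `run_gates` and `main` names, and the choice of ruff, mypy, and pytest are assumptions for illustration, so swap in whatever tools your project uses.

```python
#!/usr/bin/env python3
"""Sketch of a strict pre-commit hook: run every gate, block the commit on any failure."""
import subprocess
import sys

# Hypothetical gate list: (label, command). These tools are illustrative choices,
# not a prescribed stack. Each command must exit non-zero when it finds a problem.
GATES = [
    ("format", ["ruff", "format", "--check", "."]),
    ("lint", ["ruff", "check", "."]),
    ("types", ["mypy", "."]),
    ("unit tests", ["pytest", "-q"]),
]


def run_gates(gates):
    """Run every gate (no early exit) and return the labels of the ones that failed."""
    failed = []
    for label, command in gates:
        try:
            result = subprocess.run(command, capture_output=True, text=True)
        except FileNotFoundError:
            # A missing tool counts as a failure so a gate is never skipped silently.
            print(f"[gate] {label}: tool not installed ({command[0]})", file=sys.stderr)
            failed.append(label)
            continue
        if result.returncode != 0:
            # Echo the tool's own output so the agent sees exactly what to fix.
            print(f"[gate] {label}: FAILED", file=sys.stderr)
            print(result.stdout + result.stderr, file=sys.stderr)
            failed.append(label)
    return failed


def main(gates=GATES):
    """Return the exit code git should see: 0 lets the commit through, 1 blocks it."""
    failed = run_gates(gates)
    if failed:
        print(f"Commit blocked by: {', '.join(failed)}", file=sys.stderr)
        return 1
    return 0

# In the actual hook file, finish with:  sys.exit(main())
```

Saved as an executable `.git/hooks/pre-commit` (with the final `sys.exit(main())` line added), git runs it before every commit and aborts the commit on a non-zero exit. Running all gates instead of stopping at the first failure is a deliberate choice in this sketch: the agent gets the full list of problems in one pass.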

There is probably a way of shifting this feedback even further left, a topic I have simmering on my mind’s back burner.

Software Quality Gates:

Gate	Category	Tier	What it catches	Example tools	Notes (incl. AI-agent angle)
Secret scanning	Security	Universal	API keys, tokens, credentials, private keys committed to the repo	gitleaks, trufflehog, detect-secrets	Catastrophic and irreversible failure mode. Also catches agents hardcoding hallucinated keys.
File / repo hygiene	Repo health	Universal	Merge conflict markers, large binaries, invalid JSON/YAML, broken symlinks, stray .env/.DS_Store	pre-commit-hooks (check-merge-conflict, check-added-large-files, check-json)	Cheap, project-agnostic, zero judgment required.
Formatting	Style	Universal	Indentation, spacing, quote style, import ordering	prettier, black, gofmt, rustfmt, ruff format	Eliminates bikeshedding and diff noise. Agents are particularly prone to formatting drift, so auto-fixers are ideal.
Linting / static analysis	Code quality	Universal	Unused imports, undeclared variables, obvious bugs, anti-patterns	eslint, ruff, golangci-lint, clippy, shellcheck	Fast and high-signal; every mainstream language has a good one.
Unit tests	Testing	Universal	Logic regressions in individual functions/modules	pytest, jest, go test, cargo test, junit	The executable spec layer — keeps agents honest about whether code actually runs.
Commit message hygiene	Repo health	Universal	Non-conforming or empty commit messages, missing references	commitlint, gitlint, conventional-commits hooks	Compounding returns for changelogs, bisecting, auditability — especially with agent-authored commits.
Type checking	Code quality	Universal (where supported)	Type mismatches, null/undefined access, contract violations	tsc, mypy, pyright, sorbet	Catches agents inventing methods or passing wrong shapes before runtime.
Dependency vulnerability scanning	Security	Universal	Known CVEs in direct and transitive deps	npm audit, pip-audit, osv-scanner, dependabot, snyk	Also catches agents importing hallucinated or severely outdated packages.
Build / compile check	Build	Universal	Code that doesn’t compile, broken imports, missing assets	Language-native build tools	“It builds” is the floor.
Integration tests	Testing	Strongly recommended	Module-boundary breakages, contract mismatches, wiring bugs	pytest + testcontainers, supertest, RestAssured	Where bugs actually live in multi-component systems. Usually CI, not pre-commit.
SAST	Security	Strongly recommended	SQLi, XSS sinks, insecure crypto, hardcoded credentials beyond simple secrets	semgrep, codeql, bandit, sonarqube	Prevents agents from reaching for vulnerable patterns. More signal than linting, less noise than DAST.
License compliance	Legal	Strongly recommended	Incompatible dependency licenses (GPL in proprietary code, missing attributions)	license-checker, fossa, scancode	Cheap to automate, expensive in court. Critical if shipping/distributing.
Test coverage threshold	Testing	Strongly recommended	Untested code paths, coverage regressions	jest --coverage, coverage.py, jacoco, codecov	Best as a regression gate, strict on new code, lenient on legacy. Forces agents to test what they just generated.
Dead code / unused export detection	Code quality	Strongly recommended	Orphaned files, unreferenced exports, unreachable branches	knip, ts-prune, vulture, deadcode	High value with agents, which love leaving scaffolding behind.
Smoke tests	Testing	Context-dependent	“Does the app even start” failures, broken happy paths	Custom scripts, playwright trace tests	Critical for deployable apps; meaningless for libraries.
Acceptance / E2E tests	Testing	Context-dependent	User-facing feature regressions, full-stack integration failures	playwright, cypress, selenium, cucumber	Essential for product-facing apps; thinned out for backend libs, CLIs, infra code. Ensures agents don’t break the frontend while “fixing” the backend.
Container / image scanning	Security	Context-dependent	Vulnerable base images, OS packages, misconfigurations	trivy, grype, docker scout, hadolint	Nearly mandatory if you ship containers.
IaC scanning	Security	Context-dependent	Misconfigured cloud resources, public buckets, overly permissive IAM	checkov, tfsec, kics, terrascan	Catches agents writing Dockerfiles/Terraform that expose ports or run as root.
API contract / schema validation	Testing	Context-dependent	Breaking changes to public APIs, OpenAPI/GraphQL schema drift	openapi-diff, graphql-inspector, pact, spectral	Critical for services with external consumers; overkill for internal monoliths.
Accessibility (a11y)	UX / compliance	Context-dependent	Missing alt text, contrast failures, keyboard traps, ARIA misuse	axe-core, pa11y, lighthouse-ci	Mandatory for regulated/public-sector frontends; recommended for any UI; N/A for backend.
Performance / load testing	Performance	Context-dependent	Latency regressions, throughput cliffs, memory leaks	k6, locust, jmeter, hyperfine, benchmark suites	High value for user-facing services at scale; too slow for pre-commit.
Visual regression	Testing	Context-dependent	Unintended CSS/layout changes, broken styling	percy, chromatic, playwright snapshots	Catches agents accidentally breaking layout while refactoring components.
Mutation testing	Testing	Context-dependent	Tests that pass without actually testing anything	stryker, mutmut, pitest	Excellent signal on test quality, especially for agent-written tests. Too slow for pre-commit.
DAST	Security	Context-dependent	Runtime vulnerabilities only visible against a running app	owasp zap, burp, nuclei	Needs a deployed environment; CI/CD-stage, not pre-commit.
SBOM generation	Security / compliance	Context-dependent	Missing software bill of materials	syft, cyclonedx, spdx-tools	Required for federal/regulated supply chains.
Database migration checks	Data	Context-dependent	Destructive migrations, missing rollbacks, locking statements on large tables	squawk, sqlfluff, custom migration linters	Only matters if you own a schema, but then high-value.
i18n / l10n checks	UX	Context-dependent	Hardcoded strings, missing translation keys, RTL breakage	i18n-lint, custom extractors	Only for localized products.
Architecture / dependency rules	Code quality	Context-dependent	Forbidden imports across module boundaries, layering violations	dependency-cruiser, archunit, import-linter	High value in large codebases with intentional architecture.
Code complexity / maintainability	Code quality	Context-dependent	Cyclomatic complexity ceilings, duplication, tech debt accumulation	sonarqube, pmd, radon, lizard	Useful as a project grows; overhead for small repos.
Domain-specific compliance	Compliance	Context-dependent	HIPAA PHI leakage, PCI cardholder data, GDPR data flows, SOC2 controls	DLP scanners, OPA/Gatekeeper, custom policies	Determined by regulatory regime.
Documentation / generated artifact sync	Docs	Context-dependent	Stale OpenAPI specs, generated code out of sync	spectral, custom scripts	High value for API-heavy or docs-driven projects.
Spell checking (code + docs)	Polish	Context-dependent	Typos in identifiers, comments, user-facing strings	cspell, codespell, typos	High ROI in agent-authored code; some teams find the noise not worth it.
ML-specific gates	Data / ML	Niche	Data drift, model bias, edge-case performance	Great Expectations, custom ML pipelines	Only for ML/AI projects.
Property-based testing	Testing	Niche	Edge cases unit tests miss	Hypothesis, fast-check, QuickCheck	High-reliability domains (finance, embedded).
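
Not every gate in the table belongs in the hook itself; several notes call out gates as too slow for pre-commit or as needing a deployed environment. One way to keep that split explicit is a small registry, sketched below. The `Gate` type, the `gates_for` helper, and the decision to put the four fast gates in the pre-commit stage are assumptions for illustration; only the four CI entries come straight from the notes column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    stage: str  # "pre-commit" or "ci"


# Hypothetical registry. The "ci" entries follow the table's notes; the
# "pre-commit" entries are an assumed selection of fast gates.
GATES = [
    Gate("secret scanning", "pre-commit"),
    Gate("formatting", "pre-commit"),
    Gate("linting", "pre-commit"),
    Gate("unit tests", "pre-commit"),
    Gate("integration tests", "ci"),           # "Usually CI, not pre-commit"
    Gate("performance / load testing", "ci"),  # "too slow for pre-commit"
    Gate("mutation testing", "ci"),            # "Too slow for pre-commit"
    Gate("DAST", "ci"),                        # needs a deployed environment
]


def gates_for(stage, gates=GATES):
    """Return the names of the gates assigned to the given stage, in registry order."""
    return [gate.name for gate in gates if gate.stage == stage]
```

Calling `gates_for("pre-commit")` gives the hook its list, and `gates_for("ci")` gives the pipeline everything that was deferred, so a gate can move between stages by editing one line.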
