(Strict) Precommit Hooks: Shifting AI Feedback Left

I’ve been working with AI-assisted coding for a long while now, and I’ve gotten to experience both the strengths and the weaknesses of agent coding. My current setup helps mitigate the weaknesses by enforcing strict quality gates at commit time. The goal? Never commit “bad” code!

At first it appears that creating software is a solved problem: we can spin up multiple agents on multiple projects and autonomously churn out software. Hurray! We’re truly limited only by our ability to generate interesting ideas! Quickly, though, we all come to realize that we aren’t quite at that point yet. Code is haphazardly thrown anywhere, architecture be damned, and the agent just does whatever it wants. Translating human language into deterministic software features is hard, a realization that is absolutely not unique to the dawn of AI coding.

So we do the next best thing: waterfall, er, I mean spec-driven development (SDD)! What if we write out exactly what we’re looking for from the software? We collaborate with the agent, making sure we clarify any vague spots. Ta-da, we’ve created all this fantastic documentation and these plans for how to create the software. Surely any model driving the agent session can turn these fantastic docs into working software. Heck, maybe SDD is how we can “one shot” an agent to produce the working software we’re after!

No. It turns out there were hidden pockets of vagueness and contradiction… Now you have two artifacts to maintain: the software itself (both production code and test code) and the “spec” documentation. Practitioners may say, “Don’t look at the code, fix the documentation. The AI agent is just the compiler that turns your docs into software,” which I’m sure can work, but it bothers me. Shouldn’t the specs of the software ideally be expressed in its various test suites?

Where I’ve landed now is on the idea of pushing feedback to the agent “left”: instead of having it author some changesets and push them up only to fail a CI pipeline, run those checks as early as possible. I use Git pre-commit hooks to run a set of software quality gates. My agent follows the rule to commit once it has provably completed a feature, and naturally it fixes anything the hooks flag while committing. Fantastic!
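To make the idea concrete, here is a minimal sketch of what such a hook could look like. It is not my exact setup: the gate list (`ruff`, `mypy`, `pytest`) and the names `GATES`, `run_gates`, and `main` are assumptions chosen for illustration, and you would swap in whatever tools your project actually uses.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook: run every gate and block the commit if any fails."""
import subprocess
import sys

# Assumed gate commands for a Python project; swap in your own tools.
GATES = [
    ("format", ["ruff", "format", "--check", "."]),
    ("lint", ["ruff", "check", "."]),
    ("types", ["mypy", "."]),
    ("tests", ["pytest", "-q"]),
]


def run_gates(gates, runner=subprocess.run):
    """Run each gate and return a list of (name, output) for the ones that failed."""
    failures = []
    for name, cmd in gates:
        try:
            result = runner(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            # A missing tool counts as a failed gate, not a silent pass.
            failures.append((name, f"command not found: {cmd[0]}"))
            continue
        if result.returncode != 0:
            failures.append((name, result.stdout + result.stderr))
    return failures


def main(gates=GATES):
    """Print each failure to stderr; a non-zero return makes Git abort the commit."""
    failures = run_gates(gates)
    for name, output in failures:
        print(f"[gate failed] {name}\n{output}", file=sys.stderr)
    return 1 if failures else 0

# When installed as a hook, finish the file with: sys.exit(main())
```

Saved as `.git/hooks/pre-commit` and made executable, a script along these lines would stop the commit whenever a gate exits non-zero. This sketch deliberately runs every gate instead of stopping at the first failure, so the agent sees the full list of problems in a single commit attempt.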

There is probably a way of shifting this feedback even further left, a topic I have simmering on my mind’s back burner.


Software Quality Gates:

| Gate | Category | Tier | What it catches | Example tools | Notes (incl. AI-agent angle) |
|---|---|---|---|---|---|
| Secret scanning | Security | Universal | API keys, tokens, credentials, private keys committed to the repo | gitleaks, trufflehog, detect-secrets | Catastrophic and irreversible failure mode. Also catches agents hardcoding hallucinated keys. |
| File / repo hygiene | Repo health | Universal | Merge conflict markers, large binaries, invalid JSON/YAML, broken symlinks, stray .env/.DS_Store | pre-commit-hooks (check-merge-conflict, check-added-large-files, check-json) | Cheap, project-agnostic, zero judgment required. |
| Formatting | Style | Universal | Indentation, spacing, quote style, import ordering | prettier, black, gofmt, rustfmt, ruff format | Eliminates bikeshedding and diff noise. Agents are particularly prone to formatting drift, so auto-fixers are ideal. |
| Linting / static analysis | Code quality | Universal | Unused imports, undeclared variables, obvious bugs, anti-patterns | eslint, ruff, golangci-lint, clippy, shellcheck | Fast and high-signal; every mainstream language has a good one. |
| Unit tests | Testing | Universal | Logic regressions in individual functions/modules | pytest, jest, go test, cargo test, junit | The executable spec layer; keeps agents honest about whether code actually runs. |
| Commit message hygiene | Repo health | Universal | Non-conforming or empty commit messages, missing references | commitlint, gitlint, conventional-commits hooks | Compounding returns for changelogs, bisecting, auditability, especially with agent-authored commits. |
| Type checking | Code quality | Universal (where supported) | Type mismatches, null/undefined access, contract violations | tsc, mypy, pyright, sorbet | Catches agents inventing methods or passing wrong shapes before runtime. |
| Dependency vulnerability scanning | Security | Universal | Known CVEs in direct and transitive deps | npm audit, pip-audit, osv-scanner, dependabot, snyk | Also catches agents importing hallucinated or severely outdated packages. |
| Build / compile check | Build | Universal | Code that doesn’t compile, broken imports, missing assets | Language-native build tools | “It builds” is the floor. |
| Integration tests | Testing | Strongly recommended | Module-boundary breakages, contract mismatches, wiring bugs | pytest + testcontainers, supertest, RestAssured | Where bugs actually live in multi-component systems. Usually CI, not pre-commit. |
| SAST | Security | Strongly recommended | SQLi, XSS sinks, insecure crypto, hardcoded credentials beyond simple secrets | semgrep, codeql, bandit, sonarqube | Prevents agents from reaching for vulnerable patterns. More signal than linting, less noise than DAST. |
| License compliance | Legal | Strongly recommended | Incompatible dependency licenses (GPL in proprietary code, missing attributions) | license-checker, fossa, scancode | Cheap to automate, expensive in court. Critical if shipping/distributing. |
| Test coverage threshold | Testing | Strongly recommended | Untested code paths, coverage regressions | jest --coverage, coverage.py, jacoco, codecov | Best as a regression gate, strict on new code, lenient on legacy. Forces agents to test what they just generated. |
| Dead code / unused export detection | Code quality | Strongly recommended | Orphaned files, unreferenced exports, unreachable branches | knip, ts-prune, vulture, deadcode | High value with agents, which love leaving scaffolding behind. |
| Smoke tests | Testing | Context-dependent | “Does the app even start” failures, broken happy paths | Custom scripts, playwright trace tests | Critical for deployable apps; meaningless for libraries. |
| Acceptance / E2E tests | Testing | Context-dependent | User-facing feature regressions, full-stack integration failures | playwright, cypress, selenium, cucumber | Essential for product-facing apps; thinned out for backend libs, CLIs, infra code. Ensures agents don’t break the frontend while “fixing” the backend. |
| Container / image scanning | Security | Context-dependent | Vulnerable base images, OS packages, misconfigurations | trivy, grype, docker scout, hadolint | Nearly mandatory if you ship containers. |
| IaC scanning | Security | Context-dependent | Misconfigured cloud resources, public buckets, overly permissive IAM | checkov, tfsec, kics, terrascan | Catches agents writing Dockerfiles/Terraform that expose ports or run as root. |
| API contract / schema validation | Testing | Context-dependent | Breaking changes to public APIs, OpenAPI/GraphQL schema drift | openapi-diff, graphql-inspector, pact, spectral | Critical for services with external consumers; overkill for internal monoliths. |
| Accessibility (a11y) | UX / compliance | Context-dependent | Missing alt text, contrast failures, keyboard traps, ARIA misuse | axe-core, pa11y, lighthouse-ci | Mandatory for regulated/public-sector frontends; recommended for any UI; N/A for backend. |
| Performance / load testing | Performance | Context-dependent | Latency regressions, throughput cliffs, memory leaks | k6, locust, jmeter, hyperfine, benchmark suites | High value for user-facing services at scale; too slow for pre-commit. |
| Visual regression | Testing | Context-dependent | Unintended CSS/layout changes, broken styling | percy, chromatic, playwright snapshots | Catches agents accidentally breaking layout while refactoring components. |
| Mutation testing | Testing | Context-dependent | Tests that pass without actually testing anything | stryker, mutmut, pitest | Excellent signal on test quality, especially for agent-written tests. Too slow for pre-commit. |
| DAST | Security | Context-dependent | Runtime vulnerabilities only visible against a running app | owasp zap, burp, nuclei | Needs a deployed environment; CI/CD-stage, not pre-commit. |
| SBOM generation | Security / compliance | Context-dependent | Missing software bill of materials | syft, cyclonedx, spdx-tools | Required for federal/regulated supply chains. |
| Database migration checks | Data | Context-dependent | Destructive migrations, missing rollbacks, locking statements on large tables | squawk, sqlfluff, custom migration linters | Only matters if you own a schema, but then high-value. |
| i18n / l10n checks | UX | Context-dependent | Hardcoded strings, missing translation keys, RTL breakage | i18n-lint, custom extractors | Only for localized products. |
| Architecture / dependency rules | Code quality | Context-dependent | Forbidden imports across module boundaries, layering violations | dependency-cruiser, archunit, import-linter | High value in large codebases with intentional architecture. |
| Code complexity / maintainability | Code quality | Context-dependent | Cyclomatic complexity ceilings, duplication, tech debt accumulation | sonarqube, pmd, radon, lizard | Useful as a project grows; overhead for small repos. |
| Domain-specific compliance | Compliance | Context-dependent | HIPAA PHI leakage, PCI cardholder data, GDPR data flows, SOC2 controls | DLP scanners, OPA/Gatekeeper, custom policies | Determined by regulatory regime. |
| Documentation / generated artifact sync | Docs | Context-dependent | Stale OpenAPI specs, generated code out of sync | spectral, custom scripts | High value for API-heavy or docs-driven projects. |
| Spell checking (code + docs) | Polish | Context-dependent | Typos in identifiers, comments, user-facing strings | cspell, codespell, typos | High ROI in agent-authored code; some teams find the noise not worth it. |
| ML-specific gates | Data / ML | Niche | Data drift, model bias, edge-case performance | Great Expectations, custom ML pipelines | Only for ML/AI projects. |
| Property-based testing | Testing | Niche | Edge cases unit tests miss | Hypothesis, fast-check, QuickCheck | High-reliability domains (finance, embedded). |
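
Not every gate belongs in a pre-commit hook; several notes in the table flag gates as too slow or as needing a deployed environment. One way to keep that split explicit is a small registry that tags each gate with the stage where it runs. The sketch below is hypothetical: the `Gate` class, the `stage` field, and the `gates_for` helper are illustrative names, and the placement of the fast gates at pre-commit is my assumption, while the CI placements follow the table notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    tier: str   # tier from the table, e.g. "Universal"
    stage: str  # "pre-commit" or "ci" (assumed placement)


# A handful of rows from the table; the stage values are illustrative.
GATES = [
    Gate("Secret scanning", "Universal", "pre-commit"),
    Gate("Formatting", "Universal", "pre-commit"),
    Gate("Linting / static analysis", "Universal", "pre-commit"),
    Gate("Unit tests", "Universal", "pre-commit"),
    Gate("Integration tests", "Strongly recommended", "ci"),
    Gate("Performance / load testing", "Context-dependent", "ci"),
    Gate("Mutation testing", "Context-dependent", "ci"),
    Gate("DAST", "Context-dependent", "ci"),
]


def gates_for(stage, gates=GATES):
    """Return the names of the gates assigned to the given stage, in table order."""
    return [gate.name for gate in gates if gate.stage == stage]


print(gates_for("pre-commit"))
print(gates_for("ci"))
```

A hook script could then read `gates_for("pre-commit")` to decide what to run locally, leaving the slower gates to the CI pipeline.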
