How We Test AI-Generated Code (Without Losing Our Minds)
"How do you test code you didn't write?"
That's the question everyone asks when they learn I don't code anymore. The answer: differently than you think, and more rigorously than before.
Here's how we test AI-generated code across four ventures—and why traditional testing approaches don't quite work.
Why AI-Generated Code Needs Different Testing
Traditional testing assumes:
- You wrote the code, you know what it does
- You understand the edge cases
- You know the implementation details
- Tests verify your logic
AI-generated code reality:
- You didn't write it, you specified it
- AI might have considered edge cases you didn't
- Implementation details might surprise you
- Tests verify AI understood your intent
The shift: Testing moves from "does my code work?" to "did AI build what I asked for?"
Our Testing Philosophy: Three Layers
Core principle: Trust but verify. Always.
The three layers:
- Intent verification - Did AI understand what I wanted?
- Functional testing - Does it actually work?
- Integration testing - Does it work with everything else?
Layer 1: Intent Verification (Most Important)
This is checking if AI built what you actually asked for. Not just "does it work" but "is it the right thing."
How we do it:
- Review the approach - Read the code, understand the strategy
- Check against requirements - Does it match what I specified?
- Look for surprises - What did AI do that I didn't expect?
- Verify assumptions - Did AI make the same assumptions I did?
Real example:
- Asked for: "User authentication with JWT tokens"
- AI built: Working auth system with JWT
- Intent check revealed: AI used 24-hour token expiry (I wanted 1 hour)
- Result: Functionally correct, but wrong intent
Why this matters: AI can build something that works but isn't what you wanted. Catching this early saves massive refactoring later. This is where your judgment as architect matters most.
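Here's a minimal sketch of what that intent check can look like as a test. It assumes the PyJWT library and a hypothetical create_token() helper in the AI-generated auth module; it would pass with the 1-hour expiry I wanted and fail with the 24-hour expiry AI chose.

```python
import time

import jwt  # PyJWT

from auth import SECRET_KEY, create_token  # hypothetical module names

def test_token_expiry_matches_intent():
    token = create_token(user_id="u-123")
    claims = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    lifetime = claims["exp"] - int(time.time())
    # Intent: 1-hour expiry. A 24-hour token is functionally fine
    # but fails this check.
    assert 3500 <= lifetime <= 3700
```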
Layer 2: Functional Testing
Happy Path First
Does it work when everything goes right?
Process:
- AI generates code
- Immediately test the happy path manually
- If it works, move to edge cases
- If it fails, back to AI with specific feedback
Don't proceed until the happy path works. Period.
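A happy-path check can be this small. register_user() is a hypothetical stand-in for whatever AI just generated, sketched as a pytest test:

```python
from users import register_user  # hypothetical module

def test_happy_path_registration():
    # The one flow that must work before anything else gets tested.
    user = register_user(email="test@example.com", password="correct-horse-battery")
    assert user.id is not None
    assert user.email == "test@example.com"
```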
Edge Cases Second
- What happens with empty inputs?
- What about invalid data?
- How does it handle errors?
- What about extreme values?
The AI advantage: Ask AI to generate edge case tests. AI often thinks of cases you didn't. But verify AI's test cases are actually valid.
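For example, one parametrized test (sketched with pytest; register_user() and ValidationError are hypothetical names) covers several edge cases at once and is easy to review for validity:

```python
import pytest

from users import ValidationError, register_user  # hypothetical module

# Edge cases: empty, whitespace-only, malformed, and extreme-length emails.
@pytest.mark.parametrize("bad_email", ["", "   ", "not-an-email", "a" * 500 + "@example.com"])
def test_rejects_invalid_emails(bad_email):
    with pytest.raises(ValidationError):
        register_user(email=bad_email, password="correct-horse-battery")
```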
Error Handling (AI's Weak Spot)
- Does it fail gracefully?
- Are error messages helpful?
- Does it log appropriately?
- Can users recover from errors?
Common AI failure: AI usually nails the happy path, but error handling tends to be weak or missing. You must specify error handling requirements explicitly.
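Once you have specified it, a check like this catches silent or unhelpful failures. charge_card() and PaymentError are hypothetical names, sketched with pytest:

```python
import pytest

from payments import PaymentError, charge_card  # hypothetical module

def test_declined_card_fails_gracefully():
    # Failure should be an explicit, typed error with a useful message,
    # not a bare stack trace or a silent success.
    with pytest.raises(PaymentError) as excinfo:
        charge_card(card_token="tok_declined", amount_cents=1999)
    assert "declined" in str(excinfo.value).lower()
```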
Layer 3: Integration Testing
This is where the real issues hide.
What we test:
- Does it work with existing code?
- Does it break anything else?
- Are the interfaces compatible?
- Does data flow correctly?
Our approach:
- Test in isolation first - Does the component work alone?
- Test with dependencies - Does it work with what it needs?
- Test downstream - Does it break anything that uses it?
- Test the full flow - Does the entire user journey work?
Why this is critical: AI doesn't always know the full system context. Integration bugs are the most expensive. Catch them early.
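As a rough sketch of "test the full flow", we prefer hitting a locally running instance over mocking everything. The endpoints and BASE_URL below are assumptions for illustration, not our actual API:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local instance of the service

def test_signup_login_profile_flow():
    creds = {"email": "integration@example.com", "password": "pw-123456"}
    requests.post(f"{BASE_URL}/signup", json=creds, timeout=5).raise_for_status()
    login = requests.post(f"{BASE_URL}/login", json=creds, timeout=5)
    login.raise_for_status()
    token = login.json()["token"]
    profile = requests.get(
        f"{BASE_URL}/me",
        headers={"Authorization": f"Bearer {token}"},
        timeout=5,
    )
    profile.raise_for_status()
    assert profile.json()["email"] == creds["email"]
```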
Our Testing Workflow
Step 1: Specify with tests in mind
- Tell AI what to build
- Include testing requirements
- Specify edge cases and error handling upfront
Step 2: AI generates code
- Review the approach (intent verification)
- Check for obvious issues
- Verify it matches requirements
Step 3: Immediate manual testing
- Test happy path first
- If it fails, back to AI immediately
- Don't proceed until happy path works
Step 4: Edge case testing
- Test with invalid inputs
- Test error conditions
- Test boundary conditions
Step 5: Integration testing
- Test with real dependencies
- Test the full user flow
- Check for side effects
Step 6: Automated tests (sometimes)
- For critical paths, add automated tests
- For frequently changing code, maybe not
- Balance test maintenance vs. value
When We Write Automated Tests
We DO write automated tests for:
- Critical business logic (payments, auth, data integrity)
- Stable APIs that won't change often
- Complex algorithms hard to verify manually
- Integration points between services
- Regression prevention for known bugs
We DON'T write automated tests for:
- UI components that change frequently
- Experimental features
- One-off scripts
- Code that's easier to test manually
- Features that might be removed soon
Why: AI-generated code changes fast, so test maintenance can quickly become a burden. Focus automated tests where they provide the most value.
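A typical "regression prevention" test is tiny: once a bug is fixed, pin the fix. apply_discount() below is a hypothetical example, not code from one of our ventures:

```python
from billing import apply_discount  # hypothetical module

def test_discount_never_goes_negative():
    # Regression guard: a 110% coupon once produced a negative total.
    assert apply_discount(total_cents=1000, percent=110) == 0
```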
Real Examples from the Trenches
Brand-Heart Agent Orchestration:
- Intent: Agents should communicate asynchronously
- AI built: Synchronous communication (worked, but wrong)
- Caught in: Intent verification
- Fix: Clarified async requirement, AI rebuilt correctly
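For illustration only, this is the shape of the asynchronous hand-off we wanted, sketched with asyncio rather than our actual orchestration code: the producer queues work and moves on instead of blocking on a reply.

```python
import asyncio

async def producer(queue: asyncio.Queue) -> None:
    # Hand work off without waiting for the consumer to respond.
    for i in range(3):
        await queue.put(f"task-{i}")
    await queue.put(None)  # sentinel: no more work

async def consumer(queue: asyncio.Queue) -> None:
    while (task := await queue.get()) is not None:
        print(f"processing {task}")

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(producer(queue), consumer(queue))

asyncio.run(main())
```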
LessonLight Curriculum Alignment:
- Intent: Match lessons to curriculum standards
- AI built: Exact string matching (too rigid)
- Caught in: Edge case testing with real curriculum data
- Fix: Specified fuzzy matching, AI implemented better solution
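A hedged sketch of the fuzzy-matching direction, using Python's difflib; the threshold and names are illustrative, and the production implementation differs:

```python
from difflib import SequenceMatcher

def best_standard_match(lesson_title: str, standards: list[str], threshold: float = 0.8) -> str | None:
    # Score every standard by string similarity instead of exact equality.
    scored = [(SequenceMatcher(None, lesson_title.lower(), s.lower()).ratio(), s) for s in standards]
    score, standard = max(scored)
    return standard if score >= threshold else None

print(best_standard_match(
    "Adding Fractions with Unlike Denominators",
    ["Add fractions with unlike denominators", "Multiply multi-digit whole numbers"],
))
```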
Smart Services API:
- Intent: Unified API for all services
- AI built: Working API with inconsistent error responses
- Caught in: Integration testing across services
- Fix: Specified error response format, AI standardized
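The fix was mostly about agreeing on a shape. Here's a sketch of a standardized error envelope; the field names are illustrative, not our actual contract:

```python
from dataclasses import asdict, dataclass

@dataclass
class ApiError:
    code: str     # machine-readable, e.g. "BOOKING_NOT_FOUND"
    message: str  # human-readable explanation
    status: int   # HTTP status code

def error_response(code: str, message: str, status: int) -> tuple[dict, int]:
    # Every service returns errors in this same shape.
    return asdict(ApiError(code=code, message=message, status=status)), status

body, status = error_response("BOOKING_NOT_FOUND", "No booking with id 42", 404)
```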
Common Testing Mistakes
- Trusting "it compiles" - Compiling means nothing. Always test functionality.
- Skipping intent verification - Jumping straight to functional testing and missing that AI built the wrong thing.
- Over-relying on automated tests - Writing tests for everything until test maintenance becomes a burden.
- Under-testing integrations - Testing components in isolation and missing integration failures.
- Not testing immediately - Letting AI stack up multiple changes before testing, which makes it harder to identify what broke.
The Mindset Shift
Old mindset:
- "I wrote this, I know it works"
- Tests are for catching my mistakes
- Comprehensive test coverage is the goal
New mindset:
- "AI wrote this, I need to verify it"
- Tests verify AI understood my intent
- Strategic test coverage where it matters most
- Manual testing is faster for rapidly changing code
Practical Advice
Start here:
- Always verify intent before testing functionality
- Test immediately after every AI change
- Focus on integration testing—that's where AI struggles
- Write automated tests only for critical, stable code
- Use AI to help generate tests, but verify them
Avoid:
- Trusting AI without verification
- Skipping manual testing
- Over-investing in automated tests for changing code
- Testing in isolation without integration checks
- Proceeding when tests fail
The Bottom Line
Testing AI-generated code isn't about catching bugs in code you wrote. It's about verifying that AI built what you actually wanted, and that it works in the real world.
Intent verification matters more than test coverage. Manual testing is faster than automated for rapidly changing code. Integration testing is where the real issues hide.
But the goal is the same: ship code that works, doesn't break things, and solves the actual problem.
We've tested our way through four ventures. These approaches work.
Test differently. Test smarter. Trust but verify.
How This Was Created: This post was architected by Mike and drafted with AI assistance. The testing approaches are from real experience across four AI-built ventures. The execution is AI-augmented. Just like everything we build at Wizewerx.