How We Test AI-Generated Code (Without Losing Our Minds)
"How do you test code you didn't write?"
That's the question everyone asks when they learn I don't code anymore. The answer: differently than you think, and more rigorously than before.
Here's how we test AI-generated code across four ventures—and why traditional testing approaches don't quite work.
Why AI-Generated Code Needs Different Testing
Traditional testing assumes:
- You wrote the code, you know what it does
- You understand the edge cases
- You know the implementation details
- Tests verify your logic
AI-generated code reality:
- You didn't write it, you specified it
- AI might have considered edge cases you didn't
- Implementation details might surprise you
- Tests verify AI understood your intent
The shift: Testing moves from "does my code work?" to "did AI build what I asked for?"
Our Testing Philosophy: Three Layers
Core principle: Trust but verify. Always.
The three layers:
- Intent verification - Did AI understand what I wanted?
- Functional testing - Does it actually work?
- Integration testing - Does it work with everything else?
Layer 1: Intent Verification (Most Important)
This is checking if AI built what you actually asked for. Not just "does it work" but "is it the right thing."
How we do it:
- Review the approach - Read the code, understand the strategy
- Check against requirements - Does it match what I specified?
- Look for surprises - What did AI do that I didn't expect?
- Verify assumptions - Did AI make the same assumptions I did?
Real example:
- Asked for: "User authentication with JWT tokens"
- AI built: Working auth system with JWT
- Intent check revealed: AI used 24-hour token expiry (I wanted 1 hour)
- Result: Functionally correct, but wrong intent
Why this matters: AI can build something that works but isn't what you wanted. Catching this early saves massive refactoring later. This is where your judgment as architect matters most.
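Here's a minimal sketch of what that intent check can look like as a test. It assumes the PyJWT library and a hypothetical create_token() helper in the AI-generated auth module; it would pass with the 1-hour expiry I wanted and fail with the 24-hour expiry AI chose.

```python
import time

import jwt  # PyJWT

from auth import SECRET_KEY, create_token  # hypothetical module names

def test_token_expiry_matches_intent():
    token = create_token(user_id="u-123")
    claims = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    lifetime = claims["exp"] - int(time.time())
    # Intent: 1-hour expiry. A 24-hour token is functionally fine
    # but fails this check.
    assert 3500 <= lifetime <= 3700
```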
Layer 2: Functional Testing
Happy Path First
Does it work when everything goes right?
Process:
- AI generates code
- Immediately test the happy path manually
- If it works, move to edge cases
- If it fails, back to AI with specific feedback
Don't proceed until the happy path works. Period.
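A happy-path check can be this small. register_user() is a hypothetical stand-in for whatever AI just generated, sketched as a pytest test:

```python
from users import register_user  # hypothetical module

def test_happy_path_registration():
    # The one flow that must work before anything else gets tested.
    user = register_user(email="test@example.com", password="correct-horse-battery")
    assert user.id is not None
    assert user.email == "test@example.com"
```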
Edge Cases Second
- What happens with empty inputs?
- What about invalid data?
- How does it handle errors?
- What about extreme values?
The AI advantage: Ask AI to generate edge case tests. AI often thinks of cases you didn't. But verify AI's test cases are actually valid.
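For example, one parametrized test (sketched with pytest; register_user() and ValidationError are hypothetical names) covers several edge cases at once and is easy to review for validity:

```python
import pytest

from users import ValidationError, register_user  # hypothetical module

# Edge cases: empty, whitespace-only, malformed, and extreme-length emails.
@pytest.mark.parametrize("bad_email", ["", "   ", "not-an-email", "a" * 500 + "@example.com"])
def test_rejects_invalid_emails(bad_email):
    with pytest.raises(ValidationError):
        register_user(email=bad_email, password="correct-horse-battery")
```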
Error Handling (AI's Weak Spot)
- Does it fail gracefully?
- Are error messages helpful?
- Does it log appropriately?
- Can users recover from errors?
Common AI failure: AI usually nails the happy path, but error handling tends to be weak or missing. You must specify error handling requirements explicitly.
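Once you have specified it, a check like this catches silent or unhelpful failures. charge_card() and PaymentError are hypothetical names, sketched with pytest:

```python
import pytest

from payments import PaymentError, charge_card  # hypothetical module

def test_declined_card_fails_gracefully():
    # Failure should be an explicit, typed error with a useful message,
    # not a bare stack trace or a silent success.
    with pytest.raises(PaymentError) as excinfo:
        charge_card(card_token="tok_declined", amount_cents=1999)
    assert "declined" in str(excinfo.value).lower()
```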
Layer 3: Integration Testing
This is where the real issues hide.
What we test:
- Does it work with existing code?
- Does it break anything else?
- Are the interfaces compatible?
- Does data flow correctly?
Our approach:
- Test in isolation first - Does the component work alone?
- Test with dependencies - Does it work with what it needs?
- Test downstream - Does it break anything that uses it?
- Test the full flow - Does the entire user journey work?
Why this is critical: AI doesn't always know the full system context. Integration bugs are the most expensive. Catch them early.
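As a rough sketch of "test the full flow", we prefer hitting a locally running instance over mocking everything. The endpoints and BASE_URL below are assumptions for illustration, not our actual API:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local instance of the service

def test_signup_login_profile_flow():
    creds = {"email": "integration@example.com", "password": "pw-123456"}
    requests.post(f"{BASE_URL}/signup", json=creds, timeout=5).raise_for_status()
    login = requests.post(f"{BASE_URL}/login", json=creds, timeout=5)
    login.raise_for_status()
    token = login.json()["token"]
    profile = requests.get(
        f"{BASE_URL}/me",
        headers={"Authorization": f"Bearer {token}"},
        timeout=5,
    )
    profile.raise_for_status()
    assert profile.json()["email"] == creds["email"]
```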
Our Testing Workflow
Step 1: Specify with tests in mind
- Tell AI what to build
- Include testing requirements
- Specify edge cases and error handling upfront
Step 2: AI generates code
- Review the approach (intent verification)
- Check for obvious issues
- Verify it matches requirements
Step 3: Immediate manual testing
- Test happy path first
- If it fails, back to AI immediately
- Don't proceed until happy path works
Step 4: Edge case testing
- Test with invalid inputs
- Test error conditions
- Test boundary conditions
Step 5: Integration testing
- Test with real dependencies
- Test the full user flow
- Check for side effects
Step 6: Automated tests (sometimes)
- For critical paths, add automated tests
- For frequently changing code, maybe not
- Balance test maintenance vs. value
When We Write Automated Tests
We DO write automated tests for:
- Critical business logic (payments, auth, data integrity)
- Stable APIs that won't change often
- Complex algorithms hard to verify manually
- Integration points between services
- Regression prevention for known bugs
We DON'T write automated tests for:
- UI components that change frequently
- Experimental features
- One-off scripts
- Code that's easier to test manually
- Features that might be removed soon
Why: AI-generated code changes fast, so test maintenance can quickly become a burden. Focus automated tests where they provide the most value.
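A typical "regression prevention" test is tiny: once a bug is fixed, pin the fix. apply_discount() below is a hypothetical example, not code from one of our ventures:

```python
from billing import apply_discount  # hypothetical module

def test_discount_never_goes_negative():
    # Regression guard: a 110% coupon once produced a negative total.
    assert apply_discount(total_cents=1000, percent=110) == 0
```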
Real Examples from the Trenches
Brand-Heart Agent Orchestration:
- Intent: Agents should communicate asynchronously
- AI built: Synchronous communication (worked, but wrong)
- Caught in: Intent verification
- Fix: Clarified async requirement, AI rebuilt correctly
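For illustration only, this is the shape of the asynchronous hand-off we wanted, sketched with asyncio rather than our actual orchestration code: the producer queues work and moves on instead of blocking on a reply.

```python
import asyncio

async def producer(queue: asyncio.Queue) -> None:
    # Hand work off without waiting for the consumer to respond.
    for i in range(3):
        await queue.put(f"task-{i}")
    await queue.put(None)  # sentinel: no more work

async def consumer(queue: asyncio.Queue) -> None:
    while (task := await queue.get()) is not None:
        print(f"processing {task}")

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(producer(queue), consumer(queue))

asyncio.run(main())
```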
LessonLight Curriculum Alignment:
- Intent: Match lessons to curriculum standards
- AI built: Exact string matching (too rigid)
- Caught in: Edge case testing with real curriculum data
- Fix: Specified fuzzy matching, AI implemented better solution
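A hedged sketch of the fuzzy-matching direction, using Python's difflib; the threshold and names are illustrative, and the production implementation differs:

```python
from difflib import SequenceMatcher

def best_standard_match(lesson_title: str, standards: list[str], threshold: float = 0.8) -> str | None:
    # Score every standard by string similarity instead of exact equality.
    scored = [(SequenceMatcher(None, lesson_title.lower(), s.lower()).ratio(), s) for s in standards]
    score, standard = max(scored)
    return standard if score >= threshold else None

print(best_standard_match(
    "Adding Fractions with Unlike Denominators",
    ["Add fractions with unlike denominators", "Multiply multi-digit whole numbers"],
))
```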
Smart Services API:
- Intent: Unified API for all services
- AI built: Working API with inconsistent error responses
- Caught in: Integration testing across services
- Fix: Specified error response format, AI standardized
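The fix was mostly about agreeing on a shape. Here's a sketch of a standardized error envelope; the field names are illustrative, not our actual contract:

```python
from dataclasses import asdict, dataclass

@dataclass
class ApiError:
    code: str     # machine-readable, e.g. "BOOKING_NOT_FOUND"
    message: str  # human-readable explanation
    status: int   # HTTP status code

def error_response(code: str, message: str, status: int) -> tuple[dict, int]:
    # Every service returns errors in this same shape.
    return asdict(ApiError(code=code, message=message, status=status)), status

body, status = error_response("BOOKING_NOT_FOUND", "No booking with id 42", 404)
```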
Common Testing Mistakes
- Trusting "it compiles" - Compiling means nothing. Always test functionality.
- Skipping intent verification - Jumping straight to functional testing and missing that AI built the wrong thing.
- Over-relying on automated tests - Writing tests for everything until test maintenance becomes a burden.
- Under-testing integrations - Testing components in isolation and missing integration failures.
- Not testing immediately - Letting AI stack up multiple changes before testing, which makes it harder to identify what broke.
The Mindset Shift
Old mindset:
- "I wrote this, I know it works"
- Tests are for catching my mistakes
- Comprehensive test coverage is the goal
New mindset:
- "AI wrote this, I need to verify it"
- Tests verify AI understood my intent
- Strategic test coverage where it matters most
- Manual testing is faster for rapidly changing code
Practical Advice
Start here:
- Always verify intent before testing functionality
- Test immediately after every AI change
- Focus on integration testing—that's where AI struggles
- Write automated tests only for critical, stable code
- Use AI to help generate tests, but verify them
Avoid:
- Trusting AI without verification
- Skipping manual testing
- Over-investing in automated tests for changing code
- Testing in isolation without integration checks
- Proceeding when tests fail
The Bottom Line
Testing AI-generated code isn't about catching bugs in code you wrote. It's about verifying that AI built what you actually wanted, and that it works in the real world.
Intent verification matters more than test coverage. Manual testing is faster than automated for rapidly changing code. Integration testing is where the real issues hide.
But the goal is the same: ship code that works, doesn't break things, and solves the actual problem.
We've tested our way through four ventures. These approaches work.
Test differently. Test smarter. Trust but verify.
How This Was Created: This post was architected by Mike and drafted with AI assistance. The testing approaches are from real experience across four AI-built ventures. The execution is AI-augmented. Just like everything we build at Wizewerx.