Python Testing Tip #15 / April 8, 2026

A Python Development Workflow for the AI Era

AI is taking over software engineering like fire through dry grass in a Greek summer. It's everywhere - from people fully committing to vibe coding and leaking their Stripe API keys, to others staying away from AI tooling entirely because LLMs are nondeterministic. One thing is certain: the landscape is changing.

This is the beginning of a series where we'll dig into each step of a Python development workflow built for shipping reliably and fast in the AI era.

Engineering - the more things change, the more they stay the same

The AI hype wave is nothing new. We've seen it before: microservices, NoSQL, serverless, blockchain, big data, neural networks - to name a few. Each promised salvation. "You have a problem? You should do microservices/big data/serverless!" Sound familiar? The reality is that there's no "one size fits all" solution. I've seen too many teams go NoSQL only to end up doing joins in application code, or adopt microservices only to propagate the same data through 9 services in a row. Simply jumping on the hype train usually causes more harm than good.

As engineers, we should ask ourselves: "What problem do we actually have? How can we solve it most effectively?" Only then should we evaluate potential solutions and pick the best fit. I have yet to see a team solve a high-coupling problem by splitting into microservices - they end up with a distributed monolith instead. It's the same with AI: it's great and very powerful when used for the right things.

AI made coding simpler and faster. But it didn't eliminate the need for a quick feedback loop - if anything, it demands an even tighter, faster, and more reliable one. If we can code faster, we need a feedback loop that can keep up. So, how do we get there?

Merge requests

Like it or not, big feature branches are an anti-pattern if you want to ship to production often. Context windows are limited - for LLMs and for people. The bigger the set of changes, the more likely something will go wrong, and it's very easy to miss that something is fishy when you're juggling 100 things at once. On top of that, it's impossible to learn and adapt if a feature takes ages before a non-developer ever touches it. The quicker you ship, the quicker you can learn, the quicker you can improve, and the more value you can deliver.

My rule of thumb: "An MR needs to be small enough that you're willing to throw it away completely if you realize you've derailed!" Why? You'll never be willing to throw away a week of work - regardless of who wrote the code. But you won't flinch at throwing away an hour. That's where it all starts.

Obviously, you'll need other pieces in place to really ship fast and reliably. But if you don't know where to start, start by making your MRs smaller. Instead of shipping the most polished version of a new feature, try shipping a POC behind a feature flag first. You'll see that even this alone makes deployments less stressful.

Release process

Deployments should be non-events. For that to be true, everything needs to happen automatically. No "Hey, Felipe! Can you SSH into the server and deploy the latest main?" Automation makes deployments predictable and repeatable - so simple that anyone on the team can do one. Here's what this looks like in practice - four pipelines, each with a clear job.

Local flow:

              ┌─────────────────────────────────┐
              │                                  │
              ▼                                  │
┌───────────────────────┐   ┌──────────────────┐ │
│ Develop solution      │──▶│ Run tests &      │─┘
│ with tests            │   │ code quality     │
└───────────────────────┘   └───────┬──────────┘
                                    │ all green
                                    ▼
                            ┌──────────────────┐   ┌──────────────────┐
                            │ Commit & push    │──▶│ Open MR &        │
                            │                  │   │ request review   │
                            └──────────────────┘   └──────────────────┘
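The "run tests & code quality" step of the local loop can be wired into a small runner script so the whole check is one command. A minimal sketch - pytest and ruff are my assumptions here; swap in whatever test runner and linter your project uses:

```python
import subprocess
import sys

# Commands that must all pass before you commit & push.
# pytest and ruff are assumed tools -- replace with your own.
CHECKS = [
    ["pytest", "-q"],
    ["ruff", "check", "."],
    ["ruff", "format", "--check", "."],
]


def run_checks(checks: list[list[str]]) -> bool:
    """Run each check in order; stop and report on the first failure."""
    for cmd in checks:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}")
            return False
    return True


if __name__ == "__main__":
    sys.exit(0 if run_checks(CHECKS) else 1)
```

Running the same script locally and in CI keeps the feedback loop identical in both places.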

MR pipeline:

┌─────────────────┐   ┌─────────────────────────────────────────┐
│ Build Docker    │──▶│ Tests & quality checks                  │
│ image           │   │                                         │
└─────────────────┘   │  ┌─────────────┐  ┌──────────────────┐  │
                      │  │ Unit tests  │  │ I/O tests        │  │
                      │  └─────────────┘  └──────────────────┘  │
                      │  ┌─────────────┐  ┌──────────────────┐  │
                      │  │ Code quality│  │ AI code review   │  │
                      │  └─────────────┘  └──────────────────┘  │
                      └───────────────────┬─────────────────────┘
                                          ▼
                      ┌──────────────────────────────────────────┐
                      │ (Database Migrations)                    │
                      └────────────────────┬─────────────────────┘
                                           ▼
                      ┌──────────────────────────────────────────┐
                      │ Deploy to development                    │
                      └────────────────────┬─────────────────────┘
                                           ▼
                      ┌──────────────────────────────────────────┐
                      │ End-to-end tests                         │
                      └──────────────────────────────────────────┘
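Splitting unit tests from I/O tests, as the MR pipeline does, is straightforward with pytest markers. A sketch - the `io` marker name and the `parse_price` example are my own, not a pytest built-in:

```python
import pytest


def parse_price(raw: str) -> float:
    """Pure function -- covered by a fast unit test."""
    return round(float(raw.strip().lstrip("$")), 2)


def test_parse_price():
    assert parse_price(" $19.990 ") == 19.99


@pytest.mark.io
def test_price_survives_roundtrip(tmp_path):
    """Touches the filesystem -- marked so CI can run it in a separate job."""
    path = tmp_path / "price.txt"
    path.write_text(str(parse_price("$5.50")))
    assert float(path.read_text()) == 5.5
```

Register the marker under `[tool.pytest.ini_options]` in `pyproject.toml`, then run `pytest -m "not io"` in the fast lane and `pytest -m io` in the slower one.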

Main pipeline:

┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│ Prepare release  │──▶│ Release          │──▶│ Deploy to        │
│                  │   │                  │   │ staging          │
└──────────────────┘   └──────────────────┘   └──────────────────┘

Release pipeline:

┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│ Build Docker │──▶│ (Database    │──▶│ Deploy to    │──▶│ Send Slack   │
│ image        │   │ Migrations)  │   │ production   │   │ message      │
└──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘

It should be as simple as that.

Environments

I'll just say it: "Three environments are more than enough." More environments always mean more infrastructure complexity and cost while providing only a false sense of security. If this sounds impossible at your company, start with one thing - smaller MRs. That one doesn't need anyone's permission.

Build a culture of tight collaboration

Developers and the product team should define expected behavior together. Developers should then ensure those expectations are met through automated tests, including acceptance tests. Where automation falls short, they should also perform some manual testing (e.g., OAuth flows).
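Behavior agreed with the product team can be pinned down directly as acceptance tests, one per expectation. A minimal sketch - the `apply_discount` rule and its numbers are invented for illustration:

```python
def apply_discount(total: float, is_returning_customer: bool) -> float:
    """Hypothetical business rule agreed with the product team:
    returning customers get 10% off orders above 100."""
    if is_returning_customer and total > 100:
        return round(total * 0.9, 2)
    return total


# Acceptance tests: each name states the agreed expectation in plain words.
def test_returning_customer_above_threshold_gets_discount():
    assert apply_discount(200.0, is_returning_customer=True) == 180.0


def test_new_customer_pays_full_price():
    assert apply_discount(200.0, is_returning_customer=False) == 200.0


def test_small_orders_are_never_discounted():
    assert apply_discount(50.0, is_returning_customer=True) == 50.0
```

When the product team changes the rule, the test names make it obvious which expectation moved.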

Use feature flags to control releases of features

Want to enable a new feature only for the product team? Put it behind a feature flag. You can merge to main, deploy to staging and production, and let the product team test it in production. This way, you also avoid situations like "Why is it broken on production? It worked on UAT when we signed it off."
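A feature flag can start out very simple - an environment variable is enough for a POC. A sketch; the `FEATURE_` prefix and the `checkout_v2` feature are made-up conventions, and real projects often graduate to a flag service later:

```python
import os


def is_enabled(flag_name: str) -> bool:
    """A flag is on when its env var holds a truthy value.
    The contract stays the same even if you later swap in a
    flag service: one boolean per feature."""
    value = os.environ.get(f"FEATURE_{flag_name.upper()}", "")
    return value.lower() in {"1", "true", "on"}


def render_checkout() -> str:
    # Ship the POC behind the flag; everyone else keeps the old path.
    if is_enabled("checkout_v2"):
        return "new checkout flow"
    return "old checkout flow"
```

Set `FEATURE_CHECKOUT_V2=1` only in the environment (or for the accounts) where the product team tests, and leave it unset everywhere else.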

Leverage automated testing as much as possible

Build a test suite you trust enough that green CI gives you the confidence to merge - in at least 95% of cases (OAuth flows again). This way, you eliminate most of the "No deploys to development, I need to test something" requests. They'll still happen from time to time, but you won't have 10 people in line all the time. And again - small MRs!

Monitoring

If you want to move fast, you need great monitoring in place. You need to catch issues before they happen, or at least as they happen - you shouldn't wait for an angry customer email to learn about a production problem. Monitor everything from exceptions in application code to database disk read IOPS and API latency. Set alerts and post messages to Slack when something is fishy. For example, your life is much easier if you're notified that the database disk is running low on space before it actually runs out - you can enlarge the disk and prevent an outage.

That said, it's important to find a balance between alerting too early and alerting only when things are already broken. If you receive too many messages, you get used to them and simply start ignoring them - the "boy who cried wolf" problem. On the other hand, a threshold set too high means you won't get an alert when you genuinely need one.

There's obviously no "one size fits all" answer on how to set these things up. That's why it's important to build, learn, and adapt. I suggest starting on the noisy side and gradually raising thresholds. This builds trust slowly, at the price of some false positives. The other way around quickly leads to questions like "Why do we have all this monitoring in place if things go down and we don't know about it?"

Conclusion

In this article, we've only scratched the surface; we'll go deeper in upcoming articles. For now, keep two things in mind - simplicity and small steps.

Happy engineering!

