
Building a Pulumi Provider for Lagoon with Claude Code

February 11, 2026
At Tag1, we believe in proving AI within our own work before recommending it to clients. This post is part of our AI Applied content series, where team members share real stories of how they're using Artificial Intelligence and the insights and lessons they learn along the way. Here, Greg Chaix (Senior Infrastructure Engineer) explores how AI helped transform a manual Lagoon management workflow into a Pulumi dynamic provider, complete with a GraphQL client, high test coverage, and multi-cluster examples.
How I used Claude Code to build a Pulumi provider for Lagoon
Managing cloud-native applications at scale requires automation. For teams using Lagoon, an open-source application delivery platform, this typically means manually creating projects, configuring environments, and setting variables through a web UI or CLI. What if you could manage all of this declaratively, just like your other infrastructure?
That's the problem I set out to solve by building pulumi-lagoon-provider—a Pulumi dynamic provider that lets you manage Lagoon resources as infrastructure-as-code. Instead of building it the traditional way, I used Claude Code (Anthropic's CLI tool) throughout the project.
The result: a functional provider with 5 resource types, a 915-line GraphQL client, 352 unit tests, and comprehensive documentation—built with AI assistance from the first commit to the last bug fix.
The provider is now available for teams managing Lagoon infrastructure.
What is Lagoon?
Lagoon is an open source application delivery platform that deploys containerized applications to Kubernetes. Originally developed by amazee.io, it's particularly popular in the Drupal and open-source CMS communities for its developer-friendly workflows:
- Git-driven deployments: Push to a branch, get an environment
- Environment parity: Production mirrors staging mirrors development
- Built-in CI/CD: Automatic builds and deployments
- Multi-cluster support: Route workloads across multiple Kubernetes clusters
While Lagoon has a web interface and command line interface for management, there was no way to manage Lagoon resources using infrastructure-as-code tools like Terraform or Pulumi. This meant teams managing dozens or hundreds of projects had to rely on manual processes or custom scripts—a gap that pulumi-lagoon-provider fills.
Project Goals
The goal was to create a Pulumi dynamic provider that supports core Lagoon resources:
- LagoonProject: Applications/sites with Git repository connections
- LagoonEnvironment: Branch or PR deployments within a project
- LagoonVariable: Environment or project-level configuration
- LagoonDeployTarget: Kubernetes clusters that host deployments
- LagoonDeployTargetConfig: Cluster-to-project associations
Beyond the provider itself, I wanted to make sure it was production-grade. That meant building a GraphQL client with proper error handling instead of relying on generic HTTP calls, reaching comprehensive test coverage of at least 90 percent, creating working examples that included multi-cluster deployments, and writing documentation clear enough to support real-world adoption.
Here's what using the provider looks like in practice:
```python
import pulumi
import pulumi_lagoon as lagoon

# Create a Lagoon project
project = lagoon.LagoonProject("my-drupal-site",
    lagoon.LagoonProjectArgs(
        name="example-drupal-site",
        git_url="git@example.com:example/drupal-site.git",
        deploytarget_id=1,
        production_environment="main",
        branches="^(main|develop|stage)$",
    )
)

# Create production environment
prod_env = lagoon.LagoonEnvironment("production",
    lagoon.LagoonEnvironmentArgs(
        name="main",
        project_id=project.id,
        deploy_type="branch",
        environment_type="production",
    )
)

# Add environment-specific configuration
prod_db = lagoon.LagoonVariable("prod-db-host",
    lagoon.LagoonVariableArgs(
        name="DATABASE_HOST",
        value="mysql-prod.example.com",
        project_id=project.id,
        environment_id=prod_env.id,
        scope="runtime",
    )
)

pulumi.export("project_id", project.id)
```
This declarative approach means you can version control your Lagoon configuration, review changes in pull requests, and apply consistent settings across environments.
The AI-Assisted Development Approach
For this project, I used Claude Code exclusively—no manual coding without AI assistance. This wasn't a constraint but rather an opportunity to explore how AI can transform the entire software development lifecycle.
Speed and Productivity Gains
The most immediate benefit was raw speed. Here's what the project delivered:
| Component | Lines of Code | Tests |
|---|---|---|
| GraphQL Client | 915 | 67 |
| Resource Providers | 1,469 | 156 |
| Validators | 470 | 61 |
| Import Utilities | 255 | 29 |
| Example Infrastructure | 365 | 39 |
| Total | 3,761 | 352 |
Beyond the provider, there are 4,309 lines of test code, 1,865 lines of automation shell scripts, and 5,176 lines of example infrastructure code across three deployment scenarios.
Claude Code's ability to generate boilerplate dramatically accelerated development. For example, once I had the pattern for LagoonProject, creating LagoonEnvironment and LagoonVariable was a matter of describing the resource properties and letting the AI generate the implementation. The consistency across resources isn't accidental—it's the natural result of AI following established patterns.
The terminal-integrated workflow eliminated context switching. I'd describe what I needed, review the generated code, make adjustments, and continue—all without leaving my development environment.
Code Quality and Testing
One concern with AI-generated code is quality. Would it be maintainable? Would it handle edge cases?
I found the testing results compelling. With 352 unit tests achieving approximately 95% coverage, the test suite is more comprehensive than those of many manually written projects. Claude Code generated tests that follow consistent patterns:
```python
from unittest.mock import patch


class TestLagoonClientInit:
    """Tests for LagoonClient initialization."""

    def test_client_initialization(self):
        """Test client initializes with correct credentials."""
        with patch("requests.Session"):
            client = LagoonClient(
                api_url="https://api.test.com/graphql",
                token="test-token"
            )
            assert client.api_url == "https://api.test.com/graphql"
            assert client.token == "test-token"
            assert client.verify_ssl is True

    def test_client_ssl_verification_disabled_by_env(self):
        """Test SSL verification can be disabled via environment variable."""
        with patch("requests.Session"):
            with patch.dict("os.environ", {"LAGOON_INSECURE": "true"}):
                client = LagoonClient(
                    api_url="https://api.test.com/graphql",
                    token="test-token"
                )
                assert client.verify_ssl is False
```
The tests cover initialization, success cases, error cases, and edge conditions. This systematic approach to testing is something AI excels at—once given the pattern, it can generate comprehensive test coverage without the fatigue that often leads developers to skip "obvious" test cases.
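For example, an error-case test following the same pattern might look roughly like this; the import paths and the `execute_query` method name are assumptions for illustration, not the provider's exact API:

```python
from unittest.mock import MagicMock, patch

import pytest

# Import paths and the execute_query method name are assumptions, not the provider's exact API.
from pulumi_lagoon.client import LagoonAPIError, LagoonClient


class TestLagoonClientErrors:
    """Sketch of an error-case test in the same style as the suite above."""

    def test_graphql_errors_raise_api_error(self):
        """A GraphQL 'errors' payload should surface as a LagoonAPIError."""
        with patch("requests.Session") as session_cls:
            response = MagicMock()
            response.status_code = 200
            response.json.return_value = {"errors": [{"message": "Project not found"}]}
            session_cls.return_value.post.return_value = response

            client = LagoonClient(api_url="https://api.test.com/graphql", token="test-token")
            with pytest.raises(LagoonAPIError):
                client.execute_query("query { allProjects { id } }")
```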
The CI/CD pipeline running on GitHub Actions tests against Python 3.8 through 3.12, catching compatibility issues early.
Learning and Problem-Solving
Building this provider required working with technologies I hadn't used extensively together:
- Pulumi Dynamic Providers: The pattern for implementing custom resources
- Lagoon's GraphQL API: With its own conventions and quirks
- Multi-cluster Kubernetes: For the production-like example deployment
Claude Code served as a knowledgeable pair programmer. Let me walk through two examples that showcase different aspects of AI-assisted development.
Finding a Critical Bug While Adding Validation
The initial implementation had minimal input validation. Users would hit cryptic GraphQL errors when they used invalid characters in a project name or typo'd an environment type. I asked Claude Code to add comprehensive validation across all resources.
The AI generated a 470-line validators module with 11 validation functions covering project names (Lagoon's 58-character limit, lowercase+hyphens only), git URLs (SSH and HTTPS formats), regex patterns, and enum values. But the real value came from what the AI found while integrating these validators into the existing code.
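To give a flavor of that module, here is a minimal sketch of a project-name validator under the rules described above (58-character limit, lowercase and hyphens only); the function and exception names are hypothetical, not the module's exact API:

```python
import re

# Hypothetical sketch; names and messages are illustrative, not the validators module's exact API.
PROJECT_NAME_MAX_LENGTH = 58
PROJECT_NAME_PATTERN = re.compile(r"^[a-z](?:[a-z-]*[a-z])?$")  # lowercase letters and hyphens


class ValidationError(ValueError):
    """Raised when a resource property fails validation."""


def validate_project_name(name: str) -> str:
    """Validate a Lagoon project name before it ever reaches the GraphQL API."""
    if not name:
        raise ValidationError("Project name must not be empty")
    if len(name) > PROJECT_NAME_MAX_LENGTH:
        raise ValidationError(
            f"Project name '{name}' exceeds {PROJECT_NAME_MAX_LENGTH} characters"
        )
    if not PROJECT_NAME_PATTERN.match(name):
        raise ValidationError(
            f"Project name '{name}' must contain only lowercase letters and hyphens"
        )
    return name
```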
In variable.py, the update method had this pattern:
```python
# BEFORE: Dangerous - swallows ALL errors including network failures
try:
    client.delete_env_variable(**delete_args)
except Exception:
    pass  # Silent failure!
```
Claude Code flagged this immediately: "This catches all exceptions including network errors. If the delete fails due to a connection issue, the code continues to create a new variable, potentially leaving the old one in place. You'd end up with duplicate variables and no indication anything went wrong."
The fix was straightforward but the bug was subtle:
```python
# AFTER: Only swallows expected API errors
try:
    client.delete_env_variable(**delete_args)
except LagoonAPIError:
    pass  # Variable might not exist - acceptable
except LagoonConnectionError:
    raise  # Network errors must propagate!
```
This is a real example of AI catching a bug that could have caused silent data inconsistency in production. The validation work added 1,393 lines of code and 105 new tests, but the bug fix was the most valuable outcome.
Designing Composite Import IDs
A more architectural challenge emerged when implementing pulumi import support. The standard pattern—pass a resource ID, get back the resource—doesn't work for Lagoon variables.
To fetch a variable from Lagoon's GraphQL API, you need the project ID, optionally an environment ID, and the variable name. But pulumi import only passes a single ID string. How do you encode all that context?
I described the problem to Claude Code, and we worked through the design together. The solution: composite import IDs with different formats per resource type:
```
# Import ID formats designed for each resource's lookup requirements
LagoonProject:            "42"                  # Simple numeric ID
LagoonEnvironment:        "42:main"             # project_id:env_name
LagoonVariable:           "42:7:DATABASE_HOST"  # project_id:env_id:var_name
LagoonVariable:           "42::API_KEY"         # project_id::var_name (project-level)
LagoonDeployTargetConfig: "42:3"                # project_id:config_id
```
The AI then generated an ImportIdParser class that detects whether a read() call is an import (props empty) or a refresh (props populated), and parses the composite ID accordingly:
```python
from typing import List


class ImportIdParser:
    @staticmethod
    def is_import_scenario(id: str, props: dict, required_props: List[str]) -> bool:
        """Detect if this is an import scenario vs a normal refresh."""
        if not props:
            return True
        return any(
            prop not in props or props.get(prop) is None
            for prop in required_props
        )
```
This feature required understanding both Pulumi's resource lifecycle and Lagoon's API query requirements—exactly the kind of cross-domain problem where having an AI that can reason about both systems accelerates development. The implementation added 257 lines of import utilities with 67 tests covering all the edge cases.
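To make the composite-ID handling concrete, here is a hedged sketch of how a variable import ID might be split back into its parts; the helper name and return shape are illustrative, not the exact import utilities:

```python
from typing import Optional, Tuple


def parse_variable_import_id(import_id: str) -> Tuple[int, Optional[int], str]:
    """Split 'project_id:env_id:var_name'; env_id may be empty for project-level variables.

    Illustrative helper only; the real import utilities may differ in naming and shape.
    """
    parts = import_id.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"Expected 'project_id:env_id:var_name', got '{import_id}'")
    project_id, env_id, var_name = parts
    return int(project_id), int(env_id) if env_id else None, var_name


# Usage:
#   parse_variable_import_id("42:7:DATABASE_HOST")  -> (42, 7, "DATABASE_HOST")
#   parse_variable_import_id("42::API_KEY")         -> (42, None, "API_KEY")
```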
End-to-End Workflow
AI assistance wasn't limited to writing resource providers. The entire development lifecycle benefited:
- Project Scaffolding: Package structure, setup.py, requirements.txt, and initial documentation were generated from descriptions of what the project needed to accomplish.
- Core Implementation: Each resource went through the same pattern—describe the properties, generate the provider, generate tests, refine based on actual API behavior.
- Testing: Beyond unit tests, the AI helped create mock fixtures, test configurations, and debugging utilities.
- Documentation: The README, API documentation, and even this project's CLAUDE.md file (which provides context to Claude Code itself) were AI-assisted.
- Maintenance: Recent bug fixes for RabbitMQ NodePort selectors and Keycloak configuration were all AI-assisted, following the established patterns.
- Multi-Session Continuity: Using a `memory-bank/` directory for session summaries allowed context preservation across coding sessions. Each session started by reading previous summaries, maintaining continuity without having to re-explain the project structure.
Architecture and Implementation Challenges
The provider follows Pulumi's dynamic provider pattern: each resource type implements create(), read(), update(), and delete() methods. The read() method enables drift detection—if someone modifies a resource outside of Pulumi, the next pulumi refresh will detect the change.
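As a rough sketch of that pattern (not the provider's actual code), a Python dynamic provider subclasses `pulumi.dynamic.ResourceProvider` and pairs it with a `pulumi.dynamic.Resource`; the `make_client()` factory and the client method names below are stand-ins:

```python
from pulumi.dynamic import CreateResult, ReadResult, Resource, ResourceProvider, UpdateResult


def make_client():
    # Stand-in for building an authenticated LagoonClient from provider configuration.
    raise NotImplementedError


class LagoonVariableProvider(ResourceProvider):
    def create(self, props):
        var = make_client().add_env_variable(**props)        # assumed client method
        return CreateResult(id_=str(var["id"]), outs={**props, **var})

    def read(self, id_, props):
        # Refresh current state from the API so `pulumi refresh` can detect drift.
        var = make_client().get_env_variable(id_)             # assumed client method
        return ReadResult(id_, {**props, **var})

    def update(self, id_, old_props, new_props):
        var = make_client().update_env_variable(id_, **new_props)
        return UpdateResult(outs={**new_props, **var})

    def delete(self, id_, props):
        make_client().delete_env_variable(id_)                # assumed client method


class LagoonVariable(Resource):
    def __init__(self, name, props, opts=None):
        super().__init__(LagoonVariableProvider(), name, {**props}, opts)
```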
A 915-line GraphQL client sits beneath the resource providers, handling all API communication with proper error handling. The client separates connection errors (retryable) from API errors (configuration issues), making error handling cleaner throughout the codebase.
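Here is a hedged sketch of how that separation might look inside the client's request path; the exception names match those mentioned elsewhere in this post, while the function body is illustrative:

```python
from typing import Optional

import requests


class LagoonClientError(Exception):
    """Base class for errors raised by the client."""


class LagoonConnectionError(LagoonClientError):
    """Transport-level failure (DNS, timeout, refused connection) - retryable."""


class LagoonAPIError(LagoonClientError):
    """The API answered, but the response signals a configuration problem."""


def execute_query(session: requests.Session, api_url: str, query: str,
                  variables: Optional[dict] = None) -> dict:
    """Illustrative request path that keeps connection and API errors distinct."""
    try:
        response = session.post(
            api_url,
            json={"query": query, "variables": variables or {}},
            timeout=30,
        )
        response.raise_for_status()
    except (requests.ConnectionError, requests.Timeout) as exc:
        raise LagoonConnectionError(f"Could not reach the Lagoon API: {exc}") from exc
    except requests.HTTPError as exc:
        raise LagoonAPIError(f"Lagoon API returned HTTP {exc.response.status_code}") from exc

    payload = response.json()
    if payload.get("errors"):
        raise LagoonAPIError(f"GraphQL errors: {payload['errors']}")
    return payload["data"]
```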
Key challenges encountered:
- Lagoon API Conventions: Lagoon uses `openshift` in its API to refer to Kubernetes deploy targets (a historical artifact). The provider translates this to the more intuitive `deploytarget_id` while maintaining API compatibility (see the sketch after this list).
- Two-Phase Deployment: Creating deploy targets requires the Lagoon API to be running, but the API is part of the infrastructure being deployed. The solution: a two-phase deployment pattern where infrastructure deploys first, then deploy targets are created via port-forwards to the now-running API. Claude Code helped design this chicken-and-egg solution.
- Multi-Cluster Communication: The multi-cluster example required configuring RabbitMQ NodePorts and CoreDNS for cross-cluster communication, with AI assistance helping debug service selector mismatches and Helm chart naming conventions.
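As an example of that first convention, the translation might look roughly like this; the mutation text and helper are illustrative, not the exact client code:

```python
# Illustrative translation layer; mutation text, helper, and execute_query are assumptions.
ADD_PROJECT_MUTATION = """
mutation AddProject($name: String!, $gitUrl: String!, $openshift: Int!) {
  addProject(input: {name: $name, gitUrl: $gitUrl, openshift: $openshift}) {
    id
    name
  }
}
"""


def add_project(client, name: str, git_url: str, deploytarget_id: int) -> dict:
    """Expose the intuitive `deploytarget_id` while sending the API's legacy `openshift` field."""
    return client.execute_query(
        ADD_PROJECT_MUTATION,
        variables={"name": name, "gitUrl": git_url, "openshift": deploytarget_id},
    )
```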
For the full architecture details and code, see the GitHub repository.
Testing Strategy
The testing approach combines several layers:
Unit Tests (352 tests): Mock all external dependencies, test individual functions in isolation. These run quickly and catch regressions early.
Example Validation: The examples/simple-project/ directory contains a working Pulumi program that exercises the provider against a real Lagoon instance.
Multi-Cluster Testing: The examples/multi-cluster/ deployment creates two Kind clusters with full Lagoon installation, testing the provider in a production-like environment.
CI/CD Pipeline: GitHub Actions runs the full test suite on every push, testing Python 3.8 through 3.12.
```bash
# Run the full test suite
pytest tests/unit/ -v
# Output: 352 passed in 5.23s
```
Results and Metrics
Going from the initial commit to having a working provider with tests took approximately 3 months of part-time development, totaling roughly 40–50 hours of active coding time.
Time Investment Breakdown (estimated):
- Initial scaffolding and GraphQL client: ~8 hours
- Resource providers (5 types): ~12 hours
- Unit tests (352 tests): ~10 hours
- Multi-cluster example infrastructure: ~12 hours
- Debugging and bug fixes: ~8 hours
Without AI assistance, I estimate this would have taken 2–3x longer, particularly for the test generation and boilerplate-heavy resource providers.
Code Metrics:
- 91 commits
- 3,396 lines of provider code
- 4,309 lines of test code
- 352 unit tests passing
- ~95% code coverage
Functionality Delivered:
- 5 resource types (Project, Environment, Variable, DeployTarget, DeployTargetConfig)
- Full CRUD operations with drift detection
- Import support for all resources
- 3 working examples (simple, single-cluster, multi-cluster)
- Comprehensive documentation
What This Project Taught Me
Where AI Really Shined
- AI Excels at Patterns: Once I established the pattern for one resource type, the AI could generate consistent implementations for additional resources. This isn't just about saving keystrokes—it's about maintaining consistency across a codebase.
- Test Generation Is Transformative: The most time-consuming part of writing robust software is often testing. AI-generated tests are thorough and follow consistent patterns, achieving coverage levels that would be tedious to maintain manually.
- Documentation Stays Current: When the AI writes both code and documentation in the same session, they naturally stay aligned. The README accurately reflects the current API because both were generated together.
- Debugging Accelerates: Describing a problem and getting targeted suggestions for investigation saved significant time. The AI's ability to reason about error messages and suggest likely causes was particularly valuable.
Where Human Judgment Still Matters
- Architecture Decisions: The choice to use a dynamic provider (Python) rather than a native provider (Go) was a human decision based on project scope and iteration speed requirements.
- API Design: What properties should each resource expose? How should naming conventions work? These required understanding both Lagoon's domain model and Pulumi's conventions.
- Edge Case Handling: While AI generates comprehensive tests, knowing which edge cases matter for production use required domain expertise.
- Integration Testing Strategy: Deciding that a multi-cluster Kind deployment was necessary for realistic testing was a judgment call that shaped the project's scope.
Practical Advice for Your Own Projects
- Maintain Context: Use a memory-bank pattern or similar approach to preserve context across sessions. AI works best when it understands the full project context.
- Iterate in Small Increments: Rather than asking for complete implementations, build up functionality piece by piece. This allows for course corrections and produces better results.
- Trust but Verify: AI-generated code should be reviewed, especially for security-sensitive operations. The code is usually good, but understanding what it does is essential.
- Use AI for Research: Beyond code generation, AI is excellent for exploring unfamiliar APIs, understanding error messages, and suggesting debugging approaches.
Where This Leaves Us
Building pulumi-lagoon-provider with Claude Code demonstrated that AI-assisted development can deliver software efficiently. The combination of rapid iteration, comprehensive testing, and maintained documentation produced a result that would have taken significantly longer with traditional development approaches.
For Tag1, this project validates our investment in AI-assisted development workflows. The techniques used here—context preservation across sessions, iterative refinement, AI-generated testing—are directly applicable to client projects where we need to deliver robust solutions quickly.
The provider is now available for teams managing Lagoon infrastructure. Future plans include building a native Go provider for multi-language SDK generation.
Resources
- GitHub Repository: tag1consulting/pulumi-lagoon-provider
- Lagoon Documentation: docs.lagoon.sh
- Pulumi Dynamic Providers: pulumi.com/docs/concepts/resources/dynamic-providers
- Claude Code: claude.com/claude-code
This post is part of Tag1’s AI Applied content series, where we share how we're using AI inside our own work before bringing it to clients. Our goal is to be transparent about what works, what doesn’t, and what we are still figuring out, so that together, we can build a more practical, responsible path for AI adoption.
Want to bring practical, proven AI adoption strategies to your organization? Let's start a conversation! We'd love to hear from you.