Building a Theremin with AI: A Developer's Journey

Francesco Quagliati - Software Engineer
June 3, 2026

At Tag1, we believe in proving AI within our own work before recommending it to clients. This post is part of our AI Applied content series, where team members share real stories of how they're using Artificial Intelligence and the insights and lessons they learn along the way. Here, Francesco Quagliati, Software Engineer, shares how AI collaboration helped him build something he could not have built alone, and how much he unexpectedly learned along the way.

The Result First

In my hands is a gesture-controlled digital theremin. Wave your hands over two laser sensors: one controls pitch, one controls volume. Three oscillators blend sine, square, triangle, and sawtooth waves. Delay, chorus, and reverb effects add space and shimmer to the sound. A web interface lets you tweak every parameter in real time from your phone. An OLED display shows a visual tuner converting frequency to musical notes.

It runs on an ESP32 microcontroller. The audio comes through a 16-bit I2S DAC. Not hi-fi, but surprisingly decent. CPU usage sits at 15%, leaving 85% headroom. The code is ~1,200 lines of C++ for the firmware, plus ~1,500 lines for the web interface.

Figure 1: Therem[AI]n Control Panel.

Here's the twist: I had almost no C++ experience when I started this in October 2025. Electronics knowledge? About the same — just some soldering experience from swapping guitar pickups and fixing my FPV drone.

This wasn't about building a theremin. It was about testing whether AI could help me build something real, something complex enough to break, sophisticated enough to matter, and tangible enough to hold in my hands and play. The ESP32 Theremin project is open source and can be accessed today. Find the code and documentation at https://github.com/FrancescoQ/ESP32_Theremin.

Why a Theremin?

I wanted to build something real with AI. Not a todo app. Not another CRUD interface. Something more interesting than "hello world". Complex enough to be a genuine test, even if it meant using a language I barely knew.

A gesture-controlled theremin felt right. It's not trivial: you can't copy-paste your way to working audio synthesis. But it's not impossible either: the scope is bounded, the requirements clear.

I didn't set out to learn embedded systems, audio DSP, or I2C protocols. They came with the territory, because the project naturally touched all of them:

Embedded systems: Real-time constraints, memory management, hardware timing
Audio synthesis: Oscillators, effects processing, buffer management
I2C protocols: Multiple devices on a shared bus with address conflicts to resolve
Web development: Frontend, WebSocket communication, real-time state sync
Hardware integration: Sensors, GPIO expanders, DACs, displays

None of these were requirements — they were interesting challenges that came from choosing an ambitious project.

Most importantly: I would know if it worked. Not from test coverage or code review, but from hearing the sound and feeling the response when I moved my hands.

The AI Collaboration Model

Two Phases, Two Tools

Phase 1: Claude for Planning

Before writing a single line of code, I spent time with Claude designing the architecture. We mapped out a three-layer system:

SensorManager → Theremin → AudioEngine
   (Input)      (Logic)     (Output)

This wasn't just boxes on a diagram. We discussed trade-offs: Why stack allocation over heap for effects? Why a coordinator pattern instead of tightly coupled classes? What happens when the audio task needs to read parameters while the main loop is writing them?
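The last of those questions is the classic shared-state problem. One common answer on a dual-core chip is to make each shared parameter an atomic value, so the audio side can never read a half-written number. The sketch below assumes a C++11 toolchain; the struct, the field names, and the once-per-buffer snapshot are illustrative and not necessarily what the project does.

```cpp
#include <atomic>
#include <cassert>

// Parameters written by the control loop (one core) and read by the audio
// task (the other core). Field names are illustrative.
struct SharedParams {
  std::atomic<float> frequencyHz{440.0f};
  std::atomic<float> volume{0.0f};
};

// Writer side: each store is indivisible, so the reader never sees a
// half-updated float. Relaxed ordering is enough because no other data
// is published together with these values.
void publish(SharedParams& params, float frequencyHz, float volume) {
  assert(volume >= 0.0f && volume <= 1.0f);
  params.frequencyHz.store(frequencyHz, std::memory_order_relaxed);
  params.volume.store(volume, std::memory_order_relaxed);
}

// Reader side: copy once per audio buffer, then render the whole buffer
// from the local copy without touching shared memory again.
struct ParamSnapshot {
  float frequencyHz;
  float volume;
};

ParamSnapshot snapshot(const SharedParams& params) {
  return {params.frequencyHz.load(std::memory_order_relaxed),
          params.volume.load(std::memory_order_relaxed)};
}
```

Relaxed ordering is only sufficient because each value stands alone. If several parameters must change together (a whole preset, say), a FreeRTOS mutex or a double-buffered struct is the safer choice.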

Claude helped me think through problems I didn't know existed yet. The foundation we designed held up throughout the entire project.

Phase 2: Cline for Implementation

With architecture in hand, I switched to Cline for the actual coding. This phase was different — faster, more iterative, more hands-on. Write code, upload to hardware, test, report results, adjust.

The pattern that emerged:

AI proposes approach
I review and ask questions
AI implements
I test on real hardware
I report what actually happened
We iterate

That feedback loop was critical. AI can't plug in wires. It can't hear the audio glitch at low volumes. It can't feel whether the latency is too slow for musical expression.

What Changed

AI didn't replace me as a developer. It changed what I did as a developer.

I became the validator, the tester, the one who said "this sounds wrong" or "this feels too slow." AI became the fast implementer, the pattern suggester, the patient explainer of concepts I didn't know.

The Journey

Act 1: Architecture First

AI's initial instinct was to jump straight into making sound — get something working, then iterate. But I already had effects and extensibility in mind. I pushed for a different approach: design a future-proof architecture from the start.

So we designed the modular architecture before writing any code. SensorManager handles input abstraction. AudioEngine handles output. Theremin coordinates between them.
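In code, the split might look like the sketch below. The three class names come from the project; every method, the injected sensor readings, and the distance-to-pitch mapping are hypothetical, chosen only so the example compiles and runs on a desktop. The point is the dependency direction: `SensorManager` and `AudioEngine` know nothing about each other, and only `Theremin` talks to both.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Input layer: hides where distances come from. Readings are injected here
// so the sketch runs without hardware.
class SensorManager {
 public:
  void setReadings(uint16_t pitchMm, uint16_t volumeMm) {
    pitchMm_ = pitchMm;
    volumeMm_ = volumeMm;
  }
  uint16_t pitchDistanceMm() const { return pitchMm_; }
  uint16_t volumeDistanceMm() const { return volumeMm_; }

 private:
  uint16_t pitchMm_ = 0;
  uint16_t volumeMm_ = 0;
};

// Output layer: knows frequencies and amplitudes, nothing about sensors.
class AudioEngine {
 public:
  void setFrequency(float hz) { frequencyHz_ = hz; }
  void setAmplitude(float amplitude) {
    assert(amplitude >= 0.0f && amplitude <= 1.0f);
    amplitude_ = amplitude;
  }
  float frequency() const { return frequencyHz_; }
  float amplitude() const { return amplitude_; }

 private:
  float frequencyHz_ = 0.0f;
  float amplitude_ = 0.0f;
};

// Logic layer: the coordinator is the only class that sees both sides.
class Theremin {
 public:
  Theremin(SensorManager& sensors, AudioEngine& audio) : sensors_(sensors), audio_(audio) {}

  // Called from the main loop: read, map, forward.
  void update() {
    audio_.setFrequency(toFrequency(sensors_.pitchDistanceMm()));
    audio_.setAmplitude(closeness(sensors_.volumeDistanceMm()));
  }

 private:
  static constexpr float kRangeMm = 400.0f;  // hypothetical playing range

  // 1.0 with the hand at the sensor, 0.0 at the edge of the range or beyond.
  static float closeness(uint16_t mm) {
    const float c = 1.0f - static_cast<float>(mm) / kRangeMm;
    return c < 0.0f ? 0.0f : c;
  }

  // Hypothetical mapping: two octaves above 220 Hz, exponential so that equal
  // hand movements give equal musical intervals.
  static float toFrequency(uint16_t mm) {
    return 220.0f * std::pow(2.0f, 2.0f * closeness(mm));
  }

  SensorManager& sensors_;
  AudioEngine& audio_;
};
```

Because `AudioEngine` never sees a sensor, swapping the input hardware or adding processing on the output side leaves the other layers untouched.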

This felt like extra work upfront. But I knew from experience that retrofitting architecture is painful. We discussed:

How would effects fit into the audio pipeline?
How would hardware controls integrate without coupling to specific button types?
How would we handle thread safety between the sensor loop and audio generation?

The result: when I later added effects, web controls, and display features, they slotted in cleanly. The architecture we designed in the beginning supported features we built months later.
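One way to get that kind of clean slot-in for effects is a small common interface plus a chain that owns its effects by value. The sketch assumes a hypothetical `AudioEffect` base class and uses a toy `Attenuator` in place of the real delay, chorus, and reverb. Holding effects as members, rather than allocating them with `new`, is one reading of the "stack allocation over heap" decision: memory use is fixed at compile time and nothing is allocated while audio runs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Common interface: every effect transforms a block of samples in place.
class AudioEffect {
 public:
  virtual ~AudioEffect() = default;
  virtual void process(int16_t* buffer, std::size_t frames) = 0;
  void setEnabled(bool enabled) { enabled_ = enabled; }
  bool enabled() const { return enabled_; }

 private:
  bool enabled_ = true;
};

// Toy stand-in for delay, chorus, or reverb: halves the signal.
class Attenuator : public AudioEffect {
 public:
  void process(int16_t* buffer, std::size_t frames) override {
    assert(buffer != nullptr);
    for (std::size_t i = 0; i < frames; ++i) buffer[i] = static_cast<int16_t>(buffer[i] / 2);
  }
};

// The chain owns its effects as plain members: no new/delete while audio is
// running, and the memory cost is fixed at compile time.
class EffectsChain {
 public:
  EffectsChain() : effects_{{&first_, &second_}} {}
  EffectsChain(const EffectsChain&) = delete;  // pointers refer to our own members
  EffectsChain& operator=(const EffectsChain&) = delete;

  void process(int16_t* buffer, std::size_t frames) {
    for (AudioEffect* effect : effects_)
      if (effect->enabled()) effect->process(buffer, frames);
  }

  Attenuator& first() { return first_; }
  Attenuator& second() { return second_; }

 private:
  Attenuator first_;
  Attenuator second_;
  std::array<AudioEffect*, 2> effects_;
};
```

Adding an effect then means adding one member and one entry in the pointer array; nothing else in the pipeline changes.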

Act 2: Making Sound

First sounds came through basic PWM audio: harsh, buzzy, but working. Then we moved to the ESP32's internal DAC, and that's when problems appeared: choppy audio. The sound would play for a moment, then gap, then play again.

AI helped identify the issue: sensor reading and audio generation were competing for the same core. Since the ESP32 has two cores, I asked whether we could dedicate the second one to audio. AI helped me implement it: a FreeRTOS task running the audio loop on Core 1, while sensor reading happens on Core 0.

That fix — continuous audio via a dedicated task — changed everything. The result: smooth, uninterrupted sound.
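The shape of that fix is a render function that keeps its phase between blocks, called forever from a task pinned to the second core. In this sketch the ESP32-only part is guarded by `ESP_PLATFORM` (defined by ESP-IDF-based builds) so the rendering logic also compiles on a desktop. The sample rate, block size, stack size, priority, and the commented-out `writeToI2S` helper are assumptions, not values from the project. `xTaskCreatePinnedToCore` is the ESP-IDF FreeRTOS call whose last argument selects the core.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#endif

constexpr float kSampleRate = 44100.0f;    // assumed
constexpr std::size_t kBlockFrames = 256;  // assumed

struct ToneState {
  float frequencyHz = 440.0f;
  float phase = 0.0f;  // radians, carried across blocks
};

// Fills one block. Because the phase survives between calls, consecutive
// blocks join without a discontinuity (an audible click).
void renderBlock(ToneState& state, int16_t* out, std::size_t frames) {
  constexpr float kTwoPi = 6.28318530718f;
  assert(out != nullptr);
  const float increment = kTwoPi * state.frequencyHz / kSampleRate;
  for (std::size_t i = 0; i < frames; ++i) {
    out[i] = static_cast<int16_t>(std::sin(state.phase) * 32767.0f);
    state.phase += increment;
    if (state.phase >= kTwoPi) state.phase -= kTwoPi;
  }
}

#ifdef ESP_PLATFORM
static ToneState gTone;  // a real build would feed the frequency in via atomics

// Runs forever on its own core, so sensor reads on the other core can no
// longer starve the audio output.
static void audioTask(void*) {
  int16_t block[kBlockFrames];
  for (;;) {
    renderBlock(gTone, block, kBlockFrames);
    // writeToI2S(block, kBlockFrames);  // hypothetical helper; without a
    // blocking write here, this loop would spin and starve the idle task.
  }
}

void startAudio() {
  // Arguments: entry, name, stack size (bytes in ESP-IDF), parameter,
  // priority, handle out, core ID (1 = second core).
  xTaskCreatePinnedToCore(audioTask, "audio", 4096, nullptr, 5, nullptr, 1);
}
#endif
```

Assuming a blocking I2S write, it is the write that paces the loop: the task sleeps while the DMA buffers are full, so the core is not busy-waiting.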

Then came the upgrade to a proper external DAC. The PCM5102 gave us 16-bit audio instead of 8-bit. The difference was dramatic, and the result was actually pleasant to listen to.
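Some back-of-the-envelope numbers explain why. An ideal N-bit converter playing a full-scale sine has a signal-to-quantization-noise ratio of about 6.02·N + 1.76 dB, so 16 bits buys roughly 48 dB over 8 bits. Quiet passages show it most. The sketch below assumes the internal DAC takes unsigned 8-bit samples centred on 128 and the external DAC takes signed 16-bit samples, and converts the same soft sample to both.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Ideal SQNR of an N-bit quantizer for a full-scale sine:
// 20 * log10(2^N * sqrt(3/2)), i.e. about 6.02 * N + 1.76 dB.
double idealSqnrDb(int bits) {
  assert(bits > 0);
  return 20.0 * std::log10(std::pow(2.0, bits) * std::sqrt(1.5));
}

float clampUnit(float x) { return x < -1.0f ? -1.0f : (x > 1.0f ? 1.0f : x); }

// Assumed 8-bit format: unsigned, 128 = silence, 127 steps each way.
uint8_t toDac8(float x) {
  return static_cast<uint8_t>(128 + std::lround(clampUnit(x) * 127.0f));
}

// Signed 16-bit PCM, 32767 steps each way.
int16_t toPcm16(float x) {
  return static_cast<int16_t>(std::lround(clampUnit(x) * 32767.0f));
}
```

A sample at 1% of full scale (-40 dBFS) lands just one step above silence in 8 bits (`toDac8(0.01f)` gives 129) but still has over 300 steps of resolution in 16 bits (`toPcm16(0.01f)` gives 328).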

Act 3: Effects and Controls

Effects were where AI's pattern suggestions really shone.

For the chorus effect, I needed an LFO (low-frequency oscillator) to modulate the delay time. AI started implementing a dedicated LFO class, but I had a different idea. Why not reuse the Oscillator class we already had for audio generation and run it at sub-audio frequencies?

This is where having the "wide view" of the project paid off. AI was focused on the immediate task; I was thinking about the whole codebase. After I proposed the new approach, AI helped refine the implementation — including using the existing lookup table for sine waves instead of expensive sin() calls.

Code reuse, performance benefit, architectural elegance. The collaboration worked both ways.
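The Oscillator-as-LFO idea can be sketched as one class that reads a precomputed sine table through a phase accumulator, used once at audio rate and once at 0.5 Hz to sweep a chorus delay. The table size, sample rate, interface, and the 10 to 20 ms chorus range are assumptions for illustration; the project's actual `Oscillator` class also handles the other waveforms and may differ in detail.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// One oscillator class for both audio and LFO duty (sine only, for brevity).
class Oscillator {
 public:
  static constexpr std::size_t kTableSize = 256;

  Oscillator(float sampleRate, float frequencyHz) : sampleRate_(sampleRate) {
    setFrequency(frequencyHz);
  }

  void setFrequency(float hz) {
    assert(hz >= 0.0f && hz < sampleRate_);  // keeps the step below one table length
    increment_ = hz * static_cast<float>(kTableSize) / sampleRate_;
  }

  // Next sample in [-1, 1]: a table read instead of a sin() call.
  float next() {
    const float sample = table()[static_cast<std::size_t>(phase_)];
    phase_ += increment_;
    if (phase_ >= static_cast<float>(kTableSize)) phase_ -= static_cast<float>(kTableSize);
    return sample;
  }

 private:
  // Filled once on first use and shared by every instance.
  static const std::array<float, kTableSize>& table() {
    static const std::array<float, kTableSize> t = [] {
      constexpr double kPi = 3.14159265358979323846;
      std::array<float, kTableSize> values{};
      for (std::size_t i = 0; i < kTableSize; ++i)
        values[i] = static_cast<float>(std::sin(2.0 * kPi * static_cast<double>(i) / kTableSize));
      return values;
    }();
    return t;
  }

  float sampleRate_;
  float increment_ = 0.0f;
  float phase_ = 0.0f;
};

// Chorus use: the same class at a sub-audio rate sweeps the delay time
// between 10 and 20 ms (assumed values), returned in samples.
float chorusDelaySamples(Oscillator& lfo, float sampleRate) {
  const float baseMs = 15.0f;
  const float depthMs = 5.0f;
  const float ms = baseMs + depthMs * lfo.next();
  return ms * sampleRate / 1000.0f;
}
```

For example, `Oscillator lfo(44100.0f, 0.5f)` fed to `chorusDelaySamples` produces a delay that drifts smoothly between about 441 and 882 samples, with no extra class and no `sin()` calls per sample.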

Reverb brought a subtler problem. At low volumes, the decay would turn grainy — a quiet buzzing as the sound faded. AI diagnosed it as quantization noise: tiny rounding errors in the 16-bit feedback loop accumulating over thousands of iterations.

The fix was elegant: three strategically placed noise gates that silence signals below a threshold at the input, feedback, and output stages. The reverb now decays to (almost) true silence.
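The effect is easy to reproduce with a toy 16-bit feedback loop. In the sketch below, the gain, starting level, and threshold are arbitrary assumptions, and a single loop stands in for a full reverb. Rounding makes small values stop shrinking: once the signal reaches 2, round(0.75 × 2) is 2 again, so the tail never dies. A gate in the feedback path, one of the three positions described above, forces it to zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Noise gate: anything quieter than the threshold becomes true silence.
int16_t gate(int16_t sample, int16_t threshold) {
  return std::abs(sample) < threshold ? static_cast<int16_t>(0) : sample;
}

// One pass through a toy 16-bit feedback loop: y = round(gain * y).
// std::lround rounds halves away from zero, so with gain 0.75 a value of 2
// maps back to 2 (1.5 rounds up) and the tail never reaches zero.
int16_t feedbackStep(int16_t y, double gain) {
  assert(gain > 0.0 && gain < 1.0);
  return static_cast<int16_t>(std::lround(gain * y));
}

// Feeds an impulse of 1000 through the loop `steps` times; a threshold of 0
// disables the gate in the feedback path.
int16_t tailAfter(int steps, int16_t threshold) {
  int16_t y = 1000;
  for (int i = 0; i < steps; ++i) {
    y = feedbackStep(y, 0.75);
    if (threshold > 0) y = gate(y, threshold);
  }
  return y;
}
```

With the gate disabled, `tailAfter(1000, 0)` settles at 2 and stays there; with a threshold of 4 the same loop reaches 0 shortly after the signal drops below the threshold.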

Hardware controls came together when we added an MCP23017 GPIO expander, which reads 15 physical switches for waveform and octave selection.
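Once the expander's two 8-bit port registers (GPIOA and GPIOB) have been read over I2C, turning them into settings is plain bit manipulation. The sketch assumes switches that pull their pin to ground with the MCP23017's internal pull-ups enabled, a hypothetical layout of four waveform switches per oscillator starting at bit 0, and a simple count-based debounce; the I2C read itself is left out.

```cpp
#include <cassert>
#include <cstdint>

// GPIOA becomes the low byte, GPIOB the high byte.
uint16_t combinePorts(uint8_t gpioA, uint8_t gpioB) {
  return static_cast<uint16_t>((gpioB << 8) | gpioA);
}

// With pull-ups, a closed switch reads 0; invert so 1 means "on".
uint16_t activeSwitches(uint16_t raw) { return static_cast<uint16_t>(~raw); }

enum class Waveform { Sine, Square, Triangle, Sawtooth, None };

// Hypothetical layout: oscillator `osc` (0..2) owns bits 4*osc .. 4*osc+3.
Waveform waveformFor(uint16_t active, int osc) {
  assert(osc >= 0 && osc < 3);
  const unsigned bits = (active >> (4 * osc)) & 0xFu;
  for (int i = 0; i < 4; ++i)
    if (bits & (1u << i)) return static_cast<Waveform>(i);
  return Waveform::None;
}

// Accepts a new switch state only after it has been read identically
// `requiredReads` times in a row, filtering out contact bounce.
class Debouncer {
 public:
  explicit Debouncer(int requiredReads) : required_(requiredReads) {}

  uint16_t update(uint16_t reading) {
    if (reading == candidate_) {
      if (count_ < required_) ++count_;
    } else {
      candidate_ = reading;
      count_ = 1;
    }
    if (count_ >= required_) stable_ = candidate_;
    return stable_;
  }

 private:
  int required_;
  int count_ = 0;
  uint16_t candidate_ = 0;
  uint16_t stable_ = 0;
};
```

Keeping the decoding separate from the I2C read is in the spirit of the earlier architecture question: the rest of the code sees waveform choices, not pins.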

Act 4: Web Interface

The final phase: a web UI for controlling everything from a browser.

AI's instinct was vanilla JavaScript: simple, no build step, easy to embed. But I wanted to use this as an opportunity to get experience with modern tooling. I proposed Preact with Vite for the build system and Tailwind for styling.

It seemed like a lot of tooling for an embedded project, but the result justified it: a ~30KB bundle (10KB gzipped) serving a clean, responsive interface with real-time WebSocket updates.

The theremin is now controllable from my phone while I play it. Every oscillator parameter, every effect setting, a visual tuner showing the note I'm playing: all updating in real time.
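Converting a frequency to a note name is a short formula in twelve-tone equal temperament: n = 69 + 12·log2(f / 440), where n is the MIDI note number (A4 = 69) and the fractional part gives the deviation in cents. The sketch assumes A4 = 440 Hz and sharp-only note names; it is not taken from the project's tuner code.

```cpp
#include <cassert>
#include <cmath>
#include <string>

struct NoteReading {
  std::string name;  // note name plus octave, e.g. "A4"
  int midi;          // MIDI note number, A4 = 69, C4 = 60
  float cents;       // offset from that note, roughly -50..+50
};

NoteReading frequencyToNote(float hz) {
  static const char* const kNames[12] = {"C",  "C#", "D",  "D#", "E",  "F",
                                         "F#", "G",  "G#", "A",  "A#", "B"};
  assert(hz > 0.0f);
  const float n = 69.0f + 12.0f * std::log2(hz / 440.0f);
  const int midi = static_cast<int>(std::lround(n));
  const float cents = (n - static_cast<float>(midi)) * 100.0f;
  const int pitchClass = ((midi % 12) + 12) % 12;  // safe for negative numbers
  const int octave = (midi - pitchClass) / 12 - 1;  // MIDI 60 is C4
  return {std::string(kNames[pitchClass]) + std::to_string(octave), midi, cents};
}
```

For example, `frequencyToNote(452.0f)` reports A4 about 47 cents sharp.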

Figure 2: The full interface served at theremin.local on mobile.

What I Learned

Technical

The project became an unexpected education:

Embedded C++ patterns: RAII for resource management, stack vs heap allocation, when to use unique_ptr vs raw pointers
FreeRTOS: Tasks, cores, mutexes, semaphores — real multithreading concepts
I2C multi-device management: Address conflicts, XSHUT initialization sequences, bus arbitration
Real-time audio: Sample rates, buffer sizes, latency budgets, the math of oscillators and effects
DSP fundamentals: Delay lines, comb filters, allpass filters, feedback loops, quantization noise
Modern frontend: Component architecture, WebSocket state management, build tooling

I didn't learn these by reading tutorials. I learned them by building something that needed them.
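The XSHUT initialization sequence in the list above solves a specific problem: identical laser ranging sensors all power up at the same I2C address, so two of them cannot share a bus until each has been moved. The sketch below assumes ST VL53L0X-style sensors (default 7-bit address 0x29, new address lost on reset), hypothetical `setXshut`/`changeAddress` hooks, and hypothetical target addresses 0x30 and 0x31; a small simulated bus stands in for the hardware so the sequence can be checked on a desktop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kDefaultAddress = 0x29;  // every sensor's address after reset

// Hypothetical hardware hooks. On the device, setXshut drives a GPIO and
// changeAddress writes the address register of whichever sensor answers at `from`.
class SensorBus {
 public:
  virtual ~SensorBus() = default;
  virtual void setXshut(int sensor, bool enabled) = 0;  // false holds it in reset
  virtual void changeAddress(uint8_t from, uint8_t to) = 0;
};

// Hold every sensor in reset, then wake them one at a time and move each to
// its own address before the next one comes up at the default address.
void assignAddresses(SensorBus& bus, const std::vector<uint8_t>& addresses) {
  const int count = static_cast<int>(addresses.size());
  for (int i = 0; i < count; ++i) bus.setXshut(i, false);
  for (int i = 0; i < count; ++i) {
    bus.setXshut(i, true);  // a real driver would wait for the sensor to boot here
    bus.changeAddress(kDefaultAddress, addresses[static_cast<std::size_t>(i)]);
  }
}

// Desktop stand-in that flags an address change unless exactly one device
// answers, which is what goes wrong without the XSHUT sequence.
class SimulatedBus : public SensorBus {
 public:
  explicit SimulatedBus(int sensors)
      : enabled_(static_cast<std::size_t>(sensors), true),
        address_(static_cast<std::size_t>(sensors), kDefaultAddress) {}

  void setXshut(int sensor, bool enabled) override {
    const auto s = static_cast<std::size_t>(sensor);
    enabled_[s] = enabled;
    if (!enabled) address_[s] = kDefaultAddress;  // reset forgets the new address
  }

  void changeAddress(uint8_t from, uint8_t to) override {
    int responders = 0;
    for (std::size_t i = 0; i < address_.size(); ++i)
      if (enabled_[i] && address_[i] == from) ++responders;
    if (responders != 1) {
      collision_ = true;
      return;
    }
    for (std::size_t i = 0; i < address_.size(); ++i)
      if (enabled_[i] && address_[i] == from) address_[i] = to;
  }

  bool collision() const { return collision_; }
  uint8_t address(int sensor) const { return address_[static_cast<std::size_t>(sensor)]; }

 private:
  std::vector<bool> enabled_;
  std::vector<uint8_t> address_;
  bool collision_ = false;
};
```

Because the new address does not survive a reset in this model, the sequence has to run on every boot, before any other code talks to the sensors.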

About AI Collaboration

What AI Does Well

Architecture and planning: Thinking through systems, identifying edge cases, designing for extensibility. Claude was genuinely helpful here — not just generating diagrams, but reasoning about trade-offs. That said, I often had the better "wide view" of the project and proposed architectural decisions that were more future-proof (like reusing the Oscillator class for LFOs). The human-AI dynamic worked best when I steered strategy and AI helped with implementation details.

Code generation: AI is fast at producing working code. Consistency was more mixed: as the project grew, I sometimes had to remind AI of the style conventions we'd established. Context gets lost across sessions. But with guidance, the codebase ended up feeling coherent rather than patchwork.

Problem-solving for subtle issues: AI excelled at diagnosing tricky bugs like the reverb quantization noise. For architectural solutions, it was more collaborative: I proposed using the second core for audio and reusing Oscillators as LFOs, then AI helped me implement what I couldn't have coded alone.

Full-stack work: C++, JavaScript, CSS, build configurations — AI moved fluidly between all of them.

Documentation and context: The memory bank system (context persistence files) let each session build on the last instead of starting from zero.

What Still Needs Humans

Hardware testing: AI can't plug in wires, measure voltages, or hear audio quality. Every hardware interaction required my hands. Case in point: there's still an unsolved I2C sensitivity issue — the sensor cables are probably too long, and touching certain metal parts while running causes a freeze. AI helped me investigate (suggesting a stabilizing capacitor, analyzing possible causes), but the real fix means rebuilding hardware with shorter connections. Some problems need a soldering iron, not a chat window.

Real-world debugging: Some bugs only appear on physical devices. Simulation gets you far, but not all the way.

Strategic decisions: When to stop optimizing. Which features matter. Trade-off calls that depend on project goals.

Quality judgment: Does it feel right? Is the latency acceptable? Does the sound please the ear? These are human assessments.

Understanding and critical review: The biggest lesson. I could have copy-pasted AI suggestions blindly. I didn't. I asked why. I traced through the logic. I spotted errors and improvements in what AI proposed and discussed how to fix them. I learned C++ because I engaged with the code, not despite having AI write it.

The Workflow That Worked

AI proposes architecture/approach: "Here's how I'd structure this..."
Human reviews, questions, and counter-proposes: "Why stack allocation here? What if we reused this class instead?"
AI implements: Code generated following the patterns we discussed
Human tests on hardware: Upload, run, observe actual behavior
Report results back to AI: "Audio glitches when changing waveforms"
Iterate: Discuss, adjust, re-implement, re-test

Critical tool: Memory bank context persistence files that carry project knowledge between sessions. Without them, every AI conversation starts from zero. With them, we build on accumulated understanding.

The Honest Truth

Final stats:

15% CPU usage (85% headroom remaining)
~1,200 lines C++
~1,500 lines web UI
3 oscillators, 4 waveforms, 3 effects
Real-time web control
16-bit audio output

Would this have taken longer without AI? Almost certainly.

Could I have built it at all with my C++ level? Probably not.

But here's what matters: AI didn't build it for me.

I had to understand every decision. I had to test every feature. I had to debug real hardware. I had to make judgment calls about what sounded right and felt responsive.

I learned C++ by building something real, with AI as an incredibly patient teacher and fast implementer.

This was a true collaborative process: AI tools accelerated development and provided technical assistance, but all code was reviewed, tested, and refined through active human involvement and hardware validation.

What This Means

AI-assisted development isn't "cheating." It's a new way to learn and build.

The barrier to entry for complex projects is lower than ever. Not because AI does everything — but because AI handles the tedious parts, explains the confusing parts, and lets you focus on the parts that matter.

If you're curious about embedded systems, audio programming, or any complex technical domain, you don't need to master everything first. Start with something you want to build. Let AI help you get there. Stay engaged. Ask questions. Test obsessively.

You'll end up learning more than you expected — and holding something real in your hands.

The ESP32 Theremin project is open source. Find the code and documentation at https://github.com/FrancescoQ/ESP32_Theremin.

This post is part of Tag1’s AI Applied content series, where we share how we're using AI inside our own work before bringing it to clients. Our goal is to be transparent about what works, what doesn’t, and what we are still figuring out, so that together, we can build a more practical, responsible path for AI adoption.

Want to bring practical, proven AI adoption strategies to your organization? Let's start a conversation! We'd love to hear from you.
