I built gowebper, a pure Go WebP (VP8L) encoder. Zero dependencies, no CGo, no golang.org/x/image. Just Go and a spec. Claude Code wrote most of the implementation while I directed the architecture and debugged the gnarliest format bugs alongside it. Here's how it went, what broke, and what I learned.
Why?
Go's standard library has no WebP support at all, and golang.org/x/image/webp is decode only. If you want to create WebP files in Go, you're stuck reaching for CGo bindings to libwebp. I wanted something that could run anywhere Go runs: cross compiled, statically linked, no shared libraries.
VP8L (WebP lossless) seemed tractable. The spec is public, the format is well documented, and the bitstream is relatively straightforward compared to VP8 (the lossy variant). A Huffman coder, an LZ77 compressor, four image transforms, and a bit writer. How hard could it be?
The Architecture
The final codebase is about 3,000 lines of Go across six internal packages:
```
encode.go            # main encoder, RIFF wrapper, transform orchestration
quantize.go          # near-lossless pre-quantization
internal/bitwriter/  # LSB-first bit packing (VP8L is LSB first, unlike most formats)
internal/colormodel/ # image.Image to packed ARGB uint32 with fast-path type switches
internal/huffman/    # length-limited canonical Huffman codes + VP8L tree serialization
internal/lz77/       # hash-chain LZ77 with VP8L's 2D spatial distance table
internal/transform/  # SubtractGreen, Predictor (14 modes), CrossColor (least squares)
internal/vp8ldec/    # reference decoder for round-trip testing
```
Claude Code generated the initial implementation of each package from my descriptions of the VP8L spec. I'd describe a component, something like "implement canonical Huffman coding with a 15 bit length limit using a min heap", and it would produce working code on the first try. The boilerplate heavy parts (the 120 entry spatial distance table, the 14 predictor modes, the RLE code length encoding) were where it saved the most time.
The encoder supports 10 compression levels (0 through 9), progressively enabling more transforms and larger LZ77 windows. Level 0 is raw Huffman coded pixels. Level 9 uses all four transforms with a million pixel search window and color cache.
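One detail from the layout above worth making concrete: VP8L packs bits LSB first, so the first bit written lands in the lowest bit of the first output byte. Here's a minimal sketch of that packing, illustrative only and not the actual internal/bitwriter code:

```go
// A minimal LSB-first bit writer: the first bit written becomes the lowest
// bit of the first output byte. Illustrative only, not internal/bitwriter.
type bitWriter struct {
	buf  []byte
	acc  uint64 // pending bits, least significant first
	nbit uint   // number of valid bits in acc
}

// WriteBits appends the low n bits of v to the stream, LSB first.
func (w *bitWriter) WriteBits(v uint64, n uint) {
	w.acc |= (v & (1<<n - 1)) << w.nbit
	w.nbit += n
	for w.nbit >= 8 {
		w.buf = append(w.buf, byte(w.acc))
		w.acc >>= 8
		w.nbit -= 8
	}
}

// Bytes flushes any partial byte (zero-padded at the top) and returns the
// packed stream.
func (w *bitWriter) Bytes() []byte {
	out := w.buf
	if w.nbit > 0 {
		out = append(out, byte(w.acc))
	}
	return out
}
```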
Where Things Got Interesting: The Bug Hunt
The initial implementation passed all internal round trip tests immediately. Encode an image, decode it with our own decoder, compare pixels. Perfect. I was feeling good.
Then I ran dwebp on the output:
```
Decoding of output.webp failed.
Status: 3(BITSTREAM_ERROR)
```
Every single file. Every level, every image. Our internal decoder was happy, but libwebp rejected everything. This started a multi day debugging session that taught me more about VP8L than I ever wanted to know.
Bug 1: The Missing Bit
The first bug was a single missing bit.
VP8L's bitstream has a flag called use_meta, a 1 bit field that signals whether the image uses meta Huffman coding (multiple Huffman tree groups for different image regions). We don't use meta Huffman, so the correct value is 0. But we weren't writing it at all.
This bit sits between the color cache flag and the Huffman tree data. Without it, dwebp would read our first Huffman tree bit as the meta Huffman flag, interpret it as "yes, use meta Huffman", and then try to parse a meta Huffman map from what was actually tree data. Instant corruption.
The subtle part: this flag only exists at the top level image (what libwebp calls is_level0). Transform sub images (predictor tile data, palette data) don't have it. So you can't just "always write it". You need to know your context.
I found it by fetching the libwebp source code and reading ReadHuffmanCodes(). One line: VP8LReadBits(br, 1), right there, between color cache and tree reading, guarded by if (is_level0). Claude Code had faithfully implemented the VP8L spec but missed this detail because the spec buries it in prose that's easy to skim past.
The fix: one line of code. bw.WriteBits(0, 1). In two places.
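For concreteness, here's roughly what the fixed ordering looks like for the header bits that precede the Huffman trees. It's a sketch based on the fields described above, assuming a WriteBits method like the bit writer sketch earlier; it's not the exact gowebper code:

```go
// writeEntropyHeader writes the bits that sit between the transforms and the
// Huffman tree data in a VP8L image stream: the color cache info, then, only
// for the top-level image, the meta-Huffman flag that Bug 1 was about.
func writeEntropyHeader(bw *bitWriter, cacheBits int, isLevel0 bool) {
	if cacheBits > 0 {
		bw.WriteBits(1, 1)                 // color cache is used
		bw.WriteBits(uint64(cacheBits), 4) // log2 of the cache size
	} else {
		bw.WriteBits(0, 1) // no color cache
	}
	if isLevel0 {
		// The missing bit: "no meta Huffman coding". It must be written for
		// the top-level image and must NOT be written for transform sub-images.
		bw.WriteBits(0, 1)
	}
	// ...Huffman tree code lengths follow...
}
```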
After this fix, small images (1x1, 2x2, up to about 200x200) decoded perfectly with dwebp. But test.png (387x429, a real photo) still failed.
Bug 2: The Undercomplete Huffman Tree
VP8L limits Huffman code lengths to 15 bits. When the naive Huffman algorithm produces codes longer than 15 bits (which happens with skewed frequency distributions), you need to "limit" the tree: shorten the deepest codes and redistribute the code space.
This redistribution must maintain Kraft equality: the sum of 2^(-length) across all codes must equal exactly 1. If it's less than 1, the code is "undercomplete" (there are unused bit patterns) and libwebp rejects it.
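In practice it's easier to check the sum scaled by 2^15, so a complete 15-bit-limited code sums to exactly 32,768. A sketch of the kind of validator that's useful here, not gowebper's actual huffman package:

```go
// kraftSum computes the Kraft sum scaled by 1<<maxLen, so a complete prefix
// code with a 15-bit limit sums to exactly 32,768. Less means undercomplete
// (unused bit patterns), more means overcomplete (ambiguous).
func kraftSum(lengths []int, maxLen int) int {
	sum := 0
	for _, l := range lengths {
		if l > 0 { // length 0 marks an unused symbol
			sum += 1 << (maxLen - l)
		}
	}
	return sum
}
```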
Our limitLengths function had a greedy loop that would lengthen codes (increase depth) to reduce the Kraft sum after clamping. But each lengthening step reduces the sum by a power of 2, and the loop processed all eligible codes in a single pass. For a tree with 225 active symbols, this could overshoot. The Kraft sum would end up at 32,767 instead of 32,768 (undercomplete by 1).
Binary search revealed the exact threshold: rows=280 of test.png worked, rows=281 failed. At row 281, the green channel tree had enough symbols that limitLengths would overshoot.
The fix: replace the entry-based greedy algorithm with a bit-count-based approach. Process depths from shallowest to deepest (largest reduction first), using integer division to guarantee exact Kraft equality. It's essentially a binary decomposition of the excess, like making change with coins worth powers of 2.
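Here's a sketch of that redistribution, operating on a histogram of code lengths after clamping. The helper name and exact shape are illustrative, not the real limitLengths:

```go
// redistributeExcess lengthens codes, shallowest depths first, until the
// scaled Kraft sum equals exactly 1<<maxLen. counts[l] is the number of
// symbols assigned code length l after clamping to maxLen.
func redistributeExcess(counts []int, maxLen int) {
	kraft := 0
	for l := 1; l <= maxLen; l++ {
		kraft += counts[l] << (maxLen - l)
	}
	excess := kraft - (1 << maxLen) // clamping only shortens codes, so excess >= 0
	for l := 1; l < maxLen && excess > 0; l++ {
		// Moving one code from depth l to l+1 removes 1<<(maxLen-l-1) from the sum.
		step := 1 << (maxLen - l - 1)
		n := excess / step // integer division: never remove more than the excess
		if n > counts[l] {
			n = counts[l]
		}
		counts[l] -= n
		counts[l+1] += n
		excess -= n * step
	}
	// excess is now 0, provided enough codes existed to absorb it.
}
```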
Bug 3: The Spatial Distance Table
With the Kraft fix, Level 0 (no LZ77) worked perfectly at all image sizes. But Levels 1 through 6 (with LZ77 backward references) still failed.
VP8L uses a 2D spatial distance table for encoding backward reference distances. Instead of storing "120 pixels back", you store a plane code that the decoder maps to a (dx, dy) offset. The first 120 plane codes have predefined spatial offsets:
- plane code 1 = (dx=0, dy=1), the pixel directly above
- plane code 2 = (dx=1, dy=0), the pixel to the left
- plane code 3 = (dx=1, dy=1), the pixel above and to the left
- plane code 4 = (dx=-1, dy=1), the pixel above and to the right
- ...
Our encoder assumed that pixel distances 1 through 4 mapped directly to plane codes 1 through 4. They don't. Plane code 1 is the pixel above (distance = image width), not the pixel to the left (distance = 1). Plane code 2 is the pixel to the left.
This meant every single backward reference with a small distance was pointing to the wrong pixel. Our internal decoder had the same bug in reverse, so round trip tests passed. Both sides were wrong in the same way. But dwebp uses the correct mapping.
The fix: Include all 120 spatial table entries in the reverse lookup (we were only using entries 5 through 120), and remove the special case for small distances. Also fix the internal decoder to match.
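To make the mapping concrete, here's a sketch of a reverse lookup over the full table. The 120-entry offsets table itself is elided, the function names are illustrative, and this is not the exact gowebper code:

```go
// buildPlaneCodeLookup maps backward pixel distances to plane codes for a
// given image width, using all 120 spatial offsets. offsets[i] is the
// (dx, dy) pair for plane code i+1, with dx counted toward the left, so the
// linear distance is dx + dy*width.
func buildPlaneCodeLookup(width int, offsets [120][2]int) map[int]int {
	lut := make(map[int]int, len(offsets))
	for i, off := range offsets {
		dist := off[0] + off[1]*width
		if dist < 1 {
			dist = 1 // offsets that would point forward clamp to the previous pixel
		}
		if _, seen := lut[dist]; !seen { // on tiny widths two offsets can collide
			lut[dist] = i + 1
		}
	}
	return lut
}

// distanceSymbol is what the encoder writes for a backward reference: a plane
// code for near neighbors, otherwise the raw distance offset by 120.
func distanceSymbol(dist int, lut map[int]int) int {
	if code, ok := lut[dist]; ok {
		return code
	}
	return dist + 120
}
```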
Working with Claude Code
The pattern that worked best: I'd describe the architecture and constraints, Claude Code would generate the implementation, and then I'd test against external tools (dwebp, cwebp, Pillow). When something failed, I'd direct the investigation. "Binary search for the failing image size." "Compare our bitstream against cwebp's output byte by byte." "Fetch the libwebp source and find where it reads the meta Huffman flag." Claude Code would execute.
Claude Code was excellent at:
- Generating boilerplate heavy code. The 120 entry distance table, 14 predictor modes, RLE code length encoding.
- Implementing well specified algorithms. Huffman tree construction, canonical code assignment, LZ77 hash chains.
- Writing targeted debugging tools. Bit level comparators, Kraft sum validators, binary search test harnesses.
- Iterating quickly. Modifying code, running tests, analyzing output in tight loops.
The challenges were all in format compatibility: the gaps between the spec as written and the spec as implemented by libwebp. These required reading the actual libwebp C source code and understanding the decoder's expectations, not just the encoder's intent.
The Numbers
For test.png (387x429 RGBA):
| Level | Quality | Size | Notes |
|---|---|---|---|
| 0 | 0 (lossless) | 277 KB | Baseline |
| 6 | 0 (lossless) | 104 KB | Competitive |
| 9 | 0 (lossless) | 121 KB | Diminishing returns |
| 6 | 50 | 52 KB | Near lossless sweet spot |
| 6 | 25 | 41 KB | Visible loss, small file |
All output validates with dwebp 1.6.0.
Takeaways
- Internal round-trip tests are necessary but not sufficient. If both your encoder and decoder share a bug, your tests will pass and your output will be wrong. Always test against an external reference decoder (a sketch of such a check follows this list).
- Binary search is the most underrated debugging technique. "Works at 280 rows, fails at 281" is infinitely more useful than "fails on large images". Same for isolating which feature flag causes the failure.
- A single missing bit can corrupt everything downstream. Bit packed formats have no error recovery. One missing bit shifts every subsequent field by one position, and the decoder interprets garbage until it gives up.
- LLMs are great at implementing specs, less great at catching spec ambiguities. The VP8L spec says there's a meta Huffman flag but doesn't emphasize that it only appears at level 0. Claude Code implemented what the spec says; the bug was in what the spec implies.
- Pure Go implementations are worth the effort. No CGo means easy cross compilation, no shared library headaches, simpler deployment, and full control over the code. The performance is reasonable: encoding a 387x429 image at level 6 takes about 50ms on an M series Mac.
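Here's the kind of external check the first takeaway refers to: a test helper that shells out to dwebp and fails if libwebp rejects the bitstream. The helper name is illustrative, and producing webpData with the encoder is left to the caller:

```go
package gowebper_test

import (
	"os"
	"os/exec"
	"path/filepath"
	"testing"
)

// validateWithDwebp writes an encoded WebP payload to disk and asks libwebp's
// dwebp CLI to decode it, failing the test on a bitstream error.
func validateWithDwebp(t *testing.T, webpData []byte) {
	t.Helper()
	if _, err := exec.LookPath("dwebp"); err != nil {
		t.Skip("dwebp not installed; skipping external validation")
	}
	dir := t.TempDir()
	in := filepath.Join(dir, "out.webp")
	if err := os.WriteFile(in, webpData, 0o644); err != nil {
		t.Fatal(err)
	}
	out := filepath.Join(dir, "out.png")
	if msg, err := exec.Command("dwebp", in, "-o", out).CombinedOutput(); err != nil {
		t.Fatalf("dwebp rejected the bitstream: %v\n%s", err, msg)
	}
}
```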
The code is at github.com/eringen/gowebper. go get github.com/eringen/gowebper@v0.1.1 and you're encoding WebP files in pure Go.