Go · OpenAPI · Performance · Software Architecture · Developer Experience · Speakeasy · Linting
Building a Fast OpenAPI Linter: Design Decisions That Matter

Tristan Cartledge · 10 February 2026 · 10 min read

The Story Behind the Speed

I recently wrote about why we built our own OpenAPI linter at Speakeasy. At the time of writing, the headline benchmark was 1.7 seconds to lint an 81,000-line spec, compared to 30+ seconds for existing tools. That post covered the benchmarks and the competitive landscape.

This post is about something different: the engineering decisions that made that speed possible. Not clever tricks or late-stage optimizations, but the foundational choices that compounded into a fast system. Looking back, the most interesting thing is that I didn't set out to build a fast linter. I set out to build a well-structured one. Speed was the side effect.

It Starts with the Parser

The linter's performance story starts well before the first lint rule runs. It starts with the OpenAPI parser library that powers everything underneath. I wrote previously about building this library if you want the full deep dive, but the key design choice relevant here is the reflection-based unmarshaller.

OpenAPI models are just struct definitions. The unmarshaller uses reflection to walk those definitions and figure out how to parse YAML into them automatically. That means adding a new model type or extending an existing one is just a matter of defining a struct; there's no bespoke parsing code to write per model:

type OpenAPI struct {
	marshaller.Model[core.OpenAPI] // Reflection-based unmarshalling + core access

	OpenAPI    string
	Info       Info
	Paths      *Paths
	Webhooks   *sequencedmap.Map[string, *ReferencedPathItem]
	Components *Components
	Extensions *extensions.Extensions
	// ...
}

This is where the performance story begins. Because unmarshalling is generic and driven by reflection, optimizations like concurrent parsing apply to every model type at once. There's no chance of one model being fast and another slow because someone forgot to optimize its parser. And critically, YAML node pointers are collected as a natural byproduct of the unmarshalling process, not in a separate pass. Every model automatically gets references to its source nodes, which means the data can never get out of sync.

That last point pays off enormously downstream. When a lint rule finds an issue, it doesn't need to figure out where in the file the problem is. The line number and column are already there, attached to the node:

type Error struct {
	UnderlyingError error
	Node            *yaml.Node // Carries line/column from unmarshalling
	Severity        Severity
	Rule            string
}

func (e Error) GetLineNumber() int {
	return e.Node.Line
}

No separate source-mapping pass. No re-parsing to find positions. The information flows through the system naturally because the unmarshaller collects it for every model, automatically. The models also provide both high-level typed access (like operation.GetDescription()) and low-level node access when you need to check what was actually present in the original document. Neither access pattern requires extra work because both are populated during the same unmarshalling pass.
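To make the dual-access idea concrete, here is a minimal sketch that mixes both levels. The helper itself is made up for illustration; only GetDescription and GetRootNode are names that appear elsewhere in this post:

// Illustrative only, not a library helper: typed access first, node access
// as a fallback when you need the original source position.
func describeOperation(op *openapi.Operation) string {
	if desc := op.GetDescription(); desc != "" {
		return desc // high-level typed access
	}
	// Low-level node access: the YAML node the model was unmarshalled from
	// still carries its line and column.
	node := op.GetRootNode() // *yaml.Node
	return fmt.Sprintf("no description (operation starts at line %d)", node.Line)
}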

Walk Once, Index Everything

Once you have a parsed document, the question becomes: how do you visit every node efficiently? The naive approach would be to let each lint rule traverse the document tree on its own. With over 60 rules at the time of writing, that would mean walking the same tree 60+ times. That's a lot of wasted work.

Instead, I built a Walk API that implements a visitor pattern using Go's iterator protocol. It traverses every node in the document exactly once via depth-first traversal:

type WalkItem struct {
	Match    MatchFunc // Type-switch to handle this node
	Location Locations // Breadcrumb path through the document
	OpenAPI  *OpenAPI  // Reference to the root document
}

func Walk[T any](ctx context.Context, start *T) iter.Seq[WalkItem] {
	return func(yield func(WalkItem) bool) {
		if start == nil {
			return
		}
		walkFrom(ctx, start, yield)
	}
}
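As a usage sketch, a consumer drives this like any other Go 1.23+ iterator. The predicate and handler below are hypothetical stand-ins; the point is the control flow:

for item := range openapi.Walk(ctx, doc) {
	// isTheNodeWeWant and handle are hypothetical; breaking out of the loop
	// stops the traversal immediately.
	if isTheNodeWeWant(item) {
		handle(item)
		break
	}
}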

The iterator-based design means consumers can break out early if they find what they need, and nothing is allocated until the consumer asks for the next item. But the real power comes from what I built on top of the Walk API: the Index.

During a single walk of the document, I build a comprehensive index of every important node, pre-categorized by type:

type Index struct {
	Doc *OpenAPI

	// Schemas categorized by where they appear
	BooleanSchemas   []*IndexNode[*JSONSchemaReferenceable]
	InlineSchemas    []*IndexNode[*JSONSchemaReferenceable]
	ComponentSchemas []*IndexNode[*JSONSchemaReferenceable]
	ExternalSchemas  []*IndexNode[*JSONSchemaReferenceable]
	SchemaReferences []*IndexNode[*JSONSchemaReferenceable]

	// Same categorization for operations, parameters, responses...
	Operations       []*IndexNode[*Operation]
	InlinePathItems  []*IndexNode[*ReferencedPathItem]
	InlineParameters []*IndexNode[*ReferencedParameter]
	InlineResponses  []*IndexNode[*ReferencedResponse]
	// ... and more
}

A critical detail: the Index stores pointers to existing nodes, not copies. The tracking maps use pointer identity (map[*Schema]bool) to avoid indexing the same node twice. The Index is a lens over the document, not a second copy of it. This keeps memory overhead minimal. An 81K-line spec doesn't need twice the memory just because you indexed it.
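Here's a rough sketch of what that dedup looks like during the indexing walk (illustrative only, not the actual internals; it assumes IndexNode exposes the underlying model via its Node field, as the rule example later does):

type indexBuilder struct {
	seen  map[*JSONSchemaReferenceable]bool // keyed by pointer identity, not value equality
	index *Index
}

func (b *indexBuilder) addInlineSchema(n *IndexNode[*JSONSchemaReferenceable]) {
	if b.seen[n.Node] {
		return // same pointer reached via another path: already indexed
	}
	b.seen[n.Node] = true
	b.index.InlineSchemas = append(b.index.InlineSchemas, n)
}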

Build the index once, share it everywhere. One O(n) walk replaces what would otherwise be O(n × rules) traversals.
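The wiring looks roughly like this. The function names and signatures below are assumptions for illustration, not the library's actual API:

doc, _, err := openapi.Unmarshal(ctx, specReader) // parse once; YAML nodes and structural errors collected here
if err != nil {
	return err
}
index := BuildIndex(ctx, doc) // the single walk that populates the collections above
docInfo := &DocumentInfo[*openapi.OpenAPI]{Doc: doc, Index: index}
// docInfo is shared, read-only, by every rule from here on.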

Rules That Don't Do Extra Work

With the Index in place, writing lint rules becomes almost trivial. Each rule gets direct access to exactly the collection of nodes it cares about. No traversal, no filtering, just iterate:

func (r *OperationTagDefinedRule) Run(
	ctx context.Context,
	docInfo *DocumentInfo[*openapi.OpenAPI],
	config *RuleConfig,
) []error {
	var errs []error

	// Build lookup from pre-indexed tags
	globalTags := make(map[string]bool)
	for _, tagNode := range docInfo.Index.Tags {
		globalTags[tagNode.Node.Name] = true
	}

	// Check every operation's tags against global definitions
	for _, opNode := range docInfo.Index.Operations {
		for _, tagName := range opNode.Node.GetTags() {
			if !globalTags[tagName] {
				errs = append(errs, validation.NewValidationError(
					config.GetSeverity(r.DefaultSeverity()),
					RuleStyleOperationTagDefined,
					fmt.Errorf("tag %q is not defined globally", tagName),
					opNode.Node.GetRootNode(), // Line number for free
				))
			}
		}
	}

	return errs
}

Notice the pattern: the rule cross-references two index collections (Tags and Operations), builds a quick lookup, and iterates. It never touches the document tree directly. It doesn't walk paths, then operations, then check siblings. All of that pre-processing happened once during index building.

This is the difference between O(n) linting (walk once, run all rules against the index) and O(n × rules) linting (each rule re-walks the tree). With dozens of rules, that's a substantial multiplier avoided.

Catch Errors Early: Validation During Unmarshalling

Not every issue needs a lint rule. A lot of problems with OpenAPI documents are structural: wrong types, duplicate keys, invalid YAML patterns. I catch these during unmarshalling itself, before the document is fully constructed.

For example, when parsing a YAML map, the unmarshaller detects duplicate keys in a pre-scan before it even starts processing values:

// Pre-scan for duplicate keys before concurrent processing
var duplicateKeyErrs []error
seenKeys := make(map[string]*keyInfo)
for i := 0; i < len(node.Content); i += 2 {
	keyNode := node.Content[i]
	key := keyNode.Value
	if _, ok := seenKeys[key]; ok {
		duplicateKeyErrs = append(duplicateKeyErrs, validation.NewValidationError(
			validation.SeverityWarning,
			validation.RuleValidationDuplicateKey,
			fmt.Errorf("duplicate key %q at line %d", key, keyNode.Line),
			keyNode,
		))
	}
	seenKeys[key] = &keyInfo{firstLine: keyNode.Line, lastIndex: i / 2}
}

Type mismatches (expecting a mapping node, getting a scalar), missing required fields, invalid references -- all of these surface during unmarshalling with precise line numbers. This eliminates whole categories of lint rules that other tools need. If the document can't even parse correctly, those errors are already captured before linting begins. It also means the lint rules can trust that the document they receive is structurally sound, which simplifies their logic considerably.

Parallelism Where It Counts

There are two natural parallelism opportunities in a linter pipeline, and I took both.

First, during unmarshalling, map key-value pairs are processed concurrently. Each pair is independent (after duplicate detection), so the pairs can be unmarshalled in parallel using Go's errgroup, with results collected into pre-allocated slots and written back sequentially:

g, ctx := errgroup.WithContext(ctx)
numJobs := len(node.Content) / 2
valuesToSet := make([]keyPair, numJobs)

for i := 0; i < len(node.Content); i += 2 {
	keyNode, valueNode := node.Content[i], node.Content[i+1]
	g.Go(func() error {
		// Unmarshal each key-value pair independently; target is the
		// reflected destination prepared for this key (elided in this excerpt)
		validationErrs, err := UnmarshalKeyValuePair(ctx, keyNode, valueNode, target)
		_ = validationErrs // gathered per slot in the full implementation

		valuesToSet[i/2] = keyPair{key: keyNode.Value, value: target}
		return err
	})
}

g.Wait() // Sequential write-back after all goroutines complete

Second, all lint rules run concurrently. Since rules have read-only access to the document and index (which are built once and shared immutably), they parallelize trivially. Each rule runs in its own goroutine, collecting errors independently with a mutex for the shared results slice:

var (
	wg   sync.WaitGroup
	mu   sync.Mutex
	errs []error
)

for _, rule := range enabledRules {
	wg.Add(1)
	go func(r RuleRunner, cfg RuleConfig) {
		defer wg.Done()
		ruleErrs := r.Run(ctx, docInfo, &cfg)

		mu.Lock()
		errs = append(errs, ruleErrs...)
		mu.Unlock()
	}(rule, ruleConfig) // ruleConfig resolved per rule (elided in this excerpt)
}
wg.Wait()

The design made parallelism almost free. Because the Index is immutable after construction and rules don't modify the document, there are no race conditions to reason about. The hardest part of concurrent programming (shared mutable state) was designed away before it became a problem.

Speed and DX Are the Same Thing

Looking back at these decisions, the thing that strikes me most is that none of them were driven by performance benchmarks. They were driven by developer experience.

I built the dual-access model because I wanted rule authors to have ergonomic access to whatever level of detail they needed. I built the Walk API because I wanted a clean, composable way to traverse documents. I built the Index because I didn't want rules to contain boilerplate traversal code. I moved validation into unmarshalling because it was the natural place to catch structural errors.

Making the right data easy to access turned out to be the same thing as making it fast to access.

When a rule author can write docInfo.Index.Operations instead of manually traversing paths, iterating operations, and handling edge cases, they write simpler code. That simpler code also happens to be faster because it avoids redundant work. The ergonomic path and the performant path converged.
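For contrast, here is roughly what every rule would have to do without the Index. The path and operation accessors below are placeholders, not the library's real API:

// Manual version: each rule repeats the traversal and the edge-case handling.
// pathItemsOf and operationsOf are placeholder helpers.
for _, pathItem := range pathItemsOf(doc) {
	for _, op := range operationsOf(pathItem) {
		checkOperation(op)
	}
}

// Index version: the same operations, collected once during the single walk.
for _, opNode := range docInfo.Index.Operations {
	checkOperation(opNode.Node)
}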

This isn't unique to linters. I think it's a general principle: if you find yourself choosing between clean APIs and fast ones, the abstractions might be wrong. Good abstractions make the common case both easy and efficient. When they don't, it's often a signal that the abstraction boundary is in the wrong place.

The Compound Effect

None of these decisions work in isolation. They compound:

  1. The parser unmarshals generically, collecting YAML nodes and running validation as it goes
  2. The Walk API ensures a single, thorough traversal of every node
  3. The Index transforms that traversal into pre-categorized, O(1)-accessible collections
  4. Rules consume the Index directly, doing zero redundant work
  5. Parallelism in both parsing and linting exploits the immutability guarantees the design provides

Remove any one of these and the system still works, but the compounding breaks. If the Index didn't exist, rules would need to walk the document. If the parser didn't preserve YAML nodes, rules would need a source-mapping pass. Each layer enables the next.

Try It Yourself

If any of this resonates with problems you're solving, the linter and the underlying parser library are both worth exploring.

The broader takeaway isn't specific to linters or OpenAPI. If you're building any kind of analysis tool -- a linter, a code generator, a validator -- consider how much work you can push into the parsing and indexing phases. The less each downstream consumer has to do, the faster and simpler the whole system gets.



Join the Discussion

Have thoughts on this post? I'd love to hear from you! Join the conversation on LinkedIn where we can discuss, share insights, and connect.


Tristan Cartledge

Principal Software Engineer & Consultant specializing in backend systems, cloud architecture, and applied AI. Based in Cairns, Australia.
