
How Do You Implement Feature Flags Without Scattering Conditionals Throughout Your Codebase?
What Are Feature Flags Actually Doing to Your Architecture?
Feature flags start simple—a quick if (flagEnabled) here, a config toggle there. Six months later, you're staring at a codebase littered with dead branches, mysterious booleans wrapped in triple negatives, and logic that's impossible to trace without cross-referencing a dozen external services. This post shows you architectural patterns for implementing feature flags that stay maintainable at scale—techniques that keep your business logic clean while still giving product teams the flexibility they need.
Most developers first encounter feature flags as "feature toggles"—a way to merge incomplete code to main without exposing it to users. But that's just the tip of the iceberg. In practice, teams use flags for gradual rollouts (releasing to 1% of users, then 10%, then everyone), A/B testing different implementations, circuit breakers for failing external services, and even kill switches for performance degradation. Each use case has different requirements for latency, consistency, and rollback speed. Treating them all the same is where the trouble starts.
The real problem isn't the flags themselves—it's the architectural decisions (or lack thereof) around where flag-checking logic lives. When you sprinkle if (await featureFlags.isEnabled('new-checkout')) throughout your controllers, services, and templates, you're creating an invisible dependency graph. Your code no longer describes what it does; it describes what it might do, depending on external state you can't see in the file. This "toggle debt" accumulates silently until someone realizes you have 400 active flags and no idea which ones are safe to remove.
Where Should Feature Flag Logic Live in Your Codebase?
The first rule of sustainable feature flags is simple: keep them out of your business logic. Not "minimize them"—keep them out entirely. Your domain code should express what the system does, not what it might do depending on a remote config value. This sounds impossible until you look at the branch by abstraction pattern.
Here's how it works. Instead of this:
```javascript
async function processPayment(order) {
  if (await featureFlags.isEnabled('new-payment-flow')) {
    return await newPaymentProcessor.process(order);
  } else {
    return await oldPaymentProcessor.process(order);
  }
}
```

You write this:
```javascript
class PaymentService {
  constructor(paymentProcessor) {
    this.processor = paymentProcessor;
  }

  async processPayment(order) {
    return this.processor.process(order);
  }
}
```

The factory or dependency injection container decides which processor to inject based on the feature flag—not the business logic. Your PaymentService stays pure, testable, and oblivious to the fact that a flag even exists. When it's time to remove the old path, you delete one line in your DI configuration and the old implementation class. No grepping through controllers looking for stray conditionals.
This approach scales because it treats feature flags as infrastructure concerns, not application logic. Just like you wouldn't let your domain code decide which database connection pool to use, you shouldn't let it decide which code path to execute based on feature availability. The decision happens at the composition root—the place where you wire together your application's components. This keeps your business rules coherent and your feature flag surface area small and manageable.
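As a sketch, the composition root might look like the following. The processor classes and the flag client's isEnabled method are hypothetical stand-ins for whatever your application actually wires together:

```javascript
// Hypothetical implementations; only the wiring pattern is the point.
class NewPaymentProcessor {
  process(order) { return { ...order, status: 'paid', via: 'new' }; }
}

class OldPaymentProcessor {
  process(order) { return { ...order, status: 'paid', via: 'old' }; }
}

class PaymentService {
  constructor(paymentProcessor) {
    this.processor = paymentProcessor;
  }
  async processPayment(order) {
    return this.processor.process(order);
  }
}

// The only place in the codebase that knows the flag exists.
async function buildPaymentService(featureFlags) {
  const useNewFlow = await featureFlags.isEnabled('new-payment-flow');
  const processor = useNewFlow
    ? new NewPaymentProcessor()
    : new OldPaymentProcessor();
  return new PaymentService(processor);
}
```

Removing the flag later means deleting the ternary and the old class; PaymentService never changes.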
How Do You Clean Up Feature Flags Once They're Released?
The dirty secret of feature flagging is that most flags never die. Engineers move on to new tickets, PMs forget the flag exists, and that "temporary" toggle for the new login flow becomes permanent infrastructure—untouched for three years, feared by everyone who looks at it. The only way to prevent this is treating flag removal as a first-class concern from day one.
Start with a mandatory time-to-live (TTL) for every flag. When someone creates a flag in your system, they should be forced to specify an expiration date—30 days for a release toggle, maybe 90 for an experiment. When that date hits, the system should start warning you. After a grace period, it should fail your CI builds until you either extend the TTL (with justification) or remove the flag. This sounds aggressive, but it's the only thing that works at scale. Martin Fowler's comprehensive guide on feature toggles emphasizes this point: flags are technical debt the moment they're created, and you need a plan to pay that debt down.
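A minimal sketch of such a CI gate follows. The registry shape (name, owner, expiresAt) is an assumption, not a standard; adapt it to whatever metadata your flag system stores:

```javascript
// Fail the build when a flag has outlived its TTL plus a grace period.
const GRACE_DAYS = 14;

function findExpiredFlags(registry, now = new Date()) {
  const graceMs = GRACE_DAYS * 24 * 60 * 60 * 1000;
  return registry.filter(
    (flag) => now.getTime() > new Date(flag.expiresAt).getTime() + graceMs
  );
}

// Hypothetical registry entries, checked into the repo or fetched in CI.
const registry = [
  { name: 'new-checkout', owner: 'alice', expiresAt: '2024-01-15' },
  { name: 'dark-mode-v2', owner: 'bob', expiresAt: '2099-01-01' },
];

// Fixed clock so the demo is deterministic; in real CI you'd use the
// current date and set process.exitCode = 1 when anything comes back.
const expired = findExpiredFlags(registry, new Date('2024-06-01'));
console.log(expired.map((f) => `${f.name} (owner: ${f.owner})`));
// → [ 'new-checkout (owner: alice)' ]
```

The owner field doubles as the routing target for the automated warnings mentioned above.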
Automated detection helps too. Static analysis tools can find flag checks that reference keys not present in your feature flag service—dead code waiting to be deleted. Some teams run "flag coverage" reports showing which flags are evaluated in production but never change state—strong candidates for hardcoding and removal. The goal is making flag cleanup easier than flag creation, reversing the usual incentive structure.
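A crude version of that static analysis can be sketched in a few lines. The regex assumes calls of the form featureFlags.isEnabled('key'); a real tool would parse the AST and handle your client's actual API:

```javascript
// Find flag keys referenced in source that no longer exist in the service.
function extractFlagKeys(source) {
  const pattern = /isEnabled\(\s*['"]([^'"]+)['"]\s*\)/g;
  const keys = new Set();
  let match;
  while ((match = pattern.exec(source)) !== null) keys.add(match[1]);
  return keys;
}

function findDeadReferences(source, liveFlags) {
  return [...extractFlagKeys(source)].filter((key) => !liveFlags.has(key));
}

// Hypothetical inputs: a source snippet and the set of flags the service knows.
const source = `
  if (await featureFlags.isEnabled('new-checkout')) { /* ... */ }
  if (await featureFlags.isEnabled('legacy-banner')) { /* ... */ }
`;
const liveFlags = new Set(['new-checkout']);
console.log(findDeadReferences(source, liveFlags)); // → [ 'legacy-banner' ]
```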
Ownership matters just as much as tooling. Every flag needs a named owner—an actual human with a Slack handle—who's responsible for its lifecycle. When that person leaves the company, the flag gets reassigned or deleted. No orphan flags allowed. This social contract, enforced by automated reminders, prevents the gradual accumulation of "flags nobody dares touch."
What Happens When Your Feature Flag Service Goes Down?
Here's a scenario that keeps engineers awake at night: your feature flag service—LaunchDarkly, Unleash, or that internal service Dave wrote—experiences an outage. Every flag check starts failing. If you haven't planned for this, your application probably starts throwing errors or defaulting every flag to "off," potentially disabling critical functionality that was working fine moments ago. Your feature flag infrastructure just became a single point of failure.
Resilient flag implementations plan for this explicitly. The client library should cache flag values locally—either in memory, in a local file, or in your application's database. When the remote service is unreachable, you fall back to those cached values rather than failing open or closed. Better yet, you should specify default values for each flag at the call site—not globally, but contextually. A "new dashboard" feature might safely default to "off" during an outage, but a "circuit breaker for the payment provider" flag probably needs to default to "on" to prevent cascading failures.
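One way to sketch that fallback chain, assuming a remote store with a fetch method (the interface is hypothetical; real SDKs handle much of this internally):

```javascript
// Flag client that prefers live values, then cached values, then a
// caller-supplied default. The cache here is in-memory for simplicity.
class ResilientFlagClient {
  constructor(remoteStore) {
    this.remote = remoteStore;
    this.cache = new Map(); // last known good values
  }

  async isEnabled(key, { fallback = false } = {}) {
    try {
      const value = await this.remote.fetch(key);
      this.cache.set(key, value); // refresh the cache on every success
      return value;
    } catch (err) {
      // Remote service unreachable: last known value beats the default.
      if (this.cache.has(key)) return this.cache.get(key);
      return fallback;
    }
  }
}
```

Call sites then encode their own safe direction, e.g. isEnabled('payment-circuit-breaker', { fallback: true }) versus isEnabled('new-dashboard', { fallback: false }).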
Some teams go further by treating flag configuration as code—checking flag definitions into version control and deploying them alongside the application. LaunchDarkly's best practices documentation recommends hybrid approaches where critical flags are available locally while dynamic targeting rules live in the service. This means you can still deploy emergency changes without full dependency on an external service.
How Do You Test Code That's Wrapped in Feature Flags?
Testing flag-driven code presents a specific challenge: your test suite needs to verify behavior with the flag both enabled and disabled, but without making network calls to a feature flag service. The solution lies in abstraction and test doubles.
Remember that dependency injection approach? It pays dividends in testing. When your business logic receives its behavior variants through constructor injection, your tests can simply inject mocks or stubs rather than dealing with flag state at all. You're testing the "new payment processor" or "old payment processor" logic directly—not the flag infrastructure around it.
For integration tests, you need a local feature flag store—an in-memory implementation that your tests can manipulate directly. Never hit the real feature flag service in tests; it's slow, flaky, and couples your test suite to external state. Instead, configure your test runner to use a "test double" flag client that starts with known values and can be mutated within individual test cases. This keeps your tests fast and deterministic while still exercising the flag-evaluation paths through your code.
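Such a test double can be very small. The isEnabled signature here is an assumption about your real client's interface; the double just needs to match it:

```javascript
// In-memory flag client for tests: seeded with known values, mutable
// from inside individual test cases, no network involved.
class InMemoryFlagClient {
  constructor(initial = {}) {
    this.flags = new Map(Object.entries(initial));
  }
  async isEnabled(key) {
    return this.flags.get(key) ?? false; // unknown flags default to off
  }
  set(key, value) {
    this.flags.set(key, value); // flip mid-test to exercise both paths
  }
}

// Typical usage inside a test case:
const flags = new InMemoryFlagClient({ 'new-checkout': false });
// ...exercise the system with the flag off...
flags.set('new-checkout', true);
// ...exercise the same scenario with it on...
```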
The most sophisticated teams run their CI pipeline twice for flag-heavy codebases: once with all flags defaulted to "on," once with all flags defaulted to "off." This catches the edge cases where flags interact in unexpected ways—when flag A being on and flag B being off creates an impossible state that nobody considered. It's expensive, but cheaper than discovering that combination in production.
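One lightweight way to implement those two passes is a client that forces every flag to a single value, selected by an environment variable (FLAGS_FORCED here is a hypothetical name):

```javascript
// Forces every flag evaluation to one value for a whole test run.
class ForcedFlagClient {
  constructor(value) {
    this.value = value;
  }
  async isEnabled(_key) {
    return this.value; // every flag reads as the forced value
  }
}

// CI would run the suite once with FLAGS_FORCED=on, once with =off.
const client = new ForcedFlagClient(process.env.FLAGS_FORCED === 'on');
```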
When Should You Skip Feature Flags Entirely?
Not every change needs a flag. In fact, over-flagging creates as many problems as under-flagging. Small UI tweaks, bug fixes, and purely additive changes (new API endpoints that don't break existing ones) often don't warrant the complexity. The cost of the flag—in cognitive overhead, testing burden, and eventual cleanup—must be weighed against the risk of the change.
Feature flags make sense when: rolling back requires a deployment (long build times, slow app store reviews), the change affects critical paths (payments, authentication, data integrity), or you're running an experiment where you need to measure different behaviors. They don't make sense for fixing a typo, adding a new admin endpoint, or changing a button color that's already behind a permission check. Trunk-Based Development's guide on feature flags recommends starting with the assumption that you won't use a flag, then adding one only when the deployment risk justifies it.
The teams that succeed with feature flags treat them as sharp tools—powerful when used correctly, dangerous when used carelessly. They build architectural guardrails that keep flag logic out of business code, automate the drudgery of flag cleanup, and plan for failure modes that less experienced teams ignore. Your feature flag strategy should make your codebase more flexible, not more complicated.
