There is a lot of value in testing work-in-progress features in production. You discover edge cases by running against real data, and you can measure the performance impact in a production setting.

Of course, testing in production doesn't necessarily mean releasing an unfinished feature to all users. This is where feature flags come in, allowing you to enable or disable functionality for specific users or groups (say admins or testers).

I often implement feature flags from scratch, as the feature flag libraries out there do a lot I don't need. I usually don't need a user interface, or to store configuration in Redis, or to let non-developers toggle flags.

In one case, I thought we might need more complex scenarios in the future, so I started by persisting the feature flags in a database table. Five years later, I realised we had never taken advantage of the extra flexibility and migrated the flags to static configuration. It proved harder than anticipated.

Feature flags in the database

When the configuration is in the database, you can change it at any time, without a deployment. That also makes testing easier.
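A minimal database-backed implementation might look like this sketch (I'm assuming ActiveRecord; the presence of a row means the flag is enabled):

class Feature < ApplicationRecord
  # A flag is enabled when a row with its name exists in the
  # features table.
  def self.enabled?(name)
    exists?(name: name.to_s)
  end
end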

Consider this piece of production code:

<% if Feature.enabled?(:facebook_like) %>
  <div class="fb-like">
    <%# the like button is here %>
  </div>
<% end %>

Tests can verify different scenarios by updating the feature flags table in the database.

scenario 'User sees a like button when enabled' do
  user = users(:alice)

  visit profile_path(user)
  refute_css('.fb-like')

  Feature.create!(name: 'facebook_like')
  visit profile_path(user)
  assert_css('.fb-like')
end

Feature.create!(name: 'facebook_like') is what enables the flag. The test depends on an implementation detail (the flag is stored in the database), but that could be hidden behind a helper function. Overall, it's very easy to enable or disable flags in tests.
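For example, a small helper (the name is made up) would keep the storage detail out of individual tests:

# Hypothetical test helper: callers don't need to know that the flag
# lives in a database table.
def enable_feature(name)
  Feature.create!(name: name.to_s)
end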

There are a few downsides to this approach outside of the test environment.

Flags are disabled by default, which is a good safeguard. However, that also means you must remember to update the database after deploying code that uses a flag. That is a manual step for each deployment environment.

Additionally, every flag check incurs a database read, which makes response times slower.

Feature flags in a file

Storing flags in the application configuration makes the deployment story easier. The flag goes through code review and is then automatically updated during deployment.

As the configuration is under version control, debugging also becomes easier. When an error occurs in a release, you can check which flags are enabled for that release. If the flags are in the database, you may not have this information anymore, unless you have implemented an audit trail or you include the feature flag configuration in the error notification context.
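The error notification route could be as simple as this sketch (Sentry is just an example reporter; Feature.all refers to the database-backed model sketched earlier):

# Attach the currently enabled flags to every error report.
Sentry.set_context('feature_flags', { enabled: Feature.all.map(&:name) })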

Reading flags is also faster. You can store flags directly in a data structure in a class, or you can store them in a file and read that file once when the application starts. Either way, the flags are always in memory.
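A file-backed version might look like this sketch (the YAML path and format are my assumptions):

require 'yaml'

class Feature
  # Returns the flag value from the file, defaulting to disabled.
  def self.enabled?(name)
    flags.fetch(name.to_s, false)
  end

  # The file is read once and memoized; subsequent checks never
  # touch the disk.
  def self.flags
    @flags ||= YAML.load_file('config/features.yml')
  end
end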

That is great for production, but makes testing harder. Let's look at a slightly modified version of the test from earlier:

scenario 'User sees a like button when enabled' do
  user = users(:alice)

  visit profile_path(user)
  refute_css('.fb-like')

  with_feature_enabled('facebook_like') do
    visit profile_path(user)
    assert_css('.fb-like')
  end
end

with_feature_enabled needs to modify the value of the flag for the scope of the block. As I said, the feature flag value is normally read once at application start and cannot be modified afterwards. In a unit test, I would override the configuration by passing it in as a constructor parameter. In an integration test like the one above, that's not possible.
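For contrast, this is what constructor injection could look like in a unit test (ProfileRenderer and its features: parameter are made up for the example):

class ProfileRenderer
  def initialize(features: Feature)
    @features = features
  end

  def show_like_button?
    @features.enabled?(:facebook_like)
  end
end

# The unit test passes in a stand-in that responds to enabled?
all_on = Class.new do
  def self.enabled?(_name)
    true
  end
end

ProfileRenderer.new(features: all_on).show_like_button? # => true

An integration test drives the whole application through simulated requests, so there is no constructor to reach.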

That leads us to metaprogramming shenanigans.

Stubs

Feature.enabled? can be called from any part of the code. The dirty secret to making the test above pass is to conditionally stub enabled? for the duration of a block.

Why conditionally? Because we have to take into account the name of the flag. Let's say we had this structure:

if Feature.enabled?(:feature1)
  # something

  if Feature.enabled?(:feature2)
    # ... something extra
  end
end

We might want to test with feature1 enabled but feature2 disabled, so we cannot return the same value for every call to Feature.enabled?

Minitest's stub (minitest/mock defines it on Object) can help us. It takes the method name, a "value or callable", and a block. From the documentation:

Add a temporary stubbed method replacing name for the duration of the block. If val_or_callable responds to #call, then it returns the result of calling it, otherwise returns the value as-is. If stubbed method yields a block, block_args will be passed along. Cleans up the stub at the end of the block. The method name must exist before stubbing.

The unconditional stub would pass a value as the second argument:

Feature.stub :enabled?, true do
  # Feature.enabled?(:feature1) returns true here
  # So does Feature.enabled?(:feature2)
  # In fact, all feature flags are enabled now
end

We can make the returned value depend on the parameter with a Proc (i.e. a "callable"):

callable = Proc.new do |arg|
  arg.to_s == "feature1"
end

Feature.stub :enabled?, callable do
  # calling Feature.enabled?(:feature1) returns true
  # calling Feature.enabled?(:feature2) returns false
end

This code enables a certain flag, but disables all others. What we want is to control one flag, but "pass through" to the default configuration for any other flags.

Taking inspiration from the Minitest stub implementation itself, we can keep a copy of the original Feature.enabled? method:

metaclass = Feature.instance_eval { class << self; self; end }
metaclass.alias_method :original_enabled?, :enabled?

We saved the method as Feature.original_enabled?. That allows us to use it in the "callable" argument once the Feature.enabled? method is overwritten by the stub:

callable = Proc.new do |arg|
  if arg.to_s == "feature1"
    true
  else
    Feature.original_enabled?(arg)
  end
end

Feature.stub :enabled?, callable do
  # Feature.enabled?(:feature1) returns true
  # Other flags are untouched
end

Let's clean up original_enabled? at the end:

metaclass.alias_method :enabled?, :original_enabled?
metaclass.undef_method :original_enabled?

Combining all of the above in a test helper gives us this monstrosity:

def with_feature_enabled(name, &block)
  # Keep a copy of the original method so the stub can fall back to it
  metaclass = Feature.instance_eval { class << self; self; end }
  metaclass.alias_method :original_enabled?, :enabled?

  # Force the named flag on; pass every other flag through
  stub = Proc.new do |arg|
    if arg.to_s == name.to_s
      true
    else
      Feature.original_enabled?(arg)
    end
  end

  Feature.stub(:enabled?, stub, &block)
ensure
  # Restore the original method and remove the temporary alias
  metaclass.alias_method :enabled?, :original_enabled?
  metaclass.undef_method :original_enabled?
end

We could maybe have a simpler helper if the Feature class supported overriding flags. For example:

def with_feature_enabled(name)
  Feature.overrides = { name.to_s => true }
  yield
ensure
  Feature.overrides = {}
end
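Supporting that would require something like the following inside Feature (a hypothetical sketch, building on the file-backed version from earlier):

class Feature
  class << self
    attr_accessor :overrides

    # Overrides take precedence; everything else falls through to
    # the flags read from the file.
    def enabled?(name)
      current = overrides || {}
      return current[name.to_s] if current.key?(name.to_s)

      flags.fetch(name.to_s, false)
    end
  end
end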

However, that would make the implementation of the Feature class more complex. I consider that worse in this case, as the override functionality would never be used outside of the test helper.

Conclusion

Unsurprisingly, the design that allows for dynamic toggling of flags is easier to use in tests.

I generally find ways to write tests without stubs, but I don't see a good alternative in the application from which I took the example. Using a global configuration for the feature flags (whether read from the database or a file) is the most ergonomic option for production use in this case.

If there were a business need to override feature flags without deploying, I'd revert to database storage or to control via URL query parameters. Either of those would allow removing the stubs from the tests.

As it is, it's great that Ruby allows us to get our hands dirty when needed.