There is a lot of value in testing work-in-progress features in production. You discover edge cases through real data, and you can measure the performance impact under real conditions.
Of course, testing in production doesn't necessarily mean releasing an unfinished feature to all users. This is where feature flags come in, allowing you to enable or disable functionality for specific users or groups (say admins or testers).
I often implement feature flags from scratch, as the feature flag libraries out there do a lot I don't need. I usually don't need a user interface, or to store configuration in Redis, or to let non-developers toggle flags.
In one case, I thought we might need more complex scenarios in the future, so I started by persisting the feature flags in a database table. 5 years later, I realised we had never taken advantage of the extra flexibility and migrated the flags to static configuration. It proved harder than anticipated.
Feature flags in the database
When the configuration is in the database, you can potentially change it anytime. That also makes testing easier.
Consider this piece of production code:
<% if Feature.enabled?(:facebook_like) %>
  <div class="fb-like">
    <%# the like button is here %>
  </div>
<% end %>
Tests can verify different scenarios by updating the feature flags table in the database.
scenario 'User sees a like button when enabled' do
  user = users(:alice)
  visit profile_path(user)
  refute_css('.fb-like')

  Feature.create!(name: 'facebook_like')

  visit profile_path(user)
  assert_css('.fb-like')
end
Feature.create!(name: 'facebook_like') is what enables the flag. The test depends on an implementation detail (the flag is stored in the database), but that could be solved by a helper function. Overall, it's very easy to enable or disable flags in tests.
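For example, a hypothetical enable_feature helper (the name is my invention) would keep the storage detail out of the scenarios:

def enable_feature(name)
  # Tests only know about this helper, not about the table behind it
  Feature.create!(name: name.to_s)
end

The scenario above would then call enable_feature('facebook_like') instead of creating the record directly.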
There are a few downsides to this approach outside of the test environment.
Flags are disabled by default, which is a good safeguard. However, it also means that you must remember to update the database after deploying code that uses a flag. That is a manual step for each deployment environment.
Additionally, every flag check incurs a database read, which slows down response times.
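To make the cost concrete, here is a minimal sketch of what the database-backed check could look like, assuming Feature is an ActiveRecord model with a name column (as the create! call in the test suggests):

class Feature < ApplicationRecord
  # Each check runs a SELECT ... WHERE name = ? LIMIT 1 against the database
  def self.enabled?(name)
    exists?(name: name.to_s)
  end
end

A template that checks several flags issues several queries per request, unless you add caching.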
Feature flags in a file
Storing flags in the application configuration makes the deployment story easier. The flag goes through code review and is then automatically updated during deployment.
As the configuration is under version control, you can also debug more easily. When an error occurs in a release, you can check which flags were enabled for that release. If the flags are in the database, you may no longer have this information, unless you implement an audit trail or include the feature flag configuration in the error notification context.
Reading flags is also faster. You can store flags directly in a data structure in a class, or you can store them in a file and read that file once when the application starts. Then the flags are always in memory.
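Here is a minimal sketch of that variant, assuming the flags live in a hypothetical config/features.yml:

require 'yaml'

# config/features.yml might contain, for example:
#   facebook_like: true
class Feature
  # Loaded once and memoized; subsequent checks are in-memory hash lookups
  def self.config
    @config ||= YAML.load_file('config/features.yml')
  end

  def self.enabled?(name)
    config.fetch(name.to_s, false)
  end
end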
That is great for production, but makes testing harder. Let's look at a slightly modified version of the test from earlier:
scenario 'User sees a like button when enabled' do
  user = users(:alice)
  visit profile_path(user)
  refute_css('.fb-like')

  with_feature_enabled('facebook_like') do
    visit profile_path(user)
    assert_css('.fb-like')
  end
end
with_feature_enabled needs to modify the value of the flag for the scope of the block. As I said, the feature flag value is normally read once at application start and cannot be modified afterwards. In a unit test, I would override the configuration by passing it as a constructor parameter. In an integration test like the one above, that's not possible.
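For comparison, constructor injection in a unit test could look like this (LikeButton is a hypothetical class, not taken from the application):

class LikeButton
  def initialize(features: Feature)
    @features = features
  end

  def visible?
    @features.enabled?(:facebook_like)
  end
end

# A unit test can pass a fake configuration object instead of the real one
fake_features = Class.new do
  def self.enabled?(_name)
    true
  end
end

assert LikeButton.new(features: fake_features).visible?

The integration test drives the full stack, though, and the view calls Feature.enabled? directly, so there is no constructor to inject into.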
That leads us to metaprogramming shenanigans.
Stubs
Feature.enabled? can be called from any part of the code. The dirty secret to making the test above pass is to conditionally stub enabled? for the duration of a block.
Why conditionally? Because we have to take into account the name of the flag. Let's say we had this structure:
if Feature.enabled?(:feature1)
  # something
  if Feature.enabled?(:feature2)
    # ... something extra
  end
end
We might want to test with feature1 enabled, but feature2 disabled, so we cannot return the same value for any call to Feature.enabled?
Minitest#stub can help us. It takes three arguments: the method name, a "value or callable", and a block. From the documentation:
Add a temporary stubbed method replacing name for the duration of the block. If val_or_callable responds to #call, then it returns the result of calling it, otherwise returns the value as-is. If stubbed method yields a block, block_args will be passed along. Cleans up the stub at the end of the block. The method name must exist before stubbing.
The unconditional stub would pass a value as the second argument:
Feature.stub :enabled?, true do
  # Feature.enabled?(:feature1) returns true here
  # So does Feature.enabled?(:feature2)
  # In fact, all feature flags are enabled now
end
We can make the returned value depend on the parameter with a Proc (i.e. a "callable"):
callable = Proc.new do |arg|
  arg.to_s == "feature1"
end

Feature.stub :enabled?, callable do
  # calling Feature.enabled?(:feature1) returns true
  # calling Feature.enabled?(:feature2) returns false
end
This code enables a certain flag, but disables all others. What we want is to control one flag, but "pass through" to the default configuration for any other flags.
Taking inspiration from the Minitest stub implementation itself, we can keep a copy of the original Feature.enabled? method:
metaclass = Feature.instance_eval { class << self; self; end }
metaclass.alias_method :original_enabled?, :enabled?
We saved the method as Feature.original_enabled?. That allows us to use it in the "callable" argument once the Feature.enabled? method is overwritten by the stub:
callable = Proc.new do |arg|
  if arg.to_s == "feature1"
    true
  else
    Feature.original_enabled?(arg)
  end
end

Feature.stub :enabled?, callable do
  # Feature.enabled?(:feature1) returns true
  # Other flags are untouched
end
Let's clean up original_enabled? at the end:
metaclass.alias_method :enabled?, :original_enabled?
metaclass.undef_method :original_enabled?
Combining all of the above in a test helper gives us this monstrosity:
def with_feature_enabled(name, &block)
  metaclass = Feature.instance_eval { class << self; self; end }
  metaclass.alias_method :original_enabled?, :enabled?

  stub = Proc.new do |arg|
    if arg.to_s == name.to_s
      true
    else
      Feature.original_enabled?(arg)
    end
  end

  Feature.stub(:enabled?, stub, &block)
ensure
  metaclass.alias_method :enabled?, :original_enabled?
  metaclass.undef_method :original_enabled?
end
We could maybe have a simpler helper if the Feature class supported overriding flags. For example:
def with_feature_enabled(name)
  Feature.overrides = { name.to_s => true }
  yield
ensure
  # Reset even if the block raises
  Feature.overrides = {}
end
However, that would make the implementation of the Feature class more complex. I consider that worse in this case, as this functionality would not be used outside of the test helper.
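For illustration, the extra complexity would look something like this (a sketch layered on the file-based class from earlier):

class Feature
  class << self
    attr_writer :overrides
  end

  def self.overrides
    @overrides ||= {}
  end

  def self.enabled?(name)
    key = name.to_s
    # Test overrides win; everything else falls through to the static configuration
    overrides.fetch(key) { config.fetch(key, false) }
  end
end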
Conclusion
Unsurprisingly, the design that allows for dynamic toggling of flags is easier to use in tests.
I generally find ways to write tests without stubs, but I don't see a good alternative in the application from which I took the example. Using a global configuration (be it read from the database or a file) for the feature flags is the most ergonomic for production use in this case.
If there were a business need to override feature flags without deploying, I'd revert to database storage or add control via URL query parameters. Both of those would allow removing the stubs from the tests.
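The query parameter variant could be as small as a controller helper (a sketch; the ?features=facebook_like convention is my invention, and in practice you would guard it, for example by restricting it to admins):

class ApplicationController < ActionController::Base
  helper_method :feature_enabled?

  private

  # Flags requested via ?features=a,b are enabled on top of the configured ones
  def feature_enabled?(name)
    requested = params[:features].to_s.split(',')
    requested.include?(name.to_s) || Feature.enabled?(name)
  end
end

Tests would then enable a flag with visit profile_path(user, features: 'facebook_like') instead of stubbing.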
As it is, it's great that Ruby allows us to get our hands dirty when needed.