When we first built the app that became Keylime Toolbox, we used Ruby on Rails because it was what the team was familiar with and it gave us lots of tools to get started quickly. Along the way, as we grew, we knew we would run into the dreaded “monolithic Rails app” problem if we didn’t separate out our code. So early on we created our own gems to segregate common, “low churn” code, and we also spun up a second Rails app as a service interface as we added more data. We learned a whole lot from that first service, including that data should not be segregated along a dimension, but that asynchronous and long-running tasks (batch data loading) make a great seam to split on.
Fast-forward a year or two: we brought that same asynchronous data loading to bear on a new set of data, and we decided to take another stab at breaking up the app. With the previous service we had put data loading and analytics into a single service that “owned” the data source. This time around I wanted to split that up. Following the command-query separation pattern, I wanted a service responsible for data loading and everything associated with it (the “command”, or “mutate”, side), while keeping the analytics part (the “query” side) in the main analytics app. To do that, I needed a database shared between the service and the main app.
Many companies (e.g. Pivotal Labs, Task Rabbit) have broken up their Rails apps, and a common method is to use Rails engines. As I had a shared database (the “analytics” database in the diagram above), it made sense to me to share the ActiveRecord models between the main app and the processing service. A Rails engine is just a Rails app, packaged as a gem, that runs inside another app, so it provided the structure to share the models.
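For the curious, the engine class itself is tiny. A minimal sketch (the `Analytics` name matches what I use throughout this post; our real engine has a bit more to it):

```ruby
# lib/analytics/engine.rb
module Analytics
  # Packaging the gem as an engine is what lets it carry app-style
  # structure: app/models, db/migrate, lib/tasks, and so on.
  class Engine < ::Rails::Engine
  end
end
```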
The first question people ask when adding models to an engine is, “What do I do about migrations?” The Rails default behavior is to copy the migrations into the app that hosts the engine and run them as part of the app deployment. This assumes that the database for the engine is the same as the database for the app, and that no apps share a database. This is a general Rails assumption (each app has a single dedicated database) and it makes sense as a rule of thumb. For example, if you were to tack a forum onto your app, you would want all those tables installed in your app’s database. That wasn’t going to work for me.
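For reference, this is the default copy-and-run workflow Rails gives you for an engine (assuming the engine is named `analytics`; engines generate an `install:migrations` task under their own name), and exactly what I needed to avoid:

```
$ rake analytics:install:migrations  # copies the engine's migrations into the host app's db/migrate
$ rake db:migrate                    # runs them against the host app's own database
```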
Pivotal Labs, as part of one of their “break up the monolith” projects, declared instead that you should “leave your migrations in your Rails engines.” This sounded great, except that they were still running the migrations in the main app against a single database. Not the solution for me.
What I needed was an engine that exposed migrations that could be run from one of the two apps (we decided on the processing service, because data loading precedes data reading) and that would run those migrations against a database that was not the app’s primary database. In the diagram above, you can see that each app has a primary database for its own transactional data. In the main analytics app, that’s things like configuration. In the processing service, that’s audit trails of data loading and details of data state.
So I wanted to keep the migrations in the engine gem and run them when we deployed. Following the ideas in this StackOverflow post, I realized that the migrations needed to be separate from the app’s migrations, and the `schema.rb` (or in our case the `structure.sql`) needed to be a different file (because it is a completely different database). As covered in the linked Rails issue, I decided that we really needed to completely separate the migration management, which actually made things a lot simpler. If the engine exposes Rake tasks to run its own migrations (against a database named by convention), then it can own the migrations, the schema file, and everything else, and we can decide when and where to run those migrations. A clean, simple solution.
The Goal
Here’s how this works when it is all done. In each app, we add a reference to the shared analytics database.
```yaml
# config/database.yml
analytics-development:
  <<: *default
  database: analytics-development

analytics-test:
  <<: *default
  database: analytics-test
```
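(The `<<: *default` lines assume the conventional `default` anchor near the top of `database.yml`; something like the following, adjusted for your adapter:)

```yaml
default: &default
  adapter: postgresql
  encoding: unicode
  pool: 5
```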
Then to run the migrations (in development or as part of our deployment scripts) we just have a namespaced Rake command:
```
$ rake analytics:db:migrate
```
Building the Engine
To get this all working, I needed the shared models to connect to the shared database. So they all inherit from a common base class that establishes the connection (and they are namespaced, following engine best practice).
```ruby
module Analytics
  class AnalyticsModel < ActiveRecord::Base
    self.abstract_class = true
    establish_connection :"analytics-#{Rails.env}"
  end
end
```
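Each shared model then just subclasses the base. For example, here’s a sketch of the `LogStat` model that shows up again in the factory_girl example later in this post:

```ruby
module Analytics
  # Lives in the engine gem; reads and writes the shared analytics database
  # through the connection established on AnalyticsModel.
  class LogStat < AnalyticsModel
  end
end
```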
I added the `db/migrate` folder to the gem and created migrations there as you normally would (e.g. `rails generate migration ...`).
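For example, a migration for the `log_stats` table backing the model above might look like this (a hypothetical sketch; our real migrations are more involved):

```ruby
# db/migrate/20150601000000_create_log_stats.rb
class CreateLogStats < ActiveRecord::Migration
  def change
    create_table :log_stats do |t|
      t.date :date, null: false
      t.timestamps
    end
  end
end
```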
Finally, I added the custom Rake tasks to handle migration management.
```ruby
namespace :analytics do
  # Custom migration tasks to manage migrating the engine's dedicated database.
  namespace :db do
    desc 'Migrates the analytics-* database'
    task :migrate => :environment do
      with_engine_connection do
        ActiveRecord::Migrator.migrate(File.expand_path("../../../db/migrate", __FILE__), ENV['VERSION'].try(:to_i))
      end
      Rake::Task['analytics:db:schema:dump'].invoke
    end

    task :'schema:dump' => :environment do
      require 'active_record/schema_dumper'
      with_engine_connection do
        File.open(File.join(Rails.root, 'db', 'analytics_schema.rb'), 'w') do |file|
          ActiveRecord::SchemaDumper.dump ActiveRecord::Base.connection, file
        end
      end
    end

    task :'schema:load' => :environment do
      with_engine_connection do
        load File.join(Rails.root, 'db', 'analytics_schema.rb')
      end
    end
  end
end

# Hack to temporarily connect AR::Base to your engine.
def with_engine_connection
  original = ActiveRecord::Base.remove_connection
  ActiveRecord::Base.establish_connection "analytics-#{Rails.env}".to_sym
  yield
ensure
  ActiveRecord::Base.establish_connection original
end
```
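One wiring detail worth knowing: Rails engines automatically load any `.rake` files under the gem’s `lib/tasks`, so dropping the file there is enough for the tasks to appear in the host app. If you keep the file somewhere unconventional, the engine class can load it explicitly (a sketch; the path here is illustrative):

```ruby
# lib/analytics/engine.rb
module Analytics
  class Engine < ::Rails::Engine
    # Engines pick up lib/tasks/**/*.rake on their own; an explicit
    # rake_tasks block is only needed for files outside that path.
    rake_tasks do
      load File.expand_path('../../tasks/analytics.rake', __FILE__)
    end
  end
end
```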
That’s all there is to it. Sometimes, it is about finding a simple solution and avoiding making things “clever”.
Testing
I wrote unit tests for the models, but these require connecting to the database. The gem only has the shared database, so I just needed to create the test database on my dev box and run the migrations (with the task I created). I ran these commands (and added them to the `README` file for the gem):
```
$ psql -c 'CREATE DATABASE "analytics-test" with owner <your-username> encoding='"'"UTF8"'"';' postgres
$ rake analytics:db:migrate
```
That made the appropriate database available, and my model tests could run against it.
It also generated a `db/schema.rb` file, which I didn’t want checked in, so I listed it in my `.gitignore` to exclude it from the project.
But how do I test that an application that incorporates the gem can actually talk to the database? Easy. The scaffolded gem includes a complete “dummy” Rails app in the test folder. So, following what I will have to do when I incorporate this into our two apps, I added the database references to `spec/dummy/config/database.yml`. Then I wrote a couple of controllers in the dummy app just to test reads and writes against the shared models. In the specs, I added integration tests (using Capybara) that verify that I can read and write through the models.
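The controllers are throwaway plumbing; a hypothetical sketch of one (names, actions, and routes invented for illustration):

```ruby
# spec/dummy/app/controllers/log_stats_controller.rb
class LogStatsController < ApplicationController
  # Read path: proves the dummy app can query the shared database.
  def index
    @log_stats = Analytics::LogStat.order(:date)
  end

  # Write path: proves the dummy app can insert through the shared models.
  def create
    Analytics::LogStat.create!(date: Date.today)
    redirect_to log_stats_path
  end
end
```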
One final note: we use factory_girl for creating fixtures for testing. Because we namespaced the models, though, factory_girl can’t figure out how to find the class from the name of the factory alone, so I had to add the `class` attribute to the factory to help it out.
```ruby
FactoryGirl.define do
  factory :log_stat, class: 'Analytics::LogStat' do
    date { Date.today }
  end
end
```
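With that in place, the factory works as usual in the gem’s specs; for example (a sketch):

```ruby
RSpec.describe Analytics::LogStat do
  it 'round-trips through the shared analytics database' do
    stat = FactoryGirl.create(:log_stat)
    expect(stat.reload.date).to eq(Date.today)
  end
end
```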
Incidentally, these factories are only available to the gem for its own testing. I haven’t investigated it yet, but it would be great to expose them to the containing app so that it could use them for integration tests as well. That might just have to be a generator, because I want them exposed to the containing app’s test framework, but not loaded into the app itself when deployed.