Rails Template Scripts: Manipulating database.yml

This is the second in a series of posts on building powerful and resilient Rails application templates. In the previous post, we discussed parsing and emitting YAML files with comments. Today, we will be looking at manipulating the database.yml file’s AST.


At the end of the previous post, we had a way to parse a YAML file or string and emit partial YAML content. We needed to emit partial YAML content in order to leave comment blocks in place. We also needed to ensure that our implementation could handle inline commented-out lines in the YAML content. To achieve all of this, we created a DatabaseYAML class:

class DatabaseYAML
  # Matches an indented, commented-out line, capturing its indentation and content.
  COMMENT_REGEX = /^([ \t]+)#\s?(.*?)$/.freeze

  def initialize(yaml)
    @yaml = yaml
    @comment_cache = {}
  end

  def parse_yaml_with_comments(yaml)
    matchdata = yaml.match COMMENT_REGEX
    parseable = if matchdata
      commented, indentation, content = matchdata.to_a
      uncommented = indentation + content
      # Remember the original commented line so it can be restored on emit.
      @comment_cache[uncommented] = commented
      yaml.gsub(commented, uncommented)
    else
      yaml
    end
    Psych.parse_stream(parseable)
  end

  def emit_pair(scalar, mapping)
    emission_stream = Psych::Nodes::Stream.new
    emission_document = Psych::Nodes::Document.new
    emission_mapping = Psych::Nodes::Mapping.new
    emission_mapping.children.concat [scalar, mapping]
    emission_document.children.concat [emission_mapping]
    emission_stream.children.concat [emission_document]
    # Non-bang gsub/strip: the bang variants return nil when nothing changes.
    output = emission_stream.yaml.gsub(/^---/, '').strip
    @comment_cache.each do |uncommented, commented|
      output.gsub!(uncommented, commented)
    end
    output
  end
end

In this post, let’s dig into some actual AST manipulations. One of the most important takeaways that I have had from this project is that limiting your problem space is key to making progress. In this case, we are only interested in manipulating the database.yml file. This means that we can make some assumptions about the structure of the file and about the types of manipulations that we will need to make. This allows us to focus on the problem at hand and not get bogged down in the details of YAML parsing and emitting.

Trust me, trying to write a completely generic YAML manipulation engine is a fool’s errand. YAML is a complex format with many edge cases. By limiting our scope to a specific file and a specific set of manipulations, we can make progress much more quickly.

Defining a new database definition

The first manipulation to tackle is adding a new database definition to the database.yml file. This is a common task when setting up a new Rails application, especially when using SQLite, where having a separate database file for each IO-bound component helps avoid write contention. It is a simple task, but a good starting point for our manipulation engine.

Let’s start with an example of the YAML output we need to generate:

name: &name
  <<: *default
  migrations_paths: db/name_migrate
  database: storage/<%= Rails.env %>-name.sqlite3

We want to generate a new top-level mapping that has an anchor (so that we can reference this mapping in our environment definitions), inherits the default database configuration, defines a separate directory to hold migrations for this database, and specifies the location of the database file.
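
Before building these nodes by hand, it can help to parse the target YAML and look at what Psych produces. The following is a minimal sketch, not part of the DatabaseYAML class; the default mapping and its adapter value are assumptions added only so the snippet parses on its own.

```ruby
require "psych"

# Sample input: the target definition plus an assumed `default` mapping
# for the alias to point at.
yaml = <<~YAML
  default: &default
    adapter: sqlite3
  name: &name
    <<: *default
    migrations_paths: db/name_migrate
    database: storage/<%= Rails.env %>-name.sqlite3
YAML

root = Psych.parse_stream(yaml).children.first.root

# Top-level children alternate between key scalars and value mappings.
top_level = root.children.each_slice(2).map do |key, value|
  [key.value, value.class.name, value.anchor]
end

# Inside `name`, the merge key is a Scalar and its value is an Alias node.
name_mapping = root.children[3]
inner = name_mapping.children.each_slice(2).map do |key, value|
  [key.value, value.class.name]
end

p top_level
p inner
```

The anchor lives on the Mapping node itself, and the value of the << key is an Alias node rather than a copy of the default mapping. Those are the two details the hand-built AST has to reproduce.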

To achieve this, we need to create a new method in our DatabaseYAML class. This method will take the name of the new database and return the YAML content for the new database definition. We will call this method new_database:

def new_database(name)
  db = Psych::Nodes::Mapping.new(name)
  db.children.concat [
    Psych::Nodes::Scalar.new("<<"),
    Psych::Nodes::Alias.new("default"),
    Psych::Nodes::Scalar.new("migrations_paths"),
    Psych::Nodes::Scalar.new("db/#{name}_migrate"),
    Psych::Nodes::Scalar.new("database"),
    Psych::Nodes::Scalar.new("storage/<%= Rails.env %>-#{name}.sqlite3"),
  ]
  emit_pair(Psych::Nodes::Scalar.new(name), db)
end

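A couple of key points to note here:

The first argument to Psych::Nodes::Mapping.new is the node’s anchor, which is how the mapping ends up with its &name anchor, and Psych::Nodes::Alias.new("default") produces the *default reference. Once you know the structure of the AST, it is relatively straightforward to build up the nodes for the new database definition. The emit_pair method will take care of emitting the YAML string for the new database definition. If you execute something like puts new_database("new") on a DatabaseYAML instance in an IRB console, you should see the following output: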

new: &new
  <<: *default
  migrations_paths: db/new_migrate
  database: storage/<%= Rails.env %>-new.sqlite3
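
One way to gain confidence in the emitted fragment is to append it to a base configuration and load the result. The sketch below is a standalone approximation of new_database and emit_pair (without the comment cache); build_database_definition and the base config are hypothetical names and values chosen for illustration.

```ruby
require "psych"

# Standalone approximation of new_database + emit_pair (assumption: no comment cache).
def build_database_definition(name)
  db = Psych::Nodes::Mapping.new(name) # first argument is the anchor
  db.children.concat [
    Psych::Nodes::Scalar.new("<<"),
    Psych::Nodes::Alias.new("default"),
    Psych::Nodes::Scalar.new("migrations_paths"),
    Psych::Nodes::Scalar.new("db/#{name}_migrate"),
    Psych::Nodes::Scalar.new("database"),
    Psych::Nodes::Scalar.new("storage/<%= Rails.env %>-#{name}.sqlite3"),
  ]

  wrapper = Psych::Nodes::Mapping.new
  wrapper.children.concat [Psych::Nodes::Scalar.new(name), db]
  document = Psych::Nodes::Document.new
  document.children << wrapper
  stream = Psych::Nodes::Stream.new
  stream.children << document
  # Drop the leading document marker and surrounding whitespace.
  stream.yaml.sub(/\A---\s*/, "").strip
end

# Assumed base config that provides the `default` anchor.
base = <<~YAML
  default: &default
    adapter: sqlite3
    pool: 5
YAML

combined = base + build_database_definition("new") + "\n"

# The ERB tag is not evaluated here; it simply loads as part of a string.
config = Psych.safe_load(combined, aliases: true)
p config["new"]
```

If the merge key and alias were emitted correctly, the loaded "new" entry carries the adapter and pool values inherited from default alongside its own keys.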

Adding a new database definition to environment configurations

Once you have a database defined in the database.yml file, you need to “activate” it by adding it to the environment configurations. That means making use of Rails’ multiple database support. A database definition is inert until you tell Rails that the database should be used in a specific environment. In order to use multiple databases in a single environment, you have to define the environment(s) using a three-tiered configuration structure. This is the example given in the Rails Guides:

production:
  primary:
    database: my_primary_database
    username: root
    password: <%= ENV['ROOT_PASSWORD'] %>
    adapter: mysql2
  primary_replica:
    database: my_primary_database
    username: root_readonly
    password: <%= ENV['ROOT_READONLY_PASSWORD'] %>
    adapter: mysql2
    replica: true
  animals:
    database: my_animals_database
    username: animals_root
    password: <%= ENV['ANIMALS_ROOT_PASSWORD'] %>
    adapter: mysql2
    migrations_paths: db/animals_migrate
  animals_replica:
    database: my_animals_database
    username: animals_readonly
    password: <%= ENV['ANIMALS_READONLY_PASSWORD'] %>
    adapter: mysql2
    replica: true

Under the production environment key, you have a hash of database name keys, and under each of those is a hash of configuration options. Now, this three-tiered structure can be simplified using YAML anchors and aliases. So, in the same way that the default development environment config inherits from the default database configuration:

development:
  <<: *default
  database: storage/development.sqlite3

You can use the same technique to simplify a three-tiered environment configuration:

development:
  primary:
    <<: *default
    database: storage/development.sqlite3
  new: *new
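
To check that the aliased form really does expand into the three-tiered structure, you can load a small config and inspect the result. This is a sketch with assumed sample values (the adapter, pool, and file names are illustrative only).

```ruby
require "psych"

# Assumed sample config using anchors and aliases for a three-tiered environment.
yaml = <<~YAML
  default: &default
    adapter: sqlite3
    pool: 5
  new: &new
    <<: *default
    database: storage/development-new.sqlite3
  development:
    primary:
      <<: *default
      database: storage/development.sqlite3
    new: *new
YAML

config = Psych.safe_load(yaml, aliases: true)
development = config.fetch("development")

p development.keys
p development["new"]
```

After loading, the development environment has two database entries, and each one carries the settings inherited from default.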

So, this is the second transformation we need to implement. And, to keep the implementation simple, we will simply add the database to every environment defined in the database.yml file. Let’s add a new method to our DatabaseYAML class called add_database and get to work:

class DatabaseYAML
  def add_database(name)
    # implementation goes here
  end
end

There are essentially 2 steps to this implementation:

  1. Find the environment configurations in the YAML AST.
  2. Add the new database definition to each environment configuration.

Let’s start with the first. How do we identify the environment configurations in the YAML? I don’t want to simply use the development, test, and production names, because what if an app has a staging environment defined? Or some other custom environment? Since we can’t rely on names, what structure in the AST can we use to identify environment configurations?

Well, we know that the database.yml file only has two kinds of top-level mappings: database configurations and environment configurations. The database configurations are easy to identify because they have an anchor. So, the environment configurations are the ones that don’t have an anchor. This is the key insight we need to identify environment configurations in the YAML AST.

So, we want to get to the root node, iterate over the pairs of children, and select those pairs that are a Scalar plus Mapping pair where the Mapping doesn’t have an anchor. Here’s how you can do that (see footnote 1):

def add_database(name)
  root = @stream.children.first.root
  root.children.each_slice(2).map do |scalar, mapping|
    next unless scalar.is_a?(Psych::Nodes::Scalar)
    next unless mapping.is_a?(Psych::Nodes::Mapping)
    next unless mapping.anchor.nil? || mapping.anchor.empty?

    # implementation goes here
  end.compact # compact, not compact!, which returns nil when nothing was removed
end
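
To see the anchor check in isolation, here is a small sketch that returns only the names of the environment configurations. The environment_names helper and the sample config (including the queue database and staging environment) are hypothetical, chosen to show that custom environment names are picked up without being listed anywhere.

```ruby
require "psych"

# Assumed sample config: two anchored database definitions, two environments.
yaml = <<~YAML
  default: &default
    adapter: sqlite3
  queue: &queue
    <<: *default
    database: storage/queue.sqlite3
  development:
    <<: *default
    database: storage/development.sqlite3
  staging:
    <<: *default
    database: storage/staging.sqlite3
YAML

# Hypothetical helper: names of top-level mappings that carry no anchor.
def environment_names(stream)
  root = stream.children.first.root
  root.children.each_slice(2).filter_map do |scalar, mapping|
    next unless scalar.is_a?(Psych::Nodes::Scalar)
    next unless mapping.is_a?(Psych::Nodes::Mapping)
    next unless mapping.anchor.nil? || mapping.anchor.empty?

    scalar.value
  end
end

names = environment_names(Psych.parse_stream(yaml))
p names
```

The anchored default and queue mappings are skipped, while staging is found even though nothing in the code mentions it by name.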

With this in place, we can now focus on adding the new database definition to each environment configuration. But, we need to be thoughtful. This script can be run as a part of rails new or against an existing application. This means that the database.yml config might already be in a three-tiered structure. We need to handle both two-tiered and three-tiered environment configurations.

Again, we can rely on some conventions to help us navigate the AST and figure out which structure we are dealing with. In a two-tiered environment configuration, the first key-value pair in the mapping will have a scalar with the value << (to inherit from the default database configuration). Otherwise, we can presume we are dealing with a three-tiered environment configuration:

def add_database(name)
  root = @stream.children.first.root
  root.children.each_slice(2).map do |scalar, mapping|
    # ...

    if mapping.children.first.value == "<<" # 2-tiered environment
      # implementation goes here
    else # 3-tiered environment
      # implementation goes here
    end
  end.compact # compact, not compact!, which returns nil when nothing was removed
end
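
The tier check can be pulled out and exercised on its own. In this sketch, two_tiered? is a hypothetical predicate (not part of the class above), and the sample config is assumed; the predicate also guards against an empty mapping, which the inline check does not.

```ruby
require "psych"

# Hypothetical predicate: a two-tiered environment starts with the merge key.
# Returns false for an empty mapping instead of raising on nil.
def two_tiered?(mapping)
  first_key = mapping.children.first
  first_key.is_a?(Psych::Nodes::Scalar) && first_key.value == "<<"
end

# Assumed sample config with one environment of each shape.
yaml = <<~YAML
  default: &default
    adapter: sqlite3
  development:
    <<: *default
    database: storage/development.sqlite3
  production:
    primary:
      <<: *default
      database: storage/production.sqlite3
YAML

root = Psych.parse_stream(yaml).children.first.root

# Keep only unanchored mappings (the environments), then classify each one.
environments = root.children.each_slice(2).reject { |_key, mapping| mapping.anchor }
tiers = environments.to_h { |key, mapping| [key.value, two_tiered?(mapping) ? 2 : 3] }

p tiers
```

This convention only holds when a two-tiered environment lists its << key first, which is how Rails generates the file; an environment that inherits nothing would be treated as three-tiered.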

When dealing with a two-tiered environment configuration, we need to shift the whole environment configuration to a three-tiered config and then add the new database definition to the mapping. We can do this by making the existing mapping the value of a primary key, then adding a key-value alias for the database beneath that:

new_mapping = Psych::Nodes::Mapping.new
new_mapping.children.concat [
  Psych::Nodes::Scalar.new("primary"),
  mapping,
  Psych::Nodes::Scalar.new(name),
  Psych::Nodes::Alias.new(name),
]

When dealing with a three-tiered environment configuration, we can simply add the new database definition to the mapping:

new_mapping = Psych::Nodes::Mapping.new
new_mapping.children.concat mapping.children
new_mapping.children.concat [
  Psych::Nodes::Scalar.new(name),
  Psych::Nodes::Alias.new(name),
]
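
Here is a sketch that runs the two-tiered transformation end to end and emits the result. The helpers with_database and emit_fragment are hypothetical stand-ins (emit_fragment approximates emit_pair without the comment cache), and the sample config is assumed.

```ruby
require "psych"

# Hypothetical helper combining both branches; returns a new mapping
# and leaves the original node untouched.
def with_database(mapping, name)
  new_mapping = Psych::Nodes::Mapping.new
  first_key = mapping.children.first
  if first_key.is_a?(Psych::Nodes::Scalar) && first_key.value == "<<"
    # 2-tiered: nest the existing mapping under a `primary` key
    new_mapping.children.concat [Psych::Nodes::Scalar.new("primary"), mapping]
  else
    # 3-tiered: reuse the existing database entries as-is
    new_mapping.children.concat mapping.children
  end
  new_mapping.children.concat [Psych::Nodes::Scalar.new(name), Psych::Nodes::Alias.new(name)]
  new_mapping
end

# Emit a single key/value pair as a YAML fragment (same idea as emit_pair).
def emit_fragment(key, mapping)
  wrapper = Psych::Nodes::Mapping.new
  wrapper.children.concat [key, mapping]
  document = Psych::Nodes::Document.new
  document.children << wrapper
  stream = Psych::Nodes::Stream.new
  stream.children << document
  stream.yaml.sub(/\A---\s*/, "").strip
end

# Assumed sample config with a two-tiered development environment.
yaml = <<~YAML
  default: &default
    adapter: sqlite3
  development:
    <<: *default
    database: storage/development.sqlite3
YAML

key, mapping = Psych.parse_stream(yaml).children.first.root.children.last(2)
fragment = emit_fragment(key, with_database(mapping, "new"))
puts fragment
```

The emitted fragment should nest the original settings under primary and add the new alias as a sibling, while the parsed mapping keeps its original children.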

In both cases we work with a new AST node, instead of mutating the existing node, so that we can easily emit the YAML content for both the original and the new content. We can then replace the existing mapping with the new mapping in the YAML source via a simple find and replace. Putting it all together, the full add_database method looks like this:

def add_database(name)
  root = @stream.children.first.root
  root.children.each_slice(2).map do |scalar, mapping|
    next unless scalar.is_a?(Psych::Nodes::Scalar)
    next unless mapping.is_a?(Psych::Nodes::Mapping)
    next unless mapping.anchor.nil? || mapping.anchor.empty?
    # skip if the environment already has the database definition
    next if mapping.children.each_slice(2).any? do |key, value|
      key.is_a?(Psych::Nodes::Scalar) && key.value == name &&
        value.is_a?(Psych::Nodes::Alias) && value.anchor == name
    end

    new_mapping = Psych::Nodes::Mapping.new
    if mapping.children.first.value == "<<" # 2-tiered environment
      new_mapping.children.concat [
        Psych::Nodes::Scalar.new("primary"),
        mapping,
        Psych::Nodes::Scalar.new(name),
        Psych::Nodes::Alias.new(name),
      ]
    else # 3-tiered environment
      new_mapping.children.concat mapping.children
      new_mapping.children.concat [
        Psych::Nodes::Scalar.new(name),
        Psych::Nodes::Alias.new(name),
      ]
    end

    old_environment_entry = emit_pair(scalar, mapping)
    new_environment_entry = emit_pair(scalar, new_mapping)

    [scalar.value, old_environment_entry, new_environment_entry]
  end.compact # compact, not compact!, which returns nil when nothing was removed
end
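
The find and replace step itself is not shown in this post, so here is one way it might look. This is a sketch under assumptions: apply_replacements is a hypothetical helper, the triples are written by hand in the same shape that add_database returns, and it assumes the emitted old entry matches the source text exactly.

```ruby
# Hypothetical helper: apply [environment, old_entry, new_entry] triples
# to the raw database.yml source.
def apply_replacements(source, replacements)
  replacements.reduce(source) do |text, (_environment, old_entry, new_entry)|
    # A String pattern is matched literally; the block form of sub keeps
    # backslashes in new_entry from being read as backreferences.
    text.include?(old_entry) ? text.sub(old_entry) { new_entry } : text
  end
end

# Assumed source file content, including a comment the parser would discard.
source = <<~YAML
  # SQLite. Versions 3.8.0 and up are supported.
  default: &default
    adapter: sqlite3

  development:
    <<: *default
    database: storage/development.sqlite3
YAML

# Hand-written triple standing in for the return value of add_database("new").
replacements = [[
  "development",
  "development:\n  <<: *default\n  database: storage/development.sqlite3",
  "development:\n  primary:\n    <<: *default\n    database: storage/development.sqlite3\n  new: *new",
]]

updated = apply_replacements(source, replacements)
puts updated
```

Because only the matched entry is swapped out, everything else in the file, including the leading comment, is left byte-for-byte intact. If the old entry is not found (for example because the source uses different indentation), this sketch leaves the source unchanged rather than guessing.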

As you can see, simplifying the problem space and relying on the conventions of the database.yml file made this implementation approachable. Trying to account for every variation possible in generic YAML would have been a nightmare, to be frank. So, no, we can’t make arbitrary changes to arbitrary YAML content, but we can make the changes we need in a sufficiently robust and resilient manner. And, it took far less time to implement than trying to tackle the problem in a generic way.

All told, we have built a DatabaseYAML class that can parse a YAML file or string, emit partial YAML content, handle inline commented-out lines in the YAML content, define new database configurations, and add them to the environment configurations in the YAML content. And, we did all of this without mutating the YAML AST directly. We used the AST as a read-only data structure and built new AST nodes to represent the changes we wanted to make, which we then emit as strings so that updates can be done more surgically as find and replace operations.

Hopefully some of the lessons and techniques will prove useful to you if you ever need to manipulate YAML content in a similar way. And, if you have any questions or comments, please feel free to reach out to me on Twitter at @fractaledmind. In the following posts, we will turn from YAML parsing to some other aspects of building a high-quality Rails application template script, like testing and configuration.

  1. @stream here is an instance variable that holds the result of the Psych.parse_stream(yaml) call.