This is the second in a series of posts on building powerful and resilient Rails application templates. In the previous post, we discussed parsing and emitting YAML files with comments. Today, we will be looking at manipulating the `database.yml` file’s AST.
At the end of the previous post, we had a way to parse a YAML file or string and emit partial YAML content. We needed to emit partial YAML content to leave comment blocks in place. We also needed to ensure that our implementation could handle inline commented-out lines in the YAML content. To achieve all of this, we created a `DatabaseYAML` class:
```ruby
require "psych"

class DatabaseYAML
  COMMENT_REGEX = /^([ \t]+)#\s?(.*?)$/.freeze

  def initialize(yaml)
    @yaml = yaml
    @comment_cache = {}
  end

  def parse_yaml_with_comments(yaml)
    matchdata = yaml.match COMMENT_REGEX

    parseable = if matchdata
      commented, indentation, content = matchdata.to_a
      uncommented = indentation + content
      @comment_cache[uncommented] = commented
      yaml.gsub(commented, uncommented)
    else
      yaml
    end

    Psych.parse_stream(parseable)
  end

  def emit_pair(scalar, mapping)
    emission_stream = Psych::Nodes::Stream.new
    emission_document = Psych::Nodes::Document.new
    emission_mapping = Psych::Nodes::Mapping.new
    emission_mapping.children.concat [scalar, mapping]
    emission_document.children.concat [emission_mapping]
    emission_stream.children.concat [emission_document]

    # Non-bang methods: `gsub!` and `strip!` return nil when nothing changes
    output = emission_stream.yaml.gsub(/^---/, "").strip
    @comment_cache.each do |uncommented, commented|
      output.gsub!(uncommented, commented)
    end
    output
  end
end
```
In this post, let’s dig into some actual AST manipulations. One of the most important takeaways that I have had from this project is that limiting your problem space is key to making progress. In this case, we are only interested in manipulating the `database.yml` file. This means that we can make some assumptions about the structure of the file. We can also make some assumptions about the types of manipulations that we will need to make. This allows us to focus on the problem at hand and not get bogged down in the details of YAML parsing and emitting.
Trust me, trying to write a completely generic YAML manipulation engine is a fool’s errand. YAML is a complex format with many edge cases. By limiting our scope to a specific file and a specific set of manipulations, we can make progress much more quickly.
Defining a new database definition
The first manipulation to tackle is adding a new database definition to the `database.yml` file. This is a common task when setting up a new Rails application, especially when using SQLite, where having a separate database file for each IO-bound component helps avoid write contention. It is a simple task, but a good starting point for our manipulation engine.
Let’s start with an example of the YAML output we need to generate:
```yaml
name: &name
  <<: *default
  migrations_paths: db/name_migrate
  database: storage/<%= Rails.env %>-name.sqlite3
```
We want to generate a new top-level mapping that has an anchor (so that we can reference this mapping in our environment definitions), inherits the `default` database configuration, defines a separate directory to hold migrations for this database, and specifies the location of the database file.
To achieve this, we need to create a new method in our `DatabaseYAML` class. This method will take the name of the new database and return the YAML content for the new database definition. We will call this method `new_database`:
```ruby
def new_database(name)
  db = Psych::Nodes::Mapping.new(name)
  db.children.concat [
    Psych::Nodes::Scalar.new("<<"),
    Psych::Nodes::Alias.new("default"),
    Psych::Nodes::Scalar.new("migrations_paths"),
    Psych::Nodes::Scalar.new("db/#{name}_migrate"),
    Psych::Nodes::Scalar.new("database"),
    Psych::Nodes::Scalar.new("storage/<%= Rails.env %>-#{name}.sqlite3"),
  ]

  emit_pair(Psych::Nodes::Scalar.new(name), db)
end
```
A couple of key points to note here:
- When you create a new `Mapping` instance with a value, that value will become the anchor. A `Mapping` initialized with no value will be a “normal” mapping.
- We need to make sure to create an `Alias` node when referencing the `default` database configuration.
- The YAML AST structure uses a flat array of child nodes for a key-value mapping. Each consecutive pair of child nodes represents a key-value pair.
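To make these points concrete, here is a small sketch you can paste into IRB. The sample YAML and the variable names (`root`, `child_types`) are just for illustration and are not part of the `DatabaseYAML` class:

```ruby
require "psych"

# Illustrative sample, not a real database.yml
yaml = <<~YAML
  default: &default
    adapter: sqlite3
  new: &new
    <<: *default
    database: storage/new.sqlite3
YAML

root = Psych.parse_stream(yaml).children.first.root

# The root mapping stores keys and values in one flat array:
# [key, value, key, value, ...]
pairs = root.children.each_slice(2).to_a
key, value = pairs.last

puts key.value    # the key scalar for the second entry
puts value.anchor # the anchor declared with `&new`

# Inside the `new` mapping, `<<: *default` is a Scalar followed by an Alias
child_types = value.children.map { |node| node.class.name.split("::").last }
p child_types

# A Mapping built without arguments has no anchor; one built with a
# first argument uses that argument as its anchor
p Psych::Nodes::Mapping.new.anchor
p Psych::Nodes::Mapping.new("new").anchor
```

Walking the children with `each_slice(2)` is the same trick the rest of this post relies on.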
Once you know the structure of the AST, it is relatively straightforward to build up the AST for the new database definition. The `emit_pair` method will take care of emitting the YAML string for the new database definition. If you execute something like `puts new_database("new")` in an IRB console, you should see the following output:
```yaml
new: &new
  <<: *default
  migrations_paths: db/new_migrate
  database: storage/<%= Rails.env %>-new.sqlite3
```
Adding a new database definition to environment configurations
Once you have a database defined in the `database.yml` file, you need to “activate” it by adding it to the environment configurations. That means we need to make use of Rails’ multiple database support. A new database definition is inert until you tell Rails that this database should be used in a specific environment. In order to use multiple databases in a single environment, you have to define the environment(s) using a three-tiered configuration structure. This is the example given in the Rails Guides:
```yaml
production:
  primary:
    database: my_primary_database
    username: root
    password: <%= ENV['ROOT_PASSWORD'] %>
    adapter: mysql2
  primary_replica:
    database: my_primary_database
    username: root_readonly
    password: <%= ENV['ROOT_READONLY_PASSWORD'] %>
    adapter: mysql2
    replica: true
  animals:
    database: my_animals_database
    username: animals_root
    password: <%= ENV['ANIMALS_ROOT_PASSWORD'] %>
    adapter: mysql2
    migrations_paths: db/animals_migrate
  animals_replica:
    database: my_animals_database
    username: animals_readonly
    password: <%= ENV['ANIMALS_READONLY_PASSWORD'] %>
    adapter: mysql2
    replica: true
```
Under the `production` environment key, you have a hash of database name keys, and under each of those is a hash of configuration options. Now, this three-tiered structure can be simplified using YAML anchors and aliases. So, in the same way that the default `development` environment config inherits from the `default` database configuration:
```yaml
development:
  <<: *default
  database: storage/development.sqlite3
```
You can use the same technique to simplify a three-tiered environment configuration:
```yaml
development:
  primary:
    <<: *default
    database: storage/development.sqlite3
  new: *new
```
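If you want to convince yourself that the alias and the merge key resolve the way we expect, here is a quick sketch that loads such a file into plain hashes. The sample content is made up (with a hard-coded path in place of the ERB tag), and it assumes a Psych version where `Psych.safe_load` accepts the `aliases:` keyword:

```ruby
require "psych"

# Illustrative three-tiered configuration; values are assumptions
yaml = <<~YAML
  default: &default
    adapter: sqlite3
    pool: 5

  new: &new
    <<: *default
    migrations_paths: db/new_migrate
    database: storage/development-new.sqlite3

  development:
    primary:
      <<: *default
      database: storage/development.sqlite3
    new: *new
YAML

# `aliases: true` lets the loader follow `*default` and `*new`
config = Psych.safe_load(yaml, aliases: true)

primary = config.dig("development", "primary")
secondary = config.dig("development", "new")

puts primary["adapter"]            # inherited from `default`
puts primary["database"]           # defined on the environment itself
puts secondary["adapter"]          # inherited through `new`, then `default`
puts secondary["migrations_paths"] # defined on `new`
```

Both databases end up with the full set of options, even though the environment entry for the second database is a single line.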
So, this is the second transformation we need to implement. And, to keep the implementation simple, we will simply add the database to every environment defined in the `database.yml` file. Let’s add a new method to our `DatabaseYAML` class called `add_database` and get to work:
```ruby
class DatabaseYAML
  def add_database(name)
    # implementation goes here
  end
end
```
There are essentially 2 steps to this implementation:
- Find the environment configurations in the YAML AST.
- Add the new database definition to each environment configuration.
Let’s start with the first. How do we identify the environment configurations in the YAML? I don’t want to simply use the `development`, `test`, and `production` names, because what if an app has a `staging` environment defined? Or some other custom environment? So, if we can’t rely on names, what structure in the AST can we use to identify environment configurations?
Well, we know that the `database.yml` file only has two kinds of top-level mappings: database configurations and environment configurations. The database configurations are easy to identify because they have an anchor. So, the environment configurations are the ones that don’t have an anchor. This is the key insight we need to identify environment configurations in the YAML AST.
So, we want to get to the root node, iterate over the pairs of children, and select those pairs that are a `Scalar` plus `Mapping` pair where the `Mapping` doesn’t have an anchor. Here’s how you can do that:[1]
```ruby
def add_database(name)
  root = @stream.children.first.root
  root.children.each_slice(2).map do |scalar, mapping|
    next unless scalar.is_a?(Psych::Nodes::Scalar)
    next unless mapping.is_a?(Psych::Nodes::Mapping)
    next unless mapping.anchor.nil? || mapping.anchor.empty?

    # implementation goes here
  end.compact # `compact!` would return nil when there is nothing to remove
end
```
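Here is the same anchor-based filter as a standalone sketch, so you can see which keys it picks out. The sample file (including the custom `staging` environment) and the `environment_names` variable are made up for illustration:

```ruby
require "psych"

# Illustrative database.yml with a custom `staging` environment
yaml = <<~YAML
  default: &default
    adapter: sqlite3

  development:
    <<: *default
    database: storage/development.sqlite3

  staging:
    <<: *default
    database: storage/staging.sqlite3
YAML

root = Psych.parse_stream(yaml).children.first.root

environment_names = root.children.each_slice(2).map do |key, value|
  next unless key.is_a?(Psych::Nodes::Scalar)
  next unless value.is_a?(Psych::Nodes::Mapping)
  # Database definitions carry an anchor; environments do not
  next unless value.anchor.nil? || value.anchor.empty?

  key.value
end.compact

p environment_names
```

The `default` mapping is skipped because of its anchor, while `staging` is picked up without us ever naming it.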
With this in place, we can now focus on adding the new database definition to each environment configuration. But, we need to be thoughtful. This script can be run as a part of `rails new` or against an existing application. This means that the `database.yml` config might already be in a three-tiered structure. We need to handle both two-tiered and three-tiered environment configurations.
Again, we can rely on some conventions to help us navigate the AST and figure out which structure we are dealing with. In a two-tiered environment configuration, the first key-value pair in the mapping will have a scalar with the value `<<` (to inherit from the `default` database configuration). Otherwise, we can presume we are dealing with a three-tiered environment configuration:
```ruby
def add_database(name)
  root = @stream.children.first.root
  root.children.each_slice(2).map do |scalar, mapping|
    # ...
    if mapping.children.first.value == "<<" # 2-tiered environment
      # implementation goes here
    else # 3-tiered environment
      # implementation goes here
    end
  end.compact
end
```
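To see that check in isolation, here is a small sketch that classifies each environment in a sample file. The `tier_count` helper is hypothetical (it is not part of `DatabaseYAML`), and it leans on the same convention as above: a two-tiered environment starts with the `<<` merge key.

```ruby
require "psych"

# Illustrative file mixing both structures
yaml = <<~YAML
  default: &default
    adapter: sqlite3

  development:
    <<: *default
    database: storage/development.sqlite3

  production:
    primary:
      <<: *default
      database: storage/production.sqlite3
    cache:
      <<: *default
      database: storage/production-cache.sqlite3
YAML

# Hypothetical helper: 2 when the mapping starts with `<<`, otherwise 3
def tier_count(mapping)
  first_key = mapping.children.first
  if first_key.is_a?(Psych::Nodes::Scalar) && first_key.value == "<<"
    2
  else
    3
  end
end

root = Psych.parse_stream(yaml).children.first.root

structure = {}
root.children.each_slice(2) do |key, value|
  next unless value.is_a?(Psych::Nodes::Mapping)
  next unless value.anchor.nil? || value.anchor.empty?

  structure[key.value] = tier_count(value)
end

p structure
```

Note that a two-tiered environment that lists its options directly, without starting from `<<`, would be treated as three-tiered here. That is the trade-off of relying on the convention.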
When dealing with a two-tiered environment configuration, we need to shift the whole environment configuration to a three-tiered config and then add the new database definition to the mapping. We can do this by making the existing mapping the value of a `primary` key, then adding a key-value alias for the database beneath that:
```ruby
new_mapping = Psych::Nodes::Mapping.new
new_mapping.children.concat [
  Psych::Nodes::Scalar.new("primary"),
  mapping,
  Psych::Nodes::Scalar.new(name),
  Psych::Nodes::Alias.new(name),
]
```
When dealing with a three-tiered environment configuration, we can simply add the new database definition to the mapping:
```ruby
new_mapping = Psych::Nodes::Mapping.new
new_mapping.children.concat mapping.children
new_mapping.children.concat [
  Psych::Nodes::Scalar.new(name),
  Psych::Nodes::Alias.new(name),
]
```
In both cases we work with a new AST node, instead of mutating the existing node, so that we can easily emit the YAML content for both the original and the new content. We can then replace the existing mapping with the new mapping in the YAML source via a simple find and replace. Putting it all together, the full `add_database` method looks like this:
```ruby
def add_database(name)
  root = @stream.children.first.root
  root.children.each_slice(2).map do |scalar, mapping|
    next unless scalar.is_a?(Psych::Nodes::Scalar)
    next unless mapping.is_a?(Psych::Nodes::Mapping)
    next unless mapping.anchor.nil? || mapping.anchor.empty?

    # skip if the environment already has the database definition
    already_defined = mapping.children.each_slice(2).any? do |key, value|
      key.is_a?(Psych::Nodes::Scalar) &&
        key.value == name &&
        value.is_a?(Psych::Nodes::Alias) &&
        value.anchor == name
    end
    next if already_defined

    new_mapping = Psych::Nodes::Mapping.new
    if mapping.children.first.value == "<<" # 2-tiered environment
      new_mapping.children.concat [
        Psych::Nodes::Scalar.new("primary"),
        mapping,
        Psych::Nodes::Scalar.new(name),
        Psych::Nodes::Alias.new(name),
      ]
    else # 3-tiered environment
      new_mapping.children.concat mapping.children
      new_mapping.children.concat [
        Psych::Nodes::Scalar.new(name),
        Psych::Nodes::Alias.new(name),
      ]
    end

    old_environment_entry = emit_pair(scalar, mapping)
    new_environment_entry = emit_pair(scalar, new_mapping)

    [scalar.value, old_environment_entry, new_environment_entry]
  end.compact # `compact!` would return nil when no entries were skipped
end
```
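The find-and-replace step itself is not shown above, so here is one way it could look. This is only a sketch: the `replacements` array is hard-coded to mimic the `[name, old_entry, new_entry]` triples that `add_database` returns, and the sample source is made up.

```ruby
# Illustrative source file, including a comment we want to keep
source = <<~YAML
  # SQLite. Versions 3.8.0 and up are supported.
  default: &default
    adapter: sqlite3

  development:
    <<: *default
    database: storage/development.sqlite3
YAML

# Stand-ins for the strings that `emit_pair` would produce
old_entry = "development:\n  <<: *default\n  database: storage/development.sqlite3"
new_entry = <<~YAML.strip
  development:
    primary:
      <<: *default
      database: storage/development.sqlite3
    new: *new
YAML

replacements = [["development", old_entry, new_entry]]

updated = replacements.reduce(source) do |yaml, (_environment, old_text, new_text)|
  # The block form avoids treating backslashes in the replacement as escapes
  yaml.sub(old_text) { new_text }
end

puts updated
```

This only works when the emitted old entry matches the source text exactly (indentation, quoting, and so on), which is one more reason to keep the set of supported inputs narrow. Everything outside the replaced entry, including the leading comment, is left untouched.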
As you can see, simplifying the problem space and relying on the conventions of the `database.yml` file made this implementation approachable. Trying to account for every variation possible in generic YAML would have been a nightmare, to be frank. So, yes, we can’t mutate arbitrary YAML content in arbitrary ways, but we can make the changes we need in a sufficiently robust and resilient manner. And, it took far less time to implement than trying to tackle the problem in a generic way.
All told, we have built a `DatabaseYAML` class that can parse a YAML file or string, emit partial YAML content, handle inline commented-out lines in the YAML content, define new database configurations, and add them to the environment configurations in the YAML content. And, we did all of this without mutating the YAML AST directly. We used the AST as a read-only data structure and built new AST nodes to represent the changes we wanted to make, which we then emit as strings so that updates can be done more surgically as find and replace operations.
Hopefully some of the lessons and techniques will prove useful to you if you ever need to manipulate YAML content in a similar way. And, if you have any questions or comments, please feel free to reach out to me on Twitter at @fractaledmind. In the following posts, we will turn from YAML parsing to some other aspects of building a high-quality Rails application template script, like testing and configuration.
[1] `@stream` here is an instance variable that holds the result of the `Psych.parse_stream(yaml)` call.