As you might know, some popular Ruby gems, like Nokogiri and Puma, come with native extensions which back their features and/or their performance. These native extensions are usually written in C, just like Ruby MRI itself. But extensions can be written in other languages, too!

At HAUSGOLD we implement our backend services in Ruby and Rails. This enables us to ship features fast and build highly reliable, maintainable software, but it comes at a performance cost. On balance, that trade-off still works in our favor. And it’s not just us riding the Ruby train: much bigger companies like GitHub, Shopify, and Zendesk do so as well. So we’re in good company.

Luckily, these performance costs can be reduced with native Ruby extensions. Ruby MRI has long allowed us to access functionality from non-Ruby libraries. This is made possible by a mechanism called the foreign function interface (FFI), which allows a program written in one programming language to access routines or services written in another.

Classically, these native Ruby extensions are written in the C programming language. This is a natural choice because shared system libraries offer a C interface to their functionality. So it is convenient for Ruby gem developers to build bridges to these libraries and, at the same time, benefit from the performance of natively compiled C. But writing good and safe code in C is laborious. There are many ways to mishandle memory or function side effects, resulting in vulnerabilities or memory leaks.

The Rust programming language, on the other hand, comes with an awesome memory safety concept built into the language, while offering runtime performance comparable to C. A lot of effort was put into the language design to prevent common developer mistakes, which makes it a great systems programming language. And it’s straightforward to build and access functionality written in Rust from the Ruby context with the fantastic Rutie Rust crate.

The Use Case™

Our architecture is driven by distributed services that rely on authorization feedback to either perform the requested operations on their entities or reject the request. As you can imagine, the lower the authorization overhead, the better the overall performance of the platform.

Therefore, over the last six months, our platform team worked hard on optimizing our access control subsystem, a hand-rolled RBAC implementation. Within this scope, we identified a critical component of the system and rewrote it in Rust.

For the access control subsystem, we heavily rely on so-called GlobalIDs. These identifiers are application- and entity-aware, so they are perfect for referencing things within our distributed system architecture. They follow this schema: gid://app/entity/id. As you can see, this forms a valid URI. With a custom GlobalID locator, one can easily implement one's own logic to fetch data from remote applications/services. Here is an example of such a GlobalID from our ecosystem: gid://identity-api/User/109ecc20-7886-404a-86fe-577f9dbe4300

The vanilla purpose of a GlobalID does not require URI query parameters or fragments. But as mentioned, a GlobalID forms a URI, so it allows both. We also made use of the fact that the entity identifier can be anything: an auto-incremented integer, a UUID, or any character sequence you like. All these details allowed us to implement flexible and extensible references in the context of access control policies. Check this out:

  • gid://identity-api/User/all, selects all users
  • gid://all/all/all, selects everything, from anywhere
  • gid://identity-api/User/self, selects the user within a request context
  • gid://calendar-api/Task/assigned?through=gid://identity-api/User/self, selects all tasks which are directly assigned to the user
  • gid://asset-api/Asset/assigned?through=gid://calendar-api/Task/assigned, selects all assets which are related to all tasks that are assigned to the dynamic context (e.g. the user is assigned to a task, so they are able to access the assets of the task without the need to reference them directly)

All the extra handling (evaluated GlobalID properties, query parameter interpretation, etc.) forms our entity reference component. And this is the critical component we rewrote in Rust. (Spoiler: this was the third iteration.)

Performance Reasons™

The first version of this entity reference component was based on the URI::GID Ruby class (inheriting from URI::Generic), which is shipped by the GlobalID gem. This version was super slow at parsing, as URI::Generic runs a lot of validations. That’s great for user-facing inputs, like individual API parameters, but wasteful on a large set of inputs you already know are valid.

So the second version was a hand-rolled GlobalID implementation with straightforward string parsing, without any validation or seatbelts. Why not? When you know the input is valid, slice the input string to get rid of the gid:// prefix, split it by the slash character into three parts, and you’re done. (Yeah, and handle potential query parameters, you know the drill.)

def self.parse(input)
  # .. other clever shortcuts

  # Split the individual components from the URI
  app, model_name, rest = input[6..-1].split('/', 3)
  model_id, params = rest.split('?', 2)
  params = (params || '').split('&').map { |param| param.split('=', 2) }.to_h

  # Construct a new instance with the extracted components
  new(app: app, model_name: model_name, model_id: model_id, params: params)
end

The first version was able to perform 6,631.0 i/s (iterations per second). The second version, shown above, reached 32,245.3 i/s, 4.86x faster than its predecessor. Still, processing strings this way comes with many allocations, array constructions, array decompositions, and so on, which results in suboptimal performance.

For serialization, we used simple Ruby string interpolation ("gid://#{@app}/#{@model_name}/#{@model_id}"), which resulted in 112,540.9 i/s. Not bad, but no improvement over the original URI::Generic version.

As mentioned earlier, another approach to speed things up is the implementation of the feature in a Ruby native extension. So we set up an application-inlined Ruby Gem called Restless, and we started playing around with Rust and Rutie.

Matters of (implementation) detail

A great advantage of this project was the fact that we had a solid test suite with dozens of examples in place, as well as a benchmarking suite to measure performance across the different implementation versions. Another welcome fact was the straightforward setup of the Rutie Rust crate with the Thermite Ruby gem.

With this setup, it was just a matter of minutes to have the Rust code in place that forms our entity reference component, callable from the Ruby context. Then the real fun started: implementing each class method in Rust and trying out some ideas.
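The simplest of these class methods is serialization. Ignoring the Rutie glue code, a minimal sketch of it in plain Rust could look like the following; the struct and method names here are illustrative assumptions, not the actual Restless internals:

```rust
// Minimal sketch: serializing an entity reference back into a GlobalID
// string with format!, mirroring the Ruby string interpolation.
// Struct and method names are illustrative assumptions.
struct EntityReference {
    app: String,
    model_name: String,
    model_id: String,
}

impl EntityReference {
    fn to_gid_string(&self) -> String {
        format!("gid://{}/{}/{}", self.app, self.model_name, self.model_id)
    }
}

fn main() {
    let user = EntityReference {
        app: "identity-api".to_string(),
        model_name: "User".to_string(),
        model_id: "109ecc20-7886-404a-86fe-577f9dbe4300".to_string(),
    };
    println!("{}", user.to_gid_string());
    // gid://identity-api/User/109ecc20-7886-404a-86fe-577f9dbe4300
}
```

A single format! call builds the string in one pass, which is roughly where the later serialization speedup comes from.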

We implemented the GlobalID parsing seven times, with seven different approaches. All of them performed better than our second Ruby version and passed the test suite:

  • Regular expression with capture groups (via regex Rust Crate), running a second regex if query parameters were available (1.63x faster)
  • Using the url Rust Crate (Url::parse(), url.host_str(), url.path_segments(), url.query_pairs()) (1.95x faster)
  • A re-implementation of the second Ruby version, string splitting, and re-joining (2.98x faster)
  • Searching for the query parameter split character (index of), performing parameter splitting if needed, then building sliced references for the properties (app, model name, model id) instead of manipulating strings (e.g. splitting) (3.47x faster)
  • Just using string split+collect for the query parameters and the properties, then checking array counts and each property for zero length afterwards (3.52x faster)
  • Iterating character by character while filling and switching string buffers (3.72x faster)
  • Searching for the next split character (index of) and splitting at that index accordingly (.strip_prefix("gid://"), .find('/'), .split_at(idx)) (4.08x faster)
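The winning approach from the list above can be sketched in plain Rust, outside the Rutie glue, roughly like this. The function shape, return type, and names are assumptions for illustration, not the actual Restless code:

```rust
// Hedged sketch of the fastest approach: walk the string with find()
// and split_at() and keep zero-copy slices into the input instead of
// allocating intermediate vectors for every component.
fn parse(input: &str) -> Option<(&str, &str, &str, Vec<(&str, &str)>)> {
    // Drop the scheme prefix; bail out on anything that is not a GlobalID
    let rest = input.strip_prefix("gid://")?;

    // Application component
    let idx = rest.find('/')?;
    let (app, rest) = rest.split_at(idx);
    let rest = &rest[1..];

    // Model name component
    let idx = rest.find('/')?;
    let (model_name, rest) = rest.split_at(idx);
    let rest = &rest[1..];

    // Model id, optionally followed by a query string
    let (model_id, query) = match rest.find('?') {
        Some(idx) => {
            let (id, q) = rest.split_at(idx);
            (id, &q[1..])
        }
        None => (rest, ""),
    };

    // Decompose the query string into key/value pairs
    let params = query
        .split('&')
        .filter(|pair| !pair.is_empty())
        .map(|pair| match pair.find('=') {
            Some(idx) => (&pair[..idx], &pair[idx + 1..]),
            None => (pair, ""),
        })
        .collect();

    Some((app, model_name, model_id, params))
}

fn main() {
    let (app, model_name, model_id, params) =
        parse("gid://calendar-api/Task/assigned?through=gid://identity-api/User/self").unwrap();
    assert_eq!(app, "calendar-api");
    assert_eq!(model_name, "Task");
    assert_eq!(model_id, "assigned");
    assert_eq!(params, [("through", "gid://identity-api/User/self")]);
}
```

Because every component is a slice borrowed from the input, the only heap allocation is the small vector of parameter pairs, which matches the intuition of why this variant came out on top.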

With the findings made along the way with Rust and Rutie, we were able to make the GlobalID parsing ~20x faster than the original implementation. Check out the numbers:

GlobalID parsing:

Ruby:v1 EntityReference.parse:   6,631.0 i/s
Ruby:v2 EntityReference.parse:  32,245.3 i/s -  4.9x faster
Rust:v1 EntityReference.parse: 131,560.8 i/s - 19.8x faster (4.1x faster than Ruby:v2)

GlobalID serialization:

Ruby:v1 EntityReference#to_s: 111,950.9 i/s
Ruby:v2 EntityReference#to_s: 112,540.9 i/s - same-ish: difference falls within error
Rust:v1 EntityReference#to_s: 254,453.4 i/s - 2.3x faster

Performance across all methods (summed up):

Ruby:v1 EntityReference#(all methods): 5,152,099.5 i/s
Ruby:v2 EntityReference#(all methods): 6,463,401.8 i/s - 1.3x faster
Rust:v1 EntityReference#(all methods): 9,338,239.1 i/s - 1.8x faster

In Rust we trust

As mentioned earlier, we worked on the optimization of our access control subsystem, and this component rewrite was just one of many subprojects within this scope. But it was an excellent opportunity to try something new, and it turned out very well! We’ve been running this code on our production cluster for a few weeks now, without any issues, so it’s safe to call it battle-tested.

Remember the Rust memory safety concept I mentioned in the beginning? The learning curve paid off vastly, as it wipes out almost all concerns about memory leaks. That is something you care about as a Ruby connoisseur, and it lets you sleep well.

I imagine we’ll convert more dedicated components to Rust in the future. But moving all our applications/libraries to Rust isn’t something we are currently considering. Rust enhances our ecosystem best when it comes into play in critical places like the use case described here.

Our access control subsystem optimization includes a lot of other interesting subprojects, like the rewrite of our resolve machine with the help of Neo4j. More blog posts will follow, so stay tuned!