An Analysis of Custom Rails Serializers vs. Conditional Statements
There are scenarios where a backend server should return nested data in response to one GET request but omit it from another. The common way to handle this in Rails is with a serializer.
For this post's example, I'm going to use a Rails practice code challenge called "Camping Fun". In Camping Fun, the following relationships exist:
- A Camper has many Signups, and has many Activities through Signups.
- An Activity has many Signups, and has many Campers through Signups.
- A Signup belongs to a Camper and belongs to an Activity.
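In the app's models, these relationships are declared with Active Record macros. A sketch with assumed class names (the challenge's actual files may differ slightly):

```ruby
# Assumed model declarations matching the relationships above.
class Camper < ApplicationRecord
  has_many :signups
  has_many :activities, through: :signups
end

class Activity < ApplicationRecord
  has_many :signups
  has_many :campers, through: :signups
end

class Signup < ApplicationRecord
  belongs_to :camper
  belongs_to :activity
end
```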
The end result requires that the Campers index action does NOT display any nested data, while the show action should display all related activities.
We can accomplish this easily by having two serializers. In the first, we will simply have the following code:
class CamperSerializer < ActiveModel::Serializer
  attributes :id, :name, :age
end
This is paired with a second serializer, nearly identical but with a has_many added:
class CamperActivitiesSerializer < ActiveModel::Serializer
  attributes :id, :name, :age
  has_many :activities
end
In our CampersController, we will have the following code:
def index
  render json: Camper.all, status: 200
end

def show
  render json: Camper.find(params[:id]), serializer: CamperActivitiesSerializer, status: 200
end
By following this convention, we receive the proper response from each endpoint.
At the /campers endpoint, no nested data:
[
  {
    "id": 1,
    "name": "Caitlin",
    "age": 8
  },
  {
    "id": 2,
    "name": "Lizzie",
    "age": 9
  },
  {
    "id": 3,
    "name": "Tom",
    "age": 12
  }
]
While still displaying the appropriate nested data at the /campers/:id endpoint:
{
  "id": 1,
  "name": "Caitlin",
  "age": 8,
  "activities": [
    {
      "id": 4,
      "name": "Arts & Crafts",
      "difficulty": 5
    }
  ]
}
The only problem: it bothered me to have two separate serializers with so much redundancy. The sole difference between them is the has_many :activities line.
I decided to see if I could find a way to do this without creating an entirely new serializer for this sole purpose. After scouring the ActiveModel::Serializer documentation for a solution, I found instance_options:
#instance_options ⇒ Returns the value of attribute instance_options.
With this, I thought I might have a way to solve the issue. What if I could set an instance option that would display has_many :activities only under certain conditions?
With that thought, I returned to the CampersController and edited my show method:
def show
  render json: Camper.find(params[:id]), show_activities: true, status: 200
end
Next, I went to the original CamperSerializer to make the following changes:
class CamperSerializer < ActiveModel::Serializer
  attributes :id, :name, :age
  has_many :activities, if: :activities?

  def activities?
    @instance_options[:show_activities]
  end
end
After testing this, I found it worked exactly as intended: if the :show_activities instance option was true, the nested data displayed; if it was false or absent entirely, it did not.
I had accomplished exactly what I wanted, and the code was now "cleaner" in my mind. Then one of my instructors from Flatiron School told me she was curious about the difference in run times between the custom-serializer approach and the instance-option approach. Initially I was unsure how to set up such a test, until she reminded me that the Rails server logs the information needed to gather the data.
A quick glance at my server logs showed one simple line at the end of every request: Completed 200 OK in 6ms (Views: 3.8ms | ActiveRecord: 0.5ms | Allocations: 1812)
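Extracting those two timings from a log line takes only a regex. Here's a sketch, not the author's actual parser:

```ruby
# Pull the Views and ActiveRecord timings (in ms) out of a completed-request
# log line like the one shown above.
line = "Completed 200 OK in 6ms (Views: 3.8ms | ActiveRecord: 0.5ms | Allocations: 1812)"

if (match = line.match(/Views: ([\d.]+)ms \| ActiveRecord: ([\d.]+)ms/))
  views_ms = match[1].to_f          # => 3.8
  active_record_ms = match[2].to_f  # => 0.5
end
```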
My mind quickly turned to figuring out how to test this and answer the question.
First, I created a new Ruby project that would eventually become a log parser. The inner workings of the parser are a topic for another post; the short version is that it produces an array of all of the times reported by the logs. From that, I wrote a method to aggregate the data and, in the end, was left with an array of hashes following this pattern:
{
  "views_time" => FLOAT,
  "avg_views_time" => FLOAT,
  "active_record_time" => FLOAT,
  "avg_active_record_time" => FLOAT,
  "iterations" => INT
}
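Given an array of parsed (views, ActiveRecord) time pairs, a summary in that shape might be built like this (the method name and sample numbers are mine, not the author's parser):

```ruby
# Build the summary hash from [views_ms, active_record_ms] pairs parsed
# out of the server log.
def summarize(samples)
  views_total = samples.sum { |views, _ar| views }
  ar_total    = samples.sum { |_views, ar| ar }
  count       = samples.length

  {
    "views_time"             => views_total,
    "avg_views_time"         => views_total / count,
    "active_record_time"     => ar_total,
    "avg_active_record_time" => ar_total / count,
    "iterations"             => count
  }
end

summary = summarize([[3.0, 0.5], [5.0, 0.5]])
# summary["avg_views_time"] => 4.0
```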
The hard part was finished at this point; now I just needed to collect data. Initially, I used a small sample of 519 requests and got the following results (all times in milliseconds):
***********ONE SERIALIZER***************
{
  'views_time' => 2008.1000000000006,
  'avg_views_time' => 3.8691714836223516,
  'active_record_time' => 197.6000000000012,
  'avg_active_record_time' => 0.38073217726397146,
  'iterations' => 519,
}
***********TWO SERIALIZER***************
{
  'views_time' => 1902.6,
  'avg_views_time' => 3.665895953757225,
  'active_record_time' => 193.00000000000108,
  'avg_active_record_time' => 0.3718689788053971,
  'iterations' => 519,
}
This sample size wasn't very large, but it seemed to show a slight slowdown with the new instance-options method compared to the old two-serializer method. Combining the average views and ActiveRecord times, the difference between one serializer and two serializers is 0.212 milliseconds on average. That's 0.000212 seconds.
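As a sanity check on that arithmetic, the per-request gap falls straight out of the averages above:

```ruby
# Combined average per-request time (views + ActiveRecord) for each
# approach, taken from the 519-request sample above, in milliseconds.
one_serializer  = 3.8691714836223516 + 0.38073217726397146
two_serializers = 3.665895953757225 + 0.3718689788053971

difference = one_serializer - two_serializers
puts difference.round(3) # ≈ 0.212 ms
```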
I needed more data, though.
I had gathered the first 519 requests manually, but I'm a developer and shouldn't be doing manual tasks like that. It was time to automate, so I wrote a small method to gather logs at a faster pace.
With a new method that would issue as many requests as I specified, I gathered 5000 data points and found the following:
***********ONE SERIALIZER***************
{
  "views_time" => 4158.599999999943,
  "avg_views_time" => 0.8317199999999886,
  "active_record_time" => 428.9000000000289,
  "avg_active_record_time" => 0.08578000000000578,
  "iterations" => 5000
}
***********TWO SERIALIZER***************
{
  "views_time" => 4132.499999999936,
  "avg_views_time" => 0.8264999999999872,
  "active_record_time" => 454.7000000000348,
  "avg_active_record_time" => 0.09094000000000696,
  "iterations" => 5000
}
Adding the total averages together again, we see that over 5000 iterations the discrepancy has dropped even further from the original sample: at this point, the difference is only 0.00006 milliseconds on average. I'd say that's worth the sacrifice for the cleaner version. This is a very basic test, though, and I don't know how the time difference might scale on larger databases.
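The same arithmetic on the 5000-request sample shows just how small the gap has become:

```ruby
# Combined average per-request time (views + ActiveRecord) from the
# 5000-request sample, in milliseconds.
one_serializer  = 0.8317199999999886 + 0.08578000000000578
two_serializers = 0.8264999999999872 + 0.09094000000000696

difference = one_serializer - two_serializers
puts difference # ≈ 0.00006 ms
```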
Let me know what you think!
[EDIT 12/18/2021]: I wanted another way to analyze the results without relying on the server logs directly, so this time I created a bash script to issue the GET requests.
#!/bin/bash

filename=results.txt
counter=0

while [ $counter -lt 5000 ]
do
  counter=$((counter+1))
  curl -s -w "%{time_total}\n" -o /dev/null http://localhost:3000/campers/1 >> $filename
done
This script rapidly issued 5000 requests to my Rails server, capturing the total time for each and appending that number (in seconds) to a results file. Using JavaScript, I loaded the results file and used a reducer to add the times together.
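That step was done in JavaScript; for consistency with the rest of this post, here is an equivalent reduction in Ruby (the sample timings are made up for illustration — in practice results.txt comes from the curl loop above):

```ruby
# Write a few sample timings so this sketch is self-contained; the real
# file holds one time (in seconds) per line, produced by the curl loop.
File.write("results.txt", "0.0081\n0.0079\n0.0102\n")

# Reduce the per-request times down to a single total, mirroring the
# JavaScript reducer described in the post.
total = File.readlines("results.txt").sum(&:to_f)
puts "Total time: #{total.round(4)} seconds"
```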
Using this method of testing for 5000 requests, I found:
Total Time - Single Serializer: 48.1317 seconds
Total Time - Two Serializers: 40.6091 seconds
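Dividing the gap between those totals by the request count gives the average difference per request:

```ruby
single_total = 48.1317 # seconds, one serializer using instance_options
double_total = 40.6091 # seconds, two serializers

# Average difference per request, converted to milliseconds.
per_request_ms = (single_total - double_total) / 5000 * 1000
puts per_request_ms.round(2) # ≈ 1.5 ms per request
```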
I think this is a more accurate testing method, and it makes the actual difference much clearer by capturing the total time from request to response.
Thanks for reading this! I hope this helps you. If you enjoyed the content, please check back and also follow my journey on Twitter!