Tech: Autocomplete with Rails & Mongoid
Hi everyone. This is the inaugural post in our Dwellable Tech series, which highlights the technology powering Dwellable. We're lucky to have a great team here and we want to share what we've learned.
You might have noticed that we recently added a search box on the right side of our Dwellable page header. I've built sophisticated search boxes for several popular web sites (including Urbanspoon) so I know how to build these in a hurry.
Before we start in on the technical side, let me state a few UX precepts:
Users hate search boxes - Users have been trained to distrust search boxes on web sites. They wonder why your search isn't as good as Google. After getting burned over and over by bad search boxes, users become very hesitant to use them.
Users love autocomplete - Users hate search, but they love autocomplete. It provides instant feedback and gives a sense of confidence that the thing is working.
Therefore... You might not need actual searching at all. Why not start with autocomplete? That's the approach we've taken at Dwellable. You can't actually search for things like "three bedrooms" or "a nice house on the beach" with our search box. But you can use autocomplete to zero in on vacation destinations or specific listings, which is what users want the vast majority of the time.
I built the Dwellable autocomplete in one business day. Here's how I did it.
1. How does autocomplete work?
We'll add an autocomplete key to each of our models. The autocomplete key is a normalized string that we can query efficiently. The string is derived from a field on your object (its name, say), since that's what the user is searching for.
Autocomplete keys have normalized whitespace, and always begin with a space. That way we can quickly match word prefixes. For example, if our autocomplete keys are:
(note underscore to represent space)
...
_CAPE_CANAVERAL
_CAPE_COD
_CAPE_HATTERAS
_SOMEWHERE_IN_CAPE_COD
...
And the user is typing:
CAP
CAPE
CAPE_
CAPE_C
CAPE_CO
CAPE_COD
You can find the right matches with these queries:
Place.where(autocomplete: /_CAP/)
Place.where(autocomplete: /_CAPE/)
Place.where(autocomplete: /_CAPE_/)
Place.where(autocomplete: /_CAPE_C/)
Place.where(autocomplete: /_CAPE_CO/)
Place.where(autocomplete: /_CAPE_COD/)
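To see why the leading space matters, here is a minimal in-memory sketch of the same matching, with a plain Ruby array standing in for the Place collection. The matching_keys helper and the ESCAPE ISLAND key are made up for illustration and aren't part of the Dwellable code.

```ruby
# Autocomplete keys as they would be stored (real spaces, not underscores).
# " ESCAPE ISLAND" is a hypothetical extra key added to show a non-match.
KEYS = [
  " CAPE CANAVERAL",
  " CAPE COD",
  " CAPE HATTERAS",
  " SOMEWHERE IN CAPE COD",
  " ESCAPE ISLAND"
].freeze

# Hypothetical helper: return the keys in which some word starts with `typed`.
# Prepending a space means we only ever match at the start of a word.
def matching_keys(keys, typed)
  pattern = Regexp.new(Regexp.escape(" #{typed}"))
  keys.grep(pattern)
end

p matching_keys(KEYS, "CAP")
p matching_keys(KEYS, "CAPE CO")
```

The first call returns the four CAPE keys but not ESCAPE ISLAND, because CAPE there is not at the start of a word. The second returns only the two CAPE COD keys.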
2. Normalized Keys
The important concept here is normalization - processing our searchable object names and the user's query string using the same function, so we can efficiently find matches. First, the mixin for our models:
module Autocomplete
  extend ActiveSupport::Concern

  included do
    field :autocomplete
    before_save :generate_autocomplete
  end

  # callback to populate :autocomplete
  def generate_autocomplete
    # you'll have to customize this
    s = self.name
    s = s.truncate(45, omission: "", separator: " ") if s.length > 45
    write_attribute(:autocomplete, Autocomplete.normalize(s))
  end

  # turn strings into autocomplete keys
  def self.normalize(s)
    s = s.upcase
    s = s.gsub("'", "")
    s = s.gsub("&", " AND ")
    s = s.gsub(/[^A-Z0-9 ]/, " ")
    # drop THE as a whole word; replacing it with a space keeps its neighbors apart
    s = s.gsub(/\bTHE\b/, " ")
    s = s.squish
    s = " #{s}"
    s
  end
end

# a sample model
class Place
  include Mongoid::Document
  include Autocomplete
  field :name
end
We add an autocomplete key to our model and a callback to populate it. The autocomplete key is a normalized version of the searchable name for your object. In this case I'm using self.name, but you'll need to customize this for your app.
The name is passed to Autocomplete.normalize, which normalizes the string and turns it into an autocomplete key. The normalization performed here is actually quite simple - upcase, remove stopchars/stopwords, fix whitespace, etc. For example, we remove THE so it will be completely ignored during our searches. The normalization function used at Urbanspoon was more elaborate and had to handle the dreaded Ben and Jerry's search (or Ben and Jerries or Ben & Jerys or ...)
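To make the steps concrete, here is a plain-Ruby sketch of the same kind of normalization that runs without Rails. It is an approximation, not the Dwellable code: normalize_key is a hypothetical name, split/join stands in for ActiveSupport's squish, and it assumes THE should be dropped as a whole word. The sample names are invented.

```ruby
# Plain-Ruby sketch of the normalization steps, without ActiveSupport.
# normalize_key is a hypothetical name used only for this example.
def normalize_key(name)
  s = name.upcase
  s = s.delete("'")                          # JERRY'S -> JERRYS
  s = s.gsub("&", " AND ")                   # & -> AND
  s = s.gsub(/[^A-Z0-9 ]/, " ")              # everything else becomes a space
  words = s.split.reject { |w| w == "THE" }  # assumption: drop THE as a whole word
  " #{words.join(' ')}"                      # single spaces, plus the leading space
end

p normalize_key("Ben & Jerry's")
p normalize_key("The  Cape-Cod Inn")
```

Both object names and user queries go through the same function, so "Ben & Jerry's" and "ben and jerrys" end up as the same key.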
3. Data Migration
Now let's populate our models. Here's a simple example script to get you started:
Place.all.each { |i| i.save! }
That might be slow if you have a lot of objects. You can dramatically improve performance by dropping down into the Mongo driver (instead of Mongoid) and only loading the fields required to generate your keys. Something like this:
Place.all.only(:name).each do |i|
  i.send(:generate_autocomplete)
  Place.collection.update({ _id: i.id }, { "$set" => { autocomplete: i.autocomplete } })
end
If I remember correctly, the optimized version was about 10x faster than the naive version.
Pro tip: use the progressbar gem. You can add it while you wait for your first test migration to complete.
4. Queries and Performance
Now that we've created our autocomplete keys, how do we query for documents? Something like this will do the trick:
module Autocomplete
  def self.search(query)
    # to_s guards against a missing q param (nil)
    query = normalize(query.to_s)
    return [] if query.blank?
    Place.where(autocomplete: /#{query}/).asc(:name).limit(10)
  end
end
This is the simplest possible example - your implementation will likely be more complicated. A few things to note:
- don't forget to normalize - Gotta normalize those incoming queries. Otherwise the query won't match the keys.
- handle the empty case - What happens if the query is empty? It'll return all records unless we short circuit the empty case.
- escaping (not) - The normalization function takes care of escaping the query, so I can pass it into the regex directly without escaping.
- sort order - If more than 10 records match, which 10 do you want? On Dwellable we actually sort by # of rentals, so that larger destinations are returned first. That's useful for short queries. When the user types K we want to return KAUAI, not KANSAS CITY.
In actuality, the Dwellable search queries a number of models - first destinations, then rental listings. That's why this method lives in Autocomplete instead of a model.
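The notes above (normalize, short-circuit the empty query, rank by popularity) can be sketched in memory without Mongo. This is a rough illustration under stated assumptions, not the Dwellable implementation: PlaceRecord, normalize_query and search are hypothetical names, the normalization is simplified, and only Kauai's rental count comes from this post.

```ruby
# Hypothetical in-memory record standing in for a Mongoid document.
PlaceRecord = Struct.new(:name, :key, :rentals)

# Sample data. Only Kauai's count appears in this post; the others are invented.
PLACES = [
  PlaceRecord.new("Kansas City", " KANSAS CITY", 12),
  PlaceRecord.new("Kauai", " KAUAI", 1533),
  PlaceRecord.new("Key West", " KEY WEST", 400)
].freeze

# Simplified stand-in for Autocomplete.normalize.
def normalize_query(query)
  " #{query.to_s.upcase.gsub(/[^A-Z0-9 ]/, ' ').split.join(' ')}"
end

def search(records, query, limit = 10)
  key = normalize_query(query)
  return [] if key.strip.empty?          # short-circuit the empty case
  pattern = Regexp.new(Regexp.escape(key))
  records.select { |r| r.key =~ pattern }
         .sort_by { |r| -r.rentals }     # most rentals first
         .first(limit)
end

p search(PLACES, "k").map(&:name)
```

With this sample data, typing k puts Kauai ahead of Key West and Kansas City, and an empty query returns nothing instead of everything.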
What about performance? Note that there is NO INDEX on the autocomplete field. Mongo can only use an index efficiently for regexes anchored to the start of the string, and these aren't anchored, so there's no point. Here are some benchmarks from my machine, with 50,000 objects:
Place.where(autocomplete: / XYZZY/).limit(10).to_a # zero matches - 0.046s
Place.where(autocomplete: / A/).limit(10).to_a # ten matches - 0.002s
Perfect for our purposes. There are lots of tricks you can use to improve the performance if your dataset is bigger. The easiest way to improve performance is to limit the scope of the query using some other index. For example, on Dwellable if the user is in Florida we only search places in Florida:
Place.where(autocomplete: / A/, state: "FL")
Tricks like this keep both users and Mongo happy.
5. The Action
When the user types into the search box, we'll use jQuery to fetch completions:
class ApplicationController < ActionController::Base
  def autocomplete
    list = Autocomplete.search(params[:q])
    list = list.map do |i|
      { label: i.name, value: place_path(i) }
    end
    render json: list
  end
end
The jQuery UI Autocomplete widget accepts a list of results, with each result consisting of a label for display and a value that you can use when the user clicks. In this case we're simply returning the object name and the url for the object.
In actuality, I added a method to each of our autocompleting models called autocomplete_label. For vacation destinations, the label shows the number of rentals. For rentals, the label gives some geographic context. Users love this stuff.
Since the Dwellable search results actually include different models, I found it useful to add a url method to each model. This is somewhat sacrilegious in Rails, but it's an incredibly handy pattern that I've used to great effect when building well-SEO'd sites. After all, for sites that care about SEO each object should have a canonical url. So for Dwellable it looks more like this:
{ label: i.autocomplete_label, value: i.url }
A quick check with curl shows it's working:
$ curl 'http://localhost/autocomplete?q=kau'
[
[0] {
"label" => "Kauai (1533 rentals)",
"value" => "/a/1057/Kauai/Vacation-Rentals"
},
...
]
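The label/url pattern described above can be sketched with plain Ruby objects. This is only an illustration of the idea: DestinationStub and RentalStub are hypothetical stand-ins for the real models, the rental label format and rental path are invented, and only the Kauai label and url are taken from the curl output.

```ruby
require "json"

# Hypothetical stand-ins for two autocompleting models. Each one answers
# autocomplete_label and url, so the action never needs to know the type.
DestinationStub = Struct.new(:id, :name, :rentals) do
  def autocomplete_label
    "#{name} (#{rentals} rentals)"
  end

  def url
    "/a/#{id}/#{name}/Vacation-Rentals"
  end
end

RentalStub = Struct.new(:id, :title, :city) do
  def autocomplete_label
    "#{title}, #{city}"   # a guess at what "geographic context" might look like
  end

  def url
    "/rentals/#{id}"      # invented path, for illustration only
  end
end

# Build the label/value list that the jQuery UI Autocomplete widget consumes.
def completions(records)
  records.map { |r| { label: r.autocomplete_label, value: r.url } }
end

results = [
  DestinationStub.new(1057, "Kauai", 1533),
  RentalStub.new(42, "Beachfront Cottage", "Hanalei")
]
puts JSON.generate(completions(results))
```

Because both classes respond to the same two methods, mixing destinations and rentals in one result list needs no special casing in the controller.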
6. HAML/Coffeescript
Let's add our search box to the layout:
= text_field_tag(:q, nil, :placeholder => "Search")
I used the jQuery UI Autocomplete widget to handle queries. Here's some boilerplate code to get you started:
$(".navbar input").autocomplete(
  delay: 100
  minLength: 2
  source: (request, response) ->
    $.getJSON("/autocomplete", { q: request.term }, (result) ->
      response(result)
    )
  select: (event, ui) ->
    window.location = ui.item.value
    false
)
Our actual code is a bit more complicated, since we want to automatically select the first item. There are some blur/focus bugs to overcome. Read through the Dwellable Javascript for more information.
7. Caveats
So, where are the gotchas?
- mobile - A few days after implementing autocomplete, I added autocomplete to our mobile web site. That turned out to be hard, since our mobile site doesn't use jQuery. We use Zepto. I ended up writing my own Zepto-based autocomplete widget. That was more work than the entire back end! Luckily I only had to support WebKit-based browsers. Otherwise it would've been a nightmare.
- i18n - As written above, the normalization function eats multibyte characters. That isn't a problem for Dwellable, but it might be for you.
- web scale - I expect this design to fall over if the number of documents increases by 10x. For Dwellable, I can probably get away with only searching the most popular 100,000 documents. Again, simple tricks keep users and Mongo happy.
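To illustrate the i18n caveat, here is a small sketch of what happens to an accented name, plus one possible workaround. Neither function is Dwellable code: ascii_key is a simplified version of the normalization, and folded_key is a hypothetical variant that uses Ruby's built-in String#unicode_normalize to strip accents first. It only helps with letters that decompose into a base letter plus combining marks, so it does nothing for scripts like Chinese or Cyrillic.

```ruby
# ASCII-only normalization, simplified from the post:
# any character outside A-Z, 0-9 and space becomes a space.
def ascii_key(name)
  " #{name.upcase.gsub(/[^A-Z0-9 ]/, ' ').split.join(' ')}"
end

# Hypothetical variant: decompose accented letters (NFD) and drop the
# combining marks first, so an accented u folds to a plain u instead of
# being eaten.
def folded_key(name)
  ascii_key(name.unicode_normalize(:nfd).gsub(/\p{Mn}/, ""))
end

p ascii_key("Canc\u00FAn")   # the accented letter is replaced by a space
p folded_key("Canc\u00FAn")  # the accent is stripped, the letter survives
```

With the ASCII-only version the key for Cancún becomes " CANC N", so a user typing "cancun" finds nothing. The folded version produces " CANCUN" for both spellings.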
The End
That's it! We don't have comments on our blog, but I'll keep an eye on the relevant Hacker News thread. Chime in if you have a question.