Multilingual search in Django with Haystack
When you build a website for a Swiss company, as we do at Tangent, you will likely need to support search in three languages.
There are multiple ways to perform multilingual search, and Steve Kearns explains them in this slideshare.
We chose to implement the solution at indexing time, and that turned out to be unexpectedly easy.
The solution
The solution is really simple. Say you want to index a book title in three languages: German, French and Italian. Your model will look like:
    from django.db import models

    class Book(models.Model):
        title_de = models.CharField(max_length=100)
        title_fr = models.CharField(max_length=100)
        title_it = models.CharField(max_length=100)
Your Haystack index will look like:
    from haystack import indexes

    class BookIndex(indexes.RealTimeSearchIndex, indexes.Indexable):
        title = indexes.CharField(document=True)
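The index declares a single title field, while the model stores one column per language. One possible way to bridge the two, sketched here in plain Python, is to read the attribute matching the active language. The helper name `localized_attr`, the German fallback and the `FakeBook` stand-in are assumptions for illustration, not code from this project; in a real index such a helper could be called from a `prepare_title` method.

```python
# Hypothetical helper, not part of Haystack or of the original project.
def localized_attr(obj, field, language, fallback="de"):
    """Return obj.<field>_<language>, or obj.<field>_<fallback> if missing or empty."""
    value = getattr(obj, "%s_%s" % (field, language), None)
    if value:
        return value
    return getattr(obj, "%s_%s" % (field, fallback))


class FakeBook(object):
    # Stand-in for the Book model so the sketch runs without Django.
    title_de = "Mein Buch"
    title_fr = "Mon livre"
    title_it = "Mio libro"


book = FakeBook()
french_title = localized_attr(book, "title", "fr")
# No English column exists, so the assumed German fallback is used.
unknown_title = localized_attr(book, "title", "en")
```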
At indexing time, we send each language version of the document to its own Solr core.
    ------------       "Mein Buch"        ------------
    | SOLR FR  |<----- "Mon livre"  ----->| SOLR DE  |
    ------------       "Mio libro"        ------------
                            |
                            |
                            V
                       ------------
                       | SOLR IT  |
                       ------------
At query time, we will only query the core matching our language:
                          ------------
    "Les miserables" ---->| SOLR FR  | ---> French results
                          ------------
Implementation
1. Config
Define your search engines in your settings file:
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'myproject.backend.MultilingualSolrEngine',
            'URL': 'http://127.0.0.1:8080/solr-de',
        },
        'default_de': {
            'ENGINE': 'myproject.backend.MultilingualSolrEngine',
            'URL': 'http://127.0.0.1:8080/solr-de',
        },
        'default_fr': {
            'ENGINE': 'myproject.backend.MultilingualSolrEngine',
            'URL': 'http://127.0.0.1:8080/solr-fr',
        },
        'default_it': {
            'ENGINE': 'myproject.backend.MultilingualSolrEngine',
            'URL': 'http://127.0.0.1:8080/solr-it',
        },
    }
Each connection follows the same pattern, "<name>_<language_code>". Some of you may have noticed the use of a special backend, but that's step 2.
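Because every entry follows the same naming pattern, the dictionary could also be generated instead of typed out. This is only a sketch: `build_connections` is a hypothetical helper, and it assumes every core lives under the same base URL as in the settings above.

```python
ENGINE = "myproject.backend.MultilingualSolrEngine"
BASE_URL = "http://127.0.0.1:8080/solr-"


def build_connections(languages, default_language, alias="default"):
    """Build the plain alias plus one "<alias>_<language>" entry per language."""
    connections = {
        alias: {"ENGINE": ENGINE, "URL": BASE_URL + default_language},
    }
    for language in languages:
        connections["%s_%s" % (alias, language)] = {
            "ENGINE": ENGINE,
            "URL": BASE_URL + language,
        }
    return connections


# Reproduces the four connections listed above.
HAYSTACK_CONNECTIONS = build_connections(["de", "fr", "it"], default_language="de")
```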
2. Add the backend
Copy the following backend. It handles indexing by publishing each object to every core.
    from django.conf import settings
    from django.utils import translation
    from haystack import connections
    from haystack.backends.solr_backend import SolrEngine, SolrSearchBackend, SolrSearchQuery
    from haystack.constants import DEFAULT_ALIAS


    def get_using(language, alias=DEFAULT_ALIAS):
        # Prefer "<alias>_<language>" when that connection exists, else fall back to the alias
        new_using = alias + "_" + language
        using = new_using if new_using in settings.HAYSTACK_CONNECTIONS else alias
        return using


    class MultilingualSolrSearchBackend(SolrSearchBackend):
        def update(self, index, iterable, commit=True, multilingual=True):
            if multilingual:
                initial_language = translation.get_language()[:2]
                # retrieve unique backend name
                backends = []
                for language, __ in settings.LANGUAGES:
                    using = get_using(language, alias=self.connection_alias)
                    # Ensure each backend is called only once
                    if using in backends:
                        continue
                    else:
                        backends.append(using)
                    # Activate the language so translated fields resolve to it while indexing
                    translation.activate(language)
                    backend = connections[using].get_backend()
                    backend.update(index, iterable, commit, multilingual=False)
                # Restore the language that was active before indexing
                translation.activate(initial_language)
            else:
                # print() with a single argument works on both Python 2 and 3
                print("[%s]" % self.connection_alias)
                super(MultilingualSolrSearchBackend, self).update(index, iterable, commit)
If you run update_index now, you will see Haystack updating the index of every core.
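To see which cores a single update touches, the alias resolution and the de-duplication can be replayed in plain Python. This sketch takes the connections and languages as arguments instead of reading Django settings; `resolve_alias` and `aliases_to_update` are illustrative names, not Haystack API.

```python
def resolve_alias(language, connections, alias="default"):
    """Same rule as get_using: prefer "<alias>_<language>", else the alias."""
    candidate = alias + "_" + language
    return candidate if candidate in connections else alias


def aliases_to_update(languages, connections, alias="default"):
    """Return each target alias once, in the order the languages are declared."""
    seen = []
    for language in languages:
        using = resolve_alias(language, connections, alias)
        if using not in seen:
            seen.append(using)
    return seen


CONNECTIONS = {"default": {}, "default_de": {}, "default_fr": {}, "default_it": {}}

# "en" and "rm" have no dedicated core here, so both fall back to "default",
# which is then updated only once.
targets = aliases_to_update(["de", "fr", "it", "en", "rm"], CONNECTIONS)
```

Note that with the settings from step 1, "default" points at the German core, so a language without its own connection would end up being written there.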
3. Query time now!
To choose the right core when querying Solr through Haystack, we define our own SearchQuery that supplies the right using alias.
    class MultilingualSolrSearchQuery(SolrSearchQuery):
        def __init__(self, using=DEFAULT_ALIAS):
            # Ignore the requested alias and pick the core of the active language
            language = translation.get_language()[:2]
            using = get_using(language)
            super(MultilingualSolrSearchQuery, self).__init__(using)
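The `[:2]` slice matters when the active language carries a region suffix such as "fr-ch". A small sketch of that step, assuming a hypothetical `alias_for_active_language` helper that receives the language string as an argument rather than calling `translation.get_language()`:

```python
def alias_for_active_language(active_language, connections, alias="default"):
    """Map a language code such as "fr-ch" to a connection alias."""
    language = active_language[:2]  # keep only the two-letter prefix
    candidate = "%s_%s" % (alias, language)
    return candidate if candidate in connections else alias


CONNECTIONS = {"default": {}, "default_de": {}, "default_fr": {}, "default_it": {}}

swiss_french = alias_for_active_language("fr-ch", CONNECTIONS)
english = alias_for_active_language("en", CONNECTIONS)
```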
4. Link everything together, the engine
The engine's job is to tie the query and the backend together and to provide an entry point.
    class MultilingualSolrEngine(SolrEngine):
        backend = MultilingualSolrSearchBackend
        query = MultilingualSolrSearchQuery
5. That’s it
There is nothing else to do. Write your Haystack views and indexes as usual, without changing a single line of code.
Downside
No solution is perfect. Like the others, this one has a few downsides. The ones I have seen are:
- Many cores to manage, deploy and monitor
- Slow indexing. If you use real-time Haystack indexes like we do, every time you update an instance of an indexed model, a signal triggers an update on all N cores, which significantly slows down the update. Most of the time, we care more about query time than update time. If you don't, this might not be the right solution for you.
Did you like this post? Follow me on Twitter: @atresontani