How to Optimize Algolia Search by Language on a Docusaurus Multi-language Site
When I implemented Algolia DocSearch on a multi-language (i18n) site built with Docusaurus, search results came back in mixed languages. A typical symptom is English pages appearing in the results for a Japanese query, which hurts the user experience.
This article is a technical memo of the setup steps that solve this problem and keep search in sync with the site's display language.
- Filter search results to match the site's display language (Japanese/English).
- Automatically route content to language-specific Algolia indexes based on the URL path (/ or /en/).
- Improve search accuracy with index settings tailored to the characteristics of each language, such as Japanese and English.
References
- Docusaurus Official Documentation: Search
- Algolia Multilingual Search Guide: Multilingual search
The Challenge: Problems with the Standard Setup
When integrating Docusaurus's i18n feature with Algolia using the standard configuration, content from all languages is registered in a single Algolia index. This causes the following issues:
Challenge | Problem with a Single Index | Solution with Language-Specific Indexes |
---|---|---|
Mixed Search Results | Japanese and English content are displayed without distinction, confusing the user. | Accurately display only the content that matches the current display language. |
Reduced Search Accuracy | Optimization considering language-specific characteristics (word separation, plurals, etc.) is difficult. | Improved accuracy by using settings like ignorePlurals tailored to each language's properties. |
Complex Management | All language data is centralized, making language-specific analysis and adjustments difficult. | Easier management and analysis by separating data for each language. |
Mixed-language search results can keep users from finding the information they need, which in turn can drive them to leave the site.
Solution Approach: Separation into Language-Specific Indexes
The key to solving this problem is to separate content into language-specific indexes based on the URL structure.
The implementation proceeds in the following two steps:
- Customize the Algolia Crawler: Identify the URL path (/ or /en/) to define crawling targets for each language. Create dedicated indexes for each language (yoursite_ja, yoursite_en) and apply optimized settings to each.
- Configure the Docusaurus Front-End: Adjust the algolia settings (indexName and contextualSearch) so that the search bar queries the index matching the display language.
1. Configure Language-Specific Indexes in the Algolia Crawler
First, edit the Algolia crawler configuration file to define crawl actions for each language. This will route content to two different indexes based on the URL path.
You can edit this configuration directly in the Algolia Crawler dashboard or manage it locally as a config.json or .js file and apply it via the CLI. The following is an example in JavaScript file format.
new Crawler({
  // --- Basic Configuration ---
  appId: "YOUR_APP_ID",
  apiKey: "YOUR_CRAWLER_API_KEY", // Admin API key for the crawler, issued by Algolia
  rateLimit: 8,
  maxUrls: 5000,
  schedule: "at 21:10 on Tuesday",
  indexPrefix: "", // No prefix: each action below sets its full index name

  // --- Crawl Target Configuration ---
  startUrls: [
    "https://your-docusaurus-site.com/",
    "https://your-docusaurus-site.com/en/",
  ],
  discoveryPatterns: ["https://your-docusaurus-site.com/**"],
  sitemaps: ["https://your-docusaurus-site.com/sitemap.xml"],

  // --- Language-Specific Index Routing (Most Important) ---
  actions: [
    // ▼▼▼ Configuration for Japanese Content ▼▼▼
    {
      indexName: "yoursite_ja",
      // Everything on the site except the English subtree
      pathsToMatch: [
        "https://your-docusaurus-site.com/**",
        "!https://your-docusaurus-site.com/en/**",
      ],
      recordExtractor: ({ $, helpers }) => {
        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: ".menu__link.menu__link--sublist.menu__link--active, .navbar__item.navbar__link--active",
              defaultValue: "Documentation",
            },
            lvl1: ["header h1", "article h1"],
            lvl2: "article h2",
            lvl3: "article h3",
            lvl4: "article h4",
            lvl5: "article h5, article td:first-child",
            content: "article p, article li, article td:last-child",
            lang: { defaultValue: "ja" },
          },
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
    // ▼▼▼ Configuration for English Content ▼▼▼
    {
      indexName: "yoursite_en",
      // Only the English subtree
      pathsToMatch: ["https://your-docusaurus-site.com/en/**"],
      recordExtractor: ({ $, helpers }) => {
        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: ".menu__link.menu__link--sublist.menu__link--active, .navbar__item.navbar__link--active",
              defaultValue: "Documentation",
            },
            lvl1: ["header h1", "article h1"],
            lvl2: "article h2",
            lvl3: "article h3",
            lvl4: "article h4",
            lvl5: "article h5, article td:first-child",
            content: "article p, article li, article td:last-child",
            lang: { defaultValue: "en" },
          },
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],

  // --- Initial Index Settings for Each Language ---
  initialIndexSettings: {
    // ▼▼▼ Search settings for the Japanese index ('yoursite_ja') ▼▼▼
    yoursite_ja: {
      // "language" and "docusaurus_tag" are the facets Docusaurus contextual search filters on
      attributesForFaceting: ["type", "lang", "language", "docusaurus_tag"],
      ignorePlurals: false, // Ignoring plurals is not necessary for Japanese
      minWordSizefor1Typo: 4,
      minWordSizefor2Typos: 8,
      searchableAttributes: ["unordered(hierarchy.lvl0)","unordered(hierarchy.lvl1)","unordered(hierarchy.lvl2)","unordered(hierarchy.lvl3)","unordered(hierarchy.lvl4)","content"],
      customRanking: ["desc(weight.pageRank)","desc(weight.level)","asc(weight.position)"],
    },
    // ▼▼▼ Search settings for the English index ('yoursite_en') ▼▼▼
    yoursite_en: {
      // "language" and "docusaurus_tag" are the facets Docusaurus contextual search filters on
      attributesForFaceting: ["type", "lang", "language", "docusaurus_tag"],
      ignorePlurals: true, // For English, treating plurals as the same improves accuracy
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      searchableAttributes: ["unordered(hierarchy.lvl0)","unordered(hierarchy.lvl1)","unordered(hierarchy.lvl2)","unordered(hierarchy.lvl3)","unordered(hierarchy.lvl4)","content"],
      customRanking: ["desc(weight.pageRank)","desc(weight.level)","asc(weight.position)"],
    },
  },
});
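The two initialIndexSettings entries differ only in ignorePlurals and the typo thresholds, so when the config is kept as a local file, the shared part could be factored out. The following is a minimal sketch of that idea, not part of the Crawler API: buildInitialIndexSettings, sharedSettings, and languageOverrides are hypothetical names, and the "language" facet is included on the assumption that Docusaurus's contextual search filters on it.

```typescript
// Field names follow the crawler config; the helper itself is hypothetical.
type IndexSettings = {
  attributesForFaceting: string[];
  ignorePlurals: boolean;
  minWordSizefor1Typo: number;
  minWordSizefor2Typos: number;
  searchableAttributes: string[];
  customRanking: string[];
};

type LanguageOverrides = Pick<
  IndexSettings,
  "ignorePlurals" | "minWordSizefor1Typo" | "minWordSizefor2Typos"
>;

// Settings that are identical for every language.
const sharedSettings = {
  // Assumption: "language" is declared so contextual-search facet filters can match.
  attributesForFaceting: ["type", "lang", "language", "docusaurus_tag"],
  searchableAttributes: [
    "unordered(hierarchy.lvl0)",
    "unordered(hierarchy.lvl1)",
    "unordered(hierarchy.lvl2)",
    "unordered(hierarchy.lvl3)",
    "unordered(hierarchy.lvl4)",
    "content",
  ],
  customRanking: [
    "desc(weight.pageRank)",
    "desc(weight.level)",
    "asc(weight.position)",
  ],
};

// Only the values that change per language.
const languageOverrides: Record<string, LanguageOverrides> = {
  ja: { ignorePlurals: false, minWordSizefor1Typo: 4, minWordSizefor2Typos: 8 },
  en: { ignorePlurals: true, minWordSizefor1Typo: 3, minWordSizefor2Typos: 7 },
};

// Builds one settings object per index, keyed as "<prefix>_<locale>".
function buildInitialIndexSettings(prefix: string): Record<string, IndexSettings> {
  const settings: Record<string, IndexSettings> = {};
  for (const locale of Object.keys(languageOverrides)) {
    settings[`${prefix}_${locale}`] = {
      ...sharedSettings,
      ...languageOverrides[locale],
    };
  }
  return settings;
}
```

Calling buildInitialIndexSettings("yoursite") yields an object with the keys yoursite_ja and yoursite_en, which could be passed as initialIndexSettings.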
Key Configuration Points
- actions: This is the core of the configuration. It uses pathsToMatch with URL patterns to decide which content goes to which indexName.
  - Japanese: Targets the entire site (/**) while excluding the English path (!/en/**).
  - English: Targets only the English path (/en/**).
- initialIndexSettings: Applies language-specific settings to each index (yoursite_ja, yoursite_en). Adjusting ignorePlurals and typo tolerance (minWordSizefor...Typos) is key to improving search quality.
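To make the routing rule concrete, here is a minimal sketch of how the include and exclude patterns decide an index for a given URL. It is a simplified stand-in, not the crawler's actual matcher: it only understands "<prefix>/**" patterns and "!" negation, and routeToIndexes is a hypothetical helper.

```typescript
type Action = { indexName: string; pathsToMatch: string[] };

const SITE = "https://your-docusaurus-site.com";

// Same patterns as in the crawler config.
const actions: Action[] = [
  { indexName: "yoursite_ja", pathsToMatch: [`${SITE}/**`, `!${SITE}/en/**`] },
  { indexName: "yoursite_en", pathsToMatch: [`${SITE}/en/**`] },
];

// True when `url` begins with the part of the pattern before "**".
function matchesPrefix(url: string, pattern: string): boolean {
  const prefix = pattern.replace(/\*\*$/, "");
  return url.indexOf(prefix) === 0;
}

// A URL matches an action when at least one positive pattern matches
// and no negated ("!") pattern does.
function matchesAction(url: string, action: Action): boolean {
  const positives = action.pathsToMatch.filter((p) => p.charAt(0) !== "!");
  const negatives = action.pathsToMatch
    .filter((p) => p.charAt(0) === "!")
    .map((p) => p.slice(1));
  return (
    positives.some((p) => matchesPrefix(url, p)) &&
    !negatives.some((p) => matchesPrefix(url, p))
  );
}

// Returns the names of every index the URL would be routed to.
function routeToIndexes(url: string): string[] {
  return actions.filter((a) => matchesAction(url, a)).map((a) => a.indexName);
}

console.log(routeToIndexes(`${SITE}/docs/intro`)); // [ 'yoursite_ja' ]
console.log(routeToIndexes(`${SITE}/en/docs/intro`)); // [ 'yoursite_en' ]
```

The exclusion on the Japanese action is what prevents English pages from landing in both indexes; without it, every /en/ URL would match the site-wide pattern as well.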
2. Dynamic Index Switching in Docusaurus
Next, configure the Docusaurus front-end to automatically use the correct index based on the viewing language. This is done by modifying the algolia configuration in docusaurus.config.ts.
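import type { Config } from '@docusaurus/types';

// Assumption: Docusaurus exposes the locale being built through this environment
// variable; fall back to the default locale when it is unset.
const currentLocale = process.env.DOCUSAURUS_CURRENT_LOCALE ?? 'ja';

const config: Config = {
  // ... (basic site configuration)
  i18n: {
    defaultLocale: 'ja',
    locales: ['ja', 'en'],
  },
  themeConfig: {
    // ...
    algolia: {
      appId: 'YOUR_APP_ID',
      apiKey: 'YOUR_SEARCH_ONLY_API_KEY', // Set the "Search-Only" key
      // Resolves to yoursite_ja or yoursite_en, matching the crawler's index names
      indexName: `yoursite_${currentLocale}`,
      // Adds language and docusaurus_tag facet filters to every query
      contextualSearch: true,
    },
  },
};
export default config;
contextualSearch: true
With this option enabled, Docusaurus detects the current language context (ja or en) and adds facet filters (language, plus docusaurus_tag for versions) to every query. As documented, it filters within the index named by indexName and does not change the index name, so the name itself has to resolve per locale. The example above does this by building indexName from the locale being built, which gives yoursite_ja or yoursite_en. The DOCUSAURUS_CURRENT_LOCALE variable is an assumption about the build environment, so confirm it against your Docusaurus version.
- When viewing a Japanese page → searches yoursite_ja
- When viewing an English page → searches yoursite_en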
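The locale-to-index mapping described above can be sketched as a small function. This is only an illustration of the naming rule, not a Docusaurus API: indexNameForLocale and its fallback behavior for unknown locales are hypothetical.

```typescript
// Locales configured for the site; mirrors i18n.locales in the config.
const LOCALES = ["ja", "en"] as const;
type Locale = (typeof LOCALES)[number];

// Narrows an arbitrary string to one of the configured locales.
function isLocale(value: string): value is Locale {
  return (LOCALES as readonly string[]).indexOf(value) !== -1;
}

// Builds "<prefix>_<locale>", using the fallback for unconfigured locales
// so that a query never targets an index that does not exist.
function indexNameForLocale(
  prefix: string,
  locale: string,
  fallback: Locale = "ja",
): string {
  const resolved = isLocale(locale) ? locale : fallback;
  return `${prefix}_${resolved}`;
}

console.log(indexNameForLocale("yoursite", "ja")); // yoursite_ja
console.log(indexNameForLocale("yoursite", "en")); // yoursite_en
console.log(indexNameForLocale("yoursite", "fr")); // yoursite_ja (fallback)
```

Falling back to the default locale is a design choice for this sketch; failing loudly on an unknown locale would be an equally reasonable alternative.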
Conclusion
The settings above improve the Algolia search experience on a Docusaurus i18n site in three ways:
- Optimal Search Experience: Users can search only within the content of the language they are currently viewing, getting precise and noise-free results.
- High Search Quality: Provide more relevant search results thanks to index settings optimized for each language.
- Excellent Scalability: When adding new languages in the future, you can easily scale by adding a similar pattern to the crawler configuration.
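As a rough illustration of the scalability point, the per-language index names and URL patterns could be generated from the locale list instead of being written by hand. This is a sketch under assumptions: buildCrawlTargets is a hypothetical helper, "fr" is a made-up third locale, and each generated entry would still need its own recordExtractor and initialIndexSettings entry.

```typescript
type CrawlTarget = { indexName: string; pathsToMatch: string[] };

// The default locale lives at the site root, so it matches the whole site
// minus every other locale's path prefix. Other locales match only their prefix.
function buildCrawlTargets(
  siteUrl: string,
  prefix: string,
  defaultLocale: string,
  locales: string[],
): CrawlTarget[] {
  const others = locales.filter((l) => l !== defaultLocale);
  return locales.map((locale) => ({
    indexName: `${prefix}_${locale}`,
    pathsToMatch:
      locale === defaultLocale
        ? [`${siteUrl}/**`, ...others.map((l) => `!${siteUrl}/${l}/**`)]
        : [`${siteUrl}/${locale}/**`],
  }));
}

const targets = buildCrawlTargets(
  "https://your-docusaurus-site.com",
  "yoursite",
  "ja",
  ["ja", "en", "fr"],
);
console.log(JSON.stringify(targets, null, 2));
```

With a third locale added, the Japanese entry gains one more exclusion (!…/fr/**) and a new yoursite_fr entry appears, which is the same pattern the hand-written config follows for English.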