Skip to main content

How to Optimize Algolia Search by Language on a Docusaurus Multi-language Site

· 6 min read
hiroaki
Individual Developer

When implementing Algolia DocSearch on a multi-language (i18n) site built with Docusaurus, I faced the challenge of search results containing mixed languages. It's common for situations to arise that harm the user experience, such as English pages appearing in search results when searching in Japanese.

This article organizes the specific setup steps as a technical memo to solve this problem and achieve an optimal search experience that syncs with the site's display language.

What you can achieve with this article
  • Filter search results to match the site's display language (Japanese/English).
  • Automatically route content to language-specific Algolia indexes based on the URL path (/ or /en/).
  • Improve search accuracy with index settings tailored to the characteristics of each language, such as Japanese and English.

References

The Challenge: Problems with the Standard Setup

When integrating Docusaurus's i18n feature with Algolia using the standard configuration, content from all languages is registered in a single Algolia index. This causes the following issues:

ChallengeProblem with a Single IndexSolution with Language-Specific Indexes
Mixed Search ResultsJapanese and English content are displayed without distinction, confusing the user.Accurately display only the content that matches the current display language.
Reduced Search AccuracyOptimization considering language-specific characteristics (word separation, plurals, etc.) is difficult.Improved accuracy by using settings like ignorePlurals tailored to each language's properties.
Complex ManagementAll language data is centralized, making language-specific analysis and adjustments difficult.Easier management and analysis by separating data for each language.
Impact on User Experience

Mixed-language search results can prevent users from finding the information they need, becoming a major factor in site abandonment.

Solution Approach: Separation into Language-Specific Indexes

The key to solving this problem is to separate content into language-specific indexes based on the URL structure.

The implementation proceeds in the following two steps:

  1. Customize the Algolia Crawler: Identify the URL path (/ or /en/) to define crawling targets for each language. Create dedicated indexes for each language (yoursite_ja, yoursite_en) and apply optimized settings to each.
  2. Configure the Docusaurus Front-End: Enable the contextualSearch option to automatically link the display language to the corresponding Algolia index.

1. Configure Language-Specific Indexes in the Algolia Crawler

First, edit the Algolia crawler configuration file to define crawl actions for each language. This will route content to two different indexes based on the URL path.

Managing the Configuration File

You can edit this configuration directly in the Algolia Crawler dashboard or manage it locally as a config.json or .js file and apply it via the CLI. The following is an example in JavaScript file format.

crawler-config.js
new Crawler({
// --- Basic Configuration ---
appId: "YOUR_APP_ID",
apiKey: "YOUR_CRAWLER_API_KEY", // Admin API key for the crawler, issued by Algolia
rateLimit: 8,
maxUrls: 5000,
schedule: "at 21:10 on Tuesday",
indexPrefix: "", // Keep the prefix empty, as Docusaurus will handle it

// --- Crawl Target Configuration ---
startUrls: [
"https://your-docusaurus-site.com/",
"https://your-docusaurus-site.com/en/",
],
discoveryPatterns: ["https://your-docusaurus-site.com/**"],
sitemaps: ["https://your-docusaurus-site.com/sitemap.xml"],

// --- Language-Specific Index Routing (Most Important) ---
actions: [
// ▼▼▼ Configuration for Japanese Content ▼▼▼
{
indexName: "yoursite_ja",
pathsToMatch: [
"https://your-docusaurus-site.com/**",
"!https://your-docusaurus-site.com/en/**",
],
recordExtractor: ({ $, helpers }) => {
return helpers.docsearch({
recordProps: {
lvl0: {
selectors: ".menu__link.menu__link--sublist.menu__link--active, .navbar__item.navbar__link--active",
defaultValue: "Documentation",
},
lvl1: ["header h1", "article h1"],
lvl2: "article h2",
lvl3: "article h3",
lvl4: "article h4",
lvl5: "article h5, article td:first-child",
content: "article p, article li, article td:last-child",
lang: { defaultValue: "ja" },
},
aggregateContent: true,
recordVersion: "v3",
});
},
},
// ▼▼▼ Configuration for English Content ▼▼▼
{
indexName: "yoursite_en",
pathsToMatch: ["https://your-docusaurus-site.com/en/**"],
recordExtractor: ({ $, helpers }) => {
return helpers.docsearch({
recordProps: {
lvl0: {
selectors: ".menu__link.menu__link--sublist.menu__link--active, .navbar__item.navbar__link--active",
defaultValue: "Documentation",
},
lvl1: ["header h1", "article h1"],
lvl2: "article h2",
lvl3: "article h3",
lvl4: "article h4",
lvl5: "article h5, article td:first-child",
content: "article p, article li, article td:last-child",
lang: { defaultValue: "en" },
},
aggregateContent: true,
recordVersion: "v3",
});
},
},
],

// --- Initial Index Settings for Each Language ---
initialIndexSettings: {
// ▼▼▼ Search settings for the Japanese index ('yoursite_ja') ▼▼▼
yoursite_ja: {
attributesForFaceting: ["type", "lang", "docusaurus_tag"],
ignorePlurals: false, // Ignoring plurals is not necessary for Japanese
minWordSizefor1Typo: 4,
minWordSizefor2Typos: 8,
searchableAttributes: ["unordered(hierarchy.lvl0)","unordered(hierarchy.lvl1)","unordered(hierarchy.lvl2)","unordered(hierarchy.lvl3)","unordered(hierarchy.lvl4)","content"],
customRanking: ["desc(weight.pageRank)","desc(weight.level)","asc(weight.position)"],
},
// ▼▼▼ Search settings for the English index ('yoursite_en') ▼▼▼
yoursite_en: {
attributesForFaceting: ["type", "lang", "docusaurus_tag"],
ignorePlurals: true, // For English, treating plurals as the same improves accuracy
minWordSizefor1Typo: 3,
minWordSizefor2Typos: 7,
searchableAttributes: ["unordered(hierarchy.lvl0)","unordered(hierarchy.lvl1)","unordered(hierarchy.lvl2)","unordered(hierarchy.lvl3)","unordered(hierarchy.lvl4)","content"],
customRanking: ["desc(weight.pageRank)","desc(weight.level)","asc(weight.position)"],
},
},
});

Key Configuration Points

  • actions: This is the core of the configuration. It uses pathsToMatch with URL patterns to decide which content goes to which indexName.
    • Japanese: Targets the entire site (/**) while excluding the English path (!/en/**).
    • English: Targets only the English path (/en/**).
  • initialIndexSettings: Applies language-specific settings to each index (yoursite_ja, yoursite_en). Adjusting ignorePlurals and typo tolerance (minWordSizefor...Typos) is key to improving search quality.

2. Dynamic Index Switching in Docusaurus

Next, configure the Docusaurus front-end to automatically use the correct index based on the viewing language. This is done simply by modifying the algolia configuration in docusaurus.config.ts.

docusaurus.config.ts
import type { Config } from '@docusaurus/types';

const config: Config = {
// ... (basic site configuration)

i18n: {
defaultLocale: 'ja',
locales: ['ja', 'en'],
},

themeConfig: {
// ...
algolia: {
appId: 'YOUR_APP_ID',
apiKey: 'YOUR_SEARCH_ONLY_API_KEY', // Set the "Search-Only" key
indexName: 'yoursite', // Specify the common prefix for the index name

// This one line is the key
contextualSearch: true,
},
},
};

export default config;
The Role of contextualSearch: true

Enabling this option allows Docusaurus to automatically detect the current language context (ja or en). It then constructs the final index name by appending the locale ID as a suffix to the indexName prefix (yoursite), and sends the query to the correct index (yoursite_ja or yoursite_en).

  • When viewing a Japanese page → searches yoursite_ja
  • When viewing an English page → searches yoursite_en

Conclusion

With the settings above, you can dramatically improve the Algolia search experience on your Docusaurus i18n site.

  • Optimal Search Experience: Users can search only within the content of the language they are currently viewing, getting precise and noise-free results.
  • High Search Quality: Provide more relevant search results thanks to index settings optimized for each language.
  • Excellent Scalability: When adding new languages in the future, you can easily scale by adding a similar pattern to the crawler configuration.