
How-to Guide
Scolta on Drupal
Closing the Gap Between Search and Content

June 16, 2026
Imagine a user visits an encyclopedia website and searches for "survival in extreme conditions." The article they want, about the Ross Sea party being stranded in Antarctica for two years, doesn't contain that phrase anywhere. Nor do dozens of others that would match what they're hoping to find. Standard keyword search can't connect the query to any of them, because the visitor's specific words don't appear in any of the pages they're looking for.
You can try to close that gap with synonym lists, stemming, or manually tagging content with the terms you think people will use. But you can't control what people type, and they're going to use their own words. That gap between what someone types and what your content actually says is the problem Scolta solves.
Start with our introduction to Scolta and the thinking behind it in Introducing Scolta: what it is, why there's no vector database or embedding pipeline in the picture, and how the four-stage search pipeline fits together (the architecture itself goes deeper in The Practical Path to AI Search). This how-to guide is the hands-on companion for Drupal: install the module, point it at your content, and tune it until that example query works. Everything here is open source, packaged as a Drupal module. Later posts in this series do the same for the other platforms, WordPress next.
The Athenaeum: Over 6,000 Wikipedia Articles in Drupal
To properly demonstrate Scolta on Drupal, we needed a demo corpus that was large, familiar, and verifiable. We couldn't do meaningful testing with lorem ipsum, and a handful of manually crafted test pages wasn't enough. We needed something where a reader could look at the search results and quickly understand whether they made sense.
Wikipedia's Featured Articles turned out to be perfect. These are the most rigorously reviewed entries in the English Wikipedia. We pulled over 6,000 of them spanning every domain of human knowledge: science, history, biography, geography, arts, technology, nature, military history, sports, philosophy. A typical article runs 4,000 to 8,000 words, and each keeps its full section structure.
We built the demo site on Drupal 11 and called it The Athenaeum (it's themed with a library reading room aesthetic: parchment backgrounds, navy headings, burgundy accents, and a library card catalog motif in the search UI). All content is CC BY-SA 4.0 with attribution links back to the original Wikipedia articles on every page. So the demo uses real content, real licensing, and nothing proprietary.
This breadth is what makes the demo work: cross-domain discovery actually happens. A site about one topic can fake good search results with just keywords. An encyclopedia covering everything from quantum mechanics to the Battle of Gettysburg to bowerbird mating rituals can quickly fall apart with standard keyword search.
Installing Scolta on Drupal
Scolta integrates with Drupal through the Search API module, Drupal's established abstraction layer for search backends. If you've configured Solr or Elasticsearch for a Drupal site before, the workflow will be familiar, minus the infrastructure overhead. If not, it's still straightforward. Scolta is just another Search API backend, which means existing views, facets, and search pages keep working.
Step 1: Install Scolta with Composer
Install it the way you'd install any contributed module:
composer require drupal/scolta
drush en scolta
Composer pulls in tag1/scolta-php (a shared library that makes Scolta work across Drupal, WordPress, Laravel, and custom PHP applications) and drupal/search_api automatically. The module works on Drupal 10.3+ and 11.x with PHP 8.1+.
Step 2: Set up Search API
- Go to Administration > Configuration > Search and metadata > Search API
- Add a server, select "Scolta (Pagefind)" as the backend
- Create an index pointing to that server
- Choose which content types to index: in our case, the Featured Article content type with title, body, and taxonomy fields
Step 3: Build the Index
drush scolta:build --force
This does two things. First, Scolta exports your Drupal content as static HTML, one file per indexed node, with the title, body, taxonomy, and any other fields you configured in the Search API index. Second, it builds the search index from that exported content using Pagefind, the static search library that scolta-core (the Rust/WASM engine from Introducing Scolta) builds on. The index runs entirely in the visitor's browser, which is why there's no Solr, Elasticsearch, or search daemon anywhere in these instructions.
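To make the export step concrete, here is a minimal Python sketch of what one exported page could look like. It assumes Pagefind's documented data-pagefind-body, data-pagefind-meta, data-pagefind-filter, and data-pagefind-sort attributes; the function name and exact markup are hypothetical, and Scolta's real export (written in PHP) may differ.

```python
import html


def export_node_html(title, body_html, terms, word_count):
    """Render one node as a standalone HTML page for Pagefind to index.

    Hypothetical markup: the data-pagefind-* attributes are Pagefind's
    documented ones, but Scolta's real export format may differ.
    """
    safe_title = html.escape(title)
    filters = "".join(
        f'<span data-pagefind-filter="taxonomy">{html.escape(t)}</span>' for t in terms
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="en"><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        "<body><main data-pagefind-body>\n"
        f'<h1 data-pagefind-meta="title">{safe_title}</h1>\n'
        f"{body_html}\n"  # body is already HTML from Drupal's text format
        f"<p>{filters}</p>\n"  # taxonomy terms become Pagefind filters
        f'<p data-pagefind-sort="word_count">{word_count}</p>\n'  # sortable value
        "</main></body></html>\n"
    )
```

Writing each string to its own file (say, a hypothetical exported/node-42.html) gives Pagefind a directory of static pages to index, one per node.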
For over 6,000 articles averaging thousands of words each, the build takes a few minutes on a decent server. The default PHP indexer needs no binary, no Node.js, and no exec(), so it runs even on shared hosting with a 128M memory limit.
Step 4: Connect an AI Provider
Scolta gives you three ways to wire this up:
- Zero-config (Amazee.ai free trial): out of the box you don't have to configure anything. On the first AI request, Scolta auto-provisions a free trial provided by Amazee.ai, a hosted LLM gateway that works immediately. When the generous trial ends, you'll be prompted to enter an email address if you wish to upgrade to a paid account.
- Direct provider connection: alternatively you can configure your own Anthropic or OpenAI API key in an environment variable (SCOLTA_API_KEY=sk-ant-...) or in settings.php. Scolta talks directly to the provider's API. You control the model, the costs, and the data flow without any additional dependencies.
- Drupal AI Initiative module: If you're already using the Drupal AI modules, Drupal's community-built abstraction layer for AI providers, Scolta integrates with it directly. Select "Drupal AI module" as the provider in Scolta's admin settings, and Scolta delegates all AI calls to whatever provider you've configured through the Drupal AI module. That means any provider the module supports (Anthropic, OpenAI, Amazee.ai, Ollama, AWS Bedrock, Azure OpenAI, and a growing list of others) works with Scolta automatically. You manage one provider configuration for your entire site instead of configuring each module separately. This is the recommended setup for Drupal sites that already use the AI module for other features like content generation or translation.
The admin form at /admin/config/search/scolta shows all available options in a dropdown. When you select the Drupal AI provider, the API key and model fields are hidden, because those settings now come from the Drupal AI module's own configuration.
Step 5: Pick a Preset and You're Done
The next most important setting on the Scolta admin page is the "Site Type" dropdown asking what kind of site this is. Five options get you started:
- Start from Scratch: general purpose defaults
- Recipe & Content Catalog: for structured, timeless, browse-oriented content
- Documentation & Reference: for knowledge bases and domain-specific references
- E-commerce & Product Store: for product catalogs
- Blog & Editorial: for narrative and editorial content
For a Wikipedia-style encyclopedia, "Recipe & Content Catalog" is the right choice. The name may suggest cooking sites, but the preset is really for structured, timeless, browse-oriented collections, which includes an encyclopedia. There's also a "Documentation & Reference" preset, but it's tuned more for vocabulary bridging (patients searching "my head hurts" on a medical site where the content is more technical), which a later post in this series covers. Wikipedia readers generally search with the right terminology already, so the catalog preset fits better. Select "Recipe & Content Catalog", save, and the scoring parameters update to sensible defaults for encyclopedic content.
At this point you have a working AI search. Drop the Scolta Search block onto a page via Block Layout, and your users can start searching.
That's the "just works" path. It's maybe five minutes if you already know Drupal's Search API workflow. It's a fantastic starting place, but there's a lot of tuning possible.
What the Preset Actually Does (and When to Go Further)
The content_catalog preset makes three scoring changes that matter for an encyclopedia:
- Recency is disabled. The default recency strategy uses exponential decay, meaning newer content gets a ranking boost and older content gradually sinks in the search results. For a news site, that makes sense. For an encyclopedia, it's exactly wrong. An article about Roman aqueducts is as relevant today as the day it was indexed. The site type preset we've selected sets recencyStrategy to none.
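The difference between the two recency strategies fits in a few lines. This is a sketch assuming a simple half-life curve; the half_life_days parameter and its 365-day value are illustrative, not Scolta's documented defaults.

```python
import math


def recency_multiplier(age_days, strategy="exponential", half_life_days=365.0):
    """Multiplier applied to a result's relevance score based on its age.

    Hypothetical curve and half-life; only the "none" vs. "exponential"
    distinction comes from the preset description.
    """
    if strategy == "none":
        return 1.0  # age is ignored entirely (the content_catalog setting)
    if strategy == "exponential":
        # 1.0 for brand-new content, 0.5 after one half-life, 0.25 after two
        return math.exp(-math.log(2) * age_days / half_life_days)
    raise ValueError(f"unknown recency strategy: {strategy!r}")
```

With the strategy set to none, a ten-year-old article on Roman aqueducts gets the same multiplier (1.0) as one indexed yesterday; under this exponential curve with a one-year half-life it would keep about a thousandth of its weight.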
- Full-title matches get a bigger reward. The default title boost already favors titles, and the preset leaves it alone. What it raises is title_all_terms_multiplier, from 1.5 to 2.5: when every word of the query appears in the title, that article gets a strong extra push. Encyclopedia titles are precise identifiers ("Battle of Gettysburg," "Quantum mechanics," "Cleopatra"), and when someone types one, the article carrying that title should rank first. The multiplier makes that happen without drowning out body-text matches for broader queries.
- Body text gets a bump. content_match_boost goes from 0.4 to 0.5. The change is small, but it matters: encyclopedia articles have rich body text, and that's where the cross-domain connections live. The article about bowerbirds doesn't have "architecture" in its title, but its body text describes the elaborate structures the birds build. The higher content boost helps those body-text connections surface for relevant queries.
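To see how the two knobs interact, here is a hypothetical scoring function. The formula is an assumption made for illustration, not Scolta's actual ranking code; only the parameter names and the before/after values come from the preset described above, and title_match_boost=2.0 mirrors the value used later in this post.

```python
import re


def tokens(text):
    """Lowercased word set; punctuation is dropped."""
    return set(re.findall(r"\w+", text.lower()))


def relevance(query, title, body, title_match_boost=2.0,
              title_all_terms_multiplier=1.5, content_match_boost=0.4):
    """Hypothetical formula for illustration; not Scolta's ranking code."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return 0.0
    title_words, body_words = tokens(title), tokens(body)
    title_hits = sum(t in title_words for t in terms)
    body_hits = sum(t in body_words for t in terms)
    score = title_match_boost * title_hits / len(terms)
    if title_hits == len(terms):
        score *= title_all_terms_multiplier  # every query word is in the title
    score += content_match_boost * body_hits / len(terms)
    return score
```

For the query "Battle of Gettysburg," an article with that exact title (and the phrase in its body) scores about 3.4 with the defaults and 5.5 with the preset values, while an article that only mentions the battle in its body moves from 0.4 to 0.5. Both gain, but the exact-title match gains more.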
The preset also widens the funnel: Pagefind fetches 75 candidate results instead of 50 before re-ranking, the AI summary draws from the top 15 results instead of 10, and result pages show 12 instead of 10, because browse-oriented sites reward breadth. Each preset is a starting point, not a cage; every individual parameter can be overridden in the Scoring section of the admin form (collapsed by default, because most people don't need it).
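Layering works the way the paragraph above describes: defaults, then the preset, then any overrides from the admin form. A minimal sketch assuming plain dictionaries; the key names mirror the ones used in this post, results_per_page is a hypothetical name for the page-size setting, and none of this is Scolta's actual config schema.

```python
# Key names mirror this post; results_per_page is hypothetical.
# Not Scolta's actual config schema.
DEFAULTS = {
    "recencyStrategy": "exponential",
    "title_all_terms_multiplier": 1.5,
    "content_match_boost": 0.4,
    "maxPagefindResults": 50,
    "aiSummaryTopN": 10,
    "results_per_page": 10,
}

CONTENT_CATALOG = {
    "recencyStrategy": "none",
    "title_all_terms_multiplier": 2.5,
    "content_match_boost": 0.5,
    "maxPagefindResults": 75,
    "aiSummaryTopN": 15,
    "results_per_page": 12,
}


def effective_settings(preset, overrides=None):
    """Later layers win: defaults, then the preset, then admin-form overrides."""
    return {**DEFAULTS, **preset, **(overrides or {})}


def funnel(candidates, score, settings):
    """Fetch wide, re-rank, then slice for the AI summary and the results page."""
    pool = candidates[: settings["maxPagefindResults"]]
    ranked = sorted(pool, key=score, reverse=True)
    return ranked[: settings["aiSummaryTopN"]], ranked[: settings["results_per_page"]]
```

Because overrides are applied last, changing one value in the Scoring section doesn't discard the rest of the preset.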
The Real Quality Lever: Site Description
Because Scolta relies on large language models (LLMs), the single most important configuration field isn't in the scoring section at all. It's the site description.
The site description is a plain text field in Scolta's Content section. Whatever you put there gets passed directly to the AI model at query expansion time. It's the context that tells whatever model you're using what kind of content it's working with.
For The Athenaeum, the description says the content spans science, history, biography, geography, arts, and technology, and that it's an encyclopedia covering all areas of human knowledge. That description does more for search quality than any scoring parameter adjustment.
When someone searches for a concept, say, "survival in extreme conditions", the AI expands that into specific search terms. Without a good site description, the expansion might focus narrowly, maybe just on survival gear or wilderness tips. With a description that says "cross-domain encyclopedia," the AI knows to think broadly. It expands to Antarctic exploration, extremophile bacteria, deep-sea life, space missions, mountain climbing. The expansion matches the content because the description matches the content.
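Here is a rough sketch of where the site description goes. The prompt wording, the JSON reply format, and both function names are hypothetical; the point is only that the description travels with every expansion request, so the model's terms are shaped by it. The actual call to Anthropic, OpenAI, or the Drupal AI module is left out.

```python
import json

# Sample description in the spirit of The Athenaeum's; the prompt wording
# below is hypothetical, not Scolta's actual expansion prompt.
SITE_DESCRIPTION = (
    "An encyclopedia covering all areas of human knowledge, spanning science, "
    "history, biography, geography, arts, and technology."
)


def build_expansion_prompt(query, site_description=SITE_DESCRIPTION):
    """Put the site description in front of every query-expansion request."""
    return (
        f"Site description: {site_description}\n"
        "Expand the user's search into 5 to 10 short search terms likely to appear "
        "in this site's content. Reply with a JSON array of strings only.\n"
        f"User search: {json.dumps(query)}"
    )


def expanded_terms(query, model_reply):
    """Merge the original query with the model's terms, dropping blanks and repeats."""
    terms = [query]
    for item in json.loads(model_reply):
        term = item.strip() if isinstance(item, str) else ""
        if term and term not in terms:
            terms.append(term)
    return terms
```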
A great site description with default scoring parameters will outperform a generic description with perfectly tuned scoring. It's held true on every site we've built with Scolta. Tune the description first. Tune the numbers second.
We started with scoring parameters because that's what search engineers expect to tune. It's what we'd reach for on Solr or Elasticsearch. But AI search inverts the priority: the context you give the model matters more than the weights you assign to fields. If you only have five minutes, spend them on the site description.
Show It Working
Here are two queries that show what Scolta does that keyword search can't. Because Scolta uses an LLM to expand each query, the exact results and AI Overview shift from search to search: the articles in the index don't change, but the expanded terms do, so you see a different slice of the same corpus each time. These reflect what we saw while writing this post.
- "tiny things with huge impact" comes back with DNA nanotechnology, Niels Bohr (the subatomic scale determining all of chemistry), the periodic table, and Gothic boxwood miniatures, medieval carvings a few centimeters across whose "spiritual impact [was] curiously in inverse proportion to their size." It even pulls in Pluto's reclassification, a tiny world that reshaped what "planet" means. None of these pages say "tiny things"; the expansion found smallness expressed as nanotechnology, atomic physics, and miniature art.
- "beautiful mathematics" is the clearer case, since neither word alone would surface what comes back. The Aesthetics article leads, quoted directly on when "a mathematical proof may be considered beautiful." Group theory comes back as the study of symmetry, and Palladian architecture arrives through Colin Rowe's "The Mathematics of the Ideal Villa," connecting mathematical elegance to physical form. "Beautiful mathematics" isn't a keyword phrase; it's a concept living where aesthetics and formal reasoning meet, and the query expansion finds articles at that intersection.
Run the same two queries on any keyword search engine and compare.
The AI Overview Sees Your Data, Not Just Your Text
Finding the right articles is half the problem. The other half is what the AI does with them when it writes a summary.
Most AI search implementations feed the LLM a title, a URL, and a text excerpt per result. That's enough to generate a paragraph that sounds reasonable. But "sounds reasonable" and "actually correct" aren't always the same thing. Ask "which articles have the most citations" and the AI can only guess, since it sees the article text but not the citation count. Ask "first article published" and it may hallucinate a date, because it can't see the actual publish dates.
Scolta solves this by passing all indexed metadata to the AI alongside each result. Not just the text, but also every structured field in the index: word count, reference count, date, taxonomy, and whatever you've configured as sortable or filterable.
Each field is labeled so the AI knows what it's looking at, and if the user sorted or filtered their results, the AI sees that too. In this way it knows which field was sorted, in which direction, and which filters were applied, and it uses this knowledge when responding to site visitors.
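A sketch of what labeled metadata, plus sort and filter state, might look like once rendered as context for the summary model. The layout, function name, and sample values are assumptions for illustration; the post doesn't show Scolta's actual prompt format.

```python
def format_results_for_ai(results, sort=None, filters=None):
    """Render results with labeled metadata as context for the summary model.

    Hypothetical layout; Scolta's actual prompt format isn't shown in this post.
    """
    lines = []
    if sort:
        field, direction = sort
        order = "highest first" if direction == "desc" else "lowest first"
        lines.append(f"Results are sorted by {field} ({order}).")
    for name, value in sorted((filters or {}).items()):
        lines.append(f"Filter applied: {name} = {value}")
    for rank, result in enumerate(results, start=1):
        lines.append(f"[{rank}] {result['title']} <{result['url']}>")
        for key, value in sorted(result.get("meta", {}).items()):
            # each value carries its field name, so the model knows what the number means
            lines.append(f"    {key}: {value}")
        lines.append(f"    excerpt: {result['excerpt']}")
    return "\n".join(lines)
```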
For The Athenaeum, that means the AI overview can reference actual numbers. "The longest articles about science" doesn't produce a vague summary about science topics; it produces a summary that cites specific word counts, because the AI can see the word_count field for each result. "Most cited articles about history" references real citation counts. Scolta guides the AI response with real data, minimizing hallucinations.
When Users Want Results in a Specific Order
So far we've been talking about finding the right articles, and about the AI summarizing them accurately. Sometimes users also want those results in a particular order: "longest articles about science," "most cited articles about history," "newest articles about chemistry." Each of those has a sort intent baked into the query, and because Scolta already passes the AI your structured fields, it can act on it.
Scolta detects this automatically. When someone types "longest articles about science," Scolta's AI expansion pipeline recognizes two things: the user wants articles about science (the search part), and they want those articles sorted by length (the sort part). The search results come back re-ranked by word count, highest first, with a sort badge below the search box showing "Sorted by: word_count (highest first)" and a dismiss button if you'd rather go back to pure relevance ordering.
"Longest articles about science" returns science-related articles sorted by their actual word count, with the AI overview citing specific word counts because it can see the metadata. On a recent run, the top results included Periodic table (29,840 words), J. Robert Oppenheimer (20,025 words), Plutonium (17,024 words), and Otto Hahn (16,291 words). Your results will differ because Scolta uses an LLM to expand the query, and different expansions surface different articles. The word counts are real and come from the indexed metadata, but which articles the expansion selects as "science" will vary from search to search.
"Most cited articles" works the same way, sorting by reference count. "Newest articles about chemistry" sorts by date. "Shortest science articles" sorts ascending. "Articles about wars sorted by date" demonstrates explicit sort syntax: the user literally says "sorted by," and Scolta catches it regardless of how complex the rest of the query is.
Scolta also knows when not to sort. "Best practices for scientific writing" returns matching results in relevance order, with no sort badge. "Most common elements" is a discovery query: the user is looking for well-known elements, not trying to sort by some metric. Classification isn't perfect, though, and there are edge cases where it may not behave as you intend, especially when the same word could mean "sort by this metric" or "tell me about this concept" depending on context.
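The explicit case, where the user literally types "sorted by," can be approximated with a rule-based stand-in. This sketch is not how Scolta works: Scolta detects sort intent in its LLM expansion step, which is what lets it handle "longest articles about science" and leave "most common elements" alone. The regex, the default direction, and both function names are hypothetical. The second function shows the re-ranking and badge text, using the word counts from the run reported above (the unsorted stub article is made up).

```python
import re

# Rule-based stand-in for the explicit "sorted by" case only; Scolta itself
# detects sort intent through its LLM expansion step.
SORTED_BY = re.compile(
    r"^(?P<query>.*?)\s+sorted\s+by\s+(?P<field>[\w ]+?)"
    r"(?:\s+(?P<dir>asc(?:ending)?|desc(?:ending)?))?\s*$",
    re.IGNORECASE,
)


def split_explicit_sort(text, sortable_fields):
    """Return (search_query, (field, direction)) or (text, None)."""
    text = text.strip()
    m = SORTED_BY.match(text)
    if not m:
        return text, None
    field = m.group("field").strip().lower().replace(" ", "_")
    if field not in sortable_fields:
        return text, None  # unknown field: treat it as an ordinary query
    dir_word = (m.group("dir") or "").lower()
    # Hypothetical default: highest/newest first unless "asc" is stated.
    direction = "asc" if dir_word.startswith("asc") else "desc"
    return m.group("query"), (field, direction)


def apply_sort(results, field, direction="desc"):
    """Re-rank results by a metadata field and build the sort badge text."""
    with_value = [r for r in results if field in r]
    without_value = [r for r in results if field not in r]  # kept at the end
    with_value.sort(key=lambda r: r[field], reverse=(direction == "desc"))
    label = "highest" if direction == "desc" else "lowest"
    return with_value + without_value, f"Sorted by: {field} ({label} first)"
```

Results that lack the field keep their relative order and go to the end, which is one reasonable choice among several.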
Configuring Sortable Fields
Sort detection only works if you tell Scolta which fields are sortable. The admin UI at /admin/config/search/scolta has a Sortable Fields section where you add fields and descriptions through a form.
If you prefer config-as-code, the same thing in YAML:
# config/sync/scolta.settings.yml
sortable_fields:
  - word_count
  - date
  - reference_count
sortable_field_descriptions:
  word_count: 'Number of words in the article'
  date: 'Publication or last-updated date'
  reference_count: 'Number of references cited in the article'
Import the config and clear cache:
drush config:import -y
drush cr
The field descriptions aren't just documentation; they're configuration that matters. They're passed to the AI so it can map user language to field names. When someone types "most cited articles," the AI reads the description "Number of references cited in the article" and connects "cited" to reference_count. Without the description, it has to guess from the field name alone, and reference_count is less obvious than citation_count would be.
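One way to picture how the descriptions reach the model, and how its answer can be checked against the configured fields before any sorting happens. The prompt text, the reply shape (a JSON object with sort_field and direction keys), and the function names are assumptions for this sketch, not Scolta's internals.

```python
import json

SORTABLE_FIELD_DESCRIPTIONS = {
    "word_count": "Number of words in the article",
    "date": "Publication or last-updated date",
    "reference_count": "Number of references cited in the article",
}


def describe_sortable_fields(descriptions):
    """Prompt section listing each sortable field with its plain-language description."""
    lines = ["Sortable fields (use these exact names):"]
    lines += [f"- {name}: {text}" for name, text in descriptions.items()]
    return "\n".join(lines)


def validate_sort(model_reply, descriptions):
    """Return (field, direction) if the reply names a configured field, else None.

    Hypothetical reply shape: {"sort_field": "...", "direction": "asc" | "desc"}.
    """
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        return None  # unparseable reply: fall back to relevance order
    if not isinstance(data, dict):
        return None
    field = data.get("sort_field")
    direction = data.get("direction", "desc")
    if not isinstance(field, str) or field not in descriptions:
        return None  # e.g. a guessed "citation_count" that isn't configured
    if direction not in ("asc", "desc"):
        return None
    return field, direction
```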
Adding Your Own Sortable Fields
The fields available for sorting are the same fields you index in Pagefind. If your content has a structured field, such as a price, a rating, a difficulty level, or a page count, you can make it sortable.
For Drupal, add the field to your Search API index configuration so it gets exported during the Scolta build, then add it to the Sortable Fields section in the Scolta admin page with a plain-language description so the AI can map user intent to the field. (Or add it to sortable_fields in scolta.settings.yml with a corresponding entry in sortable_field_descriptions, as shown in the example above.)
As a concrete example, say your Drupal site has a "Reading Level" field with values 'Beginner', 'Intermediate', and 'Advanced' that you've mapped to numeric values '1', '2', '3' in your content type. Add reading_level to your sortable fields with the description "Reading difficulty level: 1=Beginner, 2=Intermediate, 3=Advanced." Now "easiest articles about science" returns science articles sorted by reading level ascending. The AI reads the field description, maps "easiest" to "lowest reading level," and sorts accordingly.
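The reading-level example in code. The numeric mapping is what makes "easiest" sortable at all; the sample articles and their levels are made up, and the function names are hypothetical.

```python
# Hypothetical mapping from the editor-facing labels to the numbers that get indexed.
READING_LEVELS = {"Beginner": 1, "Intermediate": 2, "Advanced": 3}

# Text that would go into sortable_field_descriptions for reading_level.
READING_LEVEL_DESCRIPTION = "Reading difficulty level: 1=Beginner, 2=Intermediate, 3=Advanced"


def index_value(label):
    """Numeric value exported for the index (raises KeyError for unknown labels)."""
    return READING_LEVELS[label]


def easiest_first(results):
    """What an 'easiest articles' sort amounts to: reading_level ascending."""
    return sorted(results, key=lambda r: r["reading_level"])
```

Sorting the labels themselves wouldn't work: alphabetically, Advanced comes before Beginner.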
The pattern is: structured field in your content → indexed by Pagefind → listed in sortable_fields with a description → the AI handles the rest. You don't write sort logic or parse user queries. You describe your data, and the AI figures out when sorting applies.
Further Tuning (If You Want It)
Most sites will never need to go beyond a preset and a good site description. But if you're the kind of person who tunes (and if you're reading a blog post this deep into AI search configuration, you probably are), here's what to look at for encyclopedic content.
maxPagefindResults controls how many results Pagefind returns before Scolta re-ranks them. The default is 50; the catalog preset already raised it to 75, because at over 6,000 pages a query like "ancient civilizations" can match hundreds of articles, and a wider initial fetch gives the re-ranking stage more to work with. Pagefind is fast, so pushing it higher costs little if your corpus is bigger than ours.
aiSummaryTopN controls how many top results get sent to the AI for summary generation. The preset bumped it from 10 to 15, which suits broad queries that surface relevant articles across several domains. Raise it further and the tradeoff is more latency and more tokens per summary.
Custom stop words can help if your corpus has terms that appear everywhere but carry no search signal. For a Wikipedia corpus, you might add "article," "section," and "reference," words that are ubiquitous in encyclopedia content but meaningless for ranking.
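A minimal sketch of query-side stop-word filtering, assuming exact, case-insensitive matching with no stemming. Both word lists are illustrative, and where Scolta actually applies its stop words isn't covered in this post.

```python
import re

# Illustrative lists only; not Scolta's built-in stop words.
DEFAULT_STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "on", "to"})
CUSTOM_STOP_WORDS = frozenset({"article", "section", "reference"})


def query_terms(query, stop_words=DEFAULT_STOP_WORDS | CUSTOM_STOP_WORDS):
    """Lowercase, split into words, and drop exact stop-word matches."""
    return [w for w in re.findall(r"\w+", query.lower()) if w not in stop_words]
```

Note that "articles" (plural) passes through unchanged here, so a real list may need both forms.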
The admin UI exposes all of this. The Scoring section is collapsed by default (because the preset handles it), but expand it and every parameter has a numeric input with inline help text. Change a value, save, and it takes effect on the next search. Scoring changes don't need a rebuild; only content changes do.
For the CLI-inclined, drush scolta:status shows your current configuration and index health. drush scolta:clear-cache wipes the AI response cache if you want to test expansion changes with fresh LLM calls instead of cached ones. By default, Scolta reuses cached results when the same search is repeated.
Scolta has plenty of other knobs (multilingual expansion across 29 languages, custom AI prompts, alternate recency curves, per-element index weighting), all exposed in the admin UI. But for most sites the preset, a good site description, and maybe one or two scoring tweaks are the whole job.
Keeping the Index Current
When you publish, edit, or unpublish content, the search index needs to reflect those changes.
On Drupal, Scolta hooks into Search API's indexing system. Content changes queue automatically, and drush scolta:build picks them up (running the full export-then-index pipeline). For a site with frequent updates, run it on cron or trigger it from a deployment script. For a static corpus like The Athenaeum where articles don't change, a one-time build is enough.
If you've already exported content and just want to rebuild the Pagefind index (useful when testing scoring changes that affect index structure), drush scolta:rebuild-index skips the export step and re-indexes against already-exported content. Faster when the content hasn't changed.
Scoring changes take effect immediately, no rebuild needed. Change title_match_boost from 2.0 to 2.5, save, and the next search uses the new value. Only structural changes (new content, new fields in the index, and changes to the indexer configuration) require a rebuild.
What Comes Next
The next post in this series configures Scolta on a WordPress demo: a fictional diary of the Space Race, over 200 blog posts spanning 1957 to 1973. That content is narrative rather than encyclopedic, so the site type changes and the tunables change with it, with the same Scolta underneath.
A later post deploys Scolta on a medical site, where someone searching "my head hurts" needs to land on the right clinical terminology ("intracranial hypertension," say) buried in a large body of technical content. That's the vocabulary-bridging case the "Documentation & Reference" preset is built for, a different problem from an encyclopedia where readers already know the right words.
Across all of them, Scolta adapts to each platform's native patterns (Search API on Drupal, a settings-page plugin on WordPress, a config file and CLI commands elsewhere) while the presets, the scoring, and the AI underneath stay the same.
Try It
The Athenaeum is live at scolta-demo-drupal-pedia.tag1.ai. Search for something conceptual, like "art born from suffering" or "animals thought to be extinct," and watch what comes back.
Running Scolta on your own Drupal site is the same path this post walked: composer require the module, drush enable it, configure the Search API server, build the index, pick a site type, and write a good site description. For the bigger picture, Introducing Scolta covers the project and where it's headed, and The Practical Path to AI Search goes deep on the four-stage pipeline. Give it a try and let us know how it goes!