Comparing Xapian and Drupal 5's Core Search

Benchmarking Search

Jeremy Andrews - Founding Partner/CEO
July 9, 2008

SearchBench has received a couple of useful updates since yesterday's initial cloud tests. It can now generate search queries based on actual content, and it can export search benchmark results. With these features, SearchBench can be used to perform some actual performance comparisons.

Once again I set up these tests on an extra large EC2 instance. I still have not performed any tuning, and I continue to compare Drupal 5 core search against Xapian search. My initial benchmarks show that Xapian is roughly 6x faster than Drupal's core search when a given search query actually returns results. In addition, Xapian is able to index a large site in about a third of the time of Drupal 5's built-in search. Read on for actual benchmark results and graphs.

These tests make it clear that it's important to use legitimate search terms when benchmarking search performance. SearchBench's new ability to extract wordlists from a site's actual content allows the tool to provide much more useful data. Again, note that neither Xapian nor MySQL has been tuned for these results, and that future benchmarks will aim to better understand how various tunings and configurations affect search performance.
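As a rough illustration of the idea of building queries from a site's real content, here is a minimal Python sketch. It is not SearchBench's actual implementation, which is not shown in this post; the function names, the minimum word length, and the tokenization rule are all assumptions made for the example.

```python
import random
import re


def extract_wordlist(documents, min_length=4):
    """Collect the distinct lowercase words found in the given texts.

    Assumption for this sketch: a "word" is a run of ASCII letters at
    least min_length characters long.
    """
    words = set()
    for text in documents:
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) >= min_length:
                words.add(word)
    return sorted(words)


def generate_queries(wordlist, count, words_per_query=1, seed=0):
    """Build `count` queries by sampling words from the wordlist.

    A fixed seed makes the query list reproducible, so the same queries
    can be replayed against more than one search backend.
    """
    rng = random.Random(seed)
    return [
        " ".join(rng.sample(wordlist, words_per_query))
        for _ in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical sample content standing in for real site nodes.
    sample = ["Xapian indexes content quickly", "Drupal core search"]
    print(generate_queries(extract_wordlist(sample), count=3))
```

Because every query is drawn from words that really occur in the content, most queries should match at least one document, which is what makes the timings meaningful.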

Performance graph showing Xapian search response times with nonsense queries. Most queries show fast, consistent performance with occasional slowdowns when results are returned.

Most of these queries did not return any actual search results. The few slowdowns you see occur where Xapian did return results for a query.

Performance graph showing Drupal core search response times with nonsense queries. Demonstrates consistent fast performance since no search results were returned for any queries.

These are the same queries that were used in the previous test. Note that Drupal core's search did not return results at any time. It would be interesting to compare the queries where Xapian does return results but Drupal core does not, and to fully understand why the search results differ.

Performance graph showing Xapian search response times with real content-based queries. Shows visible performance slowdowns when actual search results are returned, with average query time of 0.23845 seconds.

In this test, SearchBench generated wordlists based on words extracted from actual content on the website being tested. As a result, many of the queries returned actual results, visible in the performance slowdown above.

Some hard numbers from the above test:

Total tests: 3
Searches per test: 100
Total time: 71.5365 seconds
Average time per test: 23.8455 seconds
Average time per query: 0.23845 seconds
Longest query: 0.66174 seconds
Shortest query: 0.12636 seconds

Performance graph showing Drupal core search response times with real content-based queries. Demonstrates significant performance slowdowns when results are returned, with average query time of 1.44620 seconds - about 6x slower than Xapian.

Thanks to SearchBench, the queries used in this test are identical to the queries used in the previous Xapian test, offering a more precise comparison between the two search solutions. There is an apparent slowdown in Drupal core-powered searches when they return actual results. Much of this slowdown is likely due to the creation of temporary tables, an issue that has been significantly improved in Drupal 6. This functionality is being backported to Drupal 5 as an optional patch, on which I plan to run additional benchmarks.
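The idea of replaying an identical query list against each search solution can be sketched as follows. This is only an illustration in Python, not how SearchBench itself is written; `run_benchmark`, `compare_backends`, and the stand-in search callables are hypothetical names invented for the example.

```python
import time


def run_benchmark(search, queries):
    """Time each query against one backend; return per-query seconds."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        search(query)  # the result is discarded; only latency is measured
        timings.append(time.perf_counter() - start)
    return timings


def compare_backends(backends, queries):
    """Run the same query list, in the same order, against every backend."""
    return {
        name: run_benchmark(search, queries)
        for name, search in backends.items()
    }


if __name__ == "__main__":
    # Stand-in callables; a real run would issue requests to each search page.
    backends = {
        "backend_a": lambda q: q.lower(),
        "backend_b": lambda q: sorted(q),
    }
    results = compare_backends(backends, ["drupal", "xapian", "search"])
    for name, timings in results.items():
        print(name, f"{sum(timings) / len(timings):.6f} s/query")
```

Holding the query list fixed means any difference in timing comes from the backend rather than from one side happening to receive easier queries.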

Some hard numbers from the above test:

Total tests: 3
Searches per test: 100
Total time: 433.8613 seconds
Average time per test: 144.6204 seconds
Average time per query: 1.44620 seconds
Longest query: 4.90253 seconds
Shortest query: 0.11557 seconds
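For readers who want to check the arithmetic, the averages and the roughly 6x figure follow directly from the total times reported for the two content-based tests. The short Python sketch below reproduces that calculation; the `summarize` helper is a hypothetical name used only here.

```python
def summarize(total_seconds, tests, searches_per_test):
    """Derive the average time per test and per query from a total time."""
    per_test = total_seconds / tests
    per_query = per_test / searches_per_test
    return per_test, per_query


# Totals taken from the two content-based benchmark runs reported above.
xapian_test, xapian_query = summarize(71.5365, tests=3, searches_per_test=100)
drupal_test, drupal_query = summarize(433.8613, tests=3, searches_per_test=100)

# Ratio of average query times: how many times slower core search was.
slowdown = drupal_query / xapian_query

print(f"Xapian: {xapian_test:.4f} s/test, {xapian_query:.5f} s/query")
print(f"Drupal: {drupal_test:.4f} s/test, {drupal_query:.5f} s/query")
print(f"Drupal core / Xapian: {slowdown:.2f}x")
```

The ratio comes out just above 6, consistent with the 6x+ advantage quoted at the top of this post. It applies only to this untuned configuration and this query set.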

The raw search data from the above benchmarks can be found in this Gnumeric spreadsheet.

There are many more benchmarks planned, as detailed in my earlier blog posting. SearchBench is being developed as a tool to better understand search performance and scalability. Tag1 Consulting is focused on defining solid recommendations and best practices for obtaining optimal performance from LAMP-powered search solutions, and on continuing to improve Drupal's scalability.
