Setting up automatic Smart Search indexing

In another article I praised the Smart Search core extension and recommended that you use it. But there is a catch with this extension: to get the most out of it, you sometimes have to re-index your site by hand after adding new things. Although the Smart Search index is automatically kept up to date whenever content items are amended, there are some circumstances where you need to re-run the indexer. You can do this manually using the Index toolbar button in the Manage Indexed Content screen. Or is there another way to do it?

Yes, you're right: from a lazy old geek like me you should expect to see some automation ;-)

In fact, the problem can be solved relatively easily. Fortunately, if you need to re-index content automatically, it is also possible to run the indexer as a command-line application. This makes it particularly convenient to run the indexer from a cron job (a regular *nix CRON or a pseudo-CRON component for Joomla). The Smart Search CLI (Command Line Interface) application is located in the cli directory under your site's root directory (the root being the directory that holds your configuration.php file). In the cli directory you will find a file called finder_indexer.php. From a shell in that directory, you can simply enter this command to run the indexer:

php finder_indexer.php

Ah, yes, hmm, what is this shell thing? Good question, so let's make it simpler! Go to your hosting control panel and locate the CRON Jobs applet. Any decent hosting control panel should have one. If yours doesn't, you are using the wrong hosting company, that's for sure! Whilst the specifics are beyond the scope of this article, in general you will merely have to enter the above command into the cron job manager and specify the time or times at which the job is to be run. You will probably need to include the full path to the indexer. For example, like this:

php /var/www/myjoomla/cli/finder_indexer.php

The exact format and the correct path in your case can vary depending on your hosting company's internal settings. If you use Rochen as your host, the above line will look like this:

/usr/local/bin/php /home/myaccount/public_html/cli/finder_indexer.php

Once the command line is set up, all you need to do is select the intervals at which the command will run and the e-mail address where you will receive a notification after each run.
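If your host gives you shell access instead of (or in addition to) a control panel applet, the same job can be written as a crontab entry. Here is a minimal sketch, not a definitive setup: the schedule (03:30 every night), the e-mail address and the file name smartsearch.cron are all made up for illustration, the path is the Rochen-style example with the indexer in the cli directory, and MAILTO is honoured by the common cron implementations but possibly not by yours.

```shell
# Write the job definition to a file first, so it can be reviewed before installing.
# Assumptions: schedule, e-mail address and paths are placeholders; adjust them to your account.
cat > smartsearch.cron <<'EOF'
# Send the indexer output to this address (supported by most cron daemons)
MAILTO=you@example.com
# minute hour day-of-month month day-of-week command
30 3 * * * /usr/local/bin/php /home/myaccount/public_html/cli/finder_indexer.php
EOF

# Show what would be installed
cat smartsearch.cron
```

To install it you would run crontab smartsearch.cron, but be careful: that replaces your whole existing crontab, so merge in any jobs you already have first.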

The e-mail you receive will contain something like this:

Smart Search INDEXER
============================
 
Starting Indexer
Setting up Finder plugins
Setup 154 items in 0.094 seconds.
 * Processed batch 1 in 0.213 seconds.
 * Processed batch 2 in 0.182 seconds.
 * Processed batch 3 in 0.177 seconds.
 * Processed batch 4 in 0.009 seconds.
Total Processing Time: 0.676 seconds.
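If you would rather not read every mail, a small check can tell a good run from a bad one. The sketch below assumes you saved the indexer output to a file (the function name check_indexer_log and the log file are my own invention), treats the "Total Processing Time" line from the sample above as the sign of a complete run, and assumes a memory failure shows up as PHP's usual "Allowed memory size ... exhausted" message or some other line mentioning "out of memory".

```shell
# check_indexer_log FILE
# Prints "ok", "memory" or "unknown" depending on what the saved output contains.
check_indexer_log() {
    log_file="$1"
    if grep -q "Total Processing Time" "$log_file"; then
        echo "ok"          # the indexer reached its final summary line
    elif grep -qi -e "Allowed memory size" -e "out of memory" "$log_file"; then
        echo "memory"      # assumed wording of a memory failure
    else
        echo "unknown"     # anything else needs a human look
    fi
}
```

You could use it after a run such as php finder_indexer.php > indexer.log 2>&1, followed by check_indexer_log indexer.log.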

The most common problem you can run into is memory usage. Indexing is a memory-intensive task, and how much memory it needs is determined by the indexer parameters on the Manage Indexed Content screen. You can change the parameters using the Options toolbar button on that screen. Note that both the Indexer Batch Size and Memory Table Limit fields affect the amount of memory used by the indexer.

If you are receiving Out of Memory messages in the notification mails, you can tweak these settings, or you can try your luck another way. If the settings of your hosting account are permissive enough, you can force the CRON job to use more memory with an extra parameter, like this:

/usr/local/bin/php -d memory_limit=256M /home/myaccount/public_html/cli/finder_indexer.php

Play with the 256M value until you find the lowest memory allocation at which the job still runs safely - overusing your server's resources can end up getting your account banned!
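Playing with the value by hand gets boring quickly, so here is a rough sketch of how the trial and error could be scripted. It tries the limits you pass in, in the order given, and prints the first one that works. Assumptions: the function name try_limits is mine, the paths are the example ones used earlier with the indexer in the cli directory, and it relies on the PHP binary returning a non-zero exit status when the run dies (which is what PHP does on a fatal error, but check it on your host).

```shell
# Placeholders: override these to match your hosting account.
PHP_BIN="${PHP_BIN:-/usr/local/bin/php}"
INDEXER="${INDEXER:-/home/myaccount/public_html/cli/finder_indexer.php}"

# try_limits LIMIT...
# Runs the indexer once per limit, in the order given, and prints the first
# limit for which the run exits successfully. Returns 1 if none worked.
try_limits() {
    for limit in "$@"; do
        # Keep the output of each attempt in its own log file for later inspection
        if "$PHP_BIN" -d "memory_limit=$limit" "$INDEXER" > "indexer_$limit.log" 2>&1; then
            echo "$limit"
            return 0
        fi
    done
    return 1
}
```

A call like try_limits 64M 128M 256M starts with the smallest value. Each attempt really runs the indexer, so do this at a quiet time and not from the cron job itself.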

Setting this up with a pseudo-CRON Joomla component is similar. Just pick your favourite one from the JED and set it up! If you want to know more about using CRON jobs in general, you can read, for example, this article!