Zoom Search Engine FAQ - Searching CMS

Q. How should I index my CMS (Content Management System)?

If your site is hosted within a CMS, note that some of these are fairly intricate scripts, and some care needs to be taken to make them understandable to search engines.

However, most of the better-featured CMSs offer SEO (Search Engine Optimization) and SEF (Search Engine Friendly) features that make the site understandable to an external spider or bot, such as those from Google or, of course, Zoom.

We recommend consulting the documentation for your CMS for tutorials and advice on how to configure it to be search engine friendly, as this will undoubtedly help when you index the site with Zoom.

Having said that, here are some tips on how you can configure Zoom to better index your CMS-based site.

Indexing Mambo or Joomla! websites

Joomla! (which began as a fork of Mambo) is an open source Content Management System (CMS). As with many large, complicated CMSs, a number of issues can arise when a spider crawls these websites, because they are rarely search engine friendly. In addition, since they are usually heavily configurable and customizable, it is difficult for us to give specific instructions: every install can vary greatly depending on the components you choose to use. However, it is possible to index Joomla! sites with Zoom, and we provide some general advice below based on our experience.

You should also look at the tips on SEO and SEF (Search Engine Friendly) settings in the Joomla! documentation. The same features, plugins, and advice that make your Joomla! website more accessible to Google also make it easier to index with Zoom.

For the majority of Joomla!-based websites, a well-defined skip list may be all that's needed. Please see the previous explanation for more information on why this is necessary and what it achieves. For example:

/task,calendar/
/task,register/
/task,lostPassword/
/task,userProfile/
/option,com_submissions/

The above will skip the calendar component, register/login pages, profiles and user submissions. On some sites, however, there may be pages with variable components on the side that change based on an "Itemid" parameter. This results in many distinct URLs that actually point to the same content page, but with a slightly different sidebar (e.g. a "Who's online" box, an Events box, etc.). These URLs usually look something like this:

http://mysite.com/component/option,com_news/Itemid,100/

Here, the number following "Itemid," changes the page sidebar or other components. In such cases, it may be possible to skip these variations and index only one of these pages by skipping the other Itemid values. For example, you could skip all variations such as "/Itemid,101", "/Itemid,102" etc. so that only "100" is indexed. Alternatively, if you have a link to the page without the Itemid attribute at all, you can skip all variations with a single skip list entry for "Itemid,".
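As a rough illustration of the skip list ideas above, here is a minimal Python sketch. It assumes a skip entry matches whenever it appears anywhere in the URL (plain substring matching); that is an assumption made for this example, not a description of Zoom's internals. The function name and the sample URLs are hypothetical.

```python
# Illustrative only: models a skip list as plain substring matching on the URL.
# This is an assumption for demonstration, not Zoom's actual implementation.

SKIP_LIST = [
    "/task,calendar/",
    "/task,register/",
    "/task,lostPassword/",
    "/task,userProfile/",
    "/option,com_submissions/",
    "/Itemid,101",
    "/Itemid,102",
]


def is_skipped(url, skip_list=SKIP_LIST):
    """Return True if any skip list entry occurs anywhere in the URL."""
    return any(entry in url for entry in skip_list)


# Hypothetical URLs a spider might discover on a Joomla!-style site.
urls = [
    "http://mysite.com/component/option,com_news/Itemid,100/",
    "http://mysite.com/component/option,com_news/Itemid,101/",
    "http://mysite.com/component/option,com_news/Itemid,102/",
    "http://mysite.com/component/option,com_user/task,register/",
]

# Only the Itemid,100 variation of the news page survives the skip list.
indexed = [u for u in urls if not is_skipped(u)]

for u in indexed:
    print(u)
```

Under this assumption, an entry such as "/Itemid,101" would also match a URL containing "/Itemid,1010", so it is worth checking entries against the Itemid values your site really uses. Passing the single entry "Itemid," as the skip list would exclude every URL above that carries an Itemid at all.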

Skipping by parameter becomes tricky, however, if you (ironically enough) have some "search engine friendly" URL settings enabled in Joomla!. Some of these settings rewrite the URLs so that they look like this:

http://mysite.com/content/view/5/100/

Some people believe that such URLs, which make the CMS parameters appear as if they were merely subfolders in the URL path, are more "attractive" to search engines (in that they force the search engine to index more pages from the site). In fact, they can have a negative effect: it becomes impossible to recognise the parameters should we want to skip or ignore certain pages intentionally. This means you may end up indexing multiple pages of similar content, which can also hurt you with a search engine like Google if it decides that too many pages of your website look the same and concludes that your site is spamming.

In such cases, it may still be possible to filter out pages using the "Content filtering" option in Zoom (on the "Content filter" tab of the Configuration window). Here you can specify words or phrases, and any page containing one of them will be filtered out. You can also specify HTML in this list. So a content filter list like the following would skip all pages containing the "Who's Online" information box, or any page containing the "noindex" meta robots tag:

-Who's online
-<meta name="robots" content="noindex">
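To make the effect of such a filter list concrete, the following Python sketch models it under two assumptions that are ours rather than documented Zoom behaviour: a leading "-" means "skip any page whose HTML contains this text", and the comparison ignores case. The function name and sample pages are hypothetical.

```python
# Illustrative only: models "-" entries as "skip pages containing this text".
# Case-insensitive matching is an assumption made for this example.

FILTER_LIST = [
    "-Who's online",
    '-<meta name="robots" content="noindex">',
]


def should_skip_page(html, filter_list=FILTER_LIST):
    """Return True if the page HTML contains any '-' prefixed filter text."""
    lowered = html.lower()
    for entry in filter_list:
        if entry.startswith("-") and entry[1:].lower() in lowered:
            return True
    return False


# Hypothetical page sources for demonstration.
page_with_sidebar = "<html><body><h1>News</h1><div>Who's Online</div></body></html>"
page_noindex = '<html><head><meta name="robots" content="noindex"></head><body>Draft</body></html>'
page_plain = "<html><body><h1>News</h1><p>Article text.</p></body></html>"

for name, html in [
    ("sidebar", page_with_sidebar),
    ("noindex", page_noindex),
    ("plain", page_plain),
]:
    print(name, "skipped" if should_skip_page(html) else "indexed")
```

Because the HTML entry is compared literally in this sketch, a meta tag written with different attribute order or spacing would not match; check the exact markup your template produces before relying on it.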

For more information on using Content Filtering and the other indexing options of Zoom, please refer to the Users Guide.
