
The Design Workshop

Adding a search feature to your site


The Atomz search feature works by creating an index of the HTML text it finds on the pages at the location you give for that account. If your site makes extensive use of images of text, note that these ‘words’ cannot be indexed and therefore won’t be searchable. The indexing process begins with the default page supplied by the Web server, which is almost always the index.html file. It stores a copy of all the text it finds there, follows each internal link, indexes the text content of those pages, and then follows any further links they contain. The process continues until every page within the supplied URL that can be reached, either as the default page or through a chain of links, has been read.
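The crawl described above is essentially a breadth-first walk along a site's links. The Python sketch below models it over a small hypothetical site held in a dictionary. The URLs, page text and the `crawl` function are illustrative assumptions, not Atomz's own implementation. It shows why a page that nothing links to is never reached, and why the link to an external site is not followed.

```python
from collections import deque

# Hypothetical in-memory site: URL -> (visible HTML text, outgoing links).
# Text drawn inside images never appears here, so it can't be indexed.
SITE = {
    "http://www.mysite.com/index.html": (
        "Welcome to our design studio",
        ["http://www.mysite.com/about.html", "http://www.apple.com/"],
    ),
    "http://www.mysite.com/about.html": (
        "About the studio and its portfolio",
        ["http://www.mysite.com/index.html", "http://www.mysite.com/work.html"],
    ),
    "http://www.mysite.com/work.html": ("Recent print and web work", []),
    "http://www.mysite.com/orphan.html": ("Nothing links to this page", []),
}


def crawl(base, site, default_page="index.html"):
    """Breadth-first crawl starting from the default page under `base`.

    Returns a dict mapping each reached URL to its stored text.
    Links that fall outside `base` are never followed.
    """
    start = base + default_page
    queue = deque([start])
    seen = {start}
    stored = {}
    while queue:
        url = queue.popleft()
        if url not in site:          # broken link: nothing to read
            continue
        text, links = site[url]
        stored[url] = text           # keep a copy of the page's text
        for link in links:
            if link.startswith(base) and link not in seen:
                seen.add(link)
                queue.append(link)
    return stored


index = crawl("http://www.mysite.com/", SITE)
for url in sorted(index):
    print(url)
```

Only the three linked pages are stored; orphan.html and www.apple.com are never read.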

This indexing process won’t touch pages outside the URL you supply, so if you link to www.apple.com anywhere you don’t have to worry about your searches throwing up hundreds of pages from there. If an area of your site lies within the URL you supplied and is linked to from one or more pages, but you don’t want it indexed, you can use the options in your Atomz account pages to exclude individual pages or complete directories. These exclusions are set with ‘URL masks’, which offer a few useful variations: for example, you can have a page’s links followed but its text left out of the index, or have the page skipped altogether. You’ll find them in the Options section of your Atomz account pages.

If you want a page indexed but none of its links followed, enter its URL in the URL mask field as an included (rather than excluded) mask, with ‘nofollow’ after the address. The page’s text content will be read, but any links on it will be ignored. Alternatively, if you’d like a page’s links to be followed but its text ignored, put ‘noindex’ at the end of the line, as in ‘include http://www.mysite.com/contents.html noindex’. A table of contents page is a good candidate: its text isn’t much use in search results, but it provides routes to many more suitable pages.
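To make the combinations concrete, here is a Python sketch of how the scope rule and the three kinds of mask could decide, for each URL, whether its text is indexed and whether its links are followed. The mask syntax mirrors the ‘include … noindex’ example above, but the prefix matching, the "first matching mask wins" rule and the `crawl_policy` function are assumptions for illustration; Atomz's actual matching rules may differ.

```python
def parse_mask(line):
    """Split a mask line such as 'include http://x/ noindex' into parts."""
    parts = line.split()
    action, prefix = parts[0], parts[1]
    flags = set(parts[2:])
    return action, prefix, flags


def crawl_policy(url, base, masks):
    """Return (index_text, follow_links) for `url`.

    Assumptions: masks match by simple URL prefix, and the first
    matching mask wins.
    """
    if not url.startswith(base):
        return (False, False)        # outside the supplied URL: never touched
    for line in masks:
        action, prefix, flags = parse_mask(line)
        if url.startswith(prefix):
            if action == "exclude":
                return (False, False)  # neither indexed nor followed
            return ("noindex" not in flags, "nofollow" not in flags)
    return (True, True)              # default: read the text and follow links


BASE = "http://www.mysite.com/"
MASKS = [
    "exclude http://www.mysite.com/private/",
    "include http://www.mysite.com/contents.html noindex",
    "include http://www.mysite.com/news.html nofollow",
]

for url in [
    "http://www.mysite.com/private/notes.html",
    "http://www.mysite.com/contents.html",
    "http://www.mysite.com/news.html",
    "http://www.mysite.com/about.html",
    "http://www.apple.com/",
]:
    print(url, crawl_policy(url, BASE, MASKS))
```

The contents page comes out as (False, True), links followed but text ignored, while the news page comes out as (True, False), text indexed but links ignored.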

Conversely, if you want to index specific pages or areas that aren’t linked to from the main entry point, list their addresses on the URL Entrypoints page. This is also useful if your site spreads part of its content across more than one domain.

A number of common words are automatically excluded from the indexing process. There’s normally little point in keeping track of words such as a, an, the and is, so these are listed automatically in the Excluded Words list in your Atomz account. You can add your own words to this list if you don’t want them used in searches, to stop them producing irrelevant results.
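Excluded words are what search engines generally call stop words. The Python sketch below shows the idea with a simple inverted index that maps each word to the pages containing it. The starting word list, the page names and the "every remaining query word must match" search rule are hypothetical choices for illustration; your real Excluded Words list lives in your Atomz account.

```python
import re
from collections import defaultdict

# Hypothetical starting list of excluded (stop) words.
EXCLUDED_WORDS = {"a", "an", "the", "is", "and", "of", "to"}


def words_in(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages, excluded=EXCLUDED_WORDS):
    """Inverted index: word -> set of pages containing it, minus excluded words."""
    index = defaultdict(set)
    for page, text in pages.items():
        for word in words_in(text):
            if word not in excluded:
                index[word].add(page)
    return index


def search(index, query, excluded=EXCLUDED_WORDS):
    """Return pages containing every non-excluded word in the query."""
    terms = [w for w in words_in(query) if w not in excluded]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


pages = {
    "history.html": "The history of the studio",
    "studio.html": "A studio is a workspace",
}
idx = build_index(pages)
print(sorted(search(idx, "the studio")))
```

Because ‘the’ is never stored, the query "the studio" matches both pages on ‘studio’ alone. Passing `EXCLUDED_WORDS | {"studio"}` to `build_index` drops that word as well, which is the effect of adding your own entries to the list.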

And finally, if your site uses frames you may want to specify a frame target name for the links on the search results page. This ensures that pages open in the appropriate frame of your site rather than taking over the whole browser window. If you don’t use frames, the default target of ‘_self’ is almost certainly the most appropriate, although in some cases it may be worth using ‘_blank’ instead, so that each clicked result opens in a new window.
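In HTML, frame targeting works through the `target` attribute on each link, so a results page for a framed site would presumably contain links along the lines produced by this Python sketch. The `result_link` function and the frame name ‘main’ are hypothetical. ‘_self’ and ‘_blank’ are the standard reserved target names mentioned above.

```python
from html import escape


def result_link(url, title, target="_self"):
    """Return an HTML anchor for one search result.

    `target` is the frame name, or a reserved name such as '_self'
    or '_blank', that the browser should load the page into.
    """
    return '<a href="{}" target="{}">{}</a>'.format(
        escape(url, quote=True), escape(target, quote=True), escape(title)
    )


# A framed site whose content frame is (hypothetically) named "main":
print(result_link("http://www.mysite.com/about.html", "About us", target="main"))
```

Escaping the URL, target and title keeps characters such as ‘&’ in page titles from breaking the generated HTML.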