How do you stop Internet search engines from seeing your web site?

SOLVED

I'm running an Apache web server on Red Hat Linux 6.1 and I don't want some of the web pages to show up on any search engines (like Yahoo or Google). How can I prevent this from happening? Is there a configuration parameter in Apache that blocks the search engine query from finding all of the web pages on my web server?

Thanks,

Paul Mancillas
3 REPLIES
John Poff
Honored Contributor
Solution

Re: How do you stop Internet search engines from seeing your web site?

Hi,

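I think there is a meta tag you can put in your HTML that tells robot crawlers not to index a page: the robots meta tag, which goes in the page's <head>, for example <meta name="robots" content="noindex">. It works per page, so it fits if you only want to hide some of your pages. Here is a link to some information about it on Google: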

http://www.google.com/webmasters/3.html#B3
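
To make the tag concrete, here is a minimal sketch in Python that embeds a sample page carrying the robots meta tag and reads its directives back out, roughly the way a well-behaved crawler would. The page content and the helper name robots_directives are made up for illustration; only the tag itself matters.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the <meta name="robots"> line is the part that matters.
PAGE = """<html>
<head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head>
<body>Not for search engines.</body>
</html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute names arrive lowercased; values can be None if omitted.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives declared in the page (illustrative helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```

A page without the tag yields an empty set, which a crawler treats as "index as usual".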

JP
Michael Armbrecht
Frequent Advisor

Re: How do you stop Internet search engines from seeing your web site?

Another method is to create a file called "robots.txt" in the document root of your web server.
You can find more information here:
http://httpd.apache.org/docs/misc/howto.html#stoprob

More on the syntax of the "robots.txt" file is here:
http://www.searchengineworld.com/robots/robots_tutorial.htm
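
As a rough illustration, here is a small "robots.txt" and a way to check what it permits, using the robots.txt parser from Python's standard library. The directory names /private/ and /drafts/, the host www.example.com and the helper is_allowed are placeholders, not anything from your server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/private/report.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Keep in mind that the file is only a request: it tells crawlers what you would like them to skip, it does not enforce anything.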

The only problem is that some search engines may not obey the rules in your "robots.txt" file, or may ignore the file altogether. In that case the only way out is to find out which User-Agent string each of those crawlers sends and block them individually.
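
One way to find those strings is to look in the access log. The sketch below assumes your server writes the "combined" log format (the plain "common" format does not record the User-Agent) and counts which agents requested anything under a given path. The log lines, the bot name ExampleBot and the helper agents_hitting are invented for the example.

```python
import re
from collections import Counter

# Matches the tail of a "combined" format log line:
#   "METHOD /path PROTOCOL" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$'
)

# Hypothetical log excerpt for illustration only.
SAMPLE_LOG = """\
10.0.0.5 - - [12/Mar/2001:10:15:32 -0500] "GET /private/a.html HTTP/1.0" 200 512 "-" "ExampleBot/1.0"
10.0.0.9 - - [12/Mar/2001:10:16:01 -0500] "GET /index.html HTTP/1.0" 200 2048 "-" "Mozilla/4.7 [en] (X11; U; Linux)"
10.0.0.5 - - [12/Mar/2001:10:17:45 -0500] "GET /private/b.html HTTP/1.0" 200 300 "-" "ExampleBot/1.0"
"""


def agents_hitting(log_text, prefix):
    """Count requests per User-Agent for paths that start with the given prefix."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_TAIL.search(line)
        if match and match.group("path").startswith(prefix):
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    for agent, hits in agents_hitting(SAMPLE_LOG, "/private/").most_common():
        print(hits, agent)
```

Any agent that keeps showing up under a path you disallowed in "robots.txt" is a candidate for blocking.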

Michael
Never be afraid to try something new. Remember, amateurs built the ark. Professionals built the Titanic.
Gregory Fruth
Esteemed Contributor

Re: How do you stop Internet search engines from seeing your web site?

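You can use the "Allow" and "Deny" directives in the
config file (inside a <Directory> or <Location> block)
to restrict access. If you're trying to keep out all
interlopers, deny everyone and then allow only the
hosts you trust:

Order deny,allow
Deny from all
Allow from my_domain.com
Allow from my_customers_domain.com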

All the major search engines will obey robots.txt,
so if those are the concern then robots.txt is the
answer. If you don't trust robots.txt or get hit by a
rogue search engine that doesn't obey it, use the
"Deny" directive:

Deny from some_search_site.com
Deny from some_other_site.com

HTH