Search engines need to know which pages on your site no longer exist, so they can tell them apart from the rest of the site. They expect a missing page to return a "404 Not Found" status, and you probably rely on Drupal to send it. But are you sure that your "page not found" page really returns a "404 Not Found" header, and not "200 OK" or "404 OK"? If you're running Drupal, I suggest you double-check.
Some of those who submitted a sitemap to Google's Webmaster Tools may have come across a message saying that the sitemap cannot be verified because all pages, even ones that don't exist, return an "OK" success status instead of a "404 Not Found" header. This is another example of how Google requires a proper 404 page. To check how your site responds, try one of the following:
- Use an online tool, such as HTTP Header Viewer or HTTP Status Codes Checker.
- Use the Live HTTP Headers add-on for Firefox.
Request a URL on your site that is supposed to return "404 Not Found" and check the response header. In some setups, such as when PHP runs as CGI, the server returns the wrong header, for example "404 OK". This can happen because Drupal doesn't cover every possible server configuration or work around every PHP bug.
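If you prefer the command line to the tools above, a short PHP script can do the same check. This is a minimal sketch, assuming PHP 7.1 or later with allow_url_fopen enabled (get_headers() needs it for http:// URLs); the function names is_proper_404() and fetch_status_line() are hypothetical, and the URL you pass should be one you know does not exist on your site.

```php
<?php
// Minimal sketch: ask the server for a page that should not exist and check
// whether the status line really says "404 Not Found".

/**
 * True only for a proper "404 Not Found" status line; a "404 OK" or
 * "200 OK" status line returns false.
 */
function is_proper_404(string $statusLine): bool
{
    return preg_match('#^HTTP/\d\.\d 404 Not Found$#i', trim($statusLine)) === 1;
}

/**
 * Returns the first status line for $url, or null if the request failed.
 * get_headers() follows redirects, so status lines of any redirected
 * responses come after element 0.
 */
function fetch_status_line(string $url): ?string
{
    $headers = @get_headers($url);
    return ($headers === false || !isset($headers[0])) ? null : $headers[0];
}

// Usage from the command line: php check404.php <url-that-should-not-exist>
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $status = fetch_status_line($argv[1]);
    if ($status === null) {
        echo "Request failed", PHP_EOL;
    } else {
        echo $status, PHP_EOL;
        echo is_proper_404($status) ? "Proper 404" : "Warning: not a proper 404", PHP_EOL;
    }
}
```

A result like "HTTP/1.1 404 OK" is exactly the broken header described above.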
Drupal's possible "404 OK" problem
When running Drupal on a server with PHP as CGI, you have to change the header line in drupal_not_found() in /includes/common.inc (around line 288; the exact line depends on your Drupal version) from
drupal_set_header('HTTP/1.0 404 Not Found');
to
drupal_set_header('Status: 404 Not Found');
Otherwise Drupal will not send the correct "404 Not Found" header. This affects popular hosts such as Site5 and Bluehost. More information can be found in the PHP manual: http://us3.php.net/header
This was the case for my Drupal sites on one particular host (Siteground): they returned "404 OK" instead of "404 Not Found" because of a PHP CGI bug that requires a different header command. Changing common.inc as suggested sorted the problem.
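Instead of hard-coding one form per host, the header can be chosen at runtime. The sketch below is an illustration, not Drupal's code: it assumes that php_sapi_name() returns a name containing "cgi" (such as "cgi" or "cgi-fcgi") on the affected hosts, it uses PHP's own header() rather than drupal_set_header(), and the helper names status_header_line() and send_not_found() are hypothetical.

```php
<?php
/**
 * Hypothetical helper: builds the status header suited to the PHP SAPI.
 * Under CGI the "Status:" form is used; otherwise a raw HTTP status line.
 */
function status_header_line(string $sapi, int $code, string $reason): string
{
    if (strpos($sapi, 'cgi') !== false) {
        // PHP as CGI/FastCGI: the web server builds the real status line
        // from the "Status:" header.
        return "Status: $code $reason";
    }
    // Module SAPIs (e.g. apache2handler) accept the HTTP status line directly.
    return "HTTP/1.0 $code $reason";
}

/**
 * Hypothetical helper: sends a 404. The third argument of header()
 * also sets the response code explicitly.
 */
function send_not_found(): void
{
    header(status_header_line(php_sapi_name(), 404, 'Not Found'), true, 404);
}
```

The same helper gives the header for the access-denied case by passing 403 and 'Forbidden'. Note that PHP-FPM reports its SAPI as "fpm-fcgi", which this simple substring check does not treat as CGI.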
Drupal's possible "403 Forbidden" problem
Interestingly, one of the commenters pointed out a similar issue with the way "403 Forbidden" is returned. Although he didn't manage to sort it out, I found that changing drupal_access_denied() in common.inc in much the same way works fine. So change
drupal_set_header('HTTP/1.0 403 Forbidden');
to
drupal_set_header('Status: 403 Forbidden');
Drupal Custom 404 pages
There are a couple of modules that can enhance Drupal's "404 Not Found" behavior. Consider using the following modules:
- This module allows the site admin to create custom error pages for 404 (not found) and 403 (access denied) without having to create a node for each of them. Since the error pages are not real nodes, they do not belong to a category term, do not show up in search results, and will not appear in node listings or in the Popular Content block.
- Instead of showing a standard "404 Page not found" page, this module performs a search on the keywords in the URL. For example, if a user goes to http://example.com/does/not/exist, the module searches for "does not exist". It also includes beta-stage detection of search engine keywords.
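To make the URL-to-search idea concrete, here is a minimal sketch of turning a missing path into search keywords, as in the example above. It is not the module's code: the function name path_to_keywords() is hypothetical, and treating hyphens and underscores as word separators (in addition to slashes) is an assumption.

```php
<?php
/**
 * Hypothetical helper (not taken from the module): turns the path of a
 * missing URL into a space-separated keyword string.
 */
function path_to_keywords(string $url): string
{
    // Keep only the path component, e.g. "/does/not/exist".
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // Decode escapes such as %20, then split on slashes, hyphens and
    // underscores (treating the last two as separators is an assumption).
    $words = preg_split('#[/_-]+#', urldecode($path), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', $words);
}

// Usage
echo path_to_keywords('http://example.com/does/not/exist'), PHP_EOL; // does not exist
```

The resulting string can then be handed to whatever search the site uses.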