Google really doesn’t like duplicate content, so it is advisable to prevent the Google crawler from reaching the same content on your site through more than one URL. Since WordPress offers many ways of reaching your content, you should block certain URLs and URL paths by defining the right robots.txt.

Here’s my suggestion for the WordPress robots.txt:

User-agent: Googlebot
# Disallow all directories and files within
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
# Disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
# Disallow individual post feeds, categories and trackbacks
Disallow: /*/trackback/
Disallow: /*/feed/
Disallow: /category/*
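Note that the `*` and `$` patterns above are Googlebot’s extended syntax, not part of the original robots.txt standard (Python’s urllib.robotparser, for one, doesn’t understand them). If you want to sanity-check which URLs these rules block before deploying, here is a small sketch of Google-style pattern matching; the helper names are my own, and the trackback/feed patterns are written with a leading slash, as Google expects paths to start with `/`:

```python
import re

# Disallow patterns from the robots.txt above (Googlebot extended syntax).
DISALLOW = [
    "/cgi-bin/", "/wp-admin/", "/wp-includes/",
    "/*.php$", "/*.js$", "/*.inc$", "/*.css$",
    "/*/trackback/", "/*/feed/", "/category/*",
]

def _to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt pattern: '*' matches anything, '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = re.escape(pattern.rstrip("$")).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

RULES = [_to_regex(p) for p in DISALLOW]

def is_blocked(path: str) -> bool:
    # Rules are prefix matches unless anchored with '$', like Googlebot's.
    return any(r.match(path) for r in RULES)

print(is_blocked("/index.php"))             # True: matches /*.php$
print(is_blocked("/wp-admin/options.php"))  # True: under /wp-admin/
print(is_blocked("/2007/01/my-post/"))      # False: normal permalinks stay crawlable
print(is_blocked("/2007/01/my-post/feed/")) # True: matches /*/feed/
```

Running a few of your real permalinks through `is_blocked` is a cheap way to catch an over-broad rule before Googlebot does.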

Be extremely careful when implementing this. For example, some WordPress installations have Gallery2 embedded, which – for reasons unknown – likes to run with main.php in the URL (even with URL rewriting enabled!). Furthermore, if your blog lives in a sub-directory of your domain and you change the robots.txt for the entire domain, note that you might block essential pages in other sub-directories. I imagine this is why robots.txt isn’t included in the default WordPress installation.
As my fellow bloggers who trackbacked have explained, you also need to take care with which agents you block; it would be wise to target bots specifically instead of using the problematic * wildcard in the User-agent field.
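Because crawlers obey only the most specific User-agent group that matches them, you can keep the strict rules scoped to Googlebot and leave other well-behaved bots alone. A sketch of that layout (the Mediapartners-Google group is an illustrative assumption, relevant only if you serve AdSense, whose crawler needs full access for ad targeting):

```
# Strict rules apply to Googlebot only
User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /*/feed/

# Hypothetical: let the AdSense crawler see everything
User-agent: Mediapartners-Google
Disallow:
```

An empty Disallow line means “nothing is disallowed” for that group.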


43 Responses to WordPress SEO : using robots.txt to avoid content duplication

  1. AdSense and robots.txt (Part 2)…

    In AdSense and robots.txt (Part 1) I described the basic syntax for the robots.txt file. Today we look at how the robots.txt file can affect your AdSense income if you’re not careful with how you declare the exclusion rules.

    Mediabot: The AdSense Cra…

  2. filination says:

    You’re absolutely right. Thanks for the helpful comments.

  3. Frostfox says:

    I tried your suggestion about the robot.txt. I am not sure if it was the reason, but my Google page rank went from a 2 to a 4. The only other reason I can think of is that I moved my blog from the URL to

  4. Lonnie says:

    Consolidating your incoming links ups your page rank score. has some tools to help you with consolidation….

    I generally avoid any exclude tags on the site…The dangers outweigh the advantages for me…Any case studies on this one?

  5. filination says:

    Frostfox – Yeah, I believe Lonnie is right, but you should know that PageRank only updates once every 3-4 months, so it’s not something you get immediate results on. But redirecting several reachable URLs into one is very good practice, especially if Google penalized you for duplicate content.

    Lonnie – Yeah, there are, and they’re all over the SEO blogosphere. You can start off with SEOBook and see his self-reports as well as incoming trackbacks. As long as you closely monitor your robots.txt performance in Google Webmaster Tools, I believe you’ll be alright.

  6. Frostfox says:

    I have been playing around with the Google webmaster thing for a bit now.
    Two things about your robots.txt: one, you have a spelling mistake – “indididual” should be “individual” – and two, isn’t disallowing files that end with .php a bad idea? Your site’s index is index.php.

  7. filination says:

    Thanks for the spelling correction 😛 :$

    Your question about index.php is actually what content duplication is about. Some blogs allow the exact same page to appear through index.php and their main blog path “/” and that’s something you want to avoid.

  8. Apache Gal says:

    Nice post dude.. You will want to check out WordPress robots.txt for more examples.

  9. […] Google doesn’t like duplicate texts and lowers your PR on pages it indexes multiple times; there are also pages you really don’t want Google to index at all. So you definitely want to prevent Google from reaching pages through more than one address (which WordPress happily allows), and you can do that with the robots.txt file. Tips for changing the file can be found in the post “How to use the robots.txt file to avoid content duplication”. […]

  10. fiLi says:

    Yeah, I later found your post through various bloggers on the net (JohnTP etc.). That’s a good comprehensive post you wrote there…

  11. […] Using Feedburner? then you should disable your feeds from search engine indexing. After you’re done tweaking your robots.txt to avoid content duplication and sorting your .htaccess to setup all the redirects you should also take care of the content duplication happening with your Feedburner feed. […]

  12. Mark Wilson says:

    Hey Fili – thanks for the advice; unfortunately by blocking all PHP files I stopped Google from accessing my home page (the Google Webmaster Tools said that


    I read your comment above to Frostfox, do you have any advice for dealing with the situation where and are actually one and the same?

    TIA, Mark

  13. fiLi says:

    Glad I could help.

    I believe this next plugin will take care of that problem for you (and a few other duplicate content issues) :
    Permalink Redirect

  14. […] following fiLi’s advice for using robots.txt to avoid content duplication, I started to edit my robots.txt file. I won’t list the file contents here – suffice to say […]

  15. Mark Wilson says:

    Thanks again fiLi – that plugin looks really useful. M

  16. […] just about the same thing, but might be a little more tricky to set up correctly. Check out “WordPress SEO : using robots.txt to avoid content duplication” for […]

  17. […] Robots.txt : Either robots.txt doesn’t exist at all (LT), it has something that does nothing (MKS), or it’s useless (TCI). Use robots.txt to avoid content duplication. […]

  18. […] kinda like that, but it doesn’t seem to cover everything. Fili’s Tech has an article on wordpress seo for wordpress too, and I like his ideas. So I ended up with something like this: # Disallow all directories and […]

  19. […] WordPress SEO : using robots.txt to avoid content duplication […]

  20. AskApache says:

    Great blog and nice post, Fili. I like how you’re keeping it simple; I recently changed my robots.txt from the WordPress robots.txt example to a simpler version. Perhaps it’s time for a follow-up article?

  21. […] I just saw a robots file mentioned on filination.com, along with a brief explanation. If you’re just getting to know the robots.txt file, you can use it as a reference: User-agent: Googlebot […]

  22. […] a combination between the files on this robots.txt post at Connected Internet and this robots.txt post from Filination. If you use these you’ll manage to maximise your Page Rank on all of your […]

  23. This robots.txt file looks very simple, and thanks for some of the points you made. Now I have to go through all the sites I visited for their robots.txt files and create my own based on the information they provide. Hopefully, after experimenting a little, I’ll find out which works best for a WordPress site. Thank you for the information.

  24. If an RSS feed is the Yahoo backdoor, is a Blog Google’s?…

    Though the answer is in a book I wrote this July, the question is still asked of me repeatedly. Why does it work for some sites and not others? And how come some blogs get indexed in a day and then are dropped, and others stay in Google indefinitely?…

  25. […] one suggestion for the WordPress robots.txt from Fili’s Tech: User-agent: […]

  26. […] to the last line in the file. There has also been other bloggers such as Everton, 20 steps and FiLi who have created a Robots.txt and saw a marked increase in their blog traffic. Trust me, Google […]

  27. […] robots.txt – again, using this file can help you avoid content-duplication problems. Worth reading Fili’s post on the subject. […]

  28. […] Tip: The robots.txt file controls what search-engine robots may or may not index on your blog. This article contains the ideal robots.txt for a WordPress blog. Read more… […]

  29. Romano says:

    Many thanks for your precious info.
    Grazie mille and greetings from Italy.

  30. […] WordPress SEO: using robots.txt to avoid content duplication […]

  31. saytopedia says:

    Useful tip and a simple solution. Thanks.

  32. […] Setup your robots.txt file for WordPress […]

  33. Seo Google Pagerank says:

    Why does it work for some sites and not others? And how come some blogs get indexed in a day and then are dropped, and others stay in Google indefinitely?…

  34. jane says:

    Interesting! Always looking for useful SEO tips.

  35. Orion SEO says:

    Some people optimizing their websites and blogs are not aware of this, so it’s good that there is always someone ready to share their tips and SEO knowledge like you do. Thanks for your kindness.

  36. Orion SEO says:

    Some people are not aware of what they are doing in SEO; thanks for being someone who is.

  37. You should check WordPress Robots.txt for Silo SEO, as it goes into more detail on removing duplicate content on WordPress using robots.txt.

  38. I think User-agent: * is very good!

  39. Maski says:

    OK… let’s say my blog has been online without a robots file for a couple of years – well, more than 4. Traffic comes and goes, but if I change my robots.txt to your recommendations, would Google instantly remove the duplicate content the next time it reads it?


    • Fili says:

      MASKI – it would take a few weeks for Google to respond, but it doesn’t matter whether you had a robots.txt before or for how long. Just add it, and Google will adjust accordingly. You can track Google’s use of robots.txt in Google Webmaster Central.
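As the thread above discusses, /index.php and / often serve the identical page, and feed/ and trackback/ URLs duplicate their posts. Beyond blocking them in robots.txt, you can map each duplicate onto one canonical URL when generating links or redirects. A minimal sketch (the canonicalize helper is hypothetical, not part of WordPress or any plugin):

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """Map duplicate WordPress URLs onto one canonical form."""
    parts = urlsplit(url)
    path = parts.path
    # /index.php and / serve the same page; keep only the bare root.
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]
    # Drop trailing feed/ and trackback/ segments, which duplicate the post.
    for suffix in ("feed/", "trackback/"):
        if path.endswith("/" + suffix):
            path = path[: -len(suffix)]
    # Query strings and fragments are dropped in this simple sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

print(canonicalize("http://example.com/index.php"))
# http://example.com/
print(canonicalize("http://example.com/2007/01/my-post/feed/"))
# http://example.com/2007/01/my-post/
```

In practice you would 301-redirect each request whose URL differs from its canonical form, which is essentially what the Permalink Redirect plugin mentioned above does.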

