The commonly copied content is called duplicate
When we start on-page SEO we pay close attention to copied content, one of the most widespread problems in search engine positioning and, curiously, one of the least attended to. An example came up this Friday: the website of a very prestigious, well-known company in the country which, with only 75 URLs, had roughly thirty of them duplicated. Badly. And that is before we even get into the missing h1, title and description tags on some of its articles, which makes it impossible to position those articles.
We continue to optimize the web and talk about duplicates
People constantly talk about how to optimize a web page for search engines, mainly Google, and how to get quality links, when before any of that the vast majority of websites in this country, including many managed by supposedly prestigious companies in the internet market, have never paid attention to this widespread problem: annoying duplicate content.
In this article I will try to explain everything I consider important about duplicate content
You can find many articles about it on the net, but it is really the years of practice fighting it that teach you how to solve it. An example happened to me in my early days with two kinds of duplicate content: content generated with the www prefix and content generated without it, as well as URLs closed with a trailing slash and the same URLs without it. All of this generates duplicate content, so if you think this applies to you, let's go a little deeper and I will tell you how to detect it and how to solve the problem.
What it is, where to find it and how to get rid of it
First it would be good to give you an introduction to what copied content really is, since it is not the simple copy-paste we all have in mind when reading this header.
What is the copied content
It can be said without any doubt that copied content is any text repeated on more than one URL, whether internal or external. It happens when your site produces multiple copies of the same page, when a spammer copies one of your articles, or when you import CSV files from your suppliers without modifying them, so you are publishing a source used by many other companies; in that case reworking the text, by programming or by hand, is vital. Many people think, and I have come across more than one case that ended in an argument, that duplicate content is not relevant. It is, and it is a difficult problem. Any Google user expects to find different, varied, good-quality results, not the same content over and over again; it makes sense, just as when we go shopping on Meritxell, Carlemany and so on we do not want every shop window to show the same thing. For this reason Google needs to filter out all the copies of the same content so that they do not appear. Some say the filtering process the search engine applies is transparent; honestly, I do not believe in Google's impartiality. In my case, when I publish a new article I index it immediately, so Google shows "indexed 22 seconds ago". Very few people know how to do that, so I am sure the search engine will recognize me as the first to be indexed with this article, which is mine. That is what matters when the duplication is with external websites: always prove that you are the original. You may not even know that your pages have been marked as duplicates; you may think your pages are unique and attracting visitors when the reality is very, very different, which is why we have tools to monitor it. If your site produces copied content you lose the ability to appear in search results, you can be penalized and, if the sanction is severe, even vanish from the index, something we will now delve into.
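As a first tool for monitoring it, here is a minimal sketch of an internal duplicate check: it fetches a list of URLs and groups them by a hash of the response body, so exact copies surface immediately. The URLs are placeholders; near-duplicates need a fuzzier comparison, which we come back to at the end of the article.

```python
# Minimal sketch: group URLs by a hash of their body so exact internal
# duplicates surface quickly. Replace the placeholder URLs with your own.
import hashlib
from collections import defaultdict

import requests

urls = [
    "https://example.com/page-a",
    "https://example.com/page-a/",     # trailing-slash variant
    "https://www.example.com/page-a",  # www variant
]

groups = defaultdict(list)
for url in urls:
    body = requests.get(url, timeout=10).text
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    groups[digest].append(url)

for digest, members in groups.items():
    if len(members) > 1:
        print("Exact duplicates:", members)
```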
Serious Consequences of Copied Content
Now that you are beginning to understand why it is so essential to avoid any duplicate content, whether internal or external, it is a good idea to know what we might call Google's penalties and the problems they can cause your website.
Wrong pages
Having different pages for the same content means leaving it to the search engine to choose the right one. That is not a good idea, as it can pick a different version from the one you want, often the result of careless .htaccess rules, but everything has a solution.
Worse visibility
When you create copied content, the search engine may end up showing a lesser imitation of the URL you wanted to index, the one that took you hours to write, and therefore position it worse than the good version would have ranked. The consequence: less good-quality traffic.
Poor indexing
Truly, and always bear in mind that I speak from things I have personally suffered, the indexing of your pages can be seriously harmed when the search engine spends its crawl time on duplicate pages. If copied content makes up a substantial part of the site, the search engine will visit the essential pages less often, and if there are other websites, such as your competitors', that it considers more authoritative and free of duplicates, they will always take preference over yours.
Waste of links
This is a very important point: duplicate pages can receive those valuable backlinks we need so much to position our website, diluting the strength of our content, when all those backlinks could (and should) be adding strength to a single page, which is what we really wanted.
Wrong URL recognition
This is the case I mentioned before, and the reason I manually submit my articles so they are indexed within 20 seconds: if we are not careful, the search engine may decide that our content originated on another domain and exclude our pages from the results. It is the last straw, it happens more than you would think, and I read about it every day in one of my Skype student groups. After all that, one wonders…
Is it true that Google penalizes copied content?
Having lived it on my own websites, especially with the arrival of Panda and Penguin, I can tell you yes, it is true: Google rejects copied content. If it is something slight it may not penalize it; in that case it simply filters it so that it does not appear in the results, which is punishment enough. However, sites that systematically copy and/or rewrite other people's content, the more serious cases as I told you before, are indeed penalized. Yet you can also find a website with 100% duplicate content that is not penalized; in that case I recommend you look at the source code, where in the meta tags you can read "noindex". They are telling Google not to index that duplicate content, so the search engine does not index those pages and does not penalize them. The well-known Panda algorithm was designed against the abuse of automated websites copying third-party content. That practice still goes on, but as I said, with noindex or other alternatives. We are experts in automated websites, and when Panda arrived we took very strong penalties; today we have reinvented ourselves.
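If you want to check whether a duplicated page is protected this way, here is a minimal sketch that fetches a URL and looks for a robots meta tag containing noindex. The URL is a placeholder, and the regex assumes the name attribute comes before content; a real audit should parse the HTML properly.

```python
# Minimal sketch: report whether a page's <meta name="robots"> tag
# asks search engines not to index it. The URL is a placeholder.
import re

import requests

def is_noindexed(url: str) -> bool:
    html = requests.get(url, timeout=10).text
    # Assumes name="robots" appears before content="...noindex...";
    # attribute order can vary, so treat this as a rough first check.
    pattern = r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex'
    return re.search(pattern, html, re.IGNORECASE) is not None

print(is_noindexed("https://example.com/duplicated-page"))
```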
Causes of copied or duplicate content
When people think of copied content, the first thing that comes to mind is the image of a spammer landing on your site, copying several articles and republishing them on another domain. Few people know what real webmasters are capable of: extracting an entire website, yes, you read that right, 300 articles in just a couple of hours. I have done it, and I will tell you what I do with that content. The truth is that professionals know how to get the most out of it, and let me make this very clear: we do not generate duplicate content; that is for beginners these days. To put it plainly, I can generate as many as 200 different articles from a single article with none of them duplicated, so multiply: in 2 hours and 10 minutes I can create two thousand unique articles. But beyond all that, the duplicate content most hunted by Google usually comes from your own website, no matter how optimized you think it is.
You may be serving duplicate content without knowing it
Now all you have to do is clear your website of duplicate content. This is part of on-page SEO, and it really is the part most forgotten by most SEOs, or supposed SEOs: the ones who have heard that links are needed to position (which they are, though not in any random order), the ones who have heard something about h2, h3 and strong tags but really have no idea. You audit the website they are trying to position and find hidden text, which can be penalized, or the h1 repeated three times, a negative thing because it deceives the search engine. Since they do not know how to touch, locate or read code, we will now teach you how to check for duplicate content; perhaps another day I will write a semi-advanced on-page SEO article.
Non-mandatory domain
I, for example, have the habit of operating without the www prefix, so I redirect all the URLs generated with www to the version that does not carry it. Look at how your website operates: if it responds both ways, we are generating duplicate content, since search engines treat them as two different domains, or if you prefer, as two (sub)domains, one starting with "www." and one without. The same happens with the trailing slash at the end of the URLs. What's more, from my cPanel or registrar panel I can point one (sub)domain at one IP and the other (sub)domain at another, creating two totally different websites, which I do not recommend; it is better to create natural subdomains. If you find that you have both (sub)domains active, you will have to do as I do and redirect to the good domain. And which is the good domain? I would recommend taking the one with the higher PA (Page Authority). So we can say that the good version is the mandatory one, and not setting it properly causes your site to be repeated in both variations.
Secure pages
This is very similar to the previous point. If your website uses SSL encryption (https), you can end up with an exact copy in the secure version, the one starting with https, if you do not redirect the http version to the https version, generating two (sub)domains as we said before. And if on top of that we have not fixed the www prefix and the trailing slash, calculate the volume of duplicates we generate by not setting the right routes: up to eight versions of every page, that is, seven duplicate URLs for each good one. Amazing, right?
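Here is a minimal sketch that tests all eight protocol/www/slash combinations of a page and prints where each one ends up. If more than one variant answers 200 without redirecting to your chosen version, you are serving duplicates. The domain and path are placeholders.

```python
# Minimal sketch: request the eight protocol/www/trailing-slash variants
# of a page and show the status code and final URL for each one.
from itertools import product

import requests

schemes = ["http", "https"]
hosts = ["example.com", "www.example.com"]
paths = ["/page", "/page/"]

for scheme, host, path in product(schemes, hosts, paths):
    url = f"{scheme}://{host}{path}"
    try:
        resp = requests.get(url, timeout=10, allow_redirects=True)
        print(f"{url} -> {resp.status_code} at {resp.url}")
    except requests.RequestException as exc:
        print(f"{url} -> error: {exc}")
```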
The famous session ID
Many sites handle user sessions by appending a code at the end of each page's URL, mainly websites that work with affiliate webmasters, where these IDs indicate where the sale came from; it would be the name or number of the referring seller. These parameters, different for each user session, may make the search engine think they are separate pages, even though they are really exactly the same.
Known as active content
Some sites add parameters to URLs to monitor the content displayed to the user, what we could call a tracking ID. As with session IDs, search engines may interpret these pages as copies.
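For both of these cases, the fix starts with normalizing URLs before comparing or canonicalizing them. A minimal sketch follows; the parameter names (sessionid, sid, phpsessid, affiliate, utm_*) are examples, so swap in whatever your platform actually appends.

```python
# Minimal sketch: strip session and tracking parameters from a URL so
# all user-specific variants collapse into one canonical address.
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

TRACKING_PARAMS = {"sessionid", "sid", "phpsessid", "affiliate"}

def canonicalize(url: str) -> str:
    parts = urlparse(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith("utm_")
    ]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(canonicalize("https://example.com/product?sid=42&utm_source=feed&color=red"))
# -> https://example.com/product?color=red
```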
The Always Neglected Archives
A disadvantage of weblogs is that exactly the same content appears on different pages, as in category and tag archives, to give an example. For this reason, in WordPress we have Yoast, where we can manage all these problems very well; we use the premium version, a plugin we install for all our WordPress clients for free.
The famous pagination
Any website that uses pagination can have this problem, especially if the pages in the series share exactly the same title and description. There are two simple ways to solve it: either add "page 2", "page 3" and so on to the title and description from page 2 onwards, or give every page from page 2 onwards a simple noindex in its meta tags. A sketch of both fixes follows.
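This is a minimal sketch of the two options just described; the function name and sample texts are illustrative, not any particular CMS's API.

```python
# Minimal sketch: build unique title/description for each page in a
# paginated series, or mark pages after the first as noindex.
def paginated_meta(base_title: str, base_description: str, page: int,
                   noindex_from_page_two: bool = False) -> dict:
    if page <= 1:
        return {"title": base_title, "description": base_description,
                "robots": "index,follow"}
    return {
        "title": f"{base_title} - page {page}",
        "description": f"{base_description} (page {page})",
        "robots": "noindex,follow" if noindex_from_page_two else "index,follow",
    }

print(paginated_meta("Red shoes", "Our catalogue of red shoes", 3))
```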
Mobile version, if you are not responsive
Please note that we do not recommend putting your mobile version on a separate URL; responsive design has come a long way today. Likewise, if you do not have a truly killer product, it is not worth investing in apps. Just put yourself in the customer's place and think whether you would download it yourself. Apps are not extinct yet, but for most cases they soon will be: if to buy a shirt I have 25 responsive websites and yours is an app, do you really think people will download your app? If your product is a bombshell, that is a different story. But back to work: as we were saying, if the mobile version of your pages is served on a separate URL (e.g. a subdomain or a subdirectory) and is not configured properly, the search engine may have trouble recognizing that it is a parallel version.
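One common way separate mobile URLs are tied together is a rel="alternate" link on the desktop page pointing to the mobile version and a rel="canonical" on the mobile page pointing back. Here is a minimal sketch that checks for that pair; the m. subdomain is purely an example, and the rough regex assumes rel comes before href, so a real audit should parse the HTML.

```python
# Minimal sketch: verify a desktop/mobile URL pair declares the
# alternate/canonical link annotations that tie the two together.
import re

import requests

def has_link(url: str, rel: str, target: str) -> bool:
    html = requests.get(url, timeout=10).text
    # Assumes rel="..." appears before href="..."; order can vary.
    pattern = rf'<link[^>]+rel=["\']{rel}["\'][^>]+href=["\']{re.escape(target)}["\']'
    return re.search(pattern, html, re.IGNORECASE) is not None

desktop = "https://www.example.com/page"
mobile = "https://m.example.com/page"
print("desktop declares alternate:", has_link(desktop, "alternate", mobile))
print("mobile declares canonical:", has_link(mobile, "canonical", desktop))
```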
Causes of content duplicated outside our website
We are referring to cases such as social networks and CDN systems, which host your content on a second server, the most common being getting scraped by a third party. Let us delve into the most common reasons, which are:
Syndication or social networks
One of the involuntary plagiarisms you cause yourself without realizing it is sending your content to other places to attract traffic, for example via RSS to Facebook, Twitter, Instagram and so on. The problem arises when they post a full copy of your content instead of a snippet, which is how it should be.
Location TLDs
This practice is not very common, but people with little knowledge of web positioning believe that to rank in a particular country it is best to use its TLD, so to target multiple countries they may have used exactly the same content (or practically the same) across multiple domains, for example a .es and a .mx, .com.ar, .co, etc.
The CDN
Most commonly used by big high-traffic websites, which distribute their content through a content delivery network, or CDN. This involves serving some of the web content (primarily static files) from the network's servers, and it can become a problem if the proper measures are not taken.
What we use: scraping
Scrapers use different kinds of software, robot and non-robot; you do not have to be an expert to copy a page and publish it on another domain. What really happens is that professionals use more advanced techniques. It is not just about scraping a site and leaving it at that: we spin the content until it is totally changed, we look for different images which we download in bulk from the web, we rename them in bulk with the tags we want to position, and neither Google nor the original publisher recognizes what he had written before. We also use plagiarism tools that tell us the percentage of originality of an article. And I will tell you something very clearly: few people know how to apply on-page SEO. Once I have spun the text, modified the photos and applied the SEO, an article of mine reaches 90% originality, so it passes all the Google tests we will talk about now. And get this: many people sell these articles on the net as their own.
Third party plagiarism
Occasionally, someone may copy text from your site and post it on their own site or RSS feed; there are tools for that too, if you do not block your feed or limit it to an excerpt. Most of the time it happens intentionally, though sometimes it does not ("everything on the internet is free and can be used freely", or so they say).
How to notice copied content
Google identifies copied content when it finds identical or very similar titles, descriptions, headings and sections. Think of the robot reading from left to right: if my content is the same as yours from left to right, without any modification, we are looking at fully copied content. As I told you at the beginning of the article, if the shoes are the same brand but a different color and price, it is no longer the same; but if you see the same shoes, same color and same price, store after store, which is very common in the streets of Andorra, where all or most of the shoe stores belong to Àlex boutiques and the clothing stores to Via moda, that would be duplicate content on the net. To avoid it we would do as they do and mix things up, although that alone would not be enough, as you will see.
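To get a feel for that left-to-right comparison, here is a minimal sketch using a sequence ratio between two texts: anything close to 1.0 is effectively the same content. The sample strings stand in for the extracted body text of two URLs.

```python
# Minimal sketch: score how similar two texts are, read left to right,
# the way the article describes the robot comparing content.
from difflib import SequenceMatcher

def similarity(text_a: str, text_b: str) -> float:
    return SequenceMatcher(None, text_a, text_b).ratio()

original = "Red leather shoes, size 42, 59 euros."
copied = "Red leather shoes, size 42, 59 euros."
variant = "Blue leather shoes, size 42, 49 euros."

print(similarity(original, copied))   # 1.0 -> duplicate
print(similarity(original, variant))  # lower -> differentiated content
```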
Therefore, to review the copied content you need to start here. Now let me show you the methods we use: