Duplicate Content

Copied or duplicate content is a very common problem

Content that is repeated, word for word, on more than one URL is what we call duplicate content

When we start on-page SEO work we pay close attention to copied content, which is one of the most widespread problems in search engine positioning and, curiously, one of the least attended to. An example from this Friday: a website belonging to a highly prestigious, well-known company in the country had only 75 URLs, and roughly thirty of them were duplicated, badly. And that is before we even get into the missing H1, title and description tags on some of the articles, which makes it impossible to rank those articles.

We keep optimizing the web, so let's talk about duplicates

People often talk about how to optimize a web page properly for search engines, mainly Google, and how to get quality links, when before any of that the vast majority of websites in this country, including many managed by supposedly prestigious companies in the internet market, have not paid attention to this widespread problem: annoying duplicate content.

In this article I will try to explain everything I consider important about duplicate content

You can find many articles about it on the net, but what really teaches you how to solve it is years of practice fighting it. An example from my early days: I was dealing with two types of duplicate content at once, content generated with the www and content generated without it, plus URLs closed with a trailing slash and URLs without it. All of this generates duplicate content. So if you think this applies to you, let's go a little deeper and I will try to show you how to detect it and, broadly, how to solve the problem.

What it is, where to find it and how to get rid of it.

First it is worth giving you an introduction to what copied content really is, since it is not just the simple copy-and-paste that we all have in mind when we read this heading.

What is copied content

It can be said without any doubt that copied content is any text repeated on more than one URL, whether internal or external. It is what happens when your site produces multiple copies of the same page, when a spammer copies one of your articles, or when you use the CSV feeds of your companies and do not modify them even though you are using a source that other companies also use; in that case reworking them, whether through programming or by hand, is vital.

Many people think that duplicate content is barely relevant, and I have come across more than one case that ended in an argument about it. It is not: quite the opposite, since it is a difficult problem. Any Google user expects to find different, varied, good-quality results, not the same content over and over again. It has its own logic, just like when we go shopping on Meritxell, Carlemany, etc. For this reason all the copies of the same content need to be filtered out so that they do not appear.

Others say the filtering process the search engine applies is transparent; obviously, I do not believe in Google's impartiality. In my case, when I publish a new article I get it indexed immediately, and Google shows it as indexed 22 seconds later. Very few people know how to do this, and this way I make sure I am the first one indexed with an article that is mine. This matters when the duplication is with external websites: always be able to prove that you are the original.

You may not even know that your pages have been marked as duplicates; you can believe your pages are unique and attracting visitors when the reality is very, very different, which is why we have tools to keep it under control. So if your site produces copied content, you lose the ability to appear in the search results and can be penalized, and if the sanction is severe your site can even vanish, something we will now delve into.

Serious Consequences of Copied Content

Now that you are beginning to understand why it is so essential to avoid any duplicate content, whether internal or external, it is a good idea to know what we might call Google penalties and what problems they can cause your website.

Wrong pages

Having different pages for the same content means leaving it up to the search engine to choose the right page. This is not a good idea, as it may choose a different version from the one you want. Often these bad friends are created from .htaccess, but everything has a solution.

Worse visibility

When you create copied content, the search engine may end up showing a lesser imitation of the URL you wanted to index, the one that took you hours to edit, and therefore rank it worse than the good version would have ranked. The consequence is less high-quality traffic.

Poor indexing

Really and without a doubt, and always bear in mind that I am speaking from things I have personally suffered, the indexing of your pages can be seriously harmed by the fact that the search engine spends its time crawling duplicate pages. If copied content makes up an essential part of the site, the search engine will visit the essential pages less often, and if there are other websites, such as those of the competition, that it considers more authoritative and free of duplicates, they will always take preference over yours.

Waste of links

This is a very important point: duplicate pages can receive those valuable backlinks we need so badly to position our website and dilute the strength of our content, when all those backlinks could (and should) be adding strength to a single page, which is what we really wanted.

Wrong URL recognition

This is the case I mentioned before, which is why I manually submit my articles so that they are indexed within about 20 seconds. Otherwise we may find that the search engine decides our content originates from another domain and excludes our pages from the results. It is the last straw; it happens more often than you would think, and I read about it every day in one of my Skype student groups. After all that, one wonders…

Is it true that Google penalizes copied content?

Having lived it on my own websites, especially with the arrival of Panda and Penguin, I can tell you: yes, it is true, Google rejects copied content. If it is something light it may not penalize it; in that case what it does is filter it out so that it does not appear in the results, which is punishment enough. However, sites that systematically copy and/or rewrite other people's content, the most serious cases as I told you before, are indeed penalized. Bear in mind, though, that you can find a website with 100% duplicate content that is not penalized; in that case I recommend you look at the source code of the site, where you will read "noindex" in the meta tags. They are telling Google not to index that duplicate content, so the search engine does not index those pages and does not penalize them. The well-known Panda algorithm was designed for the abuse that existed with automated websites copying third-party content. That practice still goes on, but as I said, with noindex or other alternatives. We are experts in automated websites, and when Panda arrived we took very strong penalties; today we have reinvented ourselves.

Causes of copied or duplicate content

When people think of copied content, the first thing that comes to mind is the image of a spammer landing on your site, copying several articles and republishing them on another domain. Few people know what real webmasters are capable of: extracting an entire website, yes, you read that right, 300 articles in just a couple of hours. I already have them, and I will tell you what I do with that content. The truth is that professionals know how to get the most out of it, and let me make it very clear: we do not generate duplicate content, that is something only beginners do nowadays, hehe. To put it more plainly, I can generate as many as 200 different articles from a single article with none of them duplicated, so multiply: in 2 hours and 10 minutes I have created two thousand unique articles. But beyond what is very common today among professional webmasters like us, the duplicate content Google hunts down most usually comes from within your own website, no matter how optimized you think it is.

You may be serving duplicate content without knowing it

Now all you have to do is clean your website of duplicate content. It is part of on-page SEO, and truly the part most forgotten by most SEOs, or supposed SEOs: the ones who have heard that links need to be built to rank, which they do, but who build them without any order; the ones who have heard about H2s, H3s, strongs and so on but really have no idea. You audit the website they are trying to rank and find hidden text, which can be penalized, or the H1 repeated three times, a bad thing because it tries to fool the search engine, and since they do not know how to touch, locate or read code, nothing gets fixed. We will now teach you how to check for duplicate content; maybe another day I will write a moderately advanced article on on-page SEO.

No preferred domain set

I, for example, have the habit of operating without the www, so I redirect all the URLs generated with www to the version that does not carry them. So look at how your website operates: if it responds both ways you are generating duplicate content, since search engines treat them as two different domains, or if you prefer, as two (sub)domains, one starting with "www." and one without. The same happens with the trailing slash at the end of URLs. In fact, from my cPanel or registrar panel I can point one (sub)domain to a specific IP and the other (sub)domain to another IP, creating two totally different websites, which I do not recommend; it is better to create natural subdomains. In any case, if you find that you have both (sub)domains active you will have to do as I do and redirect to the good domain. And which is the good domain? I would recommend taking the one with the higher PA (Page Authority). So we can say that the good version is the preferred one, and not setting it properly causes your site to be repeated under both variations.
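
Purely as a reference, here is a minimal .htaccess sketch of that kind of redirect, assuming Apache with mod_rewrite enabled and using example.com as a placeholder domain, not a real configuration of ours:

# Send every www request to the bare domain with a permanent 301
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]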

Secure pages

This is very similar to what we discussed in the previous point. If your website uses SSL encryption (https), you can end up with an exact copy in the secure version, the one that starts with https, so if you do not redirect the http version to the https version you generate two (sub)domains, just as before. And if on top of that we have not settled the www question and the trailing slash on URLs, calculate the volume of duplicates we generate by not defining the right routes: we are talking about 6 duplicate URLs for each page, and that is in the best of cases. Amazing, right?
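
Again only as a sketch, assuming Apache with mod_rewrite, a redirect from http to https could look like this (placeholder rules, adapt them to your own setup):

# Redirect any request that did not arrive over HTTPS to the https version
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]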

The famous session ID

Many sites handle user sessions by appending a code to the end of the URL of each page, mainly websites that work with affiliate webmasters, where these IDs indicate where the sale is coming from; it would be the name or number of the referring seller. These parameters, different for each user session, can make the search engine think they are separate pages, even though they are really exactly the same.

What is known as active content

There are sites that assign parameters to URLs to track the content displayed to the user, what we could call a tracking ID. As with session IDs, search engines may interpret these pages as copies.
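
As a purely hypothetical illustration (made-up domain and parameter names), these three URLs would all return the same page, yet look like three different pages to a crawler:

https://example.com/shoes
https://example.com/shoes?sessionid=8f3a2b
https://example.com/shoes?trackid=partner-27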

The always neglected archives

A disadvantage of weblogs is that exactly the same content appears on different pages, as in category and tag archives, to give an example. For this reason in WordPress we have Yoast, where we can manage all these drawbacks very well; we use the premium version, a plugin we install for all our WordPress clients for free.

The famous pagination

Any website that uses pagination can have this problem, especially if the pages in the series share exactly the same title and description. There are two simple ways to solve it: either add "page 2", "page 3" and so on inside the title and description from page 2 onwards, or add a simple noindex to the meta tags from page 2 onwards.
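
A minimal sketch of both options, with placeholder text, as they would appear in the head of page 2 of a series:

<title>Blue shoes - Page 2 | Example Store</title>
<meta name="description" content="Blue shoes catalogue, page 2 of the listing.">

or, if you prefer the second option:

<meta name="robots" content="noindex,follow">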

Mobile version if you are not responsive

Please note that we do not recommend keeping your mobile version on a separate URL; responsive design has come a long way today. And unless you have a really killer product, it is not worth investing in apps either. All you have to do is put yourself in the customer's place and think whether you would download it yourself. Apps are not extinct yet, but it will not take long, really: if to buy a shirt I have 25 responsive websites and yours is an app, do you think people will download your app? Now, if your product is a bomb, that is something different. But back to work. As we were saying, if the mobile version of your pages is served on a separate URL (e.g. a subdomain or a subdirectory) and is not configured properly, the search engine may have trouble recognizing it as a parallel version.
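
For reference only, this is a sketch of what "configured properly" usually means for separate mobile URLs, with placeholder domains: the desktop page declares the mobile alternate, and the mobile page points its canonical back at the desktop page.

On the desktop page (https://www.example.com/page):
<link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.example.com/page">

On the mobile page (https://m.example.com/page):
<link rel="canonical" href="https://www.example.com/page">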

Causes of content duplicated outside our website

We are referring to cases such as social networks or CDN systems, which host your content on a second server, the most common case being getting scraped by a third party. Let's dig into the most common reasons, which are:

Syndication or social networks

One of the involuntary forms of plagiarism you cause yourself without realizing it: sending your content to other places to attract traffic, for example via RSS to Facebook, Twitter, Instagram, etc. The problem arises when they publish a full copy of your content instead of a snippet, which is how it should be.

Location TLD

This practice is not very common, but people with little knowledge of web positioning believe that to rank in a particular country it is best to use its TLD, so to target multiple countries they may have used exactly the same content (or practically the same) on multiple domains, for example on a .es, a .mx, a .com.ar, a .co, etc.

The CDN

Most commonly used by big, high-traffic websites, which spread their content across a content delivery network (CDN); this involves serving some of the web content (primarily static files) from the network's servers. This can be a problem if the proper measures are not taken.

Scraping, which we use ourselves

Scrapers use different kinds of software, robot and non-robot; you do not have to be a genius to copy a page and publish it on another domain. What really happens is that professionals use more advanced techniques. It is not just about scraping a website and that's it: we spin the content so that it changes completely, we look for different images, which we download in bulk from the web, we rename them in bulk with the tags we want to rank, and neither Google nor the original publisher recognizes what they had written before. We also use plagiarism tools that give us the percentage of originality of the article. And I will tell you something very clearly: few people know how to apply SEO on top of this. Once I have spun the text, modified the photos and applied the SEO, an article of mine reaches 90% originality, so it passes all the Google tests we will talk about now. And don't miss this: many people sell these articles on the net as their own.

Third party plagiarism

Occasionally, someone may copy text from your site and publish it on their own site or RSS feed; there are tools for that too, if you do not block your feed or limit it to an excerpt. Most of the time it happens intentionally, though sometimes it does not (the mentality that everything on the internet is completely free and can be used freely), or so they say.

How to notice copied content

Google identifies copied content when it finds identical or very similar titles, descriptions, headings and sections. Think of the robot reading from left to right: if my content is the same as yours from left to right, without any modification, we are looking at fully copied content. Now, as I told you at the beginning of the article, if the shoes are the same brand but a different colour and price, it is no longer the same; but if you see the same shoes, same colour and same price, store after store, which is very common in the streets of Andorra, where all or most of the shoe shops belong to Àlex boutiques and the clothing stores to Via Moda, that would be duplicate content on the net. To avoid it we would do as they do and mix things up, although even that would not be enough, you see.

Therefore, to review your copied content you have to start right here. These are the methods we use:

The famous Google Search Console, the old Webmaster Tools

If you have signed up for Google Search Console, which is a great way to keep your website under control, this is definitely the best place to start. Go to Search Appearance, then HTML Improvements, and pay attention to duplicate title tags and meta descriptions. The report tells you how many duplicates there are and exactly which pages they were found on so that you can correct them. Analyze them all to see whether the problem is the same one, and whether it is one of the cases we talked about before.

Site footprint

In future articles I will explain what footprints are and how to use them to refine your Google searches. For now, go to google.ad and type into the search box site:www.elmeudomini.com. It is a very effective procedure but it requires a lot of work. It consists of searching your site for certain keywords or key phrases, for example products in the case of an online store, in which case we could do a search like this: site:www.elmeudomini.com intext:"the keyword" or inurl:"the-title-of-the-product". With intext we say that the term has to appear within the text of the page; with inurl, that it has to appear in the URL; and we use the quotation marks to be precise and avoid an overly broad search. In the results you can see whether there are pages with duplicate titles and descriptions. This procedure also lets you know whether certain pages have been moved to the secondary index, when you see a message on the last page of results ("repeat the search and include the omitted results"). That is a symptom of copied content.

Screaming Frog

This powerful tool lets you crawl your site looking for duplicate content, among many other things such as broken links or links to unknown third parties. The tabs you are interested in are URI, Page Titles, Meta Description and H1, using the Duplicate filter. As for how to use it, later on we will do a tutorial to get the most out of it and how to install it for free; it is available for both Mac and PC. Once installed, we enter the key and it works perfectly. We go to Configuration, Spider, and set it up according to our needs; if we do not have any spam blocker on the website we do not need to set a proxy. Once this is done, we choose Spider mode, which lets us enter our own URL or someone else's; SERP or List mode is for analyzing hundreds of websites at once, a very useful feature for buying expired domains, I will just leave that there. But for keeping an eye on duplicate content, or missing H1 tags, titles or descriptions, or even for doing an SEO audit, the Spider section is more than enough, and you can also download a CSV with the results.

Google Analytics

You can also locate duplicate pages in Analytics through the Behavior, Site Content, Landing Pages report. The key is to look for suspicious URLs and pages that receive less search engine traffic than they should.

External web analytics tools

There are tools that can identify copied content, as well as broken backlinks, non-indexed pages and other problems that are very difficult to spot with the naked eye. The most powerful example of all is Ahrefs, which has recently raised its prices and no longer lets us work as massively as we were used to; but we also have Sistrix, which together with Ahrefs is a super powerful weapon for good SEOs, and then others such as Majestic and Semrush. And here is where you can really tell a good SEO: whether they use these tools and how they use them.

Detecting copies outside your website

If the copies are outside your website, you can check for copied content with the tools below. As I have told you more than once, we are webmasters, so we know all the tricks when it comes to copying entire websites, and we are the first ones who do not want it to be noticed that content is copied or duplicated. Not only because Google can penalize the website (we are also experts at recovering websites from penalties), but because if we work with AdSense they can close our account, and we do not like that; we have already had more than one closed because of the famous black hat AdSense tricks, and they block every website related to that account, so everything has to be redirected, and nobody wants to waste that much time. To avoid the duplication issue, and to know the percentage of originality of our rewritten articles in the first place, we can use tools such as the following.

Plagiarism Checker, the one we use most: it allows up to two thousand free queries or paragraphs per month, and it has a WordPress plugin that lets you analyze a post once it is finished; this saves you the tedious copy-paste, or having to strip the text of photos, strongs, H2s, etc. Highly recommended by MAWEBMASTERS.

CopyScape: it works by URL, so you have to publish first. It is pretty good but it does not convince me; there is a free version as well as Premium.

Duplichecker: in this case you paste the text to analyze, hit the search button and it returns the results, telling you the percentage of originality of the article. Other plagiarism checkers work just like Duplichecker.

So if you have WordPress, all you have to do is install the Plagiarism Checker plugin mentioned above and save yourself the hassle.

How to get rid of copied content

Clearly, search engines do not like copied content because it leads to a poor user experience. So if you have copied content on your site, you have to do everything possible to remove it, and don't be afraid: you do not have to delete the whole article!! Here are the main options for dealing with the problem:

Use rel=canonical for internal duplicates

The label "rel = canonical" was invented exactly to deal with this problem, which is the best solution. It consists of a line of code in the section of the page that indicates to the search engine which version is the right one (the mandatory one). It can also be included in the HTTP header of the page. Example This appears in the header of our website.

We will create 301 redirects

It is recommended when you cannot use the canonical tag, when moving content from one page to another, or when you set the preferred domain. 301 redirects are directives included in the .htaccess file in Apache, and they can also be created with PHP. This is an example of a 301 from the version without www to the version with www; also keep in mind that in the coming days we will talk about the strength of 301s and I will upload many more examples.

RewriteCond %{HTTP_HOST} ^madwebmasters\.com$ [NC]
RewriteRule ^(.*)$ http://www.madwebmasters.com/$1 [R=301,L]
RewriteRule ^index\.php$ - [L]

Deny access to robots

To prevent search engines from finding duplicate pages, you can use the robots noindex meta tag or the robots.txt file, where we can tell Google not to index certain content. For example, if we put:

User-agent: *
Disallow: /no-index/
Disallow: /author/

the URLs under madwebmasters.com/no-index/ and the Google author pages would be skipped. But if we use robots "noindex,follow", for example, we are telling it not to index the page but to follow the links it contains, which is the recommended option.
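
As a sketch, that meta tag, placed in the head of the duplicate page, would look like this:

<meta name="robots" content="noindex,follow">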

Manage URL parameters in Google Search Console

If the copied content is caused by URL parameters, you can tell Google which ones to ignore through Crawl, URL Parameters in Google Search Console. This is very useful, especially for the session IDs we talked about earlier: we could ask Google not to index URLs that carry, for example, an id= parameter, using the Add parameter button, so that only the good URLs get read.

Use Schema.org

The search engine can use structured data to resolve confusion between duplicate pages. Grouping pages or rewriting content are both sensible solutions when multiple pages on your site display very similar or identical content.
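
Purely as an illustrative sketch, with placeholder values, structured data on an article page might look like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Duplicate content and how to avoid it",
  "mainEntityOfPage": "https://www.example.com/duplicate-content/"
}
</script>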

Solutions when the copy is outside our website

If your content has been copied by a third party, things change, but don't worry, everything has a solution. I have already told you that we are experts, and we are also experts at getting out of penalties. These are the options:

Ask them to delete it

Send a message via email or contact form to politely request that they delete it. If there is no way to get in touch, use the WHOIS registration email, which you can get through the Whois.net tool, although it may be hidden. Also ask them, if they are not willing to get rid of the content, to at least include a link to the page they copied and a noindex in the head, as we discussed before; failing that, the best thing is to add it on our own side while the problem lasts. This will help the search engine identify the original source. The problem is that if there are many external duplicates, you can spend the whole day sending emails.

Request deletion from Google

If communication does not work, you can ask the search engine to remove the infringing page from its results by submitting a petition under the American Digital Millennium Copyright Act (DMCA). If you deal with many webmasters this can be done in bulk, and in those cases the results come very quickly. However much we use third-party content, it cannot be used in its entirety: you have to add your personal touch and you have to modify it; third-party content has to be used only as a basis, keep that in mind. Starting from zero is very complicated, and the truth is that not all of us are publishers or journalists. The biggest problem is the automated tools we use for positioning, with which we generate content for thousands of sites, mainly for our tier 2 and tier 3 networks, but we will talk about that another day.

Help improve plagiarism detection

Along with the above, you can submit your case to Google to be used as an example for improving its algorithms. Please note that it will not be treated as a spam or copyright report.

General tips for preventing copied content from appearing

Never use exactly the same title/description on more than one page. The text on each page must be unique, both within the site and across the whole web. Include only the preferred version of each page in your sitemaps. When you quote from another site, always and under all circumstances include a backlink to the original. When you are going to copy an entire page, ask permission first, include a link to the source, and deny the search engine access to the page via robots (you can also use cross-domain rel=canonical). Having trouble with copied content on your site? Here I am for any queries, feel free to ask, but don't spam your website at me, I am also an expert at that, hehehehe.
