The Duplicate Content Question


Please note that affiliate links may be included in some posts.

It raises the hackles of every SEO: "duplicate content".

If you publish content online, you're probably aware that Google frowns upon duplicate content.

According to the search giant, duplicate content refers to "substantive blocks of content within or across domains that either completely match other content or are appreciably similar." (Source)

The consequences could include having the duplicate pages filtered from search results or even having offending sites deindexed from Google Search.

I have special cause to be leery of duplicate content because I've been penalized for it in the past.

So my interest was piqued by a recent Lion Zeal podcast where Kyle Roof discussed his intriguing empirical insights into the duplicate content question.

I'll discuss his insights in a bit, but first I'll share my own story of how I was smacked down with a duplicate content penalty several years ago.  

About 6 months after I started in internet marketing in 2014, my fleet of junky affiliate sites was deindexed for thin and duplicate content.

I learned a painful lesson early on that helped me 'course correct'. 

The Backstory 

When I first started building niche sites in 2014, I was using a product called Kontent Machine, The Best Spinner (article-spinning software), and a WordPress plugin called WooZone to create hundreds of product pages and spun posts.


Within my sites, the spun content was very similar from page to page, and the product pages WooZone was cloning were completely identical to the Amazon originals. Looking back, I really should have known better.

But I was just screwing around on the internet.

I was buying exact match domains and spending agonizing hours spinning words, sentences, paragraphs, and images and then using Kontent Machine to upload it to a site with specific keywords, downloaded from LongTailPro, dynamically inserted into the content. 

I think I came to Google's attention after a particularly ambitious WooZone session where I imported over a hundred "seat cushion" product pages for my seat cushion review website.

That's when the hammer came down and I received a manual penalty for thin or duplicate content on all of my sites. I was deindexed.

SCR logo

Probably the best-looking logo I've ever designed

It was pretty demoralizing. I was making ~$150 a month (peanuts compared to now) at that point and getting a couple hundred visitors a month across all of my sites- nothing too impressive. 

While dispirited, I had enough perspective to realize that it was a valuable learning opportunity. Plus I was too obsessed with digital marketing and niche site building to quit.

I'm not a moralistic white hat SEO. At this point I'm more 'practical' than anything else. I prefer building sustainable, 'no risk' websites rather than using Private Blog Networks or grayer-hat strategies to rank, though I'm still fascinated by those techniques.

What’s The Point?

After my sites were penalized I became super paranoid about duplicate content on my sites. I knew that duplicating a product page was obviously "duplicate content", but what about 7 words of content in a row? 10 words? 13 words?

What exactly is the threshold?

I had no idea.

For example, searching Google for the exact phrase "When I first started building niche sites" returns 7 exact matches. And you can see at the bottom of the page that Google has filtered out some other exact matches from the SERPs.

[Screenshot: Google Search results for "When I first started building niche sites"]

Does this mean that, because I use this exact-match phrase in the first few paragraphs of my post, the entire post would suffer a ranking penalty or be filtered from the SERPs altogether?

What’s The Duplicate Content Threshold?

No one in the SEO community really knows what threshold Google uses to determine duplicate content because its algorithm isn't public knowledge.

However, Kyle Roof, who appeared on the Lion Zeal podcast, has some unique insight.

His SEO agency, High Voltage SEO, specializes in running statistical tests to empirically analyze some of these confounding questions about Google's ranking algorithm.

One of his insights is that the duplicate content filter is actually "binary". The video below will start at the relevant section where Kyle Roof discusses this.


"So Google talks about unique content. Google just wants unique content, fresh content, you know, we want the most unique thing that's going on the web.

So my thought process as soon as you read that it's like well how much unique is unique? And so we tested that and you may have seen it, have you ever seen a little blue line where they filter out these results because we think they're the same?

So you can get past that filter, the duplicate content filter, with 51% unique content. 

So that's what we tested and figured out your page only needs to be 51% to get past that threshold for Google's uniqueness which is a pretty low bar.

So when Google is talking about like 'hey, we need fresh new things', yeah, they need 51% unique."

The Takeaway

The takeaway is that, as of this writing, October 2017, a page is either considered duplicate or not. This has important implications.

I had blindly assumed that Google was assigning graded scores to content, analyzing duplication within and across sites, and that your SERP rank was either helped or harmed depending on whether it found 0%, 10%, 15%, 25%, etc. duplication.

Apparently, Google's algorithm for monitoring duplicate content in the SERPs is a lot less advanced than that. The decision to filter your content from search is binary: it comes down to whether the page is at least 51% unique or not.
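To make the 51% figure a bit more concrete, here is a rough Python sketch of one way to estimate how unique a page is compared with another page. Kyle doesn't say how his tests measured uniqueness, and Google's actual method isn't public, so the five-word "shingles", the function names, and the comparison itself are all my own assumptions, not anything Google or High Voltage SEO has confirmed.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def unique_fraction(page, other, k=5):
    """Fraction of the page's shingles that do NOT appear in the other text.

    Assumption: shingle overlap is only a stand-in for whatever Google
    really measures. Returns 0.0 if the page is too short to form a shingle.
    """
    page_shingles = shingles(page, k)
    if not page_shingles:
        return 0.0
    unique = page_shingles - shingles(other, k)
    return len(unique) / len(page_shingles)


def passes_threshold(page, other, threshold=0.51, k=5):
    """True if the page is at least `threshold` unique (51% per Kyle Roof's test)."""
    return unique_fraction(page, other, k) >= threshold


if __name__ == "__main__":
    mine = "When I first started building niche sites I made a lot of mistakes with spun content"
    theirs = "When I first started building niche sites I had no idea what I was doing"
    print(round(unique_fraction(mine, theirs), 2), passes_threshold(mine, theirs))
```

Shorter shingles (a smaller `k`) flag more overlap and longer ones flag less, so treat the number as a ballpark for comparing your own drafts, not as a prediction of what Google will do.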

For me, this is something of a relief. Although I only publish unique content on my various money-making properties, I'm less paranoid now about rewritten or marginally similar content triggering a death-blow from Google.

It's okay, for example, to block quote a section of text if it enhances your content- as long as that content isn't more than 49% of the total. 
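As a quick worked example of that 49% rule of thumb, the arithmetic looks like this in Python. The function name is hypothetical, and counting by words is my assumption, since nothing in the podcast says whether uniqueness is measured in words, characters, or something else.

```python
def max_quoted_words(original_words, max_quote_percent=49):
    """Largest number of quoted words that keeps quotes at or below
    max_quote_percent of the page's total word count.

    Assumption: the share is measured by simple word counts.
    """
    if original_words < 0:
        raise ValueError("original_words must be non-negative")
    if not 0 <= max_quote_percent < 100:
        raise ValueError("max_quote_percent must be between 0 and 99")
    # quoted / (original + quoted) <= p / 100
    #   =>  quoted <= p * original / (100 - p)
    # Integer division rounds down, so the result never exceeds the limit.
    return (max_quote_percent * original_words) // (100 - max_quote_percent)


if __name__ == "__main__":
    print(max_quoted_words(1000))
```

So if you write 1,000 words of your own, this works out to at most 960 quoted words before the quotes pass 49% of the page, which is far more quoting than most posts would ever need.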

Keep In Mind

This isn't to say that Google won't release an update in the future that changes how it analyzes redundant content in the SERPs. So I wouldn't go wild, for example, producing 51% unique content because it halves your content expenses. 

Also, you're not immune to manual reviews. The SERP filtering is automated, but if you are reproducing lots of exact-match duplicate content on your site, there's still a chance you'll end up in Google's crosshairs.

I'm not a technical SEO- there are a lot of people (check out Josh Bachynski's YouTube channel) with a more advanced, technical understanding of the search engines. This is just my perspective as a content & affiliate marketer making a living online. 

Let me know your thoughts in the comments section below!

Last Updated on November 15, 2023 by j3teq

Ryan Nelson
​Ryan Nelson is a NYC-based Industrial-Organizational Psychologist and a full-stack online marketer. He created to help people discover and build profitable, content-focused online businesses.


