Posted by Jeff_Baker
Grab yourself a cup of coffee (or two) and buckle up, because we’re doing maths today.
Back it on up...
A quick refresher from last time: I pulled data from 50 keyword-targeted articles written on Brafton’s blog between January and June of 2018.
We wrote these articles using a technique published earlier on Moz, one that generates some seriously awesome results (we’re talking more than doubling our organic traffic in the last six months, but we will get to that in another publication).
We pulled this data again… Only I updated and reran all the data manually, doubling the dataset. No APIs. My brain is Swiss cheese.
We wanted to see how newly written, original content performs over time, and which factors may have impacted that performance.
Why do this the hard way, dude?
“Why not just pull hundreds (or thousands!) of data points from search results to broaden your dataset?” you might be thinking. It’s been done successfully quite a few times!
Trust me, I was thinking the same thing while weeping tears into my keyboard.
The answer was simple: I wanted to do something different from the massive aggregate studies. I wanted a level of control over as many potentially influential variables as possible.
By using our own data, the study benefited from:
- The same root Domain Authority across all content.
- Similar individual URL link profiles (some laughs on that later).
- Known original publish dates, with no reoptimization efforts or tinkering.
- Known original keyword targets for each blog (rather than guessing).
- Known and consistent content depth/quality scores (MarketMuse).
- Similar content writing techniques for targeting specific keywords for each blog.
You will never eliminate the possibility of misinterpreting correlation as causation. But controlling some of the variables can help.
As Rand once said in a Whiteboard Friday, “Correlation does not imply causation (but it sure is a hint).”
What we gained in control, we lost in sample size. A sample size of 96 is much less useful than ten thousand, or a hundred thousand. So look at the data carefully and use discretion when considering the ranking factors you find most likely to be true.
This resource can help gauge the confidence you should put into each Pearson Correlation value. Generally, the stronger the relationship, the smaller the sample size needed to be confident in the results.
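One way to put a rough number on that confidence is an approximate confidence interval built with the Fisher z-transform. This is a minimal sketch, not the method used in this study: the function name is made up for illustration, and pairing the .447 value reported later in this post with a sample size of 96 is an assumption made only to have something to plug in.

```python
import math
from statistics import NormalDist


def pearson_confidence_interval(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r (Fisher z-transform)."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)                      # transform r to the z scale
    standard_error = 1.0 / math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low = math.tanh(z - critical * standard_error)
    high = math.tanh(z + critical * standard_error)
    return low, high


# Assumed inputs for illustration: r = .447 with n = 96.
low, high = pearson_confidence_interval(0.447, 96)
print(f"95% interval for r: {low:.2f} to {high:.2f}")

# The same r with a much larger (hypothetical) sample gives a tighter interval.
big_low, big_high = pearson_confidence_interval(0.447, 10000)
print(f"with n = 10000: {big_low:.2f} to {big_high:.2f}")
```

The interval narrows as the sample grows, which is the trade-off described above: the same correlation value deserves more trust at ten thousand data points than at 96.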
So what exactly have you done here?
We have generated hints at what may influence the organic performance of newly created content. No more, and no less. But they are indeed interesting hints and maybe worth further discussion or research.
What have you not done?
We have not published sweeping generalizations about Google’s algorithm. This post should not be read as a definitive guide to Google’s algorithm, nor should you assume that your site will demonstrate the same correlations.
So what should I do with this data?
The best way to read this article is to look at the potential correlations we observed in our data and consider how they may or may not apply to your content and strategy.
I’m hoping that this study takes a new approach to studying individual URLs and stimulates constructive debate and conversation.
Your constructive criticism is welcome, and hopefully pushes these conversations forward!
The stat sheet
So quit jabbering and show me the goods, you say? Alright, let’s start with our stats sheet, formatted like a baseball card, because why not:
And as always, here is the original data set if you care to reproduce my results.
So now the part you have been waiting for...
1. Time and performance
I started with a question: “Do blogs age like a Macallan 18 served up neat on a warm summer Friday afternoon, or like tepid milk on a hot summer Tuesday?”
Does the time indexed play a role in how a piece of content performs?
Correlation 1: Time and target keyword position
First we will map each target keyword’s ranking position against the number of days its corresponding blog has been indexed. Visually, if there is any correlation, we will see some sort of negative or positive linear relationship.
There is a clear negative relationship between the two variables, which means the two variables may be related. But we need to go beyond visuals and use the Pearson correlation coefficient (PCC).
The data shows a moderate relationship between how long a blog has been indexed and the positional ranking of the target keyword.
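For anyone who wants to run the same kind of check on their own content, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The numbers below are made up for illustration and are not from our dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical data: days indexed and target keyword position per article.
days_indexed = [30, 60, 90, 120, 150, 180]
positions = [48, 35, 22, 15, 9, 4]

r = pearson(days_indexed, positions)
print(f"PCC: {r:.3f}")
```

A negative value here means that older articles tend to have a lower position number, which is the better ranking.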
But before getting carried away, we shouldn’t solely trust one statistical method and call it a day. Let’s take a look at things another way: Let’s compare the average age of articles whose target keywords rank in the top ten against the average age of articles whose target keywords rank outside the top ten.
Now a story is starting to become clear: Our newly written content takes a significant amount of time to fully mature.
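The top-ten comparison above boils down to splitting articles into two groups and averaging their ages. A minimal sketch, assuming each article is a dict with hypothetical `days_indexed` and `position` fields (the sample rows are invented):

```python
from statistics import mean


def average_age_by_top_ten(articles):
    """Return (average age of top-ten articles, average age of the rest)."""
    top_ten = [a["days_indexed"] for a in articles if a["position"] <= 10]
    the_rest = [a["days_indexed"] for a in articles if a["position"] > 10]
    # An empty group has no average, so report None instead of failing.
    return (
        mean(top_ten) if top_ten else None,
        mean(the_rest) if the_rest else None,
    )


# Hypothetical articles, not from the study's dataset.
sample_articles = [
    {"days_indexed": 150, "position": 3},
    {"days_indexed": 130, "position": 8},
    {"days_indexed": 60, "position": 25},
    {"days_indexed": 40, "position": 41},
]

top_age, rest_age = average_age_by_top_ten(sample_articles)
print(top_age, rest_age)
```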
But for the sake of exhausting this hint, let’s look at the data one final way. We will group the data into buckets of target keyword positions, and days indexed, then apply them to a heatmap.
This should show us a clear visual clustering of how articles perform over time.
This chart, quite literally, paints a picture. According to the data, we shouldn’t expect a new article to realize its full potential until at least 100 days, and likely longer. As a blog post ages, it appears to gain more favorable target keyword positioning.
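If you want to build a similar heatmap, the bucketing step can be sketched like this. The bucket edges below are assumptions chosen for illustration, not the ones used for the chart, and the sample rows are invented.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical upper bounds for each bucket; the last bucket is open-ended.
DAYS_UPPER_BOUNDS = [50, 100, 150]      # 0-50, 51-100, 101-150, 151+
POSITION_UPPER_BOUNDS = [10, 20, 50]    # 1-10, 11-20, 21-50, 51+


def bucket_index(value, upper_bounds):
    """Index of the first bucket whose upper bound is >= value."""
    return bisect_left(upper_bounds, value)


def heatmap_counts(rows):
    """Count articles per (days bucket, position bucket) cell."""
    counts = Counter()
    for days_indexed, position in rows:
        cell = (
            bucket_index(days_indexed, DAYS_UPPER_BOUNDS),
            bucket_index(position, POSITION_UPPER_BOUNDS),
        )
        counts[cell] += 1
    return counts


# Hypothetical (days indexed, target keyword position) pairs.
sample_rows = [(30, 45), (45, 60), (120, 8), (160, 3), (110, 10), (90, 15)]
cells = heatmap_counts(sample_rows)
for cell, count in sorted(cells.items()):
    print(cell, count)
```

Each cell count can then be colored by intensity in a spreadsheet or charting tool to get the clustering picture.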
Correlation 2: Time and total ranking keywords on URL
You’ll find that when you write an article it will (hopefully) rank for the keyword you target. But oftentimes it will also rank for other keywords. Some of these are variants of the target keyword, some are tangentially related, and some are purely random noise.
Instinct will tell you that you want your articles to rank for as many keywords as possible (ideally variants and tangentially related keywords).
Predictably, we have found that the relationship between the number of keywords an article ranks for and its estimated monthly organic traffic (per SEMrush) is strong (.447).
We want all of our articles to do things like this:
We want lots of variants each with significant search volume. But, does an article increase the total number of keywords it ranks for over time? Let’s take a look.
Visually this graph looks a little murky due to the existence of two clear outliers on the far right. We will first run the analysis with the outliers, and again without. With the outliers, we observe the following:
There appears to be a relationship between the two variables, but it isn’t as strong. Let’s see what happens when we remove those two outliers:
Visually, the relationship looks stronger. Let’s look at the PCC:
The relationship appears to be much stronger with the two outliers removed.
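Running the correlation with and without outliers can be sketched as follows. In this post the two outliers were picked by eye from the scatterplot; the `drop_largest` helper below is a hypothetical stand-in that simply drops the k largest values on one axis, and the data is invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def drop_largest(pairs, k, axis=1):
    """Drop the k pairs with the largest value on one axis (0 = x, 1 = y)."""
    if k <= 0:
        return list(pairs)
    return sorted(pairs, key=lambda pair: pair[axis])[:-k]


def correlation(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return pearson(xs, ys)


# Hypothetical (days indexed, ranking keywords) pairs with two extreme values.
pairs = [(10, 5), (20, 10), (30, 15), (40, 20), (50, 25), (60, 30),
         (15, 200), (25, 180)]

trimmed = drop_largest(pairs, k=2, axis=1)
r_with = correlation(pairs)
r_without = correlation(trimmed)
print(f"with outliers: {r_with:.3f}, without: {r_without:.3f}")
```

Whatever rule you use, report both numbers, as we do here, so readers can judge how much the outliers were driving the result.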
But again, let’s look at things another way.
Let’s look at the average age of the top 25% of articles and compare them to the average age of the bottom 25% of articles:
This is exactly why we look at data multiple ways! The top 25% of blog posts with the most ranking keywords have been indexed an average of 149 days, while the bottom 25% have been indexed 74 days — roughly half.
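The quartile comparison can be sketched in a few lines. The field names and sample rows below are hypothetical; only the approach (rank by ranking keywords, then average the age of the top and bottom quarter) mirrors what we did.

```python
from statistics import mean


def quartile_average_ages(articles):
    """Average days indexed for the top and bottom 25% by ranking keywords."""
    ranked = sorted(articles, key=lambda a: a["ranking_keywords"], reverse=True)
    quarter = max(1, len(ranked) // 4)   # at least one article per group
    top = [a["days_indexed"] for a in ranked[:quarter]]
    bottom = [a["days_indexed"] for a in ranked[-quarter:]]
    return mean(top), mean(bottom)


# Hypothetical articles, not from the study's dataset.
sample_articles = [
    {"ranking_keywords": 120, "days_indexed": 160},
    {"ranking_keywords": 95, "days_indexed": 140},
    {"ranking_keywords": 60, "days_indexed": 110},
    {"ranking_keywords": 40, "days_indexed": 100},
    {"ranking_keywords": 30, "days_indexed": 90},
    {"ranking_keywords": 22, "days_indexed": 80},
    {"ranking_keywords": 10, "days_indexed": 70},
    {"ranking_keywords": 5, "days_indexed": 50},
]

top_age, bottom_age = quartile_average_ages(sample_articles)
print(top_age, bottom_age)
```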
To be fully sure, let’s again cluster the data into a heatmap to observe where performance falls on the time continuum:
We see a very similar pattern as in our previous analysis: a clustering of top-performing blogs starting at around 100 days.
Time and performance assumptions
You still with me? Good, because we are saying something BIG here. In our observation, it takes between 3 and 5 months for new content to perform in organic search. Or at the very least, mature.
To look at this one final way, I’ve created a scatterplot of only the top 25% of highest performing blogs and compared them to their time indexed:
There are 48 data points on this chart. The blue points represent the top 25% of articles in terms of strongest target keyword ranking position. The orange points represent the top 25% of articles with the highest number of keyword rankings on their URL. (These can be, and some are, the same URL.)
Looking at the data a little more closely, we see the following:
90% of the top 25% of highest-performing content took at least 100 days to mature, and only two articles took less than 75 days.
Time and performance conclusion
For those of you just starting a content marketing program, remember that you may not see the full organic potential for your first piece of content until month 3 at the earliest. And, it takes at least a couple months of content production to make a true impact, so you really should wait a minimum of 6 months to look for any sort of results.
In conclusion, we expect new content to take at least 100 days to fully mature.
2. Links
But wait, some of you may be saying. What about links, buddy? Articles build links over time, too!
It stands to reason that a blog will gain links (and ranking potential) over time. Links matter, and higher-positioned rankings gain links at a faster rate. Thus, we are at risk of mistaking correlation for causation if we don’t look at this carefully.
But what none of you know, that I know, is that being the terrible SEO that I am, I had no linking strategy with this campaign.
And I mean zero strategy. The average article generated 1.3 links from .5 linking domains.
The one thing consistent across all the articles was a shocking and embarrassing lack of inbound links. This is demonstrated by an insignificant correlation coefficient of -.022. The same goes for the total number of links per URL, with a correlation coefficient of -.029.
These articles appear to have performed primarily on their content rather than inbound links.
(And they certainly would have performed much better with a strong, or any, linking strategy. Nobody is arguing the value of links here.) But mostly...
Shame on me.
Shame. Shame. Shame.
But on a positive note, we were able to generate a more controlled experiment on the effects of time and blog performance. So, don’t fire me just yet?
Note: It would be interesting to pull link quality metrics into the discussion (for the precious few links we did earn) rather than total volume. However, after a cursory look at the data, nothing stood out as being significant.
3. Word count
Content marketers and SEOs love talking about word count. And for good reason. When we collectively agreed that “quality content” was the key to rankings, it would stand to reason that longer content would be more comprehensive, and thus do a better job of satisfying searcher intent. So let’s test that theory.
Correlation 1: Target keyword position versus total word count
Will longer articles increase the likelihood of ranking for the keyword you are targeting?
Not in our case. To be sure, let’s run a similar analysis as before.
The data shows no impact on rankings based on the length of our articles.
Correlation 2: Total keywords ranking on URL versus word count
One would think that longer content would result in additional ranking keywords, right? Even by accident, you would think that the more related topics you discuss in an article, the more keywords you will rank for. Let’s see if that’s true:
Not in this case.
Word count, speculative tangent
So how can it be that so many studies demonstrate higher word counts result in more favorable rankings? Some reconciliation is in order, so allow me to speculate on what I think may be happening in these studies.
- Most likely: Measurement techniques. These studies generally look at one factor relative to rankings: average absolute word count based on position. (And, there actually isn’t much of a difference in average word count between position one and ten.)
- Likely: High quality content is longer, by nature. We know that “quality content” is discussed in terms of how well a piece satisfies the intent of the reader. In an ideal scenario, you will create content that fully satisfies everything a searcher would want to know about a given topic. Ideally you own the resource center for the topic, and the searcher does not need to revisit SERPs and weave together answers from multiple sources. By nature, this type of comprehensive content is quite lengthy. Long-form content is arguably a byproduct of creating for quality. Cyrus Shepard does a better job of explaining this likelihood here.
- Less likely: Long-form threshold. The articles we wrote for this study ranged from just under 1,000 words to nearly as high as 4,000 words. One could consider all of these as “long-form content,” and perhaps Google does as well. Perhaps there is a word count threshold that Google uses.
As we are demonstrating in this article, there may be many other factors at play that need to be isolated and tested for correlations in order to get the full picture, such as: time indexed, on-page SEO (to be discussed later), Domain Authority, link profile, and depth/quality of content (also to be discussed later with MarketMuse as a measure). It’s possible that correlation does not imply causation, and by using word count averages as the single method of measure, we may be painting with too broad a brush.
This is all speculation. What we can say for certain is that all our content is 900 words and up, and shows no incremental benefit to be had from additional length.
Feel free to disagree with any (or all) of my speculations on my interpretation of the discrepancies of results, but I tend to have the same opinion as Brian Dean with the information available.
4. MarketMuse
At this point, most of you are familiar with MarketMuse. They have created a number of AI-powered tools that help with content planning and optimization.
We use the Content Optimizer tool, which evaluates the top 20 results for any keyword and generates an outline of all the major topics being discussed in SERPs. This helps you create content that is more comprehensive than your competitors, which can lead to better performance in search.
Based on the competitive landscape, the tool will generate a recommended content score (their proprietary algorithm) that you should hit in order to compete with the competing pages ranking in SERPs.
But… if you’re a competitive fellow, what happens if you want to blow the recommended score out of the water? Do higher scores have an impact on rankings? Does it make a difference if your competition has a very low average score?
We pulled every article’s content score, along with MarketMuse’s recommended scores and the average competitor scores, to answer these questions.
Correlation 1: Overall MarketMuse content score
Does a higher overall content score result in better rankings? Let’s take a look:
A perfect zero! We weren’t able to beat the system by racking up points. I also checked to see if a higher absolute score would result in a larger number of keywords ranking on the URL — it doesn’t.
Correlation 2: Beating the recommended score
As mentioned, based on the competitive landscape, MarketMuse will generate a recommended content score. What happens if you blow the recommended score out of the water? Do you get bonus points?
In order to calculate this correlation, we pulled the content score percentage attainment and compared it to the target keyword position. For example, if we scored a 30 against a recommended 25, we hit 120% attainment. Let’s see if it matters:
No bonus points for doing extra credit!
Correlation 3: Beating the average competitors’ scores
Okay, if you beat MarketMuse’s recommendations, you don’t get any added benefit, but what if you completely destroy your competitors’ average content scores?
We will calculate this correlation the same way we previously did, with percentage attainment over the average competitor. For example, if we scored a 30 over the average of 10, we hit 300% attainment. Let’s see if that matters:
That didn’t work either! Seems that there are no hacks or shortcuts here.
We know that MarketMuse works, but it seems that there are no additional tricks to this tool.
If you regularly hit the recommended score as we did (average 110% attainment, with 81% of blogs hitting 100% attainment or better) and cover the topics prescribed, you should do well. But don’t fixate on competitor scores or blowing the recommended score out of the water. You may just be wasting your time.
Note: It’s worth noting that we probably would have shown stronger correlations had we intentionally bombed a few MarketMuse scores. Perhaps a test for another day.
5. On-page optimization
Ah, old-school technical SEO. This type of work warms the cockles of a seasoned SEO’s heart. But does it still have a place in our constantly evolving world? Has Google advanced to the point where it doesn’t need technical cues from SEOs to understand what a page is about?
To find out, I pulled Moz’s on-page optimization score for every article and compared it to the target keyword’s positional ranking:
Let’s take a look at the scatterplot for all the keyword targets.
Now looking at the math:
If you have a keen eye you may have noticed a few strong outliers on the scatterplot. If we remove three of the largest outliers, the correlation goes up to -.435, a strong relationship.
Before we jump to conclusions, let’s look at this data one final way.
Let’s take a look at the percentage of articles with their target keywords ranking 1–10 that also have a 90% on-page score or better. We will compare that number to the percentage of articles ranking outside the top ten that also have a 90% on-page score or better.
If our assumption is correct, we will see a much higher percentage of keywords ranking 1–10 with an on-page score of 90% or better, and a lower number for articles ranking greater than 10.
This is enough of a hint for me. I’m implementing a 90% minimum on-page score from here on out.
Old school SEOs, rejoice!
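The 90% on-page comparison described above can be sketched like this. The field names and sample rows are hypothetical, and the 90 threshold is the one discussed in this section.

```python
def share_with_high_on_page(articles, threshold=90):
    """Percentage of articles whose on-page score meets the threshold."""
    if not articles:
        return None
    hits = sum(1 for a in articles if a["on_page_score"] >= threshold)
    return 100.0 * hits / len(articles)


def compare_on_page_share(articles, threshold=90):
    """Return (share among top-ten rankings, share among the rest)."""
    top_ten = [a for a in articles if a["position"] <= 10]
    outside = [a for a in articles if a["position"] > 10]
    return (
        share_with_high_on_page(top_ten, threshold),
        share_with_high_on_page(outside, threshold),
    )


# Hypothetical articles, not from the study's dataset.
sample_articles = [
    {"position": 2, "on_page_score": 95},
    {"position": 7, "on_page_score": 92},
    {"position": 9, "on_page_score": 85},
    {"position": 4, "on_page_score": 90},
    {"position": 18, "on_page_score": 91},
    {"position": 33, "on_page_score": 80},
    {"position": 51, "on_page_score": 70},
    {"position": 12, "on_page_score": 88},
]

top_share, outside_share = compare_on_page_share(sample_articles)
print(top_share, outside_share)
```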
6. The competition’s average word count
We won’t put this “word count” argument to bed just yet...
Let’s ask ourselves, “Does it matter how long the average content of the top 20 results is?”
Is there a relationship between the length of your content versus the average competitor?
What if your competitors are writing very short form, and you want to beat them with long-form content?
We will measure this the same way as before, with percentage attainment. For example, if the average word count of the top 20 results for “content marketing agency” is 300, and our piece is 450 words, we hit 150% attainment.
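Percentage attainment, used here and in the MarketMuse comparisons, is just our value divided by the benchmark. A minimal sketch (the function name is made up; the inputs are the worked examples from this post):

```python
def attainment(our_value, benchmark):
    """Our value as a percentage of the benchmark (100 means an exact match)."""
    if benchmark <= 0:
        raise ValueError("benchmark must be positive")
    return 100.0 * our_value / benchmark


score_vs_recommended = attainment(30, 25)    # content score vs. recommended score
score_vs_competitors = attainment(30, 10)    # content score vs. competitor average
words_vs_competitors = attainment(450, 300)  # word count vs. competitor average
print(score_vs_recommended, score_vs_competitors, words_vs_competitors)
```

Each article’s attainment value is then paired with its target keyword position to calculate the correlation.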
Let’s see if you can “out-verbose” your opponents.
Alright, I’ll put word count to bed now, I promise.
7. Keyword density
You’ve made it to the last analysis. Congratulations! How many cups of coffee have you consumed? No judgment; this report was responsible for entire coffee farms being completely decimated by yours truly.
For selfish reasons, I couldn’t resist the temptation to dispel this ancient tactic of “using target keywords” in blog content. You know what I’m talking about: when someone says “This blog doesn’t FEEL optimized... did you use the target keyword enough?”
There are still far too many people that believe that littering target keywords throughout a piece of content will yield results. And misguided SEO agencies, along with certain SEO tools, perpetuate this belief.
Yoast has a tool in WordPress that some digital marketers live and die by. They don’t think that a blog is complete until Yoast shows the magical green light, indicating that the content has satisfied the majority of its SEO recommendations:
Uh oh, keyword density is too low! Let’s see if that ACTUALLY matters.
Not looking so good, my keyword-stuffing friends! Let’s take a look at the PCC:
Believers would like to see a negative relationship here: as keyword density goes up, the ranking position number goes down (that is, the ranking improves), producing a downward-sloping line.
What we are looking at is a slightly upward-sloping line, which would indicate losing rankings by keyword stuffing — but fortunately not TOO upward sloping, given the low correlation value.
Okay, so PLEASE let that be the end of “keyword density.” This practice has been disproven in past studies, as referenced by Zyppy. Let’s confidently put this to bed, forever. Please.
Oh, and just for kicks, the Flesch Reading Ease score has no bearing on rankings either (-.03 correlation). Write at a third-grade level or a college level; it doesn’t matter.
TL;DR (I don’t blame you)
What we learned from our data
- Time: It took 100 days or more for an article to fully mature and show its true potential. A content marketing program probably shouldn’t be fully scrutinized until month 5 or 6 at the very earliest.
- Links: Links matter, I’m just terrible at generating them. Shame.
- Word count: It’s not about the length of the content, in absolute terms or relative to the competition. It’s about what is written and how resourceful it is.
- MarketMuse: We have proven that MarketMuse works as it prescribes, but there is no added benefit to breaking records.
- On-page SEO: Our data demonstrates that it still matters. We all still have a job.
- Competitor content length: We weren’t successful at blowing our competitors out of the water with longer content.
- Keyword density: Just stop. Join us in modern times. The water is warm.
In conclusion, some reasonable guidance we agree on is:
Wait at least 100 days to evaluate the performance of your content marketing program, write comprehensive content, and make sure your on-page SEO score is 90%+.
Oh, and build links. Unlike me. Shame.
Now go take a nap.