Before I get into it, I have to admit that this is, essentially, a conspiracy theory (though most inference from Google behavior has to be) and the title of this piece could easily have been ‘The possible reason Google is probably not, most of the time, giving advice about a large proportion of reasonably large updates’.
This is my personal Area 51 mystery (though I would say it is probably easier to penetrate the intentions of the US government’s Groom Lake base than those of Google), so I will concede immediately that this theory of mine (which I have been boring my colleagues at Click Consult with for a while) may not be true now – but it will almost certainly be true in the next few years. My theory? No-one knows what they are talking about.
Over the last decade, machine learning has come on in leaps and bounds, and the big tech companies have played a large part in that progress by virtue of throwing money at it. But probably the most influential reason for the rapid progress the industry has experienced has been the adoption of the neural network method of machine learning – the method that played a part in IBM’s Watson winning at Jeopardy and Google identifying cats in unlabeled videos (harder than it sounds).
For a long time, it was assumed that AI, robots or your personal sci-fi silicon intelligence of choice would be programmed from the ground up by human computer scientists or by puckish hacker geniuses that just won’t play by the rules. However, as with most things, what held us back was us. Even the towering intellects attempting to fully pre-program artificial intelligence could not successfully produce more than narrow intelligence – things which are little more than tools.
Neural networks, the computation method loosely based on the networks in animal brains, have allowed us to get out of our own way – enabling machines to learn through rapid iteration. Neural networks have no need for humans to spell out the rules for them; they are a hodgepodge of interlinking pathways that do a job – and while we can judge how well they do that job (as, in my conspiracy, would be the role of Google’s ‘Quality Raters’), we are incapable, for the most part, of describing exactly how the job was done. As you can see in the diagram below – we can control the input, we can observe the output, but the actual computation is hidden.
To make an entirely too simple analogy, it is as though we can place the eggs, flour, milk, sugar, and butter into a box and from that box we can take a cake some moments later – and while we can infer the process that led to the cake from the ingredients, we have no way of knowing for certain.
Here is a link to a video by a favorite YouTuber of mine, CGP Grey, whose video on machine learning is a great primer. However, all of this has been leading up to the concept that is at the heart of this conspiracy theory of mine.
If any of you follow the work of Bill Slawski – a former lawyer turned SEO and the go-to authority on Google patents – you will have watched the proliferation of patents granted to Google over the last five years that deal specifically with mathematical inference from user interactions and automated quality assessment. This includes patents such as the following on using CTR and other user interactions to improve search results.
In general, the subject matter described in this specification can be embodied in a computer-implemented method that includes determining a measure of relevance for a document result within a context of a search query for which the document result is returned, the determining being based on a first number in relation to a second number, the first number corresponding to longer views of the document result, and the second number corresponding to at least shorter views of the document result; and outputting the measure of relevance to a ranking engine for ranking of search results, including the document result, for a new search corresponding to the search query. The subject matter described in this specification can also be embodied in various corresponding computer program products, apparatus and systems.
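To make the quoted claim a little more concrete, here is a minimal sketch of how ‘a first number in relation to a second number’ might be turned into a relevance measure. The quoted text does not give a formula, so everything specific here is my own assumption for illustration: the function name, the 30-second threshold for a ‘longer view’, and the choice to divide longer views by all views.

```python
from typing import Iterable

# Hypothetical threshold: the quoted claim does not say how long a "longer view" is.
LONG_VIEW_SECONDS = 30.0


def relevance_from_views(view_durations: Iterable[float],
                         long_view_seconds: float = LONG_VIEW_SECONDS) -> float:
    """Return the share of views of a result that count as 'longer' views.

    Assumption: 'a first number in relation to a second number' is read
    here as long_views / (long_views + short_views).
    """
    long_views = 0
    short_views = 0
    for seconds in view_durations:
        if seconds >= long_view_seconds:
            long_views += 1
        else:
            short_views += 1
    total = long_views + short_views
    if total == 0:
        return 0.0  # no click data, so no evidence either way
    return long_views / total


# Made-up dwell times (in seconds) for one result returned for one query
durations = [5.0, 45.0, 120.0, 8.0]
score = relevance_from_views(durations)
print(score)
```

In this reading, a result that people click and then stay on scores closer to 1, and a result that people bounce straight back from scores closer to 0. A ranking engine could then weigh that score alongside its other signals.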
This information, combined with patents outlining the process involved in the creation of ontologies and entity detection to determine expertise and authority (something which may well have played a part in the apparent increase in the weighting of brand in a few of the latest updates), provides Google with the tools it needs to let a machine learning algorithm loose on the SERPs.
Whether or not it is the case now, it is inevitable that machine learning will at some point determine the results that people see, and that the weighting of the potentially thousands of ranking factors at its disposal will vary so wildly and so inconsistently across industries that it will be impossible for anyone to explain the exact methodology for determining rankings. Updates will come so regularly that even Barry Schwartz won’t be able to put out articles fast enough. The Google algorithm either is, or soon will be, a black box even to the engineers working on it.
What This Means for SEO
The long and short of it is in the title. While Google has never been overly talkative about its algorithm updates, there was a time when we would at least get advice on things such as ‘thin and poor quality content’ or ‘unnatural link profiles’. We may still see announcements when a new factor is added to the mix or a weighting is manually altered – as with the shift to mobile or the speed update – but the day-to-day running of the algorithm, and its near continuous refinement, will be increasingly opaque.
As such, the near fanatical obsession that many SEOs have with pinpointing, naming or defining fluctuations in SERPs will be increasingly futile (though that is not the same as analysis – group analysis will always be valuable, just not the race to spot and name a new update). Instead, we are going to need to pursue ever more finely tuned strategies for each individual industry – something which may lead to the further specialization of some agencies and an increasing trend toward large brands developing in-house teams.
The positive to take from this is that, while SEO may become more challenging to do well, it is going to ensure that a good SEO will be in some demand.
What This Means for SEO Techniques
The focus of the last few updates has, with reliable consistency, been pinned to the quality of a site – with ‘expertise, authority, trust’ a good trinity to look to for the future of the industry. As such, there are things which will need to change, things which will become more or less important, and a host of differences between industries, but a few things will be consistent in search, among which are:
1. Links Will Always be Important – But Where They Are From Even More So
The construction or establishment of ontologies is a phrase you will have no doubt come across – or will do with increasing regularity – and links will play a part in this. The definition of an ontology is:
[A] set of concepts and categories in a subject area or domain that shows their properties and the relations between them.
In this regard, links will be the connective tissue between these categories and concepts, and you will need to ensure that the brands you work for or with are established within existing industry ontologies and help to mold them. For this reason, the importance of where a link is placed will need to be calculated in a new way. While, previously, the authority of a site would be paramount (often measured by Domain Authority, or DA), this will become secondary to the relevance of the linking domain.
Let's say you have a car parts auction website. If a link to it is placed on a site that is predominantly about baking, then even if that site’s content is expertly written, the anchor text is beautifully targeted and the site’s authority is significant, the link will be less useful for the brand’s part in the overall industry’s ontology than one from a site half as well written and half as authoritative but which is focused entirely on car parts. The calculation of link relevance will begin to take in the relevance of the entire linking domain, not just the paragraph surrounding the link or the anchor text itself.
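The car parts example can be sketched as a toy calculation. To be clear, this is not how Google scores links. The topic sets, the authority figures, the Jaccard overlap used as a stand-in for ‘relevance of the linking domain’ and the 0.8 weighting are all invented to illustrate the argument that relevance will outweigh authority.

```python
from typing import Set


def topical_relevance(linking_topics: Set[str], target_topics: Set[str]) -> float:
    """Jaccard overlap between the topics of two sites (0.0 to 1.0)."""
    if not linking_topics or not target_topics:
        return 0.0
    shared = linking_topics & target_topics
    combined = linking_topics | target_topics
    return len(shared) / len(combined)


def link_value(authority: float, relevance: float,
               relevance_weight: float = 0.8) -> float:
    """Blend authority and relevance, both on a 0-1 scale.

    relevance_weight is a made-up number expressing the idea that
    domain relevance will matter more than raw authority.
    """
    return relevance_weight * relevance + (1.0 - relevance_weight) * authority


# Hypothetical topic profiles for three sites
car_parts_site = {"car parts", "auctions", "engines", "brakes"}
baking_blog = {"baking", "cakes", "bread"}
parts_forum = {"car parts", "engines", "brakes", "repairs"}

# The baking blog is twice as authoritative, but shares no topics with the target
baking_link = link_value(authority=0.9,
                         relevance=topical_relevance(baking_blog, car_parts_site))
forum_link = link_value(authority=0.45,
                        relevance=topical_relevance(parts_forum, car_parts_site))

print(f"baking blog link: {baking_link:.2f}")
print(f"car parts forum link: {forum_link:.2f}")
```

Under these invented numbers the link from the less authoritative car parts forum comes out well ahead of the link from the baking blog, which is the shift in emphasis described above.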
2. Your Expertise Will Always Be Important – So Create Expert Entities
Whether you choose to treat the overall brand or individual employees, products or services as the entity will depend on the brand, though I think a combination of all will likely be most successful. But these entities will provide the signals that your site communicates regarding your brand’s expertise and authority; this means that you will need to focus on building relationships with industry publications and the industry at large.
By sharing knowledge and opinion – not just on your own site, but on those important to consumers and other members of the industry – you can build a network of brand or name occurrences that establish you as holding the desired authority to rank well for various search terms. Allow your staff members to build their profile in the industry, ensure your content carries a byline and, in turn, they can pass their authority on to your site.
However, as stated, this is not just applicable to individuals – an entity is simply something about which you can inform Google and offer 'proof'. As such, you can build entities of anything within reason and establish their relationships to your brand and the industry.
3. Structured Data Will Become More Important While Machines are Learning
We know schema is important – while it is not essential at the moment, its inclusion is invariably helpful, and it will become increasingly important (until, at some point, as with rel=prev/next, we realize it is no longer being used).
The reason for this is that, as things stand, this kind of machine-readable information is incredibly helpful. While the algorithm will get smarter, it is going to rely on structured data for some time to help it parse the information we place online. For that reason, it will be important to ensure your brand is keeping up to date with industry appropriate mark-up. While the advice has always been to write for people rather than robots, the truth is that we need to do both.
The ability of the Google algorithm to understand natural language far exceeds what it was capable of even after the roll-out of Hummingbird, but it is still imperfect – so it is important to use 'em' for emphasis and 'strong' for importance, to indicate which pieces of content have been written to be spoken, to indicate which content is an article and which is a product description. While none of these on their own are ranking factors, or likely to improve rankings in isolation, the process of making your content more easily parsed by the algorithm by implementing all of these changes certainly can.
The vocabulary that schema.org has developed for communicating a wide variety of information clearly to machines is already extensive, and it is growing all the time. Every effort should be made to ensure you and your site are up to date with the latest schema available in the industry in which you operate, as well as keeping up to date with HTML best practice.
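As a minimal sketch of what this kind of mark-up looks like in practice, the snippet below builds a JSON-LD block using schema.org's Article, Person, Organization and SpeakableSpecification types. The helper name and every value passed to it (headline, names, date, CSS selector) are placeholders I have made up, and which properties a given search feature actually requires is something to check against the current documentation.

```python
import json
from typing import List


def article_json_ld(headline: str, author_name: str, publisher_name: str,
                    date_published: str, speakable_selectors: List[str]) -> str:
    """Build schema.org Article mark-up as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",  # tells the parser this is an article, not a product
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},  # the byline
        "publisher": {"@type": "Organization", "name": publisher_name},
        "datePublished": date_published,
        # Marks the parts of the page written to be read aloud
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": speakable_selectors,
        },
    }
    return json.dumps(data, indent=2)


# Placeholder values, for illustration only
markup = article_json_ld(
    headline="Example headline",
    author_name="Jane Doe",
    publisher_name="Example Brand",
    date_published="2019-01-01",
    speakable_selectors=[".summary"],
)

# JSON-LD is embedded in the page inside a script tag of this type
script_tag = f'<script type="application/ld+json">\n{markup}\n</script>'
print(script_tag)
```

The same pattern extends to other types, so a product page would swap the Article type for Product and describe the properties relevant to it.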
While the call may or may not be coming from inside the house, and while it may or may not have been an inside job, what we can say for certain is that the Google algorithm either is, or will be at some point in the near future, virtually incomprehensible except by inference from the job it is doing. Google may be able to steer it – and this is why the Quality Rater Guidelines (QRG) should be a manual for digital marketers – but Google will not be able to offer advice about improving your rankings. That will be up to digital marketers, who will need to analyze and compare notes to determine the best strategies for brands and their industries.
There are a limited number of techniques that will be viable across the whole gamut of industries, however, and among them are the establishment of ontologies, the creation of entities and the correct implementation of structured data. Everything else will need to be bespoke to at least each industry, if not to each client.
[This piece was adapted from a BrightonSEO talk given on the 12th April; you can find the slides that accompanied it on my Slideshare]