How to Increase Your Traffic through the Knowledge Graph
The Beginner's Guide to Semantic Web
- Knowledge Graphs and The Current SEO Landscape
- Google’s Shift to Entity-Based Search
- Examining Wikipedia as an Example of Entity-Focused Content
- How to Align SEO with Google’s Entity-Based Approach?
- The Importance of Semantic SEO for Building Knowledge Graphs
- eCommerce Websites and Knowledge Graphs
- Can You Overuse Structured Data?
- Tips for Finding Entities That Complement a Brand
- Scaling Your Outreach
Peter: Hey, it's Peter, and welcome to the Australian Search Marketing Academy webinar for SEMrush. Today we will be talking about how to increase your traffic through the knowledge graph with Dixon, and our co-host Nik.
Dixon is the CEO of inlinks.net, a tool that uses topics and entities rather than keywords to help optimize sites and content. He's probably best known as the guy who brought Majestic to the market, which has even won awards from the Queen, and he has personally received multiple awards, including SEO Personality of the Year. He was the first-ever winner of a Lifetime Achievement Award at the UK Search Awards. He started SEO in 1999 and has run businesses on both the agency side and the tool side.
Nik is an SEO specialist overseeing digital strategy, data analysis, content, and site architecture for large enterprises and local businesses. Nik focuses on data-driven results and is known for her strong technical SEO auditing ability, researching user intent, and finding opportunities in competitive industries. Nik's also big in the Australian SEO community, organizing meetups and the new DMU select group for us here in Australia.
Nik: Lovely to be here. If you're ready to start, should we just jump into it?
Knowledge Graphs and The Current SEO Landscape
Dixon: Okay, full screen. We're going to talk about entity-based SEO, or approaching SEO from the point of view of things, not strings. This all really revolves around the idea of the knowledge graph or the semantic web.
What the knowledge graph is really trying to solve is this idea that the web has got so big that relational databases and the old style of trying to put all the webpages in and reading all the webpages...suddenly returning results that go directly to the web pages is getting a little bit problematic even for a mammoth beast like Google. The idea of a knowledge graph for me anyway is ... it is a massive encyclopedia where it can create a single point of truth.
There can be many pages on “how to tie a bow tie”. But in truth, there are only one or two ways to tie a bow tie. If the world's information is stored more in an encyclopedia sort of way, in a way that machines can also read quite easily, then it changes the nature of the information. And so I see the knowledge graph as a list of topics, things or entities rather than webpages.
I think for SEOs, it's all about how we get our information interacting with that single point of truth, and how information moves between those two data points. The interesting thing about this idea of a single point of truth is that it makes things much quicker for the search engines but it may lose an awful lot of color in the information that it's got because it's dumbing down information.
One of the pluses, though, of a single point of truth is once they've got their own database, they can then also use other data sources to augment that as well. There's a huge amount of other ways of storing and moving information once they are using their own semi-structured data set, rather than an unstructured dataset like the internet.
For SEO... I think the challenge right now is getting data into the single point of truth. Trying to get your brand into there or try and get your name into there, or trying to get your product into there. Once we figured out how to do that properly, it's getting customers back.
The problem with this whole knowledge base idea or the knowledge graph, Google's knowledge graph, is once they've sucked all the information into this massive, great big single point of truth, what incentive have they got to give the customer back to us?
Google’s Shift to Entity-Based Search
I think that that's going to be the ultimate strategic problem for SEOs as we move through 2021. Entities are, I think, a big part of Google's game. They've been moving towards this for some years, and it's been a philosophical argument or discussion or meritocracy between competing ideas. The old PageRank idea versus the entity idea within Google. Different technologies have been fighting in the SERPs and fighting in the halls of Google.
If you look at who's now running Google, now that Larry and Sergey have moved on and we've got new people in Google, their backgrounds at the head of the thing are much more semantic and entity-based than previously. It would appear that entities are starting to win out.
You can see the way in which Google is changing just by going to the old Google Trends system. If you go to Google Trends and start typing something in, I typed in internet marketing without a “G”, and now you can see that underneath, you've got different ideas. You've got search terms, but you've also got topics coming back as well. Google Trends is now delivering a lot more information around topics than it used to do before.
As soon as I get to the concept of internet marketing with the “G”, it knows that there's three different ideas all around internet marketing and they're considered topics. They're considered entities and that's the thing ... it really depends on whether you're going to be an entity.
Bill Hart's a good friend of mine and he was bright enough to go and get himself into Freebase before it got bought by Google, and that's all translated into him being an entity. He's identified as an internet marketer. It's not easy though, because type in Dixon and there is an entity, but unfortunately, it's a set of architects in London. The fact that they designed the Royal Opera House really doesn't impress me very much.
What annoys me is that it's them that's the entity for Dixon. Just knowing this stuff doesn't make it easy. It doesn't make me an entity just because I know this stuff.
Peter: This is a source of anxiety for a lot of us who want ourselves to be there with our name there instead. Right?
Dixon: Yeah. I think we have to get over it a little bit. We've either got to become famous enough to deserve an entity or we've got to get over it.
I mean, I haven't got an entity and it's very annoying for me, especially when you're trying to talk about entities, but also it's an exercise in humility I think. If anybody's a Wikipedia genius over there and wants to just set me up as a Wikipedia page without me getting banned, that'd be great. But I suspect it's not going to work.
Nik: I do have a knowledge graph for my band, Dada Ono.
Dixon: This band entry is a really good way. But the band entry is really interesting because Lily Ray has done the same and Jason Barnard has done the same. The SEOs that are in there, a lot of them have got in through their music connections. It's a very good way into the knowledge graph. Or write a book: that's what I've got to do next, I think.
I think you've got to do something a little bit more than webinars probably. But that's all I've done so far. On to question one, “why is the move to the entity based model so pressing for Google?” It's really this idea that every URL has multiple ideas or topics or concepts in the content, but there are far fewer topics and ideas in the world, it turns out, than there are pages on the internet.
If you use something like Majestic, for example, it's seen 8.5 trillion URLs. Whereas, if you use InLinks to look at entities into ... InLinks has about 2.5 million entities. Wikipedia, probably the largest entity database on the planet has around about 5.9 million entities in English as of the end of the year.
The difference is, you've got one database with, let's say, 6 million entities, and the other one with 8.5 trillion URL records. Looking at the world from an entity-based viewpoint is roughly 1.4 million times smaller in terms of the number of records. It's a huge saving in scale. It's taking the database from trillions of records to millions or billions of records. This is why I think it's such an important thing for Google to work at.
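The scale comparison is simple division, and a quick sketch makes the rounding explicit. The figures are the ones quoted in the talk; the 6 million is a rounding of Wikipedia's roughly 5.9 million English entities, and the variable names are only for illustration.

```python
# Figures quoted in the talk (the entity count is rounded up from ~5.9 million).
url_records = 8_500_000_000_000   # URLs Majestic has seen
entity_records = 6_000_000        # approximate number of entities

# How many URL records exist for every entity record.
ratio = url_records / entity_records
print(f"URL records per entity record: {ratio:,.0f}")
```

So an entity-based index of this size would hold somewhere around 1.4 million times fewer records than a URL-based one.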
Secondly, entities do help Google go much deeper with a lot less repetition. You can start to create schema and once they've got a record, then they can enrich the record and find out a lot more information about that individual record.
The third thing is as we've all been told for the last few years for things like voice responses and for AI decisions, once Google or a search engine has a single point of truth, then it can put things into a much more machine-readable form and then they can convert that into text, so you can have the answer whilst you're driving your car and that sort of thing.
Examining Wikipedia as an Example of Entity-Focused Content
Entity-Oriented Search was a book by Krisztian Balog. I found this really interesting book when I was trying to get to grips with a lot of this stuff and I'm going to put him down as a world authority on this stuff because I think he's a lot cleverer than I am.
But one of the things he did was a chapter on Wikipedia's page structure. It's really interesting. This is the search results for the phrase virtual reality on Wikipedia. Everything on the right-hand side is the knowledge graph and Wikipedia is there. The first result in the organic results is a Wikipedia entry.
Wikipedia is prolific, of course. This is the actual page for virtual reality. If you would like to guess how many words are in the virtual reality article on Wikipedia, put them down. I can tell you it's more than 2,000. Put a number there, any number above 2,000. See how many words.
Okay. So this article has 27,000 words in it. I don't know how it ended up that long, but it's pretty well developed. The interesting thing about every Wikipedia article is that there is an incredible amount of structure to the whole thing. It all starts with an entity and it defines the entity, and the URL and the entity are almost always exactly the same.
Then there's a little info box with an image and stuff if they can. It's something that fits in a nice little featured snippet. Then there's some disambiguation and that's all, even before they've just put the summary in there. And that's on every single article.
Every article is trying to create the same structure. There's all this before you even get to the main content. There's a heck of a lot of content here. When you start looking at the content within there as well on Wikipedia, the thing that strikes me looking at the source code or the DOM is that all of the internal links ... there are so many links on a Wikipedia page.
In fact, they like linking out. Admittedly they are linking to their own pages, but they also are quite happy to do external links as well. There's usually quite a lot of external links and stuff and some of the external links are actually in the main body in this article as well.
We're not paying enough attention to outgoing links. Bill Slawski keeps on showing us patents, the Google patents and stuff. One of those is the reasonable surfer patent, which took the PageRank idea and said, "A random surfer doesn't click on links equally. They do it in a reasonable manner and they're more likely to click on some links than other links."
I have asked myself, how might a computer change the value of each outbound link on the page? What kinds of things can a beast like Google measure? One of the things that Majestic does now is it divides a webpage into 40 different sections and it looks at the internal links in each of those sections and the external links in each of those sections.
It turns out that different kinds of websites have very, very different patterns. On the left-hand side you've got a Wikipedia article, which is very, very happy to do internal linking and does do external linking, but most of those external links are near the bottom of the page. Whereas on the right-hand side you've got the other extreme, the Drudge Report, and just looking at this, the Drudge Report appears to be a complete list of external links.
I don't know anything about the Drudge report, but I'm already saying that I think that that's a bit spammy. I think that you see things this way and it suddenly takes context into things. One of the ways in which you can start measuring external links and how it might be important is by segmenting the page.
Analyzing content in sections is better than analyzing the whole of a page of content. As SEOs we tend to look at the whole page and don't think about each paragraph in isolation, which becomes a bit important, I think, further on.
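To make the segmentation idea concrete, here is a minimal sketch that slices a page into equal parts by text position and counts internal and external links in each slice. It illustrates the general technique only and is not Majestic's actual method: the class and function names are hypothetical, position is measured as the number of text characters that precede each link (an assumption), and script or style content is not filtered out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkPositionParser(HTMLParser):
    """Records each link's target and how much text preceded it."""

    def __init__(self):
        super().__init__()
        self.text_seen = 0   # characters of text encountered so far
        self.links = []      # (text offset, href) pairs in document order

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append((self.text_seen, href))

    def handle_data(self, data):
        self.text_seen += len(data.strip())


def link_profile(html, page_url, sections=40):
    """Count internal and external links in each equal-sized slice of the page text."""
    parser = LinkPositionParser()
    parser.feed(html)
    parser.close()
    total = max(parser.text_seen, 1)
    host = urlparse(page_url).netloc
    profile = [{"internal": 0, "external": 0} for _ in range(sections)]
    for offset, href in parser.links:
        index = min(offset * sections // total, sections - 1)
        # Resolve relative links against the page URL before comparing hosts.
        target_host = urlparse(urljoin(page_url, href)).netloc
        kind = "internal" if target_host == host else "external"
        profile[index][kind] += 1
    return profile


# Tiny made-up page: one internal link early on, one external link near the end.
first = "<p>" + "x" * 50 + '<a href="/wiki/Headset">headset</a></p>'
second = "<p>" + "y" * 50 + '<a href="https://example.org/paper">source</a></p>'
profile = link_profile(first + second, "https://example.com/wiki/VR", sections=4)
print(profile)
```

A page shaped like the Wikipedia example would show internal links spread across most slices and external links clustered in the last few, while a page that is mostly a list of outbound links would show external counts in nearly every slice.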
But here's the thing, here's a big, big thing that changes the game for me. I think Wikipedia's internal outlinks are all based on entities. They're not linking based on the anchor text. We've all, as SEOs been thinking about anchor text.
But I think that because Wikipedia always links to a very defined entity (all of their articles are, by definition, entities), they don't have to worry about anchor text. We've got to get our heads around that. We're so used to analyzing anchor text that we don't think about the content on the page we're linking to.
How to Align SEO with Google’s Entity-Based Approach?
What does this mean for us as SEOs? I think the first thing this means for us is, we need algorithms that will convert text to entities.
What we're trying to do is not analyze keywords anymore. We're trying to analyze paragraphs and ideas in terms of topics, because that's what Google is doing. You can go to Google, and Google has this Natural Language API demo, and you can cut and paste text into it.
That breaks things down into entities and ideas, and I think this is interesting. It's another demonstration of how much Google's moving towards an entity-based analysis of things. This is trying to break things down, but it's not especially good at getting entities right.
It's surprising, you'd think that Google is going to be the best natural language processing system in the world. But the interesting thing is that they miss an awful lot of stuff it seems.
If you take this API, for example, and run it over the 27,000 words of the virtual reality article in Wikipedia and break it down, then the most significant entity that it sees in that content is, funnily enough, not virtual reality, but Sega VR, which I find really interesting. I went back to the article and had a look, and Sega VR is mentioned 12 times on the virtual reality page on Wikipedia, including links and goodness knows what.
This article has been out for years. Sega seems to have been able to move the needle on what the concept of virtual reality means. It's taken VR from, I don't know, the scientific world to the gaming and to the arts and entertainment world.
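For anyone who would rather script this than paste text into the demo, here is a rough sketch built around the Cloud Natural Language API's `documents:analyzeEntities` REST method. It only builds the request body and ranks a response by salience. Authentication and the HTTP call itself are left out, the helper names are hypothetical, and the sample response, including its salience numbers, is invented for illustration rather than being real output for the Wikipedia article.

```python
import json

# REST endpoint for entity analysis in the Cloud Natural Language API.
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_request(text):
    """Return the JSON body for an analyzeEntities call on plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def top_entities(response, limit=3):
    """Rank the entities in a response by salience, highest first."""
    ranked = sorted(
        response.get("entities", []),
        key=lambda entity: entity.get("salience", 0.0),
        reverse=True,
    )
    return [(e["name"], e.get("type"), e.get("salience", 0.0)) for e in ranked[:limit]]


# Invented response, trimmed to the fields used above (not real API output).
sample_response = {
    "entities": [
        {"name": "virtual reality", "type": "OTHER", "salience": 0.21},
        {"name": "Sega VR", "type": "CONSUMER_GOOD", "salience": 0.34},
        {"name": "headset", "type": "CONSUMER_GOOD", "salience": 0.08},
    ]
}

print(json.dumps(build_request("Virtual reality is a simulated experience."), indent=2))
print(top_entities(sample_response))
```

Sorting by salience is what surfaces the "most significant entity" for a piece of text, which is how a product name can end up ranked above the topic the article is nominally about.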
It also demonstrates just how important Wikipedia editors are and how a Wikipedia article editor can bias the knowledge base. I'm envious of Wikipedia editors, but I'm also very, very suspicious of Wikipedia editors right now.
There's no doubt that we can manipulate the internet with the Open Directory Project if you're an editor there. It's the same thing. The problem is, as soon as something goes into Wikipedia in error, the machine learning tends to then take things away further. If there's an error in Wikipedia, then all the machine learning based off of that trusted data set is going to also be incorrect.
If Nik was in a different band or listed as being in a different band, then all of the knowledge data that was going to come off of that would also be related to a different band. If the trusted data set is wrong, then, so is a lot of the machine learning as a result of that.
As SEOs, I think the future is that we need to think of our websites as a knowledge base, as an entity database. You've got to think of your website as an entities' database and connect things appropriately. All of the content on the site needs to link around and talk around the same sort of subject if you can.
You have headline instances of entities. I think then you need to start working out what you need to talk about on different headline entities and ideas. By using a natural language API (I'm just using the InLinks one, not Google's), we can break down the pages into individual topics and then we can combine that all back into a nice little knowledge graph so we can see what kind of topics we need to talk about to be able to be relevant.
I can sit there and say, "Right, I've talked about lager, I've talked about Germany, I've talked about beer, I've talked about all sorts of bits and pieces. But Hey, I've not talked about hops. Maybe I should also include some information about the hops that are different within German lager."
I can click on the hops, have a look at the Wikipedia article for hops, find the topics related around hops, find related topics and various other bits and pieces. I can use that to modify and inform my content so that I can create better content.
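The "I've not talked about hops" check boils down to comparing sets of topics. Below is a minimal sketch, assuming you already have a list of entities for your own page and for each competing page from whichever extractor you use. The topic lists and the `topic_gaps` helper are made up for illustration and are not how InLinks computes its graph.

```python
from collections import Counter

# Hypothetical entity lists, as an entity extractor might return them per page.
competitor_topics = [
    {"lager", "Germany", "beer", "hops", "pilsner"},
    {"lager", "beer", "hops", "yeast"},
    {"lager", "Germany", "hops", "malt"},
]
my_topics = {"lager", "Germany", "beer"}


def topic_gaps(mine, competitors, min_pages=2):
    """Topics that at least `min_pages` competing pages cover but `mine` does not."""
    counts = Counter(topic for page in competitors for topic in page)
    gaps = [(topic, n) for topic, n in counts.items() if n >= min_pages and topic not in mine]
    # Most widely covered topics first, then alphabetical for a stable order.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


print(topic_gaps(my_topics, competitor_topics))
```

Requiring a topic to appear on more than one competing page is a design choice that filters out one-off mentions, so only the topics the top pages share show up as gaps.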
This is still about content. It's just approaching the problem in a different way. Then we can link ideas, and linking ideas is much, much better than linking keywords. Now I can have a page on hops that says hops influence flavor, which influences the beer style, or that blonde is a color of beer that's orange. We're relating ideas instead of relating keywords. Does that make sense, guys?
Peter: Yes, it does. At some point will you also explain how the anchor text will differ?
Dixon: The anchor text becomes much less relevant. Yes, the links may still be within anchor text, and you can still have the anchor text of “click here”, but the meaning now comes from the context of the landing page and the context of the words around the anchor text.
What it does mean is that you've got to then on your website decide what is the topic that is important for each page. You've got to link that to an entity.
I have a page that's associated with the concept of search engine optimization on Dixonjones.com. And then I can link topics to those pages. Anytime I talk about SEO or search engine optimization or synonyms around that on other pages, I can then link through to the page on SEO.
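Linking by entity rather than by keyword can be sketched as a lookup from each entity to its one target page, plus the phrases that refer to it. Everything here is hypothetical: the URL path, the alias list, and the `suggest_links` helper are illustrations, and a real tool would use entity extraction rather than a regular expression to spot mentions.

```python
import re

# Hypothetical mapping: one target page per entity, plus phrases that refer to it.
ENTITY_PAGES = {
    "Search engine optimization": {
        "url": "/search-engine-optimization/",
        "aliases": ["search engine optimization", "SEO"],
    },
}


def suggest_links(text, entity_pages, current_url=None):
    """Return (entity, url, matched phrase) for the first mention of each entity in `text`."""
    suggestions = []
    for entity, page in entity_pages.items():
        if page["url"] == current_url:
            continue  # a page should not link to itself
        pattern = r"\b(" + "|".join(re.escape(alias) for alias in page["aliases"]) + r")\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            suggestions.append((entity, page["url"], match.group(1)))
    return suggestions


paragraph = "Good SEO starts with content that answers a real question."
print(suggest_links(paragraph, ENTITY_PAGES, current_url="/blog/content-tips/"))
```

Whatever words happen to carry the link, every mention of the entity resolves to the same page, which is the point: the target is chosen by topic, not by anchor text.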
The Importance of Semantic SEO for Building Knowledge Graphs
In part two we found out that semantic SEO communicates topic details and that we're trying to get data to and from the knowledge graph. Wikipedia suffers heavily from human bias. I think this is going to be really important. That also means that Google's knowledge graph can suffer from human bias as well.
Create tightly themed, content-rich sites. They're going to be easier to build knowledge graphs for, because they're going to be seen as experts in their field, much more easily than a broad site, and that goes for news sites as well. It's harder for news sites: although news sites have huge amounts of content, which adds to their ability to connect the dots compared to an eCommerce site, for example, their disadvantage is that they tend to be broad.
Tips and takeaways that I've got are: build your site as if it were a miniature knowledge graph. Build content pages around the knowledge graph created by the top pages. Find out what is happening in your vertical. It's still a keyword, but the keyword is “put into Google, find out what's coming back and then analyze the concepts and topics that those people are talking about”.
Create proper topic and landing pages on your own site. These should be your main landing pages and your main target pages for SEO. I think that internally linking ideas rather than keywords is better than just playing around with anchor text.
If you want the slides, email email@example.com, with SEMrush in the title and that'll be there. That's my presentation, guys.
Nik: That was great information around the anchor text, because I think there have been a lot of penalties specifically around exact match anchor text. If we just take the foot off that gas pedal for a moment, we can acknowledge that it actually doesn't matter. If you just link to an entity, that is just as authoritative, if not more so, than relying on anchor text.
Dixon: It's always good if you can think about it from the user point of view as well as Google’s point of view, because Google doesn't like it if you try and explain something from Google's point of view.
I think if you're writing a research paper, you wouldn't dream of writing a research paper without citing authoritative sources. If you're writing a blog article, you would cite where you're getting your information from. It makes sense to have outgoing links to back-up your article. But if those out-going links don't back up your article, you're not actually helping the user, you're not adding a trust signal to the user.
If you're saying, "The best thing to do is to go and buy that over here." That's not really helping the user. Well, it's helping the user to go and buy something, but it's not adding to your own credibility. Yes, we can move away from the idea of anchor text, but it's also you put a link in there to support the argument that you're making in your own content.
eCommerce Websites and Knowledge Graphs
Peter: Going back specifically where you're talking about entities, the knowledge graph, the way that Google understands through the NLP...what I'm trying to work out here is, not all websites are really interested in publishing information on their site. A lot of websites are really business sites, they're not really in the business of trying to inform Google about their entities.
Dixon: I think if they're a product site and they're an eCommerce site, I think that they have two choices. I suppose we've always had two choices, but choice number one in that case, pay for the traffic. That's Google's, "You want to tell us this is a marketplace, pay for the traffic."
Choice number two is to accept that in order for you to own the visitor first, instead of Google owning the visitor first...you need to create huge amounts of content, knowledge, expertise that is not directly related to a purchase.
The best sales companies that are selling products online and getting organic traffic also have a huge arm that says, "Well, I can spend £100,000 a month on PPC, or I could spend £50,000 a month on PPC and then £50,000 a month on content." You can buy a heck of a lot of content for £50,000 a month. But that content has to be educating an audience and getting them to engage with not necessarily your product, but the idea of your product.
Some guys who do that really, really well, best in the industry, are SEMrush. We all hang around SEMrush's videos. It's not just the videos; they're at events, they're writing content, they're writing articles, and yes, it feeds into their product, but a lot of people are looking at these videos without ever having a SEMrush account at all, and it still becomes hugely valuable and they have a close affinity with their audience. In doing that, they win the audience over.
Nik: The companies that buy a lot of content often do it badly. Dixon is right though. Good quality rather than volume.
Can You Overuse Structured Data?
Peter: Okay. I've got one that I've been getting asked and it's sort of a “best structured data.” This idea about structured data and probably one of the most powerful functions in structured data is the sameAs function, right? This concept of just keep putting more sameAs into your schema and referencing entities within your schema. I mean, is there a limit? Should we keep doing this?
Dixon: No, of course you shouldn't. I don't think you should. That's exactly what InLinks is doing on this schema; looking at the page, breaking it down into entities and then putting the main entities in as the sameAs schema and it's saying, "This is the same as this Wikipedia entry."
Because we know that Google can understand the Wikipedia entry, if we say this content is about this, then it makes it very, very clear. But if you start to turn that into the equivalent of keyword stuffing...I think that's pretty quickly going to break down, because if your web page is about 50 things, then what you're going to try and do is make all of your 50 pages about 50 things, and you're going to completely cannibalize any concept of your content anyway.
Cannibalization is an important part of this. It's about quality over quantity. If you write two really good pages on how to tie a bow tie because your site is on ties and fashion, then that's a real problem because you have just completely wrecked it. Then you put your schema, sameAs going to two different things, and it becomes a big challenge.
At InLinks...you can only associate a topic with one page, so you can only have one page that's about a bow tie. Assuming that there's an article on Wikipedia about a bow tie, you could have it.
If you are going to just push your sameAs schema in the same way that you pushed keyword stuffing or meta keywords, as it was back in Peter's and my day, then you're not going to get very far, because you only want to use that sameAs once in your whole website for each topic.
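As a rough illustration of "one sameAs per topic across the whole site", the sketch below builds a JSON-LD object whose `about` entry points at a Wikipedia article through schema.org's `sameAs` property, and refuses to proceed if two pages claim the same entity. The page URLs, the `WebPage`/`about` structure, and both helper names are assumptions made for this example; it is not necessarily the markup InLinks generates.

```python
import json

# Hypothetical page-to-topic assignments; the example.com URLs are placeholders.
PAGE_TOPICS = {
    "https://example.com/how-to-tie-a-bow-tie": ("Bow tie", "https://en.wikipedia.org/wiki/Bow_tie"),
    "https://example.com/choosing-a-necktie": ("Necktie", "https://en.wikipedia.org/wiki/Necktie"),
}


def check_unique_topics(page_topics):
    """Raise if two pages claim the same sameAs target, since they would cannibalize each other."""
    seen = {}
    for page, (_, same_as) in page_topics.items():
        if same_as in seen:
            raise ValueError(f"{page} and {seen[same_as]} both claim {same_as}")
        seen[same_as] = page


def about_schema(page_url, name, same_as):
    """Build a JSON-LD object that declares the page's single main topic."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": page_url,
        "about": {"@type": "Thing", "name": name, "sameAs": same_as},
    }


check_unique_topics(PAGE_TOPICS)
for url, (name, same_as) in PAGE_TOPICS.items():
    print(json.dumps(about_schema(url, name, same_as), indent=2))
```

Running the uniqueness check before generating any markup keeps the "two really good pages on how to tie a bow tie" problem from reaching the schema in the first place.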
Tips for Finding Entities That Complement a Brand
Peter: Thank you for helping us to think of that now. Matthew Hayes says, “how can I find entities that already exist within Google knowledge graph, which complement/relate to a brand? Is there a way to leverage them to push a brand into the knowledge graph?”
Dixon: If you put the brand into InLinks, or probably into Google, then we'll basically go and have a look at all the things that are coming back and find all the topics associated with that brand. The other thing that you could do is put the content that that brand has got into Google's NL API text box and also find the kinds of things that they're talking about.
You can also go to SEMrush, of course, and put the brand in there and that will show you the kinds of web pages that are coming back. They've also got some topic information there as well, or you can find the keywords. Once you've found the keywords that they're ranking for, you can use them as a proxy for topics if you want to.
Is there a way to leverage these ideas to push the brand into the knowledge graph? Again, we go back to what are the data sources for populating the knowledge graph in the first place? There's no doubt that Wikipedia here controls all the cards. They are the biggest player.
If you've got a brand that's worthy of being in Wikipedia, then go and get it into Wikipedia. I find it really, really, really hard. I haven't put InLinks into Wikipedia because I'm just scared that somebody’s just going to kick me out. Well, Simon says “you got into Wikipedia in January”. I got stamped and removed in February. Wikipedia editors hate us.
What they're not realizing is that in their hate of us they're also manipulating the knowledge graph as well. That's a double standard that I don't think is acceptable. I think it's a problem for the knowledge graph as well as for us, in that the bias is going to carry on if we can't get around it.
In answer to Matthew's question, yes there is. I think that there's a lot to be said for working around those things and getting yourself into Crunchbase and getting yourself into IMDb and stuff.
But writing a good book with an ISBN number is a good thing, because you can then use that as a reference within another Wikipedia or Wikidata article. Then you can do the same sort of thing with scholarly papers and those kinds of things.
If you start by referencing authority material that's defined as an authority and that is related to the company, then that is a way to generate an actual valid Wikipedia entry, because when you start creating a Wikipedia entry, it's starting to talk to knowledge articles that are already within the Wikidata environment.
That's the trick: getting data sets that are already in the Wikidata environment so that the references within Wikipedia are self-reinforcing. But I'm just learning too.
Scaling Your Outreach
Peter: Jacob Stanley from Studio Hawk, he's got a very straight forward question. This may be related to what we're doing, but it's link building related. How do you scale link outreach? Do you have any approach for that?
Dixon: Yeah, well, my approach is don't. My approach is to try and scale influencer outreach and then allow the influencers the ability to say something about you before you do. My game plan for that is timing. If I've got a new product that's about to launch, then I'll tell the influencers first.
The influencers are very specific. I'm not talking about Beyonce or people like that. I'm talking about people very, very specific to my industry. Give them the information first, tell them the landing page where it's going to launch, give them exclusive demos if they want them.
I think that if you see that and you try not to be the person that promotes your own stuff, then that's the game. You need to work on scaling the communication and then your network of influencers, rather than scaling the links per se.
Nik: I think what I was trying to get at is, how does one measure that? Because like one would do that with brand mentions. Correct?
But I guess traditionally and maybe I'm a bit naive in saying that, that's sort of seen more as an affinity metric, whereas backlinks is something that we can measure. We can't necessarily measure the direct influence all the time, but we can definitely see that, "Okay, with a better qualitative backlink with organic traffic and with real organic users that that equates to some benefit to a site."
Dixon: I use Citation Flow and Trust Flow, because I know Citation Flow is pretty damn close to the PageRank methodology. I know Trust Flow has some extra elements in there, which I like as well. By the way, if you've got a Majestic account and a SEMrush account, you can plug the Majestic data into the SEO backend within SEMrush. They built that functionality here.
Nik: To be able to get a knowledge graph for your brand, it is a long, arduous road, but I guess if one were to lay down a roadmap per se, that would be optimizing your content and ensuring that technical onsite is well looked after. You've got a community and an audience that know you, that care about you and want to basically link to you.
I'm trying to look at this less from a backlink perspective and more from an influencer perspective, because essentially you're wanting to just grow your brand, and that will eventually get you a knowledge graph.
Dixon: I think that we should all start by thinking of our own websites as their own knowledge graphs, not just a website actually, but our own ecosystem as its own knowledge graph. Then trust that it becomes so well defined that it starts to get into the main knowledge graph.
But getting into the main knowledge graph is all about being seen as the world authority in a given topic, so that it would be absolutely wrong for anybody to talk about that topic without using the sameAs schema and then linking to you if you see what I mean, metaphorically as much as anything. It's about being the world's best on a given topic, but that topic can be very, very narrow. Once you're in there, then you can expand out relatively quickly.
Nik: I think that's all we've got time for, unfortunately.
Peter: I think it's a great note to end things on right there.
Dixon: Well, you can find me at Dixonjones.com or Dixon_Jones on Twitter, or come to inlinks.net. Obviously, I'm happy to help. Yeah.