#SEOisAEO: How Google Uses The Knowledge Graph in its AE algorithm
- Two definitions of a Knowledge Graph
- Where does Google get the facts about entities it answers questions about from?
- What are Fraggles in SEO?
- How does Google update information in the Knowledge Graph in an automated fashion?
- The Knowledge Graph is the hub of Answer Engines, Intent, Understanding, and Credibility. Does that flip, flop, or fly?
- How did the Knowledge Graph and language make Voice and Cross-Device search smarter for Google?
- Linked data Knowledge Graph for the future of search: if the web becomes a database, would we still need Google as its main gatekeeper?
- Danny Sullivan and Nic Fox mentioned the Topic Layer a few weeks ago and it all sounds terribly groovy
- Are machine learning geeks and Knowledge Graph geeks best chums?
Jason: Welcome to all three of you, welcome to the audience. I think this is going to be an astonishingly good webinar.
Two definitions of a Knowledge Graph
"Knowledge Graphs are large networks of entities, their semantic types, properties, and relationships between those entities."
"A Knowledge Graph has entities that are connected by nodes to other information about them. George Washington [is a] US President."
Where does Google get the facts about entities it answers questions about from?
Bill: Google has been building up information in a knowledge graph since at least 2012. They actually started back in 2005 with something called the fact repository. Back then, Andrew Hogue headed a group of about thirteen or so people who were building things where structured data matters, such as Google Maps. Subject, action, object triples such as "Bill Slawski lives in California" are vital - that ability to take these properties, evaluate what's missing, and continually update them is how Google might manage its knowledge graph. An example would be "Jerry Brown is the Governor of California". After an election, he's no longer the governor of California. Google could look up who the new governor of California is and update its knowledge graph. It could be a continuous thing. Being able to update facts like that is something that's possible, just like Google updates links to web pages. It could theoretically continuously update the Knowledge Graph and continuously learn. They acquired Wavii, whose crawlers read the web and learn from it, regardless of the topic, and add to an information store, a data store, a knowledge graph, in other words, and continuously update those things. Then we have queries where the answer does not arrive in the form we normally expect. If you ask Google a question like, "What movie did Robert Duvall say, 'I love the smell of napalm in the morning'?", the top two results are two YouTube videos with him saying that. And it is an ordinary search result, not a featured snippet that we normally associate with a knowledge graph answer. That is important. It is what Google is evolving towards. And that leads us to machine learning. Go to the DeepMind website. They have machines that learn how to play things like Go and chess. And machines that dream. Machines that simulate human activity and how humans learn. It's kind of interesting to see computer scientists try to learn from humans to build machines.
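Editor's note: the idea Bill describes (store facts as triples, notice what is missing, update when the world changes) can be sketched in a few lines of Python. This is a toy illustration only, not Google's implementation. The FactStore class, its method names, and the choice to key facts by (subject, predicate) are all assumptions made for the example, and the new governor's name (Gavin Newsom, who succeeded Jerry Brown in 2019) is not mentioned in the webinar.

```python
from dataclasses import dataclass, field


@dataclass
class FactStore:
    """Toy store of single-valued facts keyed by (subject, predicate)."""

    facts: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def assert_fact(self, subject, predicate, obj):
        """Record a fact, archiving any different value it replaces."""
        key = (subject, predicate)
        old = self.facts.get(key)
        if old is not None and old != obj:
            self.history.append((subject, predicate, old))
        self.facts[key] = obj

    def expire(self, subject, predicate):
        """Mark a fact as no longer current, which leaves a gap."""
        old = self.facts.pop((subject, predicate), None)
        if old is not None:
            self.history.append((subject, predicate, old))

    def gaps(self, required):
        """Return the (subject, predicate) pairs with no current value."""
        return [key for key in required if key not in self.facts]


store = FactStore()
store.assert_fact("California", "governor", "Jerry Brown")

# The term ends: the store no longer knows who the governor is.
store.expire("California", "governor")
missing = store.gaps([("California", "governor")])

# A (hypothetical) lookup finds the new value and fills the gap.
store.assert_fact("California", "governor", "Gavin Newsom")
```

The point of the sketch is the gaps() step: a store that knows which properties an entity should have can tell when one of them needs looking up again.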
Jason: Great stuff, Bill. So, Bill mentioned triples. And you've talked about Fraggles. I thought that was a kid's show, where they live in a rock somewhere. Are they related? How are they related?
What are Fraggles in SEO?
Cindy: Fraggles are something that I've just started talking about, and they are named after Fraggle Rock, the muppet show. For SEO purposes, I named this idea Fraggles because they're a mix of a fragment and a handle. A handle is something like a bookmark that used to allow you to click on something and scroll down the page. A fragment is just a piece of a page, an app, or a piece of text that probably occurs in the form of a triple: an answer. The reason I think that they're so important is Google appears to be starting to rank pieces of text on a page independently of each other. So they're not just ranking the page anymore, but they're ranking the bits on the page, the Fraggles. We see this being tested a lot with results from forums and things like that where you have the best answer. Google scrolls you to that answer, even if it's a really, really long page. Importantly, that bookmark (handle), isn't coded into the page. Google put it there. Google is finding where to scroll to. So Google's not crawling looking for bookmarks, it's crawling looking for things like H tags and schema to know when there's a snippet. Google isn’t interested in reading a list of blue links or reading an entire web page. It needs to find the answer to the query or the question and read it right off the bat. Not start at the header and read through the navigation. To be a good experience, it needs to directly read the thing that you're looking for. And that requires Fraggles.
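Editor's note: to make the "fragment plus handle" idea concrete, here is a minimal Python sketch that splits a page into fragments at its H tags and generates a handle for each one. It is only an illustration of the concept Cindy describes. How Google actually identifies fragments is not public, and the FragmentSplitter class, the split_into_fragments function, and the "#slug" handle format are assumptions invented for this example.

```python
import re
from html.parser import HTMLParser


class FragmentSplitter(HTMLParser):
    """Start a new fragment at every heading tag and collect its text."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.fragments = []  # each item: {"handle", "heading", "text"}
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self.fragments.append({"handle": "", "heading": "", "text": ""})

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self.fragments:
            self._in_heading = False
            frag = self.fragments[-1]
            frag["heading"] = frag["heading"].strip()
            # Generate the handle ourselves: it is not coded into the page.
            slug = re.sub(r"[^a-z0-9]+", "-", frag["heading"].lower()).strip("-")
            frag["handle"] = "#" + slug

    def handle_data(self, data):
        if not self.fragments:
            return  # text before the first heading is ignored in this sketch
        key = "heading" if self._in_heading else "text"
        self.fragments[-1][key] += data


def split_into_fragments(html):
    parser = FragmentSplitter()
    parser.feed(html)
    parser.close()
    for frag in parser.fragments:
        frag["text"] = " ".join(frag["text"].split())  # tidy whitespace
    return parser.fragments


page = (
    "<h1>Fruit</h1><p>Intro.</p>"
    "<h2>Apples</h2><p>Apples are red.</p>"
    "<h2>Bananas</h2><p>Bananas are yellow.</p>"
)
fragments = split_into_fragments(page)
```

Each fragment can now be stored, ranked, or read aloud independently of the page it came from, which is the behaviour the Fraggle idea describes.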
Jason: Oh, that's brilliant. Thank you very much. That's something new that we hadn't heard of before. That's perfect. And that makes me think of what Jono Alderson was saying about now thinking in blocks rather than having web pages. Web pages are a block within a site and within each web page, we have blocks.
Cindy: What Jono talks about with Gutenberg fits completely with what I talk about with Fraggles. Jono and I are on the same page and that is brilliant. But this idea also fits with the knowledge graph, because the way blocks work is nested. They can have vertical or lateral relationships. You may talk about many things on one page and those things could be related to many things - they need to be categorized according to the knowledge graph rather than just according to the URL. The URL might become meaningless. If we talk about apples, bananas, and oranges all on one page, it might be a fruit page, but it is also about those specific fruits, and we might need to filter those into specific ideas that are different from just the page.
Jason: Yeah, that whole concept really makes sense to me. I'm incredibly happy you mentioned it. And Bill did just suggest that using structured data and schema, and I assume HTML5, can be a smart move.
Cindy: Yes. I'm never going to tell people not to do that, or at least not in the near future.
Jason: As you say, it's an extra signal, and every extra signal does help. But Google's getting smarter and smarter and is now able to do it more or less on its own.
Cindy: It's the new SEO, man. You are ahead of your time :)
Jason: I am also wondering if Google will pull Fraggles up into the featured snippet spot. I have never seen that yet. Have you seen that?
Cindy: Right now, the featured snippet is not scrolling to Fraggles deep in the page, but I think it probably will soon.
Jason: Yeah, I'm looking forward to that. And you also said we're now looking at things being referenced according to their place in the knowledge graph rather than the URL or their place on the web, and that ties into what Aaron Bradley was saying: "Don't put your money in websites." Which I kind of liked.
Cindy: Yeah, I agree with him, too. Absolutely. Google wants to be the doorway to get between your information and your users.
Jason: So we're going to have to wake up a little bit as an industry. Thank you very much, Cindy, that was wonderful.
How does Google update information in the Knowledge Graph in an automated fashion?
Bill: Google collects information in the knowledge graph and it knows when it's missing information. So if we have an election in California for a new governor, Google realizes there's suddenly a gap in its knowledge graph - it no longer knows who the present-day governor is since the last one's term has expired. So it does a search to see if it can find out who the new governor is. It may look at fresh news sites, start seeing a name, and then go to the elections page to get verified information. And that takes us on to knowledge-based trust, or ways to gauge the correctness of information. Xin Luna Dong worked on Google's knowledge graph and knowledge vault and invented something called knowledge-based trust. That was a metric that involved having about a thousand facts that you run against different websites, seeing how many of those facts they get correct, and giving them scores. When they looked at top gossip websites like Gawker, they found that 14 ranked in the top 50% of the web in terms of PageRank, and yet they were all in the bottom 50% in terms of knowledge-based trust. They had lots of information people were interested in, but they weren't too accurate. Importantly, those aren't the sites that you would necessarily want to update the knowledge graph with.
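Editor's note: the scoring Bill describes (run a set of known facts against a website and count how many it gets right) can be sketched as below. This is a deliberate simplification of his description, not the published knowledge-based trust method, which is a more involved probabilistic model. The function name, the example.com-style site names, and the wrong claims are invented for illustration.

```python
def knowledge_based_trust(reference_facts, site_claims):
    """Share of a site's checkable claims that match the reference facts.

    Claims about things the reference set does not cover are ignored.
    Returns None when nothing the site says can be checked.
    """
    checked = [key for key in site_claims if key in reference_facts]
    if not checked:
        return None
    correct = sum(1 for key in checked if site_claims[key] == reference_facts[key])
    return correct / len(checked)


# Reference facts taken from statements made in this webinar.
reference_facts = {
    ("George Washington", "is a"): "US President",
    ("Bill Slawski", "lives in"): "California",
    ("Freebase", "came from"): "Metaweb",
}

# Hypothetical sites and claims, invented for the example.
sites = {
    "careful.example": {
        ("George Washington", "is a"): "US President",
        ("Bill Slawski", "lives in"): "California",
        ("Freebase", "came from"): "Metaweb",
    },
    "gossip.example": {
        ("George Washington", "is a"): "US President",
        ("Bill Slawski", "lives in"): "Nevada",
        ("Freebase", "came from"): "Wavii",
        ("Some Celebrity", "is dating"): "Someone",  # uncheckable, ignored
    },
}

scores = {name: knowledge_based_trust(reference_facts, claims)
          for name, claims in sites.items()}
```

Nothing in the score depends on how popular or well linked a site is, which is why it can disagree so sharply with PageRank.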
Jason: I really like that idea of knowledge-based trust and the fact that PageRank and knowledge-based trust don't correspond. Question about facts. Andrea has suggested it takes twenty or twenty-five factually correct documents to confirm information. If that is true, how much false information or old information can you leave out there when facts change?
Bill: One of the areas that we've seen the knowledge graph in action on the web is Google Maps. It's based on structured data, structured information and in that context we talk about NAP consistency: name, address, phone number and how those make a difference in making sure that Google Maps updates. And we want the same type of consistency for the Knowledge Graph.
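Editor's note: NAP consistency is easy to check mechanically. The sketch below normalizes name, address, and phone number before comparing listings, so that differences in case and punctuation are not counted as inconsistencies. The function names and sample listings are made up for the example, and real-world matching (abbreviations such as "St" versus "Street", for instance) needs more care than this.

```python
import re


def _clean(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def normalize_nap(name, address, phone):
    """Reduce a listing to a comparable (name, address, digits) tuple."""
    return (_clean(name), _clean(address), re.sub(r"\D", "", phone))


def nap_is_consistent(listings):
    """True when every listing normalizes to the same NAP tuple."""
    return len({normalize_nap(*listing) for listing in listings}) <= 1


# Hypothetical listings for one business on different sites.
same_business = [
    ("Example Cafe", "12 Main St.", "(555) 010-0000"),
    ("example cafe", "12 Main St", "555-010-0000"),
]
with_stale_entry = same_business + [
    ("Example Cafe", "98 Old Road", "555-010-0000"),
]

consistent = nap_is_consistent(same_business)
inconsistent = nap_is_consistent(with_stale_entry)
```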
Jason: We got a question from Arnout: "Can you trigger an update?" I figure we can't trigger an update to the Knowledge Graph other than with the feedback button, but have you got any ideas of how you can actually bully Google into updating itself?
Cindy: Well, if I can jump in here. I don't know exactly how to trigger it, but smart companies are creating databases, marking them up with schema, and allowing Google to crawl the database or ingest it as a feed. We're seeing more and more companies create their own knowledge graph, which I think Google is going to take in as a special expert knowledge graph. For instance, Spotify has a vast music database, they partner with Google, and their data is structured. As long as Spotify does a good job of keeping their database updated, then Google's probably ingesting that and probably giving high confidence and trust to the facts coming out of that particular database. Again, the new SEO.
Jason: I don't think it's only big companies and partners. At my tiny level, I fed the knowledge graph, got myself in the knowledge graph, and I found that now when I feed Google structured data on my own site, it tends to believe me. That might just be me imagining it, but I do get that same impression.
Cindy: Absolutely, yeah.
Bill: Then there's an extension of that process, where you can structure and submit information. FIBO, the Financial Industry Business Ontology, helps hold information for financial institutions. I know some of the people from Google have worked on implementing that, and it has helped banks actually rank higher and show up in search results. Then there was a process that Google came out with in 2014 called Biperpedia, which crowdsources queries from Google. Google uses query streams (Ed: series of search queries) to identify topics, then extracts other data from the web to update ontologies about different topics on the web.
Cindy: They had Freebase, too, right Bill?
Bill: Freebase came to Google through its acquisition of Metaweb back in 2010, two years before the knowledge graph launched, but Google's fact repository started in 2005. Then they added to that with the acquisition of Metaweb and Freebase. Now Freebase has been taken over by Wikidata. You can update Wikidata information, and it gets used in the same way that Freebase information was being used. In short, Google’s interest in knowledge goes way back and has evolved considerably.
Jason: Perfect. Quickly coming back to the internal knowledge graphs and having your own structured data. That's exactly what WordLift does. I'll give them the plug so that Andrea doesn't have to do it. I had a quick play with it the other day on my own site, and it's really, really, really nice to use. Creating my own little knowledge graph in my own little corner. Cool stuff. So you don't have to be a big enterprise to create your knowledge graph, you just need the right tools. Now, the next question, lucky for us, is Andrea's. Here's my favorite little diagram, which is incredibly simplified: we have a question with intent, we have Google understanding the web, and then we have the credibility of the different answers allowing Google to rank the answers and choose the best. For me, the knowledge graph is the hub of that entire thing, which is why I'm so keen on it.
The Knowledge Graph is the hub of Answer Engines, Intent, Understanding, and Credibility. Does that flip, flop, or fly?
Jason: Are you going to flip it and tell me there’s another way of looking at this? Or flop it and tell me it is complete rubbish? Or is it going to fly? Is it brilliant, ingenious and nobody's ever thought of that before?
Andrea: I'm a very positive person, so it's going to fly :) In truth, it's way more complicated than this. The knowledge graph has to support a lot of different processes. Obviously, connecting entities with each other requires the support of the knowledge graph. But a lot more comes into play. My own experience of playing with data is that the more we publish data and the more we connect data, the more we facilitate the work of a search engine in understanding what the facts are. For a search engine to understand who is the right Andrea Volpini who talks about SEO, and to differentiate the digital footprint of this Andrea Volpini from that of the swimmer, is a very hard job. Google does this, but we can provide a lot of help when we start connecting our own structured data across all the different websites that talk about us. Gennaro is a case in point. When he started to work with us, he didn't have a knowledge graph panel, and we got him one by publishing information on his blog using WordLift. We leveraged one or two simple properties. One is the schema.org url property and the other is schema.org sameAs. We also use another property called owl:sameAs, which is a more straightforward version of the schema.org sameAs. By doing this on all the pages of his blog, we were able to let Google understand that Gennaro was the author of different books and contributes articles to different blogs. After three or four months, this information was gathered into an entity in the Knowledge Graph. In terms of corroboration, I had a chat recently with some of the people at Bing and they said that they usually leverage between twenty and thirty websites before allowing a fact to enter into their knowledge graph. When it comes to understanding facts, think about the diagram of the semantic web. There are two layers. One is called proof and one is called trust. Trust and proof are what is needed for a machine to understand, "Yes, this is true or it's not."
Can we falsify that? Of course, we can. I was able to falsify several entities by leveraging a mix of sources: the more the information is out there, repurposed on different outlets, the more it becomes real. Obviously, reality is more complicated than that, but that is the basis.
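Editor's note: for readers who have not seen the url and sameAs properties Andrea mentions, the Python sketch below builds a small JSON-LD description of a person that uses both, plus owl:sameAs. It is an assumed, minimal shape for such markup, not WordLift's actual output. The person's name and every URL are placeholders, and the function name is hypothetical.

```python
import json


def person_jsonld(name, url, same_as, owl_same_as=None):
    """Describe a person and link the description to other profiles."""
    data = {
        # schema.org vocabulary plus the OWL prefix for owl:sameAs.
        "@context": [
            "https://schema.org",
            {"owl": "http://www.w3.org/2002/07/owl#"},
        ],
        "@type": "Person",
        "name": name,
        "url": url,                # the page that is "about" this person
        "sameAs": list(same_as),   # other pages describing the same person
    }
    if owl_same_as:
        data["owl:sameAs"] = {"@id": owl_same_as}
    return data


# Placeholder values, invented for the example.
markup = person_jsonld(
    name="Jane Example",
    url="https://blog.example.com/about/",
    same_as=[
        "https://books.example.org/authors/jane-example",
        "https://social.example.net/jane-example",
    ],
    owl_same_as="https://data.example.org/entity/jane-example",
)

# This string is what would go inside a script tag of type application/ld+json.
json_ld_text = json.dumps(markup, indent=2)
```

Publishing the same identifiers consistently on every page is what lets a machine connect the blog, the books, and the guest articles to one entity.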
Jason: Brilliant stuff!
Just look at that screen again. It is a very simple representation of a complex process that (I hope) helps everybody see the overall picture. As you say, Andrea, the reality is much more complex than that and the knowledge graph is integrated into so many different processes within that simple idea that I've put out.
Andrea: There's also a lot of volatility, meaning, of course, the information needs to be confirmed by user behaviors. You also see this in featured snippets: you might have a featured snippet and then it disappears, and then it comes back again, so there are a lot of things that actually come into play.
How did the Knowledge Graph and language make Voice and Cross-Device search smarter for Google?
Cindy: So, it's a mobile-y question, but it's also a machine learning question. We have a five part series about the relationship between entities and language.
Editor’s note - here they are:
The Entity & Language Series:
Entity-First Indexing with Mobile-First Crawling (1 of 5) https://mobilemoxie.com/blog/entity-first-indexing-mobile-first-crawling-1-of-5/
Frameworks, Translation, Natural Language & APIs (2 of 5) https://mobilemoxie.com/blog/entity-language-series-frameworks-translation-natural-language-apis-2-of-6/
Translation and Language APIs Impact on Search (3 of 5) https://mobilemoxie.com/blog/the-entity-language-series-translation-and-language-apis-impact-on-search-3-of-5/
Translation and Language APIs Impact on Query Understanding & Entity Understanding (4 of 5) https://mobilemoxie.com/blog/the-entity-language-series-translation-and-language-apis-impact-on-query-understanding-entity-understanding-4-of-5/
Cindy: What I theorize in this series is that Google had a pretty great understanding of the world when the world was written in English because Google was built for English. A lot of what Google’s algorithm and machine learning are built on is English based. English works for them. They get it. But if you've ever done an international SEO project, you’ll know that the algorithm gets worse and worse as the number of speakers of a language shrinks. A small language tends to be dealt with very badly. That's not scalable and makes smaller languages easy to spam. So Google looked to draw parallels between languages so they can get where they want to be faster. Think of the way a human learns a language. When you're a little kid, you learn that “this is a table”, “this is a chair”. They often occur together; you sit in the chair and eat off of the table. Then when I teach you a new language, you don't have to re-learn that part. You already know you sit in the chair, you eat off a table and that they often go together. But Google was re-learning all of it. Google was crawling the web in separate languages. Remember when all the separate languages had separate algorithms? That changed just before mobile-first indexing launched: they consolidated and they made everything very international. John Mueller changed his language, as it were… instead of saying the right language will rank based on the algorithm, he says “we switch in the right version of the page if there's hreflang markup”. They're understanding all of the stuff together at the same time, and they just switch versions in and out - because that's faster and less resource intensive from a machine learning perspective.
Jason: That makes a lot of sense. Yeah, I'm in France, and I do French SEO and it's easier.
Cindy: Yeah, it totally is. Remember we were talking earlier about Fraggles? If we parcel everything out into Fraggles, Google can translate them on the fly. They don't have to translate the entire web, they can do that in the cloud: use voice translation and just read it to you in whatever language you need. And that is key because Google has put so much time and energy into popping out something like 400 million Google voice enabled devices in just six months. Problem is there's a shortage of content, especially in smaller languages. So what do they do? Instead of searching harder for the content, which is resource intensive, they can use their translation APIs to just find the right answer and translate it.
Jason: I think the Fraggle idea is absolutely brilliant. I'm really into it and what you just said makes incredible sense.
Andrea: I like the idea too. It's a good framework. In a formal way, what you call a Fraggle is a notation. You simply have a piece of text that carries a specific meaning or provides a specific answer. There's currently one thing in Schema, still being tested, that goes in the direction of your Fraggle: the speakable property, which marks the text we expect the personal assistant to read aloud. And you provide a good framework, I like it.
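Editor's note: the speakable markup Andrea refers to can be sketched as follows. The example builds JSON-LD for a web page whose speakable parts are identified by CSS selectors, using the schema.org speakable property and SpeakableSpecification type. The page URL, the selectors, and the function name are placeholders invented for the illustration, and whether any given assistant honours the markup is outside the scope of this sketch.

```python
import json


def speakable_markup(page_url, css_selectors):
    """Mark the parts of a page that are suitable for reading aloud."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": page_url,
        "speakable": {
            "@type": "SpeakableSpecification",
            # Each selector points at one fragment of the page.
            "cssSelector": list(css_selectors),
        },
    }


# Placeholder page and selectors.
page_markup = speakable_markup(
    "https://www.example.com/what-is-a-fraggle/",
    [".question", ".short-answer"],
)
page_markup_text = json.dumps(page_markup, indent=2)
```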
Bill: So, I'm a little disturbed by some of what I'm seeing with machine learning at Google. They’re concentrating on the Pixel. But they’re making the sample size smaller and they're missing things. It’s probably good for mobile devices because you're bringing machine learning to lots of mobile devices and you're making it scalable. But they’re losing some opportunities, I think.
Jason: Great. Last question. The linked open data cloud, you mentioned this to me a few weeks ago. One thousand two hundred twenty-four data sets in the linked open data cloud. Google's knowledge graph suddenly starts looking a bit smaller than it did a few weeks ago.
Linked data Knowledge Graph for the future of search: if the web becomes a database, would we still need Google as its main gatekeeper?
Andrea: I've been working a lot on this idea, starting from the original presentation by Tim Berners-Lee about the semantic web. I think that the fundamental search problem is that we have to deal with billions of pages. But now these billions of pages are becoming a database, so I really encourage everyone to publish data in an interpretable format and feed the machine outside. Now, remember that as this data becomes available, the search problem changes, because we are now capable of, for instance, answering the question about the governor of California by running a very simple query on Wikidata. And Wikidata or DBpedia or Wikipedia are updated in a matter of a few minutes. So answering these questions becomes a question of simply traversing several graphs. We have an infrastructure that provides answers and that is more open than Google is right now. Right now with SEO, my focus is Google. We're creating a tool that helps companies get more visibility on Google. But the bigger picture is to help people understand the value of publishing data because there are multiple machines out there that need your data. And the more control you have over this data, the more important it gets. Earlier we discussed how to trigger an update on the Google knowledge graph. But does that really matter? It does matter today because of the number of people that are going through Google. But isn’t it more important that I become the primary source about myself? And then this primary source is reflected in different datasets. I think it's more important to start looking at ourselves as data publishers, rather than trying to find a way to trigger an update.
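Editor's note: the "very simple query on Wikidata" might look like the sketch below, which builds a SPARQL query and the request URL for Wikidata's public query service without sending it. The identifiers are assumptions to verify before relying on them: Q99 is taken to be the Wikidata item for California and P6 the "head of government" property. The function names are made up for the example.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def head_of_government_query(entity_id):
    """Build a SPARQL query for the head of government of one entity."""
    return (
        "SELECT ?leader ?leaderLabel WHERE {\n"
        f"  wd:{entity_id} wdt:P6 ?leader .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def request_url(query):
    """Build the GET URL that asks the endpoint for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{params}"


# Q99 is assumed to identify California on Wikidata.
governor_query = head_of_government_query("Q99")
governor_url = request_url(governor_query)
```

Fetching governor_url with any HTTP client should return the current answer straight from the open dataset, with no search engine in between, which is the shift Andrea is pointing at.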
Jason: Yeah, I really get that. Today, we're feeding Google, but tomorrow we might need to be feeding somebody else. I know Martha Van Berkel is terribly keen on Bing, and Arnout was talking about Amazon and Alexa being interesting because it ranks with the aim of making a sale… whereas Google aims to bring an answer to any question that somebody will ask. So get control of your data to be able to give it to Google today and other machines tomorrow.
Andrea: There is also another interesting area of research. The interconnection between structured data and blockchain. What if I start kind of claiming the ownership of the triples I publish, and what if we could create a ledger? Right now, the dialogue between content publisher and the search engine is: I publish content, they get to index it, and I get the traffic back. And it works well. I get traffic from Google, and so I'm keen on exposing more information and making it more accessible for them, so they can create a service that in return brings me back profit. That picture is changing. There are significantly fewer organic opportunities. It would make sense for us to claim the provenance of the triples that we create and make available not only to Google but to other machines. I believe that, with AI coming into play, the control of the provenance of the data that we produce becomes more important. Not only because of scandals like Cambridge Analytica, or the way that Facebook uses the data… but in economic terms. It is important to create an infrastructure for publishing data and claiming provenance for this data.
Jason: Okay, that's a really, really big question. I would love to hear more, but time is running out.
Danny Sullivan and Nic Fox mentioned the Topic Layer a few weeks ago and it all sounds terribly groovy
Jason: The idea is that they've got these activity cards, collections, and a dynamic organization in search results. Great. But to enable this new functionality, they've had to understand how we as users evolve over time within highly granular topics, which I think is important. They now say they have updated the Knowledge Graph with a new layer called the topic layer, which is engineered to deeply understand a topic space and how interest can develop over time as familiarity and expertise grow. I like the idea. But is it just Google talking marketing to try to get a point out there, or is it, as they say, a fundamental transformation in the way search understands interest and longer journeys to help you find information? My initial thought about this was that it hooks into the context cloud. Bill?
Bill: What it reminded me of when I first read about it was a Microsoft project involving what they called "refinding" information, where they talked about helping people track search trails through prior search histories to enable them to see things that they had looked for before. Then the search activity cards and the collections seem a lot like Google Plus... it all just seems like a new way for users to interact with Google… Like the “people also ask” questions on search results, it is simply a way to use the Knowledge Graph to keep people on SERPs longer. It's marketing.
Cindy: I think that it's definitely deeper. The topic layer hooks your historical searches into what it knows on the topic and allows you to disambiguate the query. Importantly, it does that cross device. So if you've searched for Monty Python on your phone, your Google Home, or your web-enabled TV, the topic layer knows not only what device you're on and what you've searched for historically, but also what the relationships are within this topic and which directions you can and cannot go. It'll try and funnel you down a direction that you can go in on the device that you're searching on. The Topic Layer is a mix of the topic as a whole, understanding your relationship to the topic, what your historical preferences are and also what format you can consume on the device (video, audio or text). Historical searches are vital. Remember, to get there, they've got to use machine learning to aggregate all of your searches into one canonical search history across all devices. Add to that, in a home with multiple people who are searching… who is searching? You? Your wife? Your kid? Very, very complex.
Jason: Brilliant stuff, thank you. Andrea, what do you think about the topic layer?
Andrea: It is one way of looking at ontologies. Ontologies are built to create a context and knowledge requires context. As Bill and Cindy say, the Topic Layer creates context for specific searcher intent and potentially enables a more conversational interaction with the search engine. Increasingly, we don't get on the search engine once and then get off. We have a dialogue, because we start looking for one aspect, and then we move on another aspect, and then we move on another aspect again. In that context, we want to use granular ontologies like the topic layer, so that we can restrict the focus of the smart machine and help it provide the right answer.
Jason: Great stuff. I got all over-excited about it, because I figured they had a human-curated set of ontologies and that they are now using machine learning to create incredibly granular ontologies, based on your user history. So, under the hood, they are using machine learning to build very granular new ontologies, based on our own user experience, which I find incredibly exciting. And that leads us into the last question.
Are machine learning geeks and Knowledge Graph geeks best chums?
Andrea: In the past years, in academia and at conferences, there have been two gangs: the AI people and the machine learning / deep learning people. There has always been some level of conflict. But in the last year there have been a lot of great papers that combine the knowledge graph with machine learning. I think the new frontier is the combination of these two.
Cindy: Interestingly, at about the same time the knowledge graph and mobile-first indexing launched, AdWords did a major update and changed their platform. Maybe the topic layer and knowledge graph met Google's ability to monetize. In short, I'm not sure there's much machine learning on the topic side. They are investing in how to market to you: what is the appropriate ad to send to you. There's probably machine learning to build out the topic layer, but Google is mostly trying to build out knowledge based on individual users and what they want, when they want it. Sell.
Jason: Brilliant stuff. That takes us nicely into the next season - machine learning in AEO. We're going to have some great guests for that. Bill, Andrea, and Cindy - I thought that was astonishingly interesting. I loved all the stuff you've come up with and that you've pushed our way. That was intelligent, interesting, and fun, which was the idea. Thank you for coming along, and thank you for sharing.