Jordan Choo 0:00 Hello, everyone. Welcome to this week's agencyAutomators Hangout. I'm Jordan, your co-host, and we have Noah here as well. On today's Hangout, we have Andrea Volpini from WordLift, and we're going to be chatting about using machine learning to conquer knowledge graphs. I first learned about Andrea through Twitter, where he tweets a lot about artificial intelligence and machine learning. I know he recently was tweeting a lot about BERT. So this is going to be a very, very interesting talk. So Andrea, would you mind introducing yourself to the audience?
Andrea Volpini 0:41 Sure. I am the co-founder and CEO of WordLift, and we automate SEO, starting with a WordPress plugin. I've been doing digital marketing for quite some time now, but I'm really focused on bringing together artificial intelligence and search engine optimization.
Jordan Choo 1:03 Awesome. So how did you get started in SEO and in AI? What brought the two together?
Andrea Volpini 1:11 I'm a longtime web guy, let's say. I started building websites back in 1995. I know it sounds like a long time ago. It is a long time ago. I worked for many years with content management systems, and I had a company that built content management systems. When the year 2000 was approaching, we had a lot of content to manage because we were working for the Italian parliament, because I'm from Italy. So we started to look at semantic web technologies as a way of improving our existing content manager. And then I started to do research work on semantic web technologies.
Noah Learner 2:02 When did you start to think of it as semantic web technologies? What year was that for you?
Andrea Volpini 2:07 I think we got the first research grant in Italy on semantic web technology in the year 2000. Tim Berners-Lee had come up with this idea of transitioning from pages to data, and we thought this is going to be a thing, because it's going to solve the problem that we have in serving millions of users on a site that has thousands of pages that have to be linked together but that lack a proper structure.
Andrea Volpini 2:46 But it didn't happen. It took a long time before this vision of creating a web of data became a reality. At that point I sold my previous company, the CMS company, because I wanted to focus purely on creating a product on semantic web technologies. This was in 2011.
Andrea Volpini 3:13 So I started to do research work within European projects. One project was very successful in creating a knowledge stack for content management systems. And so I started to play with this idea of creating an editor, a tool that could create data out of text. That was 2011. Then in 2013, I started a company in Salzburg, Austria, by doing a spin-off of a department of a research institution that was working on knowledge graphs and linked data.
Andrea Volpini 3:55 I got there because I was working in research. I was coming from the startup background, the web background. They were researchers, and they wanted to create a startup. They said, why don't you help us? And I said, yeah, sure. So I co-founded a company called RedLink. Now it's five years old, and we do a lot of enterprise work with information extraction and artificial intelligence. But then, with my semantic editor, I started to envision a product. And I started to test this product as schema markup was coming into the market.
Andrea Volpini 4:38 So in 2011, the search engines got together and decided, okay, we should agree on a vocabulary to describe the web. And I thought, well, that's a commercial opportunity for my editor. So I went to my biggest client of that time, and I said, hey, there's going to be something interesting in the search engine world, because search engines have agreed on a common vocabulary, and I have an editor that can boost SEO. This was 2011, so it didn't really happen.
Andrea Volpini 5:14 So I started to mark up pages back then with the first version of schema.org with my editor, but I was still thinking, I have a semantic editor in my hands. But then, as the years progressed, all of a sudden the SEO community started to get some real interest in what we were doing. I would never have thought about SEO, because for me SEO wasn't a thing. I had always built websites for very large organizations and never focused on marketing and SEO in general. But then the market kind of told me, hey, this thing is for SEO. I knew the idea was good, but I didn't think that SEO was the thing.
Andrea Volpini 6:00 So in 2017, to cut it short, we incorporated a company and we got funded by Woorank, a Belgian leader in tools for SEO auditing. And so we got started in this process of creating a product for automating SEO with knowledge graphs and semantic technologies.
Noah Learner 6:24 Wow, that's a vast journey. So in 2011, you were trying to push semantic data onto websites. How were you envisioning consuming that data, if not for SEO?
Andrea Volpini 6:40 Well, the vision of the Semantic Web is broader than SEO. It's about creating a web of meanings: rather than having search engines crawl the pages to extract the data, you have the data made available to smart agents, so applications that can traverse this information and provide answers without crawling the pages. Which, in a way, is happening, not the way that we expected it to happen, but it's happening. Of course, I realized when I was doing the pitch to my client that the only angle was SEO.
Noah Learner 7:22 Right. Because I'm sitting here thinking, I entered the world of structured data in 2015, so I'm definitely late to the game. But I also feel like back then it felt cutting edge. None of my competition was using JSON-LD in any of their markup when I started doing it, which doesn't mean I was ahead of the curve. But to hear you thinking about the Semantic Web in 2000 kind of blew my mind. I was like, oh my god. When we were talking about the Semantic Web, I was thinking about, let's not design with HTML tables. That's how we all talked about it in 2005, six, and seven, right? It was like, hey, let's use stylesheets. Let's use external stylesheets. That
Noah Learner 8:16 Semantic Web,
Andrea Volpini 8:17 I mean, if you work with content management systems, and you build content management systems, you know that there is a rendering layer, but then there is a content model in the back that makes the thing work. And how could we make this content model available to the outside world? It was through data. So I also went through a phase which was the open data phase. I started with publishing all the data that we had in the CMS as open data, because we were working with government, so it was kind of the thing to do.
Andrea Volpini 8:52 But then I also was able to launch an open data partner portal, I think it was probably something like 2015 or 2016, for the second largest utility company in Italy. And that was the first corporate or multinational organization that was publishing open data, because I was convinced that in order to do marketing you should market the data and not only the web pages. And that's exactly what's happening now, when you're pushing, let's say, all the retail stores to Google My Business, or all your products as an XML feed, and so on and so forth. So why don't we get to the source and make it available, so that others can build apps, and these would kind of market our products and services?
Noah Learner 9:53 That's amazing.
Jordan Choo 9:55 Yeah.
Andrea Volpini 9:59 So then, of course, now we call it artificial intelligence, which is kind of a broad definition that has inside a lot of different components. But back then it was semantic technologies. And it still is, as far as I'm concerned.
Jordan Choo 10:17 So let's actually talk about the Semantic Web, the Knowledge Graph, and SEO, then. I think this plays well into our first question. For those of us who don't know, what exactly is the knowledge graph? What exactly is the Semantic Web?
Andrea Volpini 10:33 Right, um,
Andrea Volpini 10:34 if you can distill it into layman's terms,
Andrea Volpini 10:38 Right. So a knowledge graph is really a data bank. It's a place where you can bring your content in a way that machines understand it. It's a programmatic way to model whatever knowledge domain you're in. So if you are into, let's say, cars, you want to build a knowledge graph that is capable of making your expertise on cars machine friendly. And in a way, it's foundational when you need to start creating whatever AI-driven product or service. You need to start from the data you have in order to model the domain where you have your product or your expertise, in order to allow others to create applications on top of it, or of course to allow yourself to create applications on top of it. SEO is one of the things you can actually build on top of a knowledge graph.
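To make the "data bank" idea concrete, here is a minimal sketch of a knowledge graph as a set of subject, predicate, object statements about cars. The `ex:` identifiers and the helper names `facts_about` and `neighbours` are invented for illustration and are not taken from WordLift or from the talk.

```python
# Toy knowledge graph: a set of (subject, predicate, object) triples.
# Every identifier below is hypothetical and chosen only for illustration.
triples = {
    ("ex:Fiat500", "rdf:type", "ex:Car"),
    ("ex:Fiat500", "ex:manufacturer", "ex:Fiat"),
    ("ex:Fiat", "rdf:type", "ex:Organization"),
    ("ex:Fiat", "ex:foundedIn", "ex:Turin"),
}


def facts_about(subject, graph):
    """Return every (predicate, object) pair stated about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


def neighbours(subject, graph):
    """Return the entities one hop away that are themselves described in the graph."""
    described = {s for s, _, _ in graph}
    return sorted(o for s, _, o in graph if s == subject and o in described)


print(facts_about("ex:Fiat500", triples))
print(neighbours("ex:Fiat500", triples))
```

An application built "on top" of such a graph would mostly be doing lookups and traversals of this kind, only against a real triple store rather than a Python set.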
Jordan Choo 12:01 Because I'm guessing, in our case, Google is kind of crawling your own knowledge graph, understanding it, and then using that data to rank your site accordingly. Is that right?
Andrea Volpini 12:13 Yeah, Google has its own knowledge graph, and its knowledge graph is built of facts. There are billions of facts inside Google's Knowledge Graph. But where is this data coming from? Because of course, as a marketer, I want to market my data, and I want to understand what the sources are for the applications of today. One of the sources, after the acquisition of Freebase, was of course this large repository of data sets that was created by Metaweb. So Google's is a knowledge graph that builds on Freebase, and it's also getting information from Wikipedia, and it's getting information from your website as well. My plan with WordLift is to democratize knowledge graph technologies in order to let everyone create their own knowledge graph. The reason I want to do this is that unless we have full control of the data, we cannot really play a role in the business, right? Because if you own all the data, how can I create a business model on top of it? So the plan with WordLift in the beginning, and this is why I didn't think about SEO, was that I wanted to create a technology to let everyone, from a CMS like WordPress, create a knowledge graph like the one that Google was building. And I wanted to do this also because I want to express my opinion, and I want you to check it, right? Because fake news was going to happen. If I am structuring my information, then it gets easy for you to validate whether what I'm saying is right or wrong. So there are a lot of elements in the importance of creating a knowledge graph, but the most important element was, in the end, helping Google improve the service by crawling not only the structured data in the page, but the data behind the structured data.
Noah Learner 14:36 Interesting, interesting. What do you mean by the data behind the structured data?
Andrea Volpini 14:40 Right, what do I mean? Let me share. Can I share my screen with you?
Jordan Choo 14:44 Let me stop sharing right here, it's all yours.
Andrea Volpini 14:48 So basically, in order to create a knowledge graph from content, if we jump to slide 28, you see that we can start with some unstructured text, and then we can apply natural language processing and extract features out of the text.
Andrea Volpini 15:10 We do this in different ways, of course, and there are all kinds of features that can be extracted from a text, but most of the time we end up extracting entities and recognizing entities. These are concepts that machines can understand. With these concepts, we then start building a knowledge graph that is derived from the content that we have in our website or CMS or PDFs or whatever textual format you want to use. If you jump to the slide just below, we see that this knowledge graph is interlinked with other graphs. And that's the beauty of this web of data: it's kind of the web, but it's made of data points. So when we build the knowledge graph, we don't build just a database, we build a matrix of data points that are connected one to another. And everyone can express their own point of view while we are all talking about the same thing, which helps if you want to verify a statement, if you want to validate a statement, if you want to make inferences or get new data from these links.
Andrea Volpini 16:39 And that's a little bit of the tech stack that we have built, which can be used for many different use cases. You can build your own semantic search engine, you can label all of your content, you can classify all the information you have available. But then we kind of specialized in doing this for SEO.
Andrea Volpini 17:09 What this means is that we help Google and Bing and other semantic search engines out there to get this data by providing a pointer inside the JSON-LD that we publish on the web page, which allows the crawler to see the entity that is referenced within the text. So when a crawler gets on your web page, it will read the JSON-LD, but since we are publishing structured linked data, it finds a unique URL that points back to a knowledge graph. And all of a sudden, we started to see that search engines like Google were not only crawling the JSON-LD but also the data graph that we had in the back, right? And so we started to think, okay, what can we do to market this data even more? So if you move to the next slide, here we see basically how it works, right?
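The pointer mechanism described here can be sketched from the crawler's side: read the JSON-LD embedded in a page and collect every `@id` it carries. This is only an illustration under stated assumptions. The page, the `data.example.org` URIs, and the function `graph_pointers` are made up, and real crawlers are certainly more involved than a regular expression over a script tag.

```python
import json
import re

# Hypothetical page: the JSON-LD carries "@id" values that point into a
# knowledge graph hosted at data.example.org (an invented domain).
page_html = '''<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "@id": "https://data.example.org/post/hello",
 "mentions": [{"@id": "https://data.example.org/entity/semantic_web"}]}
</script></head><body>...</body></html>'''


def graph_pointers(page):
    """Collect every @id found in the page's JSON-LD blocks, in document order."""
    pattern = r'<script type="application/ld\+json">(.*?)</script>'
    found = []

    def walk(node):
        if isinstance(node, dict):
            if "@id" in node:
                found.append(node["@id"])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    for block in re.findall(pattern, page, flags=re.DOTALL):
        walk(json.loads(block))
    return found


print(graph_pointers(page_html))
```

Each collected URI is something a machine could dereference to fetch more data about the article or the entity, which is the "data behind the structured data" mentioned earlier.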
Andrea Volpini 18:16 So we have a text, and the text is actually transformed into an entity. And there is a unique ID; in this case, this is my unique ID on the Google Knowledge Graph. And out of this unique ID you get the entity and the entity details, the story about a person, that can then be used by Google with the knowledge card or the answers that Google can provide. But let's move to the next slide. What can we do to help Google see more things about this entity? So this is the entity about myself on Wikidata. This is another public knowledge graph that we know Google, like Facebook and Bing and a lot of other organizations, is using in order to create more data and bring better data within their graph.
Andrea Volpini 19:09 So much like in Wikipedia, we can contribute to these graphs, and here we can see a representation of the entity for myself, which states that I am the CEO of the organization WordLift, that I was born in Italy, that I am male, and so on and so forth. But the interesting part is that there is an exact match property, a property that links back to another entity. And that's the entity on data.wordlift.io, which is the knowledge graph that we create.
Andrea Volpini 19:44 Boom. So we have data inside Google's Knowledge Graph that is linked, and that is fed also with data from Wikidata, and within Wikidata we are able to create links to our own knowledge graph. Right.
Andrea Volpini 20:00 So if you go to the next slide, we see that we have now also contributed a new property to the Wikidata ontology, which is called the WordLift ID. The WordLift ID allows us to create backlinks from Wikidata to our knowledge graph. And of course, you can create your own knowledge graph on your own custom domain and everything.
Noah Learner 20:29 Yeah, I've got a quick question. I've been fascinated by the sameAs definition inside structured data, right, and I've been asking people about it for a while, and I neglected to test it because I'm a dummy. When you look through the documentation, in a number of places you'll typically see sameAs for entities linking to Yelp or Facebook or Twitter. I noticed in your slide deck, when I read through it, that you used it the way that I had postulated being able to use it, where you could say that your entity is the same as cyberandy, same as WordLift, same as, et cetera. Have you tested that? Can you use any property you want to link all the different places where you are on the web as an entity back to you?
Andrea Volpini 21:23 Yeah, I mean, there are different ways in which these links can be used. So,
Noah Learner 21:31 Is that a dumb question? You know, this isn't my
Andrea Volpini 21:37 It's pretty essential in the SEO that we do. And there are, let's say, two layers. One layer is the loose definition that schema.org provides of a sameAs link. The loose definition is that sameAs can point to a URL, so a web page.
Andrea Volpini 22:05 And that is super powerful, because it allows a search engine to get more triples about that person, about that entity. The more we can confirm that this web page is about that entity, the more support we provide for getting the data from that page about that entity. And I see this happening, right.
Andrea Volpini 22:33 So if I can create a lot of links all together for the same entity, and the information across these different web pages is consistent, it would be easy to add or to see new statements getting into Bing or Google.
Andrea Volpini 22:52 Now there is another layer, the data layer, where definitions are more strict. You don't want to point an entity to a web page, you want to point an entity to another entity, which is a more strict version of the sameAs that we see in schema.org.
Andrea Volpini 23:14 In the ontology of Wikidata, this can be called exact match. Or now you can use the WordLift ID if you're publishing data using WordLift. Or we are using in our graph a property from the OWL ontology, which is actually owl:sameAs. Now these links are slightly different, because you can compute data with this link. If you provide a machine with a link to a web page, the machine will still need to either get the structured data from this web page or try to understand what the web page says. Whereas when I'm dealing with an entity on a graph and connecting this entity with the equivalent entity on another graph, I can make a federated query, and I can use the data for that entity on these two graphs to, let's say, validate a statement or to enrich a statement, right? So these are the two layers. On one layer, you are connecting an entity with web pages. On another layer, you're connecting one entity with another entity, and then you can compute the data that comes from these two entities.
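The entity-to-entity layer can be illustrated with two small in-memory "graphs" standing in for separate triple stores. This is a sketch, not how WordLift or any search engine does it: the URIs, property names, and the `merged_view` helper are assumptions, and a real setup would more likely issue a SPARQL query using the `SERVICE` keyword against remote endpoints.

```python
# Two tiny in-memory graphs standing in for separate triple stores.
# Every URI and property name here is made up for illustration.
site_graph = {
    "https://data.example.org/entity/andrea": {
        "name": "Andrea Volpini",
        "jobTitle": "CEO",
        "owl:sameAs": ["https://public.example.net/entity/Q1"],
    }
}
public_graph = {
    "https://public.example.net/entity/Q1": {
        "name": "Andrea Volpini",
        "countryOfBirth": "Italy",
    }
}


def merged_view(uri, local, remote):
    """Combine statements about an entity with those of its owl:sameAs equivalents."""
    own = local.get(uri, {})
    merged = {k: v for k, v in own.items() if k != "owl:sameAs"}
    conflicts = {}
    for same in own.get("owl:sameAs", []):
        for prop, value in remote.get(same, {}).items():
            if prop in merged and merged[prop] != value:
                conflicts[prop] = (merged[prop], value)  # the two graphs disagree
            else:
                merged.setdefault(prop, value)  # enrich, or confirm a match
    return merged, conflicts


view, disagreements = merged_view(
    "https://data.example.org/entity/andrea", site_graph, public_graph
)
print(view)
print(disagreements)
```

Matching values act as validation, new properties act as enrichment, and anything in `disagreements` is a statement the two graphs do not agree on. A plain link to a web page offers none of this without first parsing the page.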
Unknown Speaker 24:29 Wow.
Noah Learner 24:31 You're blowing my mind. I don't know how Jordan feels about it.
Jordan Choo 24:34 Yeah, no, I'm speechless right now.
Noah Learner 24:36 This is not stuff that we talk about with other SEOs all the time. So have you been able to tease out which layer drives ranking lifts more? Is it the stricter data version, where you're connecting entities with other entities, or is it connecting entities with web pages?
Andrea Volpini 25:02 I think there are so many different use cases that it's hard to tell you which strings should be pulled. And it's more complicated than that. Ranking is not impacted by a sameAs link, unless there are some cases where we are providing information that allows Google to better understand the query.
Andrea Volpini 25:35 One of the examples that I usually give is the synthetic query. If you go on the deck, there is a slide that talks about that. Up, up, around 37. So if you go to slide 36, yeah, this one. Here you basically see that I'm making a query, which is "the CEO of WordLift." So the query that I am using as a user is "the CEO of WordLift," and the results that we see on slide 36 are indeed a combination of the results for "CEO WordLift" with the results for "Andrea Volpini." This is a case where the search engine is creating a synthetic query. Given the query of the user, the search engine is creating a query that it thinks can be beneficial for creating a better result page. So rather than just displaying everything about the CEO of WordLift, it also adds pages from the query "Andrea Volpini." And this is happening because there's an entity behind that, and there is a relationship between the entity Andrea Volpini and the entity WordLift in Google's Knowledge Graph and in Wikidata and in every WordLift-created knowledge graph, which confirms that Andrea Volpini is the CEO of WordLift. In the patent that I discovered because of Bill Slawski, we could see that structured data can actually be used for creating these synthetic queries, much like in this case. So a user looks for "Andrea Volpini" or for "CEO WordLift," and Google is capable of interpreting this query and getting the results also about Andrea Volpini, creating a better experience for the end user but also helping us be found, right.
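A toy version of this query augmentation might look like the following. It is a sketch under heavy assumptions: the `ROLE_FACTS` table and the `synthetic_queries` function are invented, and nothing here reflects how Google actually generates synthetic queries. It only shows the idea that a known relationship between two entities lets a machine add a second query to the one the user typed.

```python
# Hypothetical facts a search engine might hold about entity relationships.
# The table layout is an assumption made for this illustration.
ROLE_FACTS = {
    ("ceo", "wordlift"): "Andrea Volpini",
}


def synthetic_queries(user_query):
    """Return the user's query plus any machine-generated queries derived from known facts."""
    words = set(user_query.lower().split())
    queries = [user_query]
    for (role, organization), person in ROLE_FACTS.items():
        if role in words and organization in words:
            queries.append(person)  # the machine-generated (synthetic) query
    return queries


print(synthetic_queries("CEO of WordLift"))
print(synthetic_queries("WordPress plugin"))
```

The result page would then blend results for every query in the returned list, which matches the behaviour described for the slide.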
Noah Learner 27:53 Okay, so I have seen "syntetic query" spelled a couple of different ways. Is that the proper way to spell it, in the main part on the left side of the slide? I thought I had seen it as "synthetic," like a fake query.
Andrea Volpini 28:08 No, that is correct, right. "Synthetic" is correct. It means that it's machine generated. It's part of this query augmentation process that Google now has, which allows Google to understand the query and to get better results than by using the query that the user has created.
Noah Learner 28:32 Can you feed synthetic queries to your knowledge graph?
Andrea Volpini 28:41 Well, potentially. If we had to use the knowledge graph for creating a semantic search engine, that's what we would do. We would expand the entities by taking a look at the entities around them in order to create more synonyms for that entity, which is, in this case, what is happening. So yeah, potentially yes. It's not what we do, because we're not building a semantic search engine with WordLift yet, but we just created
Noah Learner 29:19 Yet. Can you show us how your tool works? Is that okay, Jordan, that we jump in?
Jordan Choo 29:28 Yeah, of course,
Andrea Volpini 29:30 it would be beautiful if I could share the screen
Jordan Choo 29:35 Is it anywhere on the deck, no?
Unknown Speaker 29:41 And if not, maybe we can.
Noah Learner 29:44 We can log into a site or something like that you can help us log into something and we can check out
Andrea Volpini 30:18 So, I have a text, and with NLP I'm detecting entities over here. For each one of these entities I have some information that WordLift has either pulled from the user vocabulary or from Wikidata. I'm now going to use these images in order to make my content richer. I can move the images into the editor, but I can also decide to create links, or just use the entity for annotating the content.
Andrea Volpini 31:00 So it's like a tagging system, really. But behind these labels there are not tags, there are real-world entities that are interlinked with other entities in the Google Knowledge Graph and in other publicly available data sets.
Andrea Volpini 31:17 So when I do this, and I simply publish my article, I've already done a lot for my SEO, because I have disambiguated my text and created a lot of structured linked data. We can have a look at it using the Google Structured Data Testing Tool. In here, we'll see the data that WordLift automatically creates after the user has enriched the article. We're down to this.
Noah Learner 31:17 So it's, it's generating the data before the DOM is executed. Right.
Andrea Volpini 31:58 Well, this one depends on the way that we do it. Usually we tend to create the JSON-LD asynchronously, so it doesn't affect the payload of the page. Okay. But on some large websites everything gets cached, so we do it synchronously. So there are different ways of doing it. But we want to make sure, of course, that all this data doesn't slow down the rendering of the page.
Andrea Volpini 32:22 Now, here you see already that there is a major difference compared to the structured data that we usually see, which is that we have a unique ID, which is actually a URI. This is a unique identifier in the web of data. If I go there with a machine, I can get an RDF representation of this article. So the crawler can jump here, and it can read the JSON-LD that WordLift has created, but it can also jump beyond, to every entity, and get more data about it.
Andrea Volpini 32:59 So it's as if we would provide a kind of sitemap of the meanings of the site within the linked data infrastructure. The structured data here is also telling you that the main entity, the about property, is Tulsi Gabbard, the politician, blah, blah, blah. And then it's also using this attribute, which is called mentions, that says, hey, this is also mentioning United States custom unit and George W. Bush.
Andrea Volpini 33:39 And you can see that all this data not only has, of course, the description and the unique ID that points back to the knowledge graph that we are creating, but it also has all the sameAs links, which do not only go to web pages, though we could also create links to a web page, but also to data sets. You can see here that we have automatically interlinked the entity against the equivalent entity on DBpedia and Freebase and YAGO. We have different data sets that we can configure in order to improve the automatic interlinking. And there is a lot of this, but
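The shape of the markup being described, an article with an `about` entity, a `mentions` list, and `sameAs` links on each entity, can be sketched as below. The `data.example.org` URIs, the `entity` helper, and the exact property selection are illustrative assumptions, not WordLift's actual output. The DBpedia URIs follow DBpedia's usual resource naming and are included only as examples of dataset links.

```python
import json

GRAPH = "https://data.example.org/entity/"  # hypothetical knowledge graph base URI


def entity(slug, name, schema_type, same_as):
    """Describe an entity whose @id points back into the (hypothetical) graph."""
    return {
        "@id": GRAPH + slug,
        "@type": schema_type,
        "name": name,
        "sameAs": same_as,  # links to equivalent entities or pages elsewhere
    }


article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "@id": "https://data.example.org/post/sample-article",
    "about": [
        entity("tulsi_gabbard", "Tulsi Gabbard", "Person",
               ["http://dbpedia.org/resource/Tulsi_Gabbard"]),
    ],
    "mentions": [
        entity("george_w_bush", "George W. Bush", "Person",
               ["http://dbpedia.org/resource/George_W._Bush"]),
    ],
}

print(json.dumps(article, indent=2))
```

The `about` entry carries the main topic, `mentions` carries secondary entities, and each `@id` is the pointer a crawler could follow into the graph.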
Noah Learner 34:26 Can you show us one of those? Can you show us the? Oh, yeah. Okay.
Andrea Volpini 34:33 So that's the unique ID when seen from the browser. There is something happening in the back end of this triple store that is called content negotiation. I get there with the browser and I get HTML, but really, as a machine, I would get RDF, or Turtle, or N3, or JSON-LD, much like I do in Wikidata or DBpedia. So it's a data set.
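Content negotiation comes down to the `Accept` header the client sends: the same URI can answer with HTML for a browser and with RDF for a machine. The sketch below only builds the request with Python's standard library and does not send it. The entity URI and the `negotiated_request` helper are hypothetical, and whether a given server honours each media type is an assumption.

```python
from urllib.request import Request

# Common media types for the serializations mentioned in the conversation.
FORMATS = {
    "html": "text/html",
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
    "rdfxml": "application/rdf+xml",
}


def negotiated_request(entity_uri, fmt):
    """Build a request for one entity URI, asking for a specific serialization."""
    return Request(entity_uri, headers={"Accept": FORMATS[fmt]})


# Hypothetical entity URI; a browser would ask the same URI for text/html.
req = negotiated_request("https://data.example.org/entity/democratic_party", "jsonld")
print(req.full_url, req.get_header("Accept"))
# urllib.request.urlopen(req) would actually send it; omitted so the sketch runs offline.
```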
Noah Learner 35:00 Can we see it in JSON-LD, just because I'm super familiar with that structure?
Andrea Volpini 35:04 Sure. So this is something that, let's see if I can get it this way.
Andrea Volpini 35:17 So that's a little bit of the representation of the entity in JSON-LD from the graph, which is, of course, just related to the Democratic Party. This is on the graph. But then we saw that if I go here, to the Democratic Party, I have pretty much the same properties, but not necessarily the same, right? So we can play with these two levels: one is on site and the other one is off site.
Andrea Volpini 35:47 And so if I go and see the Democratic Party over here, I have this information, which is encapsulated within the web page itself, but it's also available as linked data off site, on our triple store. So we have a lot of flexibility.
Jordan Choo 36:09 How do you determine whether the entity should be referenced on site or off site, then?
Andrea Volpini 36:15 The entity is always referenced on the knowledge graph, and the JSON-LD always points to the knowledge graph. Here you could have an ID which is just non-existent, but in our case the ID is actually a linked data resource. You could create whatever string here, but rather than doing that we use it as linked data, because JSON-LD uses the same linked data infrastructure. So we provide a pointer back to the graph, and then in the graph we can have the same or different information, depending on the way that we configure things.
Noah Learner 36:58 You're blowing both of our minds. So have you seen any large-scale e-commerce websites successfully utilize this technology? And have you seen them crushing the competition because of it?
Andrea Volpini 37:12 Um, I'm not too familiar personally with e-commerce, because we put a lot of work into the news and media sector
Noah Learner 37:21 That's what I'm gathering
Andrea Volpini 37:23 and the travel sector.
Noah Learner 37:25 we have to kind of
Andrea Volpini 37:28 start working more at scale with e-commerce. We have a couple of projects that we are working on. One is particularly interesting: we're working with a framework for building e-commerce websites, and they are integrating WordLift within their stack, but it's still under development. So I cannot give you much information about results in that specific area. But of course, we have pretty impressive results in news and media and in travel.
Noah Learner 38:02 So I'm seeing that all of these unique IDs reference your knowledge graph. Are you consuming all the open data that you can get in order to build yours? Or is it sort of building it on the fly? Can you tease that out a little bit?
Andrea Volpini 38:24 Yeah, when we create an article
Andrea Volpini 38:29 I am detecting an entity like Washington. In this case, for instance, it's not the right Washington, so the disambiguation is not properly working here. I would need to change this and point it to the place. But when I select the entity and I go and save, then this is automatically going to create the entity in the knowledge graph.
Andrea Volpini 38:56 So look, here I am getting it wrong, because the New York Times, actually, the NLP is not properly finding it. So I'm going to go like this. I want to create this entity, and then I'm going to ask WordLift to think again. It's going to give me the list, and then I say, yeah, I want to talk about the paper, and then the entity has been created. So the article is now annotated in the knowledge graph, but the entity is also encapsulated within the knowledge graph.
Jordan Choo 39:31 So how do you create entities from scratch, then?
Andrea Volpini 39:35 Like this. I just created the entity for the New York Times, but I can also create an entity for something that doesn't exist, right? So, I don't know, like this one. Yeah, I create it, and I will look first at the different datasets that we have in order to see if there is something already, and if there isn't, I'm just going to create it from scratch. And then I'm going to say, okay, this is probably a person. So it's a "who", and then, you know, this is the entity type. And then I'm going to say, you know, yada, yada, yada, whatever. And I'm just going to create this. And now, you know, I'm bringing this into the Linked Data Cloud.
Jordan Choo 40:24 Well, so then that's saved to the WordLift knowledge graph, so that when this article is published, the JSON-LD will have the entity that you just created with a reference in the Knowledge Graph.
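As a rough illustration of what Jordan describes here, the sketch below builds the kind of JSON-LD an annotated article might carry: a schema.org `Article` whose `mentions` property points at an entity by its URI in the knowledge graph. The URIs, the `build_article_jsonld` helper, and the choice of `mentions` are assumptions made for illustration, not WordLift's actual output.

```python
import json

def build_article_jsonld(headline, article_url, entities):
    """Return a schema.org Article as a JSON-LD dict.

    `entities` is a list of (entity_uri, schema_type, name) tuples; every
    URI used below is a made-up placeholder, not a real knowledge-graph ID.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "@id": article_url,
        "headline": headline,
        # Each mention carries the entity's own URI, so the article and the
        # entity page resolve to the same node in the graph.
        "mentions": [
            {"@id": uri, "@type": schema_type, "name": name}
            for uri, schema_type, name in entities
        ],
    }

doc = build_article_jsonld(
    "An example article",
    "https://example.org/post/example-article",
    [("https://data.example.org/entity/the_new_york_times",
      "Organization", "The New York Times")],
)
print(json.dumps(doc, indent=2))
```

The point of the sketch is only that the entity is referenced by a stable identifier rather than by a plain string, which is what lets an article and an entity page be linked as data.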
Andrea Volpini 40:36 Then as a user, I can come back to the vocabulary, where I find all the entities, and I can, you know, kind of get back to the entities that I've created and curate them, and I can decide I want to index these or I don't want to index these. But let's move to a kind of real-world use case. Of course, it gets a lot simpler to see on a real website rather than our test site. So let's go to thenextweb.com.
Andrea Volpini 41:09 So we have articles here. This is a large site. So let me get an article that has been updated already, because that's fairly new. So I'm just opening the site. So what you see here are not tags, though they look like tags; they are actually entities that WordLift has created for structuring the content within this section. So this is an entity page, which looks pretty much like a tag page, but it has its own equivalent data point in data.thenextweb.com. And if we go and look at the JSON-LD with our
Andrea Volpini 42:10 sniffer was that
Noah Learner 42:11 In what? OpenLink Structured Data Sniffer? There's a tool to share.
Andrea Volpini 42:17 Yeah, yeah, that's a great tool that allows you to look at the data behind the page. And sorry about that. Yeah. So you see that here, you know, this is not just a page, but it's an entity. And that's the unique URI for that entity within the knowledge graph of The Next Web.
Noah Learner 42:51 So this is amazing. So let's say someone doesn't have WordLift. What are the takeaways for someone who's not on the WordPress platform? Like, how could someone get started using this technology, and how long does it take to build this stack?
Andrea Volpini 43:16 So the stack is not easy to build, but it's all open, meaning that it's all infrastructure that follows the W3C recommendation for creating a Linked Data Platform and, you know, an RDF-based Knowledge Graph.
Andrea Volpini 43:39 And of course, it uses a content analysis API that can, you know, process content and do named entity recognition and entity disambiguation. And, of course, a lot of other features that can be built.
Andrea Volpini 44:18 But let me show you a little bit of why we want to use entities in the first place. So we saw that, okay, we can, of course, annotate the article, create the JSON-LD, get the images, and create these kind of content hubs. So these entity pages act as content hubs, but we can also start providing widgets that allow you to find out more about the site. So I can leverage the graph and create a navigational widget that helps the user stay longer on the page. So
Noah Learner 45:06 Oh, so it'll build a word. Okay.
Andrea Volpini 45:09 So here you see that I have added a widget on the page that is created from the relationships between the concepts within my graph.
Jordan Choo 45:28 Yeah. Wow. And this is all the entities that are found in this article, then.
Andrea Volpini 45:35 That's correct. So there are a lot of these use cases and widgets that we can play with, and all of what they do is, you know, help the user understand the context of the page. So that's an article. So this is another widget where we have, you know, the list of entities that are related to go, but for instance, here, I go to "long-form article", and then we create this card that helps you to understand what we mean by long-form articles.
Jordan Choo 46:08 Wow, and then that's pulled from the WordLift knowledge graph that you're wanting
Andrea Volpini 46:14 and then I also have the page that brings me you know to the definition of it and then you know I can start you know, linking and moving from from from one concept to another, using like
Jordan Choo 46:32 Jees.
Noah Learner 46:33 Jordan you look like someone just punched you in the kidneys.
Jordan Choo 46:39 I feel I feel so overwhelmed in the best way possible right now
Andrea Volpini 46:45 against Oh, be silent.
Andrea Volpini 47:05 The reality is that, just with the plugin, we're bringing these widgets here that you see, which is, you know, the content recommendation; we're bringing it in the cloud version. But we don't yet have these other tools. But these are also easy to build, because they're built on the JSON-LD. So the cloud will bring the JSON-LD, and then these can be done on the front end. So of course it's very easy to install the WordLift cloud, but we are still working on all the navigation components.
Andrea Volpini 47:39 But let me give you another use case of why you need to create your graph and why it helps SEO. You see now that I am connecting articles in kind of an effective way. And so the user usually stays longer when we add WordLift into the picture. We have this navigational widget that kind of extends the session time and the pages per session, and so on and so forth. But we also now have data that we can use for understanding more about the behavior on our site. And if we are a publisher, we can also sell the advertising better. So this is the knowledge graph Data Studio template that we have created using the data in the graph. So here we see the traffic from Google Analytics, not in terms of pages, but in terms of entities. So I can see, for instance, for a concept like linked data, how many articles we have on the website and what traffic this concept is generating. So here I can see that I have 32 articles, and traffic is going well, but that's the overall pattern. And then maybe I can look at machine learning and see, okay, how is traffic going? So I'm not looking at pages, but I'm looking at the concepts within the pages. And so I can get a different understanding of my content. I can also use these entities in order to understand, for instance, the users behind this concept. So here I can see that, for instance, for structured data, traffic is coming... Let's see if
Noah Learner 49:38 I think of your entity titles as topics like just big topics, and okay.
Jordan Choo 49:48 So is this data being fed into GA then through, like, JS events or something?
Andrea Volpini 49:53 Right, right. So we get the data from the page into Google Analytics, and then here we're just, you know, creating a dashboard that gets the data from Google Analytics and allows us to kind of understand it and get it across. So you see that for that concept, you know, 73% of the traffic is from organic. But maybe, you know, I'm sure that on linked data I'm better because there's less competition. So, well, maybe not, but we can see that. So I can see on what concept I'm better. Yeah, I'm slightly better at least. And so
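The entity-level view described here can be approximated as a simple aggregation: each page carries a list of entities, and page-level traffic is rolled up per entity. Below is a minimal sketch with made-up rows; the field layout and the numbers are hypothetical, and in the setup discussed the data would come from Google Analytics rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical page-level rows:
# (url, entities on the page, sessions, organic sessions)
rows = [
    ("/blog/a", ["linked data", "structured data"], 100, 80),
    ("/blog/b", ["linked data"], 50, 30),
    ("/blog/c", ["machine learning"], 40, 10),
]

def traffic_by_entity(rows):
    """Roll page-level sessions up to the entities mentioned on each page."""
    totals = defaultdict(lambda: {"articles": 0, "sessions": 0, "organic": 0})
    for _url, entities, sessions, organic in rows:
        for entity in entities:
            t = totals[entity]
            t["articles"] += 1
            t["sessions"] += sessions
            t["organic"] += organic
    # Add the organic share so concepts can be compared directly.
    return {
        entity: {
            **t,
            "organic_share": t["organic"] / t["sessions"] if t["sessions"] else 0.0,
        }
        for entity, t in totals.items()
    }

report = traffic_by_entity(rows)
for entity, stats in sorted(report.items()):
    print(entity, stats)
```

Note that a page mentioning two entities counts toward both, so the per-entity totals are not meant to add up to the site total.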
Noah Learner 50:30 Let's pretend you're a content marketer for a second. Yeah. How would you use this report to drive performance? How could you use this as a tool?
Andrea Volpini 50:44 Yeah. So one simple way is that, you know, we can look at the concepts where we are better in order to understand if we need more content or if we need to revamp existing content to keep on improving.
Andrea Volpini 51:03 And I might, you know, remove content that is not relevant for my audience, because I see that it's not performing well. In other cases, for instance in the travel sector, I can see that some events are male focused and some events are female focused. And so I can distill this information to the editorial team to say, hey, you're talking to a person within this age range who is interested in, let's say, dining or technology. And that's different from, for instance, the person that you can reach when you talk about the Google Knowledge Graph. Now, it doesn't make much sense on our site because it's kind of too narrow, but if you think of it on a broader site, it makes a lot of sense.
Andrea Volpini 51:52 The other way is that, as a publisher, I can go to the brand and I can say, hey, I have this amount of traffic for you. Much like, you know, Facebook or Google do when they sell us the topics, right? So you can see the breakdown here is slightly different. So this is another use case: I can use this in order to go to the advertisers and say, hey, I have these concepts, and I have these patterns of traffic. Do you want to do something with me? So that's the other way.
Noah Learner 52:29 Got it?
Andrea Volpini 52:32 So yes, we've seen we've seen a lot I mean, and I hope you like it.
Noah Learner 52:38 By presenting structured content for every piece of content on a website, it helps Google understand all the entities on that website. And because it understands all the entities, it understands how to present it better as answers to search queries. That's the high level, right?
Andrea Volpini 53:01 That's one of the use cases. The other use case is that I'm going to keep the user longer, possibly longer, on the site by providing contextual information. Right? So by preventing the user from jumping off the site in order to understand who is this guy or what they're talking about, and so on and so forth.
Noah Learner 53:22 and tons of interlinking too right, I mean, live,
Andrea Volpini 53:25 And then that, of course. And then, of course, the other thing is that I can, you know, create my vocabulary, but then I can also ask WordLift to automatically create all the links on the archive. And so I can create, you know, programmatically, a lot of backlinks to these entity pages, and force, you know, a convergence on pages that talk about a specific topic. Like Libra, yeah.
Noah Learner 53:57 So, everybody's talking about BERT right now, it seems like, in forums, and even when I talk to SEOs, they ask me questions about it. How has Google's new announcement changed anything in how you're working or how you're thinking about your tool?
Andrea Volpini 54:19 So let me give you a practical example here.
Andrea Volpini 54:23 So this is just a test that I'm about to publish. I haven't published it yet, but hopefully it will go live on either our blog or maybe, you know, some SEO outlet that is interested in having a look at it. So this is a Google Colab notebook, and I'm using BERT for creating meta descriptions from the WordLift.io website. Cool. This is a very basic use case. I've been working for quite some time in the area of automatic text summarization; this implementation specifically is really a demo. So we're starting here, and we're loading a bunch of libraries that are used by this notebook. Particularly, we are going to use Transformers. This is a very powerful library that allows practitioners to use these new fancy language models like BERT. But there are a lot more things that can be done with Transformers. And we're also going to use spaCy, which is an open-source NLP library that I also use a lot.
Andrea Volpini 55:52 And then, once everything gets loaded into the Google Cloud, we can start running our script. Here, we're going to get data from a crawl that I made of the website. So I have crawled the website using WooRank, because WordLift is a partner of WooRank. And so I have put the crawl data as a CSV on Google Docs, so I can play with it. And now I'm going to create a dataset using the crawl data. And you can see here that there are all the different files that have been crawled. And you can immediately see that some of these files don't deserve to be processed; you know, there is the robots.txt. Of course, in the crawl I get all the URLs that have been crawled. So here I'm going to kind of play the SEO. So I automate the process of SEO by saying: hey, give me everything that is a page, because I don't need to create a meta description for me or for the robots.txt file. Give me everything where the meta description is absent, so there's no meta description. Give me all the URLs from my blog that have a position that is, you know, less than 15.
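The filtering rules listed here (pages only, no existing meta description, blog URLs, position below 15) can be sketched with the standard library. The CSV layout and column names below are assumptions for illustration; a real crawler export will name its columns differently, and the notebook being demonstrated may well do this another way.

```python
import csv
import io

# Hypothetical crawl export; a real crawler's CSV will use other column names.
CRAWL_CSV = """url,content_type,meta_description,position
https://example.org/robots.txt,text/plain,,
https://example.org/blog/en/post-1,text/html,,7.2
https://example.org/blog/en/post-2,text/html,Already has one,3.0
https://example.org/blog/en/post-3,text/html,,42.0
https://example.org/img/logo.png,image/png,,
"""

def select_urls(csv_text, blog_prefix, max_position=15.0):
    """Keep HTML pages under `blog_prefix` that lack a meta description
    and rank at an average position below `max_position`."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["content_type"].startswith("text/html"):
            continue  # skip robots.txt, images, and other non-pages
        if row["meta_description"].strip():
            continue  # a description already exists
        if not row["url"].startswith(blog_prefix):
            continue  # only the blog section is in scope
        if not row["position"] or float(row["position"]) >= max_position:
            continue  # no ranking data, or ranking too low
        selected.append(row["url"])
    return selected

urls = select_urls(CRAWL_CSV, "https://example.org/blog/")
print(urls)
```

Narrowing the list before any model runs is what keeps the later, expensive summarization step limited to the URLs that matter.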
Jordan Choo 57:16 And are you pulling that data from WooRank or from GSC?
Andrea Volpini 57:21 I'm using WooRank for creating a CSV of the crawl. So WooRank crawls the website and generates a CSV for me, I upload the CSV to Google Docs, and then I'm running the script.
Noah Learner 57:36 WooRank is getting position data through the Search Console API.
Andrea Volpini 57:40 Yeah, also. Exactly. So that's the data from the Google Search Console. So it's very nice and easy. I mean, you can do the same with, of course, Screaming Frog or whatever crawler you want to use. So we have 25 URLs that match these parameters, after filtering out everything that is not a page, keeping what has a decent position, and so on and so forth. And I'm just going to run a very simple function here that will crawl the site and get the text out of the web page. And here is where the magic happens, because I'm going to invoke BERT in order to create a summary. And I'm using a very simple way here. I'm testing, of course, a lot of different parameters. There is a lot of testing going on these days on this. Yeah, I want to generate a nice description of the pages that I can potentially use for improving, for instance, the context card that I showed you here. Sure. So I want to use BERT so that, rather than getting the first line from the web page, I create a summary that fits with this card.
Unknown Speaker 59:08 Okay,
Andrea Volpini 59:10 and so now it's running.
Noah Learner 59:12 This looks like expensive processing, like we're still on one page, right?
Andrea Volpini 59:17 Yeah, it's starting. I mean, it's expensive. And it's a very important point that you highlighted. Yeah. It's not only expensive; it might also become unsustainable from the environmental point of view. Yes. Because if I have to do it on large sites, I might consume a lot of resources. But start from the point that, in the beginning, I said I'm just going to process the crucial URLs. I'm just going to work on the English variation of my blog and the pages that don't have a meta description, that maybe have a good position, or whatever. It's going to run, it's going to take some time, and it's going to kind of speed up as things get into memory. So the first one is done now; it's going to continue and build the others. I think we have 25 in this example. So it will take a few minutes before generating the results, but you can see that the results are quite good, you know. So, according to Wikipedia, fact checking is the act of checking factual assertions in non-fictional text in order to determine the veracity and correctness of the factual statements in the text.
Andrea Volpini 1:00:34 And it's probably too long to be, you know, a good meta description, but the quality of it is quite surprising.
Andrea Volpini 1:00:44 Now, this is extractive summarization. So I'm not creating new text; I'm reusing the text in the article. So BERT is using the existing paragraphs that I have in my article. So, looking at HTML, for instance: what about HTML? "HTML is the main markup language for creating web pages and other information that can be displayed in the browser." Wow, super, this is good, ready to go.
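To make the idea of extractive summarization concrete, here is a minimal sketch that picks existing sentences instead of writing new ones. It deliberately swaps BERT for a much simpler scorer (average word frequency) so that it runs with the standard library alone; the `extractive_summary` function and the sample text are assumptions for illustration, not the notebook's code.

```python
import re
from collections import Counter

def extractive_summary(text, max_sentences=1):
    """Return the highest-scoring sentences from `text`, in original order.

    Each sentence is scored by the average document frequency of its words.
    This is a simple stand-in for a BERT-based sentence scorer.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)

sample = (
    "HTML is the main markup language for creating web pages. "
    "Browsers read HTML and render it for users. "
    "This sentence is unrelated filler."
)
print(extractive_summary(sample))
```

Whatever the scorer, the defining property is the same: every sentence in the output appears verbatim in the input, which is why the result can read well but may run longer than a meta description should.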
Unknown Speaker 1:01:11 Of course, if I would
Andrea Volpini 1:01:15 use GPU power or even TPU, I could use an abstractive model. So I could generate new sentences rather than using the sentences that I have on the page. But this would be even more resource consuming. And so I don't want to apply it here, but we could do that also.
Noah Learner 1:01:33 So how do we make this sustainable?
Andrea Volpini 1:01:38 Well, first of all, we have to make a decision, like: I'm not going to use abstractive summarization for this content. Yeah, I use extractive because it's going to be, you know, less resource consuming.
Noah Learner 1:01:54 I'm just trying to imagine a website like the New York Times, trying to use Yeah, and I'm sure they are. But I gotta think another thing is that you have
Andrea Volpini 1:02:05 to focus on what really matters, right? Yep. So you have to focus on things that that make, you know, the site different today. So you're not going to process everything that it's not, you know, from from the last, now, you can see pretty much the the time it takes. So it cannot be really done in real time, but it can be done in near real time, which means that as the user preloads the page, you might trigger this, and then cash the results and prevent these from happening again. Right. So there's a lot of you know, engineering that that needs to take place in order for these to kind of scale and use in real world environment but so you have to take decision about what type of you know, usage of the model you want to make and you have to be cost effective and efficient. And then you have to, you know, really decide where you want to apply this technology because you cannot do it, you know, everywhere because it's just going to be pointless and consume a lot of resources. And then and then you have to do a lot of engineering.
Andrea Volpini 1:03:31 Um, I don't do code.
Andrea Volpini 1:03:38 I gave up writing code for production systems a long time ago. Yeah. And my close friend, best friend and CTO is doing the heavy lifting, while I'm just left playing around with things and understanding how things could work. So I can experiment with a lot of technologies. I have ideas, but I don't commit code that will go into production. So this one, now, I'm playing with this, and I'm seeing, you know, the potential of it. I might create the model at the end, but then this is going to be engineered by a proper development team. But I tell you, I mean, my background is that I dropped out of school and I started to build websites in '95. So I mean, you can imagine.
Noah Learner 1:04:28 Yeah, yeah. And I took my first coding class in 98, or nine. And it was literally a class that taught us how to link three pages together insert images and paragraphs and h1 tags.
Andrea Volpini 1:04:45 So you had enough experience, just like I have. So that's pretty much the background. So we've done this, and, you know, we can store the data and we can see the results, nice and clean. And then, you know, I can just save the file and then import it into the CMS or, you know, clean it, revise it. There is also an important aspect here: we don't expect an algorithm like this to be a hundred percent reliable. So we always want to kind of look at it. Sure.
Unknown Speaker 1:05:19 Very cool.
Jordan Choo 1:05:21 So we've, we've hit that hour mark, if not past it, Andrea, and, you know, we do want to be respectful of your time.
Unknown Speaker 1:05:28 Yeah.
Jordan Choo 1:05:30 So one last one last question. How can people get in touch with you
Andrea Volpini 1:05:36 LinkedIn and Twitter.
Andrea Volpini 1:05:39 And I'm always open to conferences, and I'm not doing too many this year, because I want to stay focused on the product. We really need to put a lot of effort into bringing this technology outside of WordPress, in the simplest possible way. Of course, a lot of the work goes into, you know, kind of large-scale enterprise projects that we are now starting to follow. But I'm always happy to meet new people and also understand new approaches. There is so much going on.
Andrea Volpini 1:06:11 And what I see, and this will be my closing statement, is that right now, for me, deep learning, machine learning, knowledge graphs, and AI in general are as interesting and as exciting as the web was back in 1995 when I started.
Andrea Volpini 1:06:32 So it's like we are building a plane while flying it. And there is so much going on, and there are so many things changing, that I am, you know, as excited as I was 20 years ago. And of course, I'm older. But the feeling that I have is exactly the same.
Andrea Volpini 1:06:53 There's something new happening. People don't understand it, much like they didn't understand the web back in '95. And we have to jump on. We don't know where it's heading. We know that it's powerful. We know that we must have a lot of ethical concern, because we also made a lot of mistakes with the web as a whole, and we saw it in privacy breaches and data ownership. So these are also kind of founding principles for the work that I do and that I want to keep on doing with WordLift: we want to make sure that we get this right. So we don't want to, you know, kind of burn the planet because we want to write meta descriptions to entice more clicks, right? So think about it. So we have to be smart and be sustainable, but we also need to understand something that is clearly not yet defined.
Noah Learner 1:07:49 This presentation, or hangout, blew my mind the most of any of the ones that we've had. There's a lot in here for people, and I know they're going to be really excited about it when they watch it.
Unknown Speaker 1:08:03 That's good. That's good when it's gonna go live
Noah Learner 1:08:06 within a week. Okay? how long the transcription takes. I mean, that's usually
Andrea Volpini 1:08:12 Sorry about
Noah Learner 1:08:14 that. I, I've got a ton to learn. Um, if after the fact there there's any links or decks that you want to share, can you email those to Jordan so that we can include them because it's inside the presentation? Okay.
Unknown Speaker 1:08:29 Okay, awesome.
Noah Learner 1:08:30 Awesome. So everybody, we really appreciate you joining us today. And we're looking forward to our next episode, which is coming up in another two weeks or so. Between now and then keep automating. Keep it up. We look forward to hearing your successes. Thanks so much, everybody.
Jordan Choo 1:08:50 Take care everyone.