Jose Hernando, Technical SEO Builtvisible
Alvaro Fernandez, Lead Developer Builtvisible
Join us as we hang out with Jose and Alvaro, the duo from Builtvisible, as we dive deep into their indexation checker tool.
Noah Learner: Welcome, everybody, to another edition of Agency Automators. I'm your host, Noah Learner, here with my co-host and great friend Jordan Choo. How's it going today?
Jordan Choo: Yo yo yo, doing well. Excited for our guests today.
Noah Learner: Today is super exciting because we're here with some really amazing tech SEO guys, Jose Hernando and Alvaro Fernandez. Did I pronounce your names right, guys?
Alvaro Fernandez: Perfect.
Noah Learner: All the way from wintry London. Is it foggy?
Jose Hernando: Not foggy, just cloudy. Just the regular weather in London.
Noah Learner: Yeah, nice.
They're from BuiltVisible, an award-winning, top-ranked, amazing SEO firm in England that works with some of the biggest brands in the world. They've just built and released a brand new Node-based tool that we're really excited about because it'll help us get a sense of what on our websites is indexed and what is not. It's automated, it's really amazing, and you can schedule it. They're going to walk through the tool with us and give us a lot of insight into how they approach automation at BuiltVisible, and then they're going to walk through a bunch of use cases. At the end, we'll even share our feedback on what we think is cool about the tool, plus some bonus methods you can use to visualize the data on the tail end. Guys, we're so stoked to have you. When we read the article, what I loved about it was how simple the execution was. You know, it's like npm install, npm run, you know, it's like
Jose Hernando: yeah
Noah Learner: We just have a list of URLs and it runs. And when you watch it run, it's super cool. I plugged it into my money site and I saw a bunch of stuff that wasn't indexed, which really shocked me. You've probably had that same experience one of the first times you ran it on one of your sites, right?
Jose Hernando: Yeah, definitely. It definitely opened my eyes to what areas of a site were indexed, and in some cases it actually made the case to talk to the implementer to address those issues. So in reality, it is not something that was created just for the sake of it. It is something that we actually use internally.
Noah Learner: Yeah
Alvaro Fernandez: You have loads, like 30,000 URLs. That's when we realized, like, level
Noah Learner: I noticed that as the tool ran, number one, it takes a while, and number two, it does multiple passes through the data. I'm hoping you can walk us through that when we actually get in and look at the code. But for those of us who don't know about BuiltVisible, can you tell us a little bit about your firm, what your specialties are, and what you love about working there?
Jose Hernando: Yeah, so BuiltVisible is a specialist SEO agency, and we specialize in technical SEO, content, and analytics. We recently celebrated our 10-year anniversary, so that was definitely a milestone. And I think generally people know us for our technical capabilities, but content and analytics are also doing really well.
Alvaro Fernandez: But recently we've been investing a lot of time in automation and in running, you know, applications internally to give us an advantage, with things like Python and Apps Script. It is really good because there is real innovation here, so we use those kinds of latest technologies for whatever we need.
Noah Learner: What's it like being embedded in an SEO team as a developer?
Alvaro Fernandez: It is the same thing. I mean, obviously, as a developer I've mostly been doing JavaScript for a few years: interactives and calculators, and also web apps, small concept apps, and obviously content on the website. But recently I've been immersing myself in all this optimization, which is quite exciting, because full stack apps can take on many of the things that these guys do. There are so many areas in which we have made full stack apps with React and Node, and it works brilliantly. It offers a front end that they can use as non-technical people, while the back end handles all the sort of things they would otherwise do in Excel, like, you know, marketing stuff. So it's a good experience to build these internal full stack apps.
Noah Learner: And as you're working together as a team, how would you describe your workflow? Are you in an agile environment? How would you describe the process?
Alvaro Fernandez: It is a different hand, because obviously the way it is with these guys is that they are kind of steering the innovation and having the ideas. And then basically I just see what the environments are, we put together an idea, and the necessary security and all those other environment things are a bit more of my focus. Because once the goal is defined, I pretty much take it from there, and we'll use the same full stack apps with React. They all kind of follow the same pattern with some differences, but there is pretty much no need to watch the technical SEO because it is clearly defined. It is different when it comes to content, where what engages is a bit more creative, but technical SEO provides a very clear path.
Noah Learner: And yet when you're coming up with solutions, like what we're about to discuss, there's like a huge amount of creativity in that.
Alvaro Fernandez: Yeah, I come from front-end web development.
Noah Learner: Yeah
Jose Hernando: I think in general it is more about understanding what the pain points are for us as technical SEOs and for the whole team. We are really aware of when something is really painful and when you have to do something many times. So we just have a pipeline of ideas, or things we spend too much time on, and then it is a matter of talking to Al to see which ones we can make happen, depending first on how painful it is and secondly on how feasible it is.
Jordan Choo: With your prioritization matrix, when you're prioritizing which things to automate and which things to hold off on, I'd love to hear a bit more about that.
Unknown Speaker: Yeah. So I think in general it depends on the situation, and for this one specifically
Jose Hernando: for this decision we had a specific case. We saw weird hits in the logs and we were wondering, like, okay, so this is definitely within crawl budget, but is Google just hitting it, or is Google indexing it? So for us, it was like, we need to know, because this is a massive waste of resources and we will need to address it. So for that one specifically, both Al and I talked about it, and he started just, you know, doing his own thing, creating it his own way.
Alvaro Fernandez: This tool in particular, you know, it was an idea. It wasn't like we decided to do this from beginning to end. I thought what he was asking for at the beginning was going to be so easy: just get the screen and compare it. And it became really resource hungry. We started with the site operator instead of what we actually do in the application now, because the site operator is already doing the check, so I saw it as a simple thing. I believed that it was easy, and then things started getting more and more complicated, escalating, until sooner or later we had this solid tool.
Jose Hernando: Yeah, very solid.
Alvaro Fernandez: So at first it wasn't really touching the problems that were occurring. And every time we fixed one, it became more like a mission, like handling weird URLs, you know, and we had to do this the illegal way. But we did it anyway. So it was like narrowing down until we couldn't find a URL that we couldn't check. Yeah.
Noah Learner: Cool. How long did it take to build the tool?
Alvaro Fernandez: I mean, obviously the tool was not a priority as such, because there is still work, so whenever we had some time we would do it, and we enjoyed it. So it's hard to say, but maybe six months of weekends, that's for sure. It would certainly take some time if we focused on it more intensively. But I mean, think of it like this: the promise kept breaking. We would find out what was wrong, and then suddenly after a few weeks it was not correct again, and then we revised that, so
Jose Hernando: Maybe if we put it all together, maybe 30 days in total, 45 days of actual work. In reality we've been on and off. To Al's point, sometimes it was good enough for the specific cases I was working on, and then I would run into different clients, and therefore we had to evolve. We had to create new ways of tackling those issues. Yeah.
Alvaro Fernandez: I mean, one of the biggest problems was encoding. English is the easiest language you have, and it's easy. It's just the way it is. When it comes to Spanish or French, you run into these characters that Google has its way to interpret, and the clients have their way too. So there was this big problem with parsing. You have these incorrectly parsed URLs, and we needed to know whether Google was doing anything about it. I was like, all right. So it sometimes happens that you have these kind of badly encoded URLs, and we had to get them nice and clean. What we discovered about how Google handles them was a really big moment for us.
Noah Learner: So if I'm hearing you correctly, the major challenge or problem that you were trying to solve was getting your head wrapped around what was getting indexed, and how much crawl budget, if any, was getting wasted on parameterized URLs that you didn't necessarily want to get indexed. And you wanted to know whether they were just getting hit or getting indexed. Is that right?
Jose Hernando: Yeah. So I think that was the initial challenge, and that was how the initial idea became something. In this case it was case sensitive: they had lowercase URLs as the canonical ones, and then we saw that Google was going to uppercase URLs, or URLs that contained capitals. That was super weird, and I actually tried some site operators. They didn't give you everything that was indexed, and with the parameterized URLs we could do nothing with the site operators.
We didn't know that the site operator would basically ignore everything after the parameter, which leaves basically solely the domain. And obviously you're going to have that indexed. So we just stopped using site operators and basically just compared the URL against the actual source code of the results from Google.
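As a rough illustration of the case-sensitivity problem Jose describes, the sketch below flags logged URLs that differ from a known canonical URL only by letter case. The function name `findCaseVariants` and the example URLs are hypothetical and are not taken from the BuiltVisible tool.

```javascript
// Hypothetical helper (not from the BuiltVisible tool): given the canonical
// URLs of a site and the URLs seen in server logs, list the logged URLs that
// match a canonical URL only when letter case is ignored.
function findCaseVariants(canonicalUrls, loggedUrls) {
  const canonicalSet = new Set(canonicalUrls);
  // Index canonical URLs by their lowercased form for case-insensitive lookup.
  const byLowercase = new Map(canonicalUrls.map((u) => [u.toLowerCase(), u]));

  return loggedUrls
    .filter((u) => !canonicalSet.has(u) && byLowercase.has(u.toLowerCase()))
    .map((u) => ({ logged: u, canonical: byLowercase.get(u.toLowerCase()) }));
}

const variants = findCaseVariants(
  ['https://example.com/shoes/red'],
  [
    'https://example.com/Shoes/Red', // differs only by case
    'https://example.com/shoes/red', // exact canonical, ignored
    'https://example.com/other',     // unrelated, ignored
  ]
);
console.log(variants);
```

The flagged URLs are the ones worth running through an indexation check, since they are the candidates for wasted crawl budget.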
Noah Learner: Can you take us through the thought process of how you defined your strategy of how you wanted to solve the problem, and how that led you down the road of picking a different technology to do it.
Jose Hernando: Well, it was at that moment, when we realized site operators were letting us down. Then it was like, right, we can't rely on this anymore, we have to come up with a different way. I was trying to go to the root of the thing, so I said, let's look at the source from Google. Because at the end of the day I'm making a request, an HTTP request, and it's easy to basically just find a match. The idea, to be honest, got pointed out to us: if you just Google the keyword
Alvaro Fernandez: and you get a result in the document, you've got it. So then there's no need to do anything else: just go for the actual source code and try to match it. Now, the challenge then became how you match the link, because the URL that you are checking gets parsed by Google. It was about how Google deals with certain weird URLs. It was basically just trying to find the way to mimic Google. Yeah, it's basically just dealing with a string through encodeURI, which is a feature of JavaScript. This was another breakthrough, once we realized that encoding the URI gave us a closer match. And then very late, just before the post, we discovered one more thing that was needed in order to send it in the URL, and that's when it started working. The last thing really that was left was the unpopulated SERP, which was also challenging. It was a matter of finding, okay, so this is not working, what am I actually getting? And then manually checking the URLs in the results. All right, so I put in this and I'm actually getting this other thing, so I need to match it.
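A minimal sketch of the matching idea Alvaro describes, assuming the check is a plain substring search of the results page source for the URL in its raw, `encodeURI`-encoded, and `decodeURI`-decoded forms. The helper names and the sample HTML are hypothetical; the real tool's matching rules are not shown in the transcript and are likely more involved.

```javascript
// Hypothetical sketch: try several encodings of a URL against the HTML source
// of a results page. encodeURI and decodeURI are standard JavaScript globals.
function urlVariants(url) {
  const variants = new Set([url]);
  try {
    variants.add(encodeURI(url)); // e.g. "é" becomes "%C3%A9"
  } catch (e) {
    // encodeURI throws URIError on lone surrogates; keep the raw form only.
  }
  try {
    variants.add(decodeURI(url)); // e.g. "%C3%A9" becomes "é"
  } catch (e) {
    // decodeURI throws URIError on malformed escape sequences.
  }
  return [...variants];
}

function isUrlInSerp(url, serpHtml) {
  return urlVariants(url).some((variant) => serpHtml.includes(variant));
}

// Made-up results markup where the URL appears in percent-encoded form.
const sampleHtml = '<a href="https://example.com/caf%C3%A9"><h3>Caf\u00e9</h3></a>';
console.log(isUrlInSerp('https://example.com/caf\u00e9', sampleHtml));
console.log(isUrlInSerp('https://example.com/missing', sampleHtml));
```

A substring search also matches URLs that are prefixes of longer ones, and it ignores HTML entity escaping such as `&amp;` inside `href` attributes, so a production check would need stricter comparison than this sketch.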
Noah Learner: Tons of trial and error?
Alvaro Fernandez: Yeah, I mean, it comes down to matching data on a screen. There was no good way.
Noah Learner: Why did you settle on Node over other technologies? Was it because it was already kind of in your wheelhouse?
Alvaro Fernandez: Well, that definitely is part of it. I mean, we do use Python as well for some of our processes, using pandas and its related modules daily to deal with large CSVs, data processing, you know, all this kind of manipulation of data sets. So that's what we use Python for. Everything else comes down to experience. The reason I chose Node over Python is that we are dealing with HTTP requests, and then from the responses we match strings. These are things that JavaScript is really good at, and it works perfectly well for these scenarios. And, you know, I didn't think that JavaScript was the only way, just that it was the most appropriate thing to use. On top of that, the idea for this application, technically speaking, is to develop it into a full stack app that can use React.
Jordan Choo: I'm curious why you didn't build it on top of Google Sheets using Apps Script before diving straight into a standalone Node.js script and then eventually building a UI around that.
Alvaro Fernandez: I mean, to be honest, it didn't cross my mind. I once did a community connector for Data Studio. There was an API service, and then there was an Apps Script that I developed to connect to the back end, and that was connected to Data Studio. And it was very painful, because I had to use ECMAScript 5 and the limitation of that was horrible. And also, you know, I liked to write my code in Sublime at the time, so to be in an environment on top of a browser to do my code already doesn't appeal to me. I mean, I hear that they now have ECMAScript 6 or 7, which is really good to know.
Noah Learner: They announced it yesterday, I think
Alvaro Fernandez: That's amazing. I couldn't believe it was five. Why? And so, yeah, definitely, it could be done. I mean, like I said, it could work. But it is that ownership thing, you know, because we're doing it in-house. We're going to have our own repo internally, and we didn't want to have it somewhere else. And there we go, it's Google. So we'd kind of be scraping Google with their own Google products. You know, it's too dangerous. So it didn't feel fitting, in a sense. But I mean, it's perfectly true, I guess: right at the very beginning of this, we tried to do this with Google Sheets. Yeah. He's absolutely right about the stability they have on Apps Script. So yeah, we did try to run some functions in there to do this. That was the beauty: we thought, oh, it's a Google API, so we thought that we were smart. But they have a limit and a timeout that is incredibly low.
Noah Learner: I think it is less than six minutes. It's got to be. I think it has a six-minute runtime, right?
Jose Hernando: Ah,
Alvaro Fernandez: But on top of that, the CPU would skyrocket. Yeah, so that is when we thought, okay, we need to do a custom scrape for this.
Noah Learner: Okay, so what we're hearing is a couple of different things about picking the tool of choice. Python for you is a data manipulation and data analysis tool. Node interacts with the web in the appropriate way. Apps Script you found limited by runtime, like it timing out after a specific amount of time, and you were limited by ECMAScript 5.
Alvaro Fernandez: It is a combination of things. I mean, the limitation with ECMAScript aside, my personal experience with Apps Script wasn't great because of the environment. I'm on a white screen with no shortcuts, and I can't use my normal tools, my snippets, my things, obviously. I guess I could code it elsewhere and then paste it in, but, you know, it just felt off. I mean, for somebody who codes to have to be in a browser to code, it felt really out of place. And like I said, I'm pretty sure Python would be very capable of doing this script. But this is the way that I wanted to do it.
Noah Learner: Right. So it just made sense. Okay, I think we've already taken a dive into your tech stack, but give us more insight into what that stack looks like. Do you have a lot of your apps running as Lambda functions? Are you using Amazon or Google Cloud, or is it all internal?
Alvaro Fernandez: Our tech stack all the way through would be Nginx as the server on a Linode box located in the US for the hosting. We use AWS for our domain management because it's got amazingly fast propagation. It's really good. But we typically use Linode with an Ubuntu distro and Nginx, with all my tools there on the server from the beginning, and EC2 for apps running background management and processes. And then on the front end it will be React for the most part. I do a lot of vanilla JavaScript and Sass, you know, Webpack, and then the back end is Node. We do PHP as well, since some of our sites are on WordPress. But the current stack is always moving: you know, React, Nginx, Gatsby, and Python for data automation.
Jordan Choo: So on the Python for data manipulation side, are you interacting directly with APIs? Or are you storing it in a data warehouse? What does that look like?
Yes, we do API requests with Python at some point in the tools that we use, but not for a database, because what we do is basically all about usability. We get a CSV file and process it, since we're really good with Excel, and we apply table functions and then output. So we're not storing it in a database; since it's in-house, it's protected. I mean, we could take Python too, with Flask or Java, and basically do pretty much the same thing we do with Node. But the way we have it, everyone is running it on their own computer.
Noah Learner: very cool. Are you gentlemen ready to share your screens and walk us through the tool?
Jose Hernando: Definitely.
Noah Learner: And everybody listening you're in for a real treat because they're going to walk us through the whole process of setting it up and running it. And we'll look at the code as we go because there's some really neat fun things about it. Okay.
Alvaro Fernandez: So here we have a list of URLs. Can you see my screen?
Noah Learner: Yep, looks awesome. Look at all those cool characters.
Jose Hernando: Yeah, we're hackers
Alvaro Fernandez: Right, so obviously, as the script says, we first need to get a key to run it. I basically just want to request one as it would happen for a new person doing this, so this is a test. Let's see if this works. Sign up for ScraperAPI.
Jose Hernando: Yeah, that was quick.
Alvaro Fernandez: So now I am supposed to get an email from ScraperAPI, which will give me a link to get myself a key.
Noah Learner: So just for everyone watching: this tool relies on ScraperAPI, scraperapi.com, so we're setting up a ScraperAPI account to grab an API key to run our tool. What's cool is that they have a really generous free trial, so we can do a bunch of API calls.
Alvaro Fernandez: We have 1000 API calls with five concurrent requests. This is important because you mentioned speed before, and obviously this goes hand in hand: if you go for 50 concurrent requests the script will go faster. But we need to be financially aware, so I'm just going with the free one right now, and now it gives us this key. Like I said before, and you can see it here at the bottom, we have 5000 requests, but we have a maximum of five concurrent requests. That is important because of the way this script works. The script deals with promises whose results it doesn't know yet, and the promises are basically doing parallel requests, which means give me all you can at once, as opposed to doing it one by one. So I set the number five here to match the parallel request limit. This is done through a library that lets you adjust the parallel requests or concurrency limit outside of Axios. Actually, I know the guy who developed it, which is quite cool.
So,
Noah Learner: So can I use that library to run Puppeteer, five different instances of Puppeteer at the same time?
Alvaro Fernandez: No, because Puppeteer uses its own request method.
Noah Learner: Okay. Sorry.
Alvaro Fernandez: Axios is more like getting the HTML as a string; basically it is not aware of what the things mean. Puppeteer is different: it renders it, and it knows what the elements are. That's what I meant. For other kinds of problems you would have something else that gives you promises with a concurrency limit, whereas this one works with Axios, and in this case it is tailored to us.
Noah Learner: Cool, so I got it. I just hadn't used that library. I hadn't used Axios.
Alvaro Fernandez: Oh yeah, Axios is great. I mean, it's been along for the ride for a while, but I've always liked Axios because it's very light. So here is the limit, which goes hand in hand with the plan. As soon as you request access to the tool you start the trial. If for some reason you pay for the lowest tier, which is 10 concurrent requests, you can adjust it to that.
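The idea of running promises in parallel with a cap of five can be sketched without any library, as below. This is only an illustration: `runWithLimit` is a hypothetical helper, the tasks are timers rather than HTTP requests, and the actual tool uses a separate concurrency library alongside Axios that is not named in the transcript.

```javascript
// Hypothetical sketch of a concurrency limit: run an array of async task
// functions with at most `limit` of them in flight at any time.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker keeps pulling the next unstarted task until none are left.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await tasks[index]() };
      } catch (error) {
        // Record the failure so one bad request does not stop the batch.
        results[index] = { ok: false, error };
      }
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: 12 fake "requests" (5 ms timers) with a limit of 5, matching the trial plan.
const demoTasks = Array.from({ length: 12 }, (_, i) => () =>
  new Promise((resolve) => setTimeout(() => resolve(i * 2), 5))
);
runWithLimit(demoTasks, 5).then((results) => {
  console.log(results.filter((r) => r.ok).length, 'tasks completed');
});
```

Raising the second argument to 10 would correspond to moving from the trial limit to the lowest paid tier mentioned above.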
When we first released it, people were getting a little nervous, because initially the value we used was 10 parallel requests. People were running it at 10 concurrent requests, but the maximum that the trial allows is five, so they were getting errors. With five there won't be any errors, so we set five because that's what the trial will allow us. Now we're going to copy that key, go to the API key file, and paste it there, obviously as a string. And now we have the key in there. We also have a bunch of URLs that we already put in there, which is basically a CSV. There is no need to do anything but separate them by lines.
And we're ready to roll. It is just going to go straight for these URLs.
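Since the input is described as a CSV with one URL per line, reading it can be sketched as follows. `parseUrlList` is a hypothetical helper; the tool's own loading code is not shown in the transcript.

```javascript
// Hypothetical sketch: turn the text of a one-URL-per-line file into an array.
function parseUrlList(text) {
  return text
    .replace(/^\uFEFF/, '')          // drop a byte order mark that Excel may add
    .split(/\r?\n/)                  // accept both Windows and Unix line endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0); // skip blank lines
}

const urls = parseUrlList('https://example.com/a\r\n\nhttps://example.com/b \n');
console.log(urls);

// In a real run the text would come from disk, for example:
//   const fs = require('fs');
//   const urls = parseUrlList(fs.readFileSync('URLs.csv', 'utf8'));
```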
Yeah, I have the terminal handy in VS Code. So I just want to do an npm
Jose Hernando: Install?
Alvaro Fernandez: It's not installing now because it's already installed. Also, I have to say that in my VS Code I have node_modules hidden because I don't have any use for seeing it, but don't be mistaken, the files are actually there. So when you first get this application, you will have to do npm install, and it will create a folder for your node modules, which I myself hide all the time. But I have it installed, therefore I don't need to do that. I just need to do npm start, which is obviously shorthand for npm run start, which will call the script with this long name. So I'm going to run it, and hopefully it's working really well. And it breaks, obviously, on the live test.
Jordan Choo: Yep, Murphy's Law.
Alvaro Fernandeza: Ah it's fine
Noah Learner: What does this thing mean?
Jose Hernando: Maybe you need to install it?
Noah Learner: Yeah, I was waiting for that.
Alvaro Fernandez: Oh god, yes, I know. Because when I put this together, I removed the modules. So let's do that.
Jose Hernando: Even better, now we're doing the whole thing.
Alvaro Fernandez: Alright, so let's do that.
Noah Learner: I just gotta say this makes me feel so much better about all my explorations in coding.
Alvaro Fernandez: Alright, so npm install.
Noah Learner: What tool are we looking at, by the way? What is this?
Jose Hernando: This is VS Code.
Noah Learner: VS Code? Okay.
Alvaro Fernandez: Yeah, it's customized to great lengths, but it's VS Code.
Noah Learner: I heard you like to share your clips
Jose Hernando: What? Sorry?
Noah Learner: I heard you like to share your clips all of your you know, like little code clips. I'm joking.
Alvaro Fernandez: Oh yeah, on the podcast. Oh yeah, well, I'm not going to code.
Noah Learner: I'm totally kidding. I'm totally kidding.
Alvaro Fernandez: Let's see. Yeah, it's working now. It definitely ran and we're installed.
Noah Learner: tell us about the really cool looking console output.
Alvaro Fernandez: Oh, that is the addition I turned to. I am using, which is
Noah Learner: Is that Chalk here?
Alvaro Fernandez: Yeah, Chalk it is. Obviously, it's just a coloring tool. I just found it so dull to see everything in white, and, you know, especially when you might have errors and you might have positives, it was a clear chance to color some things.
NPM is great
Jordan Choo: I love the organization, and by the way, I love how detailed your comments are within the code.
Alvaro Fernandez: Oh yeah, since it was going public I thought I should go the extra mile. It's not normal that I do that.
So we got some errors as we are going there, and we can see that by the color of the URLs. Each line is just the URL, and obviously we're getting the result for each URL in a CSV fashion, separated with a comma.
Noah Learner: Mountain bike shoes Whoo.
Alvaro Fernandez: Yeah, I mean, the speed, like I said, is obviously the thing to work on. It's on the planned roadmap to try and go for 50 concurrent requests.
Noah Learner: Is this now part of your toolkit? Like, did you subscribe to scraperapi.com because of how you built the tool?
Alvaro Fernandez: Yeah, well, yeah. We've gone for the 10 concurrent request plan and it works really well for what we do. We use it for lots of clients, actually, and with big volumes, because this is a feature of this app. It's definitely one of the main features: as opposed to anything else, this is not scared of 20,000 URLs.
Noah Learner: Are you using SERP API, SERPAPI.com in-house?
Alvaro Fernandez: So whenever we started to look at these services, we did test them, and ScraperAPI was much better in terms of value for money, and it worked really well. In reality, even if you go for one of the paid tiers, it's really, really cheap.
Absolutely. And also, from my perspective, one of the more appealing things with ScraperAPI, besides the cost, was that when he brought it to me, I was like, I really like this because it's so basic.
Noah Learner: So you liked the basic API connection. You didn't want to deal with, like, OAuth 2.
Alvaro Fernandez: Yes, exactly. I didn't want to use an API that had its own way of doing things. I wanted to do things my own way, and ScraperAPI provided this super simple way of putting it in a URL. Whatever you're searching for is concatenated with the ScraperAPI URL, and to me that was it, because you have the site you are going towards, the API key, and the URL for ScraperAPI. So it was very easy putting it all together. And, you know, Axios is happy: it is getting somewhere and getting some results. The fact that we can actually finish with all our searches done successfully, to us that was it.
The last was say, You're either
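The "put it all in one URL" approach can be sketched as below. Two things are assumptions rather than details from the transcript: the endpoint shape (`http://api.scraperapi.com/` with `api_key` and `url` query parameters, which should be checked against ScraperAPI's current documentation) and the Google query itself, since the tool's actual search query is not described. `buildRequestUrl` is a hypothetical name.

```javascript
// Hypothetical sketch: build the single URL that a plain HTTP GET would request.
// ASSUMPTION: ScraperAPI accepts `api_key` and `url` as query parameters.
// ASSUMPTION: the page URL is used as the Google search query.
function buildRequestUrl(apiKey, pageUrl) {
  const googleUrl =
    'https://www.google.com/search?q=' + encodeURIComponent(pageUrl);
  // URLSearchParams percent-encodes the nested Google URL safely.
  const params = new URLSearchParams({ api_key: apiKey, url: googleUrl });
  return 'http://api.scraperapi.com/?' + params.toString();
}

const requestUrl = buildRequestUrl('YOUR_API_KEY', 'https://example.com/a b?x=1');
console.log(requestUrl);
```

Because everything lives in the query string, the resulting URL can be handed to any HTTP client (Axios in the tool's case) with no authentication flow.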
Noah Learner: Can you show us the output. Can we look at the results?
Alvaro Fernandez: Sure. So obviously everything is cleared: the URLs file was renamed from the errors file, so all of the error files are gone. All of the URLs are here with their respective indexation status. This is obviously the log form, but the same will be in the file here, which you can open using Excel as a table. As you can see here, we have 37 that are indexed and that
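Counting statuses in the output can be sketched as below, assuming each line of the results file is a URL, a comma, and a status. The status labels `Indexed` and `Not Indexed` are placeholders; the tool's exact wording is not given in the transcript. `summarise` is a hypothetical helper.

```javascript
// Hypothetical sketch: count rows per status in "url,status" text.
// The split happens at the LAST comma because URLs can contain commas.
function summarise(csvText) {
  const counts = {};
  for (const line of csvText.split(/\r?\n/)) {
    const cut = line.lastIndexOf(',');
    if (cut === -1) continue; // blank or malformed line
    const status = line.slice(cut + 1).trim();
    if (!status) continue;
    counts[status] = (counts[status] || 0) + 1;
  }
  return counts;
}

const sampleResults = [
  'https://example.com/1,Indexed',
  'https://example.com/2?ids=1,2,Not Indexed',
  'https://example.com/3,Indexed',
  '',
].join('\n');
console.log(summarise(sampleResults));
```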
Noah Learner: You've been talking about Excel a lot. Do you use Excel as your spreadsheet tool in-house, or are you guys using Google Sheets more?
Jose Hernando: We're definitely using Excel more, mainly because clients are more familiar with it. When we give outputs to clients, Excel is what they work with, so we need to deliver in that format. Internally we work with Excel as well. We also work with Python, for example, for data manipulation, and we only take the output of what we want to show, so a later stage or an export, essentially. If I can put an analogy on it: we have all this data, and we just take a chunk of that, do everything that we want with Python, and then export it to Excel.
Noah Learner: So for your enterprise clients, how are they analyzing data? Are they looking at it in Excel? Are any of them using Power BI?
Jose Hernando: Nope, not the ones that I work with. For any type of data visualization or data gathering, the one that I work with uses BigQuery, and we use Data Studio, for example, for visualization. But yeah, no one uses Power BI.
Noah Learner: I was just curious because I was in a conversation with someone about how Seer Interactive has really pushed Power BI in a big way. Another SEO was telling me that the reason was that their clients were using Power BI. You get to deal with a lot of enterprise folks, and I was wondering if that plays into the decision at all. Okay, so I just want everyone to understand what we've just accomplished by running the tool. We took a full list of URLs on a website, or a variety of websites, to see what's indexed and what's not. So what? Help us understand what the value of that is.
Jose Hernando: Yeah, in reality it obviously depends on the use case that you're going for, and it will be very different from case to case. For example, in the first case that I mentioned, you see a lot of hits from Googlebot, and you want to see if those URLs are indexed. Then you have to analyze: okay, do I want these non-canonical URLs indexed? If not, I can add a noindex tag, for example, to try to get rid of those. In the opposite case, if I have lots of money pages, for example lots of product pages, that are not indexed and that I want indexed, then I have to create a sitemap, for example, and submit it in GSC. So those are the use cases I have.
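The second use case, building a sitemap from money pages that turned out not to be indexed, can be sketched as below. The row shape `{ url, indexed }` and the function names are hypothetical; the XML follows the public sitemaps.org 0.9 format.

```javascript
// Hypothetical sketch: build a sitemap containing only the non-indexed URLs.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function buildSitemap(rows) {
  const entries = rows
    .filter((row) => !row.indexed)
    .map((row) => `  <url><loc>${escapeXml(row.url)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

const sitemapXml = buildSitemap([
  { url: 'https://example.com/p/1.html?a=1&b=2', indexed: false },
  { url: 'https://example.com/p/2.html', indexed: true },
]);
console.log(sitemapXml);
```

The resulting file is what would then be submitted in Google Search Console.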
Noah Learner: I thought that execution was really interesting. I've had scenarios where the ecommerce platform that I do most of my work on by default, they don't include collection pages in their sitemap. So and all the files on that platform are on an on an external CDN. Okay, I don't have control over in Google Search Console. So for me to get category structure or collection pages indexed, I have to build a custom sitemap hosted on my own agency websites, Search Console, like are you having to do that silliness in house?
Jose Hernando: Well, we haven't had to go to that point yet, but I'm sure at some point we'll have to deal with that. For the moment, this has been really well received by development teams and clients too, because, for the money pages specifically, if a product is not indexed, that is money that you're leaving on the table. If users are not able to get to your site and to the products that they actually want to purchase, then that is a massive business loss. So it's a really easy case to show to clients, and they understand it.
Noah Learner: Can you talk us through the process? Because I have the same money pages use case.
Jose Hernando: Yeah, I wanted to share a few use cases for the tool, ones that we use internally and others that we know we can use it for. One of the most painful ones, which will solve lots of issues for many folks out there, is understanding whether you have all your money pages indexed. So, going through how to look for product pages, for example: you just run a normal crawl, like the one you get if you use Screaming Frog, and identify all the product pages that you have. In this case I used Boohoo, because it's an ecommerce site, and I saw that all of their product pages contain an .html file type, so they're really easy to identify in a crawl. I used Screaming Frog because this was just an example, but you can use anything that you want: OnCrawl, DeepCrawl, any of the cloud-based providers as well. So you look for these URLs in your crawl data, export them, and put that export in as your urls.csv, and then you get all the data from the index tool. In this case, I think I got about 40,000 product page URLs, but I didn't crawl the whole site.
Noah Learner: I want to reframe to make sure that I understand this. So you run a custom extraction in Screaming Frog based on a tag, a class, an ID, a meta tag, something. You find a hook, and you run your extraction to populate your URL list. Right?
Jose Hernando: Yes. In this case it was easier, because all of their product pages contained the .html file type. So you can just search for it in the internal search bar: you type "html", done, you export your URLs with .html, and that's it.
Noah Learner: Okay, so you're saying that all the product pages ended in dot html?
Jose Hernando: Correct.
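The filtering step Jose describes can also be done outside the crawler. The following is a minimal sketch, not the workflow used on the show: it assumes a crawl export with an `Address` column (the column name is an assumption about the export layout), and it assumes the checker's urls.csv takes one URL per line. The sample rows and the `example.com` URLs are hypothetical.

```python
import csv
import io
from urllib.parse import urlparse


def product_urls(crawl_rows, suffix=".html", column="Address"):
    """Return unique URLs whose path ends with `suffix`, keeping crawl order."""
    seen = set()
    matches = []
    for row in crawl_rows:
        url = row[column].strip()
        # Compare against the path only, so a query string cannot hide the suffix.
        if urlparse(url).path.lower().endswith(suffix) and url not in seen:
            seen.add(url)
            matches.append(url)
    return matches


def write_url_list(urls, handle):
    """Write one URL per line (assumed layout for the checker's input file)."""
    writer = csv.writer(handle, lineterminator="\n")
    for url in urls:
        writer.writerow([url])


# Hypothetical crawl export; a real one would be read from the exported file.
SAMPLE_EXPORT = """Address,Status Code
https://www.example.com/,200
https://www.example.com/womens/dresses,200
https://www.example.com/blue-dress/ABC123.html,200
https://www.example.com/red-top/XYZ789.html,200
https://www.example.com/blue-dress/ABC123.html,200
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
urls = product_urls(rows)

buffer = io.StringIO()
write_url_list(urls, buffer)
print(buffer.getvalue())
```

To write a real file, pass an open file handle (for example `open("urls.csv", "w", newline="")`) instead of the in-memory buffer.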
Noah Learner: Okay, got it. So then you populate your list and you check if they're indexed. And if you have just a few that are not, you do what?
Jose Hernando: Yeah. So it depends on how many there are. You might find that, for some reason, out of the 40,000 URLs, let's say 2,000 are not indexed. If you have many URLs not indexed, you create a new XML sitemap called, let's say, products.xml or new-products.xml, you put it on the server, and then you submit it in Google Search Console. Google will always try to crawl URLs that it knows about, so if you actually say to Google, "please crawl these URLs," it will do it, as opposed to trying to find them through the site architecture. If you have a messy site architecture, it might be the case that it won't find those URLs.
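Building that extra sitemap from the checker's output can be sketched as follows. This is an illustration under stated assumptions, not the tool's own code: the result rows with `url` and `indexed` keys are a hypothetical shape for the output, and the URLs are made up. The namespace and the 50,000-URL limit come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # per-file limit from the sitemaps.org protocol


def not_indexed(results):
    """Pick the URLs the checker reported as missing from the index."""
    return [row["url"] for row in results if not row["indexed"]]


def build_sitemap(urls):
    """Return a sitemap document (UTF-8 bytes) listing the given URLs."""
    if len(urls) > MAX_URLS_PER_SITEMAP:
        raise ValueError("split the list across several sitemap files")
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical checker output.
results = [
    {"url": "https://www.example.com/blue-dress/ABC123.html", "indexed": True},
    {"url": "https://www.example.com/red-top/XYZ789.html", "indexed": False},
    {"url": "https://www.example.com/hat/HAT001.html", "indexed": False},
]

sitemap_bytes = build_sitemap(not_indexed(results))
print(sitemap_bytes.decode("utf-8"))
```

The bytes can be saved as something like products.xml, uploaded to the server, and submitted in Search Console as described above.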
Noah Learner: And what level of success have you found using that method? What percentage of those URLs are you getting indexed when you create a custom sitemap?
Jose Hernando: Massive
Noah Learner: What's that?
Jose Hernando: Massive, massive success. Because at the end of the day, think about it: Google obviously has its crawl queue, but if you tell it, "I have this bunch of URLs that I want indexed," it will go for those. In reality it might take a few days to get through all of them, but you're just making its job easier. So, massive success.
Noah Learner: And would you keep that sitemap live forever inside Search Console, or would you delete it after you see the outcome you want, like you see that stuff get indexed?
Jose Hernando: I think that really depends on the way that you remove products from your website. For example, you might have a specific shirt, and they create that shirt in blue, so the one in black is no longer there, and they will have a new blue one next month. If they recycle the URLs in order to stop bloating the site architecture, then keeping the sitemaps doesn't hurt anyone. But if this is something that you do on a regular basis, it will get to the point that you have too many sitemaps in GSC. So if you're submitting sitemaps all the time, then yeah, definitely, once you see those URLs indexed, just remove the sitemaps and create new ones.
Noah Learner: Got it! Yeah, this use case really caught my eye.
Jose Hernando: Yeah. So it's really common for ecommerce sites to have faceted navigation. Faceted navigation essentially exponentially increases the number of crawlable URLs that you have on your site, and each platform deals with it in a different way. For example, using the same website, Boohoo adds parameters depending on the item that you choose. In that specific example, I picked size L, I think, so that just adds a parameter that says v1 equals L. But you can do that exponentially; I can add as many parameters as I want. So that is a massive issue. When I looked at these, I thought: okay, because all the parameters look the same, they all say "prefn" or "pre" something, I can use the "pre" part to find all of those URLs on the site. So I did the same thing: I searched for the "pre" parameter, took all the URLs out, put them into the index checker, and got my results. In this case, because it's faceted navigation, it depends a lot on how malleable the platform that you're dealing with is, and there are a few ways that you can tackle this. One is using the URL parameters configuration in GSC, which sometimes works and sometimes doesn't, so it's a lottery. You can always canonicalize, but canonicalizing doesn't mean that you're going to get the version that you want indexed, so it has its drawbacks. You can use noindex, nofollow, but obviously you have to remove the canonical, otherwise you're just sending mixed signals to Google and you might not get the result that you want. And you can also block specific directories through robots.txt in order to prevent Google from crawling those URLs. But again, robots.txt prevents crawling; it doesn't prevent indexing. Therefore, it might happen that even though a URL is blocked by robots.txt, you get that URL indexed.
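Pulling the faceted URLs out of a crawl by their parameter names can be sketched like this. The `pref` prefix is an assumption based on the "prefn" naming mentioned above, and the sample URLs are hypothetical; adjust the prefix to whatever the platform in question actually emits.

```python
from urllib.parse import parse_qsl, urlparse


def facet_params(url, prefix="pref"):
    """Return the (name, value) query pairs whose name starts with `prefix`."""
    query = urlparse(url).query
    return [
        (name, value)
        for name, value in parse_qsl(query, keep_blank_values=True)
        if name.lower().startswith(prefix)
    ]


def split_faceted(urls, prefix="pref"):
    """Separate URLs carrying facet parameters from the rest."""
    faceted, clean = [], []
    for url in urls:
        (faceted if facet_params(url, prefix) else clean).append(url)
    return faceted, clean


# Hypothetical crawl URLs.
SAMPLE_URLS = [
    "https://www.example.com/womens/dresses",
    "https://www.example.com/womens/dresses?prefn1=size&prefv1=L",
    "https://www.example.com/womens/dresses?prefn1=size&prefv1=L&prefn2=colour&prefv2=blue",
    "https://www.example.com/womens/dresses?page=2",
]

faceted, clean = split_faceted(SAMPLE_URLS)
print(len(faceted), "faceted URLs to send to the index checker")
```

Matching on parsed parameter names rather than a plain substring search avoids false positives from paths or values that happen to contain the same letters.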
Noah Learner: Do you have any great research resources to share about best practices for configuring URL parameters? I find it's like super hard. It's hard to get it just right.
Jose Hernando: Yeah, I feel your pain, man. Each platform has its own way of doing things. Sometimes you can customize the response that you want, and sometimes it's so hard to do. So you have many tools, and you use the best one for each case. In terms of resources, one of the resources that I use personally is from our blog: a team member of ours, Maria, did an amazing guide on dealing with faceted navigation. She works with a lot of ecommerce clients as well, and that one is really, really good. I can send you the URL as well.
Noah Learner: I'll share it in the deck. I mean, I'll share it as a resource. That's super cool. How are you guys doing for time? I know that we're running a little long here, but I feel like we've got more to cover. Do you have time to keep chatting?
Jose Hernando: Al, are you good on time?
Noah Learner: Okay, sweet. This caught my eye in a big way too.
Jose Hernando: Yeah, so another use case. Through GSC, essentially using the GSC export API, you can check how many of your URLs have organic clicks. You can also use Screaming Frog, because it has a connector to GSC, so you can easily get the information by URL. So you can get a full list of the URLs that have some clicks and the ones that have nothing. For the ones that have nothing, you want to know, first of all, whether Google has found them, that is, whether those zero-click URLs are indexed. Using the tool, you take those zero-click URLs, put them in as your target URLs, and get all the data. Then you can make the call. If the URLs that have zero clicks are indexed, it means that that particular URL is not satisfying any user query, or is not good enough. So you have to rethink how to target that URL, whether it makes sense to have it in your architecture, and whether it will serve any user. And if they are not indexed, it means that the URL is probably really buried in your architecture and hard to find, so Google has not found it, or has deemed it not unique enough to index. The approach is really similar: if you think that the URLs that are not indexed are good to go and Google has just not found them, then make it easy. If there are only a few, use the URL Inspection tool to request indexing manually. Or you can create an XML sitemap, something like organic.xml, for example, and submit it to GSC. And that's it.
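The decision Jose walks through amounts to a small triage over two datasets. Here is a minimal sketch, assuming clicks per URL (for example from a GSC export) and an indexed flag per URL (from the checker) are already loaded into dictionaries; the bucket names and the sample data are hypothetical.

```python
def triage_zero_click(clicks_by_url, indexed_by_url):
    """Group zero-click URLs by the follow-up action described above."""
    plan = {"rethink_targeting": [], "aid_discovery": []}
    for url, indexed in indexed_by_url.items():
        if clicks_by_url.get(url, 0) > 0:
            continue  # the URL already earns organic clicks
        # Indexed but no clicks: the page is not answering any query well.
        # Not indexed: Google has not found it or has chosen not to index it.
        bucket = "rethink_targeting" if indexed else "aid_discovery"
        plan[bucket].append(url)
    return plan


# Hypothetical inputs.
clicks = {
    "https://www.example.com/a": 12,
    "https://www.example.com/b": 0,
}
indexed = {
    "https://www.example.com/a": True,
    "https://www.example.com/b": True,
    "https://www.example.com/c": False,
}

plan = triage_zero_click(clicks, indexed)
print(plan)
```

URLs in the `aid_discovery` bucket are the candidates for a manual indexing request or for a dedicated sitemap such as the organic.xml mentioned above.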
Noah Learner: I love that. On that new sitemaps concept, I already have a tool that I use that's Puppeteer-based. It logs into the back end of websites, goes to the page that has the whole catalog structure, scrapes all the links, creates a list of those links, then loops through them, creates a sitemap, and uploads it to my server. It runs automatically every day for all my client sites, in case a new collection gets built or whatever. So I see how to take the technology I've already built and kind of bolt it onto what you're talking about doing. That's super cool.
Jose Hernando: Amazing. And actually, thinking about Puppeteer for development, that is one of the things that we want to develop: that process of creation. What's the next step? The next step is creating that sitemap file automatically. We do have access to the GSC properties for most of our clients, and we'd definitely be willing to give that file to the development team so they can upload it. So we can ease the process from the ground up.
Noah Learner: Do you want me to share my code with you? It might help. I feel like on the automation front, it's just like being in California in 1978 working at Xerox. I feel like it's there for this weird little niche that we're in. It's pretty exciting, to me anyway.
Jose Hernando: It is pretty exciting. I'm always interested in Puppeteer, because it sounds so good it's almost too good to be true. And we've seen a case where it was a script in Python, from Hamlet Batista, who you had on your show.
Noah Learner: Yeah. He was
Alvaro Fernandez: Funny, he was using Puppeteer, by the way, to access GSC, but we recently found out that it's broken. So this is the thing with Puppeteer, probably.
Noah Learner: I shared this in one of our Hangouts. So Hamlet's tool is off the hook. Amazing, right? There's a lot of code to build it, and there's a lot of different stuff happening. It builds its own UI and all that. I basically did the same thing in Puppeteer, not Pyppeteer; I did it with Node's version, and the execution was a lot simpler. I was able to log in, I was able to paste URLs into the tool, I was able to submit URLs. It then throws up a little pop-up saying, "Oh, this is in the index. Do you want us to index it?" I click the button and it works. It works for the first couple of requests, and then it's throwing up CAPTCHAs. And if that's where you're stuck: Dan Leibson was saying, "Did you do something with the mouse?" Inside Puppeteer, you can make the mouse do things to mimic mouse movement, and I think that's why I was getting CAPTCHAs. It could have been an IP thing. He said, "Was it an IP address block, or was it a human mimicry block?" I'm going to test that more, but I don't really want to spend too much time inside their tool, because I don't want to get blocked.
Jose Hernando: Really cool. Just to quickly go over it, there are many other use cases for this tool, like checking if a canonical URL is indexed, which we use with clients as well, and the same with case-sensitive URLs. There are also other use cases, like understanding which 404s and 500s are indexed, because then you can prioritize those. For example, you give the list to your development team and say, "Look, there are many 404s on this site, but these ones are the most important, because they are still indexed. Google will probably crawl them at some point and realize that each one is a 404. So fix those first and then go through the rest." And then understanding how big of a problem your staging site is; we see that a lot. And there are more advanced uses too, like using the server logs to determine which URLs that have been hit by Googlebot are also in the index and which ones are not.
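The "fix the indexed errors first" idea can be sketched as a simple sort. This assumes crawl status codes and the checker's indexed flags have already been joined per URL; the row shape with `url`, `status`, and `indexed` keys and the sample rows are hypothetical.

```python
def prioritise_errors(pages):
    """Return 404 and 5xx pages, with the ones that are still indexed first."""
    errors = [
        page for page in pages
        if page["status"] == 404 or 500 <= page["status"] < 600
    ]
    # sorted() is stable, so crawl order is preserved within each group.
    return sorted(errors, key=lambda page: not page["indexed"])


# Hypothetical joined data.
pages = [
    {"url": "https://www.example.com/ok", "status": 200, "indexed": True},
    {"url": "https://www.example.com/gone-1", "status": 404, "indexed": False},
    {"url": "https://www.example.com/gone-2", "status": 404, "indexed": True},
    {"url": "https://www.example.com/broken", "status": 500, "indexed": True},
]

for page in prioritise_errors(pages):
    print(page["status"], page["indexed"], page["url"])
```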
Noah Learner: Yeah, that's really cool. Have you figured out how much time this saves your team?
Jose Hernando: So I think this one is a tricky one, because before, we couldn't get this data. It was so hard to get that it was impossible to report on it or to build any kind of strategy around it to add value for our clients. So in this case, more than saving us time, it's about added value. We're bringing more value to our clients by giving them this information.
Noah Learner: So you basically went from a completely opaque problem to one where you have visibility and can act on it.
Jose Hernando: Exactly. Before, it was an educated guess. For example, you could say, "Googlebot is not hitting this area of your site, so we can conclude that it's very likely this area has not been seen by Google and is not indexed." We acted on the assumption that because we hadn't seen Googlebot crawling a particular area of the site, that area was not indexed. But now we know for sure.
Noah Learner: So Jordan, we haven't really talked about this on the Hangouts before in that way, you know, not so much as time savings. Like, when I created that sitemap tool I told you about, I didn't really have a way of doing it otherwise, right? And that's the same thing that you're saying. Sometimes you have to use tools and automate them, because doing it by hand just isn't feasible at all.
Jose Hernando: Yeah. It's not scalable.
Noah Learner: How do you visualize the output?
Jose Hernando: So at the moment, because it's a simple binary check, like, is it one or zero, a pie chart has been really, really good for productivity. At the end of the day, what you want to show the client is: here's the problem, and here's the solution. Visualizing the data for them is not that important beyond showing how big of a problem it is; resolving it is all they care about.
Noah Learner: Yeah. So when I walked through your code and saw that it pushed the results into a CSV, I started to think to myself, okay, how would I use it? And this is just Noah's thoughts; that doesn't mean they're relevant or useful. I do a lot of stuff with Apps Script when I'm processing CSVs, because it's really easy to process a CSV and pull it into Google Sheets, and from there I can visualize it in Data Studio. I thought it would be useful for someone if I shared these ideas. Basically, at the end of the tool, when it's done, I would have it POST to my Apps Script location, and inside the Apps Script you would have a doPost function that triggers the function that pulls in the CSV, processes it, and pulls it into Sheets. Then you can use that as your data source. Other ideas, which I thought you might think are cool, are about getting the data into BigQuery, and there are two ways I would do that. Number one is the same idea: at the end of your script, you have it POST to the Apps Script location and have Apps Script pull it into Sheets. You can use that sheet as an external BigQuery table. Then you can have a SELECT query inside BigQuery that pulls all that data and saves the results into a real BigQuery table, and you can visualize it in Data Studio even faster. And lastly, which is fewer steps but probably harder for most people, you could just use the Node BigQuery library and push the results straight into BigQuery so that you can visualize them in Data Studio. I thought that would be useful for someone; I don't know what your thoughts are.
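The first hand-off Noah describes, posting the finished CSV to an Apps Script web app, could look roughly like the sketch below on the sending side. It is written in Python for illustration rather than as part of the Node tool, and the endpoint with its `DEPLOYMENT_ID` placeholder is hypothetical. The request is only built here; `send` is defined but not called.

```python
import urllib.request

# Placeholder endpoint: replace DEPLOYMENT_ID with a real web app deployment.
APPS_SCRIPT_ENDPOINT = "https://script.google.com/macros/s/DEPLOYMENT_ID/exec"


def build_results_request(endpoint, csv_text):
    """Build a POST request carrying the results CSV as the request body."""
    return urllib.request.Request(
        endpoint,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical results; the real column names depend on the tool's output.
csv_text = "url,indexed\nhttps://www.example.com/a.html,1\n"
request = build_results_request(APPS_SCRIPT_ENDPOINT, csv_text)
print(request.get_method(), request.full_url)
```

On the receiving side, an Apps Script `doPost(e)` handler can read the posted body from `e.postData.contents` and write the parsed rows into a sheet.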
Alvaro Fernandez: I mean, the tool is obviously open source, because that's the way you can do things like that with it. The way we're going to do this internally is by running a React front end. You would upload a CSV through our web app, and that will be processed by Node on the back end, which will make all of the requests. Because it takes such a long time, the Node back end will handle it on its own: the front end basically says, "We'll get back to you." Obviously, you put in your email address beforehand, so we know where to send it. The front end says bye to you and you close the window. The server deals with it, and when it finishes, it emails you with the output CSV attached. The reason why we didn't release the app this way is that we would have to host everybody's CSVs. That's why it's in such a raw form: that way you have the pieces, you can run it on your PC, and you have so many possibilities. Any of the three that you mentioned will work. I would definitely go for the last one. And yeah, basically, in Data Studio you can have all this data segmented by rows and see whatever you want. So it definitely can be used in more ways.
Jose Hernando: Yeah, I think it's also important to bear in mind that, although you definitely want to visualize some of this data, you want this raw data to be as insightful, as actionable, and as profitable as possible in the least amount of time. The time between getting the data, giving the insight to the client, and getting it executed is something you want to shrink to the minimum. So I think if we ever use visualization, it would be something that is part of that React front end.
Alvaro Fernandez: Apart from that, obviously, with Data Studio the visualization environment is ready for you. It's available if you need to take some sort of action on the data. Speaking of CSVs, I should explain why it works this way. Because of the time that it takes for the data to be processed before you receive your results, you have to end communication with the front end; you can't be waiting for two hours on the front end. And because the results CSV comes through email as an attachment, we lose the ability to be nice to you on the front end and give you a link that shows the results visually.
Noah Learner: One last thing that I wanted to share about Screaming Frog, which I'm sure everyone listening probably knows: if you hook up Google Analytics and Search Console, and maybe also Moz or Majestic, the thing that's really cool about the export is that it unifies everything for you by page, so you don't have to deal with all of the URL mapping bullshit that you generally have to deal with to unify GA and Search Console data. I didn't know if everyone knew about that, but I thought it'd be useful to share. Gentlemen, this has been great. You've spent so much time with us. Is there anything that you want to share with us as a parting shot? Because I feel like we've already taken so much of your time.
Jose Hernando: Not much, I mean. Honestly, this is just one of the innovations that we do here at Builtvisible. As we said before, we have innovation at the core of what we do every day, and we try to share with the community as much as possible, which we've done through the blog. So, if you let us plug it: every new innovation that we want to share with the community, we share on our blog, so that's always a good resource for everyone to check what we're up to.
Noah Learner: Gentlemen, this has been amazing. Jordan, any last questions from you?
Jordan Choo: No, I'm good on my end. Very insightful.
Noah Learner: I mean, I'm really excited to share this with the community. I think that they're going to get a huge kick out of it. We really appreciate you and we appreciate your time. I know it's getting late there. Any big plans? You're going to go hit the pub?
Jose Hernando: It's actually my birthday
Noah Learner: Time right.
Jose Hernando: Yeah, it's my birthday, my 30th birthday, so I've definitely gotta hit the pub today.
Noah Learner: Oh, that's super exciting. Well done.
Jordan Choo: Happy birthday today.
Jose Hernando: Yeah.
This is really a real top
Noah Learner: you know? Yeah, yeah.
Jose Hernando: A great way to celebrate.
Noah Learner: This has been a great episode of Agency Automators. Again, I'm your host Noah Learner, with Jordan Choo. We're so stoked to have had the crew from Builtvisible with us. Jose and Alvaro, you guys are amazing. Thank you so much. Have a great weekend and we'll talk soon. Thanks, guys.
Jose Hernando: Bye