This video is part of our Weekly Wisdom series which features experts on a variety of topics.
Hi guys, it's Ross here from Type A Media, welcome to another Weekly Wisdom video. Type A Media are known for our four day work weeks, and the way we can get away with that is by cutting out all the fat from our daily processes. So in this Weekly Wisdom video, I am going to go through ways I can save a second here, a minute here, an hour here. With some of these tips and hacks, as well as some tools that we used to kind of cut the fat and get straight to the point — so we can get the data in and analyze it, and more importantly, get it live on our client's site so we can start ranking them.
So without further ado, let's get into it. One of the things I find that people spend a lot of time on is finding all the URLs that ever existed for their website. Now typically they may crawl the site to find what is on there and maybe look at the XML site map. They may be jumping to Search Console, have a look at that. Maybe jumping into Majestic to see all the pages with links, and that is cool but what if the client has been migrated like 6 times over the last 12 years? Do you have that data? Is it sitting anywhere? Of course, you could go to something like archive.org, and you could search that and start pulling that out, but that is a bit slow as well, so I am going to show you a really fast way to put all this stuff together.
On the topic of archive.org, did you know that there is an endpoint to pull CSVs from it? So what you can actually do is construct this entire URL. We are using my website, typeamedia.net, match type is a domain. You can see here a URL limit; I can actually say 'give me 10,000, 100,000 —you name it — as many URLs as you want or put it in a CSV, and do it from 2007 to 2018 and show me only things that had a 200 status code had a response. That is kind of cool, but I can't really do anything with the information unless it is in a spreadsheet; we all love a little bit of Google Sheets. What we are going to do is we are going to import the data — I need to put an equal sign at the start of that, so it knows that it is actually a formula — and once you do import data, make sure that you wrap it in parenthesis and all of sudden there are all the URLs. So what's next?
I am going to get my sitemaps, if you use Yoast, and I absolutely love Yoast, you will probably get multiple site map URLs. What you want to do is set something up where you can just blast that in a spreadsheet. Now Import XML does that for you, but the problem with Import XML, it wouldn't give me a lovely clean list like this if I go 'Import XML'. What it is going to do, is it is actually going to give me the entire thing with all of the formatting, or it is going to just throw up a big ol' error. So we don't obviously want that, so when I do Import XML, get a little bit of RegEx in here to chop some of that out. Now would be a good time to pause the video and just take a note of what this is; I am not going to explain it, it is a little bit outside of the scope of this video. But ultimately it lets you strip out all of the unwanted stuff from your XML site map.
Next up, Majestic. Now I really love Majestic, and it is mostly because they have got APIs into pretty much everything, so there is an add-on for Google Sheets. Go into the add-on, put your domain name in and we want to see the top pages — both historic and fresh. Hit 'Get data' and then you can see these new tabs appearing because it is pinging the API and it is dumping everything into Sheets. Beautiful.
But those are two separate sheets; I want them together, so what I am going to do is use this formula called Unique. So if we go 'Unique', because we are stacking two different things on top of one another and not just looking for one unique list, we need to turn this into an array. We are going to go 'curly brackets' and I'm just going to take the first three columns — 'semicolon', which we use inside of array inside of Sheets. Go to the next one, it is the same thing, close our curly brackets off like this, and then on we go. Alright, so that has pulled in all of the Majestic data in there which is fantastic.
Next, the fan favorite, it is, of course, S-E-M or should I say SEMrush. So looking at add-ons, I go into super metrics and launching my site bar, and what we are going to do is we are going to drop our domain name in. The report that we want is the "domain organic search keywords" and then we hit 'apply', and that is going to pull everything in for us.
Google Webmaster Tools
Alright, so next up, we want to get Google Webmaster Tools, note that I said 'Webmaster Tools', not 'Search Console' because I have been doing this for more than two seconds. Okay, so how do we get Search Console in? Again, it is our favorite tool; it is going to be super metrics, but we are just going to change the data source to Search Console. Okay, dropping in your website, pulling it in as normal, make sure you put your dates as last year, so it pulls in loads and loads of stuff.
I want to get the search queries with the full URLs, hit 'Apply changes' and in it comes. Alright, and here is all the stuff that we rank for; I'm actually bothered with that and bothered with this landing page data. Look at all of that lovely duplication. So we have got all these different sources and now what we want to do is bring them all together in a nice kind of singular format and remove all the duplication, so the question is how do we do that?
Well, we are going to go back to the wonderful formula, my favorite formula, Unique. We are literally just going to go 'unique' here, open with a normal bracket and then remember because we are about to do an array, which is multiple formulas stacked on one another, we are going to have a curly bracket here, and we are literally going to go to absolutely everything. We need to start with the archive.org; pull that in. We are then going to go into the sitemap; pull that in. We are then going to go to all Majestic; pull that in. Next we are then going to go into SEMrush and pull all that in, and then we are going to go into Webmaster Tools, formerly known as Webmaster Tools now is Search Console, pull that in, and we are going to close that off with a curly bracket and a normal one, hit the 'enter' button and there we go.
So what we have now got as a completely ordered list of every single URL that has ever existed on our website and every single duplicate removed. I think I can probably say with a high degree of certainty that that is all the URLs that have ever existed for my website. I can now do some really cool things with the list. So an example of what I would do with this data, well I could probably go to the frog (Screaming Frog). I would paste in a list, and I probably would want them to crawl it because when they finish, I am going to pull a report and I am going to see all of my redirect and canonical chains. After tons and tons of redirects before lots of site migrations, I can see where all the problems lie.
That is SEO speed hacks, tips, and tricks. Done.