
13 Screaming Frog Features and Tricks You Didn't Know

Technical SEO is certainly one of the most demanding (in terms of technical knowledge) and time-consuming parts of any SEO campaign, especially on large projects where reviewing thousands or millions of URLs becomes almost impossible.


That's why understanding how to use automation is so important for technical SEO professionals: it lets them focus on analysis instead of data collection.


In today's article, we're going to explore thirteen Screaming Frog features that will save you hundreds of hours and make working with large websites a breeze.


1. Crawl large websites with database memory instead of RAM

By default, Screaming Frog uses your computer's memory to store your crawl data. Although this makes the process really fast, it's not the best option to crawl large websites because your RAM doesn't have the capacity to store so much data.


However, switching to database storage mode is very easy and, if your computer uses a solid-state drive (SSD), just as fast. Go to Configuration > System > Storage Mode and select Database Storage from the drop-down menu to make the switch. You can also change the path where you want to store the data.


2. Use the Request API Data feature to retrieve data retroactively from Analytics, Search Console, and SEO tools


Linking Google Analytics, Search Console, Moz, Ahrefs, PageSpeed Insights, or any other data source gives you additional insight when performing a site audit – and saves you a lot of time.


However, it is very common to only get access to these tools after performing the first crawl, or to simply forget to link them to Screaming Frog.


For these cases, Screaming Frog has a "Request API Data" button that allows you to pull data from these APIs and match it to the corresponding URLs in the current crawl.


3. Audit redirects during and after site migrations


One of the most time-consuming and delicate tasks when migrating a site or domain is verifying that all URLs have been correctly redirected to the new site. A poor review could mean hundreds of missing pages, 404 errors, broken internal links, and more.


However, doing this manually can be tedious or even impossible with millions of URLs. To automate this process as much as possible, you can use list mode to upload a list of old URLs and follow the redirect chain of each one.


For this to work, tick the "Always Follow Redirects" box.


Another option is to upload the list of old URLs after migration and look for a response code other than 301. For example, any URL that returns 404 still needs to be redirected.
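If you'd rather spot-check a handful of old URLs outside of Screaming Frog, a few lines of Python can follow each redirect chain for you. This is only a rough sketch, not a Screaming Frog feature: the URLs are placeholders, and it assumes the requests library is installed.

```python
# Rough sketch: follow the redirect chain of a few old URLs with requests.
# The URLs are placeholders; swap in your pre-migration list.
import requests

old_urls = [
    "https://example.com/old-page/",
    "https://example.com/old-category/old-post/",
]

for url in old_urls:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    # resp.history holds every hop of the redirect chain, in order
    hops = [f"{r.status_code} {r.url}" for r in resp.history]
    hops.append(f"{resp.status_code} {resp.url}")  # final destination
    print(url, "=>", " -> ".join(hops))
```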


4. Create and debug sitemaps in Screaming Frog

Screaming Frog (SF) makes handling sitemaps faster and more efficient by enabling the automation of otherwise time-consuming, manual tasks.


A common technical SEO problem for non-tech-savvy businesses is the lack of a sitemap. This XML file helps Google and other search engines discover new pages, even if they're orphaned, and provides additional information such as hreflang tags and relationships between pages.


To create a new sitemap, crawl the site you are working on (in spider mode) and then go to Sitemaps > XML Sitemap and highlight the pages you want to include in the XML file. If the pages already have hreflang data, then Screaming Frog will add these links to the XML file.


By default, SF only adds pages that return a 2xx response. However, you can add or exclude pages based on status code, index status, canonical, pagination, PDF files, last modified, and so on.


Although you should check the resulting file again, it is a good place to start. And for small websites, the raw file created by Screaming Frog is sufficient.


But what if there is already an existing sitemap and you need to make sure it is implemented correctly? In such cases, you can switch Screaming Frog to list mode and upload or enter the site's sitemap.xml URL for a crawl.


The same goes if you want to manually edit a sitemap and check it for redirects, broken links, etc.
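If you want to sanity-check a sitemap without opening Screaming Frog at all, the same idea can be sketched in a few lines of Python: download the sitemap, pull out every <loc> entry, and flag anything that doesn't return a 200. The sitemap URL below is a placeholder, and the snippet assumes a plain urlset sitemap (not a sitemap index) plus the requests library.

```python
# Rough sketch: flag sitemap URLs that don't return a 200 response.
# Assumes a standard urlset sitemap; the URL is a placeholder.
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

for url in urls:
    # HEAD keeps it light; some servers may require GET instead
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:
        print(status, url)  # redirected, broken, or blocked URL
```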


5. Find broken and redirected internal links


In many cases, cleaning up and optimizing your internal links can be a big quick win for your SEO campaign. However, there is a lot of crawl data that can be distracting.


To help you focus on this task, simply go to Configuration > Spider and disable images, CSS, JavaScript, and any other files that don't contain internal links. This is mainly to save computing power and speed up crawling.


Going a step further, you can even restrict the data extracted from the URLs, but that shouldn't be necessary unless you're working with a very large number of URLs.


Then crawl the site in spider mode and sort the Internal tab by status code. You can now select any URL and check the links pointing to it, along with their anchor text, in the "Inlinks" tab of the lower window.


Finally, export all internal links with problems by going to Bulk Export > Response Codes and exporting:


Redirection (3xx) Inlinks

Client Error (4xx) Inlinks

Server Error (5xx) Inlinks


You can also export the full "All Inlinks" list and filter it down to the non-200 response codes later.
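If you go that second route, the exported file can be filtered with a short script instead of a spreadsheet. The sketch below assumes a CSV export with "Source", "Destination", and "Status Code" columns; adjust the file name and headers to whatever your export actually contains.

```python
# Rough sketch: keep only inlinks whose target doesn't return a 200.
# File name and column headers are assumptions; match them to your export.
import csv

with open("all_inlinks.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

problem_links = [r for r in rows if r.get("Status Code") != "200"]

for row in problem_links:
    print(row["Source"], "->", row["Destination"], f"({row['Status Code']})")
```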


6. Exclude or include subdirectories or subdomains to limit your crawl


If you're working with a large website, you'll probably want to crawl the entire site to check for problems. However, this can put a lot of strain on both you and your machine once you're faced with the enormous amount of data you need to sort through.


In these cases, it is better to split the site into smaller parts and crawl only those pages. You can use regular expressions (RegEx) to tell Screaming Frog which subdirectories and subdomains you want to crawl, limiting your crawl to one section at a time.


Let's say you only want to work with blog content that all lives in the /blog/ subfolder. Go to Configuration > Include and use the wildcard expression https://example.com/blog/.* so that SF only crawls URLs that start with that path.


You can do the same with the Exclude feature to stop Screaming Frog from crawling subdirectories that match your pattern.
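Before launching a long crawl, it can also be worth testing your Include or Exclude pattern against a few known URLs. Screaming Frog treats these patterns as regular expressions matched against the full URL, which re.fullmatch approximates in this small Python sketch; the pattern and test URLs are illustrative.

```python
# Rough sketch: sanity-check an Include pattern before crawling.
# Pattern and URLs are illustrative only.
import re

include_pattern = r"https://example\.com/blog/.*"

test_urls = [
    "https://example.com/blog/how-to-crawl/",   # should be crawled
    "https://example.com/products/widget-1/",   # should be skipped
]

for url in test_urls:
    matched = re.fullmatch(include_pattern, url) is not None
    print(f"{url} -> {'crawled' if matched else 'skipped'}")
```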


7. Check only "clean" HTML pages


If you want to optimize your pages' metadata, create a list of URLs, or perform any other task that requires only the HTML pages, you can instruct Screaming Frog to crawl only those pages and ignore other file types.


Go to Configuration > Spider, uncheck all the boxes in the Resource Links section, and start the crawl as usual.


However, if you have a lot of URLs with parameters, you may want to go a little further and use a regular expression to exclude them all, or a specific pattern such as:


.*\?.*


to exclude all URLs that contain a question mark (the question mark must be escaped with a backslash for it to work).

You can also exclude a specific parameter, such as a price filter, by adding it to the expression. For example, .*\?price.* excludes any URL containing the price parameter.

For more Exclude ideas, see the Screaming Frog configuration guide.


8. Audit your structured data within Screaming Frog


Screaming Frog can help you quickly find problems with structured data (SD) by comparing your implementation against Google's and Schema.org's guidelines. Before starting your crawl, go to Configuration > Spider > Extraction > Structured Data and tick the format your SD uses (JSON-LD, Microdata, or RDFa), or all of them if you're not sure how it was originally implemented.


After crawling your site, the Screaming Frog reports are populated with the pages that contain SD, pages that lack structured data, and pages with errors or warnings.


Finally, under Reports > Structured Data, you can export all validation errors and warnings.
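For a quick spot-check of a single page outside Screaming Frog, you can also pull its JSON-LD blocks directly and make sure they at least parse. The sketch below only covers JSON-LD (not Microdata or RDFa), the URL is a placeholder, and it doesn't validate against Google's or Schema.org's rules; it just catches broken markup.

```python
# Rough sketch: extract JSON-LD blocks from one page and check they parse.
# The URL is a placeholder; this doesn't replace full validation.
import json
import re
import requests

html = requests.get("https://example.com/product/blue-widget/", timeout=10).text

blocks = re.findall(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    html,
    flags=re.DOTALL | re.IGNORECASE,
)

for raw in blocks:
    try:
        data = json.loads(raw)
        label = data.get("@type", "unknown @type") if isinstance(data, dict) else "multiple objects"
        print("OK:", label)
    except json.JSONDecodeError as err:
        print("Invalid JSON-LD:", err)
```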


9. Checking for crawlability issues


Search engines need to be able to crawl, index, and rank your pages in search results.


There are two main ways Screaming Frog can help you check how search engines perceive your websites:


  1. Go to Configuration > User-Agent and select the bot you want to impersonate. For example, you can switch to the desktop version of Googlebot and collect the data the way Google would (see the sketch below).
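If you just want to see what a single page serves to a crawler versus a regular visitor, you can also compare the two responses directly. The user-agent string and URL in this sketch are illustrative; it simply requests the same page twice with different User-Agent headers.

```python
# Rough sketch: compare what a Googlebot-style UA and a browser UA receive.
# UA string and URL are illustrative.
import requests

URL = "https://example.com/"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

as_bot = requests.get(URL, headers={"User-Agent": GOOGLEBOT_UA}, timeout=10)
as_browser = requests.get(URL, timeout=10)

print("Googlebot response:", as_bot.status_code, len(as_bot.content), "bytes")
print("Browser response:  ", as_browser.status_code, len(as_browser.content), "bytes")
```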


10. Using the Link Score algorithm to improve internal linking

Screaming Frog's Link Score is a metric that calculates the value of a page based on the number of internal links pointing to it. This is a helpful metric for finding pages that could make better use of internal linking to improve performance.


To get this data, you need to perform a crawl analysis after the first crawl. Simply go to Crawl Analysis in the menu and click Start.


The Link Score can be found in the Internal tab as a new column.
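Link Score itself is calculated internally by Screaming Frog, but if you only need a rough proxy, counting how many unique pages link to each URL in an exported inlinks CSV gets you part of the way toward spotting under-linked pages. The file name and column headers below are assumptions.

```python
# Rough proxy (not Screaming Frog's Link Score): count unique internal
# linking pages per destination from an exported inlinks CSV.
# File name and column headers are assumptions.
import csv
from collections import defaultdict

linking_pages = defaultdict(set)

with open("all_inlinks.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        linking_pages[row["Destination"]].add(row["Source"])

# Show the 20 URLs with the fewest unique internal links pointing at them
for url, sources in sorted(linking_pages.items(), key=lambda kv: len(kv[1]))[:20]:
    print(len(sources), url)
```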


11. Find Page Speed Issues with Screaming Frog


Any page speed optimization starts with determining the pages you need to focus on, and Screaming Frog can help with that. Crawl your website and click on the Response Codes tab. The speed data is displayed under the Response Time column.


However, if you want to do an in-depth analysis, you can generate a PageSpeed Insights API key (free, with your Google account) and connect it to Screaming Frog. Once the crawl is complete, you'll have access to a new PageSpeed Insights tab where you can see all the page performance data for the crawled URLs.
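The same API can also be queried directly for one-off checks. Below is a hedged sketch of a single call to the PageSpeed Insights v5 endpoint; the API key and page URL are placeholders, and the JSON path to the performance score may change if Google updates the response format.

```python
# Rough sketch: query the PageSpeed Insights v5 API for a single URL.
# API key and page URL are placeholders.
import requests

API_KEY = "YOUR_API_KEY"
PAGE_URL = "https://example.com/"

resp = requests.get(
    "https://www.googleapis.com/pagespeedonline/v5/runPagespeed",
    params={"url": PAGE_URL, "key": API_KEY, "strategy": "mobile"},
    timeout=60,
)
data = resp.json()

# Lighthouse reports the performance score as a value between 0 and 1
score = data["lighthouseResult"]["categories"]["performance"]["score"]
print(f"Performance score for {PAGE_URL}: {score * 100:.0f}/100")
```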


12. Check that the Google Analytics code is set up correctly across the entire site


Screaming Frog can also make it easier to verify the Google Analytics (GA) implementation by crawling the website and finding all the pages that contain the code snippet.


First, create two custom search filters: one for all pages that contain your UA tracking ID, and a second for those that don't. To speed up the process, you can also exclude all files that are not HTML pages from the crawl by unchecking the boxes under Configuration > Spider.


However, if your tracking code is loaded through a script URL instead, you can add that URL to the filters to find the pages that contain it. In this example, we search the /bath-body/ subfolder of Zogics to find the code snippet.
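For a handful of pages, the same check can be done with a few lines of Python: fetch each page and report whether the tracking ID (or the snippet URL) appears in the HTML. The tracking ID and page URLs below are placeholders.

```python
# Rough sketch: check a few pages for the presence of a tracking ID.
# Tracking ID and URLs are placeholders.
import requests

TRACKING_ID = "UA-12345678-1"  # or a GA4 "G-XXXXXXXXXX" ID, or the script URL
pages = [
    "https://example.com/",
    "https://example.com/about/",
    "https://example.com/contact/",
]

for url in pages:
    html = requests.get(url, timeout=10).text
    status = "found  " if TRACKING_ID in html else "MISSING"
    print(f"{status} {url}")
```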


13. Crawl JavaScript pages to find rendering issues

To end our list on a high note, Screaming Frog can now render JavaScript pages. Go to Configuration > Spider > Rendering, select JavaScript from the drop-down menu, and tick Enable Rendered Page Screenshots.


This feature allows you to compare plain-text and JavaScript crawls. This is important because you want to make sure that all important links and resources can be crawled and accessed by search engines.


That said, Screaming Frog can render JavaScript much better than search engines can, so it won't be a fair representation of what Google can see IF and WHEN they eventually render your JS.


That's why we recommend using a solution like Prerender to ensure that your single-page applications and JavaScript-heavy pages are crawled and indexed correctly. By simply installing our middleware, users and bots get the correct version of your website without any additional effort.
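If you want a quick way to confirm that bots are actually receiving pre-rendered HTML, you can request the same page with a crawler User-Agent and a browser User-Agent and compare how many links each raw response contains. This is only a rough sketch with illustrative URLs and UA strings: on a JavaScript-heavy page with working prerendering, the bot response should contain far more <a href> links than the unrendered shell.

```python
# Rough sketch: compare link counts in the HTML served to a bot UA vs a
# browser UA. URL and UA strings are illustrative.
import re
import requests

URL = "https://example.com/"
BOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

def link_count(user_agent: str) -> int:
    html = requests.get(URL, headers={"User-Agent": user_agent}, timeout=15).text
    return len(re.findall(r"<a\s[^>]*href=", html, flags=re.IGNORECASE))

print("Links served to bots:    ", link_count(BOT_UA))
print("Links served to browsers:", link_count(BROWSER_UA))
```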