Technical SEO influences how your site is accessed and how your content is consumed.
Technical SEO is defined as the configurable components of a website's structure that influence how search engines and users access and interpret the content of a site.
For search engines, these technical components can impact how your site is crawled and indexed. They can also affect how your site displays in the Search Engine Results Page (SERP).
For humans, these technical components can impact the user experience on your site as they consume your content.
Some of the Technical SEO components I review are:
- Page Speed
- Robots Meta Directives
- URL Structures
- URL Redirects
- Duplicate Content & Canonical Tags
- Structured Data
- Site Status Codes
HTTPS
When you connect to a non-secure website, all of the communication between your computer and the server can be accessed by other parties. When an SSL/TLS certificate is added to a site, the connection between your computer and the website is encrypted, so no one can see the data transmitted between your computer and that site.
It is important that your website be configured for HTTPS for these reasons:
- Google still considers HTTPS as a ranking signal.
- HTTPS allows you to see full referral data in Google Analytics. GA does not show referral data between HTTP and HTTPS websites; traffic of this nature is instead shown as direct traffic to your website. This in turn could skew your understanding of how people find their way to your website.
- People are becoming more cautious of engaging on websites that are not HTTPS protected.
If your website is already running HTTPS, it is possible that some of your site's assets are still being accessed via non-HTTPS links. This mixed content can trigger browser warnings and send a negative signal to Google's ranking algorithm.
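A quick, hedged way to spot such assets is to search your site's source files for hard-coded http:// references (the ./site path and file layout below are assumptions for illustration):

```shell
# List hard-coded http:// references in local HTML files.
# ./site is a placeholder for your site's document root.
grep -rn 'http://' ./site --include='*.html'
```

Any page flagged here is loading an asset over an insecure link even if the page itself is served via HTTPS.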
Page Speed
Research shows that most people expect a website to load in less than 2 seconds. Research also shows that for every additional second of page load time, conversions can drop by up to 20%.
The faster your site is, the more likely people will visit more of the pages on your site and the more likely they are to link and share your content.
Also, Google's spiders will be able to crawl more pages in the time allotted for crawling your site.
There are 3 important factors when monitoring page speed:
- Time to First Byte – how long it takes the server to respond to the first request for information
- Total Download Time – how long it takes to fully download all of the page elements
- Full Render Time – how long it takes to download everything needed for the page and render it to visual completion
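The first two metrics can be approximated from the command line with curl's timing variables (example.com is a placeholder; full render time requires a browser-based tool such as Lighthouse):

```shell
# time_starttransfer approximates Time to First Byte;
# time_total is the total download time for the HTML document.
# Note: this times only the HTML, not images, CSS, or scripts.
curl -s -o /dev/null \
  -w "TTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n" \
  https://example.com/
```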
Note: Google has announced that as of July 2018, mobile page speed is a ranking factor in their mobile search results. However, the intent of the search query is still a very strong signal, so a slow page may still rank highly if it has great, relevant content.
XML Sitemap
Google’s search engine spiders discover content on your website by crawling your pages. Another way they do this is by crawling your sitemap.xml file.
The sitemap contains a list of all the URLs on your site. The search engine spiders access the file and follow the links to help create an index of your site for Google.
Submitting your sitemap.xml in Google Search Console will:
- Ensure that search engine spiders are able to find and access the file quickly to better understand the structure of your website
- Allow you to compare the number of indexed pages with the number of submitted pages listed in your sitemap. (A mismatch between the indexed page count and the sitemap's submitted URL count could indicate an issue with your site configuration)
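A minimal sitemap.xml looks like the sketch below (example.com and the dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/technical-seo</loc>
    <lastmod>2018-07-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2018-06-15</lastmod>
  </url>
</urlset>
```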
Robots.txt
The robots.txt file is stored in the root of your website directory. This file holds information that instructs search engines about which pages they should crawl.
A robots.txt file is composed of disallow and allow statements that indicate which sections of the website search engines should and shouldn’t crawl. Through the use of user-agent statements, you can provide specific allow and disallow statements to particular search engines.
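As a sketch, a robots.txt using these statements might look like this (the paths and domain are placeholders):

```text
# Applies to all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/public-page.html

# Applies only to Google's crawler
User-agent: Googlebot
Disallow: /print/

Sitemap: https://example.com/sitemap.xml
```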
URL Structures
URLs are made up of the following:
- Protocol (http or https)
- :// (a colon followed by two slashes)
- Subdomain (www is the most common – you will notice I omit the subdomain in favour of a domain name only for my website)
- Domain name (your website’s custom name)
- TLD (Top level domain name such as .com, .ca, .edu, etc.)
- / slug (can be a page folder or the address of the specific page)
The URL for this page is – https://darrylrobinsonkeys.com/technical-seo
Search engines look at subdomains as separate entities, so authority is calculated independently. This means that https://www.darrylrobinsonkeys.com/technical-seo and https://darrylrobinsonkeys.com/technical-seo are considered separate entities by Google, each with individual content and link authority.
- URLs should be keyword focused, unique and as short as possible
- When using multiple words in the slug, hyphens should be used to separate the words
- Don’t use stop words such as “and” or “the”
- Don’t duplicate the keyword in the slug
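Applying these guidelines, a hypothetical before-and-after for a page about technical SEO audits (example.com is a placeholder):

```text
Avoid:  https://example.com/The-Technical-SEO-and-the-Audit-Guide-SEO
Better: https://example.com/technical-seo-audit
```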
URL Redirects
A redirect is a directive that tells browsers and search engines that a page has moved to a new address.
A redirect statement consists of the following:
- Source (the original URL)
- Type of redirect
- Destination (the new URL)
The main types of redirects are:
- 301 permanent redirect – tells the browser or search engine that the page has permanently moved to a new URL (SEO page authority is passed to the new URL)
- 302 temporary redirect – tells the browser or search engine that the move is temporary (SEO page authority is NOT passed to the new URL. This type of redirect is used only in temporary situations like site maintenance)
When auditing redirects:
- Analyze all past 301 & 302 redirects and ensure they are still valid
- Ensure there are no redirects that point to other redirects (redirect chains)
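On an Apache server, redirects with this source/type/destination structure are often declared in an .htaccess file. A sketch, assuming mod_alias is enabled (the paths are placeholders):

```apache
# 301: the page has permanently moved; authority passes to the new URL
Redirect 301 /old-page /new-page

# 302: temporary move, e.g. during maintenance; authority is not passed
Redirect 302 /sale-page /maintenance-notice
```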
Duplicate Content & Canonical Tags
Google classifies duplicate content as “appreciably similar” [https://support.google.com/webmasters/answer/66359?hl=en] content in more than one location on the Internet.
Duplicate content can occur in a number of different ways such as:
- Print only content – This is content that has been formatted on a page to allow the specific printing of the content. In essence, a duplicate of the page containing the web viewable version of that content
- www and non-www domain versions of a website – in some cases your website structure may have versions of content indexed under www and non-www linked versions of your website. Google registers these as separate locations on the internet
- HTTP & HTTPS versions of a website – in some cases your website structure may have versions of content indexed under both protocols. Google registers these as separate locations on the internet
- Content scraped from your site and used in other locations across the web
How to fix duplicate content issues:
Canonical tags tell search engines that a particular URL contains the original content of a page.
Each page with original content should be coded with a canonical tag.
- <link rel="canonical" href="http://www.example.com/authoritative-page/" />
Robots Meta Directives
Robots meta directives, or robots meta tags, are a way to instruct search engine spiders how to crawl or index specific web page content.
This is in contrast to the robots.txt file, which instructs spiders which of a website’s pages to crawl.
Robots meta directives provide more exact instructions on how to crawl and index a page’s specific content.
A meta robots tag can be added to the HTML head of each individual page that should be excluded from a search engine’s index.
<meta name="robots" content="noindex" />
The robots meta tag in the above example instructs most search engines not to show the page in search results. The value of the name attribute (robots) specifies that the directive applies to all crawlers. To address a specific crawler, replace the robots value of the name attribute with the name of the crawler that you are addressing. Specific crawlers are also known as user-agents (a crawler uses its user-agent to request a page). Google’s standard web crawler has the user-agent name Googlebot. To prevent only Googlebot from indexing your page, update the tag as follows:
<meta name="googlebot" content="noindex" />
Note: Google cautions against restricting crawl access to duplicate content on your website. (Search engines like to be able to see everything in case you have made an error in your code. It allows them to make a [likely automated] “judgment call” in otherwise ambiguous situations.)
Duplicate Content Note:
- Most duplicate content is unintentional
- Google rewards unique content; it does not penalize duplicate content
Structured Data
Structured data is on-page markup that helps search engines better understand the information on your website. Structured data tells search engines what your content means. This in turn can gain your pages additional visibility in the SERP with elements such as rich snippets and knowledge boxes.
Schema.org is the most commonly used structured data markup. Schema markup can be added to almost any data on a website; schema types for reviews, recipes, ratings, products, events, and local businesses are just a few of the many schemas available.
Google’s Structured Data Testing Tool allows you to identify the existing schema you have on a page of your website.
Of course, Google can crawl your pages to understand your content; however, when your content is marked up with a schema, Google’s crawlers can interpret your website’s information far more easily.
There are 2 ways to add structured data on a site:
- Microdata Markup (uses inline annotations on the corresponding HTML page elements)
- JSON-LD (a JavaScript notation block, usually placed in the page head, that is kept separate from the visible HTML)
Google recommends the JSON-LD method.
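As an illustration, a JSON-LD block for a local business might look like the sketch below (all values are placeholders); it goes inside the page’s head:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Consulting",
  "url": "https://example.com/",
  "telephone": "+1-555-555-0100",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Example Street",
    "addressLocality": "Toronto",
    "addressRegion": "ON",
    "addressCountry": "CA"
  }
}
</script>
```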
An expert SEO specialist will always check each page of a website for errors and incorrect implementations of schema markup to make sure it is implemented correctly. Most often this is performed using Google’s Structured Data Testing Tool.
Site Status Codes
Each page on a site returns a status code to the browser or spider that requests it. Status codes can influence SEO signals.
How the spider interprets a page status code:
- 200 All Good – informs the spider it can crawl and index the page
- 301 Permanent Redirect – tells the spider the page has moved permanently to a new page; the spider will crawl the new page and transfer all link equity from the old page to the new one
- 302 Temporary Redirect – tells the spider the page has moved temporarily; the spider will keep the original page indexed and link equity does not get passed to the new page
- 404 Not Found – tells the spider the intended page does not exist; spiders will eventually drop the page from the Google index
- 500 Internal Server Error – the server failed to return the page, so nothing is visible to spiders
- 503 Service Unavailable – tells spider to come back later to crawl this site
How a web browser interprets a page status code:
- 200 All Good – browser can display the page
- 301 Permanent Redirect – browser will seamlessly redirect visitors to the new page
- 302 Temporary Redirect – browser will seamlessly redirect visitors to the new page
- 404 Not Found – browser will display a 404 Page Not found message
- 500 Internal Server Error – browser will display a cryptic 500 Server Error message
- 503 Service Unavailable – browser will display a cryptic 503 Service Unavailable message
A good technical audit will include a check for the status codes for every page on a site.
Pages with a code other than 200 will be flagged as pages for further analysis to ensure they are not impacting website ranking opportunities.
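A quick way to spot-check the status code a single page returns from the command line (example.com is a placeholder):

```shell
# Prints only the HTTP status code the server returns, e.g. 200 or 404.
curl -s -o /dev/null -w "%{http_code}\n" https://example.com/
```

Repeating this over a list of URLs is the basis of a site-wide status-code audit.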