Reference Guide

Audit Lexicon

Every term used in BotWatcher.ai reports — explained twice. Once for the developer who needs the spec, and once for anyone who just wants to know what it means and whether to care. 29 terms across 5 categories.

Developer — precise technical definition
In Plain English — analogy-based explanation
Why it matters — one-line impact statement
Categories: Speed · Security · SEO · Bot Access · Result Integrity
Speed
How fast your server responds and how efficiently content is delivered.
TTFB
speed
Developer
Time To First Byte — milliseconds elapsed between the client sending an HTTP request and receiving the first byte of the response. Measured at the network layer, not in a browser.
In Plain English
How long your site takes to answer the door. A fast TTFB means the server picks up quickly; a slow one means it's still getting dressed when visitors arrive.

Why it matters: Google uses TTFB as a direct input to its Core Web Vitals ranking signal. Every extra 100ms compounds down the page-load chain.
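The definition above ("measured at the network layer, not in a browser") can be exercised directly at the socket level. A minimal sketch in Python, using a throwaway local server so it runs without network access; the audit tool itself may measure differently:

```python
import http.server
import socket
import threading
import time

# Throwaway local server standing in for the site under test.
class Quiet(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Quiet)
threading.Thread(target=server.serve_forever, daemon=True).start()

def measure_ttfb(host, port, path="/"):
    """Milliseconds from sending the request to receiving the first byte."""
    with socket.create_connection((host, port)) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        start = time.perf_counter()
        sock.sendall(request.encode())
        sock.recv(1)  # blocks until the first response byte arrives
        return (time.perf_counter() - start) * 1000

ttfb_ms = measure_ttfb("127.0.0.1", server.server_port)
server.shutdown()
```

Because the clock starts at `sendall` and stops on the first received byte, browser-side costs (parsing, rendering) never enter the measurement.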

Compression / gzip / br
speed
Developer
HTTP response body encoded with gzip or Brotli before transmission. Signalled by the Content-Encoding header. Reduces payload size by 60–85% for text-based resources.
In Plain English
Packing your suitcase tightly before a flight instead of throwing everything in loose. The content reaches the browser the same size once unpacked, but it traveled much faster.

Why it matters: Uncompressed responses waste bandwidth and slow first paint, especially on mobile. Search crawlers have limited bandwidth budgets per domain.
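The 60–85% figure is easy to reproduce for text-based payloads. A small sketch with Python's stdlib gzip (highly repetitive markup compresses even better than typical pages):

```python
import gzip

# Repetitive HTML stands in for a real text-based response body.
html = b"<html><body>" + b"<p>hello world</p>" * 500 + b"</body></html>"

compressed = gzip.compress(html)
ratio = 1 - len(compressed) / len(html)  # fraction of bytes saved on the wire
```

The server would send `compressed` with `Content-Encoding: gzip`; the browser inflates it back to the original `html` before parsing.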

Cache-Control
speed
Developer
HTTP response header that tells browsers and CDN nodes how long to store a resource locally before re-fetching. Key directives: max-age (seconds), no-store, public, private, stale-while-revalidate.
In Plain English
A timestamp on a carton of milk. It tells the browser: "you already have this — don't ask me again for X seconds." Without it, every page load makes a new round-trip to the server.

Why it matters: Missing Cache-Control means crawlers and users re-download unchanged resources every visit, inflating server cost and slowing page loads for everyone.
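The directive list above splits cleanly on commas, with an optional `=value` per directive. A minimal parser sketch:

```python
def parse_cache_control(value):
    """Split a Cache-Control header into a {directive: value-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, val = part.partition("=")
        directives[name.lower()] = val or None  # bare directives map to None
    return directives

policy = parse_cache_control("public, max-age=86400, stale-while-revalidate=60")
```

Here `policy["max-age"]` holds the freshness lifetime in seconds, and the absence of `no-store` means caching is permitted at all.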

ETag
speed
Developer
Entity Tag — a hash or version token the server attaches to a response. On subsequent requests, the client sends the ETag in If-None-Match; if content hasn't changed, the server returns 304 Not Modified with no body.
In Plain English
A fingerprint for a file. Instead of re-sending the whole file every time, the server checks: "is the fingerprint still the same?" If yes, it just says "nothing changed" and saves everyone bandwidth.

Why it matters: Without ETags, browsers and CDNs can't do conditional requests. Every cached resource re-downloads from scratch after expiry.
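The conditional-request handshake described above fits in a few lines. A sketch where the ETag is a content hash (real servers may derive it from mtime and size instead):

```python
import hashlib

def make_etag(body):
    # A content hash works as a strong ETag; quoting is part of the format.
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]

def respond(body, if_none_match):
    """Return (status, payload) for a conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""  # client copy is current: no body re-sent
    return 200, body

etag = make_etag(b"<html>v1</html>")
status, _ = respond(b"<html>v1</html>", etag)
```

The client stores the ETag from the first 200 response and echoes it back in `If-None-Match`; only when the fingerprint changes does the server pay to re-send the body.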

Redirect Chain
speed
Developer
A sequence of HTTP 3xx responses before the client reaches a final 2xx. Each hop costs at least one full request-response round trip, and often a fresh DNS lookup, TCP handshake, and TLS negotiation when the redirect crosses hosts. Measured here by counting Location header hops.
In Plain English
Being transferred from person to person on a phone call before reaching the right department. Each transfer takes time, and the more you have, the longer the caller waits.

Why it matters: Every hop in a redirect chain is a real-world latency cost. Google recommends zero redirect chains; each one is a small crawl budget penalty.
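Counting Location hops, as the definition describes, can be sketched against a toy hop map (the URLs below are illustrative, not from any real audit):

```python
# Toy hop map standing in for real Location headers: URL -> next URL, or None at a 2xx.
redirects = {
    "http://example.com/":      "https://example.com/",        # 301 to HTTPS
    "https://example.com/":     "https://www.example.com/",    # 301 to www
    "https://www.example.com/": None,                          # final 200
}

def count_hops(url, table, limit=10):
    """Follow Location headers until a non-redirect response, with a loop guard."""
    hops = 0
    while table.get(url) is not None:
        url = table[url]
        hops += 1
        if hops >= limit:
            raise RuntimeError("redirect loop")
    return hops, url

hops, final = count_hops("http://example.com/", redirects)
```

The two-hop chain above (http → https → www) is a common misconfiguration; a single redirect straight to the canonical host would halve the latency cost.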

Security
HTTP response headers that protect users and signal to browsers how to handle your content.
HSTS
security
Developer
HTTP Strict Transport Security — Strict-Transport-Security header tells browsers to always use HTTPS for this domain for max-age seconds. includeSubDomains and preload extend coverage. Valid threshold: max-age ≥ 31536000 (1 year).
In Plain English
A standing order to every browser: "never visit this address over plain HTTP, no matter what." Once set, even if a user types http:// the browser upgrades automatically before sending anything.

Why it matters: Without HSTS, someone on a coffee shop Wi-Fi can intercept the first HTTP request before the server redirects to HTTPS. HSTS closes that window.
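The one-year threshold from the definition can be checked with a small validator sketch:

```python
def hsts_ok(header, min_age=31536000):
    """True if Strict-Transport-Security commits to HTTPS for >= one year."""
    if not header:
        return False
    for part in header.split(";"):
        name, _, val = part.strip().partition("=")
        if name.lower() == "max-age":
            return val.isdigit() and int(val) >= min_age
    return False  # header present but no max-age directive

strong = hsts_ok("max-age=63072000; includeSubDomains; preload")
weak = hsts_ok("max-age=300")
```

A short `max-age` (minutes or days) defeats the purpose: the browser's standing order expires almost immediately, reopening the first-request interception window.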

Content-Security-Policy
security
Developer
CSP header defines an allowlist of sources from which the browser may load scripts, styles, images, fonts, and frames. Violations are blocked outright (enforce) or logged (report-only). Key bypass risk: unsafe-inline and unsafe-eval directives negate most XSS protection.
In Plain English
A strict guest list for your page. Only scripts and resources from approved addresses are let in. Anyone else — including malicious injected code — gets turned away at the door.

Why it matters: XSS (cross-site scripting) is consistently in the OWASP Top 10. A strong CSP is the most effective single mitigation. A CSP with unsafe-inline provides almost no protection.
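The "key bypass risk" above is mechanical to detect: look for the unsafe keywords inside the script-controlling directives. A simplified sketch (a real CSP evaluator must also handle fallback rules, nonces, and hashes):

```python
def risky_csp_directives(policy):
    """Flag script-src / default-src sources that neutralise XSS protection."""
    findings = []
    for directive in policy.split(";"):
        tokens = directive.split()
        if tokens and tokens[0] in ("script-src", "default-src"):
            for bad in ("'unsafe-inline'", "'unsafe-eval'"):
                if bad in tokens[1:]:
                    findings.append((tokens[0], bad))
    return findings

weak = risky_csp_directives("default-src 'self'; script-src 'self' 'unsafe-inline'")
```

A non-empty result means the allowlist exists on paper but injected inline scripts would still execute.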

X-Content-Type-Options
security
Developer
nosniff instructs browsers not to MIME-sniff the Content-Type header. Without it, a browser may execute a .jpg file as JavaScript if it contains valid JS. A necessary companion to correct Content-Type headers.
In Plain English
Forcing a file to behave as what its label says, not what's actually inside it. Without this, a browser might open a picture file and accidentally run it as a program.

Why it matters: MIME sniffing is a browser-level execution bypass. One missing tag can turn an uploaded image into an attack vector.

X-Frame-Options
security
Developer
Prevents the page from being embedded in an <iframe> on a different origin. SAMEORIGIN allows same-domain frames; DENY blocks all. Modern equivalent: CSP frame-ancestors directive, which this header supplements for older browsers.
In Plain English
A rule that says "you can't put my page inside your page." Stops attackers from overlaying an invisible version of your site on top of their own to trick users into clicking things they can't see.

Why it matters: Clickjacking is cheap to execute and completely invisible to the target user. This header costs nothing to add and eliminates the whole class of attack.

Referrer-Policy
security
Developer
Controls how much URL information is exposed in the Referer header when navigating away from your page. strict-origin-when-cross-origin sends only the origin (no path) for cross-origin requests and full URL for same-origin.
In Plain English
Deciding how much of your home address to share when you leave a building. You might say "I came from New York" without giving your street number to every shop you walk into.

Why it matters: Without a Referrer-Policy, your full page URLs — including query strings with user data — leak to every third-party resource loaded on your page.

Permissions-Policy
security
Developer
Formerly Feature-Policy. Controls browser feature access (camera, microphone, geolocation, payment, etc.) for the page and its iframes. Syntax: Permissions-Policy: camera=(), microphone=(self), geolocation=(self "https://maps.example.com").
In Plain English
An app permission screen for your webpage. It declares upfront which device features the page is allowed to use, and blocks everything not on the list — even if a third-party script tries to access it.

Why it matters: Third-party scripts commonly attempt feature access beyond what your site needs. Without this header you have no control over what embedded code can request.

COOP (Cross-Origin-Opener-Policy)
security
Developer
Isolates the browsing context so cross-origin windows opened by your page cannot reference your window object and vice versa. Together with COEP, required for SharedArrayBuffer access; mitigates Spectre-class timing attacks.
In Plain English
Making sure that when a user opens a link from your page, the new tab can't peek back at yours and read sensitive data. Two separate rooms that can't talk through the walls.

Why it matters: JavaScript timing attacks can read memory across browser contexts. COOP + COEP together enable the hardware-isolated context needed to prevent this class of speculative execution vulnerability.
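A presence check across the headers covered in this section reduces to a set difference. A sketch (the header names are the standard ones; the audit's actual scoring presumably also inspects each header's value, not just its presence):

```python
# Security headers this section covers, in lowercase for case-insensitive matching.
EXPECTED = [
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
    "permissions-policy",
    "cross-origin-opener-policy",
]

def missing_security_headers(response_headers):
    """Return the expected headers absent from an HTTP response, in audit order."""
    present = {name.lower() for name in response_headers}
    return [h for h in EXPECTED if h not in present]

missing = missing_security_headers({
    "Strict-Transport-Security": "max-age=63072000",
    "X-Content-Type-Options": "nosniff",
})
```

For the example response, five of the seven expected headers are absent, which is typical of an unhardened default server configuration.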

SEO
Signals that determine whether search engines can find, understand, and index your content correctly.
robots.txt
seo
Developer
A plain-text file at the root of a domain following the Robots Exclusion Protocol. User-agent directives scope rules to specific crawlers. Disallow blocks paths, Allow creates exceptions, Crawl-delay suggests pace, and Sitemap: points to the XML sitemap. robots.txt is advisory — a crawler can choose to ignore it.
In Plain English
A public sign on your front door telling delivery drivers (crawlers) what rooms they can and can't enter. Well-behaved delivery companies follow the sign; bad ones don't.

Why it matters: An absent or misconfigured robots.txt forces crawlers to guess, wasting crawl budget on unwanted pages and potentially preventing important pages from being discovered.
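Python's stdlib ships a Robots Exclusion Protocol parser, which makes the per-crawler scoping above easy to demonstrate. A sketch parsing an in-memory file (the parser normally fetches robots.txt over HTTP via `read()`):

```python
from urllib import robotparser

# In-memory robots.txt: a general block plus a Googlebot-specific exception.
rules = """
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Allow: /admin/public/
Disallow: /admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

googlebot_ok = rp.can_fetch("Googlebot", "https://example.com/admin/public/page")
anyone_admin = rp.can_fetch("SomeBot", "https://example.com/admin/settings")
```

Because directives are grouped per User-agent, Googlebot gets its own carve-out while every other crawler falls back to the `*` group and stays out of /admin/ entirely.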

sitemap.xml
seo
Developer
An XML document that lists canonical URLs with optional metadata (lastmod, changefreq, priority). Submitted to Google Search Console and referenced in robots.txt. Helps crawlers find pages that aren't well-linked internally.
In Plain English
A table of contents you hand to search engines. Instead of making Google explore every corridor to find your pages, you hand it a complete map at the start.

Why it matters: Sites without sitemaps depend entirely on link discovery. Newly published or lightly linked pages may take weeks to be indexed. A sitemap shortens that to hours.
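The XML structure described above (a urlset of url entries with loc and lastmod) can be generated with the stdlib. A sketch using the official sitemaps.org namespace; the URLs are illustrative:

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Serialise (loc, lastmod) pairs into a minimal sitemap.xml document."""
    ET.register_namespace("", NS)  # emit the sitemap namespace as the default
    urlset = ET.Element(f"{{{NS}}}urlset")
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = loc
        ET.SubElement(url, f"{{{NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

doc = build_sitemap([
    ("https://example.com/", "2024-01-01"),
    ("https://example.com/about", "2024-02-15"),
])
```

The resulting document is what you would serve at /sitemap.xml and reference from robots.txt with a `Sitemap:` line.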

Canonical URL
seo
Developer
<link rel="canonical" href="..."> in the <head> declares the preferred version of a page to search engines. Consolidates ranking signals when the same content is reachable via multiple URLs (e.g. with/without trailing slash, with/without query params), instead of letting them split across duplicates.
In Plain English
Telling Google "this is the real one" when your content might be accessible at multiple addresses. Like officially registering one home address even though you can also be reached through your PO box.

Why it matters: Without canonicals, Google may split the authority of a page across two URL variants, halving each one's ranking power rather than combining them.

Meta Description
seo
Developer
<meta name="description" content="..."> — not a ranking signal but the preferred source for the search result snippet. Google may rewrite it. Optimal length: 155–160 characters. Empty or missing = Google generates its own from page content.
In Plain English
The 2-sentence pitch under your page title in Google results. You don't write it for the algorithm — you write it for the human deciding whether to click.

Why it matters: A good meta description can double click-through rates from the same ranking position, and higher CTR is widely believed to feed back into rankings, making this effectively recursive.

llms.txt
seo
Developer
An emerging convention (analogous to robots.txt for LLMs) where a machine-readable file at /llms.txt declares how AI language model training crawlers should treat the domain — including allowlists, contact info, and licensing terms for AI data use.
In Plain English
A note specifically for AI training bots that says how they may or may not use your content. Since ChatGPT, Claude, and others are trained on web data, this file is how you communicate your terms to them directly.

Why it matters: As AI-generated answers increasingly bypass traditional search results, llms.txt is becoming the mechanism for content owners to control AI data use. Early adopters signal seriousness to the AI ecosystem.

X-Robots-Tag (HTTP)
seo
Developer
HTTP response header equivalent of the <meta name="robots"> tag. Controls indexability (noindex), link following (nofollow), snippet length, and image preview size at the server level, before HTML is parsed. Takes precedence over HTML meta tags.
In Plain English
A backstage directive that overrides your visible instructions. If your HTML says "index this page" but your server's X-Robots-Tag says "don't" — the server wins, and your page disappears from Google quietly.

Why it matters: This is the most common cause of mysterious indexation failures. A misconfigured server header can silently de-index pages that look fine in source code.
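The precedence rule above amounts to: a noindex from either source removes the page, and the server header is seen first, before any HTML is parsed. A deliberately simplified sketch (most-restrictive-wins; real directives also cover nofollow, snippet limits, and per-bot scoping):

```python
def is_indexable(x_robots_header, meta_robots):
    """False if either the X-Robots-Tag header or the meta tag says noindex."""
    for value in (x_robots_header, meta_robots):
        if value and "noindex" in value.lower():
            return False
    return True

# The HTML says "index", but the server header silently wins.
ghost = is_indexable("noindex, nofollow", "index, follow")
normal = is_indexable(None, "index, follow")
```

This is why view-source debugging misses the problem: the de-indexing directive never appears in the HTML at all.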

Open Graph Tags
seo
Developer
Meta tags in the <head> with property="og:*" from the Open Graph Protocol (ogp.me). og:title, og:description, og:image, og:url define how pages render when shared on Facebook, LinkedIn, iMessage, and most chat platforms.
In Plain English
The rich preview card that appears when you paste a link into Slack or iMessage. Without OG tags, the preview is blank or wrong. With them, you control exactly what people see — your image, your headline, your description.

Why it matters: A missing og:image means shared links show a blank preview. In contexts where every message competes for attention, a blank card versus a rich card is the difference between a click and a scroll-past.
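Checking for the og:* tags above is a small parsing job with the stdlib. A sketch using html.parser (chat platforms run far more forgiving extractors, but the idea is the same):

```python
from html.parser import HTMLParser

class OGExtractor(HTMLParser):
    """Collect og:* meta properties from a page's markup."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property", "").startswith("og:"):
            self.og[a["property"]] = a.get("content", "")

page = """<head>
<meta property="og:title" content="Audit Lexicon">
<meta property="og:image" content="https://example.com/card.png">
</head>"""

parser = OGExtractor()
parser.feed(page)
has_image = "og:image" in parser.og  # a blank preview card if this is False
```

An auditor would flag the page when `og:image`, `og:title`, or `og:description` is missing, since those three drive the rich preview card.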

Bot Access
How your server treats automated crawlers — and whether it treats different ones the same way.
Cloaking
bots
Developer
Serving categorically different content or HTTP responses to crawlers vs. human browsers, based on User-Agent or IP detection. A Google webmaster guideline violation. Distinguished from personalisation (same structure, different data) and A/B testing (variants assigned at random, not keyed to bot detection).
In Plain English
Showing Google's robot a pristine storefront while showing real customers something completely different. Google considers this deceptive and can penalize or de-index the site entirely.

Why it matters: Cloaking is why this tool probes your site with 20 different user agents and compares the results. The divergence between what a Chrome browser gets vs. what Googlebot gets is the signal.

User-Agent String
bots
Developer
An HTTP request header that identifies the client software making the request. Format: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html). Crawlers publish their official UA strings; servers can use these to make access decisions.
In Plain English
The name tag a visitor wears when knocking on your server's door. Googlebot introduces itself as "Googlebot." Your server can welcome, block, or serve different content based on that name tag — which is both the power and the problem.

Why it matters: Most access control rules for crawlers are written against User-Agent strings. This tool probes with the real official UAs of every major bot to test those rules accurately.

HTTP HEAD Request
bots
Developer
An HTTP method identical to GET except the server returns only headers — no body. Used here for bot simulation probes because it's faster, cheaper on both sides, and returns enough signal (status code, headers, TTFB) to evaluate access control without downloading full pages.
In Plain English
Knocking on a door and asking "is anyone home?" without actually going inside. You get enough information to know whether you'd be let in, without the overhead of a full visit.

Why it matters: Running GET requests for 20 bots would download 20 full pages per audit. HEAD requests give us the same access signals in a fraction of the time and bandwidth.

robots.txt Compliance
bots
Developer
Whether a given crawler's User-Agent is covered by a Disallow or Allow rule in robots.txt. Compliance is voluntary — the protocol has no enforcement mechanism. This tool maps each probed bot to its robots.txt directive and checks whether the actual HTTP response matches the declared policy.
In Plain English
Checking whether the sign on your door matches how the doorman is actually behaving. If robots.txt says "Googlebot welcome" but the server returns 403 to Googlebot — something outside your declared policy is doing the blocking.

Why it matters: Policy-response mismatches are often invisible without this kind of cross-check. A WAF rule added months ago may be blocking crawlers your robots.txt still says are welcome.
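The cross-check described above compares the declared policy with the observed response per bot. A sketch of that comparison (the status codes treated as "blocked" are an assumption; the real engine presumably classifies more finely):

```python
def compliance_mismatch(bot, allowed_by_robots, observed_status):
    """Return a finding string when declared policy and live response disagree."""
    blocked = observed_status in (401, 403) or observed_status >= 500
    if allowed_by_robots and blocked:
        return f"{bot}: robots.txt allows access but server returned {observed_status}"
    if not allowed_by_robots and observed_status == 200:
        # Not a violation (robots.txt is advisory), but worth surfacing:
        # nothing is actually enforcing the declared policy.
        return f"{bot}: robots.txt disallows access but server served content"
    return None

issue = compliance_mismatch("Googlebot", True, 403)
clean = compliance_mismatch("Googlebot", True, 200)
```

The first case is the dangerous one from the doorman analogy: something outside robots.txt, typically a WAF or rate limiter, is blocking a crawler you declared welcome.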

Crawl Budget
bots
Developer
The number of URLs Googlebot will crawl on your domain within a given time window, determined by site size, crawl rate limit settings, server response times, and historical crawl capacity. Wasted on redirect chains, duplicate URLs, blocked resources, and server errors.
In Plain English
Google's patience budget for your site. If it's burning through it on 404 pages, redirect chains, and duplicate content, it has less left for your actual important pages. Large sites can have entire sections never crawled because the budget ran out.

Why it matters: For sites with more than a few hundred pages, crawl budget is a real constraint. Every optimization in this audit — faster TTFB, no redirect chains, clean robots.txt — directly increases how efficiently Google indexes your site.

HTTP Status Code
bots
Developer
A 3-digit numeric response code in the HTTP/1.1 specification. Semantic ranges: 2xx (success), 3xx (redirect), 4xx (client error — resource missing or forbidden), 5xx (server error). For crawlers, 404 = not found (remove from index), 410 = permanently gone (faster removal), 503 = temporarily unavailable (retry later).
In Plain English
A standardized answer code your server gives at the start of every response. 200 means "here it is." 404 means "doesn't exist." 403 means "you're not allowed in." 500 means "I'm broken." Bots and browsers both speak this language.

Why it matters: Getting the right status code for the right situation is how you tell Google immediately whether a page should stay in the index. Wrong codes leave ghost pages indexed for months.

Result Integrity
How this engine cross-checks its own results and quantifies how much to trust each finding.
Confidence Score
integrity
Developer
Per-category metric (0–100) representing signal quality after weighting data completeness, probe success rate, and cross-engine contradiction severity. Calculated by starting at 100 and deducting for: data failures, low bot probe success rates, and 20/12/6 point penalties for HIGH/MEDIUM/LOW contradictions in the relevant engine.
In Plain English
A self-assessment of how sure the engine is about each score it gave. 95% confidence on security means it got clean data and everything agreed. 60% means there were gaps or conflicts and you should dig deeper before trusting the number completely.

Why it matters: Without confidence scoring, a 73/100 security score from a failed main fetch looks identical to a 73/100 from full data. This metric makes that difference visible.
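The deduction model above can be sketched directly. The 20/12/6 contradiction penalties come from the definition; the penalties for a failed main fetch and a low probe success rate are assumed values for illustration:

```python
CONTRADICTION_PENALTY = {"HIGH": 20, "MEDIUM": 12, "LOW": 6}

def confidence_score(main_fetch_ok, probe_success_rate, contradictions):
    """Start at 100 and deduct for data failures, flaky probes, and contradictions."""
    score = 100
    if not main_fetch_ok:
        score -= 30  # assumed penalty for a failed main fetch
    if probe_success_rate < 0.8:
        score -= 10  # assumed penalty for a low bot probe success rate
    for severity in contradictions:
        score -= CONTRADICTION_PENALTY[severity]
    return max(score, 0)

clean = confidence_score(True, 0.95, ["MEDIUM", "LOW"])   # 100 - 12 - 6 = 82
noisy = confidence_score(False, 0.50, ["HIGH"])           # 100 - 30 - 10 - 20 = 40
```

The same category score can therefore ship with very different confidence values, which is the point: the number tells you how much of the evidence base actually arrived.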

Cross-Engine Contradiction
integrity
Developer
A finding where two independent engine results produce logically incompatible conclusions about the same observable fact. Example: the security engine observes HTTPS=false but the speed engine observes HTTPS=true for the same response. Severity tiers: HIGH (actively misleading), MEDIUM (notable inconsistency), LOW (informational divergence).
In Plain English
When two parts of the audit disagree about the same basic fact. Like two doctors examining the same patient and reaching opposite conclusions — the disagreement itself is important diagnostic information, regardless of which one is right.

Why it matters: Surface-level scores hide contradictions. A site can score 78/100 on security with a hidden contradiction that means the real state is either 95 or 30. The contradiction panel tells you which findings to audit manually.

Cloaking Detection Algorithm
integrity
Developer
A population-level statistical comparison of 20 bot HTTP responses using the Chrome browser result as a baseline. Flags divergence in: status code family (2xx vs 4xx/5xx), final URL after redirects, key response header set, and response body content fingerprint. Requires ≥2 probes to succeed.
In Plain English
Running a class where every student takes the same test. If everyone gets broadly the same answers, the room is honest. If one group consistently gets different answers than another group, something is directing different content to different people.

Why it matters: A cloaking detection algorithm that only checks one bot at a time misses systematic patterns. Comparing 20 probes simultaneously reveals both individual bot blocks and class-level differentiation.
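The four divergence signals named in the definition (status family, final URL, header set, body fingerprint) can be compared pairwise against the baseline. A sketch with illustrative probe records (the real algorithm works over the full 20-probe population, not one pair):

```python
import hashlib

def divergence(baseline, probe):
    """List which of the four signals differ between a probe and the Chrome baseline."""
    flags = []
    if probe["status"] // 100 != baseline["status"] // 100:
        flags.append("status-family")          # e.g. 2xx for Chrome, 4xx for a bot
    if probe["final_url"] != baseline["final_url"]:
        flags.append("final-url")              # redirected somewhere else
    if set(probe["headers"]) != set(baseline["headers"]):
        flags.append("header-set")             # different key response headers
    fingerprint = lambda r: hashlib.sha256(r["body"]).hexdigest()
    if fingerprint(probe) != fingerprint(baseline):
        flags.append("body-fingerprint")       # different content served
    return flags

chrome = {"status": 200, "final_url": "https://example.com/",
          "headers": {"content-type", "cache-control"}, "body": b"<html>real</html>"}
googlebot = {"status": 403, "final_url": "https://example.com/",
             "headers": {"content-type"}, "body": b"Forbidden"}

flags = divergence(chrome, googlebot)
```

Three of the four signals diverge here, which is the shape of a bot-specific block; class-level cloaking shows up instead as consistent divergence across a whole group of probes.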

Data Quality
integrity
Developer
Assessment of whether each engine received a complete, usable response. Dimensions tracked: main fetch success/failure, seo analysis completion, security header presence, bot probe success rate (succeeded/total). Low data quality degrades confidence scores in affected categories.
In Plain English
How complete was the picture? If 15 of 20 bots timed out, the bot analysis is based on partial evidence. Data quality flags tell you whether you should re-run the audit from a different network or at a different time.

Why it matters: Edge network latency, rate limiting, and temporary service disruptions can all cause partial data. Without data quality metrics, a partial result and a complete result look the same to the score reader.