How Lychee Works

High-Level Overview

A recorded talk explains the high-level architecture of lychee and the broader context of link checkers.

Asynchronous Architecture

lychee is fully asynchronous.

It reads the inputs and extracts their links into a futures::stream::Stream. Each link passes through an async filter pipeline and is then sent to a pool of reqwest HTTP clients, which check the links concurrently.
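
The flow described above (extract, filter, check concurrently) can be sketched with the standard library alone. This is a simplified stand-in, not lychee's code: it uses OS threads and a channel where lychee uses a futures stream and async tasks. The names `Status`, `is_checkable`, `fake_check`, and `check_all` are hypothetical, and `fake_check` replaces the real HTTP request so the sketch performs no network I/O.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Status {
    Ok,
    Excluded,
    Failed,
}

/// Filter stage: decides whether a link should be checked at all.
/// The rule used here (skip `mailto:` links) is an assumption for illustration.
fn is_checkable(link: &str) -> bool {
    !link.starts_with("mailto:")
}

/// Placeholder for the network request; no I/O happens here.
/// Assumption: links containing "broken" fail, everything else succeeds.
fn fake_check(link: &str) -> Status {
    if link.contains("broken") {
        Status::Failed
    } else {
        Status::Ok
    }
}

/// Runs filter -> concurrent check over the links and collects the results.
fn check_all(links: Vec<String>) -> Vec<(String, Status)> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for link in links {
        let tx = tx.clone();
        if !is_checkable(&link) {
            // Filtered links are reported without ever reaching a checker.
            tx.send((link, Status::Excluded)).unwrap();
            continue;
        }
        // One thread per link stands in for one async task per request.
        handles.push(thread::spawn(move || {
            let status = fake_check(&link);
            tx.send((link, status)).unwrap();
        }));
    }
    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);

    for handle in handles {
        handle.join().unwrap();
    }

    let mut results: Vec<_> = rx.into_iter().collect();
    // Results arrive in completion order, so sort them for a stable report.
    results.sort_by(|a, b| a.0.cmp(&b.0));
    results
}
```

Calling `check_all` with a mix of `https://` and `mailto:` links returns one `(link, Status)` pair per input. An async version would replace the threads with tasks and the channel with a stream, which avoids dedicating an OS thread to each pending request.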

Extractors

The extractors do all the heavy lifting. They extract all links from a given input file and return them as a stream. We want the extractors to be as fast and memory-efficient as possible.

Currently we support three main extractors:

pulldown-cmark for Markdown files
html5gum for HTML files
linkify as a fallback for plaintext files and other unknown formats
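
To illustrate how an input might be routed to one of these extractors, here is a minimal sketch using only the standard library. It makes several assumptions: `FileType`, `file_type_for`, and `extract_plaintext_links` are hypothetical names, choosing by file extension is a guess at one possible rule, and the plaintext extractor is a deliberately naive substitute for linkify. It returns a lazy iterator as a synchronous analogue of the stream the real extractors return.

```rust
/// Hypothetical input kinds, mirroring the three extractors listed above.
#[derive(Debug, PartialEq, Eq)]
enum FileType {
    Markdown,
    Html,
    Plaintext,
}

/// Picks an extractor family from the file extension (an assumed rule).
/// Anything unrecognized falls back to plaintext.
fn file_type_for(path: &str) -> FileType {
    let ext = path
        .rsplit_once('.')
        .map(|(_, ext)| ext.to_ascii_lowercase());
    match ext.as_deref() {
        Some("md") | Some("markdown") => FileType::Markdown,
        Some("html") | Some("htm") => FileType::Html,
        _ => FileType::Plaintext,
    }
}

/// Naive plaintext extraction: yields whitespace-delimited tokens that start
/// with `http://` or `https://`, with trailing punctuation trimmed.
/// linkify handles far more cases; this only illustrates the fallback idea.
fn extract_plaintext_links<'a>(input: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    input
        .split_whitespace()
        .filter(|token| token.starts_with("http://") || token.starts_with("https://"))
        .map(|token| token.trim_end_matches(|c: char| matches!(c, '.' | ',' | ')' | ';')))
}
```

Because the extractor borrows from its input and yields links lazily, it does not copy the document or build an intermediate list, which is in the spirit of the speed and memory goals stated above.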