How Lychee Works
High-Level Overview
Here is a talk explaining the high-level architecture of lychee and the broader context of link checkers:
Asynchronous Architecture
lychee is fully asynchronous.
It reads inputs and extracts links into a futures::stream::Stream.
Each link passes through an async filter pipeline and is then sent to a pool of
reqwest HTTP clients, which check the links concurrently.
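The shape of this pipeline (filter, then hand off to a pool that checks concurrently) can be sketched with the standard library alone. This is only an illustration and not lychee's code: it swaps the async stream and reqwest clients for plain threads and channels so that it compiles without dependencies. The names Status, filter_links, fake_check, and check_all are hypothetical, and fake_check stands in for a real HTTP request.

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical result of checking one link.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Ok,
    Broken,
}

/// Hypothetical filter stage: keep http(s) links and drop duplicates.
pub fn filter_links(links: impl Iterator<Item = String>) -> impl Iterator<Item = String> {
    let mut seen = HashSet::new();
    links.filter(move |l| {
        (l.starts_with("http://") || l.starts_with("https://")) && seen.insert(l.clone())
    })
}

/// Stand-in for an HTTP request (assumption: no network access here).
fn fake_check(link: &str) -> Status {
    if link.contains("broken") {
        Status::Broken
    } else {
        Status::Ok
    }
}

/// Check all links with a fixed pool of workers pulling from a shared queue.
/// Results are sorted by link so the output order is deterministic.
pub fn check_all(links: Vec<String>, workers: usize) -> Vec<(String, Status)> {
    let (job_tx, job_rx) = mpsc::channel::<String>();
    let (res_tx, res_rx) = mpsc::channel::<(String, Status)>();
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement,
                // so it is not held while the link is being checked.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(link) => {
                        let status = fake_check(&link);
                        res_tx.send((link, status)).unwrap();
                    }
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();
    drop(res_tx); // only the workers hold result senders now

    for link in links {
        job_tx.send(link).unwrap();
    }
    drop(job_tx); // closing the queue lets the workers exit

    let mut results: Vec<_> = res_rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    results.sort_by(|a, b| a.0.cmp(&b.0));
    results
}
```

In an async implementation the worker pool would typically be replaced by a bounded number of in-flight futures on a single stream, but the stages and their order stay the same.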
Extractors
The extractors do all the heavy lifting. They extract all links from a given input file and return them as a stream. We want the extractors to be as fast and memory-efficient as possible.
Currently we support three main extractors:
- Pulldown CMark for Markdown files
- html5gum for HTML
- linkify as a fallback for plaintext files and other unknown formats
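To illustrate how one extractor is chosen per input, with a plaintext fallback for everything else, here is a minimal dependency-free sketch. It does not use Pulldown CMark, html5gum, or linkify. FileType, file_type, and extract_plaintext are hypothetical names, the extension mapping is an assumption, and the plaintext scanner is deliberately naive compared with a real link finder.

```rust
use std::path::Path;

/// Hypothetical input classification used to select an extractor.
#[derive(Debug, PartialEq, Eq)]
pub enum FileType {
    Markdown,
    Html,
    Plaintext,
}

/// Pick a file type from the extension (assumed mapping);
/// anything unknown falls back to plaintext.
pub fn file_type(path: &Path) -> FileType {
    match path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .as_deref()
    {
        Some("md") | Some("markdown") => FileType::Markdown,
        Some("html") | Some("htm") => FileType::Html,
        _ => FileType::Plaintext,
    }
}

/// Naive plaintext fallback: lazily yield whitespace-separated tokens that
/// look like http(s) URLs, with surrounding punctuation trimmed.
/// Borrowing from the input avoids allocating a String per link.
pub fn extract_plaintext(input: &str) -> impl Iterator<Item = &str> + '_ {
    input
        .split_whitespace()
        .map(|t| t.trim_matches(|c: char| matches!(c, '<' | '>' | '(' | ')' | ',' | '.' | '"')))
        .filter(|t| t.starts_with("http://") || t.starts_with("https://"))
}
```

Returning a lazy iterator over borrowed slices mirrors the goal stated above: links are produced on demand instead of being collected into memory first.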