emschwartz/html-to-text-comparison

Comparing Rust crates for extracting text from HTML

Score: 13 / 100 (Experimental)

This tool helps developers evaluate different Rust libraries designed to extract plain text from HTML content. It takes a URL as input, then downloads the webpage and processes its HTML through multiple text extraction libraries. The output includes performance metrics (memory and time usage) and the extracted text from each library, helping you choose the most suitable one for your specific application.
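The comparison loop described above can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: `strip_tags` and `strip_tags_collapse_ws` are hypothetical stand-ins for the real extraction crates, the HTML is a hard-coded string rather than a downloaded page, and only elapsed time is measured (memory tracking is left out).

```rust
use std::time::{Duration, Instant};

/// A text extractor takes raw HTML and returns plain text.
type Extractor = fn(&str) -> String;

/// Hypothetical stand-in extractor: drops everything between '<' and '>'.
/// A real crate would also handle entities, scripts, styles, and so on.
fn strip_tags(html: &str) -> String {
    let mut out = String::new();
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            _ if !in_tag => out.push(c),
            _ => {}
        }
    }
    out
}

/// Second hypothetical stand-in: strips tags, then collapses whitespace runs.
fn strip_tags_collapse_ws(html: &str) -> String {
    strip_tags(html)
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Outcome of running one extractor over the input.
struct RunResult {
    name: &'static str,
    elapsed: Duration,
    text: String,
}

/// Runs every extractor on the same HTML and records its wall-clock time.
fn compare(html: &str, extractors: &[(&'static str, Extractor)]) -> Vec<RunResult> {
    extractors
        .iter()
        .map(|&(name, extract)| {
            let start = Instant::now();
            let text = extract(html);
            RunResult { name, elapsed: start.elapsed(), text }
        })
        .collect()
}

fn main() {
    // Assumption: a fixed document stands in for the downloaded page.
    let html = "<html><body><h1>Title</h1>\n<p>Hello,   <b>world</b>!</p></body></html>";
    let extractors = [
        ("strip_tags", strip_tags as Extractor),
        ("strip_tags_collapse_ws", strip_tags_collapse_ws as Extractor),
    ];
    for r in compare(html, &extractors) {
        println!("{}: {:?}, {} bytes", r.name, r.elapsed, r.text.len());
        println!("{}\n", r.text);
    }
}
```

Holding the extractors as plain function pointers keeps the harness independent of any particular crate, so adding another library to the comparison would only mean adding one more entry to the list.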

No commits in the last 6 months.

Use this if you are a Rust developer building an application that needs to reliably convert HTML web pages into clean, readable plain text, such as for search indexing or LLM processing.

Not ideal if you are looking for a ready-to-use, end-user application to extract text from a single webpage without programming.

Tags: Rust-development, web-scraping, text-extraction, LLM-data-preparation, performance-benchmarking
Status: No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 0 / 25

Stars: 11
Forks:
Language: Rust
License: None
Last pushed: Jan 22, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/emschwartz/html-to-text-comparison"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.