HazyResearch/fonduer

A knowledge base construction engine for richly formatted data

/ 100

Established

This tool helps you automatically extract specific pieces of information and relationships from complex documents like hardware datasheets or scientific papers. You feed it your richly formatted documents, and it outputs a structured knowledge base containing the facts and connections you're looking for. It's ideal for researchers, engineers, or data managers who need to systematically organize data from diverse, non-standard document types.

412 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to build a structured database of facts and relationships from a large collection of richly formatted documents that mix tables, lists, and free text.

Not ideal if your data is already highly structured or if you only need to extract information from plain text without complex formatting.
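To make the input and output described above concrete, here is a minimal sketch of the general idea: turning a richly formatted document into structured facts. It uses only the Python standard library and does not use Fonduer's API. The datasheet fragment, the `TableFactExtractor` class, and the `extract_facts` function are hypothetical names invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical datasheet fragment: a part number in a heading, ratings in a table.
DATASHEET_HTML = """
<h1>BC546</h1>
<table>
  <tr><th>Parameter</th><th>Value</th></tr>
  <tr><td>Collector-emitter voltage</td><td>65 V</td></tr>
  <tr><td>Storage temperature</td><td>150 C</td></tr>
</table>
"""


class TableFactExtractor(HTMLParser):
    """Collects the <h1> text and every table row made of exactly two <td> cells."""

    def __init__(self):
        super().__init__()
        self.part = None    # text of the <h1> heading
        self.rows = []      # (parameter, value) pairs from the table
        self._tag = None    # tag currently open, if any
        self._cells = []    # <td> texts of the row being read

    def handle_starttag(self, tag, attrs):
        self._tag = tag
        if tag == "tr":
            self._cells = []

    def handle_endtag(self, tag):
        # Header rows use <th>, so they contribute no cells and are skipped.
        if tag == "tr" and len(self._cells) == 2:
            self.rows.append(tuple(self._cells))
        self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "h1":
            self.part = text
        elif self._tag == "td":
            self._cells.append(text)


def extract_facts(html):
    """Return (part, parameter, value) triples found in the document."""
    parser = TableFactExtractor()
    parser.feed(html)
    parser.close()
    return [(parser.part, name, value) for name, value in parser.rows]


facts = extract_facts(DATASHEET_HTML)
for fact in facts:
    print(fact)
```

Each triple links a value in a table cell to the part number in the heading, which is the kind of cross-structure relationship a knowledge base built from formatted documents needs to capture. A hand-written parser like this only works for one fixed layout.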

data-extraction technical-document-analysis knowledge-management information-retrieval research-data-organization

Maintenance 0 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 22 / 25

Stars: 412

Forks

Language: Python

License: MIT

Related frameworks

deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference...

helmholtz-analytics/heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

horovod/horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

bsc-wdc/dislib

The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.
