HazyResearch/fonduer
A knowledge base construction engine for richly formatted data
This tool helps you automatically extract specific pieces of information and relationships from complex documents like hardware datasheets or scientific papers. You feed it your richly formatted documents, and it outputs a structured knowledge base containing the facts and connections you're looking for. It's ideal for researchers, engineers, or data managers who need to systematically organize data from diverse, non-standard document types.
412 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to build a structured database of facts and relationships from a large collection of richly formatted documents that combine tables, lists, and text.
Not ideal if your data is already highly structured or if you only need to extract information from plain text without complex formatting.
Stars: 412
Forks: 77
Language: Python
License: MIT
Category:
Last pushed: Jun 23, 2021
Commits (30d): 0
Dependencies: 17
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/HazyResearch/fonduer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
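The same request can be sketched in Python using only the standard library. This is a minimal sketch, not a documented client: the helper names `quality_url` and `fetch_quality` are hypothetical, the URL pattern (category, then owner, then repository) is inferred from the single curl example above, and the assumption that the endpoint returns JSON is unverified.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    The category/owner/repo layout is an assumption inferred from the
    one example URL on this page.
    """
    parts = (category, owner, repo)
    # Percent-encode each path segment so unusual names cannot break the path.
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON (hypothetical helper).

    Assumption: the response body is JSON. Each call counts against the
    daily request limit stated above.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL, so running this file makes no network request.
    print(quality_url("ml-frameworks", "HazyResearch", "fonduer"))
```

Calling `fetch_quality("ml-frameworks", "HazyResearch", "fonduer")` would perform the actual request. The fields of the result are not specified on this page, so inspect the response before relying on any key.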
Related frameworks
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.