catastropiyush/RAG-dataset-gen

Retrieval-augmented generation (RAG) for building datasets from scientific literature. Contains the notebooks used to create the datasets.

Score: 30 / 100 (Emerging)

This project helps materials scientists and researchers extract specific material properties from large volumes of scientific literature, such as research paper abstracts. You provide a collection of scientific abstracts and a specific query for the data you need, and it outputs a structured dataset (like an Excel file) containing the extracted information, such as hydrogen storage capacity, temperature, and pressure for various alloys. This is for scientists or engineers who need to quickly compile structured data from unstructured text.
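Below is a minimal sketch of the retrieval half of such a pipeline. It assumes plain bag-of-words cosine similarity. The repository's notebooks are not reproduced here and may well use embedding models and an LLM instead. The function names (tokenize, cosine, retrieve, build_prompt) and the stopword list are hypothetical. The idea is that only the abstracts most relevant to the query are passed, together with the query, to a language model.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller list or embeddings.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "we", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric tokens and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, abstracts: list[str], k: int = 3) -> list[str]:
    """Return the k abstracts most similar to the query (ties keep input order)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(a))), i) for i, a in enumerate(abstracts)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [abstracts[i] for _, i in scored[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved abstracts and the query into a prompt for a language model."""
    joined = "\n\n".join(f"[{n}] {text}" for n, text in enumerate(context, 1))
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer using only the context above."
```

For instance, suppose the query is "hydrogen storage capacity of alloys". An abstract about perovskite band gaps shares no non-stopword tokens with that query, so it scores 0 and ranks below any abstract that mentions hydrogen.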

No commits in the last 6 months.

Use this if you need to systematically extract specific, quantitative data points about materials or their properties from a large body of scientific text, turning unstructured information into a usable dataset for analysis.
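The extraction step can be sketched as follows. In the project, this is presumably where a language model reads the retrieved abstracts. Here, purely as an assumption to keep the example self-contained, a few hypothetical regular expressions stand in for the model. They only recognise values written as "wt%", "K" and "bar". Field names such as capacity_wt_percent are illustrative and are not taken from the repository. The rows are written as CSV with the standard library. Calling pandas.DataFrame(rows).to_excel(path) would produce an Excel file instead, provided openpyxl is installed.

```python
import csv
import io
import re

# Hypothetical patterns standing in for an LLM's answer; they only catch values
# written like "1.8 wt%", "303 K" and "30 bar".
PATTERNS = {
    "capacity_wt_percent": re.compile(r"(\d+(?:\.\d+)?)\s*wt\s*%"),
    "temperature_K": re.compile(r"(\d+(?:\.\d+)?)\s*K\b"),
    "pressure_bar": re.compile(r"(\d+(?:\.\d+)?)\s*bar\b"),
}
# Crude chemical-formula pattern (e.g. TiFe, Mg2Ni, LaNi5); it misses many real alloy names.
ALLOY = re.compile(r"\b([A-Z][a-z]?\d*(?:[A-Z][a-z]?\d*)+)\b")


def extract_record(abstract: str) -> dict:
    """Pull one row of structured data out of a single abstract (None when absent)."""
    alloy = ALLOY.search(abstract)
    row = {"alloy": alloy.group(1) if alloy else None}
    for field, pattern in PATTERNS.items():
        match = pattern.search(abstract)
        row[field] = float(match.group(1)) if match else None
    return row


def to_csv(rows: list[dict]) -> str:
    """Serialise the rows to CSV text; None values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["alloy", *PATTERNS])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With a model doing the extraction, the regular expressions would be replaced by a prompt asking for the same fields. The one-row-per-abstract table shape would stay the same.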

Not ideal if you are looking for a general-purpose scientific text summarizer or a tool to analyze broad themes in literature rather than extracting precise, structured parameters.

materials-science literature-review data-extraction hydrogen-storage scientific-research
No license | Stale (6 months) | No package | No dependents
Maintenance 2 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 15 / 25
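The four category scores above add up to the overall score. This suggests that the total is a plain sum of four categories worth 25 points each, although the page does not spell out the formula:

```python
# Category scores as listed on this page; each category is out of 25.
CATEGORY_SCORES = {"Maintenance": 2, "Adoption": 5, "Maturity": 8, "Community": 15}
CATEGORY_MAX = 25

total = sum(CATEGORY_SCORES.values())
maximum = CATEGORY_MAX * len(CATEGORY_SCORES)
print(f"{total} / {maximum}")  # prints "30 / 100"
```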


Stars: 9
Forks: 5
Language: Jupyter Notebook
License: None
Last pushed: Jun 30, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/catastropiyush/RAG-dataset-gen"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
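The same request can be made from Python using only the standard library. This is a sketch. It assumes the endpoint returns JSON, and the helper names are hypothetical. It uses the keyless tier because this page does not say how a key is passed.

```python
import json
import urllib.request

# Base path copied from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL shown in the curl example for a given repository."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming the response is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: fetch_quality("catastropiyush", "RAG-dataset-gen") requests the same URL as the curl command. Each call counts against the 100 requests/day limit.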