geyang/deep-auto-punctuation
A PyTorch implementation of auto-punctuation, learned character by character
This project automatically adds punctuation and capitalization to plain text. It takes raw, unpunctuated sentences as input and outputs text with correctly placed commas, periods, quotes, dollar signs, and proper capitalization. This tool is ideal for anyone working with transcribed speech, old documents, or any text source that lacks standard punctuation and needs to be cleaned up for readability or further analysis.
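To make the character-by-character framing concrete, here is a minimal sketch of how punctuated text could be split into a plain character stream plus per-character labels, and then rebuilt from them. This is an illustration only, not the repository's actual preprocessing: the mark set `MARKS` and the helpers `strip_formatting` and `restore` are hypothetical names, and the sketch assumes ASCII input.

```python
# Hypothetical set of marks to strip; loosely based on the marks named above.
MARKS = set(",.\"'$?!;:")


def strip_formatting(text, marks=MARKS):
    """Split text into lowercase plain characters and per-character labels.

    Each label is (marks_before, was_upper): the punctuation that appeared
    immediately before the character, and whether it was uppercase.
    Punctuation after the last kept character is returned as `tail`.
    Assumes ASCII input, so lower() never changes string length.
    """
    plain, labels, pending = [], [], ""
    for ch in text:
        if ch in marks:
            pending += ch  # hold the mark until the next kept character
        else:
            plain.append(ch.lower())
            labels.append((pending, ch.isupper()))
            pending = ""
    return "".join(plain), labels, pending


def restore(plain, labels, tail):
    """Rebuild the formatted text from plain characters and labels."""
    out = []
    for ch, (before, was_upper) in zip(plain, labels):
        out.append(before + (ch.upper() if was_upper else ch))
    return "".join(out) + tail


text = 'He said, "It costs $5."'
plain, labels, tail = strip_formatting(text)
print(plain)                        # he said it costs 5
print(restore(plain, labels, tail))  # He said, "It costs $5."
```

In this framing, a model would see only `plain` and be trained to predict a label for each character; `restore` then applies the predicted labels to produce readable text.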
141 stars. No commits in the last 6 months.
Use this if you need to quickly add standard punctuation and capitalization to large amounts of unformatted or raw text.
Not ideal if you need reliable handling of rarer punctuation marks such as semicolons, question marks, or exclamation points; its performance on these is limited.
Stars: 141
Forks: 22
Language: Jupyter Notebook
License: —
Category:
Last pushed: Nov 15, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/geyang/deep-auto-punctuation"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
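The same request can be made from Python using only the standard library. The sketch below makes two assumptions that the page does not confirm: that the endpoint returns JSON, and that the path follows a `category/owner/repo` pattern (inferred from the single URL above). The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the request URL; the category/owner/repo layout is an assumption."""
    # quote() leaves "/" unescaped by default, so "owner/repo" stays intact.
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category, repo, timeout=10):
    """Fetch the record and parse it, assuming the response body is JSON."""
    with urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.load(resp)


print(quality_url("ml-frameworks", "geyang/deep-auto-punctuation"))
```

Calling `fetch_quality("ml-frameworks", "geyang/deep-auto-punctuation")` performs the network request; with the keyless limit of 100 requests/day, it is worth caching the result rather than calling it in a loop.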
Higher-rated alternatives
facebookresearch/fairseq2: FAIR Sequence Modeling Toolkit 2
lhotse-speech/lhotse: Tools for handling multimodal data in machine learning projects.
google/sequence-layers: A neural network layer API and library for sequence modeling, designed for easy creation of...
awslabs/sockeye: Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
OpenNMT/OpenNMT-tf: Neural machine translation and sequence learning using TensorFlow