atnlp/torchtext-summary

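A summary of using torchtext: a step-by-step, from-scratch implementation of the torchtext text preprocessing pipeline, including truncation and padding, vocabulary construction, use of pretrained word vectors, and building iterable data usable by PyTorch. It also implements an LSTM with PyTorch.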

39 / 100, Emerging

This project helps natural language processing developers prepare raw text data for machine learning models. It takes unstructured text, applies steps such as truncation, padding, and vocabulary construction, and outputs structured, iterable data ready for deep learning frameworks like PyTorch. It is aimed at NLP engineers and researchers building text-based AI applications.
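The steps named above (truncation, padding, vocabulary construction, iterable batches) can be sketched in plain Python. This is a minimal illustration and not the repository's code or the torchtext API: the function names `build_vocab`, `encode`, and `batches`, the `<pad>` and `<unk>` tokens, and the whitespace tokenizer are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical special tokens; real pipelines may name these differently.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(tokenized_texts, min_freq=1):
    """Map each token seen at least min_freq times to an integer id."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Special tokens come first, so PAD is id 0 and UNK is id 1.
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def encode(tokens, vocab, fix_length):
    """Truncate to fix_length tokens, then right-pad with the PAD id."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:fix_length]]
    return ids + [vocab[PAD]] * (fix_length - len(ids))


def batches(encoded, batch_size):
    """Yield consecutive slices of at most batch_size encoded examples."""
    for start in range(0, len(encoded), batch_size):
        yield encoded[start:start + batch_size]


texts = ["the cat sat on the mat", "dogs bark"]
tokenized = [t.split() for t in texts]  # naive whitespace tokenizer
vocab = build_vocab(tokenized)
encoded = [encode(toks, vocab, fix_length=4) for toks in tokenized]
```

Each inner list in `encoded` has the same length, so a batch could be passed to `torch.tensor(...)` before being fed to an embedding layer and an LSTM.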

176 stars. No commits in the last 6 months.

Use this if you need a step-by-step guide and code examples to preprocess text data efficiently for deep learning models using the torchtext library.

Not ideal if you want a high-level API for complex, production-ready NLP pipelines and do not need to understand the underlying data preparation steps.

natural-language-processing text-preprocessing deep-learning-data-preparation machine-learning-engineering nlp-research
No License Stale 6m No Package No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 21 / 25

Stars: 176
Forks: 41
Language: Jupyter Notebook
License: None
Last pushed: May 25, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/atnlp/torchtext-summary"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
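The same request can be made from Python with only the standard library. This is a sketch under assumptions: the `category/owner/repo` path layout is inferred from the single URL shown above, the response is assumed to be JSON (the page does not document its format), and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record for one repository (assumes a JSON response body)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "atnlp", "torchtext-summary")` would issue the same GET request as the curl command and counts against the daily request limit.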