airscholar/RealtimeStreamingEngineering

This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP sockets, Apache Spark, an OpenAI LLM, Kafka, and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, publishing to a Kafka topic, and connecting to Elasticsearch.

36 / 100 (Emerging)

This project helps data engineers build a system to continuously receive and process information, like customer reviews, as it arrives. It takes raw streaming data, analyzes it for things like sentiment using AI, and then makes it instantly searchable and available for monitoring. Data engineers use this to create robust real-time data pipelines.
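The flow described above (a socket source, a processing step, a sentiment label, a downstream sink) can be sketched with the standard library alone. This is a minimal illustration and not the repository's code: the record fields `review_id`, `text`, and `feedback` are assumed names, `classify_sentiment` is a hypothetical keyword rule standing in for the LLM call, and a plain Python list stands in for the Kafka topic and the Elasticsearch index.

```python
import json
import socket
import threading

# Hypothetical sample records; the real project streams review data.
REVIEWS = [
    {"review_id": 1, "text": "Great food and friendly staff"},
    {"review_id": 2, "text": "Terrible service, cold food"},
    {"review_id": 3, "text": "It was okay"},
]


def serve_reviews(server_sock, records):
    """Accept one client and stream each record as a JSON line over TCP."""
    conn, _ = server_sock.accept()
    with conn:
        for record in records:
            conn.sendall((json.dumps(record) + "\n").encode("utf-8"))


def classify_sentiment(text):
    """Hypothetical stand-in for the LLM call: a simple keyword rule."""
    lowered = text.lower()
    if any(word in lowered for word in ("great", "friendly", "excellent")):
        return "POSITIVE"
    if any(word in lowered for word in ("terrible", "cold", "awful")):
        return "NEGATIVE"
    return "NEUTRAL"


def run_pipeline(records):
    """Stream records over a loopback socket, enrich them, and collect them."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    producer = threading.Thread(target=serve_reviews, args=(server_sock, records))
    producer.start()

    enriched = []  # stand-in for the Kafka topic / Elasticsearch index
    with socket.create_connection(("127.0.0.1", port)) as client:
        with client.makefile("r", encoding="utf-8") as lines:
            for line in lines:  # ends when the producer closes the connection
                record = json.loads(line)
                record["feedback"] = classify_sentiment(record["text"])
                enriched.append(record)

    producer.join()
    server_sock.close()
    return enriched


if __name__ == "__main__":
    for row in run_pipeline(REVIEWS):
        print(row)
```

In the stack the project names, Spark would presumably read from the socket, the keyword rule would be replaced by a request to the LLM, and the enriched records would be published to Kafka and indexed in Elasticsearch rather than appended to a list.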

No commits in the last 6 months.

Use this if you are a data engineer looking for a comprehensive guide to building an end-to-end real-time data streaming and processing pipeline using modern big data technologies.

Not ideal if you are an end-user seeking a ready-to-use application for sentiment analysis without needing to build or manage the underlying data infrastructure.

data-engineering real-time-analytics big-data-processing sentiment-analysis-pipeline streaming-architecture
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 20 / 25

Stars: 43
Forks: 31
Language: Python
License: none
Last pushed: Jan 04, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/airscholar/RealtimeStreamingEngineering"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
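The same request can be made from Python with the standard library. This is a sketch under one assumption: that the endpoint returns a JSON body, which the page does not state. The helper names `build_quality_url` and `fetch_quality` are illustrative and not part of any published client.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner, repo):
    """Build the quality endpoint URL for an owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("airscholar", "RealtimeStreamingEngineering")` issues the same GET request as the curl command and counts against the keyless limit of 100 requests per day.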