airscholar/RealtimeStreamingEngineering
This project is a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP sockets, Apache Spark, an OpenAI LLM, Kafka, and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, publishing to a Kafka topic, and connecting to Elasticsearch.
This project helps data engineers build a system that continuously receives and processes information, such as customer reviews, as it arrives. It takes raw streaming data, runs AI-based analysis such as sentiment classification, and makes the results immediately searchable and available for monitoring. Data engineers can use it as a template for robust real-time data pipelines.
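The acquisition and analysis stages described above can be sketched with the Python standard library alone. This is a minimal illustration, not the repository's code: the sample records, the `feedback` field name, and the keyword-based `classify_sentiment` function are all hypothetical stand-ins (the real project calls an OpenAI model for sentiment), and the Spark, Kafka, and Elasticsearch stages are omitted.

```python
import json
import socket
import threading

# Hypothetical sample records; the real project streams a review dataset.
SAMPLE_REVIEWS = [
    {"review_id": "r1", "text": "Great food and friendly staff"},
    {"review_id": "r2", "text": "Terrible service, cold meal"},
]


def serve_reviews(server_sock, reviews):
    """Accept one client and send each review as a newline-delimited JSON line."""
    conn, _ = server_sock.accept()
    with conn:
        for review in reviews:
            conn.sendall((json.dumps(review) + "\n").encode("utf-8"))


def classify_sentiment(text):
    """Placeholder for the LLM call: a naive keyword rule, not the project's method."""
    negative_words = {"terrible", "cold", "bad", "awful"}
    words = {w.strip(".,!?").lower() for w in text.split()}
    return "NEGATIVE" if words & negative_words else "POSITIVE"


def consume_reviews(host, port):
    """Read JSON lines from the socket and attach a sentiment label to each."""
    enriched = []
    with socket.create_connection((host, port)) as client:
        with client.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                record = json.loads(line)
                record["feedback"] = classify_sentiment(record["text"])
                enriched.append(record)
    return enriched


def run_demo():
    """Start a local producer thread and consume everything it sends."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    producer = threading.Thread(
        target=serve_reviews, args=(server_sock, SAMPLE_REVIEWS)
    )
    producer.start()
    try:
        return consume_reviews("127.0.0.1", port)
    finally:
        producer.join()
        server_sock.close()


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

In a full pipeline, the enriched records would be published to a Kafka topic and indexed in Elasticsearch rather than collected in a list.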
No commits in the last 6 months.
Use this if you are a data engineer looking for a comprehensive guide to building an end-to-end real-time data streaming and processing pipeline using modern big data technologies.
Not ideal if you are an end-user seeking a ready-to-use application for sentiment analysis without needing to build or manage the underlying data infrastructure.
Stars: 43
Forks: 31
Language: Python
License: not specified
Category: (not shown)
Last pushed: Jan 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/airscholar/RealtimeStreamingEngineering"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
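The same request can be made from Python with the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the URL pattern is inferred from the single example above, and the response is assumed (not confirmed) to be JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner, repo, category="llm-tools"):
    """Build the endpoint URL, following the pattern of the curl example."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="llm-tools", timeout=10):
    """Fetch the record for a repository; assumes the body is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo, category), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(quality_url("airscholar", "RealtimeStreamingEngineering"))
```

Calling `fetch_quality("airscholar", "RealtimeStreamingEngineering")` performs the network request and counts against the daily quota.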
Higher-rated alternatives
openai/openai-cookbook
Examples and guides for using the OpenAI API
rgbkrk/dangermode
Execute IPython & Jupyter from the comforts of chat.openai.com
CogStack/OpenGPT
A framework for creating grounded instruction based datasets and training conversational domain...
Declipsonator/GPTZzzs
Large language model detection evasion through grammar and vocabulary modification.
antononcube/Python-JupyterChatbook
Python package of a Jupyter extension that facilitates the interaction with LLMs.