victor-iyi/multi-armed-bandit-with-policy-gradient

A multi-armed bandit reinforcement learning problem solved using Policy Gradient.

21 / 100
Experimental

This project explores how a computer program can learn the best sequence of actions to take in a changing environment, similar to choosing the best option from several possibilities in real time. It takes a description of an environment (such as a game or a simulation) with possible states, actions, and rewards, and outputs an optimal strategy, or "policy", for navigating that environment. It is aimed at researchers and engineers working on intelligent agents, automated decision-making, or reinforcement learning problems.
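To make the idea concrete, here is a minimal sketch of a policy-gradient agent on a multi-armed bandit, written in plain Python. It is not taken from the repository: the function names, arm means, learning rate, and noise level are all hypothetical, and a bandit has no states, so the "policy" reduces to a probability for each arm. The agent keeps one preference per arm, samples an arm from a softmax over those preferences, and nudges them in the direction of the log-probability gradient, scaled by how much the reward beat a running-average baseline.

```python
import math
import random


def softmax(prefs):
    """Turn arm preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(arm_means, steps=3000, lr=0.1, noise=0.5, seed=0):
    """REINFORCE-style update on a stateless bandit (illustrative only).

    arm_means: assumed expected reward of each arm (made-up values).
    Returns the learned probability of pulling each arm.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)  # one preference (logit) per arm
    baseline = 0.0                  # running mean reward, reduces variance

    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], noise)  # noisy payout

        # d/d prefs[a] of log pi(action) is 1[a == action] - probs[a].
        advantage = reward - baseline
        for a in range(len(prefs)):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += lr * advantage * (indicator - probs[a])

        baseline += (reward - baseline) / t  # incremental mean

    return softmax(prefs)


policy = train_bandit([0.0, 0.5, 2.0])
best_arm = max(range(len(policy)), key=policy.__getitem__)
print(best_arm, [round(p, 3) for p in policy])
```

With a clear gap between the arms, as in this made-up setup, the learned probabilities should concentrate on the arm with the highest mean. The baseline is optional, but without it the updates tend to be noisier.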

No commits in the last 6 months.

Use this if you are a researcher or engineer looking to understand or implement fundamental reinforcement learning algorithms like Policy Gradient for sequential decision-making problems.

Not ideal if you need a high-level library for production-ready reinforcement learning applications or do not have a strong background in machine learning theory.

reinforcement-learning sequential-decision-making artificial-intelligence agent-training
Stale 6m No Package No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25

Stars: 9
Forks:
Language: Jupyter Notebook
License: MIT
Last pushed: Nov 30, 2017
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/victor-iyi/multi-armed-bandit-with-policy-gradient"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.