victor-iyi/multi-armed-bandit-with-policy-gradient
A multi-armed bandit reinforcement learning problem solved with Policy Gradient.
This project explores how a computer program can learn the best sequence of actions to take in a changing environment, similar to choosing the best option from several possibilities in real-time. It takes a description of an environment (like a game or a simulation) with possible states, actions, and rewards, and outputs an optimal strategy or "policy" for navigating that environment. This is for researchers or engineers working on intelligent agents, automated decision-making, or reinforcement learning problems.
No commits in the last 6 months.
Use this if you are a researcher or engineer looking to understand or implement fundamental reinforcement learning algorithms like Policy Gradient for sequential decision-making problems.
Not ideal if you need a high-level library for production-ready reinforcement learning applications or do not have a strong background in machine learning theory.
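To make the Policy Gradient idea mentioned above concrete, here is a minimal sketch of a softmax policy trained on a multi-armed bandit. It is not taken from the repository, which may use a different framework and setup. The function names (`softmax`, `train_bandit`), the Gaussian reward model, the arm means, and the hyperparameters are all assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(arm_means, steps=5000, lr=0.1, seed=0):
    """Policy-gradient (REINFORCE-style) learning on a stateless bandit.

    Assumption: pulling arm i returns a reward drawn from a Gaussian
    with mean arm_means[i] and standard deviation 1.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)  # policy parameters, one per arm
    baseline = 0.0                  # running average reward (variance reduction)
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)
        advantage = reward - baseline
        baseline += (reward - baseline) / t  # incremental mean update
        # Gradient of log pi(action) with respect to prefs[a] is
        # (1 if a == action else 0) - probs[a] for a softmax policy.
        for a in range(len(prefs)):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += lr * advantage * (indicator - probs[a])
    return softmax(prefs)


if __name__ == "__main__":
    # Hypothetical four-armed bandit; arm 2 has the highest mean reward.
    final_probs = train_bandit([0.2, -0.5, 1.5, 0.0])
    print([round(p, 3) for p in final_probs])
```

Under these assumptions the learned policy should put most of its probability on the arm with the highest mean reward. The running-average baseline is one common choice for reducing gradient variance; it is not required for the method to work.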
Stars: 9
Forks: —
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Nov 30, 2017
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/victor-iyi/multi-armed-bandit-with-policy-gradient"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
WilliamLwj/PyXAB
PyXAB - A Python Library for X-Armed Bandit and Online Blackbox Optimization Algorithms
jekyllstein/Reinforcement-Learning-Sutton-Barto-Exercise-Solutions
Chapter notes and exercise solutions for Reinforcement Learning: An Introduction by Sutton and Barto
cfoh/Multi-Armed-Bandit-Example
Learning Multi-Armed Bandits by Examples. Currently covering MAB, UCB, Boltzmann Exploration,...
matteocasolari/reinforcement-learning-an-introduction-solutions
Implementations for solutions to programming exercises of Reinforcement Learning: An...
BY571/Upside-Down-Reinforcement-Learning
Upside-Down Reinforcement Learning (⅂ꓤ) implementation in PyTorch. Based on the paper published...