Deep Reinforcement Learning Algorithms

This collection showcases various projects focused on Deep Reinforcement Learning techniques. The projects are organized in a matrix structure: [environment x algorithm], where environment represents the challenge to be tackled, and algorithm denotes the method employed to solve it. In certain instances, multiple algorithms are applied to the same environment. Each project is presented as a Jupyter notebook, complete with a comprehensive training log.

The collection encompasses the following environments:

AntBulletEnv, BipedalWalker, BipedalWalkerHardcore, CarRacing, CartPole, Crawler, HalfCheetahBulletEnv,
HopperBulletEnv, LunarLander, LunarLanderContinuous, Markov Decision 6x6, Minitaur, Minitaur with Duck,
MountainCar, MountainCarContinuous, Pong, Navigation, Reacher, Snake, Tennis, Walker2DBulletEnv.

Four environments (Navigation, Crawler, Reacher, Tennis) are solved in the framework of the
Udacity Deep Reinforcement Learning Nanodegree Program.

  • Monte-Carlo Methods
    In Monte Carlo (MC) methods, we play an episode through to completion, collect the rewards along the way, and then trace back to the start of the episode. This process is repeated many times, and the value of each state is estimated as the average of the returns observed from it. A minimal first-visit MC prediction sketch appears after this list.

  • Temporal Difference Methods and Q-learning

  • Reinforcement Learning in Continuous Space (Deep Q-Network)

  • Function Approximation and Neural Networks
    The Universal Approximation Theorem (UAT) states that a feed-forward neural network with a single hidden layer and a finite number of nodes can approximate any continuous function, given certain mild assumptions about the activation function. A small fitting sketch appears after this list.

  • Policy-Based Methods, Hill-Climbing, Simulated Annealing
    Random-restart hill-climbing is often surprisingly effective. Simulated annealing is a useful probabilistic technique because it reduces the risk of mistaking a local extremum for the global one. A hill-climbing sketch appears after this list.

  • Policy-Gradient Methods, REINFORCE, PPO
    Define a performance measure J(\theta) to maximize, and learn the policy parameters \theta through approximate gradient ascent. A REINFORCE update sketch appears after this list.

  • Actor-Critic Methods, A3C, A2C, DDPG, TD3, SAC
    The key difference between A3C and A2C is the asynchronous aspect: A3C runs multiple independent agents (networks) with their own weights, each interacting with its own copy of the environment in parallel, so a larger part of the state-action space is explored more quickly.

  • Forward-Looking Actor (FORK)
    Model-based reinforcement learning leverages the model in a sophisticated manner, often using deterministic or stochastic optimal control theory to optimize the policy based on the model. FORK instead uses the system network as a black box to predict future states, not as a mathematical model for optimizing control actions. This distinction allows any model-free Actor-Critic algorithm augmented with FORK to remain model-free.
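
Example: first-visit Monte-Carlo prediction. The sketch below is a minimal illustration of the MC idea described above, using a toy 5-state random walk rather than any of the environments in this collection; the environment and hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict

# Toy 5-state random walk (illustrative, not one of the repository environments).
# States 0..4; episodes start in state 2; move left/right at random;
# reaching state 4 pays reward +1, reaching state 0 pays 0; both are terminal.
def run_episode():
    state, trajectory = 2, []
    while state not in (0, 4):
        next_state = state + random.choice([-1, 1])
        reward = 1.0 if next_state == 4 else 0.0
        trajectory.append((state, reward))
        state = next_state
    return trajectory

def first_visit_mc(num_episodes=20000, gamma=1.0):
    returns = defaultdict(list)
    for _ in range(num_episodes):
        trajectory = run_episode()
        G = 0.0
        # Trace back from the end of the episode, accumulating the return G.
        for t in reversed(range(len(trajectory))):
            state, reward = trajectory[t]
            G = reward + gamma * G
            # First-visit: only record G for the earliest occurrence of the state.
            if state not in (s for s, _ in trajectory[:t]):
                returns[state].append(G)
    # The value estimate of each state is the average of its recorded returns.
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

print(first_visit_mc())  # approximately {1: 0.25, 2: 0.5, 3: 0.75}
```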
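
Example: the UAT in practice. As a small sketch of the Function Approximation bullet, a single-hidden-layer network is fit to sin(x) on [-π, π]; PyTorch is assumed here, and the width and training settings are illustrative, not taken from the notebooks.

```python
import math
import torch
import torch.nn as nn

# One hidden tanh layer fitting the continuous function sin(x) on [-pi, pi].
torch.manual_seed(0)
x = torch.linspace(-math.pi, math.pi, 256).unsqueeze(1)
y = torch.sin(x)

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    optimizer.step()

print(f"final MSE: {loss.item():.2e}")  # should fall far below 1e-3
```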
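
Example: random-restart hill-climbing. The sketch below applies hill-climbing to a parameter vector theta; the toy objective stands in for an episode return and is an assumption for illustration. A simulated-annealing variant would also accept worse candidates with a probability that decays under a temperature schedule.

```python
import numpy as np

# Toy multimodal objective standing in for "total reward of one episode" (assumed).
def evaluate(theta):
    return -np.sum((theta - 1.0) ** 2) + 0.5 * np.sum(np.cos(5.0 * theta))

def hill_climb(dim=4, restarts=10, steps=200, noise_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best_theta, best_score = None, -np.inf
    for _ in range(restarts):                       # random restart
        theta = rng.normal(size=dim)
        score = evaluate(theta)
        for _ in range(steps):
            candidate = theta + noise_scale * rng.normal(size=dim)
            candidate_score = evaluate(candidate)
            if candidate_score > score:             # keep only improvements
                theta, score = candidate, candidate_score
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta, best_score

theta, score = hill_climb()
print(score)
```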
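
Example: a REINFORCE update. This sketch shows one gradient-ascent step on J(\theta) for a discrete-action policy; PyTorch is assumed, and the policy architecture, tensor shapes, and dummy episode data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Small discrete-action policy network pi_theta (architecture is illustrative).
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, returns):
    # states: (T, 4) floats, actions: (T,) ints, returns: (T,) discounted returns G_t
    logits = policy(states)
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
    # Ascend J(theta) ~ E[ sum_t G_t * log pi_theta(a_t | s_t) ]
    # by descending the negated objective.
    loss = -(returns * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy episode data just to show the call (hypothetical values).
T = 5
states = torch.randn(T, 4)
actions = torch.randint(0, 2, (T,))
returns = torch.linspace(1.0, 0.2, T)
print(reinforce_update(states, actions, returns))
```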
