Q-Learning Agent playing Taxi-v3
This is a trained Q-Learning agent playing Taxi-v3. This model was trained as part of the Hugging Face Deep RL Course Unit 2.
Training Hyperparameters
n_training_episodes = 25000
learning_rate = 0.7
gamma = 0.95
max_epsilon = 1.0
min_epsilon = 0.05
decay_rate = 0.0005
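For orientation, here is a minimal sketch of how these hyperparameters typically enter tabular Q-learning. It assumes the exponential epsilon schedule used in the Deep RL Course Unit 2 notebook, which this card does not state explicitly. The helper names `epsilon_at` and `q_update` are illustrative and are not part of the released model.

```python
import numpy as np

# Hyperparameters copied from the list above.
learning_rate = 0.7
gamma = 0.95
max_epsilon = 1.0
min_epsilon = 0.05
decay_rate = 0.0005


def epsilon_at(episode):
    """Exploration rate for a given episode (assumed exponential decay)."""
    return min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)


def q_update(qtable, state, action, reward, next_state):
    """Apply one standard tabular Q-learning update in place."""
    # Bootstrapped target: immediate reward plus discounted best next value.
    td_target = reward + gamma * np.max(qtable[next_state])
    # Move the current estimate toward the target by learning_rate.
    qtable[state, action] += learning_rate * (td_target - qtable[state, action])
    return qtable
```

Under this assumed schedule, epsilon starts at max_epsilon and is within 1e-4 of min_epsilon by the last of the 25000 training episodes, so late training is almost entirely greedy apart from the 5% exploration floor.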
Evaluation Results
Mean reward: 8.23 +/- 2.49
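A figure of this form is usually the mean and standard deviation of the total reward over a batch of greedy evaluation episodes. The sketch below shows one way to compute it. The function name `evaluate_agent` and the `n_eval_episodes` parameter are assumptions, and the number of episodes behind the reported value is not stated in this card.

```python
import numpy as np


def evaluate_agent(env, qtable, n_eval_episodes):
    """Run greedy episodes and return (mean, std) of the episode returns."""
    episode_rewards = []
    for _ in range(n_eval_episodes):
        state, _ = env.reset()
        done = False
        total_reward = 0.0
        while not done:
            # Always take the highest-valued action; no exploration here.
            action = int(np.argmax(qtable[state]))
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            total_reward += reward
        episode_rewards.append(total_reward)
    return float(np.mean(episode_rewards)), float(np.std(episode_rewards))
```

The function accepts any environment that follows the Gymnasium `reset`/`step` interface, so it can be called with `gym.make('Taxi-v3')` and the loaded Q-table.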
Usage
import numpy as np
import gymnasium as gym
# Load the Q-table
qtable = np.load("qtable.npy")
# Create environment
env = gym.make('Taxi-v3')
# Run an episode
state, _ = env.reset()
done = False
total_reward = 0
while not done:
    action = np.argmax(qtable[state])
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    total_reward += reward
print(f"Total reward: {total_reward}")
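Before running episodes, it can help to confirm that the loaded table matches the environment. The sketch below assumes the standard Taxi-v3 layout of 500 discrete states and 6 actions. The helpers `check_qtable` and `greedy_action_name` are illustrative names and are not shipped with this model.

```python
import numpy as np

# Taxi-v3 action indices, in the order given by the Gymnasium documentation.
ACTION_NAMES = ["south", "north", "east", "west", "pickup", "dropoff"]


def check_qtable(qtable, n_states=500, n_actions=6):
    """Raise ValueError unless the table has one row per state and one column per action."""
    if qtable.shape != (n_states, n_actions):
        raise ValueError(f"expected shape {(n_states, n_actions)}, got {qtable.shape}")


def greedy_action_name(qtable, state):
    """Return the readable name of the highest-valued action in a state."""
    return ACTION_NAMES[int(np.argmax(qtable[state]))]
```

With a live environment, the expected sizes can be read from `env.observation_space.n` and `env.action_space.n` rather than hard-coded.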