In this section, we will cover the practical aspects of implementing a learning agent in video games: reviewing the core concepts of reinforcement learning, setting up a game environment, and coding an agent that learns and adapts to it. We will use reinforcement learning, specifically Q-learning, as our primary approach.

Key Concepts

Before we start coding, let's review some key concepts:

  1. Reinforcement Learning (RL):

    • Agent: The entity that interacts with the environment.
    • Environment: The world through which the agent moves and interacts.
    • State: A representation of the current situation of the agent within the environment.
    • Action: A move the agent can make; the set of all possible moves is the action space.
    • Reward: Feedback from the environment based on the action taken by the agent.
    • Policy: The strategy that the agent employs to determine the next action based on the current state.
    • Value Function: A function that estimates the expected cumulative reward of a state or state-action pair.
  2. Q-Learning:

    • A model-free reinforcement learning algorithm.
    • Q-Table: A table where we store the Q-values (expected future rewards) for each state-action pair.
    • Bellman Equation: Used to update the Q-values after each step (see the update rule below).
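
In practice, the Bellman equation gives the Q-learning update rule that we will implement later in the training loop. Using the same names as the code below (alpha = learning rate, gamma = discount factor, r = reward, s' = next state):

Q(s, a) ← (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))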

Setting Up the Environment

We will use a simple grid-based environment for our learning agent. The agent will learn to navigate from a starting point to a goal while avoiding obstacles.

Environment Setup

import numpy as np
import random

class GridEnvironment:
    def __init__(self, grid_size, start, goal, obstacles):
        self.grid_size = grid_size   # (rows, cols) of the grid
        self.start = start           # starting cell, e.g. (0, 0)
        self.goal = goal             # goal cell, e.g. (4, 4)
        self.obstacles = obstacles   # list of blocked cells
        self.state = start

    def reset(self):
        # Return the agent to the starting cell at the beginning of an episode
        self.state = self.start
        return self.state

    def step(self, action):
        # Apply the action (a row/column offset) to the current state
        next_state = (self.state[0] + action[0], self.state[1] + action[1])
        if next_state in self.obstacles or not (0 <= next_state[0] < self.grid_size[0] and 0 <= next_state[1] < self.grid_size[1]):
            next_state = self.state  # Invalid move (obstacle or off-grid), stay in the same state
        reward = -1  # Small step penalty encourages shorter paths
        if next_state == self.goal:
            reward = 100  # Reward for reaching the goal
        self.state = next_state
        return next_state, reward

    def is_done(self):
        # The episode ends when the agent reaches the goal
        return self.state == self.goal
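
Before adding any learning, it can help to step the environment by hand to see how its interface behaves. The snippet below is a quick sketch using an arbitrary 5x5 grid with a single obstacle (the values are only for illustration):

env = GridEnvironment((5, 5), (0, 0), (4, 4), [(1, 1)])
state = env.reset()
print(state)                        # (0, 0)
state, reward = env.step((1, 0))    # Move down one row
print(state, reward)                # (1, 0) -1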

Actions

We define the possible actions the agent can take:

actions = {
    0: (-1, 0),  # Up
    1: (1, 0),   # Down
    2: (0, -1),  # Left
    3: (0, 1)    # Right
}

Implementing Q-Learning

Q-Table Initialization

The Q-table stores one value per (row, column, action) triple. Note that grid_size must already be defined (see Example Usage below) before the table is created:

q_table = np.zeros((grid_size[0], grid_size[1], len(actions)))

Training the Agent

def train_agent(env, q_table, episodes, alpha, gamma, epsilon):
    for episode in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection
            if random.uniform(0, 1) < epsilon:
                action = random.choice(list(actions.keys()))  # Explore: random action
            else:
                action = np.argmax(q_table[state[0], state[1]])  # Exploit: best known action

            next_state, reward = env.step(actions[action])
            old_value = q_table[state[0], state[1], action]
            next_max = np.max(q_table[next_state[0], next_state[1]])

            # Q-learning (Bellman) update
            new_value = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
            q_table[state[0], state[1], action] = new_value

            state = next_state
            done = env.is_done()

Parameters

  • alpha: Learning rate (e.g., 0.1)
  • gamma: Discount factor (e.g., 0.9)
  • epsilon: Exploration rate (e.g., 0.1)

Example Usage

grid_size = (5, 5)
start = (0, 0)
goal = (4, 4)
obstacles = [(1, 1), (2, 2), (3, 3)]

env = GridEnvironment(grid_size, start, goal, obstacles)
episodes = 1000
alpha = 0.1
gamma = 0.9
epsilon = 0.1

train_agent(env, q_table, episodes, alpha, gamma, epsilon)
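
After training, you can inspect what the agent has learned by following the greedy policy (always taking the highest-valued action). The helper below is a minimal sketch that reuses the environment, actions, and q_table defined above; max_steps is just a safety cap to stop the walk if the policy never reaches the goal:

def run_greedy_policy(env, q_table, max_steps=50):
    # Follow the learned policy from the start state and record the visited cells
    state = env.reset()
    path = [state]
    for _ in range(max_steps):
        action = np.argmax(q_table[state[0], state[1]])  # Best known action
        state, _ = env.step(actions[action])
        path.append(state)
        if env.is_done():
            break
    return path

print(run_greedy_policy(env, q_table))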

Practical Exercises

Exercise 1: Modify the Environment

Task: Modify the grid environment to include more obstacles and test the agent's performance.

Solution:

obstacles = [(1, 1), (1, 2), (2, 2), (3, 3), (3, 4)]
env = GridEnvironment(grid_size, start, goal, obstacles)
train_agent(env, q_table, episodes, alpha, gamma, epsilon)

Exercise 2: Adjust Hyperparameters

Task: Experiment with different values of alpha, gamma, and epsilon to observe their impact on the agent's learning.

Solution:

alpha = 0.2  # Increased learning rate
gamma = 0.95  # Increased discount factor
epsilon = 0.05  # Reduced exploration rate
train_agent(env, q_table, episodes, alpha, gamma, epsilon)

Exercise 3: Implement a Different Reward System

Task: Change the reward system to penalize the agent for hitting obstacles.

Solution:

# Replacement for GridEnvironment.step with an explicit penalty for invalid moves
def step(self, action):
    next_state = (self.state[0] + action[0], self.state[1] + action[1])
    if next_state in self.obstacles or not (0 <= next_state[0] < self.grid_size[0] and 0 <= next_state[1] < self.grid_size[1]):
        next_state = self.state  # Invalid move, stay in the same state
        reward = -10  # Penalty for hitting an obstacle or the grid boundary
    else:
        reward = -1
    if next_state == self.goal:
        reward = 100  # Reward for reaching the goal
    self.state = next_state
    return next_state, reward
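
When retraining with a new reward scheme, it is usually best to start from a fresh Q-table so that values learned under the old rewards do not carry over, for example:

q_table = np.zeros((grid_size[0], grid_size[1], len(actions)))  # Fresh Q-table
env = GridEnvironment(grid_size, start, goal, obstacles)
train_agent(env, q_table, episodes, alpha, gamma, epsilon)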

Conclusion

In this section, we have implemented a basic learning agent using Q-learning. We set up a grid-based environment, defined actions, and trained the agent to navigate towards a goal while avoiding obstacles. By experimenting with different parameters and reward systems, you can further enhance the agent's learning capabilities. This foundational knowledge prepares you for more complex machine learning applications in video games.
