Commit 4079dd15 authored by Matthew Hausknecht

Updated manual to add high level state and action spaces.

parent ab59293b
\documentclass[12pt]{article} \documentclass[12pt]{article}
\usepackage{hyperref} \usepackage{hyperref,graphicx}
\usepackage{fullpage} \usepackage{fullpage}
\title{Half Field Offense \\ Technical Manual} \title{Half Field Offense \\ Technical Manual}
...@@ -15,8 +15,55 @@ ...@@ -15,8 +15,55 @@
This document describes the state and action spaces of the HFO domain. This document describes the state and action spaces of the HFO domain.
\section{State Space} \section{State Spaces}
The HFO domain provides a choice between a low-level feature set and
a higher-level feature set. The feature set is selected when
connecting to the agent server. See
\verb|examples/hfo_example_agent.cpp| and
\verb|examples/hfo_example_agent.py| for examples.
\subsection{High Level Feature Set}
A set of high-level features is provided, following the example given
by Barrett et al.~\cite[pp.~159--160]{THESIS14-Barrett}. Barrett writes:
``There are many ways to represent the state of a game of half field
offense. Ideally, we want a compact representation that allows the
agent to learn quickly by generalizing its knowledge about a state to
similar states without over-constraining the policy.'' The following
features are used:
\subsubsection{High Level State Feature List}
\begin{itemize}
\item{\textbf{X position} - The agent's x position on the field.}
\item{\textbf{Y position} - The agent's y position on the field.}
\item{\textbf{Orientation} - The direction that the agent is facing.}
\item{\textbf{Goal opening angle} - The size of the largest open angle
of the agent to the goal, shown as $\theta_g$ in Figure
\ref{fig:openAngle}.}
\item{\textbf{Teammate i's goal opening angle} - For each teammate i:
teammate i's goal opening angle.}
\item{\textbf{Distance to Opponent} - If an opponent is present,
distance to the closest opponent. This feature is absent if there
are no opponents.}
\item{\textbf{Distance from teammate i to opponent} - For each teammate
i: the distance from the teammate to the closest opponent. This
feature is absent if there are no opponents.}
\item{\textbf{Pass opening angle i} - For each teammate i: the open
angle available to pass to teammate i. Shown as $\theta_p$ in Figure
\ref{fig:openAngle}.}
\end{itemize}
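
The list above implies a feature count that depends on the number of
teammates and on whether any opponents are present. The following is a
minimal sketch of that count, assuming each listed feature is encoded
as a single scalar (the manual does not state the encoding width). The
function name \verb|num_high_level_features| is hypothetical and is
not part of the HFO interface.

```python
def num_high_level_features(num_teammates, num_opponents):
    """Count the high-level features listed above.

    Assumption: every listed feature occupies exactly one value.
    """
    if num_teammates < 0 or num_opponents < 0:
        raise ValueError("player counts must be non-negative")
    # Always present: x position, y position, orientation,
    # and the agent's own goal opening angle.
    count = 4
    # Per teammate: goal opening angle and pass opening angle.
    count += 2 * num_teammates
    if num_opponents > 0:
        # Distance to the closest opponent, for the agent and
        # for each teammate. Absent when there are no opponents.
        count += 1 + num_teammates
    return count
```

Under this assumption, the count does not grow with additional
opponents, since only the closest opponent is described.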
\begin{figure}[htp]
\centering
\includegraphics[width=.75\textwidth]{figures/openAngle}
\caption{Open angle from ball to the goal $\theta_g$ avoiding the
blue goalie and the open angle from the ball to the yellow
teammate $\theta_p$. Figure reproduced with permission from Samuel
Barrett.}
\label{fig:openAngle}
\end{figure}
\subsection{Low Level Feature Set}
The state features used by HFO are designed with the mindset of The state features used by HFO are designed with the mindset of
providing an overcomplete, basic, egocentric viewpoint. The features providing an overcomplete, basic, egocentric viewpoint. The features
are basic in the sense that they provide distances and angles to are basic in the sense that they provide distances and angles to
...@@ -27,12 +74,12 @@ goalkeeper. ...@@ -27,12 +74,12 @@ goalkeeper.
All features are encoded as floating point values normalized to the All features are encoded as floating point values normalized to the
range of [-1,1]. Different types of features are discussed next. range of [-1,1]. Different types of features are discussed next.
\subsection{Boolean Features} \subsubsection{Boolean Features}
Boolean features assume either the minimum feature value of -1 or the Boolean features assume either the minimum feature value of -1 or the
maximum feature value of 1. maximum feature value of 1.
\subsection{Valid Features} \subsubsection{Valid Features}
Since feature information is obtained from the agent's world model, it Since feature information is obtained from the agent's world model, it
is possible that the world model's information may be stale or is possible that the world model's information may be stale or
...@@ -47,7 +94,7 @@ zero if an inconsistency is detected. For example, if the world model ...@@ -47,7 +94,7 @@ zero if an inconsistency is detected. For example, if the world model
detects that the agent's velocity is invalid, the feature that encodes detects that the agent's velocity is invalid, the feature that encodes
the magnitude of self velocity will be set to zero. the magnitude of self velocity will be set to zero.
\subsection{Angular Features} \subsubsection{Angular Features}
\textit{Angular features} (e.g. the angle to the ball) are encoded as two \textit{Angular features} (e.g. the angle to the ball) are encoded as two
floating point numbers -- $\sin(\theta)$ and $\cos(\theta)$ where floating point numbers -- $\sin(\theta)$ and $\cos(\theta)$ where
...@@ -59,7 +106,7 @@ discontinuity that when normalized, could cause the feature value to ...@@ -59,7 +106,7 @@ discontinuity that when normalized, could cause the feature value to
flip between the maximum and minimum value in response to small flip between the maximum and minimum value in response to small
changes in $\theta$. changes in $\theta$.
\subsection{Distance Features} \subsubsection{Distance Features}
\textit{Distance features} encode the distance to objects of \textit{Distance features} encode the distance to objects of
interest. Unless otherwise indicated, they are normalized against the interest. Unless otherwise indicated, they are normalized against the
...@@ -68,7 +115,7 @@ maximum possible distance in the HFO playfield (defined as $\sqrt{l^2 ...@@ -68,7 +115,7 @@ maximum possible distance in the HFO playfield (defined as $\sqrt{l^2
playfield). A distance of zero will be encoded with the minimum playfield). A distance of zero will be encoded with the minimum
feature value of -1 while a maximum distance will be encoded with 1. feature value of -1 while a maximum distance will be encoded with 1.
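
One mapping consistent with this description is a linear rescaling
from $[0, d_{max}]$ to $[-1, 1]$. The sketch below assumes that the
mapping is linear and that out-of-range distances are clipped; neither
detail is stated in this manual. The names \verb|normalize_distance|
and \verb|max_distance| are hypothetical.

```python
def normalize_distance(distance, max_distance):
    """Map a distance in [0, max_distance] linearly onto [-1, 1].

    Assumptions: the mapping is linear, and values outside the
    valid range are clipped to it.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    clipped = min(max(distance, 0.0), max_distance)
    # Zero distance maps to -1 and the maximum distance maps to 1.
    return 2.0 * clipped / max_distance - 1.0
```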
\subsection{Landmark Features} \subsubsection{Landmark Features}
Landmark features encode the relative angle and distance to a landmark Landmark features encode the relative angle and distance to a landmark
of interest. Each landmark feature consists of three floating point of interest. Each landmark feature consists of three floating point
...@@ -76,7 +123,7 @@ values, two to encode the angle to the landmark and one to encode the ...@@ -76,7 +123,7 @@ values, two to encode the angle to the landmark and one to encode the
distance. Note that if the agent's self position is invalid, then the distance. Note that if the agent's self position is invalid, then the
landmark feature values are zeroed. landmark feature values are zeroed.
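
A landmark feature can be sketched by combining the angular and
distance encodings. The ordering of the three values (sine, cosine,
distance), the linear distance mapping, and the name
\verb|encode_landmark| are assumptions made for illustration only.

```python
import math


def encode_landmark(angle, distance, max_distance, self_position_valid):
    """Encode a landmark as three values: sin, cos, normalized distance.

    Assumptions: the angle is in radians, the distance is rescaled
    linearly to [-1, 1], and the values appear in this order.
    """
    if not self_position_valid:
        # An invalid self position zeroes the whole landmark feature.
        return [0.0, 0.0, 0.0]
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    normalized = 2.0 * distance / max_distance - 1.0
    return [math.sin(angle), math.cos(angle), normalized]
```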
\subsection{Player Features} \subsubsection{Player Features}
Player features are used to encode the relationship of the agent to Player features are used to encode the relationship of the agent to
another player or opponent. Each player feature is encoded as 1) a another player or opponent. Each player feature is encoded as 1) a
...@@ -85,13 +132,13 @@ player's body 3) the magnitude of the player's velocity and 4) the ...@@ -85,13 +132,13 @@ player's body 3) the magnitude of the player's velocity and 4) the
global angle of the player's velocity. Eight floating point numbers global angle of the player's velocity. Eight floating point numbers
are used to encode each player feature. are used to encode each player feature.
\subsection{Other Features} \subsubsection{Other Features}
Some features, such as the agent's stamina, do not fall into any of Some features, such as the agent's stamina, do not fall into any of
the above categories. These features are referred to as \textit{other the above categories. These features are referred to as \textit{other
features}. features}.
\section{State Feature List} \subsubsection{Low Level State Feature List}
Basic Features are always present and independent of the number of Basic Features are always present and independent of the number of
teammates or opponents. The 32 basic features are encoded using 58 teammates or opponents. The 32 basic features are encoded using 58
...@@ -156,12 +203,16 @@ number of features is $58 + 8*\textrm{num\_teammates} + ...@@ -156,12 +203,16 @@ number of features is $58 + 8*\textrm{num\_teammates} +
\end{itemize} \end{itemize}
\section{Action Space} \section{Action Space}
The HFO domain provides support for both low-level primitive actions
The action space of the HFO domain is primitive: basic parameterized and high-level strategic actions. Basic, parameterized actions are
actions are provided for locomotion and kicking. Control of the provided for locomotion and kicking. Additionally, high-level strategic
agent's head and gaze is not provided. The primitive actions are as actions are available for moving, shooting, passing and
follows: dribbling. Control of the agent's head and gaze is not provided;
it follows Agent2D's default strategy. Selection between high-level and
low-level action spaces is performed when connecting to the agent
server.
\subsection{Low Level Actions}
\begin{itemize} \begin{itemize}
\item{\textbf{Dash}(power, degrees): Moves the agent with power [-100, \item{\textbf{Dash}(power, degrees): Moves the agent with power [-100,
100] where negative values move backwards. The relative direction 100] where negative values move backwards. The relative direction
...@@ -180,4 +231,23 @@ follows: ...@@ -180,4 +231,23 @@ follows:
terminate the HFO environment.} terminate the HFO environment.}
\end{itemize} \end{itemize}
\subsection{High Level Actions}
\begin{itemize}
\item{\textbf{Move}(): Re-positions the agent according to the
strategy given by Agent2D. The \textit{move} command works only when
the agent does not have the ball. If the agent has the ball, another
command such as \textit{dribble}, \textit{shoot}, or \textit{pass}
should be used.}
\item{\textbf{Shoot}(): Executes the best available shot. This command
only works when the agent has the ball.}
\item{\textbf{Pass}(): Finds the best teammate to pass to and the type
of pass to use. This command only works when the agent has the
ball.}
\item{\textbf{Dribble}(): Advances the ball towards the goal using a
combination of short kicks and moves.}
\end{itemize}
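
The ball-possession constraints above suggest a simple way to pick a
valid high-level action. The following hand-coded policy is only a
sketch: the function name, the string labels, and the
\verb|can_shoot| and \verb|can_pass| flags are hypothetical and do not
correspond to identifiers in the HFO interface. How an agent decides
whether a shot or pass is worthwhile is left to the caller.

```python
def choose_high_level_action(has_ball, can_shoot=False, can_pass=False):
    """Pick a high-level action that is valid for the current situation.

    The returned labels are illustrative strings, not HFO constants.
    """
    if not has_ball:
        # Move only works when the agent does not have the ball.
        return "MOVE"
    # Shoot and Pass only work when the agent has the ball.
    if can_shoot:
        return "SHOOT"
    if can_pass:
        return "PASS"
    # Otherwise advance the ball towards the goal.
    return "DRIBBLE"
```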
\bibliographystyle{abbrv}
\bibliography{manual}
\end{document} \end{document}