The FIFO interface is text-based and optionally supports run-length encoding of the screen. This section documents the protocol; sample code implementing it in Java is also included in this release.
After an initial handshake, the FIFO interface enters a loop in which ALE sends information about the current time step and the agent responds with the actions of both players (in general, agents control only the first player). The loop exits when one of several termination conditions occurs.
\subsection{Handshaking}
ALE first sends the width and height of its screen matrix as a hyphen-separated string:
\begin{verbatim}
www-hhh\n
\end{verbatim}
where \verb+www+ and \verb+hhh+ are both integers.
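As a minimal sketch in Python, an agent might read this line as follows. The helper \verb+parse_dimensions+ is hypothetical and is not part of ALE or its Java samples.
\begin{verbatim}
from typing import Tuple


def parse_dimensions(line: str) -> Tuple[int, int]:
    """Parse ALE's 'www-hhh' handshake line into (width, height)."""
    width_str, height_str = line.strip().split("-")  # strip() drops the newline
    return int(width_str), int(height_str)


if __name__ == "__main__":
    # Example input only; the real values are whatever ALE reports.
    print(parse_dimensions("160-210\n"))  # (160, 210)
\end{verbatim}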
The agent then responds with a comma-separated string:
\begin{verbatim}
s,r,k,R\n
\end{verbatim}
where \verb+s+, \verb+r+ and \verb+R+ are each 1 or 0, indicating whether ALE should send screen, RAM and episode-related information, respectively, at every time step (see below for details). The third field, \verb+k+, is deprecated and ignored.
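A sketch of the agent's reply, using a hypothetical helper \verb+handshake_reply+. Sending 0 in the ignored \verb+k+ field is an assumption of this sketch, not something the protocol prescribes.
\begin{verbatim}
def handshake_reply(screen: bool, ram: bool, episode: bool) -> str:
    """Build the agent's newline-terminated 's,r,k,R' reply.

    k is deprecated and ignored by ALE; this sketch assumes 0 as a placeholder.
    """
    return "%d,%d,0,%d\n" % (int(screen), int(ram), int(episode))


if __name__ == "__main__":
    # Request screen and episode information, but not RAM.
    print(repr(handshake_reply(True, False, True)))  # '1,0,0,1\n'
\end{verbatim}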
\subsection{Main Loop -- ALE}
After handshaking, ALE loops until one of the termination conditions described in Section \ref{subsec:termination_conditions} occurs. When a termination condition occurs, ALE sends
\begin{verbatim}
DIE\n
\end{verbatim}
Otherwise, ALE sends
\begin{verbatim}
<RAM_string><screen_string><episode_string>\n
\end{verbatim}
where each of the three strings is either empty (if the agent did not request that piece of information during handshaking) or the corresponding data followed by a colon.
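Hexadecimal digits, commas and integers never contain a colon, so once \verb+DIE+ has been ruled out a message can be split on colons. The Python sketch below (\verb+split_message+ is a hypothetical name) assumes the three flags are the ones the agent sent during handshaking.
\begin{verbatim}
from typing import Dict


def split_message(line: str, ram: bool, screen: bool,
                  episode: bool) -> Dict[str, str]:
    """Split one ALE message into its colon-terminated fields.

    Fields appear in the order RAM, screen, episode; each is present only
    if it was requested. Call this only after checking for 'DIE'.
    """
    parts = line.rstrip("\n").split(":")
    # Every field ends with ':', so the final element after split() is ''.
    if parts[-1] != "":
        raise ValueError("message is not colon-terminated: %r" % line)
    parts = parts[:-1]
    names = [name for name, wanted in
             (("ram", ram), ("screen", screen), ("episode", episode))
             if wanted]
    if len(parts) != len(names):
        raise ValueError("expected %d fields, got %d"
                         % (len(names), len(parts)))
    return dict(zip(names, parts))
\end{verbatim}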
\subsubsection{\texttt{RAM\_string}}
The RAM string consists of 128 2-digit hexadecimal numbers, the $i^{th}$ of which gives the value of the $i^{th}$ byte of RAM; in total this string is 256 characters long, not including the terminating `:'.
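A minimal decoding sketch using Python's built-in \verb+bytes.fromhex+. The wrapper name \verb+decode_ram+ is hypothetical.
\begin{verbatim}
def decode_ram(ram_string: str) -> bytes:
    """Convert the 256-character RAM string into 128 byte values."""
    if len(ram_string) != 256:
        raise ValueError("expected 256 hex characters, got %d"
                         % len(ram_string))
    # fromhex() reads two hex digits per byte (either letter case).
    return bytes.fromhex(ram_string)


if __name__ == "__main__":
    ram = decode_ram("00" * 127 + "ff")
    print(len(ram), ram[127])  # 128 255
\end{verbatim}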
\subsubsection{\texttt{screen\_string}}
In ``full'' mode, the screen string consists of \texttt{www}$\times$\texttt{hhh} 2-digit hexadecimal numbers, each representing the colour of one pixel. Pixels are sent row by row, with \texttt{www} pixels (that is, $2\times$\texttt{www} characters) per row. In total this string is $2\times$\texttt{www}$\times$\texttt{hhh} characters long.
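The row layout above can be sketched in Python as follows. \verb+decode_full_screen+ is a hypothetical helper that returns \texttt{hhh} lists of \texttt{www} integer colour values.
\begin{verbatim}
from typing import List


def decode_full_screen(screen_string: str, width: int,
                       height: int) -> List[List[int]]:
    """Decode a full-mode screen string into rows of pixel colours."""
    if len(screen_string) != 2 * width * height:
        raise ValueError("unexpected screen string length")
    row_chars = 2 * width  # two hex digits per pixel
    rows = []
    for r in range(height):
        row = screen_string[r * row_chars:(r + 1) * row_chars]
        rows.append([int(row[i:i + 2], 16)
                     for i in range(0, row_chars, 2)])
    return rows


if __name__ == "__main__":
    # A toy 3x2 screen, not real ALE output.
    print(decode_full_screen("0102030405ff", 3, 2))
    # [[1, 2, 3], [4, 5, 255]]
\end{verbatim}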
In run-length encoding mode, the screen string consists of a variable number of (colour,length) pairs that encode the screen, again row by row. Both colour and length are written as 2-digit hexadecimal numbers. Each pair indicates that the next `length' pixels take on the given colour; a single run therefore covers at most 255 pixels, and a longer stretch of one colour must be split across several pairs. Runs may wrap around onto the next row. The encoding ends once the last (i.e.\ bottom-right) pixel has been encoded. Each (colour,length) pair takes 4 characters, so the length of this string varies with the screen contents.
In either case, the screen string is terminated by a colon.
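A decoding sketch for run-length encoding mode (\verb+decode_rle_screen+ is a hypothetical name). Because runs may wrap across rows, it expands all pairs into one flat pixel list before cutting that list into rows, and it checks that the total matches \texttt{www}$\times$\texttt{hhh}.
\begin{verbatim}
from typing import List


def decode_rle_screen(screen_string: str, width: int,
                      height: int) -> List[List[int]]:
    """Decode a run-length-encoded screen string into rows of colours.

    Each 4-character group is a (colour, length) pair of 2-digit hex numbers.
    """
    if len(screen_string) % 4 != 0:
        raise ValueError("RLE string length must be a multiple of 4")
    pixels: List[int] = []
    for i in range(0, len(screen_string), 4):
        colour = int(screen_string[i:i + 2], 16)
        length = int(screen_string[i + 2:i + 4], 16)
        pixels.extend([colour] * length)  # runs may cross row boundaries
    if len(pixels) != width * height:
        raise ValueError("decoded %d pixels, expected %d"
                         % (len(pixels), width * height))
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    # Toy 2x3 screen: three pixels of colour 0x0a, then three of 0x0b.
    print(decode_rle_screen("0a030b03", 2, 3))
    # [[10, 10], [10, 11], [11, 11]]
\end{verbatim}
In this toy example the first run wraps from the first row into the second, which is why the sketch decodes into a flat list first rather than row by row.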
\subsubsection{\texttt{episode\_string}}
The episode string contains two comma-separated integers: the first indicates episode termination (1 if the episode has ended, 0 otherwise) and the second is the most recent reward. It is also colon-terminated.
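A parsing sketch (\verb+parse_episode+ is hypothetical). Python's \verb+int+ also accepts a leading minus sign, which matters if a game reports negative rewards in that form; that format is an assumption here.
\begin{verbatim}
from typing import Tuple


def parse_episode(episode_string: str) -> Tuple[bool, int]:
    """Parse the 'terminated,reward' field into (terminated, reward)."""
    terminated_str, reward_str = episode_string.split(",")
    return terminated_str == "1", int(reward_str)


if __name__ == "__main__":
    print(parse_episode("0,10"))  # (False, 10)
\end{verbatim}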
\subsubsection{Example}
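Assuming that the agent requested screen, RAM and episode-related information, each string sent by ALE consists of the 256 hexadecimal digits of the RAM followed by `:', then the screen string followed by `:', then an episode string such as \verb+0,0+ (episode not terminated, reward 0) followed by `:', and finally a newline.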
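To test an agent without a running emulator, a message of this form can be assembled from toy data. The Python sketch below (\verb+build_message+ is a hypothetical helper) assumes full screen mode and writes lowercase hexadecimal digits. Whether ALE itself uses lower- or uppercase is not specified here, so a robust parser should accept both.
\begin{verbatim}
from typing import List


def build_message(ram: bytes, screen: List[List[int]],
                  terminated: bool, reward: int) -> str:
    """Assemble a full-mode '<RAM>:<screen>:<episode>:' line from toy data."""
    ram_string = ram.hex()  # two lowercase hex digits per byte
    screen_string = "".join("%02x" % pixel
                            for row in screen for pixel in row)
    episode_string = "%d,%d" % (int(terminated), reward)
    return ram_string + ":" + screen_string + ":" + episode_string + ":\n"


if __name__ == "__main__":
    msg = build_message(bytes(128), [[1, 2], [3, 255]], False, 0)
    print(repr(msg[256:]))  # ':010203ff:0,0:\n'
\end{verbatim}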
\subsection{Main Loop -- Agent}
After receiving a string from ALE, the agent sends the actions of player A and player B as a pair of comma-separated integers on a single line, e.g.:
\begin{verbatim}
2,18\n
\end{verbatim}
where the first integer is player A's action (here, \textsc{up}) and the second is player B's action (here, \textsc{noop}). Emulator control (reset, save/load state) is also handled by sending a special action value as player A's action. See Section \ref{sec:available_actions} for the list of available actions.
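A minimal agent-side loop might look like the Python sketch below. It assumes the handshake has already taken place, that \verb+from_ale+ and \verb+to_ale+ are text streams opened on the two FIFOs (their file names are not specified here), and that player B always sends action 18, as in the example above. The name \verb+run_agent+ and the \verb+choose_action+ callback are hypothetical.
\begin{verbatim}
import io
from typing import Callable, TextIO


def run_agent(from_ale: TextIO, to_ale: TextIO,
              choose_action: Callable[[str], int],
              player_b_action: int = 18) -> int:
    """Read ALE messages until 'DIE', replying with 'A,B' to each.

    Returns the number of replies sent.
    """
    steps = 0
    for line in from_ale:
        if line.rstrip("\n") == "DIE":
            break  # ALE has hit a termination condition
        action_a = choose_action(line)
        to_ale.write("%d,%d\n" % (action_a, player_b_action))
        to_ale.flush()  # ALE is waiting for this line
        steps += 1
    return steps


if __name__ == "__main__":
    # Toy messages as if only episode information had been requested.
    ale_out = io.StringIO("0,0:\n0,5:\nDIE\n")
    agent_out = io.StringIO()
    n = run_agent(ale_out, agent_out, lambda msg: 0)
    print(n, repr(agent_out.getvalue()))  # 2 '0,18\n0,18\n'
\end{verbatim}
Flushing after each reply matters with real FIFOs: if the reply stayed in a write buffer, ALE would keep waiting for it while the agent waits for ALE's next message, typically stalling both processes.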