\documentclass[12pt,reqno]{article}
\usepackage[usenames]{color}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{amsfonts}
\usepackage{amscd}
\usepackage{graphicx}
\usepackage{algorithm2e}
\usepackage[colorlinks=true,
linkcolor=webgreen,
filecolor=webbrown,
citecolor=webgreen]{hyperref}
\definecolor{webgreen}{rgb}{0,.5,0}
\definecolor{webbrown}{rgb}{.6,0,0}
\usepackage{color}
\usepackage{fullpage}
\usepackage{float}
\usepackage{psfig}
\usepackage{graphics}
\usepackage{latexsym}
\usepackage{epsf}
\usepackage{breakurl}
\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{.1in}
\setlength{\evensidemargin}{.1in}
\setlength{\topmargin}{-.1in}
\setlength{\textheight}{8.4in}
\newcommand{\seqnum}[1]{\href{https://oeis.org/#1}{\underline{#1}}}
\SetKwComment{tcc}{\{}{\}}
\DeclareMathOperator{\lshift}{LEFT-SHIFT}
\DeclareMathOperator{\LS}{LS}
\DeclareMathOperator{\DLS}{DLS}
\DeclareMathOperator{\Rows}{Rows}
\DeclareMathOperator{\Columns}{Columns}
\DeclareMathOperator{\MD}{MD}
\DeclareMathOperator{\AD}{AD}
\DeclareMathOperator{\CR}{CR}
\DeclareMathOperator{\LA}{LA}
\DeclareMathOperator{\SquaresCnt}{SquaresCnt}
\DeclareMathOperator{\Condition}{Condition}
\DeclareMathOperator{\AllN}{AllN}
\begin{document}
\begin{center}
\epsfxsize=4in
\leavevmode\epsffile{logo129.eps}
\end{center}
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{conjecture}[theorem]{Conjecture}
\theoremstyle{remark}
\newtheorem{remark}[theorem]{Remark}
\begin{center}
\vskip 1cm{\LARGE\bf
Enumerating Diagonal Latin Squares \\
\vskip .1in
of Order Up to 9
}
\vskip 1cm
\large
Stepan Kochemazov and Oleg Zaikin\\
ISDCT SB RAS\\
Lermontov street 134\\
664033 Irkutsk\\
Russia\\
\href{mailto:veinamond@gmail.com}{\tt veinamond@gmail.com}\\
\href{mailto:zaikin.icc@gmail.com}{\tt zaikin.icc@gmail.com}\\
\ \\
Eduard Vatutin\\
Southwest State University\\
50 let Oktyabrya 94\\
305040 Kursk\\
Russia\\
\href{mailto:evatutin@rambler.ru}{\tt evatutin@rambler.ru}\\
\ \\
Alexey Belyshev\\
BOINC.ru\\
Nakhimov Avenue 36 Building 1\\
117218 Moscow\\
Russia\\
\href{mailto:alexey-bell@yandex.ru}{\tt alexey-bell@yandex.ru}\\
\end{center}
\vskip .2 in
\begin{abstract}
We propose an algorithm for enumerating diagonal Latin squares. It relies
on specific properties of diagonal Latin squares to employ symmetry
breaking techniques. Furthermore, the algorithm employs several heuristic
optimizations and bit arithmetic techniques. We use the algorithm to
enumerate diagonal Latin squares of order at most 9.
\end{abstract}
\section{Introduction}\label{sec:intro}
By a \textit{Latin square} of order $N$, we mean an $N \times N$ table that is filled
with elements from the set $S=\{0,\ldots,N-1\}$ such that all elements from $S$ appear in each row and column. If its main diagonal and main anti-diagonal contain every
element from $S$, then we refer to such a square as a \textit{diagonal Latin square}.
Typically, it is unrealistic to explicitly enumerate every possible Latin square of
a specific order. However, considering various intrinsic symmetries
enables the construction of algorithms that allow outstanding results to be achieved. In particular, Latin squares of order up to 11 can be enumerated \cite{McKayR95,McKayW05}. The corresponding numbers are presented in the
{\it On-Line Encyclopedia of Integer Sequences} (OEIS) \cite{Sloane94,web:OEIS} as sequence \seqnum{A002860}.
However, to the best of our knowledge, diagonal Latin squares of small order were not studied until the end of 2016. Hence, in this study, we develop an approach for the enumeration of diagonal Latin squares of small order.
From the enumeration perspective, diagonal Latin squares differ significantly from ordinary Latin squares. The uniqueness constraints on diagonals restrict most transformations used to form equivalent Latin squares (primarily arbitrary row and column permutations), resulting in much smaller isotopy classes.
In this study, we first design a fast algorithm for the explicit generation of diagonal Latin squares, augment it with symmetry breaking techniques that utilize equivalence classes of diagonal Latin squares, and then apply it to enumerate diagonal Latin squares of order up to 9.
The two main contributions of the present study are as follows: The first one is the optimized brute-force algorithm for the enumeration of diagonal Latin squares and related designs, such as Latin rectangles. Essentially, the algorithm represents a Latin square as an integer array and uses $\leq N^2$ nested loops to traverse all possible variants of Latin square cell values. We improve it through several heuristic-based optimizations. In particular, the order in which the algorithm fills the cells affects its performance significantly. Additionally, the necessary checks and assignments, the organization of each loop, etc.\ are important. Bit arithmetic techniques enhance the performance of these operations substantially. The resulting version of the algorithm enables an enumeration of up to 7 million of diagonal Latin squares of order 9 per second on one central processing unit (CPU) core. We further enhance the performance of the algorithm by augmenting it with symmetry breaking techniques that are based on the class of transformations, which convert diagonal Latin squares into diagonal Latin squares. These techniques allow the size of the search space to be reduced by several orders of magnitude. Subsequently, we use the constructed algorithm for the second contribution, i.e., to enumerate diagonal Latin squares of orders 8 and 9 and to estimate the number of diagonal Latin squares of order 10.
A brief outline of the paper is as follows. In the next section, we discuss possible methods to generate Latin squares. Subsequently, we describe the basic structure of our algorithm that we use as a basis for further optimizations. In Section~\ref{sec:bit}, we describe how the bit arithmetic techniques enable the performance of the algorithm to be enhanced significantly in practice and experimentally evaluate different algorithm versions. In Section~\ref{sec:eq}, we describe how the isotopy classes of diagonal Latin squares can be constructed and use this information to introduce the symmetry breaking techniques into the proposed algorithm. Subsequently, we present the results of our computational experiments with and without symmetry breaking. Finally, we discuss related studies and present the conclusions.
\section{Algorithm description}\label{sec:alg}
Hereinafter, without the loss of generality, generation and enumeration are assumed as the same; hence, these terms are treated as interchangeable. Within the context of enumeration, it is sensible to consider only algorithms that are deterministic and complete, i.e., those that can generate all possible representatives of the desired species that satisfy fixed constraints. As we do not intend to store generated diagonal Latin squares, the enumeration should proceed in a fixed order and randomization should not be applied at any stage. Consequently, we process the whole search space and do not enumerate some diagonal Latin square more than once.
In the next subsection, we consider several algorithmic concepts that fit the description above. As our main goal is to enumerate diagonal Latin squares of order 9, we primarily evaluate possible algorithms in the context of this problem. Unless stated otherwise, in all performance evaluations, we used one core of Intel Core i7-6770 CPU and 16 GB RAM. To implement all the algorithms proposed herein, we used the C++ programming language (Microsoft Visual Studio 2015 compiler for Windows or \texttt{gcc} for Linux).
\subsection{Approaches for generating diagonal Latin squares}\label{subsec:ls_gen}
Each row and each column of a Latin square is a permutation of $N$ elements. This implies that for a small $N$, one can generate all possible permutations and construct Latin squares by combining them. A square can be filled by row while verifying that different rows do not have equal elements in the same positions. However, in this case, once several rows are filled, the number of available variants for the remaining rows decreases significantly. For example, if we consider Latin squares of order 10, we have $10!=3628800$ possible permutations. We can put each of them in the first row; subsequently, for the second row, we loop through the list and test if a permutation number $i$ does not violate the Latin square constraints. For rows after the 5th, the number of such permutations (that we can put as the next row) is in the range of hundreds at the most. Thus, if we cycle through all available permutations to put into, say, the 8th row, even if we can test if they fit very quickly, the process is still ineffective.
In this context, it is sensible to represent the original problem as an \textit{exact cover} instance and employ relatively sophisticated algorithms, such as DLX \cite{Knuth00}, which can restrict the search space ``on-the-fly''. If we are interested only in diagonal Latin squares, then two more uniqueness constraints must be considered. In our preliminary evaluations, we established that bit arithmetic-aided exhaustive search and DLX enable approximately $5 \cdot 10^5$ diagonal Latin squares of order 9 per second to be enumerated on one CPU core.
In the present study, we follow another simple approach for generating Latin squares. Within it, we represent a Latin square of order $N$ as an array of $N^2$ integer values corresponding to its cells and fill their values in a fixed order. In the most basic variant, we implement this enumeration procedure in the form of $N^2$ nested loops.
Initially, it appears that this approach is crude and will not be comparable to those mentioned above. Indeed, if we fill square elements from left to right from top to bottom, then the generation speed is extremely slow: approximately $6 \cdot 10^3$ diagonal Latin squares of order 9 per second on one CPU core. However, after several optimizations, as will be described below, this approach significantly outperforms the competitors.
\subsection{Algorithm design}\label{subsec:alg_design}
Assume that the enumeration of diagonal Latin squares of order $N$ is considered.
Hence, our algorithm uses several auxiliary constructs:
\begin{enumerate}
\item Integer array $\LS[i,j]$, $i,j\in\{0,\ldots,N-1\}$ that contains a Latin square;
\item Integer arrays $\Rows[i,j]$ and $\Columns[i,j]$, $i,j\in\{0,\ldots,N-1\}$ where we reflect elements that are already ``occupied'' in each row/column;
\item Integer arrays $\MD[i]$ and $\AD[i]$, $i\in\{0,\ldots,N-1\}$ where we reflect elements that are ``occupied'' on the main diagonal and main anti-diagonal;
\item Integer value $\SquaresCnt$, in which we accumulate the number of squares.
\end{enumerate}
{
\begin{algorithm}[htbp]
\label{alg1}
\KwData{$\LS[\cdot,\cdot]$, $\Rows[\cdot,\cdot])$, $\Columns[\cdot,\cdot]$, $\MD[\cdot]$, $\AD[\cdot]$, $\SquaresCnt$}
\tcc{All variables are initialized by 0.}
\tcc{Iterate over all possible values of cell $\LS[0,0]$}
$\LS[0,0]=0$\;
\While{$\LS[0,0]0$}{
$\LS[i,j]:=\LA[i,j]\bigwedge (-\LA[i,j])$\;
\tcc{Mark the value as occupied and proceed}
$\Rows[i]:=\Rows[i]\bigvee \LS[i,j]$\;
$\Columns[j]:=\Columns[j] \bigvee \LS[i,j]$\;
$\MD:=\MD \bigvee \LS[i,j]$\;
$\AD:=\AD \bigvee \LS[i,j]$\;
\textbf{BODY OF INNER LOOP FOR NEXT CELL $\LS[i',j']$}\;
\tcc{Mark value as ``free''}
$\Rows[i]:=\Rows[i]\bigoplus \LS[i,j]$\;
$\Columns[j]:=\Columns[j] \bigoplus \LS[i,j]$\;
$\MD:=\MD \bigoplus \LS[i,j]$\;
$\AD:=\AD \bigoplus \LS[i,j]$\;
$\LA[i,j]:=\LA[i,j]\bigwedge(\LA[i,j]-1)$\;
}
\caption{Optimized inner loop structure with the bit arithmetic}
\end{algorithm}
The first significant achievement in the improved inner loop design is that one of the two conditional operators is eliminated. Therefore, we first compute the value of $\CR[i,j]$. The result is a bit vector with $1$-bits in positions corresponding to values of $\LS[i,j]$ that violate any uniqueness constraints. Then, we use an additional auxiliary integer array $\LA[i,j]$. In the \textbf{for} cycle, we initialize $\LA[i,j]$ with possible values of $\LS[i,j]$ that do not violate any constraint and iterate over them by switching off the rightmost $1$ bit until $\LA[i,j]$ becomes $0$. For each value of $\LA[i,j]$, we generate the value of $\LS[i,j]$ by isolating the rightmost $1$-bit in $\LA[i,j]$. Once $\LA[i,j]$ becomes $0$, it means that we have processed all available alternatives. This improved algorithm version enables approximately $6 \cdot 10^6$ diagonal Latin squares of order 9 per second to be generated without heuristic optimizations.
In Table \ref{tab:versions_performance}, we compare three versions of the algorithm. All of them use Algorithm \ref{alg1} as a basis. The \textit{standard}, \textit{bit arithmetic}, and \textit{optimized bit arithmetic} versions employ the inner loop structures from Algorithms \ref{alg2}, \ref{alg3}, and \ref{alg4}, respectively. We compare these versions by the generation speed for different classes of Latin squares. Here, for diagonal Latin squares, we set the first row (in an ascending order) and for ordinary Latin squares, we set both the first row and first column (in an ascending order).
Table entry \textbf{(D)LS} followed by a number represents (diagonal) Latin squares of specific order. The order of cells in each case was determined according to the heuristic procedure outlined in Subsection \ref{subsec:cells_order}. For DLS9, we measured the performance of the optimized bit arithmetic version that uses the lookahead heuristic. For other entries, we did not implement it as the corresponding optimization requires considerable empirical evaluation and testing.
\begin{table}[htbp]
\centering
\begin{tabular}{|c|c|c|}
\hline
\textbf{Version}&\textbf{Problem}&\textbf{Squares per seconds}\\
\hline
Standard &DLS9&$1.8 \cdot 10^6$\\
\hline
Bit arithmetic & DLS9 & $2.6 \cdot 10^6$ \\
\hline
&DLS9&$6.8 \cdot 10^6$\\
\cline{2-3}
&LS8& $9 \cdot 10^6$\\
\cline{2-3}
Optimized bit arithmetic &DLS8 & $5.8 \cdot 10^6$\\
\cline{2-3}
&LS9 & $8.0 \cdot 10^6$\\
\cline{2-3}
&LS10 & $6.3 \cdot 10^6$\\
\cline{2-3}
&DLS10 & $ 6.0 \cdot 10^6$\\
\hline
\end{tabular}
\caption{Performance of the proposed versions of the algorithm for generation of Latin squares of small order.}
\label{tab:versions_performance}
\end{table}
It is clear that bit arithmetic techniques increase the algorithm performance significantly.
\section {Equivalence classes of diagonal Latin squares} \label{sec:eq}
Latin squares form large equivalence classes constructed by all
possible transpositions of rows/columns/element names.
These transformations are critical in the enumeration of Latin squares of small order
\cite{McKayR95,McKayW05}. However, for diagonal Latin squares, the
vast majority of such transformations result in the falsification of the constraint
on the uniqueness of diagonal elements. Meanwhile,
a class of symmetric row-column transpositions that transform diagonal Latin
squares into diagonal Latin squares can be determined.
The roots of the transformations presented below belong to the area of magic squares. Hence, we refer to them as \textit{M-transformations}. Any M-transformation is
a combination of \textit{basic} M-transformations.
Below, we describe the three types of the latter. Nonetheless,
it would be more appropriate to refer only to the
second and third as M-transformations; however, for simplicity,
we shall group them together with the third type.
\begin{itemize}
\item Mirroring a Latin square horizontally or vertically relative to the main diagonal or main anti-diagonal and then rotating it anticlockwise by 90-degrees multiple times, to obtain eight transformations.
\item Permutations of at least two columns positioned symmetrically with respect to the middle with simultaneous permutation of two symmetrically positioned rows. An example of such a transformation is the permutation of the $0$-th and $(n-1)$-th columns with the simultaneous permutation of the $0$-th and $(n-1)$-th rows. For diagonal Latin squares of order $n$, $2^{\lfloor \frac{n}{2} \rfloor}$ of such transformations exist.
\item Transpositions where we simultaneously
transpose $\geq 2$ columns in the left half of a diagonal Latin square and
the columns positioned symmetrically with respect to the middle in the right half
of a square with simultaneous similar transposition of rows. An example of such a transformation is the transposition of the $0$-th and $1$-th columns, $(n-2)$-th and $(n-1)$-th columns, $0$-th and $1$-th rows, and $(n-2)$-th and $(n-1)$-th rows. For diagonal Latin squares of order $n$, $\lfloor \frac{n}{2}\rfloor !$ transformations of this type exist.
\end{itemize}
Each time we apply any basic M-transformation to a diagonal Latin square, the diagonal elements remain in the corresponding diagonals or the diagonals switch places. Thus, any M-transformation transforms a diagonal Latin square into a diagonal Latin square.
Let us consider a diagonal Latin square, in which the elements on the main diagonal are in an ascending order (in contrast to Subsection~\ref{subsec:cells_order}, where the first row is ordered). Then, an equivalence class for such a diagonal Latin square of order $N$
has a size of $4 \cdot \lfloor \frac{N}{2}\rfloor! \cdot 2^{\lfloor
\frac{N}{2}\rfloor}$ at the most (here, the factor $4$ instead of $8$ is owing to some transformations cancelling each other).
An interesting fact is that M-transformations can be applied to partially
filled diagonal Latin squares. However, only such classes of
incomplete diagonal Latin squares should be considered, which are closed with respect to all or almost all basic M-transformations.
{\arraycolsep=2pt\def\arraystretch{1.2}
\begin{figure}[htbp]
\footnotesize
\begin{center}
$$
\begin{array}{ccc}
\left[
\begin{array}{ccccccccc}
0&\_&\_&\_&\_&\_&\_&\_& 1\\
\_& 1&\_&\_&\_&\_&\_& 0&\_\\
\_&\_& 2&\_&\_&\_& 3&\_&\_\\
\_&\_&\_& 3&\_& 7&\_&\_&\_\\
\_&\_&\_&\_& 4&\_&\_&\_&\_\\
\_&\_&\_& 6&\_& 5&\_&\_&\_\\
\_&\_& 8&\_&\_&\_& 6&\_&\_\\
\_& 5&\_&\_&\_&\_&\_& 7&\_\\
2&\_&\_&\_&\_&\_&\_&\_& 8\\
\end{array}
\right]
&
\left[
\begin{array}{ccccccccc}
0&\_&\_&\_&\_&\_&\_&\_& 1\\
\_& 1&\_&\_&\_&\_&\_& 0&\_\\
\_&\_& 2&\_&\_&\_& 3&\_&\_\\
\_&\_&\_& 3&\_& 7&\_&\_&\_\\
1& 0& 3& 2& 4& 6& 5& 8& 7\\
\_&\_&\_& 6&\_& 5&\_&\_&\_\\
\_&\_& 8&\_&\_&\_& 6&\_&\_\\
\_& 5&\_&\_&\_&\_&\_& 7&\_\\
2&\_&\_&\_&\_&\_&\_&\_& 8\\
\end{array}
\right]
&
\left[
\begin{array}{ccccccccc}
0&\_&\_&\_& 2&\_&\_&\_& 1\\
\_& 1&\_&\_& 3&\_&\_& 0&\_\\
\_&\_& 2&\_& 0&\_& 3&\_&\_\\
\_&\_&\_& 3& 1& 7&\_&\_&\_\\
1& 0& 3& 2& 4& 6& 5& 8& 7\\
\_&\_&\_& 6& 7& 5&\_&\_&\_\\
\_&\_& 8&\_& 5&\_& 6&\_&\_\\
\_& 5&\_&\_& 8&\_&\_& 7&\_\\
2&\_&\_&\_& 6&\_&\_&\_& 8\\
\end{array}
\right]
\end{array}
$$
\end{center}
\caption{Example of partial designs for $N=9$. From left to right:
\textit{cross}, \textit{asterisk}, \textit{doublecross}.}
\label{partial}
\end{figure}
}
Next, we introduce three auxiliary structures that will be discussed below. We will
refer to an incomplete diagonal Latin square in which only the entries for the main diagonal and main anti-diagonal are known as a \textit{cross design}. For an odd $N$, if the values in the middle row are known, then the corresponding design will
be referred to as an \textit{asterisk design}. Additionally, for an odd $N$ where the values of elements from the main diagonal, main anti-diagonal, middle row, and middle column are known, then we denote such structure as a \textit{doublecross} design. In general, we will refer
to them as \textit{partial designs}. Figure \ref{partial} shows the examples of
the introduced designs for $N=9$.
It is clear that the classes of cross and doublecross designs are
closed with respect to arbitrary M-transformations, while the class of
asterisk designs is closed only with respect to M-transformations that do not mirror
partial diagonal Latin square relative to its main diagonal or main anti-diagonal.
This fact allows us to split the space of all possible
partial designs of order $N$ into equivalence classes.
We now consider the following proposition.
\begin{proposition}
\label{prop1} Assume that $C_1$ and $C_2$ are two examples of
one of the introduced designs (cross, asterisk, or doublecross) of order $N$
and that $C_2$ is the result of applying some M-transformation
$\mu$ to $C_1$, i.e., $C_2=\mu(C_1)$. Then, the number of diagonal Latin squares of order $N$
that share $C_1$ is equal to the number of diagonal Latin squares of order $N$ that
share $C_2=\mu(C_1)$.
\end{proposition}
\begin{proof}
When we apply $\mu$ to $C_1$, we implicitly apply it to every single diagonal Latin square
that shares cell values with $C_1$. As $\mu(C_1)=C_2$, it implies that any
diagonal Latin square that shares cell values with $C_1$ will be transformed to a diagonal Latin square that shares
cell values with $C_2$.
By definition, for any M-transformation $\mu$, an inverse transformation
$\mu^{-1}$ exists. Thus, for an arbitrary diagonal Latin square $A$, the following holds:
$\mu(\mu^{-1}(A))=\mu^{-1}(\mu(A))=A$.
Hence, it is impossible for two distinct Latin squares $A$ and $B$
to share $C_1$ and $\mu(A)=\mu(B)$.
Subsequently, it follows that the sets of diagonal Latin squares that share
$C_1$ and $C_2$ have the same cardinality.
\end{proof}
In the next subsection, we show the method to embed this technique into the \textit{optimized bit arithmetic} version of the enumeration algorithm (see Section~\ref{sec:bit}).
\subsection{Embedding symmetry breaking into the algorithm}\label{subsec:4.1}
Proposition \ref{prop1} enables the enumeration of diagonal Latin squares of order $N$ to be performed as follows:
\begin{enumerate}
\item Generate all possible cross designs of order $N$.
\item Split cross designs into equivalence classes $C_1,\ldots,C_k$.
\item For each class $C_i$, select one cross design $p\in C_i$ and compute the
number of diagonal Latin squares of order $N$ that share $p$. Assume that it
is equal to $N_i$.
\item The number of diagonal Latin squares of order $N$ is then equal to $N!\sum_{i=1}^k N_i|C_i|$.
\end{enumerate}
Next, we focus on $N=8$ and $N=9$. To estimate the potential performance gain attainable by considering equivalence classes, we first
generated all possible cross designs and split them into equivalence classes.
Consequently, for both $N=8$
and $N=9$, there were $4752$ cross designs that formed $20$ equivalence classes.
Thus, the enumeration algorithm runtime can be reduced up to $237$ times.
Indeed, after embedding this information directly into the \textit{optimized bit arithmetic} version of the algorithm (see Section~\ref{sec:bit}), the enumeration of diagonal Latin squares of order $N=8$ consumed approximately 5 s on one core of the Core i7-6770 CPU (vs. 21 min for the \textit{optimized bit arithmetic}). Hereinafter, we refer to the proposed version of the enumeration algorithm as the \textit{symmetry breaking} version.
However, partial designs can be split into equivalence classes during the runtime of the enumeration algorithm. Recall that the \textit{optimized bit arithmetic} version fills Latin square cells individually in a specific order (see Subsection~\ref{subsec:cells_order}). However, it can fill the main diagonal and main anti-diagonal first. Assume that at some moment we construct a cross design $C$. We apply all applicable M-transformations to $C$ and construct its equivalence class. Subsequently, we verify whether $C$ is the first representative of its
equivalence class when ordered in lexicographic order (for a fixed order of cross-design-filled cells). If yes, then we can fill the other diagonal Latin square cells. Otherwise, we generate the next cross design.
\section{Computational experiments}
We applied the proposed algorithm to enumerate diagonal Latin squares of order at most 9 with the first row set in an ascending order, i.e., the first row equals to $1,2,\ldots,9$. Consequently, we obtained the following sequence:
\begin{equation*}
1, 0, 0, 2, 8, 128, 171\;200, 7\;447\;587\;840, 5\;056\;994\;653\;507\;584.
\end{equation*}
We added this sequence to the OEIS, as \seqnum{A274171}. By multiplying the corresponding numbers to $N!$, we obtained the following sequence that represents the number of diagonal Latin squares of order up to 9:
\begin{equation*}
1, 0, 0, 48, 960, 92\;160, 862\;848\;000, 300\;286\;741\;708\;800, 1\;835\;082\;219\;864\;832\;081\;920.
\end{equation*}
We added this sequence to the OEIS, as \seqnum{A274806}.
The experimental details are presented below. We first applied the \textit{standard} version of the algorithm (see Section~\ref{sec:alg}) to enumerate diagonal Latin squares of order at most 8. Then, we applied the \textit{optimized bit arithmetic} version of the algorithm (see Section~\ref{sec:bit}) to enumerate diagonal Latin squares of order 9 in two large-scale computational experiments. Later, using the \textit{symmetry breaking} version, we verified the obtained results. Additionally, the \textit{optimized bit arithmetic} version was used to estimate the number of diagonal Latin squares of order 10.
\subsection{Enumeration of diagonal Latin squares of order at most 8}\label{sec:exp8}
We used the \textit{standard} version of the algorithm to enumerate diagonal Latin squares of order at most 8 with the first row fixed in the ascending order. For an order of at most 7, the calculations consumed less than 1 s. For order 8 at the time of experiment, 30 h were consumed on one CPU core to perform the calculations. The \textit{optimized bit arithmetic} version achieved this result in 21 min, while the \textit{symmetry breaking} version in approximately 5 s.
\subsection{Enumeration of diagonal Latin squares of order 9}\label{sec:exp}
We performed two separate experiments to enumerate diagonal Latin squares of order 9 using the \textit{optimized bit arithmetic} version of the algorithm. We performed the first experiment in a volunteer computing project. For the second experiment, we employed a computing cluster.
In both cases, we decomposed the problem as follows. We set the first Latin square row in an ascending order. Because we filled in the cells of a Latin square in a specific order (presented in Figure \ref{fig:dls9_cells_order}), we could select a small number of cells to be filled and process their correct assignments separately. In our experiments, we varied values of the first 10 cells (according to the aforementioned order). There were 1\;225\;884 correct assignments of these cells, for which we constructed a subproblem for each of these assignments. In each subproblem, all diagonal Latin squares of order 9 must be enumerated with constant values of 19 among 81 cells. As the proposed subproblems are disjoint, we could solve them independently. Consequently, we obtained an array of 1\;225\;884 integer numbers. Their total sum was equal to the number of diagonal Latin squares of order 9 with the first row equal to $1,2,\ldots,9$.
\subsubsection{Experiment in a volunteer computing project}\label{subsec:volunteer}
Volunteer computing is a relatively cheap and natural method for solving computationally hard problems that can be decomposed into independent subproblems. It is based on using computers of the so-called volunteers--- private persons willing to donate their computational resources for certain causes. We used the volunteer computing project Gerasim@home \cite{VatutinZKV17} to enumerate Latin squares of order 9. Gerasim@home is based on the Berkeley Open Infrastructure for Network Computing \cite{AndersonF06}.
The computational experiment was aimed at the enumeration of diagonal Latin squares of order 9 that started on June 18, 2016. The server of Gerasim@home
created and maintained 1\;225\;884 subproblems; all of them were solved by volunteers' computers. The experiment lasted 3 months and ended on September 17, 2016. In total, approximately 1000 computers of 500 volunteers from 51 countries were used in the experiment. The peak performance was 5 teraflops, and the average performance was approximately 3 teraflops.
Consequently, we determined that the number of diagonal Latin squares of order 9 with the first row set in an ascending order is 5\;056\;994\;653\;507\;584. If we multiply it to $9!$, we obtain the number of diagonal Latin squares of order 9: 1\;835\;082\;219\;864\;832\;081\;920.
It should be noted, that if we did not improve the algorithm performance by means, described in Sections \ref{sec:alg} and \ref{sec:bit}, the corresponding experiment in Gerasim@home would take about 10 years.
\subsubsection{Experiment on a computing cluster}\label{subsec:cluster}
In application to hard enumeration problems, it is crucial to cross-check the results as small errors may remain undetected in specific circumstances, thus jeopardizing the validity. One might say that this is especially true when using volunteer computing; however, our empirical results prove otherwise.
In any case, we decided to perform verifications by ourselves and conducted an additional experiment aimed at solving the same problem. To perform it, we used the computing cluster ``Academician V.M. Matrosov'' of the Irkutsk supercomputing center of Siberian Branch of Russian Academy of Sciences \cite{web:cluster}. Each node of this cluster contains two 16-core AMD Opteron 6276 CPUs and 64 gigabytes of RAM. The approach as that in the Gerasim@home experiment was used for decomposition.
We developed the MPI-program (here, MPI represents message passing interface) based on the \textit{optimized bit arithmetic} version of the algorithm. In this program, one process is a control process, and all the remaining processes are computing processes. The control process creates and maintains the pool of subproblems to be processed by computing processes. Furthermore, it accumulates and processes results obtained by computing processes.
The experiment started on July 17, 2016 and required several executions of the MPI-program to complete it. In these executions, the number of employed cluster nodes varied from 10 to 15, while the runtime varied from 2 h to 7 days. The majority of executions involved using 15 nodes with a runtime of 7 days. The experiment ended on October 17, 2016. Finally, the experiment lasted 2 months. Consequently, we verified that the number computed in Gerasim@home was correct.
\subsubsection{Experiment with symmetry breaking}\label{subsec:exp_sym9}
We developed and implemented the \textit{symmetry breaking} version of the algorithm long after the two experiments described above have already ended.
In the experiment for diagonal Latin squares of order
9, we used the sequence of partial designs, each an extension of the previous one.
We pre-filled the main diagonal of a Latin square in an ascending order.
Then, we filled the cross design cells and verified whether this cross design was
the first representative of its equivalence class. If \textit{yes}, then we
generated asterisk designs that were representatives of their equivalence classes.
To each asterisk design class representative, we applied the similar scheme:
we generated doublecross designs and for each doublecross equivalence class
representative, we computed the number of diagonal Latin squares of order 9 that share it.
The motivation here is simple: the size of equivalence class for the cross design varies from
12 to 768; for the asterisk design, it is typically either 768 or 384; for doublecross, it
is 1536 in most cases. Therefore, it is the most beneficial to use doublecross designs.
However, without the auxiliary steps including cross and
asterisk designs, we must generate and reject many doublecross instances
that are not the class representatives.
The experiment lasted 79 h on a computer equipped with Intel Core i7-6770 (with eight threads). This means that for $N=9$, the \textit{symmetry breaking} version of the algorithm is approximately 1000 times faster than the \textit{optimized bit arithmetic} version.
\subsection{Estimation of the number of diagonal Latin squares of order 10}\label{subsec:exp_DLS10}
After enumerating diagonal Latin squares of order 9, we decided to apply the \textit{optimized bit arithmetic} version of the algorithm for the enumeration of diagonal Latin squares of order 10. However, it became clear quickly that the number was extremely large. To estimate it, we employed the Monte Carlo method \cite{MetropolisU49} in the following form. If we specify some order in which we fill the cells of a diagonal Latin square, we can consider an incomplete diagonal Latin square formed by the first $k$ cells according to the specified order. It is natural to consider the trivial order: the first $k$ elements of a Latin square from left to right, from top to bottom. Let us refer to such incomplete diagonal Latin squares of order 10 as $\DLS_{10}^k$. First, for a specific $k$, we compute the number of possible $\DLS_{10}^k$, to which we refer as $N_{10}^k$. Then, we form a random sample of $\DLS_{10}^k$. For each incomplete diagonal Latin square from the sample, we enumerate all possible diagonal Latin squares of order 10 that can be constructed by filling the unassigned cells of $\DLS_{10}^k$. As a result of processing the random sample, we estimate an expected value of the number of diagonal Latin squares of order 10 that shares the same $\DLS_{10}^k$. By multiplying this estimated expected value to $N_{10}^k$, we then estimated the number of diagonal Latin squares of order 10.
First, we must select $k$ such that the number of $\DLS_{10}^k$ can be computed in reasonable time and that the sample of $\DLS_{10}^k$ can be processed to a sufficient size. We set the elements of the first row of a Latin square in an ascending order for simplicity. We started from $k=30$. The corresponding $N_{10}^{30}$ is 284 086 571 712. However, for each $\DLS_{10}^{30}$, several days are required on one core of the state-of-the-art CPU to enumerate all possible diagonal Latin squares that share some $\DLS_{10}^{30}$. Thus, we selected $k=32$ and estimated this value. The number of corresponding incomplete diagonal Latin squares $N_{10}^{32}$ was 12\;611\;543\;636\;160. We generated a random sample of 10\;000 $\DLS_{10}^{32}$ instances and used it to estimate the expected value of the number of diagonal Latin squares of order 10 with $\DLS_{10}^{32}$. The corresponding expected value was equal to 11\;931\;268\;344. Thus, the estimated number of diagonal Latin squares of order 10 was approximately $1.5 \cdot 10^{23}$.
\section{Related studies}\label{sec:related}
In previous studies \cite{HulpkeKO11,McKayR95,McKayW05}, approaches that resulted in the enumeration of Latin squares of orders 10 and 11 were described. The authors of the present paper are not aware of algorithms that are developed specifically for the enumeration of diagonal Latin squares.
McKay et al.\ \cite{McKayMW06} enumerated the transversals for Latin squares of order at most 9. They considered the fact that the space of Latin squares could be divided into isotopy classes. Transversals were enumerated for each representative, thus enabling the number of transversals to be calculated for each isotopy class. In other studies \cite{McKayMW06,Potapov16,Taranenko15}, the lower and upper bounds on the number of transversals in Latin squares were investigated.
Among other recent and relevant enumeration results, we would like to briefly mention the following. McKay et al.\ \cite{McKayMM07} determined that Latin squares of order 10 from several particular families cannot participate in a triple of mutually orthogonal Latin squares (MOLS) of order 10. Egan and Wanless \cite{EganW16} enumerated MOLS of order up to 9; additionally, they found the triple of Latin squares of order 10 that is the closest to being a triple of MOLS discovered hitherto. Zaikin et al.\ \cite{ZaikinZKV16} discovered a triple of diagonal Latin squares of order 10 that is the closest to being a triple of mutually orthogonal diagonal Latin squares discovered hitherto. Potapov \cite{Potapov18} estimated the number of sets of orthogonal Latin squares and that of Latin hypercubes.
Several studies have been performed where authors applied parallel computing to search for combinatorial designs based on Latin squares. Lam et al.\ \cite{LamTS89} used a computing cluster to prove the nonexistence of finite projective planes of order 10. McGuirre et al.\ \cite{McGuireTC14} proved a hypothesis regarding the minimum number of clues in Sudoku using a computing cluster. They developed a fast algorithm to enumerate and verify all possible Sudoku variants.
\section{Conclusions}\label{sec:concl}
We herein presented a fast algorithm for the enumeration of diagonal Latin squares of small order. One of its main features was symmetry breaking techniques that enabled the search space to be reduced significantly. Using the proposed algorithm, we enumerated diagonal Latin squares of order up to 9. For future studies, we plan to investigate if the proposed algorithm can be used to solve other problems in related areas.
The present study is a significantly reworked and extended variant of the paper \cite{VatutinKZ17}. The modifications included (but were not limited to) new sections on symmetry breaking and applications of the corresponding techniques (Section \ref{sec:eq}), owing to which the main result of \cite{VatutinKZ17} was substantially improved.
\section{Acknowledgments}\label{sec:ack}
We would like to thank the editor and the anonymous referee for their comments and suggestions. We are also grateful to Professor I. M. Wanless for fruitful discussions. We thank all Gerasim@home volunteers who used their computers for participating in the experiment.
Stepan Kochemazov and Oleg Zaikin are partially supported by the Russian Science Foundation (project No.~16-11-10046). Eduard Vatutin is partially supported by the Russian Foundation for Basic Research (grant No.~17-07-00317-a).
Authors' contributions:
\begin{itemize}
\item Stepan Kochemazov, bit arithmetic implementation from Section~\ref{sec:bit}, symmetry breaking technique from Section~\ref{sec:eq}, experiments from Subsection~\ref{subsec:exp_sym9} and Subsection~\ref{subsec:exp_DLS10};
\item Oleg Zaikin, experiment from Subsection~\ref{subsec:cluster};
\item Eduard Vatutin, general scheme of the algorithm proposed in Section~\ref{sec:alg}, experiments from Subsection~\ref{sec:exp8} and Subsection~\ref{subsec:volunteer};
\item Alexey Belyshev, M-transformations from Section~\ref{sec:eq}.
\end{itemize}
\bibliographystyle{jis}
\bibliography{KZVB-JIS-refs}
\bigskip
\hrule
\bigskip
\noindent 2010 {\it Mathematics Subject Classification}:
Primary 05B15; Secondary 68W01.
\noindent \emph{Keywords:} Latin square, enumeration.
\bigskip
\hrule
\bigskip
\noindent (Concerned with sequences
\seqnum{A002860},
\seqnum{A274171}, and
\seqnum{A274806}.)
\bigskip
\hrule
\bigskip
\vspace*{+.1in}
\noindent
Received July 3 2019;
revised versions received October 8 2019; October 28 2019.
Published in {\it Journal of Integer Sequences}, December 28 2019.
\bigskip
\hrule
\bigskip
\noindent
Return to
\htmladdnormallink{Journal of Integer Sequences home page}{https://cs.uwaterloo.ca/journals/JIS/}.
\vskip .1in
\end{document}