\documentclass[fleqn]{article} \usepackage[round,longnamesfirst]{natbib} \usepackage{graphicx,keyval,a4wide,makeidx,color,colordvi} \usepackage{hyperref} \newcommand\R{\textsf{R}} \newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}} \newcommand{\sQuote}[1]{`{#1}'} \newcommand{\dQuote}[1]{``{#1}''} \newcommand{\file}[1]{\sQuote{\textsf{#1}}} \newcommand{\data}[1]{\texttt{#1}} \newcommand{\var}[1]{\textit{#1}} \newcommand{\class}[1]{\textsf{#1}} \newcommand{\proglang}[1]{\textsf{#1}} %% \code without `-' ligatures \def\nohyphenation{\hyphenchar\font=-1 \aftergroup\restorehyphenation} \def\restorehyphenation{\hyphenchar\font=`-} {\catcode`\-=\active% \global\def\code{\bgroup% \catcode`\-=\active \let-\codedash% \Rd@code}} \def\codedash{-\discretionary{}{}{}} \def\Rd@code#1{\texttt{\nohyphenation#1}\egroup} \newcommand{\codefun}[1]{\code{#1()}} \newcommand{\codefunind}[1]{\codefun{#1}\index{\texttt{#1}}} \newcommand{\codeind}[1]{\code{#1}\index{\texttt{#1}}} \SweaveOpts{strip.white=true} \AtBeginDocument{\setkeys{Gin}{width=0.5\textwidth}} \definecolor{Blue}{rgb}{0,0,0.8} \definecolor{Red}{rgb}{0.7,0,0} \date{2016-07-01} \title{Good Relations with \R} \author{David Meyer and Kurt Hornik} %% \VignetteIndexEntry{Relations} %\VignetteDepends{relations,sets} %\VignetteKeywords{set, tuple, relation, relation ensemble, consensus, Hasse Diagram} %\VignettePackage{relations} \makeindex{} \sloppy{} \begin{document} \maketitle %\begin{abstract} %\end{abstract} <>= options(width = 80) library("sets") library("relations") @ % %\section{Introduction} %\label{sec:introduction} Given $k$ sets of objects $X_1$, \ldots, $X_k$, a $k$-ary relation $R$ on $D(R) = (X_1, \ldots, X_k)$ is a subset $G(R)$ of the Cartesian product $X_1 \times \cdots \times X_k$. I.e., $D(R)$ is a $k$-tuple of sets and $G(R)$ is a set of $k$-tuples. We refer to $D(R)$ and $G(R)$ as the \emph{domain} and the \emph{graph} of the relation $R$, respectively (alternative notions are that of \emph{ground} and \emph{figure}, respectively). Relations are a very fundamental mathematical concept: well-known examples include the linear order defined on the set of integers, the equivalence relation, notions of preference relations used in economics and political sciences, etc. Package \pkg{relations} provides data structures along with common basic operations for relations and also relation ensembles (collections of relations with the same domain), as well as various algorithms for finding suitable consensus relations for given relation ensembles. \section{Relations and Relation Ensembles} \label{sec:relations+ensembles} \subsection{Relations} For a $k$-ary relation~$R$ with domain $D(R) = (X_1, \ldots, X_k)$, we refer to $s = (s_1, \ldots, s_k)$, where each $s_i$ gives the cardinality of $X_i$, as the \emph{size} of the relation. Note that often, relations are identified with their graph; strictly speaking, the relation is the \emph{pair} $(D(R), G(R))$. We say that a $k$-tuple $t$ is \emph{contained} in the relation $R$ iff it is an element of $G(R)$. The \emph{incidence} (array)~$I(R)$ of~$R$ is a~$k$-dimensional 0/1 array of size~$s$ whose elements indicate whether the corresponding $k$-tuples are contained in $R$ or not. Package \pkg{relations} implements finite relations as an S3 class which allows for a variety of representations (even though currently, typically dense array representations of the incidences are employed). Other than by the generator \codefunind{relation}, relations can be obtained by coercion via the generic function \codefunind{as.relation}, which has methods for at least logical and numeric vectors, unordered and ordered factors, arrays including matrices, and data frames. Unordered factors are coerced to equivalence relations; ordered factors and numeric vectors are coerced to order relations. Logical vectors give unary relations (predicates). A (feasible) $k$-dimensional array is taken as the incidence of a $k$-ary relation. Finally, a data frame is taken as a relation table (object by attribute representation of the relation graph). Note that missing values will be propagated in the coercion. <>= ## A relation created by specifying the graph: R <- relation(graph = data.frame(A = c(1, 1:3), B = c(2:4, 4))) ## extract domain relation_domain(R) ## extract graph relation_graph(R) ## both ("a pair of domain and graph" ...) as.tuple(R) ## extract incidence relation_incidence(R) ## (Almost) the same using the set specification ## (the domain labels are missing). R <- relation(graph = set(tuple(1,2), tuple(1,3), tuple(2,4), tuple(3,4))) ## equivalent to: ## relation(graph = list(1:2, c(1,3), c(2,4), c(3,4))) relation_incidence(R) ## Domains can be composed of arbitrary R objects: R <- relation(domain = set(c, "test"), graph = set(tuple(c, c), tuple(c, "test"))) relation_incidence(R) as.relation(1:3) relation_graph(as.relation(c(TRUE, FALSE, TRUE))) relation_graph(as.relation(factor(c("A", "B", "A")))) @ Note that while coercion uses the factor values to obtain the graph, it infers the domain from the factor names if available and unique, or from the values if unique: <<>>= relation_graph(as.relation(factor(c(X = "A", Y = "B", Z = "A")))) relation_graph(as.relation(factor(c("A", "B", "C")))) @ The \emph{characteristic function} $f_R$ (sometimes also referred to as indicator function) of a relation $R$ is the predicate (Boolean-valued) function on the Cartesian product $X_1 \times \cdots \times X_k$ such that $f_R(t)$ is true iff the $k$-tuple $t$ is in $G(R)$. Characteristic functions can both be recovered from a relation via \codefunind{relation\_charfun}, and be used in the generator for the creation. In the following, \code{R} represents ``a divides b'': <>= divides <- function(a, b) b %% a == 0 R <- relation(domain = list(1 : 10, 1 : 10), charfun = divides) R "%|%" <- relation_charfun(R) 2L %|% 6L 2:4 %|% 6L 2L %|% c(2:3, 6L) "%|%"(2L, 6L) @ Quite a few \codefun{relation\_is\_\var{foo}} predicate functions are available. For example, relations with arity 2, 3, and 4 are typically referred to as \emph{binary}, \emph{ternary}, and \emph{quaternary} relations, respectively---the corresponding functions in package \pkg{relations} are \codefunind{relation\_is\_binary}, \codefunind{relation\_is\_ternary}, etc. For binary relations~$R$, it is customary to write $x R y$ iff $(x, y)$ is contained in $R$. For predicates available on binary relations, see Table~\ref{tab:binary}. An \emph{endorelation} on $X$ (or binary relation \emph{over} $X$) is a binary relation with domain $(X, X)$. Endorelations may or may not have certain basic properties (such as transitivity, reflexivity, etc.) which can be tested in \pkg{relations} using the corresponding predicates (see Table~\ref{tab:endorelations} for an overview). Some combinations of these basic properties have special names because of their widespread use (such as linear order or weak order), and can again be tested using the functions provided (see Table~\ref{tab:combinations}). \begin{table}[p] \centering \begin{tabular}{|l|l|} \hline left-total & for all $x$ there is at least one $y$ such that $x R y$.\\ right-total & for all $y$ there is at least one $x$ such that $x R y$.\\ functional & for all $x$ there is at most one $y$ such that $x R y$.\\ surjective & the same as right-total.\\ injective & for all $y$ there is at most one $x$ such that $x R y$.\\ bijective & left-total, right-total, functional and injective.\\ \hline \end{tabular} \caption{Some properties \var{foo} of binary relations---the predicates in \pkg{relations} are \texttt{relation\_is\_\var{foo}()} (with hyphens replaced by underscores).} \label{tab:binary} \end{table} \begin{table}[p] \centering \begin{tabular}{|l|p{0.6\textwidth}|} \hline reflexive & $x R x$ for all $x$. \\ irreflexive & there is no $x$ such that $x R x$. \\ coreflexive & $x R y$ implies $x = y$. \\ symmetric & $x R y$ implies $y R x$. \\ asymmetric & $x R y$ implies that not $y R x$. \\ antisymmetric & $x R y$ and $y R x$ imply that $x = y$. \\ transitive & $x R y$ and $y R z$ imply that $x R z$. \\ complete & for all distinct $x$ and $y$, $x R y$ or $y R x$. \\ strongly complete & for all $x$ and $y$, $x R y$ or $y R x$. \\ negatively transitive & not $x R y$ and not $y R z$ imply that not $x R z$. \\ Ferrers & $x R y$ and $z R w$ imply $x R w$ or $y R z$. \\ semitransitive & $x R y$ and $y R z$ imply $x R w$ or $w R z$. \\ quasitransitive & $x R y$ and not $y R x$ and $y R z$ and not $z R y$ imply that $x R z$ and not $z R x$ (i.e., the asymmetric part of $R$ is transitive). \\ trichotomous & exactly one of $x R y$, $y R x$, or $x = y$ holds. \\ Euclidean & $x R y$ and $x R z$ imply $y R z$. \\ \hline \end{tabular} \caption{Some properties \var{bar} of endorelations---the predicates in \pkg{relations} are \texttt{relation\_is\_\var{bar}()} (with spaces replaced by underscores).} \label{tab:endorelations} \end{table} \begin{table}[p] \centering \begin{tabular}{|l|l|} \hline preorder & reflexive and transitive. \\ quasiorder & the same as preorder. \\ equivalence & a symmetric preorder. \\ weak order & complete and transitive. \\ preference & the same as weak order. \\ partial order & an antisymmetric preorder. \\ strict partial order & irreflexive, transitive and antisymmetric. \\ linear order & a complete partial order. \\ strict linear order & a complete strict partial order. \\ match & strongly complete. \\ tournament & complete and antisymmetric. \\ interval order & complete and Ferrers. \\ semiorder & a semitransitive interval order. \\ \hline \end{tabular} \caption{Some categories \var{baz} of endorelations---the predicates in \pkg{relations} are \texttt{relation\_is\_\var{baz}()} (with spaces replaced by underscores).} \label{tab:combinations} \end{table} <>= R <- as.relation(1:5) relation_is(R, "binary") relation_is(R, "transitive") relation_is(R, "partial_order") @ Relations with the same domain can naturally be ordered according to their graphs. I.e., $R_1 \le R_2$ iff $G(R_1)$ is a subset of $G(R_2)$, or equivalently, if every $k$-tuple $t$ contained in $R_1$ is also contained in $R_2$. This induces a lattice structure, with meet (greatest lower bound) and join (least upper bound) the intersection and union of the graphs, respectively, also known as the \emph{intersection} and \emph{union} of the relations. The least element moves metric on this lattice is the \emph{symmetric difference metric}, i.e., the cardinality of the symmetric difference of the graphs (the number of tuples in exactly one of the relation graphs). This \dQuote{symdiff} dissimilarity between (ensembles of) relations can be computed by \codefunind{relation\_dissimilarity}. <>= x <- matrix(0, 3L, 3L) R1 <- as.relation(row(x) >= col(x)) R2 <- as.relation(row(x) <= col(x)) R3 <- as.relation(row(x) < col(x)) relation_incidence(max(R1, R2)) relation_incidence(min(R1, R2)) R3 < R2 relation_dissimilarity(min(R1, R2), max(R1, R2)) @ The \emph{complement} (or negation) $R^c$ of a relation $R$ is the relation with domain $D(R)$ whose graph is the complement of $G(R)$, i.e., which contains exactly the tuples not contained in $R$. For binary relations $R_1$ and $R_2$ with domains $(X, Y)$ and $(Y, Z)$, the \emph{composition} $S = R_1 \ast R_2$ of $R_1$ and $R_2$ is defined by taking $x S z$ iff there is a $y$ such that $x R_1 y$ and $y R_2 z$. The \emph{transpose} (or \emph{inverse}) $R^t$ of the relation $R$ with domain $(X, Y)$ is the relation with domain $(Y, X)$ such that $x R^t y$ iff $y R x$. % Basic relation operations are available as group methods: \code{min} % and \code{max} give the meet and join, and \code{range} a % relation ensemble with these two. % Comparison operators implement the natural ordering in the relation % lattice. Where applicable, \code{!} gives the complement, \code{\&} % and \code{|} intersection and union, and \code{*} composition, % respectively. Finally, \code{t} gives the transpose. <>= relation_incidence(! R1) relation_incidence(R1 * R2) relation_incidence(t(R2)) @ There is a \codefun{plot} method for certain endorelations (currently, only complete or antisymmetric transitive relations are supported) provided that package \pkg{Rgraphviz} \citep{relations:Hansen+Gentry+Long:2017} is available, creating a Hasse diagram of the relation. The following code produces the Hasse diagram corresponding to the inclusion relation on the power set of $\{a,b,c\}$ which is a partial order (see Figure~\ref{fig:plot}). <>= ps <- 2 ^ set("a", "b", "c") inc <- set_outer(ps, "<=") if (require("Rgraphviz")) plot(relation(incidence = inc)) @ \begin{figure}[h] \begin{center} <>= <> @ \caption{Hasse Diagram of the inclusion relation on the power set of $\{a,b,c\}$.} \label{fig:plot} \end{center} \end{figure} \subsection{Relation Ensembles} \dQuote{Relation ensembles} are collections of relations $R_i = (D, G_i)$ with the same domain $D$ and possibly different graphs $G_i$. Such ensembles are implemented as suitably classed lists of relation objects (of class \class{relation\_ensemble} and inheriting from \class{tuple}), making it possible to use \codefun{lapply} for computations on the individual relations in the ensemble. Relation ensembles can be created via \codefunind{relation\_ensemble}, or by coercion via the generic function \codefunind{as.relation\_ensemble} which has methods for at least data frames (regarding each variable as a separate relation). Available methods for relation ensembles include those for subscripting, \codefun{c}, \codefun{t}, \codefun{rep}, \codefun{print}, and \codefun{plot}. In addition, there are summary methods defined (\codefun{min}, \codefun{max}, and \codefun{range}). Other operations work element-wise like on tuples due to the inheritance. The Cetacea data set \citep{relations:Vescia:1985} is a data frame with 15 variables relating to morphology, osteology, or behavior, with both self-explanatory names and levels, and a common zoological classification (variable \code{CLASS}) for 36 types of cetacea. We consider each variable an equivalence relation on the objects, excluding 2 variables with missing values, giving a relation ensemble of length 14 (number of complete variables in the data set). <<>>= data("Cetacea") ind <- vapply(Cetacea, function(s) all(!is.na(s)), TRUE) relations <- as.relation_ensemble(Cetacea[, ind]) print(relations) @ Available methods for relation ensembles allow to determine duplicated (relation) entries, to replicate and combine, and extract unique elements: <<>>= any(duplicated(relations)) thrice <- c(rep(relations, 2L), relations) all.equal(unique(thrice), relations) @ Note that \codefun{unique} does not preserve attributes, and hence names. In case one wants otherwise, one can subscript by a logical vector indicating the non-duplicated entries: <<>>= all.equal(thrice[!duplicated(thrice)], relations) @ Relation (cross-)dissimilarities can be computed for relations and ensembles thereof: <<>>= relation_dissimilarity(relations[1 : 2], relations["CLASS"]) @ To determine which single variable is \dQuote{closest} to the zoological classification: <<>>= d <- relation_dissimilarity(relations) sort(as.matrix(d)[, "CLASS"])[-1L] @ There is also an Ops group method for relation ensembles which works elementwise (in essence, as for tuples): <<>>= complement <- !relations complement @ \section{Relational Algebra} \label{sec:relalgebra} In addition to the basic operations defined on relations, the package provides functionality similar to the corresponding operations defined in relational algebra theory as introduced by \cite{relations:Codd:1970}. Note, however, that domains in database relations, unlike the concept of relations we use here, are unordered. In fact, a database relation (\dQuote{table}) is defined as a set of elements called \dQuote{tuples}, where the \dQuote{tuple} components are named, but unordered. Thus, a \dQuote{tuple} in this Codd sense is a set of mappings from the attribute names into the union of the attribute domains. The functions defined in \pkg{relations}, however, preserve and respect the column ordering. The \emph{projection} of a relation on a specified margin (i.e., a vector of domain names or indices) is the relation obtained when all tuples are restricted to this margin. As a consequence, duplicate tuples are removed. The corresponding function in package \pkg{relations} is \codefunind{relation\_projection}. <>= ## projection Person <- data.frame(Name = c("Harry", "Sally", "George", "Helena", "Peter"), Age = c(34, 28, 29, 54, 34), Weight = c(80, 64, 70, 54, 80), stringsAsFactors = FALSE) Person <- as.relation(Person) relation_table(Person) relation_table(relation_projection(Person, c("Age", "Weight"))) @ (Note that Harry and Peter have the same age and weight.) The \emph{selection} of a relation is the relation obtained by taking a subset of the relation graph, defined by some logical expression. The corresponding function in \pkg{relations} is \codefunind{relation\_selection}. <>= ## selection relation_table(R1 <- relation_selection(Person, Age < 29)) relation_table(R2 <- relation_selection(Person, Age >= 34)) relation_table(R3 <- relation_selection(Person, Age == Weight)) @ The \emph{union} of two relations simply combines the graph elements of both relations; the \emph{complement} of two relations $R$ and $S$ removes the tuples of $S$ from $R$. One can use \codeind{-} as a shortcut for \codefunind{relation\_complement}, and \codeind{\%U\%} or \codeind{|} for \codefunind{relation\_union}. The difference between \code{\%U\%} and \code{|} is that the latter only works for identical domains. <>= ## union relation_table(R1 %U% R2) ## works only for the same domains: relation_table(R2 | R3) ## complement relation_table(Person - R2) @ The \emph{intersection} (\emph{symmetric difference}) of two relations is the relation with all tuples they have (do not have) in common. One can use \codeind{\&} instead of \codefunind{relation\_intersection} in case of identical domains. <>= ## intersection relation_table(relation_intersection(R2, R3)) ## works only for the same domains: relation_table(R2 & R3) ## symmetric difference relation_table(relation_symdiff(R2, R3)) @ The \emph{Cartesian product} of two relations is obtained by basically building the Cartesian product of all graph elements, but combining the resulting pairs into single tuples. A shortcut for \codefunind{relation\_cartesian} is \codeind{\%><\%}. <>= ## cartesian product Employee <- data.frame(Name = c("Harry", "Sally", "George", "Harriet", "John"), EmpId = c(3415, 2241, 3401, 2202, 3999), DeptName = c("Finance", "Sales", "Finance", "Sales", "N.N."), stringsAsFactors = FALSE) Employee <- as.relation(Employee) relation_table(Employee) Dept <- data.frame(DeptName = c("Finance", "Sales", "Production"), Manager = c("George", "Harriet", "Charles"), stringsAsFactors = FALSE) Dept <- as.relation(Dept) relation_table(Dept) relation_table(Employee %><% Dept) @ The \emph{division} of relation $R$ by relation $S$ is the reversed Cartesian product. The result is a relation with the domain unique to $R$ and containing the maximum number of tuples which, multiplied by $S$, are contained in $R$. The \emph{remainder} of this operation is the complement of $R$ and the division of $R$ by $S$. Note that for both operations, the domain of $S$ must be contained in the domain of $R$. The shortcuts for \codefunind{relation\_division} and \codefunind{relation\_remainder} are \codeind{\%/\%} and \codeind{\%\%}, respectively. <>= ## division Completed <- data.frame(Student = c("Fred", "Fred", "Fred", "Eugene", "Eugene", "Sara", "Sara"), Task = c("Database1", "Database2", "Compiler1", "Database1", "Compiler1", "Database1", "Database2"), stringsAsFactors = FALSE) Completed <- as.relation(Completed) relation_table(Completed) DBProject <- data.frame(Task = c("Database1", "Database2"), stringsAsFactors = FALSE) DBProject <- as.relation(DBProject) relation_table(DBProject) relation_table(Completed %/% DBProject) ## division remainder relation_table(Completed %% DBProject) @ The (natural) \emph{join} of two relations is their Cartesian product, restricted to the subset where the elements of the common attributes do match. The left/right/full outer join of two relations $R$ and $S$ is the union of $R$/$S$/($R$ and $S$), and the inner join of $R$ and $S$. The implementation of \codefun{relation\_join} uses \codefun{merge}, and so the left/right/full outer joins are obtained by setting \code{all.x}/\code{all.y}/\code{all} to \code{TRUE} in \codefunind{relation\_join}. The domains to be matched are specified using \code{by}. Alternatively, one can use the operators \codeind{\%|><|\%}, \codeind{\%=><\%}, \codeind{\%><=\%}, and \codeind{\%=><=\%} for the natural join, left join, right join, and full outer join, respectively. <>= ## Natural join relation_table(Employee %|><|% Dept) ## left (outer) join relation_table(Employee %=><% Dept) ## right (outer) join relation_table(Employee %><=% Dept) ## full outer join relation_table(Employee %=><=% Dept) @ The left (right) \emph{semijoin} of two relations $R$ and $S$ is the join of these, projected to the attributes of $R$ ($S$). Thus, it yields all tuples of $R$ ($S$) participating in the join of $R$ and $S$. Shortcuts for \codefunind{relation\_semijoin} are \codeind{\%|><\%} and \codeind{\%><|\%} for left and right semijoin, respectively. <>= ## semijoin relation_table(Employee %|><% Dept) relation_table(Employee %><|% Dept) @ The left (right) \emph{antijoin} of two relations $R$ and $S$ is the complement of $R$ ($S$) and the join of both, projected to the attributes of $R$ ($S$). Thus, it yields all tuples of $R$ ($S$) \emph{not} participating in the join of $R$ and $S$. Shortcuts for \codefunind{relation\_antijoin} are \codeind{\%|>\%} and \codeind{\%<|\%} for left and right antijoin, respectively. <>= ## antijoin relation_table(Employee %|>% Dept) relation_table(Employee %<|% Dept) @ \section{Consensus Relations} \label{sec:consensus} Consensus relations \dQuote{synthesize} the information in the elements of a relation ensemble into a single relation, often by minimizing a criterion function measuring how dissimilar consensus candidates are from the (elements of) the ensemble (the so-called \dQuote{optimization approach}), typically of the form $\Phi(R) = \sum w_b d(R_b, R) ^ e$, where $d$ is a suitable dissimilarity measure, $w_b$ is the case weight given to element $R_b$ of the ensemble, and $e \ge 1$. Such consensus relations are called \dQuote{central relations} in \cite{relations:Regnier:1965}. For $e = 1$, we obtain (generalized) medians; $e = 2$ gives (generalized) means (least squares consensus relations). Consensus relations can be computed by \codefunind{relation\_consensus}, which has the following built-in methods. Apart from Condorcet's and the unrestricted Manhattan and Euclidean consensus methods, these are applicable to ensembles of endorelations only. \begin{description} \item[\code{"Borda"}] the consensus method proposed by \cite{relations:Borda:1781}. For each relation $R_b$ and object $x$, one determines the Borda/Kendall scores, i.e., the number of objects $y$ such that $y R_b x$ (\dQuote{wins} in case of orderings). These are then aggregated across relations by weighted averaging. Finally, objects are ordered according to their aggregated scores. \item[\code{"Copeland"}] the consensus method proposed by \cite{relations:Copeland:1951} is similar to the Borda method, except that the Copeland scores are the number of objects $y$ such that $y R_b x$, minus the number of objects $y$ such that $x R_b y$ (\dQuote{defeats} in case of orderings). \item[\code{"Condorcet"}] the consensus method proposed by \cite{relations:Condorcet:1785}, in fact minimizing the criterion function $\Phi$ with $d$ as symmetric difference distance over all possible relations. In the case of endorelations, consensus is obtained by weighting voting, such that $x R y$ if the weighted number of times that $x R_b y$ is no less than the weighted number of times that this is not the case. Even when aggregating linear orders, this can lead to intransitive consensus solutions (\dQuote{effet Condorcet}). \item[\code{"CS"}] the consensus method of \cite{relations:Cook+Seiford:1978} which determines a linear order minimizing the criterion function $\Phi$ with $d$ as generalized Cook-Seiford (ranking) distance and $e = 1$ via solving a linear sum assignment problem. % Using \code{"symdiff/\var{F}"} does not italicize the F ... \item[\code{"symdiff/$F$"}] an exact solver for determining the consensus relation by minimizing the criterion function $\Phi$ with $d$ as symmetric difference distance (\dQuote{symdiff}) and $e = 1$ over a suitable class (\dQuote{Family}) of endorelations as indicated by \var{F}, with values: \begin{description} \item[\code{G}] general (crisp) endorelations. \item[\code{A}] antisymmetric relations. \item[\code{C}] complete relations. \item[\code{E}] equivalence relations: reflexive, symmetric, and transitive. \item[\code{L}] linear orders: complete, reflexive, antisymmetric, and transitive. \item[\code{M}] matches: complete and reflexive. \item[\code{O}] partial orders: reflexive, antisymmetric and transitive. \item[\code{S}] symmetric relations. \item[\code{T}] tournaments: complete, irreflexive and antisymmetric (i.e., complete and asymmetric). \item[\code{W}] weak orders (complete preorders, preferences, \dQuote{orderings}): complete, reflexive and transitive. \item[\code{preorder}] preorders: reflexive and transitive. \item[\code{transitive}] transitive relations. \end{description} These consensus relations are determined by reformulating the consensus problem as an integer program (for the relation incidences), see \cite{relations:Hornik+Meyer:2007} for details. The solver employed can be specified via the control argument \code{solver}, with currently possible values \code{"glpk"}, \code{"lpsolve"}, \code{"symphony"}, or \code{"cplex"} or a unique abbreviation thereof, specifying to use the solvers from packages \pkg{Rglpk} \citep[default]{relations:Theussl+Hornik:2017}, \pkg{lpSolve} \citep{relations:Buttrey:2005}, \pkg{Rsymphony} \citep{relations:Harter+Hornik+Theussl:2017}, or \pkg{Rcplex} \citep{relations:Bravo+Theussl:2016}, respectively. Unless control option \code{sparse} is false, a sparse formulation of the binary program is used, which is typically more efficient. For fitting equivalences and weak orders (cases \code{E} and \code{W}) it is possible to specify the number of classes $k$ using the control parameter \code{k}. For fitting weak orders, one can also specify the number of elements in the classes via control parameter \code{l}. \item[\code{"CKS/$F$"}] an exact solver for determining the consensus relation by minimizing the criterion function $\Phi$ with $d$ as Cook-Kress-Seiford distance (\dQuote{CKS}) and $e = 1$ over a suitable class (\dQuote{Family}) of endorelations as indicated by \var{F}, with available families and control parameters as for methods \code{"symdiff/\var{F}"}. \item[\code{"PC/$F$"}] an exact solver for determining the consensus relation of an ensemble of crisp endorelations by minimizing the criterion function $\Phi$ with $d$ as (generalized) paired comparison (\dQuote{PC}) distance and $e = 1$ over a suitable class (\dQuote{Family}) of crisp endorelations as indicated by \var{F}, with available families and control parameters as for methods \code{"symdiff/\var{F}"}, and control option \code{delta} for specifying the paired comparison discrepancies. \item[\code{"manhattan"}] the (unrestricted) median of the ensemble, minimizing $\Phi$ with $d$ as Manhattan (symmetric difference) distance and $e = 1$ over all (possibly fuzzy) relations. \item[\code{"euclidean"}] the (unrestricted) mean of the ensemble, minimizing $\Phi$ with $d$ as Euclidean distance and $e = 2$ over all (possibly fuzzy) relations. \item[\code{"majority"}] a generalized majority method for which the consensus relation contains of all tuples occurring with a relative frequency of more than $100 p$ percent (of 100 percent if $p = 1$). The fraction $p$ can be specified via the control parameter \code{p}. By default, $p = 1/2$ is used. \end{description} For the Condorcet, CS, symdiff, CKS and PC methods, one can obtain a relation ensemble with \emph{all} consensus relations by setting the control parameter \code{all} to \code{TRUE}. In the following, we first show an example of computing a consensus equivalence (i.e., a consensus partition) of 30 felines repeating the classical analysis of \cite{relations:Marcotorchino+Michaud:1982}. The data comprises 10 morphological and 4 behavioral variables, taken here as different classifications of the same 30 animals: <>= data("Felines") relations <- as.relation_ensemble(Felines) @ Now fit an equivalence relation to this, and look at the classes: <>= E <- relation_consensus(relations, "symdiff/E") ids <- relation_class_ids(E) split(rownames(Felines), ids) @ %% Next, we demonstrate the computation of consensus preferences, using an example from \citet[pp.~48ff]{relations:Cook+Kress:1992}. The input data is a \dQuote{preference matrix} of paired comparisons where entry $(i,j)$ is one iff object~$x_i$ is preferred to object~$x_j$ ($x_i \succ x_j$). We set up the corresponding `$\prec$' relation. <>= pm <- matrix(c(0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0), nrow = 5L, byrow = TRUE, dimnames = list(letters[1:5], letters[1:5])) R <- as.relation(t(pm)) relation_incidence(R) relation_is(R, "tournament") @ Next, we seek a linear consensus order: <>= L <- relation_consensus(R, "symdiff/L") relation_incidence(L) @ or perhaps more conveniently, the class ids sorted according to increasing preference: <<>>= relation_class_ids(L) @ % Note, however, that this linear order is not unique; we can compute \emph{all} consensus linear orders, and also produce a comparing plot (see Figure~\ref{fig:plot2}): <>= L <- relation_consensus(R, "symdiff/L", control = list(all = TRUE)) print(L) if(require("Rgraphviz")) plot(L) @ % Quite annoyingly, object~$c$ comes out first and last, respectively: <<>>= lapply(L, relation_class_ids) @ Finally, we compute the closest weak order with at most 3 indifference classes: <>= W3 <- relation_consensus(R, "symdiff/W", control = list(k = 3)) relation_incidence(W3) relation_class_ids(W3) @ % (Note again that this is not unique; there are 6 consensus weak orders with $k = 3$ classes, which can be computed as above by adding \code{all = TRUE} to the \code{control} list.) \begin{figure}[h] \begin{center} <>= if(require("Rgraphviz")) plot(L) @ \caption{Hasse Diagram of all consensus relations (linear orders) for an example provided by Cook and Kress.} \label{fig:plot2} \end{center} \end{figure} % \section{Outlook} % \label{sec:outlook} % In addition to the dQuote{crisp} relations descrbied, % there is also the notion of fuzzy relations, for which each tuple is % contained in the graph with a certain membership value. (I.e., the % graph is a fuzzy subset of the Cartesian product of the elements of % the domain.) Basic support for fuzzy relations will be added % eventually. % \subsubsection*{Acknowledgments} % We are grateful to Walter B\"ohm for providing efficient C code for % solving assignment problems. {\small \bibliographystyle{abbrvnat} \bibliography{relations} } \printindex{} \end{document}