The abominable equality operator!

Posted on November 18th, 2014

Ok… so maybe the title gives this post an element of hyperbole, but give me time to explain myself. A while ago, whilst programming in ML, I came across this warning when trying to compare two values of a datatype using the equality operator:
stdIn:4.1-4.4 Warning: calling polyEqual
Though in most cases innocuous, this is a warning worth taking note of.

In low-level languages such as C or assembly, all data is first-order, so values generally support the same operations as any other value. By contrast, in higher-level languages such as ML not all types are this versatile, and for some types equality is not supported at all.

Real numbers are one example of values for which equality testing is forbidden. Typing (for example's sake) 7.7 = 7.7 into the REPL produces this error:

stdIn:1.2-1.11 Error: operator and operand don't agree [equality type required]
operator domain: ''Z * ''Z
operand: real * real
in expression:
7.7 = 7.7

ML prohibits equality comparisons for values of type real because floating-point numbers are stored in memory as approximations, not as exact representations, so a strict equality test is unreliable (debate continues to rage on the feasibility of floating-point equality… if you thoroughly understand the blog post in the preceding hyperlink, you need to drop a ‘Numerics for Dummies’ comment down below for me 🙂 ). With this error message the REPL is informing us that the equality operator is only applicable to equality types (operator domain: ''Z * ''Z). Note that the type variable must be ''Z, not just 'Z. A double apostrophe signals that the type variable ranges over equality types only.
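To make the difference between 'a and ''a concrete, here is a minimal sketch. The function name member is my own, purely for illustration. Because its body uses =, the compiler infers an equality type variable for it. For reals, the Basis Library provides Real.== as an explicit IEEE comparison, which you have to ask for by name.

```sml
(* member uses = on its elements, so its inferred type is
   ''a * ''a list -> bool (note the double apostrophe). *)
fun member (_, []) = false
  | member (x, y :: ys) = x = y orelse member (x, ys)

(* Fine: int is an equality type. *)
val found = member (2, [1, 2, 3])

(* member (7.7, [7.7]) would be rejected, because real is not an
   equality type. Real.== from the Basis Library compares reals
   explicitly instead. *)
val sameReal = Real.== (7.7, 7.7)
```

Since member calls = at a polymorphic type, it is exactly the kind of function that can trigger the polyEqual warning.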

As mentioned, this warning is not always considered a big deal, with some even advising that you turn off the polyEqual warning (Control.polyEqWarn := false;). However, for your own datatypes I would advise doing away with the equality operator instead. To be truly useful, equality tests should not be confined to the built-in equality operator. Furthermore, ML forbids equality testing for function types and abstract types. An abstract type only provides the operations specified in its definition, hiding the representation's equality test because it rarely coincides with the desired abstract equality. In other words, because the same collection of elements can be stored in trees of different shapes, comparing two trees with the equality operator (supposing it were permitted, just as an example) wouldn't necessarily give the same result as traversing the trees' nodes manually.
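Here is a small sketch of that tree point. The tree type and the names toList, sameInts and sameElements are assumptions of mine, not part of any library. Two trees with different shapes hold the same elements in the same order, so comparing the representations and comparing the contents give different answers.

```sml
(* A hypothetical binary tree of ints, for illustration only. *)
datatype tree = Leaf | Node of tree * int * tree

(* In-order traversal: collect the elements from left to right. *)
fun toList Leaf = []
  | toList (Node (l, x, r)) = toList l @ (x :: toList r)

(* Compare two int lists by pattern matching. *)
fun sameInts ([], []) = true
  | sameInts (x :: xs, y :: ys) = (x : int) = y andalso sameInts (xs, ys)
  | sameInts _ = false

(* The "abstract" equality: same elements in the same order. *)
fun sameElements (t1, t2) = sameInts (toList t1, toList t2)

(* Two different shapes holding the elements 1 and 2. *)
val t1 = Node (Leaf, 1, Node (Leaf, 2, Leaf))
val t2 = Node (Node (Leaf, 1, Leaf), 2, Leaf)

val structural = (t1 = t2)            (* compares shape as well as contents *)
val abstract = sameElements (t1, t2)  (* compares contents only *)
```

Here structural is false while abstract is true, which is why an abstract type would normally hide the first and expose only something like the second.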

So, having almost entirely removed the equality operator from our repertoire, what's the way forward when we need to compare values of our own types? Pattern matching!

Instead of rambling on, I think this is a good place to show a simple example to conclude this discourse. Suppose we wanted to define a card in a game of solitaire:

datatype suit = Clubs | Diamonds | Hearts | Spades

datatype rank = Jack | Queen | King | Ace | Num of int

So a card is a pair of two components, a suit and a rank (suit * rank). If we wanted to write a function which compares two cards, what's the best way to do it without triggering the polyEqual warning? A good, even superior, substitute for the equality operator is pattern matching. All we need is the main equal_cards function accompanied by two ‘helper’ functions, equalSuit and equalRank, which compare the cards' suit and rank values respectively, and our equality-free function is complete!

fun equalSuit(suit1, suit2) =
  case (suit1, suit2) of (Clubs, Clubs) => true
    | (Diamonds, Diamonds) => true
    | (Hearts, Hearts) => true
    | (Spades, Spades) => true
    | _ => false
fun equalRank(rank1, rank2) =
  case (rank1, rank2) of (Num i, Num j) => i=j
    | (Jack, Jack) => true
    | (Queen, Queen) => true
    | (King, King) => true
    | (Ace, Ace) => true
    | _ => false
fun equal_cards ((suit,rank), (suit',rank')) =
  equalSuit(suit, suit') andalso equalRank(rank,rank')

By making each branch of the case expression check whether a pair is homogeneous (containing two suits or two ranks of the same kind), we remove any ambiguity which may have arisen had we used the equality operator. Generally, any test of equality is best done in this way, especially for more sophisticated datatypes.
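As an illustration of a slightly more sophisticated datatype, here is a sketch of one that contains reals. It is therefore not an equality type, and = is rejected on it outright. The shape type, the tolerance eps and the helper close are all hypothetical choices of mine; pick whatever notion of "close enough" suits your program.

```sml
(* A datatype carrying reals: = cannot be used on values of this type. *)
datatype shape = Circle of real | Rect of real * real

(* Assumed tolerance for comparing reals, chosen arbitrarily here. *)
val eps = 1.0E~9

(* Two reals count as equal when they differ by less than eps. *)
fun close (a : real, b : real) = Real.abs (a - b) < eps

(* Equality by pattern matching: only matching constructors are compared. *)
fun equalShape (Circle r1, Circle r2) = close (r1, r2)
  | equalShape (Rect (w1, h1), Rect (w2, h2)) =
      close (w1, w2) andalso close (h1, h2)
  | equalShape _ = false
```

With pattern matching we decide for ourselves what equality means for each constructor, which the built-in operator would never let us do.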

I hope you enjoyed the post and found it informative. What are your views? What would you have done differently? Post your comments and code down below! All [constructive] comments welcome!!