Wadler's Blog: Representation Pain

22.1.08

Representation Pain

Some representations are easier to use than others. Ease of use also seems to depend on purpose and inclination. S-expressions are excellent for many purposes, but the associative law is far easier to read in an infix notation. Compare


  (= (+ x (+ y z)) (+ (+ x y) z))

and


  x + (y + z)  =  (x + y) + z.

My taste puts S-expressions far above XML, but given the relative popularity of the two, I presume for some it is the exact opposite.
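To make the comparison concrete, here is a minimal Python sketch that prints the associative law from one tree in three notations. The tuple encoding, the precedence table, and the XML tag names (`eq`, `plus`, `var`) are assumptions made for illustration, not part of any standard.

```python
# A term is a variable name (str) or a tuple (operator, left, right).
LAW = ("=", ("+", "x", ("+", "y", "z")), ("+", ("+", "x", "y"), "z"))

PREC = {"=": 0, "+": 1}              # assumed: "=" binds looser than "+"
XML_TAGS = {"=": "eq", "+": "plus"}  # hypothetical tag names


def to_sexpr(term):
    """Prefix notation: every application is fully bracketed."""
    if isinstance(term, str):
        return term
    op, left, right = term
    return f"({op} {to_sexpr(left)} {to_sexpr(right)})"


def to_infix(term, parent_prec=-1):
    """Infix notation: bracket a subterm only when precedence requires it."""
    if isinstance(term, str):
        return term
    op, left, right = term
    prec = PREC[op]
    body = f"{to_infix(left, prec)} {op} {to_infix(right, prec)}"
    return f"({body})" if prec <= parent_prec else body


def to_xml(term):
    """XML notation: every node carries an opening and a closing tag."""
    if isinstance(term, str):
        return f"<var>{term}</var>"
    op, left, right = term
    tag = XML_TAGS[op]
    return f"<{tag}>{to_xml(left)}{to_xml(right)}</{tag}>"


if __name__ == "__main__":
    for render in (to_sexpr, to_infix, to_xml):
        print(render(LAW))
```

The tree is the same in all three cases; only the amount of bracketing the reader has to match up by eye changes.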

Ezra Cooper and I observed the importance of the pain of representation in connection with mapping function calls into path names. If every function takes a fixed number of arguments, one can map a call directly into a path name. For example,


  f(g(a,b), h(c))

becomes


  f/g/a/b/h/c

although the second is less clear to read. We could map open and close brackets into the path


  f/open/g/open/a/b/close/h/open/c/close/close

but this is so painful as to be worse than useless.
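Both mappings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scheme Ezra Cooper and I actually used: the `(name, [arguments])` term encoding, the function names, and the arity table are all invented here for the example.

```python
# A term is a pair (name, [argument terms]); constants have no arguments.
EXAMPLE = ("f", [("g", [("a", []), ("b", [])]), ("h", [("c", [])])])

# Assumed arities for the names in the example call f(g(a,b), h(c)).
ARITY = {"f": 2, "g": 2, "h": 1, "a": 0, "b": 0, "c": 0}


def to_path(term):
    """Flatten a call into a path by listing names in prefix order."""
    name, args = term
    return "/".join([name] + [to_path(arg) for arg in args])


def from_path(path, arity):
    """Rebuild the call; this only works because every arity is fixed."""
    segments = iter(path.split("/"))

    def parse():
        name = next(segments)
        return (name, [parse() for _ in range(arity[name])])

    term = parse()
    if next(segments, None) is not None:
        raise ValueError("trailing path segments")
    return term


def to_bracketed_path(term):
    """The painful variant: spell out the brackets as path segments."""
    name, args = term
    if not args:
        return name
    inner = "/".join(to_bracketed_path(arg) for arg in args)
    return f"{name}/open/{inner}/close"


if __name__ == "__main__":
    print(to_path(EXAMPLE))
    print(to_bracketed_path(EXAMPLE))
    print(from_path(to_path(EXAMPLE), ARITY) == EXAMPLE)
```

The short form needs the arity table to be read back; the bracketed form carries its own structure but pays for it in every segment.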

There must be a psychological theory that can underpin such choices. A quick web search failed to turn up anything apposite; suggestions for the correct terms to search on would be welcome.

# posted by Philip Wadler @ 10:54 AM

Comments:

The infix associativity law does have a couple fewer characters, but once you're used to prefix notation, they both read pretty easily, particularly if you break them across lines. I can't seem to get a fixed-width font, but, e.g.:

(= (+ (+ a b) c)
   (+ a (+ b c)))

Most of us have years of mathematical training that makes us used to reading "a + b", but when you start introducing new operators like a :: b @ c, the infix becomes pretty demanding on the human to parse, since they must remember the precedences and associativity of all the operators.
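As a rough illustration of what the reader has to carry in their head, here is a small precedence-climbing parser in Python that turns an infix string into a fully bracketed prefix form. The operator table is a hypothetical one: the levels and associativities given to `::` and `@` are assumptions for the example, and a different table would give a different reading of the same text.

```python
import re

# Hypothetical operator table: symbol -> (precedence, associativity).
OPS = {"+": (6, "left"), "::": (5, "right"), "@": (5, "right")}

TOKEN = re.compile(r"\s*(::|[+@()]|\w+)")


def tokenize(src):
    """Split the source into identifiers, operators, and brackets."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(src):
    """Precedence climbing: returns nested (op, left, right) tuples."""
    tokens = tokenize(src)
    pos = 0

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        inner = expr(0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing closing bracket")
        pos += 1
        return inner

    def expr(min_prec):
        nonlocal pos
        left = atom()
        while pos < len(tokens) and tokens[pos] in OPS:
            op = tokens[pos]
            prec, assoc = OPS[op]
            if prec < min_prec:
                break
            pos += 1
            # A left-associative operator must not reappear on its right side.
            right = expr(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    return expr(0)


def to_sexpr(term):
    """Print the parse with every grouping made explicit."""
    if isinstance(term, str):
        return term
    op, left, right = term
    return f"({op} {to_sexpr(left)} {to_sexpr(right)})"


if __name__ == "__main__":
    print(to_sexpr(parse("a :: b @ c")))
    print(to_sexpr(parse("a + b + c")))
```

Everything in `OPS` is information the prefix form states on the page and the infix form leaves to memory.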

# posted by

Anonymous : 22/1/08 2:30 PM

I prefer S-expressions over XML when coding, because Scheme/Lisp editors allow you to navigate S-expressions easily. XML is just too verbose.

If you're reading code, it can be hard to see where an S-expression ends. XML end tags make the lexical extent of a tag obvious.

If you're sending data over the wire, and no human is involved, S-expressions are a clear win. XML's redundant end tags eat a lot of bandwidth. But the world hasn't seen the light.

# posted by

Paul Steckler : 22/1/08 5:00 PM

The keywords you want are "Cognitive Dimensions". SimonPJ discussed them in http://research.microsoft.com/~simonpj/Papers/excel/excel.pdf, and http://www.cl.cam.ac.uk/~afb21/CognitiveDimensions/papers/Green1989.pdf seems to be the original paper on the idea.

Paul.

# posted by

Paul Johnson : 23/1/08 3:51 AM

I think the issues here extend beyond "It's what we're used to". For common, binary operators, infix is better, because it reduces the spatial distance between the operation and its operands. With infix, the operands are always immediately to the left and right, making its arguments quite apparent, while

The issue of operator precedence comes up; however, people are pretty good at the sort of cascading association necessary. Haskell probably has the most intricate operator precedence, and with a bit of learning the patterns, it's second nature. You certainly don't have to keep recalling the precedence orders.

The popularity of combinators is, I think, evidence for this. It's more than just the ability to use a concise symbolic name.

An obvious limitation is that you only get two operands. You'll notice that when math needs more than two operands, you'll get vertical displacements. Of course, you sometimes get this with plain binary operators - division and exponentiation, to further visually differentiate them.

I actually think such expansion into the 2nd dimension is a natural path for future programming languages. Not quite visual, connect-the-boxes sort, but math-notation sort.

# posted by

mgsloan : 23/1/08 4:14 AM

Modularity can be applied to syntax:

Put things close to each other that are related; put things far from each other that are not related.

Whether something is related or not depends on the use - it is not an intrinsic property, but determined by the task.

Because of this, there can be no "one true way" or representation. People who think so only have one task in mind...

In syntax, "close" includes physical closeness (number of chars), but also in appearance (eg. upper-case for classes).

# posted by

Brendan : 23/1/08 4:21 AM

Another contender, much-overlooked these days, is the Peano "dot" syntax. This uses more dots for lower precedence,

x:+:y.+.x:.=:.x:+:y.+.z,

and could immediately translate into font size.

# posted by

Anonymous : 23/1/08 8:26 PM

Another data representation choice, besides XML and S-expressions, is JSON, which has emerged as a lightweight, compact format to send data over the wire.

My understanding is that JSON is almost equivalent to JavaScript data representations. In that sense, it's like S-expressions. I don't believe you can write code to be eval'd in JSON, though.
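A small Python sketch of the resemblance, using the standard `json` module. Encoding a call as a nested list with the function name first is an assumption made here to mirror S-expressions; it is not a JSON convention.

```python
import json

# The call f(g(a,b), h(c)) as a nested list, function name first.
CALL = ["f", ["g", "a", "b"], ["h", "c"]]


def to_sexpr(term):
    """Render the nested list in S-expression form."""
    if isinstance(term, str):
        return term
    return "(" + " ".join(to_sexpr(part) for part in term) + ")"


def to_json(term):
    """Serialise with compact separators, as one might for the wire."""
    return json.dumps(term, separators=(",", ":"))


if __name__ == "__main__":
    print(to_sexpr(CALL))
    print(to_json(CALL))
    # Decoding yields plain data (lists and strings); nothing is evaluated.
    print(json.loads(to_json(CALL)) == CALL)
```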

# posted by

Paul Steckler : 26/1/08 8:23 PM

Ah, but (= (+ a b c) (+ a b c)) seems best of all.

Preston

# posted by

Preston : 28/1/08 4:47 PM

Many thanks to Paul Johnson for providing the phrase I was looking for, 'cognitive dimensions', and the citation to Green.

# posted by

Philip Wadler : 5/2/08 3:11 PM

Just for laughs, here's a representation for the Unix users:

f/g/a/../b/../../h/c
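This one can be read as a walk through a directory tree: enter a segment to go down to an argument, and use `..` to come back up. A minimal Python sketch of one way to produce and read such a path follows; the `(name, [arguments])` term encoding and the function names are assumptions for illustration. Unlike the fixed-arity mapping, no arity table is needed to decode it.

```python
# A term is a pair (name, [argument terms]); constants have no arguments.
EXAMPLE = ("f", [("g", [("a", []), ("b", [])]), ("h", [("c", [])])])


def to_dotdot_path(term):
    """Walk the term depth-first, emitting '..' after leaving each argument."""

    def walk(node):
        name, args = node
        segments = [name]
        for arg in args:
            segments += walk(arg)
            segments.append("..")
        return segments

    segments = walk(term)
    # Trailing '..' segments carry no information, so drop them.
    while segments and segments[-1] == "..":
        segments.pop()
    return "/".join(segments)


def from_dotdot_path(path):
    """Rebuild the term by tracking the current position with a stack."""
    root = None
    stack = []
    for segment in path.split("/"):
        if segment == "..":
            stack.pop()
            continue
        node = (segment, [])
        if stack:
            stack[-1][1].append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("path climbs above the root and descends again")
        stack.append(node)
    return root


if __name__ == "__main__":
    print(to_dotdot_path(EXAMPLE))
    print(from_dotdot_path(to_dotdot_path(EXAMPLE)) == EXAMPLE)
```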

# posted by

Anonymous : 18/2/08 8:26 PM
