(toiminnot)

hwechtla-tl: Good dynamic typing

Kierre.png

Mikä on WikiWiki?
nettipäiväkirja
koko wiki (etsi)
viime muutokset


(nettipäiväkirja 15.10.2018) There's been already for quite a while, in programming languages, a megatrend towards types, more complex and complete types, proving program properties, and catching problems in an early stage based on static (expression-bound) types. This development was probably caused by the growing popularity of very good (Hindley-Milner) kind of type systems with type inference and parametric polymorphism. The approach was first popularised by ML languages, but now the flagship is Haskell, whose ghc implementation is constantly pushing the border of well-typed programs to include more and more programs.

So I'm probably very much in the minority when I think that all this hassle around type systems is a little bit silly. I'm an old-school Lisper, and while Ocaml was my first functional language of choice (and I still like it), Scheme is the language that I really like. I find it especially silly that the question between static (expression-bound) and dynamic (value-bound) typing is made into a question between languages; after all, you can write a dynamically typed implementation of any statically typed language; and you can statically prove expression properties, such as ranges of possible values, for any program written in a dynamically typed language.

So what is the commotion about? When type systems become more complex, the number of programs within them approaches dynamically typed programs; but at the same time, the types themselves become more complex, the error messages harder and harder to understand, and the level of expertise required to use the type system to productively prove your program's correctness goes right through the roof. It's not like many programmers would like to ensure their program's correctness by dependent typing; they've just heard it's possible, and they think it's "cool".

I recently came to understand that dynamically typed languages have a bad reputation exactly for the same reason that statically typed languages had 20 years ago: there are very many examples of bad dynamic type systems, and relatively few good ones. This leads people to think that dynamic type systems are, by nature, something like that of JavaScript or PHP (or maybe Clojure): complex, magical, with very many automatic conversions between different data types and almost no guarantees that a particular error (such as miscalculated index into an array) will ever get caught.

Well, not surprisingly, dynamically typed languages need not be like that. Their type system need not have complicated rules for how values are converted into booleans or how collections are converted to each other. In my opinion, automatic conversions are the thing that kills both code readability and maintainability. Coincidentally, they introduce the kind of ad-hoc polymorphism that makes it almost impossible to reason about the behaviour of programs statically (a standard library whose functions accept almost anything as input has a similar impact).

For instance, think about the function circle_area r = pi * r * r. If the standard library "*" operator/function accepts all kinds of crap as input (say, singleton lists, or a string and an integer as does Python's), there's no telling what kind of input circle_area can take. It's very hard to signal errors then, static or not. If, on the other hand, "*" requires its operands to be numbers, we can infer circle_area to have the type (number -> number) -- even if we didn't know on beforehand what "*" returns!

I'd like to present you the type system of Scheme, and in particular, R5RS scheme (because of its simplicity).

Scheme's type system has 10 distinct types, if we count the empty list as a distinct type (https://schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-6.html#%_sec_3.2). Every value belongs to one of these types. There is no "null" or "nil" or "undef" in the traditional sense to mark the absence of a value. No value has a double use; the empty list is distinct from the symbol "nil", which is distinct from the value #f (false). For (if), every non-#f value (including different kinds of number 0) counts as true; #f is the only false value.

Scheme does make an interesting experiment in ad-hoc polymorphism. Scheme's math primitives are allowed to convert between its numeric types. Instead of specifying that int*int must be an int, Scheme requires that it preserves exactness: if the inputs were exact, so must be the result too. Exactness is the only manifest type trait of numbers; as for representation, the number 15 satisfies not only integer?, but also rational?, real? and complex?. Working with Haskell makes me miss Scheme's number system a lot.

How can this type system be extended? There are two ways: type tags, and interfaces. Neither of these need any specific support from the language.

Type tags are used for creating transparent data types (data structures whose value can be inspected and destructured). It usually works by creating a compound data structure (list or vector) whose first element is a "type tag" that tells which type it is. These type tags can be symbols which provide a nice and readable way to tag things, or they can be pairs, which are guaranteed to be distinct -- so you cannot end using a "glass" object (referring to the kitchenware) as a "glass" object (referring to the building material). Either way, you get an object that can easily be told apart from the other types. Of course, it satisfies pair? (or vector?) at the same time, but you don't want any procedure to be so ad-hoc polymorphic in its input that it would do something with lists and something else with your user-defined type.

Interfaces are records of values that guarantee some specific set of semantics. For instance, if you want to abstract over all kinds of things that can be combined pairwise and have a left and right identity element (Monoids, as mathematics tells us), you can define a compound data structure (say, a pair) with two fields: the identity element, and the combination operation as a function.

(define a-monoid-for-numbers (cons 1 *))
(define a-monoid-for-lists (cons '() append))

You can then use this kind of interfaces to produce code that is abstracted over data types:

(define (fold list monoid)
  (if (null? list) (car monoid)
    ((cdr monoid) (car list) (fold (cdr list) monoid))))

(fold '((a b) (c d) (e)) a-monoid-for-lists)  ;; => (a b c d e)
(fold '(1 2 3) a-monoid-for-numbers)  ;; => 6

Very often the interface only has a single value, so it can be modeled by that value. For instance, every functor is properly modeled by its "fmap" operation. It's very simple.

The magic of such a language is that you can write your programs whichever way you like, but still most of your functions can be statically proven to work or not to work. Errors get checked, not propagated; and it is relatively easy to write generic code (code that works for whatever kind of data), since there could not be any unprecedented types.


kommentoi (viimeksi muutettu 24.10.2018 14:07)