Blog of Gavin King

Ceylon 1.1 progress report

Ceylon 1.1 has been in development for 6 months already, so it's way past time for a progress report! Since the release is nearly ready, this is going to have to take the form of a summary of what we've been working on. Well, that's a daunting task, since we've already closed more than 650 issues in the compiler and language module, and 300 in the IDE. Phew!

The top priorities for Ceylon 1.1 were:

  • Finalize and freeze the language module APIs.
  • Clean up and minimize the use of Java and JavaScript native code in the language module.
  • Mop up remaining issues affecting Java interop.
  • Performance.
  • IDE build performance.

Naturally, we've also fixed hundreds of bugs unrelated to those priorities.

Language changes

There have been very few changes to the language, which has been considered stable since last year's 1.0 release. The big new features in 1.1 are:

  • Support for use-site variance.
  • Introduction of a Byte class that may be optimized by the compiler to byte on the JVM.
  • Type inference for parameters of anonymous functions that occur as arguments in positional argument lists.

Other notable changes are:

  • Powerful disjointness analysis for sequence and tuple types.
  • Allow comprehensions to begin with an if clause.
  • New sealed annotation to prevent extension or instantiation of a type outside the module in which it is defined.
  • Introduction of dynamic interfaces, designed for wrapping native JavaScript APIs.
  • Redefined - operator to work for any Invertible.
  • Allow metamodel references to objects and members of objects.
  • try (x) was changed to distinguish between the lifecycles of Obtainable and Destroyable resources.
  • The type of the expression {} is now {Nothing*} instead of [].
  • Allow refinement of multiple overloaded versions of a Java supertype method.
  • Added ability to catch instances of Throwable.
  • Minor adjustment to type argument inference algorithm for covariant and contravariant type parameters.

The last two changes are breaking changes but should not impact very many programs.

Language module changes

For the 1.1 release, we've invested a lot of thought and development effort in the language module, carefully reviewing its design and scope, reducing the use of native code to an absolute minimum, optimizing performance, and fixing anything that looked like a mistake.

Therefore, this release makes several breaking changes, which will impact existing programs. As of Ceylon 1.1, the language module is considered stable, and we won't make further breaking changes.

  • Addition of a raft of new methods and functions for working with streams.
  • Optimization of the performance of Array, along with some minor improvements to interop with Java native arrays.
  • Removal of the Cloneable interface, and addition of a clone() method to Collection.
  • Addition of Throwable.
  • Replacement of Closeable with Obtainable and Destroyable.
  • Correspondence.items() changed to getAll().
  • Various minor changes to the operations of Iterable, List, and Map.
  • ArraySequence is now sealed and may be instantiated via the sequence() function.
  • Substantial redesign of Enumerable and Range.
  • Several changes to the type hierarchy for numeric types.
  • Removal of SequenceBuilder and move of StringBuilder to ceylon.collection.
  • Removal of superfluous functions entries() and coalesce().
  • Removal of LazyList, LazySet, and LazyMap.

Modularity

We're currently investing effort in trying to make it easier to use Ceylon modules outside of the Ceylon module runtime.

  • Ceylon .car archives now include automatically generated OSGi and Maven metadata, and can execute in an OSGi container.
  • New API for cross-platform resource loading.
  • Support for deploying Ceylon modules to Vert.x.

SDK

Notable changes to the SDK include:

  • Introduction of ceylon.locale, ceylon.logging, and ceylon.promise.
  • Many enhancements to ceylon.collection, including addition of ArrayList, TreeSet, TreeMap, and PriorityQueue classes, along with Stack and Queue interfaces.
  • Various improvements to ceylon.dbc.
  • Many improvements to ceylon.test.

The collections module is now considered stable, and its API is frozen.

IDE

Development of the IDE has been extremely active, with many new features and major performance enhancements.

  • Complete rework of build process, for much improved performance.
  • New refactorings: Move Out, Make Receiver, Move to Unit, Extract Parameter, Collect Parameters, Invert Boolean, Safe Delete.
  • Major enhancements to the Change Parameters refactoring.
  • Inline refactoring now works for shared class/interface members.
  • Much better handling of anonymous functions and function references in Extract and Inline refactorings.
  • Brand new high quality code formatter.
  • Rewritten Ceylon Explorer with much better presentation of modules and modular dependencies.
  • New navigation actions: Open in Type Hierarchy View, Go to Refined Declaration.
  • Popup Quick Find References and Recently Edited Files.
  • Graphical Visualize Modular Dependencies.
  • Further integration of "linked mode" with refactorings and quick assists.
  • Useful Format Block source action.
  • Auto-escape special characters when pasting into string literals.
  • Synchronization of all keyboard accelerators with JDT equivalents (by popular request).
  • Save actions in Ceylon Editor preferences.
  • IntelliJ-style "chain completion" (hit ctrl-space twice).
  • Propose toplevel functions applying to a type alongside members of the type.
  • Several new options for customizing autocompletion and appearance in Ceylon Editor preferences.
  • New quick fixes/assists: convert between string interpolation and concatenation, convert to/from verbatim string, add satisfied interfaces, add type parameter, change named argument list to positional, fill in argument names, export module, convert to verbose form refinement, print expression, fix refining method signature, change to if (exists), change module version, assign to for/try/if (exists)/if (nonempty)/if (is).
  • Run As Ceylon Test on node.js.
  • New default color scheme for syntax highlighting and many other aesthetic improvements.

Why I distrust wildcards and why we need them anyway

In any programming language that combines subtype polymorphism (object orientation) with parametric polymorphism (generics), the question of variance arises. Suppose I have a list of strings, type List<String>. Can I pass that to a function which accepts List<Object>? Let's start with this definition:

interface List<T> {
    void add(T element);
    Iterator<T> iterator();
    ...
}

Broken covariance

Intuitively, we might at first think that this should be allowed. This looks OK:

void iterate(List<Object> list) {
    Iterator<Object> it = list.iterator();
    ...
}
iterate(ArrayList<String>());

Indeed, certain languages, including Eiffel and Dart, do accept this code. Sadly, it's unsound, as can be seen in the following example:

//Eiffel/Dart-like language with 
//broken covariance:
void put(List<Object> list) {
    list.add(10);
}
put(ArrayList<String>());

Here we pass a List<String> to a function accepting List<Object>, which attempts to add an Integer to the list.

Java makes this same mistake with arrays. The following code compiles:

//Java:
void put(Object[] list) {
    list[0]=10;
}
put(new String[1]);

It fails at runtime with an ArrayStoreException.

Use-site variance

Java takes a different approach, however, for generic class and interface types. By default, a class or interface type is invariant, which is to say that:

  • L<U> is assignable to L<V> if and only if U is exactly the same type as V.

Since this is extremely inconvenient much of the time, Java supports something called use-site variance, where:

  • L<U> is assignable to L<? extends V> if U is a subtype of V, and
  • L<U> is assignable to L<? super V> if U is a supertype of V.

The ugly syntax ? extends V or ? super V is called a wildcard. We also say that:

  • L<? extends V> is covariant in V, and that
  • L<? super V> is contravariant in V.

Since Java's wildcard notation is so ugly, we're not going to use it anymore in this discussion. Instead, we'll write wildcards using the keywords in and out for contravariance and covariance respectively. Thus:

  • L<out V> is covariant in V, and
  • L<in V> is contravariant in V.
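In Java's own notation, the two rules above can be demonstrated concretely (a minimal sketch; the variable names are mine):

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        // List<U> is assignable to List<? extends V> when U is a subtype of V:
        List<? extends Object> covariant = new ArrayList<String>();
        System.out.println(covariant.isEmpty());

        // List<U> is assignable to List<? super V> when U is a supertype of V:
        List<? super String> contravariant = new ArrayList<Object>();

        // The contravariant list can safely accept a String...
        contravariant.add("hello");

        // ...but reading from it only yields Object:
        Object element = contravariant.get(0);
        System.out.println(element);
    }
}
```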

The type V is called the bound of the wildcard:

  • out V is an upper-bounded wildcard, and V is its upper bound, and
  • in V is a lower-bounded wildcard, and V is its lower bound.

In theory, we could have a wildcard with both an upper and lower bound, for example, L<out X in Y>.

We can express multiple upper bounds or multiple lower bounds using an intersection type, for example, L<out U&V> or L<in U&V>.

Note that the type expressions L<out Anything> and L<in Nothing> refer to exactly the same type, and this type is a supertype of all instantiations of L.

You'll often see people refer to wildcarded types as existential types. What they mean by this is that if I know that list is of type List<out Object>:

List<out Object> list;

Then I know that there exists an unknown type T, a subtype of Object, such that list is of type List<T>.

Alternatively, we can take a more Ceylonic point of view, and say that List<out Object> is the union of all types List<T> where T is a subtype of Object.

In a system with use-site variance, the following code does not compile:

void iterate(List<Object> list) {
    Iterator<Object> it = list.iterator();
    ...
}
iterate(ArrayList<String>()); //error: List<String> not a List<Object>

But this code does:

void iterate(List<out Object> list) {
    Iterator<out Object> it = list.iterator();
    ...
}
iterate(ArrayList<String>());

Correctly, this code does not compile:

void put(List<out Object> list) {
    list.add(10); //error: Integer is not a Nothing
}
put(ArrayList<String>());

Now we're at the entrance to the rabbit hole. In order to integrate wildcarded types into the type system, while rejecting unsound code like the above example, we need a much more complicated algorithm for type argument substitution.

Member typing in use-site variance

That is, when we have a generic type like List<T>, with a method void add(T element), instead of just straightforwardly substituting Object for T, like we do with ordinary invariant types, we need to consider the variance of the location in which the type parameter occurs. In this case, T occurs in a contravariant location of the type List, namely, as the type of a method parameter. The complicated algorithm, which I won't write down here, tells us that we should substitute Nothing, the bottom type, in this location.
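In Java terms, this same substitution is what makes reading from a wildcarded list fine while rejecting writes. For instance (a small sketch; List.of requires Java 9+):

```java
import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<? extends Number> numbers = List.of(1, 2, 3);

        // Reading is fine: T in a covariant position is replaced
        // by the upper bound, Number:
        Number first = numbers.get(0);

        // Writing is not: T in a contravariant position is replaced
        // by the bottom type, so nothing (except the literal null)
        // is accepted:
        // numbers.add(42);   // compile error
        System.out.println(first);
    }
}
```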

Now imagine that our List interface has a partition() method with this signature:

interface List<T> {
    List<List<T>> partition(Integer length);
    ...
}

What is the return type of partition() for a List<out Y>? Well, without losing precision, it is:

 List<in List<in Y out Nothing> out List<in Nothing out Y>>

Ouch.

Since nobody in their right mind wants to have to think about types like this, a sensible language would throw away some of those bounds, leaving something like this:

List<out List<out Y>>

Which is vaguely acceptable. Sadly, even in this very simple case, we're already well beyond the point where the programmer can easily follow along with what the typechecker is doing.

So here's the essence of why I distrust use-site variance:

  • A strong principle in the design of Ceylon is that the programmer should always be able to reproduce the reasoning of the compiler. It is very difficult to reason about some of the complex types that arise with use-site variance.
  • It has a viral effect: once those wildcard types get a foothold in the code, they start to propagate, and it's quite hard to get back to my ordinary invariant types.

Declaration-site variance

A much saner alternative to use-site variance is declaration-site variance, where we specify the variance of a generic type when we declare it. This is the system we use in Ceylon. Under this system, we need to split List into three interfaces:

interface List<out T> {
     Iterator<T> iterator();
     List<List<T>> partition(Integer length);
     ...
}

interface ListMutator<in T> {
    void add(T element);
}

interface MutableList<T>
    satisfies List<T>&ListMutator<T> {}

List is declared to be a covariant type, ListMutator a contravariant type, and MutableList an invariant subtype of both.

It might seem that the requirement for multiple interfaces is a big disadvantage of declaration-site variance, but it often turns out to be useful to separate mutation from read operations, and:

  • mutating operations are very often invariant, whereas
  • read operations are very often covariant.

Now we can write our functions like this:

void iterate(List<Object> list) {
    Iterator<Object> it = list.iterator();
    ...
}
iterate(ArrayList<String>());

void put(ListMutator<Integer> list) {
    list.add(10);
}
put(ArrayList<String>()); //error: List<String> is not a ListMutator<Integer>

You can read more about declaration-site variance here.

Why we need use-site variance in Ceylon

Sadly, Java doesn't have declaration-site variance, and clean interoperation with Java is something that is very important to us. I don't like adding a major feature to the typesystem of our language purely for the purposes of interoperation with Java, and so I've resisted adding wildcards to Ceylon for years. In the end, reality and practicality won, and my stubbornness lost. So Ceylon 1.1 now features use-site variance with single-bounded wildcards.

I've tried to keep this feature as tightly constrained as possible, with just the minimum required for decent Java interop. That means that, like in Java:

  • there are no double-bounded wildcards, of form List<in X out Y>, and
  • a wildcarded type cannot occur in the extends or satisfies clause of a class or interface definition.

Furthermore, unlike Java:

  • there are no implicitly-bounded wildcards; upper bounds must always be written explicitly, and
  • there is no support for wildcard capture.

Wildcard capture is a very clever feature of Java, which makes use of the "existential" interpretation of a wildcard type. Given a generic function like this one:

List<T> unmodifiableList<T>(List<T> list) => ... ;

Java would let me call unmodifiableList(), passing a wildcarded type like List<out Object>, returning another wildcarded List<out Object>, reasoning that there is some unknown X, a subtype of Object, for which the invocation would be well-typed. That is, this code is considered well-typed, even though the type List<out Object> is not assignable to List<T> for any T:

List<out Object> objects = ... ;
List<out Object> unmodifiable = unmodifiableList(objects);

In Java, typing errors involving wildcard capture are almost impossible to understand, since they involve the unknown, and undenoteable, type. I have no plans to add support for wildcard capture to Ceylon.
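For the curious, here's roughly what wildcard capture looks like in Java (swapFirstTwo is a hypothetical helper, not a library method):

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // A generic helper: well-typed for any single element type T.
    static <T> void swapFirstTwo(List<T> list) {
        T tmp = list.get(0);
        list.set(0, list.get(1));
        list.set(1, tmp);
    }

    public static void main(String[] args) {
        // The element type of this list is unknown to the typechecker:
        List<?> mystery = new ArrayList<>(List.of("x", "y"));

        // Wildcard capture: Java reasons that there exists some T for
        // which the call is well-typed, even though List<?> is not
        // assignable to List<T> for any denoteable T:
        swapFirstTwo(mystery);

        System.out.println(mystery);
    }
}
```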

Try it out

Use-site variance is already implemented and already works in Ceylon 1.1, which you can get from GitHub, if you're super-motivated.

Even though the main motivation for this feature was great Java interop, there will be other, hopefully rare, occasions where wildcards will be useful. That doesn't, however, indicate any significant shift in our approach. We will continue using declaration-site variance in the Ceylon SDK except in extreme cases.

UPDATE:

I just realized I forgot to say thanks to Ross Tate for helping me with the finer points of the member typing algorithm for use site variance. Very tricky stuff that Ross knows off the top of his head!

Ranges and slices

I guess we've all seen Dijkstra's famous argument that a range of natural numbers should be expressed using an inclusive lower bound and exclusive upper bound, and that, as a corollary, arrays should be indexed from 0. It's a thought-provoking little nugget of reasoning, though it fails to contemplate several objections, including that:

  • The inclusive lower bound/exclusive upper bound combination (let's call that a Dijkstra range) isn't natural for a range which includes negative integers. The range start<=i<=0 would be written as start<=i<1. Dijkstra ranges are nastily asymmetrical!
  • Zero-based indexing is infuriatingly inconvenient when accessing the last element of an array, or when indexing from the end of the array. Who here loves array[length-i-1]? This inconvenience, at least arguably, outweighs the convenience of being able to write 0<=i<length instead of 1<=i<length+1, and thus substantially undermines Dijkstra's case for zero-based indexing, even if we accept his argument for Dijkstra ranges!
  • Two endpoints isn't the only way to specify a range of integers!

Ceylon doesn't have a traditional for loop, and we don't iterate list elements by looping over the indices of the list. Nevertheless, we still need a way to express ranges of integers. Our solution to this problem is a bit different to other languages, and amounts to a partial rejection of Dijkstra's conclusions, so it's worth explaining the reasoning behind it.

Ranges

Our design is premised on the observation that we almost never, in practice, naturally find ourselves with an inclusive lower bound/exclusive upper bound combination. What naturally arises in our program is almost always either:

  • two (inclusive) endpoints, or
  • an (inclusive) starting point and a length.

Using a Dijkstra range, we can express either case without too much discomfort:

  • start<=i<end+1
  • start<=i<start+length

Thus, we can view the traditional use of Dijkstra ranges as a sort of compromise between these two cases: a choice that makes neither option too painful.

But, of course, by clearly distinguishing these two common cases, it becomes clear that both cases are amenable to further optimization. Thus, Ceylon provides two options for expressing a range of integers:

  • The spanned range operator expresses a range in terms of its two endpoints as start..end. In the case that end<start, the range is of decreasing values. In the case that end==start, the range has exactly one value.
  • The segmented range operator expresses a range in terms of its starting point and length as start:length. In the case of a nonpositive length, the range is empty.

Thus, a traditional C-style for loop of this form:

for (i=0; i<length; i++) { ... }

is written like this:

for (i in 0:length) { ... }

Now, since integers aren't the only things we can form ranges of, the .. and : operators are generalized to any type T that satisfies the interfaces Ordinal & Comparable<T>. So, for example, we can iterate the letters of the English alphabet like this:

for (c in 'a'..'z') { ... }
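The semantics of the two operators can be sketched in Java; span and segment here are hypothetical helper functions written for illustration, not part of any real API:

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // start..end — inclusive endpoints; decreasing when end < start,
    // exactly one value when end == start.
    static List<Integer> span(int start, int end) {
        List<Integer> r = new ArrayList<>();
        int step = end >= start ? 1 : -1;
        for (int i = start; i != end + step; i += step) r.add(i);
        return r;
    }

    // start:length — starting point and length; empty when length <= 0.
    static List<Integer> segment(int start, int length) {
        List<Integer> r = new ArrayList<>();
        for (int i = 0; i < length; i++) r.add(start + i);
        return r;
    }

    public static void main(String[] args) {
        System.out.println(span(1, 5));
        System.out.println(span(5, 1));
        System.out.println(span(3, 3));
        System.out.println(segment(0, 5));
        System.out.println(segment(0, -1));
    }
}
```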

Slices

Ceylon goes one better, giving you the choice between:

  • The span operator, written list[start..end] for the elements list[start], list[start+1], ..., list[end].
  • The segment operator, written list[start:length] for the elements list[start], list[start+1], ..., list[start+length-1].

The span and segment operators are defined in terms of the rather abstract interface Ranged and therefore apply to more than just Lists. For example, the platform module ceylon.collection lets you express subranges of a SortedMap or SortedSet using these operators.
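As a rough Java analogue of the two slice operators (a simplified sketch: it ignores out-of-range indices and the decreasing case, which the real operators handle gracefully):

```java
import java.util.List;

public class Main {
    // list[start..end] — inclusive endpoints (increasing case only here)
    static <T> List<T> span(List<T> list, int start, int end) {
        return list.subList(start, end + 1);
    }

    // list[start:length] — starting point and length
    static <T> List<T> segment(List<T> list, int start, int length) {
        return length <= 0 ? List.of() : list.subList(start, start + length);
    }

    public static void main(String[] args) {
        List<String> letters = List.of("a", "b", "c", "d", "e");
        System.out.println(span(letters, 1, 3));
        System.out.println(segment(letters, 1, 3));
    }
}
```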

About the plus symbol

Operators in Ceylon can't be overloaded. That is to say, I can't redefine an operator like + to do something totally arbitrary, say, add a new element to a list. Instead, the semantics of each operator is defined by the language specification in terms of types defined in the language module. However, many of these types, and therefore the definitions of the operators associated with them, are quite abstract. For example, + is defined in terms of an interface called Summable. So if you want to define your own Complex number class, you just make it satisfy Summable, and you can use + to add complex numbers. We call this approach operator polymorphism.
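The idea can be sketched in Java; Summable and Complex here are simplified stand-ins for the Ceylon declarations, where an expression like x + y compiles down to a call like x.plus(y):

```java
public class Main {
    // Simplified analogue of Ceylon's Summable interface.
    interface Summable<T extends Summable<T>> {
        T plus(T other);
    }

    // A hypothetical Complex number class: satisfying Summable is all
    // that's needed to make + meaningful for it in Ceylon.
    static final class Complex implements Summable<Complex> {
        final double re, im;
        Complex(double re, double im) { this.re = re; this.im = im; }
        public Complex plus(Complex other) {
            return new Complex(re + other.re, im + other.im);
        }
        @Override public String toString() { return re + "+" + im + "i"; }
    }

    public static void main(String[] args) {
        // In Ceylon this would be written: Complex(1, 2) + Complex(3, 4)
        Complex z = new Complex(1, 2).plus(new Complex(3, 4));
        System.out.println(z);
    }
}
```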

One of the first things people notice about Ceylon is that, right after singing the praises of not having operator overloading, we go right ahead and use + for string concatenation! I've seen a number of folks object that string concatenation has nothing to do with numeric addition, and that this is therefore an example of us breaking our own rules.

Well, perhaps. I admit that the main motivation for using + for string concatenation is simply that this is what most other programming languages use, and that therefore this is what we find easiest on the eyes.

On the other hand, I don't think there's a strong reason to object to the use of + for concatenation. There's no single notion of "addition" in mathematics. Quite a few different operations are traditionally called "addition", and written with the + symbol, including addition of vectors and matrices.

Generalizing over all these operations is the job of abstract algebra. So I recently spent some time nailing down and documenting how Ceylon's language module types and operators relate to the algebraic structures from abstract algebra.

The following three famous algebraic structures are of most interest to us:

  • A semigroup is a set of values with an associative binary operation.
  • A monoid is a semigroup with an identity element.
  • A group is a monoid where each value has an inverse element.

If the binary operation is also commutative, we get a commutative semigroup, a commutative monoid, or an abelian group.

Finally, a ring is a set of values with two binary operations, named addition and multiplication, where:

  • the ring is an abelian group with respect to addition,
  • the ring is a monoid with respect to multiplication, and
  • multiplication distributes over addition.

Strings with concatenation form a monoid, since string concatenation is associative, and the empty string is an identity element with respect to concatenation. They don't form a group, since there are no inverse strings. Also, string concatenation isn't commutative.

On the other hand, integers with addition form an abelian group. Together with both addition and multiplication, the integers form a ring.

We could have chosen to say that Ceylon's + operator applies only to abelian groups, or perhaps only to groups, or perhaps only to commutative monoids or only to commutative semigroups. But any of those choices would be as arbitrary as any other. Instead, we've decided to say that the interface Summable abstracts over all semigroups, thereby blessing the use of + with any mathematical semigroup. Thus, we can legally use it to denote string concatenation or any other pure associative binary operation.
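These algebraic claims about strings are easy to check concretely; here's a quick Java sanity check:

```java
public class Main {
    public static void main(String[] args) {
        String a = "foo", b = "bar", c = "baz";

        // Associativity: concatenation is a semigroup operation.
        System.out.println(((a + b) + c).equals(a + (b + c)));

        // Identity: the empty string makes it a monoid.
        System.out.println(("" + a).equals(a) && (a + "").equals(a));

        // But concatenation is not commutative, and strings have no
        // inverses, so strings don't form an abelian group.
        System.out.println((a + b).equals(b + a));
    }
}
```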

Furthermore:

  • The interface Invertible abstracts over groups, allowing the use of the - operator with any mathematical group.
  • The interface Numeric abstracts over rings, allowing the use of the * and / operators with any mathematical ring.

Of course, we could have called these interfaces Semigroup, Group, and Ring, and that would have made us feel smart, perhaps, but we're trying to communicate with programmers here, not mathematicians.

Object-oriented != imperative

Dear FP community: one of the things I really like about you folks is the rigor you've brought to the field of programming language design. Compared to the kind of magical and folklore-based thinking we've grown accustomed to in the field of computing, your approach to problems is a massive breath of fresh air. But there's one area where you guys seem to have fallen into some rather fuzzy and unfounded rhetoric. What I'm talking about is your continued conflation of object orientation with imperative programming.

When we program with classes and objects, we have the choice between expressing ourselves using:

  • mutable objects, or
  • immutable objects.

This is no different to programming using functions, where we have the choice between:

  • impure functions, or
  • pure functions.

The use of mutable objects and impure functions is called imperative programming and is the norm in business and scientific computing. Another way to describe it is "programming with side-effects". Savvy programmers from both the OO and FP traditions have long understood that side-effects make it more difficult to reason about complex code, and that a design based upon immutable objects and/or pure functions is often superior.

Indeed, I've been hearing the advice to prefer immutable objects, and avoid side-effects, for as long as I've been programming using object-oriented languages.

Oh, but wait, you might respond: isn't my point totally specious, when so many of the objects that people actually write in practice are actually mutable? Well, no, I don't think so. The truth is, that almost as many of the functions that people actually write in practice are impure. Programming with functions, and without objects, does not in itself inoculate us against side-effects. Indeed, the disciplined use of immutability that we see in some parts of the FP community is simply not the norm in business or scientific computing, even in languages which don't support object orientation. Trust me, the old Fortran code I used to mess about with when I was doing scientific computing work back in university was certainly no more free of side-effects than typical object-oriented code I've worked with since.

Perhaps a source of the confusion here is that we say that objects "hold state and behavior". When some people who aren't so familiar with OO read this, they imagine that by "state", we mean mutable state. But that's not quite right. What this statement is saying is that an object holds references to other objects, along with functions that make use of those references. Thus, we can distinguish one instance of a class from another instance, by the different references (state) it holds. We don't mean, necessarily that those references are mutable, and, indeed, they're very often immutable, especially in well-designed code.

"So", replies my imaginary FP interlocutor, "how then would such an immutable object be any different to a closure?" Well, at some level it's not any different, and that's the point! There's nothing unfunctional about objects! You can think of a class as a function that returns the closure of its own exported nested declarations. (Indeed, this is precisely how we think of a class in Ceylon.)
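Here's the point in Java: an immutable object and a closure over the same reference are essentially interchangeable (Greeter and greeter are illustrative names of my own):

```java
import java.util.function.Supplier;

public class Main {
    // An immutable object: its "state" is just an immutable reference.
    static final class Greeter {
        private final String name;
        Greeter(String name) { this.name = name; }
        String greet() { return "Hello, " + name; }
    }

    // The same thing expressed as a function returning a closure
    // over `name` — nothing unfunctional about either form:
    static Supplier<String> greeter(String name) {
        return () -> "Hello, " + name;
    }

    public static void main(String[] args) {
        System.out.println(new Greeter("world").greet());
        System.out.println(greeter("world").get());
    }
}
```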

Among the modern languages, what really distinguishes a language as object-oriented is whether it supports subtype polymorphism (or structural polymorphism, which I consider just a special kind of subtyping). Thus, some languages that people consider "functional" (ML, Haskell) aren't object-oriented. Others (OCaml, F#) are.

A request: FP folks, please, please improve your use of terminology, because I've seen your misuse of the term "object-oriented" creating a lot of confusion in discussions about programming languages.