count vowels

count vowels
Photo by Brett Jordan / Unsplash

I think it's a good idea to write simple programs from time to time to see what you might have right or wrong with the language design. So let's try writing a program that counts the number of vowels in a string.

vowels :: 'aeiou'.
[string count-vowels]
  [string: String -> return: UnsignedInteger |
    count := 0.
    string
      where> [character | vowels includes: character]
         do> [character | count += 1].
    return count].

'qwertuiop' count-vowels // 4

Let's see if we can count up the number of each kind of vowel next:

vowels :: "aeiou".
[string vowel-counts-into: counts]
  [string: String, counts: &Map[String, UnsignedInteger] -> ø |
    vowels do> [vowel | counts[vowel] = 0].
    string
      where> [character | vowels includes: character]
         do> [character | counts[character] += 1]].

counts := {&Map[String, UnsignedInteger] | new} -- free.
'qwertuiop' vowel-counts-into: counts.

// {"a": 0, "e": 1, "u": 1, "i": 1, "o": 1}

Reviewing this second method I do not like the way we create a map on the heap and I do not like the deferred syntax for freeing it.

Starting with the 'create something on the heap' - the simplicity of Smalltalk-80 comes to mind immediately. I can write Dictionary new and I'm done. Of course, there's no typing so there's no way to put it on the stack.

So let's say that a reference is a known stack size. Sending #new to a type that is a reference-type is an allowable thing. Let's also change the syntax for deferred evaluation to something more ideomatic:

counts := Map[String, UnsignedInteger] new.
  ...[counts free].

'qwertuiop' vowel-counts-into: counts.

The return type of new is always going to be a reference. It might be a very simple method. Let's try and implement it:

[type new]
  [type: Class -> return: &type |
    return {&type | offset: __context allocator allocate: type}]

That looks suspiciously easy. It also incorporates the use of the special context variable to get the allocator. Failing to allocate in this way would result in a panic, so let's also make a version that can fail gracefully:

[type new]
  [type: Class -> return: &type / Failure |
    return {&type | offset: __context allocator allocate: type}]

Whatever failure allocate: might be able to provide, such as OutOfMemory, will now be gracefully passed back to the caller. The safest way to use an allocation is with then:. I don't want to promote endless indenting as a norm in the language, so a closure call that also does the free for you is not something the language should have in its base.

We can take this a step further and specialise the type's type to something that can be initialized (and released):

Initializable :: {Class|}.

[object init] [Initializable -> ø| #required].
[object free] [Initializable -> ø| #required].

[type new]
  [type: Initializable -> &type / Failure | 
    return {&type |
      offset: __context allocator allocate: type,
      then: [init]}]

Here we utilise the ability to not call init if there's a failure; but otherwise call init. Any class type we create that specialises off of Initializable will now have init called when you send new.

Note we cannot send free to it if it's a failure; nor should we be trying to. But that does make the deferred evaluation a little more awkward. We can ensure we have the object we want by assigning using an else: call so the failure goes in to the block and the real value goes in to the temporary. This would be best practice:

[do-the-thing] [ -> return: Failure |
  counts := Map[String, UnsignedInteger] new
    else: [failure | return failure].
       ...[counts free].

  // do stuff
  
  return OK].

Or if things absolutely cannot be allowed to fail we can call the regular new and let it go boom with a panic:

counts := Map[String, UnsignedInteger] new.

// do stuff

...[counts free].

The problem I have with these solutions is using the type information at runtime. It's better if it remains inside the compile time sections of the program. How that looks is a little awkward though:

counts := {&Map[String, UnsignedInteger] | new}

or:

counts: &Map[String, UnsignedInteger] = {new}

or:

counts: &Map[String, UnsignedInteger].
counts new

Let's rewind a second and take a look at what Map really is. It is a pointer to a list that is managed using hash offsets. That means its members are basically a List of associations (keys to values).

A List is made up of a memory slice, a memory allocator, and the number of items in the list (as opposed to the capacity, which is the length of the slice). That's three integers basically. Or three registers. That's tiny. Map should therefore allocate itself on the stack with the intention that it'll be copied.

In which case we don't need to allocate our counts as a reference. Sure, it gets passed as a reference to vowel-counts-into: but that doesn't affect what we're creating. There is no reason for us to put it on the heap. We can even return it as-is.

counts: Map[String, UnsignedInteger].
'qwertyuiop' vowel-counts-into: counts.

So while we could recreate the whole Smalltalk ideom of sending new to a class to create an instance on the heap, we don't necessarily need to or even want to. It's better, more functional, to pass around copies that may just happen to be shared as copy-on-write references. So long as methods correctly mark themselves as wanting a reference, which vowel-counts-into: does, then the system will work.

The only gotcha here is if we need to grow the list inside of the map then we need an allocator. So long as the object is initialized with empty values then the allocator can be detected as non-existent; in which case it should use the __context allocator to update the map. That would be default behaviour for a List.

That does tell us we need a way to tell the compiler to create an object on the stack (or heap) without sanitizing its memory; let it be random bytes because we're intending to overwrite it ourselves.

We have & to indicate it's a reference; we might want to include a special syntax just to indicate there should be no initialization. Or we could use a meta selector like #required and #deleted used in method specialisation.

buffer: !Array[Byte, 1024]

or:

buffer: #uninitialized Array[Byte, 1024]

I guess we'll leave that unanswered for now. We'll have to come back to it and make a decision eventually though. It's important for certain performance oriented code.