string builder

string builder
Photo by Taha Mazandarani / Unsplash

We twisted our arm a little bit to make stuff in a second allocator and then copy the result back in to the first allocator. It'd be better if it were allocated in the first allocator to begin with and we didn't have any temporary stuff at all.

One way we can achieve that is with a bespoke object used to build strings. We're going to go through the exercise of designing that - and then may be never use it.

In some languages they have what's known as a 'string builder' which you give it data and it will append it to a string buffer and at the end you can ask for the string buffer. Smalltalk-80 allows you to create a 'writable stream' on top of a string buffer and it will do the same, except in a more generic way since you can build a WriteStream on top of any kind of collection that can grow.

builder: StringBuilder.
builder append: 'Hello'.
builder space.
builder append: 'World'.
hello-world := builder string.

Compare that with the Smalltalk-80-ish approach:

stream: (WriteStream of: String).
stream append: 'Hello'.
stream space.
stream append: 'World'.
hello-world := stream contents.

The builder here is more generic and therefore more useful. This is a genuinely good solution to a lot of problems. But we already have template strings in STZ, so the code could look like this:

hello := 'Hello'.
world := 'World'.
hello-world := `~{hello} ~{world}`.

The main advantage here is that since both hello and world are constants the whole thing can be resolved at compile time. But let's assume we cannot do that - what could would it write for you? well, basically the stream code above.

The only gotcha here is how allocation happens with the WriteStream. A List has a capacity and a length where the capacity can be ≥ length. It allocates more than it needs to avoid lots of little allocations constantly.

Should a WriteStream not basically be a List and if so what about the wasted allocation? Well, the first question is easy to answer - no, an output part of IO can have specialised methods for its underlying buffer such as String to allow methods like space to work and make sense. List should not do that.

The second part is an artifact of our modern world. The allocation is not necessarily a waste given memory allocations will be aligned. The algorithm for deciding by how much to grow, on the other hand, is something we should be able to configure.

One neat trick we haven't explored yet is using a generic interface to configure another class without it taking up any space. Let's try and do it two ways:

WriteStream :: {
  #specialise: [] [
    buffer-class := List of: Any.
    growth-strategy := DefaultGrowthStrategy],
  #specialise: [of: buffer-class] [Any -> ø |
    growth-strategy := DefaultGrowthStrategy],
  #specialise: [of: buffer-class growth-strategy: growth-strategy] [],

  buffer: buffer-class}

We can add the growth strategy as part of an ever growing set of parameters we can use to configure a new WriteStream. This isn't ideal as the keywords will get lengthy and wordy and hard to follow. Instead we can pass in a Configuration parameter.

WriteStreamConfig :: {
  growth-strategy: Integer, Integer -> Integer}.

DefaultWriteStreamConfig :: {growth-strategy: DefaultGrowthStrategy}.

DefaultGrowthStrategy ::
 [current: Integer, overflow: Integer -> allocate: Integer |
  required := current + overflow.
  target := current * 2.
  required ≥ target
    then: [allocate current * 2]
    else: [DefaultGrowthStrategy evaluate: {target, required - target}].

StrictGrowthStrategy ::
  [current: Integer, overflow: Integer -> Integer | current + overflow].

WriteStream :: {
  #specialise: [] [
    buffer-class := List of: Any.
    config := DefaultWriteStreamConfig],
  #specialise: [of: buffer-class] [Any -> ø |
    config := DefaultWriteStreamConfig],
  #specialise: [of: buffer-class config: config] [],

  buffer: buffer-class}.

[main] [ -> ø |
  stream: (WriteStream of: String config: {growth-strategy: StrictGrowthStrategy})].

Adding new kinds of configuration to the WriteStream is as simple as adding a new key to WriteStreamConfig. Note that this isn't limited to specific kinds of data - we're instantiating a WriteStreamConfig at compile time using the interpreter and feeding that information in to a specialised version of WriteStream for it to build its methods, again, at compile time.

For a string write stream we might always prefer a strict growth strategy. I'm not sure that matters all too much. But let's say it did because of embedded device programming. We can alter our specialisation ever so slightly by adding a special case:

WriteStream :: {
  // all the stuff before but also:
  #specialise: [of: buffer-class] [String -> ø |
    config := {
      ...DefaultWriteStreamConfig,
      growth-strategy: StrictGrowthStrategy}]

  buffer: buffer-class}.

Now if we're building a WriteStream of: String we will get the strict growth strategy by default.

What if we added a new kind of container, such as MyCustomStringThing and we wanted custom WriteStream specialisation for that too?

Aha! we have discovered a design flaw in the language. You can add new methods that specialise to MyCustomStringThing very easily. But now specialised variants of existing classes.

To fix this there's four things we could do. 1- allow "extensions" to the definition of WriteStream. 2- allow "monkey patching" of WriteStream. 3- move class specialisation methods out of the class definition. 4- say no and require a subtype of WriteStream to implement your specialisation.

Right off the bat I have to say I'm not a huge fan of option 4. Adding more types to the system makes it harder to learn the system and maintain it going forward. Do you need to use WriteStream or MyCompanyWriteStream for this particular buffer class?

Option 1 is also not something I'm keen on. If you want to extend the definition of a class then I would say that is a subtype and you are back to Option 4 where you need to ask if this even makes sense to do for something like WriteStream.

Option 2. I've been a programming language implementor for quite a while. Monkey patching is useful but also an absolute nightmare when it comes to upgrading from an older version of the language to a newer one. You have to check every single monkey patch you've done against changes in the language. It is utter hell.

....which leaves us with only Option 3. And so we will come back to this in a future post (possibly the next) and see if we can't make something sensible of it without reinventing the Smalltalk-80 Metaclass design.