Contents

Class RunArray


Inheritance:

Object  »
  Collection  »
    SequenceableCollection  »
      ArrayedCollection  »
        RunArray

A RunArray provides space-efficient storage of data which tends to be constant over long runs of the possible indices. Essentially repeated values are stored singly and then associated with a "run" length that denotes the number of consecutive occurrences of the value.

This class was part of Smalltalk-80. It worked on computers with 16 MHz clock frequency and it is certainly one of the best pieces of code ever written. It is also one of the most used ones: For more than twenty years, instances of RunArray did the hard work behind the scene each time a piece of text was edited or displayed. Even today, more than twenty years after its first publication the definition of RunArray is still admirably perfect.

A RunArray is used to store display properties of a piece of Text in a compact manner. The key idea is to create a new record only when a display property changes. The record of display properties that is associated with the character at position i is also valid for all following characters until a character with a different properties record is encountered.

A run is a maximal sequence of characters that have identical display properties.

A RunArray stores the lengths of the runs, not their start and end positions in the string. The positions are computed each time they are needed. The decision to store run lengths was made to support quick editing: The insertion of a single character changes the length of only one run, but it changes the positions of all the following runs in the text. The chosen representation favors quick editing at the cost of displaying, which is slightly slowed down by the necessity to compute run positions.

It goes almost without saying that RunArray is able to automatically merge two runs into one if they become neighbors after a delete operation or a style change.

Instance variables:

Instance Protocol:

All enumeration methods of RunArray that use the method do: evaluate the block once for each element in the RunArray, not once for each run! This is an important difference. Look at this example:

  | runArray cnt result |
   runArray := RunArray new.
   runArray addLast: #hello times: 5000.
   cnt := 0.
   result := runArray allSatisfy:
        [:element | cnt := cnt + 1.
            element = #hello
        ].
  (Array with: result with: cnt) inspect

There is a method runsAndValuesDo: that can be used to execute a block once per run. Look at this code:

    | runArray cnt result |
   runArray := RunArray new.
   runArray addLast: #hello times: 5000.
   cnt := 0.
   result := true.
     runArray runsAndValuesDo:
        [:runLength :value | cnt := cnt + 1.
            result := result & (value = #hello)
        ].
  (Array with: result with: cnt) inspect

It is tempting to redefine some enumeration methods in RunArray. As an example, here is a local redefinition for allSatisfy:

allSatisfy: aBlock

   "Evaluate aBlock with the elements of the receiver.
    If aBlock returns false for any element return false.
    Otherwise return true."

    values do: [:each | (aBlock value: each) ifFalse: [^ false]].
   ^ true 

Note that for blocks with side effects (like the ones in the examples above) this method will exhibit a behavior that differs from the behavior of the inherited method. Note also that is is not possible to simply redefine the method do: in RunArray since this would change the behavior of methods collect:, select: and reject:.


Implementation Notes:

The values of the cache variables are set each time an element is accessed. The value ranges of these variables are:

 lastIndex between: 1 and: self size
 lastRun between: 1 and: runs size
 lastOffset between: 0 and: (runs at: lastRun) - 1

The following expression holds after execution of access method at:

   | lengthOfPreviousRuns |
 lengthOfPreviousRuns
   := (1 to: lastRun - 1)
        inject: 0
        into: [:sum :runIdx | sum + (runs at: runIdx)].

 lastIndex = (lastOffset + lengthOfPreviousRuns + 1)

This is the postcondition of at:

The instance method
   at: idx setRunOffsetAndValue: aBlock
is used to map an index idx into a run. The method parameter aBlock requires three block arguments:
   [:run :offset :value | <instructions>]
where


Contents