TL;DR
A general method for incremental recomputation of a pure function.
If you’ve already computed $f(x)$, this is a method for efficiently computing $f(x')$, where $x'$ makes only a small change to $x$.
Background & Motivation
A significant portion of software developed today follows what we will refer to as the isolated state paradigm, which recommends that all of an application’s state be consolidated and managed in exactly one location, and that all other aspects of the program be expressed as pure functions of that state.
Perhaps the most notable example of this is the Elm language and frontend framework, which popularized a style of front-end development (namely, the Elm architecture, or TEA) abiding by the isolated state paradigm. More generally, any model/view/controller framework places all application state into the model and therefore abides by isolated state. Event-log-based frameworks also abide by isolated state, as they dictate that all application state should be derivable from the event log.
One issue that isolated-state frameworks inevitably run into is performance. Since almost all aspects of the application are expressed as pure functions of the application state, any update to the application state entails a recomputation of all other aspects of the application. In small applications this is an implementation detail, but in large applications this can be a crippling punch to performance.
There are a number of existing solutions to this problem, but all of them are some combination of only being a partial solution, introducing new overhead, or requiring departure from the isolated state paradigm.
There is no doubt that these strategies genuinely do help performance (and are crucial for the well-being of hundreds of applications in production today). Nonetheless, as of yet no holy grail has been presented: a mitigation for perf issues which retains isolated state and does not simply ‘kick the can down the road’ as in the case of the Elm solutions. By this I mean that each strategy used by Elm to improve performance does so by moving where computation has to occur, but without necessarily reducing the asymptotic complexity of what needs to be computed. (To be specific: by caching a function we stop having to compute that function but instead have to compute equality on its input type; by diffing we don’t have to re-paint the application but instead have to diff UI state.)
The trouble with this is that, in principle, it shouldn’t be necessary.
Say I write, for instance, a TODO webapp. Among other things, the webapp displays a line item for each TODO, and these line items can be marked as complete, deleted, or edited. If the user marks a single line item as ‘complete’, this induces a very small amount of work that, in principle, needs to occur: a single bit needs to be flipped within the application state, and within the UI only the one line item needs to be re-rendered (all other parts of the UI may be totally ignored).
Unfortunately, current solutions do not match up to this principle. In the worst case, the application state changing induces a complete recomputation from scratch of all other parts of the application. Usually it’s not that bad: Elm, for instance, will update the application state, re-compute the UI description (possibly pruning some parts of the computation via memoization), diff the result with the old UI description, and apply the resulting patch to the screen. However, there is still a large amount of excess-in-principle work being done: we’re either recomputing the entire UI description (when only a small part changed), or via memoization only recomputing part of it but having to compare possibly-large structures for equality; afterwards, we have to diff the entire new UI description with the old UI description in order to generate the screen patch.
React and Vue avoid full-application recomputation by splitting an application into small parts called ‘components’, each essentially a miniature application embedded in its parent component. Each component manages only a small part of the application, and when the state of a single component changes, only the computations relevant to that component are performed again; other components may recompute but won’t necessarily.
As such, we avoid full-app recomputations on state changes. However, the state controlled by a React or Vue component is genuinely owned by it; that is, this strategy fragments state: instead of state being stored in one single location, each component owns a bit of it. The isolated state paradigm is no longer being followed.
The Elm frontend framework offers a kind of laziness to improve performance. This is nothing more than veiled memoization of pure functions and (presumably) comes with the same flaw as memoization always does: in memoization of a function $f$, when computing $f(x)$ and then $f(y)$ we may successfully avoid actually recomputing $f$ (if $y = x$), but we also gain the need to test whether $y = x$ first. Hence, we only manage at best to reduce the cost from that of computing $f$ to that of testing equality on its input type, which in the case of an isolated-state application is $O(|\text{state}|)$.
Elm has another trick up its sleeve, which is diffing. The output of an Elm application is a description of a web UI. When the UI is rendered to the screen, instead of starting from scratch Elm will mutatively modify whatever happens to be on the screen already. In most cases this means that only a small update needs to be made.
This sounds good. Actually, it is, except for the following. In order for Elm to perform this, it still has to compute the entirety of the output of the Elm application, which it then diffs with the output from the previous render in order to figure out what on the screen needs to be modified. In other words, while it’s good that we avoid re-painting the entire application from scratch each time, we’ve still only reduced the runtime to $O(|\text{UI}|)$.
Not to mention micro-optimizations. For instance, the Shpadoinkle framework writes its core datatypes Church-encoded in the name of performance.
To make matters worse, I believe it is the case that UI description computation is upward-closed even in the presence of memoization: if you have to recompute some part of the description, you also have to recompute everything above it (in the UI tree). This means that if a change is small but deep, it still may entail a large amount of work being re-performed.
Which is—in principle—so silly! We know what part of the state is changing: a single TODO item is going from ‘incomplete’ to ‘complete’. And we know how our state relates to our UI, meaning that we should be able to tell ahead-of-time what part of the UI needs to be recomputed and what part doesn’t. We ought to be able to respond directly to the TODO item being checked off as follows: make a small change to the state, and then jump directly to its related UI element and re-render, without going through intermediate processes.
And we do have a way to do that already (see React components), but—as far as I know—no solution to this is known that preserves the isolated state paradigm.
This is what we present here: a way to respond to small state changes with equally-small computations while retaining the ability to live in an isolated state paradigm. Our solution is highly general and may actually be better phrased as follows: it gives a way to respond to small changes in a function’s input with small changes in the function’s output. (When applied to isolated-state applications we recover the description from before.) Our solution is not perfect—if applied to the web we’d maybe still have to perform DOM diffing—but is still quite good.
So... let’s see it.
Delta Types
If $X$ is a type, a delta type for $X$ is a type $\Delta X$ equipped with an operator $\mathrm{nudge} : \Delta X \to (X \to X)$.
A delta type is a model for a subset of the endofunctions over $X$.
If $X$ is a type, a complete delta type for $X$ is a delta type $\Delta X$ plus an operator $(\diamond) : \Delta X \times \Delta X \to \Delta X$ and an element $\varepsilon : \Delta X$ such that $(\Delta X, \diamond, \varepsilon)$ forms a monoid and $\mathrm{nudge}$ is a monoid homomorphism $(\Delta X, \diamond, \varepsilon) \to (X \to X, \circ, \mathrm{id})$.
Additionally, we require that $\mathrm{nudge}$ is surjective.
A complete delta type is a model for the entire set of endofunctions over $X$, plus composition and identity.
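As a concrete sketch (the names here are hypothetical, not from the definitions above): a delta type for Int that models just the endofunctions “add $n$” and “set to $n$”:

-- A delta type for Int: 'nudgeInt' interprets each delta as an endofunction.
data DeltaInt
  = AddInt Int  -- interpreted as (+ n)
  | SetInt Int  -- interpreted as (const n)
  deriving (Show)

nudgeInt :: DeltaInt -> Int -> Int
nudgeInt (AddInt n) x = x + n
nudgeInt (SetInt n) _ = n

-- A monoid structure matching composition of endofunctions:
-- nudgeInt (d2 `composeInt` d1) == nudgeInt d2 . nudgeInt d1
composeInt :: DeltaInt -> DeltaInt -> DeltaInt
composeInt (AddInt m) (AddInt n) = AddInt (m + n)
composeInt (AddInt m) (SetInt n) = SetInt (n + m)
composeInt (SetInt m) _          = SetInt m

-- The identity element is AddInt 0. Note this delta type is NOT complete:
-- 'nudgeInt' is not surjective onto Int -> Int (e.g. negation is missing).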
Ambiance
An ambiance is a selection of delta types for some (or all) types. In particular, it’s a mapping
$$\mathcal{A} : X \mapsto \Delta_{\mathcal{A}} X$$
where for each type $X$ we have that $\Delta_{\mathcal{A}} X$ is either a delta type or a complete delta type for $X$.
When $\mathcal{A}$ is clear from context, then for $\Delta_{\mathcal{A}} X$ we write just $\Delta X$.
Derivatives
Let $X, Y$ be types and $\Delta X, \Delta Y$ delta types; let $f : X \to Y$ be a function. Then a derivative for $f$ is a function
$$\partial f : \Delta X \to \Delta Y$$
such that
$$f(\mathrm{nudge}(\delta x)(x)) = \mathrm{nudge}(\partial f(\delta x))(f(x)) \quad \text{for all } x : X,\ \delta x : \Delta X.$$
If we think of $\delta x$ as representing a small change to $x$, then the derivative condition asks for $\partial f$ to compute the corresponding change to $f(x)$.
For a fixed $f$, there may be more than one derivative for $f$. However, the differences between these derivatives will be of no interest to us, so we will refer to “the” derivative.
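For instance (a sketch, reusing the hypothetical DeltaInt from above): doubling admits a derivative that needs only the delta, never the current value:

double :: Int -> Int
double x = 2 * x

-- Derivative condition: double (nudgeInt dx x) == nudgeInt (dDouble dx) (double x)
dDouble :: DeltaInt -> DeltaInt
dDouble (AddInt n) = AddInt (2 * n)  -- 2 * (x + n) == 2 * x + 2 * n
dDouble (SetInt n) = SetInt (2 * n)  -- the new value is known outright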
Weak Derivatives
In many cases, it is impossible to write a plain derivative for a function, but possible to create a weakened form of derivative called a weak derivative.
Def (weak derivative) If $X, Y$ are types and $\Delta X, \Delta Y$ are delta types and $f : X \to Y$ is a function, a weak derivative of $f$ is a function
$$\partial f : X \to \Delta X \to \Delta Y$$
such that
$$f(\mathrm{nudge}(\delta x)(x)) = \mathrm{nudge}(\partial f(x)(\delta x))(f(x)) \quad \text{for all } x : X,\ \delta x : \Delta X.$$
That is, a weak derivative is additionally given access to the current value of the input.
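For instance (a sketch with the same hypothetical DeltaInt): squaring has no plain derivative for AddInt deltas, since the change in $x^2$ depends on $x$, but it has a straightforward weak derivative:

square :: Int -> Int
square x = x * x

-- Weak derivative condition:
--   square (nudgeInt dx x) == nudgeInt (dSquare x dx) (square x)
dSquare :: Int -> DeltaInt -> DeltaInt
dSquare x (AddInt n) = AddInt (2 * x * n + n * n)  -- (x + n)^2 == x^2 + 2xn + n^2
dSquare _ (SetInt n) = SetInt (n * n)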
Automatic Differentiation
This section presents ways to automatically compute derivatives.
A derivative for a composition is a composition of the derivatives.
Say we have $f : X \to Y$ and $g : Y \to Z$, differentiable and composable; then
$$\partial(g \circ f) = \partial g \circ \partial f$$
is a derivative for $g \circ f$.
Proof:
$$\mathrm{nudge}((\partial g \circ \partial f)(\delta x))((g \circ f)(x)) = \mathrm{nudge}(\partial g(\partial f(\delta x)))(g(f(x))) = g(\mathrm{nudge}(\partial f(\delta x))(f(x))) = g(f(\mathrm{nudge}(\delta x)(x))) = (g \circ f)(\mathrm{nudge}(\delta x)(x)).$$
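In code, the chain rule is literally function composition of the (plain) derivatives. For instance, with the hypothetical double/dDouble from earlier:

quadruple :: Int -> Int
quadruple = double . double

-- By the chain rule, a derivative for 'quadruple' is the composite of the
-- derivatives of its factors:
dQuadruple :: DeltaInt -> DeltaInt
dQuadruple = dDouble . dDouble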
A derivative for nonlinear compositions
Consider
$$h : X \to Z, \qquad h(x) = g(f_1(x), f_2(x)), \qquad f_1 : X \to Y_1,\ f_2 : X \to Y_2,\ g : Y_1 \times Y_2 \to Z.$$
We yearn for a derivative for $h$. Assume we are working in an ambiance admitting products, i.e. one where $\Delta(Y_1 \times Y_2) = \Delta Y_1 \times \Delta Y_2$. Then a derivative for $h$ is given by
$$\partial h(\delta x) = \partial g(\partial f_1(\delta x), \partial f_2(\delta x))$$
as is proven by
$$\mathrm{nudge}(\partial h(\delta x))(h(x)) = \mathrm{nudge}(\partial g(\partial f_1(\delta x), \partial f_2(\delta x)))(g(f_1(x), f_2(x))) = g(f_1(\mathrm{nudge}(\delta x)(x)), f_2(\mathrm{nudge}(\delta x)(x))) = h(\mathrm{nudge}(\delta x)(x)).$$
In the special case of $f_1 = f_2 = \mathrm{id}_X$ this recovers that
$$\partial(x \mapsto g(x, x))(\delta x) = \partial g(\delta x, \delta x).$$
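A sketch of this combinator in code, with derivatives represented as plain functions between delta types (all names hypothetical):

-- Given derivatives for g, f1, and f2, build a derivative for
-- h x = g (f1 x, f2 x), using the product delta Δ(y1, y2) = (Δy1, Δy2).
dNonlinear ::
     ((dy1, dy2) -> dz)  -- derivative of g
  -> (dx -> dy1)         -- derivative of f1
  -> (dx -> dy2)         -- derivative of f2
  -> (dx -> dz)          -- derivative of h
dNonlinear dG dF1 dF2 dx = dG (dF1 dx, dF2 dx)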
(old) Delta Types and Derivatives for ADTs
We present delta types for $0$ (the empty type), $1$ (the singleton type), products of types, and sums of types. We also present derivatives for functions between these types.
These example delta types are presented as an operator $\Delta : \mathcal{T} \to \mathcal{T}$, where $\mathcal{T}$ is the collection of types generated by $0$, $1$, sums, and products.
First, the delta types. For the base types we set $\Delta 0 = 1$ and $\Delta 1 = 1$: the only endofunction on each is the identity, so a trivial delta suffices. For sums we set $\Delta(X + Y) = \Delta X \times \Delta Y$, where the first component acts on the left branch and the second on the right. The delta type for products is given by
$$\Delta(X \times Y) = \Delta X \times \Delta Y, \qquad \mathrm{nudge}(\delta x, \delta y)(x, y) = (\mathrm{nudge}(\delta x)(x), \mathrm{nudge}(\delta y)(y)).$$
This type is, in general, not full, because it’s impossible to express, for instance, the endofunction $\mathrm{swap} : (x, y) \mapsto (y, x)$ (taking $Y = X$). Generally speaking, this delta type can only represent deltas that act independently on the $X$ and $Y$ parts.
We could make this delta type full by using something like $\Delta(X \times Y) = \mathrm{End}(X \times Y)$; however, we would lose the ability to differentiate projections (more on this later).
Now we present derivatives
The derivative of the identity $\mathrm{id}_X : X \to X$ is given by $\partial\,\mathrm{id}_X = \mathrm{id}_{\Delta X}$.
The derivative of the inclusion
$$\iota_1 : X \to X + Y$$
is given by
$$\partial \iota_1(\delta x) = (\delta x, \varepsilon_Y)$$
where $\varepsilon_Y : \Delta Y$ denotes a noop delta for $Y$ (assuming $\Delta Y$ has one).
The derivative of the projection
$$\pi_1 : X \times Y \to X$$
is given by
$$\partial \pi_1(\delta x, \delta y) = \delta x.$$
This seemingly-natural derivative is only possible because the delta type construction for products is not full.
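A sketch of these two derivatives as plain functions between delta types; the noop delta for the untouched branch is modeled here with an assumed Monoid instance on deltas:

-- Derivative of the first projection, under Δ(X × Y) = ΔX × ΔY:
-- a plain derivative; no access to the current value is needed.
dFst :: (dx, dy) -> dx
dFst (dx, _) = dx

-- Derivative of the left inclusion X -> X + Y, under Δ(X + Y) = ΔX × ΔY:
-- pair the incoming delta with a noop delta on the Y side (modeled as
-- 'mempty' of an assumed Monoid of deltas).
dLeft :: Monoid dy => dx -> (dx, dy)
dLeft dx = (dx, mempty)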
Recall that we defined $\Delta(X \times Y) = \Delta X \times \Delta Y$; we noted that it is not full but could be made full by instead defining it as $\Delta(X \times Y) = \mathrm{End}(X \times Y)$.
The trouble, as foreshadowed, is that this definition would not admit a derivative for $\pi_1$. Doing so would require a way to take an $\mathrm{End}(X \times Y)$ delta and measure the change in $X$, producing a delta in $\Delta X$. If we assume $\Delta X$ is full, then this weakens to asking for a function $\mathrm{End}(X \times Y) \to \mathrm{End}(X)$.
There is no way to do this! There is no way to turn an $\mathrm{End}(X \times Y)$ into an $\mathrm{End}(X)$. (Originally I tried explaining this in words, but I genuinely think it would be more effective to encourage the reader to try to prove me wrong and find out for themselves why it’s impossible.)
Concretely, we may consider the endofunction $\mathrm{swap} : (x, y) \mapsto (y, x)$ (taking $Y = X$). One cannot measure the “change in $x$” here as an $\mathrm{End}(X)$. We require the information of what $y$ is in order to say what happens to $x$.
(old) Notes on Extensibility
A standing question is that of extensibility. Say we give delta types language support. It seems like it would be mistaken to fix for each type a static delta type, as what delta type is appropriate may change by context.
For instance, say we fix for the type List a the following delta type. If Delta a is the delta type for a, then we define Delta (List a) to be a type with three kinds of deltas: a noop (identity) delta, an “indexed modification” delta, and a “splicing” delta.
Here an “indexed modification” delta consists of an index $i$ and a delta $\delta a$, and acts on List a by applying the $\delta a$-delta to the element at index $i$ of a given list. A “splicing” delta consists of an index $i$ and a length $n$ and a sequence $xs$; it acts on List a by taking a list and replacing its $i$th through $(i + n - 1)$th elements with the list $xs$.
The details of this delta type are not really significant; the point is that we are choosing some operations to bless as having representations as deltas (any non-blessed operation will be represented as a plain List a -> List a). The operations we choose to bless here are identity, indexed modification, and splicing. This choice is reasonable because a large number of list operations can be expressed as a composition of these primitives. As just one example, append and prepend are special cases of splicing.
Now when writing derivatives for functions out of List a, we are able to treat indexed modification, splicing, and compositions thereof specially. This means, for instance, that for a render function where each TODO item is rendered to exactly one web component, the derivative can be extremely efficient in the case that a TODO item gets marked as ‘complete’. Such an action would produce an “indexed modification” delta, which the derivative could recognize and use to update the result incrementally.
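As a sketch (type, constructor, and function names are all hypothetical), the blessed operations could be represented like so:

-- A delta type for lists blessing identity, indexed modification, and
-- splicing; anything else falls back to an opaque endofunction.
data DeltaList da a
  = NoopList
  | ModifyAt Int da          -- apply an element delta at an index
  | Splice Int Int [a]       -- replace elements [i .. i + n - 1] with a list
  | OpaqueList ([a] -> [a])  -- fallback: arbitrary update, carries no structure

nudgeList :: (da -> a -> a) -> DeltaList da a -> [a] -> [a]
nudgeList _      NoopList        xs = xs
nudgeList nudgeA (ModifyAt i da) xs =
  [ if j == i then nudgeA da x else x | (j, x) <- zip [0 ..] xs ]
nudgeList _      (Splice i n ys) xs = take i xs ++ ys ++ drop (i + n) xs
nudgeList _      (OpaqueList f)  xs = f xs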
Fantastic! But maybe not future-proof.
One thing we might want to do is sort a TODO list, perhaps by date, or perhaps by some notion of priority. To add this feature to our app we could will into existence a sort-order type and change our render function to accept it as an additional argument; that is, we could add the sort order to our model. But for the sake of example let’s say we decided to do sorting by keeping the model as just a list of TODOs but choosing to order that list by the desired display order.
The trouble with this is that sorting cannot be efficiently represented by our current delta type for List a. A sort is not expressible as a composition of indexed modifications and splices, meaning that any time the application user performs a sort, the model update will be passed along as a plain List a -> List a; this in turn means that the view derivative receives no information about the incoming update and is forced to recompute the view from scratch. That is extra work, because a sort in the model should ideally become an in-place sort in the view.
The obvious solution here is to add a variant to the delta type Delta (List a); i.e., redefine it to include a “sorting” delta alongside the existing variants.
(For clarity, a “sorting” delta is to be interpreted as a finite permutation over the indices of the list. Care will have to be taken regarding the case where the incoming index mapping does not well-represent a permutation.)
With this new variant, we can now express a user “sort” action as a “sorting” delta, and we can modify our derivative to handle the “sorting” delta nicely. And then we’re done!
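Continuing the hypothetical DeltaList sketch from above, the redefinition might look like:

-- Redefinition with the new "sorting" variant.
data DeltaList da a
  = NoopList
  | ModifyAt Int da
  | Splice Int Int [a]
  | Permute (Int -> Int)     -- new: the element at index i moves to index (p i);
                             -- care is needed when p is not a true permutation
  | OpaqueList ([a] -> [a])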
From a high-level viewpoint, this solution looks like the following:

1. The application needs to support some new operation on the state (here, sorting)
2. That operation is not efficiently represented by the current delta type, so the application slows down
3. Solution: modify the delta type and view derivative

This solution might be fine for someone who has all the time in the world and control over every part of their codebase, but I reject it as feasible in practice for the following reasons:
1. Poor code locality — The fact that sorting has no good representation as a Delta (List a) is an issue that is relevant only to the callsite which actually performs the sorting, and should not be made a problem for the module in which the delta type is defined. (Imagine a large app with hundreds of callsites, each of which needs its own extensions to the delta type. We don’t want to force the module containing Delta (List a) to have knowledge of all those hundreds of callsites!)
2. As a framework user, I don’t want to have to manually define my derivatives. It should be automatic, or at least semi-automatic.
It’s worth noting that (1) could in theory be remedied by allowing each callsite to use its own delta type for List a. This is a viable solution, but would increase noise in the codebase, as the user would have to create all those delta types (or at least invoke some ‘create’ operation) as well as track which delta types are being used where. It’s not a bad idea, but it’s more complex than having just one delta type per type. Anyway, the question of delta type uniqueness is orthogonal to the following idea.

To resolve (1), I propose that the following be adopted as a language feature. A user may tag a function with the keyword delta; doing so will add an appropriate variant to the delta type definition on the appropriate type. For instance, if we have
[delta]
remove :: forall a. Int -> List a -> List a
remove = {- implementation -}
then a constructor Remove :: Int -> Delta (List a) will be added to the definition of Delta (List a), and its interpretation will be given by remove. This allows the delta type for List a to be extended in a callsite-local way!
The big issue I see with this is how to work it into derivatives. Alongside the [delta] tag we’d need some kind of “remove derivative” or something, which would be used when performing automatic derivative generation in order to actually make use of the new delta type variant. Just not sure how to do this at the moment.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE LambdaCase #-}

import Data.Kind (Type)
import Data.Void (Void)

-- A delta type for 'a': a type of change descriptions, interpreted into
-- endofunctions on 'a' by 'nudge'.
class HasDelta a where
  type Delta a :: Type
  nudge :: Delta a -> a -> a

class HasDelta a => HasFullDelta a where
  -- Witnesses surjection of 'nudge' as a function into 'a -> a'
  -- Law: 'nudge (surj f) a = f a'
  surj :: (a -> a) -> Delta a

-- The empty type: the only endofunction is the (vacuous) identity.
instance HasDelta Void where
  type Delta Void = ()
  nudge _ = id

-- The singleton type: likewise, only the identity.
instance HasDelta () where
  type Delta () = ()
  nudge _ = id

-- Products: component-wise deltas, acting independently on each side.
instance (HasDelta a, HasDelta b) => HasDelta (a, b) where
  type Delta (a, b) = (Delta a, Delta b)
  nudge (da, db) (a, b) = (nudge da a, nudge db b)

-- Sums: a delta for each branch; whichever branch is present gets nudged.
instance (HasDelta a, HasDelta b) => HasDelta (Either a b) where
  type Delta (Either a b) = (Delta a, Delta b)
  nudge (da, db) = \case
    Left a -> Left (nudge da a)
    Right b -> Right (nudge db b)
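As a usage sketch: giving Int a full delta type of plain endofunctions (an assumption for illustration, not part of the instances above), pairs can then be nudged component-wise:

-- A full delta type for Int: deltas are arbitrary endofunctions.
instance HasDelta Int where
  type Delta Int = Int -> Int
  nudge f = f

instance HasFullDelta Int where
  surj = id  -- 'nudge' is the identity on endofunctions, hence surjective

-- Nudging a pair applies each component's delta independently:
example :: (Int, Int)
example = nudge ((+ 1), (* 2)) (3, 4)  -- evaluates to (4, 8)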
--
type Model = { str :: String, num :: Int }
data ΔModel = ΔStr (String -> String) | ΔNum (Int -> Int)
view :: Model -> Html
view { str, num } =
E.div
[]
[ E.p [] [ E.text str ]
, E.p [] [ E.text (show num) ]
]
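-- A hand-written weak derivative for 'view': given the current model and a
-- model delta, produce an Html delta mirroring the structure of 'view'.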
∂view :: Model -> ΔModel -> ΔHtml
∂view model δmodel =
∂E.div
noop
( ∂(\x y -> [ x, y ])
( ∂E.p noop ∂(\x -> [x]) (∂\model -> E.text model.str )(δmodel) )
( ∂E.p noop ∂(\x -> [x]) (∂\model -> E.text (show model.num))(δmodel) )
)
where
noop :: forall d a. Delta d a => d
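-- Sketch: differentiating a projection-like function '\(x /\ y) -> f x'.
-- The 'Both' case (a full delta on the whole pair) is where the current
-- value becomes necessary, as discussed earlier.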
∂(\(x /\ y) -> f x)(δxy) = case δxy of
L δx -> ∂f(δx)
R δy -> noop
Both (d :: End (X /\ Y)) -> Surj (\fx ->
Things noticed when trying it out in code:
When writing a derivative $\partial f : \Delta X \to \Delta Y$, it’s desirable for $\Delta Y$ to:

- Contain a noop representation, in case the change to $x$ had no effect on $f(x)$
It’s easy to get the definition of a derivative wrong. However, each derivative can be verified fully by checking only a single condition. So there’s potential for property tests to grant a very high degree of confidence in our derivatives.
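For instance, using QuickCheck (a sketch; reuses the hypothetical DeltaInt/nudgeInt/double/dDouble names from earlier, whose Show instance was derived above):

import Test.QuickCheck

instance Arbitrary DeltaInt where
  arbitrary = oneof [AddInt <$> arbitrary, SetInt <$> arbitrary]

-- The single condition that fully characterizes a (plain) derivative:
prop_dDouble_is_derivative :: Int -> DeltaInt -> Bool
prop_dDouble_is_derivative x dx =
  double (nudgeInt dx x) == nudgeInt (dDouble dx) (double x)

-- ghci> quickCheck prop_dDouble_is_derivative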
Questions:
Is there a need for derivatives to take both the delta and the current value? Or, when is there a need?
I think it’s very commonly going to be needed. For instance:
Our input delta is removing a file from a folder. We want to produce an output delta for the “total size of folder” statistic. This requires knowing the size of the file removed, which is not a part of the delta (... or is it?).
More generally, our input delta is removing a chunk of the state. To appropriately modify the output, we need to know what the removed chunk looked like, but that information is not part of the delta itself (... or is it???)
Another example is the derivative of the projection $\pi_1 : X \times Y \to X$, taking the full delta type $\mathrm{End}(X \times Y)$ on the product. If our current value is $(x, y)$ and our input delta is $\mathrm{swap}$, then the output delta should be “replace with $y$”, which requires knowing $y$.
Another example, synthesized from a conversation with Samwell.
Consider the function $\mathrm{sum} : \mathbb{R}^N \to \mathbb{R}$, where $N$ is enormous (say, in the trillions), and hence the sum is very slow to recompute.
An input delta representing an addition of $c$ to some coordinate $x_i$ easily corresponds to an output delta; namely, the delta representing an addition of $c$.
However, an input delta representing a scaling of $x_i$ to $c \cdot x_i$ does not easily correspond to any output delta, since we cannot recover $x_i$ from the delta.
However, if the derivative is also given the current value of the input, then it can easily compute the correct output delta as the delta incrementing the result by $(c - 1) \cdot x_i$.
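In code, a sketch of this weak derivative (all names are hypothetical; the vector is a plain list for brevity):

-- Input deltas: pointwise updates to one coordinate of the huge vector.
data DeltaVec
  = AddAt Int Double    -- add a constant to the i-th coordinate
  | ScaleAt Int Double  -- scale the i-th coordinate by a constant

-- Output deltas on the running total: just "add a constant".
newtype DeltaSum = AddToSum Double

-- Weak derivative of 'sum': O(1) per delta, instead of re-summing everything.
-- The ScaleAt case is why the current value is needed: the change to the
-- total is (c - 1) * x_i, which the delta alone does not determine.
dSum :: [Double] -> DeltaVec -> DeltaSum
dSum _  (AddAt _ c)   = AddToSum c
dSum xs (ScaleAt i c) = AddToSum ((c - 1) * (xs !! i))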