AppJet, Inc., with modifications by the Etherpad Foundation
December 9, 2018
Contents
(2 T e)[ci, C2,C3,...]
where
2 is the length of the document before the change,
2 is the length of the document after the change,
[ci, C2, C3,...] is an array of 2 characters that described the document after the change.
Note that Vq : 0 < i < 2 is either an integer or a character.
Later we will discuss optimizations to changeset representation (using “strips” and other such techniques). The two constraints must apply to any representation of changesets.
Example A = (0 t B = (5 T 11)[0 — 4, “ world^]
We can write the document “hello world” as A - B or just AB. Note that the “initial document” can be made into the changeset (0 T N)[“ < the document text > ”].
When A and B are changesets, we can also refer to (AB) as “the composition” of A and B. Changesets are closed under composition.
For any two changesets A, B such that
A = (n1 T n2)[...]
B = (n2 T n3)[...]
it is clear that there is a third changeset C = (ni T n3)[•••] such that applying C to a document X yields the same resulting document as does applying A and then B. In this case, we write AB = C.
Given the representation from Section 3, it is straightforward to compute the composition of two changesets.
Now we come to realtime document editing. Suppose two different users make two different changes to the same document at the same time. It is impossible to compose these changes. For example, if we have the document X of length n, we may have A = (n T na)[... nacharacters], B = (n T n)[... n^characters] where n = na = n&
It is impossible to compute (XA)B because B can only be applied to a document of length n, and (XA) has length n@. Similarly, A cannot be applied to (XB) because (XB) has length n^.
This is where merging comes in. Merging takes two changesets that apply to the same initial document (and that cannot be composed), and computes a single new changeset that preserves the intent of both changes. The merge of A and B is written as m(A, B). For the Etherpad system to work, we require that m(A, B) = m(B, A).
Aside from what we have said so far about merging, there are many different implementations that will lead to a workable system. We have created one implementation for text that has the following constraints.
When users A and B have the same document X on their screen, and they proceed to make respective changesets A and B, it is no use to compute m(A, B), because m(A, B) applies to document X, but the users are already looking at document XA and XB. What we really want is to compute B, and Az such that
XAB! = XBA = Xm(A, B)
“Following” computes these B, and Az changesets. The definition of the “follow” function / is such that Af (A, B) = Bf (B, A) = m(A, B) = m(B, A). When we compute f (A, B)
Example Suppose we have the initial document X = (0 T 8)[“baseball”] and user A changes it to “basil” with changeset A, and user B changes it to “below” with changeset B.
We have X = (0 t 8)[“baseball”]
A = (8 T 5)[0 - 1, “sz”,7]
B = (8 T 5)[0, “e”,6, “ow”]
First we compute the merge m(A, B) = m(B, A) according to the constraints
m(A,B) = (8 T 6)[0,”e”,”si”,”ow”] = (8 T 6)[0, “eszow”]
Then we need to compute the follows B! = f (A, B) and A' = f (B, A).
Bz = f (A,B) = (5 T 6)[0, “e”,2, 3, “ow”]
Note that the numbers 0, 2, and 3 are indices into A = (8 T 5)[0,1, “si”,7] 0 12 3 4
0 1 s i 7
A' = f (B,A) = (5 t 6)[0,1, ”sz”,3, 4]
We can now double check that AB' = BA' = m(A, B) = (8 T 6)[0, “esiow”].
Now that we have made the mathematical meaning of the preceding pages complete, we can build a client/server system to support realtime editing by multiple users.
There is a server that holds the current state of a document. Clients (users) can connect to the server from their web browsers. The clients and server maintain state and can send messages to one another in real-time, but because we are in a web browser scenario, clients cannot send each other messages directly, and must go through the server always. (This may distinguish from prior art?)
The other critical design feature of the system is that A client must always be able to edit their local copy of the document, so the user is never blocked from typing because of waiting to send or receive data.
At any moment in time, a client maintains its state in the form of 3 changesets. The client document looks like A - X - Y, where
A is the latest server version, the composition of all changesets committed to the server, from this client or from others, that the server has informed this client about. Initially A = (0 T N)[< initial document text >].
X is the composition of all changesets this client has submitted to the server but has not heard back about yet. Initially X = (N T N)[0,1, 2,...,N — 1], in other words, the identity, henceforth denoted In .
Y is the composition of all changesets this client has made but has not yet submitted to the server yet. Initially Y = (N T N)[0,1, 2,...,N — 1].
A client can do 5 things.
As these 5 events happen, the client updates its representation A - X - Y according to the relations that follow. Changes “move left” as time goes by: into Y when the user types, into X when change sets are submitted to the server, and into A when the server acknowledges changesets.
When a user makes an edit E to the document, the client computes the composition (Y - E) and updates its local state, i.e. Y。Y - E. I.e., if Y is the variable holding local unsubmitted changes, it will be assigned the new value (Y - E).
When a client submit its local changes to the server, it transmits a copy of Y and then assigns Y to X, and assigns the identity to Y. I.e.,
This happens every 500ms as long as it receives an acknowledgement. Must receive ACK before submitting again. Note that X is always equal to the identity before the second step occurs, so no information is lost.
When the client hears ACK from server,
A — A - X
X — In
When a client hears about another client's changeset B, it computes a new A, X, and Y, which we will call A', X', and Y, respectively. It also computes a changeset D which is applied to the current text view on the client, V. Because AXY must always equal the current view, AXY = V before the client hears about B, and A'X'Y' = VD after the computation is performed.
The steps are:
In steps 2,3, and 4, / is the follow operation described in Section 8.
Proofthat AXY = V 今 A/X/Y/ = VD. Substituting A/X丫 = (AB)(/(B,X))(/(/(X,B),Y)), we recall that merges are commutative. So for any two changesets P and Q,
m(P, Q) = m(Q, P) = Qf (Q, P) = Pf (P, Q)
Applying this to the relation above, we see
A/X/Y/ = ABf (B,X )f (f (X,B),Y)
=AXf (X,B)f (f (X,B),Y)
=AXYf (Y,f (X,B))
=AXYD
=VD
As claimed.
When a client connects to the server for the first time, it first generates a random unique ID and sends this to the server. The client remembers this ID and sends it with each changeset to the server.
The client receives the latest version of the document from the server, called
HEADTEXT. The client then sets
A — HEADTEXT
And finally, the client displays HEADTEXT on the screen.
Like the client(s), the server has state and performs operations. Operations are only performed in response to messages from clients.
The server maintains a document as an ordered list of revision records. A revision record is a data structure that contains a changeset and authorship information.
RevisionRecord = {
ChangeSet,
Source (unique ID),
Revision Number (consecutive order, starting at 0)
For efficiency, the server may also store a variable called HEADTEXT, which is the composition of all changesets in the list of revision records. This is an optimization, because clearly this can be computed from the set of revision records.
The server does two things in addition to maintaining state representing the set of connected clients and remembering what revision number each client is up to date with:
When a server receives a connection request from a client, it receives the client's unique ID and stores that in the server's set of connected clients. It then sends the client the contents of HEADTEXT, and the corresponding revision number. Finally the server notes that this client is up to date with that revision number.
When the server receives information from a client about the client's changeset C, it does five things:
Additional topics
2. Creates a new changeset C that is relative to the server's most recent revision number, which we call th (H for HEAD). C! can be computed using follows (Section 8). Remember that the server has a series of changesets,
S0 T 2 3 4 S1 T Src T Src + 1 T . . . T SrH
C is relative to Src, but we need to compute C relative to S^. We can compute a new C relative to Src+i by computing /(Src+i, C). Similarly we can repeat for Src+2 and so forth until we have C! represented relative to SrH .
3. Send C 'to all other clients
4. Send ACK back to original client
5. Add C' to the server's list of revision records by creating a new revision record out of this and the client's ID.