IBM / Collaborative User Experience Group

history flow: how it works

The history flow application uses a very simple visualization technique to chart the evolution of a document as it is edited by many people.



  How it Works
Imagine a scenario where three people will make contributions to a Wiki page at different points in time. Each person edits the page and then saves their changes to what becomes the latest version of that page.


diagram
The vertical red line represents the first version of the document. Since Mary created the page, all of its contents are shown in her author color. The length of the line indicates the amount of text Mary has written.


  diagram
Suzanne adds some text to the end of Mary's original entry; note that Suzanne's blue line is appended to the end of Mary's red line indicating that Suzanne's text was added at the end of the page. Suzanne saves her changes and this becomes the latest version of the page.

  diagram
Martin finds the original text too verbose; he deletes some of it and writes his own shorter version between the introductory text and Suzanne's contribution.
 


  diagram
In version 4, Suzanne comes back and makes a small contribution in the middle of what remains of the introductory text.

 

  diagram
history flow connects text that has been kept the same between consecutive versions; in other words, it connects corresponding segments on the lines representing versions. Pieces of text that do not have correspondence in the next (or previous) version are not connected and the user sees a resulting "gap" in the visualization; this happens for deletions and insertions.
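
The connection step can be sketched in code. The Java fragment below is only an illustration, not the application's actual source. It assumes a hypothetical array newToOld, where newToOld[i] holds the index of the matching token in the previous version, or -1 if token i is new, and it groups consecutive matches into bands that a renderer could draw between two version lines. The names FlowBands, Band, and bands are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class FlowBands {
    /** A run of tokens that is identical in both versions. */
    static final class Band {
        final int oldStart;  // first token index in the previous version
        final int newStart;  // first token index in the next version
        final int length;    // number of consecutive matched tokens

        Band(int oldStart, int newStart, int length) {
            this.oldStart = oldStart;
            this.newStart = newStart;
            this.length = length;
        }

        @Override
        public String toString() {
            return "old@" + oldStart + " -> new@" + newStart + " (len " + length + ")";
        }
    }

    /** Groups consecutive matches into bands; unmatched tokens are skipped. */
    static List<Band> bands(int[] newToOld) {
        List<Band> out = new ArrayList<>();
        int i = 0;
        while (i < newToOld.length) {
            if (newToOld[i] < 0) {   // inserted token: leaves a gap
                i++;
                continue;
            }
            int start = i;
            // Extend the run while the old indices stay consecutive.
            while (i + 1 < newToOld.length && newToOld[i + 1] == newToOld[i] + 1) {
                i++;
            }
            out.add(new Band(newToOld[start], start, i - start + 1));
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        // Token 1 is new; token 4 was moved from position 1 of the old version.
        for (Band b : bands(new int[] {0, -1, 2, 3, 1})) {
            System.out.println(b);
        }
    }
}
```

Tokens marked -1 fall outside every band and would appear as insertion gaps; positions in the previous version that no band covers correspond to deletions.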

 

 

 


Visualization Modes

history flow has four main visualization modes that display the contents of the document being analyzed as it changes over different versions. Each of these modes highlights different aspects of authorship and content change over time.

1. Community view: this is the default mode and it shows all contributions from different authors, color-coding the text to indicate the author of each sentence.

diagram

2. Individual author view: this mode highlights the contributions of a single author and it depicts the persistence of these contributions over time.
diagram


3. Recent changes view: highlights the new content in each version of the Wiki page independent of authorship. This view allows us to see which portions of the text have been edited the most over time.

diagram



4. Age view: this view has no colors representing authorship; instead, the focus is on the persistence of different contributions. A grayscale gradient goes from white (brand new contribution) to dark gray (very old contribution).

diagram
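
The gradient used by the age view can be illustrated with a small mapping from age to gray level. The Java sketch below is an assumption-laden example rather than the application's code: it measures age as the number of versions a contribution has survived, interpolates linearly, and picks 64 as the "dark gray" end point. The class AgeShade and the method grayLevel are hypothetical names.

```java
final class AgeShade {
    static final int WHITE = 255;     // brand new contribution
    static final int DARK_GRAY = 64;  // assumed value for the oldest contribution

    /**
     * Maps an age (in versions survived) to a gray level between
     * WHITE and DARK_GRAY using linear interpolation.
     */
    static int grayLevel(int age, int maxAge) {
        if (maxAge <= 0) {
            return WHITE;  // only one version so far: everything is new
        }
        int clamped = Math.max(0, Math.min(age, maxAge));
        return WHITE - (WHITE - DARK_GRAY) * clamped / maxAge;
    }

    public static void main(String[] args) {
        int maxAge = 10;
        for (int age = 0; age <= maxAge; age += 5) {
            System.out.println("age " + age + " -> gray " + grayLevel(age, maxAge));
        }
    }
}
```

A linear ramp is only one possible choice; a nonlinear scale could make differences among recent contributions easier to see.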


Implementation notes
Finding matching sections of two document revisions is a well-studied problem in computer science, with many possible solutions. For our visualization, we've chosen a simple technique from the 1970s. See Heckel, Communications of the ACM, April 1978: "A Technique for Isolating Differences Between Files." The algorithm works by matching up tokens (in our program, "sentences," defined as pieces of text delimited by periods or HTML tags), which gives decent results with fair efficiency. One problem with this approach is that tiny changes, such as the addition of a single comma, will show up as a change to an entire sentence.
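
As a rough illustration of the technique cited above, the Java sketch below follows the main passes of Heckel's algorithm: count token occurrences in both versions, anchor on tokens that occur exactly once in each, then extend those anchors forward and backward. It is a simplified reading of the paper, not the history flow source code. The class and method names are hypothetical, the tokenizer's regular expression is an assumption about how "delimited by periods or HTML tags" might be implemented, and the paper's virtual begin-of-file and end-of-file anchors are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HeckelMatch {
    /** Occurrence counts for one distinct token (a symbol table entry). */
    private static final class Entry {
        int oldCount;
        int newCount;
        int oldIndex = -1;  // position of the last occurrence in the old version
    }

    /** Splits text into "sentences" at periods or HTML tags (assumed rule). */
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        for (String piece : text.split("\\.|<[^>]*>")) {
            String s = piece.trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    /** Returns m where m[i] is the old index matched to new token i, or -1. */
    static int[] match(List<String> oldTokens, List<String> newTokens) {
        // Passes 1 and 2: count occurrences of every token in both versions.
        Map<String, Entry> table = new HashMap<>();
        for (String t : newTokens) {
            table.computeIfAbsent(t, k -> new Entry()).newCount++;
        }
        for (int j = 0; j < oldTokens.size(); j++) {
            Entry e = table.computeIfAbsent(oldTokens.get(j), k -> new Entry());
            e.oldCount++;
            e.oldIndex = j;
        }
        int[] newToOld = new int[newTokens.size()];
        int[] oldToNew = new int[oldTokens.size()];
        Arrays.fill(newToOld, -1);
        Arrays.fill(oldToNew, -1);

        // Pass 3: tokens that occur exactly once in each version are anchors.
        for (int i = 0; i < newTokens.size(); i++) {
            Entry e = table.get(newTokens.get(i));
            if (e.newCount == 1 && e.oldCount == 1) {
                newToOld[i] = e.oldIndex;
                oldToNew[e.oldIndex] = i;
            }
        }
        // Pass 4: extend each match forward over identical unmatched neighbors.
        for (int i = 0; i + 1 < newTokens.size(); i++) {
            int j = newToOld[i];
            if (j >= 0 && j + 1 < oldTokens.size()
                    && newToOld[i + 1] < 0 && oldToNew[j + 1] < 0
                    && newTokens.get(i + 1).equals(oldTokens.get(j + 1))) {
                newToOld[i + 1] = j + 1;
                oldToNew[j + 1] = i + 1;
            }
        }
        // Pass 5: extend each match backward in the same way.
        for (int i = newTokens.size() - 1; i > 0; i--) {
            int j = newToOld[i];
            if (j > 0 && newToOld[i - 1] < 0 && oldToNew[j - 1] < 0
                    && newTokens.get(i - 1).equals(oldTokens.get(j - 1))) {
                newToOld[i - 1] = j - 1;
                oldToNew[j - 1] = i - 1;
            }
        }
        return newToOld;
    }

    public static void main(String[] args) {
        List<String> before = sentences("Hello world. Bye now.");
        List<String> after = sentences("Hello, world. Bye now.");
        // The added comma leaves the whole first sentence unmatched.
        System.out.println(Arrays.toString(match(before, after)));
    }
}
```

The example in main reproduces the limitation described above: because whole sentences are compared for equality, inserting one comma makes the first sentence count as deleted and re-inserted.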

Our application is written in Java 1.4 and runs as a standalone program. Further details will appear in a forthcoming paper and are also available upon request (see our contact page). Also note that if you wish to conduct similar studies, all Wikipedia data is available for download here.


Related work
There are many existing methods for visualizing document revisions. Several popular source control systems include the capability to color-code changed regions in files, and to show a side-by-side comparison of two files, graphically connecting matching sections. Other methods use a thumbnail view of a program, with line-by-line coloring to indicate authorship or age; see for example the work by Eick and others on software visualization. History flow diagrams have some visual similarity to Theme River (tm) and to Inselberg's parallel coordinates, but our method depicts a completely different type of data. As far as we know, the timeline visualization introduced here is new, but please let us know if you're familiar with other work we should cite.




(c) copyright 2003 IBM.