Since the data flow diagram (DFD) technique was introduced in the late
1970s,1,2 it has become the main process modeling tool for information systems.
Recent research has shown that DFDs are the most popular tool taught in systems
analysis and design courses: 597 out of 647 schools (92 percent) indicated that
they teach DFDs in that course.3
Although recent object-oriented design methodologies such as the Unified
Method by Booch and Rumbaugh4 may attempt to replace functional modeling,5 DFDs
seem to have certain advantages. Empirical research by Vessey and Conger6 shows
that DFDs are easier to learn and to use, at least by novice users. Similarly,
Agarwal et al. showed that DFDs produce higher-quality solutions in
process-oriented tasks and are not inferior to object-oriented methodologies
even in object-oriented tasks.7
If DFDs are so easy to use, one may ask, where is the problem? Why bother
making DFDs even easier and more flexible? There are several reasons for
adopting the modification proposed in this note. First, since the DFD is a
popular tool, it is easy to justify the effort to improve it. Second, making
DFDs simpler and more flexible may also help reduce the tension between
discipline and creativity in the practice of systems development.8,9 Finally,
by removing the data modeling aspect of DFDs, we can avoid redundancy and
conflict with the popular entity-relationship diagram (ERD) methodology.
The overlap with ERDs
According to Whitten and Bentley,10 process modeling is a "technique for
organizing and documenting the structure and flow of data through a system's
Processes and/or the logic, policies, and procedures to be implemented by a
system's Processes." The problem is that ERDs11 already model data structures.
As shown below, asking DFDs to depict which tables are required by the system
causes duplication of effort, clutter, and inflexibility.
Figure 1 depicts a simple DFD, adapted from Fertuck.12 According to the Gane
and Sarson2 notation used here, the rounded boxes represent processes, such as
"enroll students," which transform incoming data flows, represented by
arrows, into outgoing data flows. An open-ended rectangle represents a data
store, typically a database table such as "students," which stores data for
use at a later time. A plain rectangle represents a terminator or external
entity, such as "student," which is an external source or destination for
information.
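The notation described above can be sketched as a small data structure. This is only an illustration, not part of the Gane and Sarson method itself: the class names, the flow labels, and the check that every flow touches a process (a common DFD convention that the text does not state) are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    PROCESS = "process"            # drawn as a rounded box
    DATA_STORE = "data store"      # drawn as an open-ended rectangle
    EXTERNAL = "external entity"   # drawn as a plain rectangle


@dataclass(frozen=True)
class Node:
    name: str
    kind: Kind


@dataclass(frozen=True)
class Flow:
    source: Node
    target: Node
    label: str  # hypothetical label on the arrow


@dataclass
class Diagram:
    flows: list = field(default_factory=list)

    def connect(self, source, target, label):
        # Assumed convention: every data flow starts or ends at a process.
        if Kind.PROCESS not in (source.kind, target.kind):
            raise ValueError(f"{source.name} -> {target.name} touches no process")
        self.flows.append(Flow(source, target, label))

    def names(self, kind):
        # Collect the distinct node names of one kind that appear in any flow.
        found = set()
        for flow in self.flows:
            for node in (flow.source, flow.target):
                if node.kind is kind:
                    found.add(node.name)
        return sorted(found)


student = Node("student", Kind.EXTERNAL)
enroll = Node("enroll students", Kind.PROCESS)
students = Node("students", Kind.DATA_STORE)

dfd = Diagram()
dfd.connect(student, enroll, "enrollment request")   # illustrative label
dfd.connect(enroll, students, "new student record")  # illustrative label
```

Calling dfd.names(Kind.DATA_STORE) lists the data stores in the diagram, which is the count that grows when every table gets its own symbol.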
Figure 1
This is a rather simple case and it is further simplified by the omission of
other necessary data stores such as "teachers" and "courses." Still, this
approach of assigning data store symbols to every table makes this diagram more
complex and less flexible than it should be.
If the analyst realizes that certain tables should be added to the system,
the change will not be limited to the ERD; this DFD would also have to be
redrawn and would become even busier. Similar changes will have to be made
throughout all levels of the DFD hierarchy. After one or two cycles of such
changes, the analyst will probably be less inclined to use DFDs in future
assignments.
Furthermore, consider adding a third process to this diagram, say "generate
reports." Since this third process may require access to many tables, we
immediately have either data flows crossing one another or data store replicas
cluttering the diagram. Figure 2 demonstrates how the addition of two more data
stores and one more process causes a rapid deterioration in the visual appeal
of the DFD. Things can become much uglier when designing DFDs for more complex
situations.
Figure 2
These limitations are self-imposed: they stem from the insistence on using DFDs
to model not only processes, but also data structures.
Proposed adaptation
The solution to these problems is to let DFDs and ERDs
serve different purposes. Allow ERDs to focus on modeling data, and let DFDs
focus on modeling processes. If we adopt a guideline whereby each data store
can represent a whole database, then we can model the original system (Figure
1) using the DFD depicted in Figure 3.
Figure 3
This small change has reduced the number of data stores from three to one,
and the number of data flows from twelve to eight. As one more example, Figure 4 shows how the DFD in Figure 2 becomes much simpler when we apply the new
guideline.
Figure 4
By comparing the DFD in Figure 4 to the one in Figure 2 we can see that
using a single data store to depict the whole database allows us to add more
processes without resorting to spaghetti data flows or to data store replicas.
We reduced five data stores to one, and 23 data flows to 11. The new process
can easily reach the single data store symbol, since the database is not
surrounded by other data store symbols.
Although the diagram in Figure 4 is simpler and easier to understand, the
most important impact is the isolation from changes in the data model. Adding
or dropping tables would have no impact on the new DFD, unless these changes
also reflect changes in process design. For example, adding a "classroom"
table to the database would require no changes to the DFD in Figure 4. The
single data store symbol encapsulates the database structure and shields the
process model from such changes.
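The encapsulation argument can be checked on a toy example. The sketch below is hypothetical and does not transcribe any of the figures: the flow lists are invented, flows are reduced to unlabeled (source, target) pairs, and collapse_stores is an illustrative helper name. It replaces every table-level data store with one database symbol and merges the flows that become duplicates.

```python
def collapse_stores(flows, stores, database="database"):
    """Redirect every flow that touches a table-level store to one database store."""
    collapsed = set()
    for source, target in flows:
        source = database if source in stores else source
        target = database if target in stores else target
        collapsed.add((source, target))  # the set merges duplicate flows
    return collapsed


# Hypothetical table-per-store diagram, written as (source, target) pairs.
stores = {"students", "enrollments"}
flows = {
    ("student", "enroll students"),
    ("enroll students", "students"),
    ("students", "enroll students"),
    ("enroll students", "enrollments"),
    ("students", "generate reports"),
    ("enrollments", "generate reports"),
}
before = collapse_stores(flows, stores)

# The data model gains a "classroom" table that an existing process reads.
stores_v2 = stores | {"classroom"}
flows_v2 = flows | {("classroom", "generate reports")}
after = collapse_stores(flows_v2, stores_v2)

print(len(flows), len(flows_v2))  # table-per-store diagrams differ in size
print(before == after)            # collapsed diagrams are identical
```

In this toy case the table-per-store diagram grows when the table is added, while the collapsed diagram does not change, because the new flow maps onto one that already connects the database to the reporting process.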
Concluding remarks
At the lowest level of decomposed DFDs, primitive process specifications
(PPSs) identify records and data elements used as input and output to
processes. This is a valid area of overlap between data models and process
models. Should DFDs then model data structures after all? The answer lies in
the integration provided by modern CASE (computer-assisted software
engineering) tools. The same repository used by the ERD tool to maintain
information about tables and record structures can be used by the DFD tool to
specify records and data elements as input, output, and data flow components.
The key then is not the isolation of DFDs from data aspects, but the isolation
of the graphical DFD representation from such issues.
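A minimal sketch of this kind of integration follows, assuming a repository that maps table names to data elements. The repository contents, the ProcessSpec class, and the unresolved helper are all hypothetical and do not describe any particular CASE tool. The process specification refers to records and elements by name, while the graphical DFD never needs to show them.

```python
from dataclasses import dataclass, field

# Hypothetical repository maintained by the ERD tool: table -> data elements.
REPOSITORY = {
    "students": {"student_id", "name"},
    "courses": {"course_id", "title"},
}


@dataclass
class ProcessSpec:
    name: str
    inputs: list = field(default_factory=list)   # (table, element) pairs
    outputs: list = field(default_factory=list)  # (table, element) pairs


def unresolved(spec, repository):
    """Return the (table, element) references that the repository does not define."""
    missing = []
    for table, element in spec.inputs + spec.outputs:
        if element not in repository.get(table, set()):
            missing.append((table, element))
    return missing


spec = ProcessSpec(
    name="enroll students",
    inputs=[("students", "student_id"), ("courses", "course_id")],
    outputs=[("enrollments", "student_id")],  # table not yet in the repository
)
print(unresolved(spec, REPOSITORY))
```

A check of this kind flags a specification that refers to a table or element the data model does not yet contain, without any change to the diagram itself.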
Although object-oriented analysis and design methods are adding useful
techniques to our systems analysis toolbox, we still lack descriptive and
prescriptive research on the application of these tools. Such research can
increase the likelihood of teaching and practicing effective systems analysis
techniques.
For the last five years I have been teaching students the DFD technique
using this adaptation. The feedback has been very positive. I must concede,
however, that while the examples above seem compelling, the benefits of the
proposed adaptation are mere conjecture at this stage. I can only hope that
those who try this technique report that indeed it makes DFDs easier to create,
understand, and maintain.
Accepted for publication September 16, 1998.