The software world of 1972 was struggling with growing pains. The f...
This textbook, "Designing Systems Programs”, focused on methods for...
Here is a flowchart representation of this modularization: 
![](htt...
### KWIC (Key Word In Context) A KWIC ("Key Word In Context") in...
Just listing out modules isn’t enough to start developing the syste...
Here is a flowchart representation of this modularization: 
![](htt...
Hoare's FIND algorithm is a selection algorithm that finds the kth ...
Parnas describes how the circular shift module in the second modula...
Balzer [9], Mealy [10], and Wulf et al. [11] represent significant ...
Imagine you need to call a function in assembly language. Unlike mo...
Programming R. Morris
Techniques Editor
On the Criteria To Be
Used in Decomposing
Systems into Modules
D.L. Parnas
Carnegie-Mellon University
This paper discusses modularization as a mechanism
for improving the flexibility and comprehensibility of a
system while allowing the shortening of its development
time. The effectiveness of a "modularization" is
dependent upon the criteria used in dividing the system
into modules. A system design problem is presented and
both a conventional and unconventional decomposition
are described. It is shown that the unconventional
decompositions have distinct advantages for the goals
outlined. The criteria used in arriving at the decom-
positions are discussed. The unconventional decomposi-
tion, if implemented with the conventional assumption
that a module consists of one or more subroutines, will
be less efficient in most cases. An alternative approach
to implementation which does not have this effect is
sketched.
Key Words and Phrases: software, modules,
modularity, software engineering, KWIC index,
software design
CR Categories: 4.0
Introduction
A lucid statement of the philosophy of modular
programming can be found in a 1970 textbook on the
design of system programs by Gouthier and Pont [1,
¶I0.23], which we quote below: 1
A well-defined segmentation of the project effort ensures
system modularity. Each task forms a separate, distinct program
module. At implementation time each module and its inputs and
outputs are well-defined, there is no confusion in the intended
interface with other system modules. At checkout time the in-
tegrity of the module is tested independently; there are few sche-
duling problems in synchronizing the completion of several tasks
before checkout can begin. Finally, the system is maintained in
modular fashion; system errors and deficiencies can be traced to
specific system modules, thus limiting the scope of detailed error
searching.
Usually nothing is said about the criteria to be used
in dividing the system into modules. This paper will
discuss that issue and, by means of examples, suggest
some criteria which can be used in decomposing a
system into modules.
Copyright @ 1972, Association for Computing Machinery, Inc.
General permission to republish, but not for profit, all or part
of this material is granted, provided that reference is made to this
publication, to its date of issue, and to the fact that reprinting
privileges were granted by permission of the Association for Com-
puting Machinery.
Author's address: Department of Computer Science, Carnegie-
Mellon University, Pittsburgh, PA 15213.
1053
A Brief Status Report
The major advancement in the area of modular
programming has been the development of coding
techniques and assemblers which (l) allow one module
to be written with little knowledge of the code in
another module, and (2) allow modules to be reas-
sembled and replaced without reassembly of the whole
system. This facility is extremely valuable for the
production of large pieces of code, but the systems most
often used as examples of problem systems are highly-
modularized programs and make use of the techniques
mentioned above.
1 Reprinted by permission of Prentice-Hall, Englewood
Cliffs, N.J.
Communications December 1972
of Volume 15
the ACM Number 12
Expected Benefits of Modular Programming
The benefits expected of modular programming are:
(1) managerial--development time should be shortened
because separate groups would work on each module
with little need for communication: (2) product flexi-
bility-it should be possible to make drastic changes to
one module without a need to change others; (3) com-
prehensibility-it should be possible to study the
system one module at a time. The whole system can
therefore be better designed because it is better under-
stood.
What Is Modularization?
Below are several partial system descriptions called
modularizations. In this context "module" is considered
to be a responsibility assignment rather than a sub-
program. The modularizations include the design deci-
sions which must be made before the work on inde-
pendent modules can begin. Quite different decisions
are included for each alternative, but in all cases the
intention is to describe all "system level" decisions (i.e.
decisions which affect more than one module).
Example System 1: A KWIC Index Production System
The following description of a KWIC index will
suffice for this paper. The KWIC index system accepts an
ordered set of lines, each line is an ordered set of words,
and each word is an ordered set of characters. Any line
may be "circularly shifted" by repeatedly removing the
first word and appending it at the end of the line. The
KWXC index system outputs a listing of all circular shifts
of all lines in alphabetical order.
This is a small system. Except under extreme cir-
cumstances (huge data base, no supporting software),
such a system could be produced by a good programmer
within a week or two. Consequently, none of the
difficulties motivating modular programming are im-
portant for this system. Because it is impractical to
treat a large system thoroughly, we must go through
the exercise of treating this problem as if it were a large
project. We give one modularization which typifies
current approaches, and another which has been used
successfully in undergraduate class projects.
Modularlzation 1
We see the following modules:
Module 1: Input. This module reads the data lines
from the input medium and stores them in core for
processing by the remaining modules. The characters
are packed four to a word, and an otherwise unused
character is used to indicate the end of a word. An index
is kept to show the starting address of each line.
Module 2: Circular Shift. This module is called after
the input module has completed its work. It prepares an
index which gives the address of the first character of
each circular shift, and the original index of the line in
the array made up by module 1. It leaves its output in
core with words in pairs (original line number, starting
address).
Module 3: Alphabetizing. This module takes as
input the arrays produced by modules 1 and 2. It
produces an array in the same format as that produced
by module 2. In this case, however, the circular shifts
are listed in another order (alphabetically).
Module 4: Output. Using the arrays produced by
module 3 and module 1, this module produces a nicely
formatted output listing all of the circular shifts. In a
sophisticated system the actual start of each line will
be marked, pointers to further information may be
inserted, and the start of the circular shift may actually
not be the first word in the line, etc.
Module 5: Master Control. This module does little
more than control the sequencing among the other four
modules. It may also handle error messages, space
allocation, etc.
It should be clear that the above does not constitute
a definitive document. Much more information would
have to be supplied before work could start. The defin-
ing documents would include a number of pictures
showing core formats, pointer conventions, calling
conventions, etc. All of the interfaces between the four
modules must be specified before work could begin.
This is a modularization in the sense meant by all
proponents of modular programming. The system is
divided into a number of modules with well-defined
interfaces; each one is small enough and simple enough
to be thoroughly understood and well programmed.
Experiments on a small scale indicate that this is
approximately the decomposition which would be
proposed by most programmers for the task specified.
Modularization 2
We see the following modules:
Module 1: Line Storage. This module consists of a
number of functions or subroutines which provide the
means by which the user of the module may call on it.
The function call CHAR(r,w,c) will have as value an
integer representing the cth character in the rth line,
wth word. A call such as SETCHAR(r,w,c,d) will cause
the cth character in the wth word of the rth line to be
the character represented by d (i.e. CHAR(r,w,c) = d).
WORDS(r) returns as value the number of words in
1054
Communications December 1972
of Volume 15
the ACM Number 12
line r. There are certain restrictions in the way that these
routines may be called; if these restrictions are violated
the routines "trap" to an error-handling subroutine
which is to be provided by the users of the routine.
Additional routines are available which reveal to the
caller the number of words in any line, the number of
lines currently stored, and the number of characters in
any word. Functions DELINE and DELWRD are
provided to delete portions of lines which have already
been stored. A precise specification of a similar module
has been given in [3] and [8] and we will not repeat it
here.
Module 2: INPUT. This module reads the original
lines from the input media and calls the line storage
module to have them stored internally.
Module 3: Circular Shifter. The principal functions
provided by this module are analogs of functions pro-
vided in module I. The module creates the impres-
sion that we have created a line holder containing
not all of the lines but all of the circular shifts of the
lines. Thus the function call CSCHAR(I,w,c) provides
the value representing the cth character in the wth
word of the lth circular shift. It is specified that (1)
if i < j then the shifts of line i precede the shifts of line
j, and (2) for each line the first shift is the original
line, the second shift is obtained by making a one-word
rotation to the first shift, etc. A function CSSETUP is
provided which must be called before the other functions
have their specified values. For a more precise specifica-
tion of such a module see [8].
Module 4: Alphabetizer. This module consists
principally of two functions. One, ALPH, must be
called before the other will have a defined value. The
second, ITH, will serve as an index. ITH(i) will give the
index of the circular shift which comes ith in the
alphabetical ordering. Formal definitions of these
functions are given [8].
Module 5: Output. This module will give the desired
printing of set of lines or circular shifts.
Module 6: Master Control. Similar in function to the
modularization above.
Comparison of the Two Modularizations
General. Both schemes will work. The first is quite
conventional; the second has been used successfully in
a class project [7]. Both will reduce the programming to
the relatively independent programming of a number of
small, manageable, programs.
Note first that the two decompositions may share
all data representations and access methods. Our
discussion is about two different ways of cutting up
what may be the same object. A system built according
to decomposition 1 could conceivably be identical
after assembly to one built according to decomposition
2. The differences between the two alternatives are in
the way that they are divided into the work assignments,
and the interfaces between modules. The algorithms
used in both cases might be identical. The systems are
substantially different even if identical in the runnable
representation. This is possible because the runnable
representation need only be used for running; other
representations are used for changing, documenting,
understanding, etc. The two systems will not be identical
in those other representations.
Changeability. There are a number of design de-
cisions which are questionable and likely to change
under many circumstances. This is a partial list.
1. Input format.
2. The decision to have all lines stored in core. For
large jobs it may prove inconvenient or impractical to
keep all of the lines in core at any one time.
3. The decision to pack the characters four to a word.
In cases where we are working with small amounts of
data it may prove undesirable to pack the characters;
time will be saved by a character per word layout. In
other cases we may pack, but in different formats.
4. The decision to make an index for the circular'
shifts rather that actually store them as such. Again, for
a small index or a large core, writing them out may be
the preferable approach. Alternatively, we may choose
to prepare nothing during CSSETUP. All computation
could be done during the calls on the other functions
such as CSCHAR.
5. The decision to alphabetize the list once, rather
than either (a) search for each item when needed, or
(b) partially alphabetize as is done in Hoare's rIND
[2]. In a number of circumstances it would be advan-
tageous to distribute the computation involved in
alphabetization over the time required to produce the
index.
By looking at these changes we can see the differences
between the two modularizations. The first change is
confined to one module in both decompositions. For the
first decomposition the second change would result in
changes in every module! The same is true of the third
change. In the first decomposition the format of the
line storage in core must be used by all of the programs.
In the second decomposition the story is entirely
different. Knowledge of the exact way that the lines are
stored is entirely hidden from all but module 1. Any
change in the manner of storage can be confined to that
module!
In some versions of this system there was an addi-
tional module in the decomposition. A symbol table
module (as specified in [3]) was used within the line
storage module. This fact was completely invisible to
the rest of the system.
The fourth change is confined to the circular shift
module in the second decomposition, but in the first
decomposition the alphabetizer and the output routines
will also know of the change.
The fifth change will also prove difficult in the first
decomposition. The output module will expect the index
to have been completed before it began. The alpha-
betizer module in the second decomposition was
1055
Communications December 1972
of Volume 15
the ACM Number 12
designed so that a user could not detect when the
alphabetization was actually done. No other module
need be changed.
Independent Development.
In the first modularization
the interfaces between the modules are the fairly com-
plex formats and table organizations described above.
These represent design decisions which cannot be taken
lightly. The table structure and organization are es-
sential to the efficiency of the various modules and must
be designed carefully. The development of those formats
will be a major part of the module development and
that part must be a joint effort among the several
development groups. In the second modularization the
interfaces are more abstract; they consist primarily in
the function names and the numbers and types of the
parameters. These are relatively simple decisions and
the independent development of modules should
begin much earlier.
Comprehensibility.
To understand the output module
in the first modularization, it will be necessary to
understand something of the alphabetizer, the circular
shifter, and the input module. There will be aspects of
the tables used by output which will only make sense
because of the way that the other modules work. There
will be constraints on the structure of the tables due to
the algorithms used in the other modules. The system
will only be comprehensible as a whole. It is my sub-
jective judgment that this is not true in the second
modularization.
The Criteria
Many readers will now see what criteria were used
in each decomposition. In the first decomposition the
criterion used was to make each major step in the
processing a module. One might say that to get the first
decomposition one makes a flowchart. This is the most
common approach to decomposition or modulariza-
tion. It is an outgrowth of all programmer training
which teaches us that we should begin with a rough
flowchart and move from there to a detailed imple-
mentation. The flowchart was a useful abstraction for
systems with on the order of 5,000-10,000 instructions,
but as we move beyond that it does not appear to be
sufficient; something additional is needed.
The second decomposition was made using 'fin-
formation hiding" [4] as a criterion. The modules no
longer correspond to steps in the processing. The line
storage module, for example, is used in almost every
action by the system. Alphabetization may or may not
correspond to a phase in the processing according to
the method used. Similarly, circular shift might, in some
circumstances, not make any table at all but calculate
each character as demanded. Every module in the
second decomposition is characterized by its knowledge
of a design decision which it hides from all others. Its
interface or definition was chosen to reveal as little as
possible about its inner workings.
1056
Improvement in Circular Shift Module
To illustrate the impact of such a criterion let us
take a closer look at the design of the circular shift
module from the second decomposition. Hindsight now
suggests that this definition reveals more information
than necessary. While we carefully hid the method
of storing or calculating the list of circular shifts, we
specified an order to that list. Programs could be effec-
tively written if we specified only (I) that the lines
indicated in circular shift's current definition will all
exist in the table, (2) that no one of them would be
included twice, and (3) that an additional function
existed which would allow us to identify the original
line given the shift. By prescribing the order for the
shifts we have given more information than necessary
and so unnecessarily restricted the class of systems that
we can build without changing the definitions. For
example, we have not allowed for a system in which the
circular shifts were produced in alphabetical order,
ALPH
is empty, and
ITH
simply returns its argument
as a value. Our failure to do this in constructing the
systems with the second decomposition must clearly be
classified as a design error.
In addition to the general criteria that each module
hides some design decision from the rest of the system,
we can mention some specific examples of decom-
positions which seem advisable.
1. A data structure,
its internal linkings,
accessing
procedures and modifying procedures
are part of a
single module. They are not shared by many modules as
is conventionally done. This notion is perhaps just an
elaboration of the assumptions behind the papers of
Balzer [9] and Mealy [10]. Design with this in mind is
clearly behind the design of BLISS [11].
2. The sequence of instructions necessary to call a given
routine and the routine itself are part of the same module.
This rule was not relevant in the Fortran systems used
for experimentation but it becomes essential for systems
constructed in an assembly language. There are no
perfect general calling sequences for real machines and
consequently they tend to vary as we continue our
search for the ideal sequence. By assigning responsibility
for generating the call to the person responsible for the
routine we make such improvements easier and also
make it more feasible to have several distinct sequences
in the same software structure.
3. The
formats of control blocks
used in queues in
operating systems and similar programs
must be hidden
within a "control block module." It is conventional to
make such formats the interfaces between various
modules. Because design evolution forces frequent
changes on control block formats such a decision often
proves extremely costly.
4. Character codes, alphabetic orderings, and similar
data should be hidden
in a module for greatest flexibility.
5. The sequence in which certain items will be proc-
essed should (as far as practical) be hidden within a
single module. Various changes ranging from equip-
Communications December 1972
of Volume 15
the
ACM Number 12
ment additions to unavailability of certain resources in
an operating system make sequencing extremely vari-
able.
Efficiency and Implementation
If we are not careful the second decomposition will
prove to be much less efficient than the first. If each of
the functions is actually implemented as a procedure
with an elaborate calling sequence there will be a great
deal of such calling due to the repeated switching
between modules. The first decomposition will not
suffer from this problem because there is relatively in-
frequent transfer of control between modules.
To save the procedure call overhead, yet gain the
advantages that we have seen above, we must implement
these modules in an unusual way. In many cases the
routines will be best inserted into the code by an
assembler; in other cases, highly specialized and efficient
transfers would be inserted. To successfully and
efficiently make use of the second type of decomposition
will require a tool by means of which programs may be
written as if the functions were subroutines, but as-
sembled by whatever implementation is appropriate. If
such a technique is used, the separation between
modules may not be clear in the final code. For that
reason additional program modification features would
also be useful. In other words, the several representa-
tions of the program (which were mentioned earlier)
must be maintained in the machine together with a
program performing mapping between them.
A Decomposition Common to a Compiler and Interpretor
for the Same Language
In an earlier attempt to apply these decomposition
rules to a design project we constructed a translator for
a Markov algorithm expressed in the notation described
in [6]. Although it was not our intention to investigate
the relation between compiling and interpretive trans-
lators of a langugage, we discovered that our decom-
position was valid for a pure compiler and several
varieties of interpretors for the language. Although there
would be deep and substantial differences in the final
running representations of each type of compiler, we
found that the decisions implicit in the early decom-
position held for all.
This would not have been true if we had divided
responsibilities along the classical lines for either a
compiler or interpretor (e.g. syntax recognizer, code
generator, run time routines for a compiler). Instead
the decomposition was based upon the hiding of various
decisions as in the example above. Thus register repre-
sentation, search algorithm, rule interpretation etc. were
modules and these problems existed in both compiling
and interpretive translators. Not only was the decom-
position valid in all cases, but many of the routines
could be used with only slight changes in any sort of
translator.
This example provides additional support for the
statement that the order in time in which processing is
expected to take place should not be used in making
the decomposition into modules. It further provides
evidence that a careful job of decomposition can result
in considerable carryover of work from one project to
another.
A more detailed discussion of this example was
contained in [8].
Hierarchical Structure
We can find a program hierarchy in the sense illus-
trated by Dijkstra [5] in the system defined according to
decomposition 2. If a symbol table exists, it functions
without any of the other modules, hence it is on level 1.
Line storage is on level 1 if no symbol table is used or it
is on level 2 otherwise. Input and Circular Shifter re-
quire line storage for their functioning. Output and
Alphabetizer will require Circular Shifter, but since
Circular Shifter and line holder are in some sense
compatible, it would be easy to build a parameterized
version of those routines which could be used to
alphabetize or print out either the original lines or the
circular shifts. In the first usage they would not require
Circular Shifter; in the second they would. In other
words, our design has allowed us to have a single
representation for programs which may run at either
of two levels in the hierarchy.
In discussions of system structure it is easy to confuse
the benefits of a good decomposition with those of a
hierarchical structure. We have a hierarchical structure
if a certain relation may be defined between the modules
or programs and that relation is a partial ordering. The
relation we are concerned with is "uses" or "depends
upon." It is better to use a relation between programs
since in many cases one module depends upon only
part of another module (e.g. Circular Shifter depends
only on the output parts of the line holder and not on
the correct working of
SETWORD).
It is conceivable
that we could obtain the benefits that we have been
discussing without such a partial ordering, e.g. if all
the modules were on the same level. The partial ordering
gives us two additional benefits. First, parts of the
system are benefited (simplified) because they use the
services of lower 2 levels. Second, we are able to cut off
the upper levels and still have a usable and useful
product. For example, the symbol table can be used in
other applications; the line holder could be the basis of
a question answering system. The existence of the
hierarchical structure assures us that we can "prune"
off the upper levels of the tree and start a new tree on
the old trunk. If we had designed a system in which the
"low level" modules made some use of the "high level"
modules, we would not have the hierarchy, wewouldfind
it much harder to remove portions of the system, and
"level" would not have much meaning in the system.
Here "lower" means "lower numbered."
1057
Communications December 1972
of Volume 15
the ACM Number 12
Since it is conceivable that we could have a system
with the type of decomposition shown in version 1
(important design decisions in the interfaces) but
retaining a hierarchical structure, we must conclude
that hierarchical structure and "clean" decomposition
are two desirable but
independent
properties of a
system structure.
Conclusion
We have tried to demonstrate by these examples that
it is almost always incorrect to begin the decomposition
of a system into modules on the basis of a flowchart.
We propose instead that one begins with a list of
difficult design decisions or design decisions which are
likely to change. Each module is then designed to hide
such a decision from the others. Since, in most cases,
design decisions transcend time of execution, modules
will not correspond to steps in the processing. To
achieve an efficient implementation we must abandon
the assumption that a module is one or more sub-
routines, and instead allow subroutines and programs
to be assembled collections of code from various
modules.
Received August 1971; revised November 1971
References
1. Gauthier, Richard, and Pont, Stephen.
Designing Systems
Programs, (C),
Prentice-Hall, Englewood Cliffs, N.J., 1970.
2. Hoare, C. A. R. Proof of a program, FIND.
Comm. ACM 14,
1 (Jan. 1971), 39-45.
3. Parnas, D. L. A technique for software module specification
with examples.
Comm. ACM 15,
5 (May, 1972), 330-336.
4. Parnas, D. L. Information distribution aspects of design
methodology. Tech. Rept., Depart. Computer Science, Carnegie-
Mellon U., Pittsburgh, Pa., 1971. Also presented at the IFIP
Congress 1971, Ljubljana, Yugoslavia.
5. Dijkstra, E. W. The structure of "THE"-multiprogramming
system.
Comm. ACM 11,
5 (May 1968), 341-346.
6. Galler, B., and Perlis,
A. J. A View of Programming Languages,
Addison-Wesley, Reading, Mass., 1970.
7. Parnas, D. L. A course on software engineering. Proc. SIGCSE
Technical Symposium, Mar. 1972.
8. Parnas, D. L. On the criteria to be used in decomposing
systems into modules. Tech. Rept., Depart. Computer Science,
Carnegie-Mellon U., Pittsburgh, Pa., 1971.
9. Balzer, R. M. Dataless programming. Proc. AFIPS 1967
FJCC, Vol. 31, AFIPS Press, Montvale, N.J., pp. 535-544.
10. Mealy, G. H. Another look at data. Proc. AFIPS 1967 FJCC,
Vol. 31, AFIPS Press, Montvale, N.J., pp. 525-534.
11. Wulf, W. A., Russell, D. B., and Habermann, A. N. BLISS,
A language for systems programming.
Comm. ACM 14,
12 (Dec.
1971), 780-790.
1058
Communications
of
the ACM
December 1972
Volume 15
Number 12

Discussion

Here is a flowchart representation of this modularization: 
![](https://i.imgur.com/dj1isgi.png) Just listing out modules isn’t enough to start developing the system. A number of detailed specifications would be missing so that different teams can work on their parts without confusion. - **Data structures** – How will data be stored in memory? - **Pointer conventions** – How will different parts of the system reference the same data? - **Calling conventions** – How do functions interact? What parameters do they take? Here is a flowchart representation of this modularization: 
![](https://i.imgur.com/at1e3sP.png) Balzer [9], Mealy [10], and Wulf et al. [11] represent significant work from 1967-1971 regarding data structures and system organization. Balzer's paper on "Dataless programming" and Mealy's "Another look at data," both presented at the 1967 AFIPS conference, explored early ideas about data abstraction. The BLISS language, described by Wulf, Russell, and Habermann in 1971, put these principles into practice as a systems programming language that emphasized careful module design and data structure encapsulation. This textbook, "Designing Systems Programs”, focused on methods for building large systems software. Systems programs (like operating systems, compilers, and utility programs) were becoming increasingly complex in the early 1970s. The book aimed to establish ground rules for modular programming at a time when there weren't many practical guides available. Richard Gauthier and Stephen Pont were engineers at Honeywell's Computer Control Division, a major player in early computer systems. Their book came from hands-on experience building commercial systems, making it especially valuable when most programming literature was theoretical. ![](https://i.imgur.com/YNUKaOG.jpeg) Imagine you need to call a function in assembly language. Unlike modern languages where you just write `myFunction(x, y)`, in assembly you need precise instructions for how to set up that call. Here's a simplified example: ``` ; Modern high-level language: result = calculate(x, y) ; What actually happens in assembly: push ebp ; save old stack frame mov ebp, esp ; create new stack frame push x ; set up first parameter push y ; set up second parameter call calculate ; actual function call mov result, eax ; get return value pop ebp ; restore old stack frame ``` Parnas argues this calling sequence should be part of the same module as the function itself. Why? Because different machines might need different sequences, and as we find better ways to make the call, we want the function's author to be able to optimize both the function and how it's called. ### KWIC (Key Word In Context) A KWIC ("Key Word In Context") index was a key tool in libraries before full text search became widely available. It was a way to index content (books, articles, magazines, chapters, etc.). Researchers could find materials by searching for any significant word in titles. The KWIC index generates all circular shifts of a title's words, then sorts them alphabetically. Let's look at the title "Basic Principles of Software Engineering" (ignoring the stop word "of"): **Original line**: - Basic Principles Software Engineering **Circular shifts**: - Principles Software Engineering Basic - Software Engineering Basic Principles - Engineering Basic Principles Software **Sorted alphabetically**: - BASIC Principles Software Engineering - ENGINEERING Basic Principles Software - PRINCIPLES Software Engineering Basic - SOFTWARE Engineering Basic Principles KWIC was revolutionary for libraries because traditional card catalogs only let users search by the first word of a title. With KWIC, someone interested in "engineering" books could find relevant titles even if "engineering" wasn't the first word. The system was particularly valuable for technical and scientific libraries where specific terminology was important for research. Parnas describes how the circular shift module in the second modularization included more specification than necessary. While the implementation carefully hid the method of storing or calculating circular shifts, it unnecessarily specified their order in the list. The module could have been defined with just three core requirements: 1. All necessary circular shifts must exist in the table 2. No duplicates should be present 3. A function should exist to map from a shift back to its original line By prescribing a specific order for the shifts, the design ruled out some valid implementations. For example, one approach could have been to generate circular shifts in alphabetical order initially. In this case, ALPH (the function that sets up alphabetization) would be empty, and ITH (the function that returns the index of the ith alphabetically ordered shift) would simply return its input value since the shifts would already be in the desired order. This illustrates a key principle of modular design: modules should specify only what is necessary and hide implementation details. By exposing more information than needed (the order of shifts), the design unnecessarily restricted the class of systems that could be built without changing the module definitions. The software world of 1972 was struggling with growing pains. The field of software engineering was quite new, having just been named at a NATO conference in 1968. Most commercial code was written in assembly language or FORTRAN, and projects were getting too big to handle. The industry called this "the software crisis" - projects kept going over budget, missing deadlines, and shipping with lots of bugs. Some of the biggest software projects showed just how complex things were getting. IBM's OS/360 needed over 1000 developers. Multics, a large operating system project, had just wrapped up in 1969. Unix was brand new, having just been released at Bell Labs in 1971. A few important papers had started to question how software should be written, including Dijkstra's famous argument against GOTO statements in 1968, which pushed for more structured programming approaches. At this time, programmers still thought about software mainly as a linear sequence of steps. Most teams were so focused on just getting their programs to work that they weren't even thinking about making them maintainable. Hoare's FIND algorithm is a selection algorithm that finds the kth smallest element in an unordered list. While closely related to Quicksort (also invented by Hoare), it's more efficient when you only need to find a single element's position rather than sorting the entire array. To understand the difference between Hoare's FIND and full alphabetization, consider a librarian organizing books. Let's say we have six unsorted books: `”Dog Care", "Astronomy", "Cooking", "Biology", "Finance", "English”` If someone asks "What's the 3rd book alphabetically?": **Full Sort Approach (what the KWIC system uses)**: The librarian would fully sort all books: `”Astronomy", "Biology", "Cooking", "Dog Care", "English", "Finance”` Then report "Cooking" as the 3rd book Now all books are in perfect order for future queries **Hoare's FIND Approach**: - The librarian would just find the 3rd book without fully sorting - They might quickly separate books into "before C" and "after C" - Then identify "Cooking" as the 3rd book - The remaining books might still be unsorted: - `“Astronomy", "Biology", "Cooking", "Finance", "Dog Care", "English”`