Computer Analysis of Cultural Structures

David R. Heise

See Social Science Computer Review 6: 1, Spring 1988, for the final edited and paginated copy.

This article outlines a new approach to analyzing qualitative data such as are obtained in ethnographic and historical research. A computer program can conduct maximally efficient elicitations of information and organize data into cognitive structures which are visually represented in the form of directed graphs. Action grammars can be constructed, and these formal models of verbally defined happenings can be tested and refined through analysis of incidents. Keywords: ethnography, culture, modeling, data analysis.

After two decades in which the use of computers in social science meant "quantitative analysis," the benefits of computation are being discovered by qualitative researchers as well (Conrad & Reinharz, 1984; Werner & Schoepfle, 1987). One reason is the microcomputer revolution, which has provided approachable hardware and powerful text-processing software that helps manage field notes and transcriptions. Additionally, some researchers have developed software explicitly for qualitative analysis--of, for example, network data (Burt, 1976) and dichotomous properties (Ragin, Mayer, & Drass, 1984)--proving that computers can crunch graphs and logical propositions as well as numbers.

Software could be developed further to aid ethnographers, historians, folklorists, and other researchers who study structures embedded in cultural texts of various kinds. Computer programs can elicit and organize verbal information into directed graphs--for example, taxonomic trees, part-whole charts, and causal diagrams. Routines can be included to allow production grammars to be formulated for analyzing narratives of events, and a computer can test and refine these structures empirically, then allow them to be used conjecturally to study possible happenings.

This paper identifies some goals for the computerization of qualitative analysis, pinpoints some of the problems, and outlines solutions which I have developed (Heise, in press). Then outcomes are illustrated with a sample analysis of a famous folktale.

Analytic Goals

Efficient Elicitation

Materials for qualitative analysis might be acquired from lay consultants who volunteer information, more formally through interviews, or indirectly from field notes, recordings, and archives. In any case, a primary benefit of computerization is that a program can minimize the questions that have to be answered in order to define cultural structures logically and reliably. Computers provide benefits in the elicitation of elements. Probe questions can be asked tirelessly and phrased as desired, and alternative phrasings can be interchanged on a random schedule. Probe questions also can incorporate context obtained from earlier answers in the elicitation.

However, the real value of computerization arises in questioning the relations of a new element to old elements, whereby the new entry is incorporated into the overall logical structure. In essence, to build a logical structure we must ask if a new element is implied by old elements, taking each old element in turn, then whether each old element is implied by the new element. For example, in a zoological elicitation this kind of questioning would be required to find that a new element, like collie, implies dog and mammal but does not imply bird, cat, bloodhound, or Siamese.

Mindless relational questions for every possible paired comparison could soon become overwhelming--for example, asking in both directions about every pair takes 600 questions to build a set of 25 elements, and adding one more element to that set takes another 50. A primary reduction is achieved by restricting attention to cultural domains in which implications are unidirectional: once we know that X implies Y, we do not need to ask if Y implies X, and the number of potential questions is reduced by half. The study of serial events constitutes a special case which offers additional opportunities for reduction in questioning. It can be assumed that later events are not prerequisites for earlier events, so one only has to determine which old elements are implied by the new element and half the conceivable questions are eliminated from consideration.
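
The counting can be sketched in a few lines of Python. The function names are illustrative only and do not come from any program discussed here. All ordered pairs among n elements number n(n-1), which gives 600 for n = 25, while relating one new element to n old ones takes 2n questions in both directions and n questions when events are serial.

```python
def questions_for_new_element(n_old, both_directions=True):
    """Questions needed to relate one new element to n_old existing elements."""
    return 2 * n_old if both_directions else n_old


def questions_for_whole_set(n):
    """Total questions when n elements are added one at a time, both directions."""
    return sum(questions_for_new_element(k) for k in range(n))


print(questions_for_new_element(25))         # 50
print(questions_for_new_element(25, False))  # 25
print(questions_for_whole_set(25))           # 600
```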

Beyond that, a computer program can make use of the logical structure that already has been elicited in order to eliminate the need for many questions. Logical derivations implicitly answer some questions, and therefore those questions need not be asked at all. Questions can be eliminated logically in two ways. The first method amounts to classical syllogistic reasoning: if we find that a new element X implies Y and we already know that Y implies Z, then we know without asking that X also implies Z. For example, if a consultant says that a collie is a dog, we presume without asking that a collie also is a mammal, assuming it has already been established that dogs are mammals.
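
A minimal sketch of the syllogistic shortcut follows, assuming that recorded implications are kept in a dictionary mapping each element to the set of elements it directly implies. The dictionary layout and the name `implied_by_chain` are my assumptions for illustration.

```python
def implied_by_chain(implies, start):
    """Everything reachable from `start` by following recorded implications."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in implies.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Structure elicited so far (hypothetical zoological example).
implies = {"dog": {"mammal"}, "cat": {"mammal"},
           "bloodhound": {"dog"}, "siamese": {"cat"}}

# The consultant says that a collie is a dog.
implies["collie"] = {"dog"}

# Answers that no longer need to be asked about collie.
inferred = implied_by_chain(implies, "collie") - implies["collie"]
print(inferred)  # {'mammal'}
```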

The second technique depends on negations and is more subtle. Say that we discover that W does not imply X; then we might suppose without asking that anything that implies W also does not imply X. For example, if lizard is the new element and a consultant says that a dog is not a kind of lizard, then we presume without asking that a bloodhound is not a kind of lizard. Or say that we discover that new element X does not imply Z; then we might suppose without asking that X does not imply anything that implies Z. For example, on finding that a lizard is not a kind of cat, we presume without asking that a lizard cannot be a Siamese cat.
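
Both negative shortcuts come down to the same lookup: after a no involving an old element, collect every element that implies that old element. The sketch below assumes the same dictionary layout as before (element mapped to the set of elements it directly implies); `specializations` is a hypothetical helper name.

```python
def specializations(implies, target):
    """Elements that imply `target`, directly or through a chain."""
    found, stack = set(), [target]
    while stack:
        node = stack.pop()
        for elem, supers in implies.items():
            if node in supers and elem not in found:
                found.add(elem)
                stack.append(elem)
    return found


implies = {"dog": {"mammal"}, "cat": {"mammal"},
           "bloodhound": {"dog"}, "siamese": {"cat"}}

# "A dog is not a kind of lizard": presume the same for whatever implies dog.
print(specializations(implies, "dog"))  # {'bloodhound'}

# "A lizard is not a kind of cat": presume lizard implies nothing that implies cat.
print(specializations(implies, "cat"))  # {'siamese'}
```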

Applying both of these analytical stratagems to the logical structure which has been elicited previously reduces the amount of questioning dramatically. Unfortunately, results are sensitive to errors in answering the relational questions. The main problem in inferring answers from prior answers is that people may not have acknowledged an implication that exists when the implication involves a chain of reasoning. When a required chain of inference gets too long, a person may answer no though the answer to a relational question should be yes. Taking the no seriously and presuming additional answers from it then distorts the construction of the logical structure.

One solution to this problem is to derive answers to unasked questions only when the answers can be derived by applying the classical syllogism to prior yes answers, but not on the basis of denials of implication. Reports of perceived implications are relatively reliable; it is denials of implications which are the suspect data. Inferring answers only from yes answers typically results in a more reliable elicitation of structure while still providing a substantial reduction in the amount of questioning.
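
An elicitation step that infers only from yes answers might look like the following sketch. It is not the procedure of any particular program: the function names, the `ask` callback, and the choice to ask about specific elements before general ones are assumptions made for illustration, and only one direction of questioning (does the new element imply the old one?) is shown.

```python
def closure(implies, start):
    """All elements implied by `start` through chains of recorded yes answers."""
    seen, stack = set(), [start]
    while stack:
        for nxt in implies.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def add_element(implies, new, ask):
    """Relate `new` to old elements; ask(new, old) is called only when needed.

    A yes answer settles every element implied by `old`; a no answer
    settles nothing else, so denials are never used for inference.
    Returns the number of questions actually asked.
    """
    yes, asked = set(), 0
    for old in list(implies):
        if old in yes:
            continue  # already derivable from an earlier yes: do not ask
        asked += 1
        if ask(new, old):
            yes |= {old} | closure(implies, old)
    implies[new] = yes  # stores asked and inferred yes answers together
    return asked


# Hypothetical structure, listed from specific to general elements.
implies = {"bloodhound": {"dog"}, "siamese": {"cat"}, "dog": {"mammal"},
           "cat": {"mammal"}, "mammal": set(), "bird": set()}
truth = {"dog", "mammal"}  # what a consultant would affirm about collie

asked = add_element(implies, "collie", lambda new, old: old in truth)
print(asked, sorted(implies["collie"]))  # 5 ['dog', 'mammal']
```

Here five questions are asked instead of six, because the yes for dog makes the question about mammal unnecessary.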

Diagrams

An elicitation produces a list of elements and a list of connections between elements. A useful program would maintain both lists, updating as the elicitation proceeded, then storing the lists in data files when the elicitation was done so that the lists could be recalled later for further work and examination.

However, lists of elements and their connections are difficult to interpret, and a computer program ideally would provide diagrammatic representations. Logical structures can be represented as directed graphs. Nodes symbolize elements, and lines between nodes show relations--what implies what--directionality being signified by arrows or by vertical positioning. Lines that can be derived in transitive chains of implication need not be included. For example, if A implies B and B implies C, then lines A-B and B-C are drawn but a separate line from A to C is not drawn.
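
Dropping the derivable lines is a transitive reduction of the graph. The sketch below assumes an acyclic structure stored as a dictionary from each node to the set of nodes it implies; the names `reachable` and `drawn_lines` are illustrative.

```python
def reachable(edges, start):
    """Nodes that can be reached from `start` along one or more lines."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def drawn_lines(edges):
    """Keep a line a->b unless b is also reachable through another successor of a."""
    keep = set()
    for a, succs in edges.items():
        for b in succs:
            derivable = any(b in reachable(edges, c) for c in succs if c != b)
            if not derivable:
                keep.add((a, b))
    return keep


# A implies B, B implies C, and the implication A->C is also on record.
edges = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
print(sorted(drawn_lines(edges)))  # [('A', 'B'), ('B', 'C')]
```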

A graph can display the structure of a database in a transparent way that allows many analytical questions to be answered by direct inspection. Suppose, for example, that events are the elements of interest and that higher nodes imply lower ones. Then elements which are connected directly to a higher node constitute the conjunctive set of prerequisites for the higher node; these subordinate events all have to occur before the superordinate event can occur. The elements which can be reached by tracing upward from an event are possible consequences of that event. Points where downward paths intersect constitute critical gateways which, once passed, can unfold a variety of happenings. Intersection points of upward paths show conditions which can be achieved only through a concert of prior doings.
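
The same readings can be taken from a stored structure. In this sketch the structure is a dictionary from each event to its set of direct prerequisites; the five-event example and the name `consequences` are hypothetical.

```python
def consequences(prereqs, event):
    """Events that require `event`, directly or through a chain (tracing upward)."""
    found, stack = set(), [event]
    while stack:
        node = stack.pop()
        for other, need in prereqs.items():
            if node in need and other not in found:
                found.add(other)
                stack.append(other)
    return found


# C needs both A and B; D and E each need C.
prereqs = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": {"C"}}

print(sorted(prereqs["C"]))                 # ['A', 'B']  conjunctive prerequisites
print(sorted(consequences(prereqs, "C")))   # ['D', 'E']  possible consequences
print(sorted(consequences(prereqs, "A")))   # ['C', 'D', 'E']
```

In this toy structure C is a gateway (the downward paths from D and E meet there) and also a condition reached only through the concert of A and B.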

The full-screen displays of microcomputers invite converting relational lists into diagrams. Not only is it easier to read the data this way, it also is easier to perform the functions of database management--adding, changing, and deleting elements and their relations. Editing may be accomplished directly on the graph by moving a cursor around in order to mark the nodes of interest.

Event Grammars

An important class of qualitative studies deals with narratives that can be transcribed as simple events--for example, "Doctor greets patient," "Doctor examines patient," and so on. The logical structure in this case can be assessed with a relational question like "Is Doctor greets patient essential for Doctor examines patient?"

Applying production-system theory (Fararo & Skvoretz, 1984; Heise, in press) can turn the resulting logical structure into a production grammar delineating event sequences. Essentially, this involves adding dynamic principles. An event cannot happen until its prerequisite events happen; an event depletes material conditions so that prerequisite events have to happen again if the event is to happen again; and an event cannot be repeated until it has been depleted--until its effects are used up by some other event that requires it. These rules and the implication structure can be taken as the starting point for understanding how events are generated in an incident; adjustments and exceptions are made to the rules as needed.
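
One way to sketch the three dynamic principles is to keep a set of events that have occurred and not yet been used up. This bookkeeping, and the names `check`, `occur`, and `active`, are my assumptions for illustration rather than a description of how any existing program stores its state.

```python
def check(event, prereqs, active):
    """Return reasons why `event` may not occur now (empty list: it may)."""
    reasons = []
    missing = prereqs[event] - active
    if missing:
        reasons.append(f"unfulfilled prerequisites: {sorted(missing)}")
    if event in active:
        reasons.append("not depleted since last occurrence")
    return reasons


def occur(event, prereqs, active):
    """Record an occurrence: prerequisites are used up, the event becomes available."""
    active -= prereqs[event]
    active.add(event)


# "greet" and "examine" abbreviate the two doctor events used as examples.
prereqs = {"greet": set(), "examine": {"greet"}}
active = set()

occur("greet", prereqs, active)
print(check("examine", prereqs, active))   # []
occur("examine", prereqs, active)
print(check("examine", prereqs, active))   # two reasons: greet used up, examine unused
```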

A production grammar--a logical structure plus auxiliary assumptions--may be refined by applying it to observed sequences of events. One goes through a sequence event by event and logs which events have occurred, which have unfulfilled prerequisites, and which need depletion before they can occur again. Usually, at some point an event is encountered which should not have happened according to the formulated model. Prerequisites for the event may not be fulfilled, or the event may have occurred before and not been depleted since its last occurrence.

Say an inconsistency arose because B occurred without a prior occurrence of prerequisite A. This inconsistency would be fixed if it were decided that A really was not a prerequisite for B. Sometimes it also would be fixed by deciding that A was not a prerequisite for another event, C, which had occurred recently and depleted A. Another kind of inconsistency, exemplified by A repeating even though nothing had happened to use it up since its last occurrence, would be fixed if it were decided that A was a prerequisite for an event B which had happened between occasions of A, so that A actually was depleted and could happen again.

Alternatively, inconsistencies can be handled by weakening auxiliary assumptions. If event A is unavailable to prime event B because A has been depleted recently by C, then we might decide that C has not depleted A even though A still is a prerequisite for C. Or if event A is repeating without being depleted, then we might decide simply that A is repeatable without depletion.

Finally, the data record itself can be fallible, and data might be corrected in order to eliminate a problem. For example, we may decide that an event B has not really occurred without the occurrence of its prerequisite A: instead, A has occurred without being recorded. Or, if an event A repeats a second time without being used up from the first time, we might decide that one of the recordings of A is in error, that A has occurred just once.

A computer program is the ideal tool for the meticulous reasoning required to ground a production grammar empirically. A program can work through a series of events automatically, recording which events have had their prerequisites fulfilled and which events have happened and not yet been exploited. When an unprimed or unused event is encountered, the program can interrupt the analysis, report the problem, and offer possible solutions, which might involve adjusting the logical structure, weakening a dynamic principle, or conceding an error in the data record. A computer can develop a complete list of solutions and implement the solution chosen by the analyst.

In principle, a computer program can list only solutions that are viable, checking every offered solution to a problem to make sure the suggestion works in the given context. For example, the program might suggest making event B into a consequence of event A as a way of using up repetitions of A, but only after checking to make sure that an occurrence of B has been recorded between recent occurrences of A.
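
The viability check in this example can be sketched as a search of the data record. The function name and the list representation of the event series are assumptions for illustration.

```python
def depleting_candidates(series, position):
    """Events recorded between the previous occurrence of series[position]
    and `position`; only these could be proposed as consequences that
    used up the repeating event."""
    event = series[position]
    earlier = [j for j in range(position) if series[j] == event]
    if not earlier:
        return []  # the event is not repeating, so nothing to propose
    return sorted(set(series[earlier[-1] + 1:position]) - {event})


print(depleting_candidates(["A", "B", "A"], 2))  # ['B']
print(depleting_candidates(["A", "A"], 1))       # []
```

In the second series no event intervenes between the two occurrences of A, so a new consequence would not be offered as a solution.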

Event Priorities

A production-system model may include information about event priorities in order to deal with scheduling conflicts. The priorities rank events so that one event can be selected as a preferred implementation when multiple events are enabled. Such information can be obtained by analyzing an event series again after model and data are entirely consistent, this time noting which events are enabled simultaneously and which actually happen. The facts are recorded in a cumulative table, which is numerically processed at the end in order to provide an average priority score for each event. Sorting the scores provides a priority ranking of the events.
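
The scoring formula is not spelled out here, so the sketch below substitutes a simple assumption: an event's score is the share of the occasions on which it was enabled that it actually happened. The input format (a list of enabled-set and chosen-event pairs) and the toy data are likewise hypothetical.

```python
from collections import Counter


def priority_scores(steps):
    """steps: iterable of (enabled_events, event_that_happened) pairs."""
    enabled_count, chosen_count = Counter(), Counter()
    for enabled, chosen in steps:
        for event in enabled:
            enabled_count[event] += 1
        chosen_count[chosen] += 1
    # Assumed score: times chosen divided by times enabled.
    return {e: chosen_count[e] / enabled_count[e] for e in enabled_count}


# Toy record: C is enabled three times before it finally happens.
steps = [({"A", "C"}, "A"), ({"B", "C"}, "B"), ({"B", "C"}, "C")]
scores = priority_scores(steps)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A', 'B', 'C']
```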

Simulations

A production-system model can be used to simulate happenings which fit the event grammar represented in the model. One application of this is to continue testing a model beyond the formal sets of data which are available. That is, one can make a model generate event series, evaluate the results for credibility, and change the model in order to eliminate seemingly impossible sequences.
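
A generator of this kind can be sketched by repeatedly picking one of the currently enabled events. Random selection, the seed parameter, and the third event ("prescribe") are assumptions added for illustration; priorities could replace the random choice.

```python
import random


def enabled(prereqs, active):
    """Events whose prerequisites are all available and which are themselves used up."""
    return sorted(e for e, need in prereqs.items()
                  if need <= active and e not in active)


def simulate(prereqs, max_steps=10, seed=0):
    """Generate one event series that fits the grammar."""
    rng = random.Random(seed)
    active, series = set(), []
    for _ in range(max_steps):
        options = enabled(prereqs, active)
        if not options:
            break  # nothing can happen any more
        event = rng.choice(options)
        active -= prereqs[event]  # the event uses up its prerequisites
        active.add(event)
        series.append(event)
    return series


prereqs = {"greet": set(), "examine": {"greet"}, "prescribe": {"examine"}}
print(simulate(prereqs))
```

Every series begins with greet followed by examine, since nothing else is enabled at those points; later choices vary with the seed, and the analyst can judge whether the resulting series are credible.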

Once confidence in a model is secure, the model can be used to simulate happenings that theoretically might occur. Such happenings fit the grammar of such incidents because the model is a formalization of knowledge about such events, and the computer is acting as an "artificially intelligent" expert speculating on possible futures.

Another potential application of a model is in assessing individual variations in prioritization. A user might be allowed to select events for "implementation" from the set of events that are enabled at a given time, thereby creating imaginary series of events which incorporate the respondent's preferences. The fabricated series could be analyzed to provide priority rankings of events for each respondent.

Sample Analysis

An existing program, ETHNO (Heise, 1987), is used to illustrate a computerized analysis of qualitative data. The illustration covers only selected highlights, but the same analysis is reported in detail elsewhere (ibid.). The example specifically shows that these ideas might be used for the analysis of archival texts in specialties like folklore and literary criticism as well as for the creation of ethnographic models by anthropologists and sociologists. The narrative is the familiar story "Little Red Riding Hood," in an authentic peasant version translated from the French and presented by Robert Darnton (1984) in his prize-winning book on French cultural history. (The original tale is more lurid and bloody than the tale recited to American children. Darnton gives an account of how the bowdlerized version came to us.)

Table 1 shows Darnton's text of the fable in one column and my rendition of events for ETHNO in the adjoining column. The first entry into ETHNO is a name for the general event or incident--"little red riding hood" in this case. The second entry is the first event. Thereafter the user switches back and forth between typing event descriptions and answering questions about how events are related. For example, here is the question that appears after the third element is acquired:

Is Mo sends bread & milk to grmo (or a similar event) essential for Mo sends girl to grmo?

This same frame is used over and over with various event descriptions: Is ____ (or a similar event) essential for ____? What is asked is whether the earlier event must happen in order for the most recent event to occur--whether the earlier event is a prerequisite for the last event. If the earlier event is a necessity, then the occurrence of the last event implies the occurrence of the earlier one, and a diagram can show a line descending from one to the other. (The phrase or a similar event allows for the possibility of a set of disjunctive prerequisites.) Procedures continue in a similar way. ETHNO elicits the next event, then elicits information about how that event depends on others. As more information is acquired, ETHNO infers some answers in order to minimize questions.

The program presents diagrams of the growing structure continuously during the elicitation, though it is not practical to illustrate it here. Figure 1 shows the diagram at the end of the data entry phase.

The number of levels (with levels corresponding to the text-filled lines) measures the length of the longest chain and thus how elaborate the incident is. Just below the name of the scene (the uppermost element) are events whose completion ends the incident--goals or terminal consequences. A pinch in a diagram (as at "Wal," at the bottom of Figure 1) pinpoints an event that is a requirement for everything that follows.

Of course, the diagram also contains complete information about implications. For example, Figure 1 reveals that once the girl has entered Grandmother's house and greeted the wolf, her cannibalism disconnects from her disrobing and from the wolf's devouring her, each of these sequences of events running its own course without help from the others. On the other hand, the wolf's killing of Grandmother is required for every terminal event except the girl's handing over bread and milk.

A logical structure like that in Figure 1 specifies which events are essential for others, and with auxiliary principles mentioned previously it also serves as a production grammar that regulates event sequence. ETHNO provides a series analysis in order to test the explanatory model, consisting of implication structure and auxiliary principles, against the recorded sequence of events and allows the analyst to correct problems until model and data fit together consistently.

ETHNO's series analysis proceeded all the way to the wolf's second query in the forest before a problem occurred. At that point ETHNO printed the following:

PROBLEM! Conditions for this event are not fulfilled. Is
Wolf approaches girl

NOT required for:

8 Wolf queries girl
Enter PGDN to skip, or number of nondependent event.

ETHNO stated the problem and immediately indicated one possible solution: withdrawing Wolf approaches girl as a prerequisite for Wolf queries girl. It was not a good solution because it ignored physical reality, but ETHNO left that judgment to the analyst, merely suggesting a change that would fix the model officially.

ETHNO then presented its second suggestion about how to fix the problem:

Can
Wolf queries girl
happen without depleting
Wolf approaches girl
(y or n)?

Wolf approaching is the only direct prerequisite for wolf querying, and the wolf did approach the girl, but ETHNO thought that the consequences got used up the first time the wolf queried her. Now ETHNO allowed that there would not be a problem if the first query did not use up the approach. If querying did not deplete approaching, then the wolf's approach was still valid as a fulfilled prerequisite for the second query. This seemed to be the sensible solution, and indeed ETHNO offered no more suggestions about how to fix the problem. I had ETHNO implement the solution by tagging the relation between the two events as nondepletive.

ETHNO snagged on the next event as well, the second occurrence of Girl answers wolf, with the following message:

PROBLEM! This event undepleted since last occurrence.
Maybe a depleting event was unrecorded:
10 Wolf precedes girl to grmo
Enter PGDN to skip, or a number above.

One of the auxiliary principles is that events are not repeated until effects from the last occurrence are used up by some consequence. The problem here was that Girl answers wolf (Ans) was repeating without an intervening occurrence of its only consequence, Wolf precedes girl to grmo (Pre).

ETHNO's first suggestion was that the data record might need to be corrected, that the consequence did occur without being recorded and that it was okay for Girl answers wolf to happen again. If this suggestion were accepted, ETHNO would insert an instance of Pre between the two cases of Ans, automatically reanalyze events from the beginning, and find no problem this time when it got to the second Ans. In actuality, though, this was not an acceptable solution to the problem, because the wolf did not run to Grandmother's and then run back to ask the second question.

ETHNO's other suggestion was as follows:

Can this event be repeated without depletion? (y or n)

Here ETHNO allowed that the auxiliary assumption that events had to be used up before repeating simply might not apply in the case of Girl answers wolf. That struck me as reasonable--she would probably keep answering the wolf as long as he kept asking her questions--so I typed a Y. Thereupon ETHNO coded Ans as repeatable and continued the analysis.
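
The two snags and their fixes can be retraced with a short sketch. This is not ETHNO's code: the function, the abbreviated event labels, and the assumption that Girl answers wolf has Wolf queries girl as its prerequisite are mine, chosen to mirror the episode described above.

```python
def first_problem(series, prereqs, nondepleting=frozenset(), repeatable=frozenset()):
    """Return (position, event, kind) for the first inconsistency, or None."""
    active = set()  # events that have occurred and not yet been used up
    for pos, event in enumerate(series):
        need = prereqs.get(event, set())
        if not need <= active:
            return pos, event, "prerequisite unfulfilled"
        if event in active and event not in repeatable:
            return pos, event, "undepleted since last occurrence"
        # Use up prerequisites, except relations tagged as nondepletive.
        active -= {p for p in need if (event, p) not in nondepleting}
        active.add(event)
    return None


prereqs = {"query": {"approach"}, "answer": {"query"}}  # second entry is assumed
series = ["approach", "query", "answer", "query", "answer"]

print(first_problem(series, prereqs))
# (3, 'query', 'prerequisite unfulfilled')
print(first_problem(series, prereqs, nondepleting={("query", "approach")}))
# (4, 'answer', 'undepleted since last occurrence')
print(first_problem(series, prereqs, nondepleting={("query", "approach")},
                    repeatable={"answer"}))
# None
```

Tagging the approach-query relation as nondepletive clears the first problem, and coding the answer as repeatable clears the second, after which the series fits the model.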

As additional problems arose, ETHNO always provided at least one viable solution. Once it was necessary to specify a commutation, a bidirectional relation. The one in the example is not very transparent (Heise, 1987), but the general idea is simple. For example, if a model included leaving Grandmother's cottage, then it would require arriving there, and arriving a second time would require leaving after the first arrival. ETHNO allows for such bidirectional relations between events on an ad hoc basis.

Processing during the series analysis changed the structure and provided a number of ad hoc adjustments. Figure 2 shows the result. Note that the printout now includes a set of statements about special relations in the model and that some of the events are tagged as repeatable (flip-flop is ETHNO's term for a bidirectional relation).

After the program confirms that events fit a model, it asks whether priorities are to be computed. Answering yes initiates a priorities analysis. ETHNO goes back through the event series, determines which events have been simultaneously possible at each point in time, and records which of the events have occurred before others. By this processing, still another kind of information is extracted from the event sequence by the program: which events have priority when more than one can happen at once.

The following is the list that I got when analyzing the Little Red Riding Hood story:

We see that Girl eats meat & wine has a higher ranking than Girl gets in bed with grandmother/wolf, which in turn ranks higher than Girl departs from grandmother's house. According to testimony provided by this story, when a little girl has the option, she will feast before getting in bed with Grandmother and she will get in bed with Grandmother rather than go home. Feasting on meat and wine got a high ranking because it occurred as soon as its prerequisites were fulfilled. As soon as this event was possible, it happened, and none of the other events that were possible at the moment took precedence. Girl leaving grandmother ranks as low as it does because it is possible anytime after the girl delivers her gift, yet one thing after another intervenes and the girl defers leaving in favor of other events until ultimately it is too late to leave at all.

Priority is supposed to correspond directly with motivation, and high-priority events--those which are selected when there is a choice--should be the preferred and valued actions. Yet the good little girl in this story shows a distinct preference for deviant and shocking acts, while her more normal behavior tends to be deferred and delayed. Meanwhile, "the big bad wolf" in this story seems to loiter with and savor relatively innocuous acts, and he puts off much heinous behavior. It is as if this folktale were revealing the appropriateness of each act for each actor by inverted orderings of priorities--an interesting hypothesis to test through analysis of other folktales.

Conclusion

Computers allow a new approach to qualitative data analysis in the social sciences. I do not mean a new way of doing tabulations or other numerical analyses involving "qualitative variables." Rather, analysts now may produce graphs for studying conceptual structures and grammars for interpreting events and for exploring potential event sequences. These models are based on symbolic logic more than on arithmetic, and they are verified nonstatistically through validation and refutation in data.

Note

David R. Heise is Professor of Sociology at Indiana University. Correspondence may be addressed to him at the Department of Sociology, Indiana University, Bloomington, IN 47405 (telephone: 812-334-7963).

References

Burt, R. S. (1976). Positions in networks. Social Forces, 55, 93-122.

Conrad, P., & Reinharz, S. (Eds.). (1984). Computers and qualitative data [Special issue]. Qualitative Sociology, 7(1, 2).

Darnton, R. (1984). The great cat massacre and other episodes in French cultural history. New York: Basic Books.

Fararo, T. J., & Skvoretz, J. (1984). Institutions as production systems. Journal of Mathematical Sociology, 10, 117-182.

Heise, D. R. (in press). Modeling event structures. Journal of Mathematical Sociology.

Ragin, C. C., Mayer, S. E., & Drass, K. A. (1984). Assessing discrimination: A Boolean approach. American Sociological Review, 49, 221-234.

Werner, O., & Schoepfle, G. M. (1987). Systematic fieldwork: Foundations of ethnography and interviewing (2 vols.). Beverly Hills, CA: Sage.

Software Cited

Heise, D. R. (1987). ETHNO. Program and documentation. Raleigh, NC: National Collegiate Software Clearinghouse, NCSU Box 8101, Raleigh, NC 27695. $23.00