Charles Goodwin's Practices of Seeing (2000)

Practices of Seeing:
Visual Analysis: An Ethnomethodological Approach

Charles Goodwin, Applied Linguistics, UCLA
Pp. 157-182 in
Handbook of Visual Analysis
edited by Theo van Leeuwen and Carey Jewitt. London: Sage Publications
© Charles Goodwin

A primordial site for the analysis of human language, cognition and action consists of a situation in which multiple participants are attempting to carry out courses of action together while attending to each other, the larger activities that their current actions are embedded within, and relevant phenomena in their surround. Vision can be central to this process.1 The visible bodies of participants provide systematic, changing displays about relevant action and orientation. Seeable structure in the environment can not only constitute a locus for shared visual attention, but can also contribute crucial semiotic resources for the organization of current action (consider for example the use of graphs and charts in a scientific discussion). For the past thirty years both Conversation Analysis and Ethnomethodology have provided extensive analysis of how human vision is socially organized. Both fields investigate the practices that participants use to build and shape in concert with each other the structured events that constitute the lifeworld of a community of actors. Phenomena investigated in which vision plays a central role range from sequences of talk, to medical and legal encounters, to scientific knowledge.
The approach taken by both ethnomethodology and conversation analysis to the study of visual phenomena is quite distinctive. At least since Saussure proposed studying langue as an analytically distinct subfield of a more encompassing science of signs, different kinds of semiotic phenomena (language, visual signs, etc.) have typically been analyzed in isolation from each other. However, in the work to be described here neither vision, nor the images or other phenomena that participants look at, are treated as coherent, self-contained domains that can be subjected to analysis in their own terms. Instead it quickly becomes apparent that visual phenomena can only be investigated by taking into account a diverse set of semiotic resources and meaning-making practices that participants deploy to build the social worlds that they inhabit and constitute through ongoing processes of action. Many of these, such as structure provided by current talk, are not in any sense visual, but the visible phenomena that the participants are attending to cannot be properly analyzed without them. The focus of analysis is thus not representations or vision per se, but instead the part played by visual phenomena in the production of meaningful action.
1 Vision is not, however, essential, as both the competence of the blind and telephone conversations demonstrate. Below it will be argued that situated action is accomplished through the juxtaposition of multiple semiotic fields, only some of which make vision relevant.
Both the methodology and the forms of analysis used in this approach can best be demonstrated through specific examples.
Gaze between Speakers and Hearers
In formulating the distinction between competence and performance Chomsky (1965: 3-4) argued that actual speech is so full of performance errors, such as sentence fragments, restarts and pauses, that both linguists and parties faced with the task of acquiring a language should ignore it. Investigating a corpus of conversation recorded on video Goodwin (1980a, 1981, Chapter 2) indeed found precisely the “false starts” and “changes of plan in mid-course” that Chomsky describes. In the following instead of producing an unbroken grammatical sentence the speaker says:2
Cathy: En a couple of girls- One other girl from there,
2 Talk is transcribed using the system developed by Gail Jefferson (see Sacks, Schegloff and Jefferson 1974: 731-733). Talk receiving some form of emphasis is marked with underlining or bold italics. Punctuation is used to transcribe intonation: A period indicates falling pitch, a question mark rising pitch, and a comma a falling contour, as would be found for example after a non-terminal item in a list. A colon indicates lengthening of the current sound. A dash marks the sudden cut-off of the current sound (in English it is frequently realized as a glottal stop). Comments (e.g., descriptions of relevant nonvocal behavior) are printed in italics within double parentheses. Numbers within single parentheses mark silences in seconds and tenths of a second. A degree sign (°) indicates that the talk that follows is being spoken with low volume. Left brackets connecting talk by different speakers mark the point where overlap begins.
However, when the video is examined it is found that the restart occurs at a specific place: precisely at the point where the speaker brings her gaze to her addressee, and finds that her addressee is looking elsewhere:
Pam: En a couple of girls- One other girl from the:re,
[Figure: gaze diagram aligned with the utterance. Labels: Speaker Brings Gaze to Recipient; Hearer Looking Away; Gaze Arrives; Hearer Starts Moving Gaze to Speaker.]
Moreover, the restart acts as a request for the hearer’s gaze. Thus immediately after the restart the hearer starts to move her gaze to the speaker.
Paradoxically, if the speaker had not produced a restart at this point she could have said something that would appear to be an unbroken grammatical unit if one examined only the stream of speech (e.g., “En a couple of girls from there ...”), but which would in fact be interactively a sentence fragment since her addressee attended to only part of it.
The identities of speaker and hearer are the most generic participant categories relevant to the production of a strip of talk. The phenomena examined here (which occur pervasively in conversation) provide evidence that the work of being a hearer in face-to-face interaction requires situated use of the body, and gaze in particular, as a way of visibly displaying to others the focus of one’s orientation. Moreover, speakers not only use their own gaze to see relevant action in the body of a silent hearer, but actively change the structure of their emerging talk in terms of what they see.
What relevance do processes such as these have to the other issue raised by Chomsky (1965: 3), that of determining “from the data of performance the underlying system of rules mastered by the speaker-hearer”? Many repairs involve the repetition, with some significant change, of something said elsewhere in the utterance:
We went t- I went to
If he could- If you could

Such repetition has the effect of delineating the boundaries and structure of many different units in the stream of speech. Thus, by analyzing what is the same and what is different in these examples one is able to discover: First, where the stream of speech can be divided into significant subunits; second, that alternatives are possible in a particular slot; third, what some of these alternatives are (here different pronouns); and fourth, that these alternatives contrast with each other in some significant fashion, or else the repair would not be warranted. Repairs in other examples not only delineate basic units in the stream of speech (noun phrases for example), but also demonstrate the different forms such units can take, and the types of operations that can be performed upon them (see Goodwin 1981: 170-173). Repairs further require that a listener learn to recognize that not all of the sequences within the stream of speech are possible sequences within the language, e.g., that “I” does not follow “to” in “We went t- I went to ...”. In order to deal with such a repair a hearer is thus required to make one of the most basic distinctions posed for anyone attempting to decipher the structure of a language: to differentiate what are and are not possible sequences in the language, that is between grammatical and ungrammatical structures. The fact that this task is posed may be crucial to any learning process. If the party attempting to learn the language did not have to deal with ungrammatical possibilities, if for example she was exposed to only well-formed sentences, she might not have the data necessary to determine the boundaries, or even the structure of the system. Chomsky’s argument that the repairs found in natural speech so flaw it that a child is faced with data of very “degenerate quality” does not appear warranted. Rather it might be argued that if a child grew up in an ideal world where she heard only well-formed sentences she would not learn to produce sentences herself because she would lack the analysis of their structure provided by events such as repair. Crucial to this process is the way in which visual phenomena, such as dispreferred gaze states, can both lead to repair, and demonstrate that the participants are in fact attending in fine detail to what might appear to be quite ephemeral structure in the stream of speech.
What has just been described provides one example of the methodologies and forms of analysis used to investigate visual phenomena within Conversation Analysis. Several observations can be made. First, the focus of analysis is not visual events in isolation, but instead the systematic practices used by participants in interaction to achieve courses of collaborative action with each other (in the present case, the interactive construction of turns at talk, and the utterances that emerge with those turns). Visual events, such as gaze, play a central role in this process but their sense and relevance is established through their embeddedness in other meaning making tasks and practices, such as the production of a strip of talk that is in fact heard and attended to by its addressee. This links vision to a host of other phenomena including language and the visible body as an unfolding locus for the display of meaning and action. Second, what the analyst seeks to do is not to provide his or her own gloss on how visual phenomena might be meaningful, but instead to demonstrate how the participants themselves not only actively orient to particular kinds of visual events (such as states of gaze), but use them as a constitutive feature of the activities they are engaged in (for example by modifying their talk in terms of what they demonstrably see). Third, in addition to the spatial dimension that is naturally associated with vision, these processes also have an intrinsic temporal dimension as changes in visual events are marked by, and lead to, ongoing changes in the organization of emerging action. If one had only a static snapshot, or measured only a single structural possibility, such as mutual gaze, instead of looking at the temporally unfolding interplay of different combinations of participant gaze, the type of analysis being pursued here would be impossible. Fourth, such analysis requires data of a particular type, specifically a record that maintains as much information as possible about the setting, embodied displays and spatial organization of all relevant participants, their talk, and how events change through time. In practice no record is completely adequate. Every camera position excludes other views of what is happening. The choice of where to place the camera is but the first in a long series of crucial analytical decisions. Despite these limitations a video or film record does constitute a relevant data source, something that can be worked with in an imperfect world.
Fifth, crucial problems of transcription are posed. The task of translating the situated, embodied practices used by participants in interaction to organize phenomena relevant to vision poses enormous theoretical and methodological problems. Our ability to transcribe talk is built upon a process of analyzing relevant structure in the stream of speech, and marking those distinctions with written symbols, that extends back thousands of years, and is still being modified today (for example the system developed by Gail Jefferson (Sacks, Schegloff and Jefferson 1974: 731-733) for transcribing the texture of talk-in-interaction, including phenomena, such as momentary restarts and sound stretches, that are crucial for the analysis being reported here). When it comes to the transcription of visual phenomena we are at the very beginning of such a process. The arrows and other symbols I’ve used to mark gaze on a transcript (see Goodwin 1981) capture only a small part of a larger complex constituted by bodies interacting together in a relevant setting. The decision to describe gaze in terms of the speaker-hearer framework is itself a major analytic one, and by no means simple, neutral description. Moreover a gazing head is embedded within a larger postural configuration, and indeed different parts of the body can simultaneously display orientation to different participants or regions (see Kendon 1990b, Schegloff 1998), creating participation frameworks of considerable complexity. Thus on occasion a transcriber wants some way of indicating on the printed page posture and alignment. In addition, not only the bodies of the participants, but also phenomena in their surround, can be crucial to the organization of their action. To try to make the phenomena I’m analyzing independently accessible to the reader so that she or he can evaluate my analysis, I’ve experimented with using transcription symbols, frame grabs, diagrams, and movies embedded in electronic versions of papers. Multiple issues are involved and no method is entirely successful. On the one hand the analyst needs materials that maintain as much of the original structure of the events being analyzed as possible, and which can be easily and repetitively replayed. On the other hand, just as a raw tape recording does not display the analysis of segmental structure in the stream of speech provided by transcription with a phonetic or alphabetic writing system, in itself a video, even one that can be embedded within a paper, does not provide an analysis of how visible events are being parsed by participants. The complexity of the phenomena involved requires multiple methods for rendering relevant distinctions (e.g., accurate transcription of speech, gaze notation, frame grabs, diagrams, etc.; see also Ochs 1979). Moreover, like the two-faced Roman god Janus, any transcription system must attend simultaneously to two separate fields, looking in one direction at how to accurately recover through a systematic notation the endogenous structure of the events being investigated, while simultaneously keeping another eye on the addressee/reader of the analysis by attempting to present relevant descriptions as clearly and vividly as possible. In many cases different stages of analysis and presentation will require multiple transcriptions. There is a recursive interplay between analysis and methods of description.
Work in Conversation Analysis has provided extensive study of how the gaze of participants toward each other is consequential for the organization of action within talk-in-interaction. Phenomena investigated include the way in which speakers change the structure of an emerging utterance, and the sentence being constructed within it, as gaze is moved from one type of recipient to another, so that the utterance maintains its appropriateness for its addressee of the moment (Goodwin 1979, 1981); how speakers modify descriptions in terms of their hearer’s visible assessment of what is being said (M.H. Goodwin 1980b); how genres such as stories are constructed not by a speaker alone, but instead through the differentiated visible displays of a range of structurally different kinds of recipients (speaker, primary addressee, principal character, etc.; see Goodwin 1984); the organization of gaze and co-participation in medical encounters (Heath 1986, Robinson 1998); the interactive organization of assessments (Goodwin and Goodwin 1987), gesture (Goodwin in press, Streeck 1993, 1994), the use of gaze in activities such as word searches (M.H. Goodwin and C. Goodwin 1986), etc. Though not strictly lodged within Conversation Analysis, the work of Kendon (1990a, 1994, 1997) on both the interactive organization of bodies as they frame states of talk, and on gesture, is central to the study of visible behavior in interaction. Haviland (1993) provides important analysis of the interactive organization of gesture within narration (for extensive analysis of gesture from a psychological perspective see McNeill 1992).
Scientific Images
The visible, gazing body, and the orientation of participants toward each other as they co-produce states of talk, is central to the work in Conversation Analysis just examined. By way of contrast, much work within Ethnomethodology has focused not on the bodies of actors, but instead on the images, diagrams, graphs and other visual practices used by scientists to construct the crucial visual working environments of their disciplines. As noted by Lynch and Woolgar (1990: 5):
Manifestly, what scientists laboriously piece together, pick up in their hands, measure, show to one another, argue about, and circulate to others in their communities are not “natural objects” independent of cultural processes and literary forms. They are extracts, “tissue cultures,” and residues impressed within graphic matrices; ordered, shaped, and filtered samples; carefully aligned photographic traces and chart recordings; and verbal accounts. These are the proximal “things” taken into the laboratory and circulated in print and they are a rich repository of “social” actions.
Despite important differences in subject matter and methodology both fields emphasize the importance of focusing not on representations or other visual phenomena as self-contained entities in their own right, but instead on how they are constructed, attended to, and used by participants as components of the endogenous activities that make up the lifeworld of a setting. Thus, in introducing their important volume on Representation in Scientific Practice, Lynch and Woolgar (1990: 11) define their inquiry as follows:
Instead of asking “what do we mean, in various contexts, by ‘representation’?” the studies begin by asking, “What do the participants, in this case, treat as representation?”
Note that what must be investigated is specified both in terms of the orientation of the participants, and with respect to the features of the relevant local setting (e.g., “in this case”). This leads to a distinctive ethnomethodological perspective on reflexivity:
“Reflexivity” in this usage means, not self-referential nor reflective awareness of representational practice, but the inseparability of a “theory” of representation from the heterogeneous social contexts in which representations are composed and used (Lynch and Woolgar 1990: 12).
In a classic article Lynch (1990: 153-154) formulates the task of analyzing scientific representations as that of describing the publicly visible “externalized retina” that is the site for the practices implicated in the social constitution of the objects that are the focus of scientific work:
This study is based on the premise that visual displays are more than a simple matter of supplying pictorial illustrations for scientific texts. They are essential to how scientific objects and orderly relationships are revealed and made analyzable. To appreciate this, we first need to wrest the idea of representation from an individualistic cognitive foundation, and to replace a preoccupation with images on the retina (or alternatively ‘mental images’ or ‘pictorial ideas’) with a focus on the ‘externalized retina’ of the graphic and instrumental fields upon which the scientific image is impressed and circulated.
Using as data images from scientific journal articles and books, Lynch describes two families of practices used to constitute the visible scientific object: “selection” and “mathematization.” Selection, illustrated through double images in which a photograph and a diagram of entities visible in the photograph are presented side by side, is described as a host of practices that iteratively transform one image of an entity into another (e.g. the photograph to the diagram) while simultaneously structuring and shaping what it is that is being represented. Crucial to this process is the fact that different selective/shaping practices, including Filtering, Uniforming, Upgrading and Defining, can be repetitively applied, creating not just a single image, but a linked, directional chain of representations. Indeed much of the work of actually doing science consists in building and shaping what Latour (1986) (see also Latour and Woolgar 1979) has called inscriptions in this fashion. “Mathematization” refers not simply to the use of numbers, but instead to the host of practices used to transform recalcitrant events into mathematically tractable visual and graphic displays, e.g., graphs, charts and diagrams. Thus an image showing a map of lizard territories is assembled through, among other operations, driving stakes into the lizards’ environment to create a grid for measurement (and thus injecting a scientifically relevant Cartesian space into the very habitat being studied), repetitively capturing lizards, distinguishing them from each other by cutting off a different pattern of toes on each lizard, recording each capture on a paper map of the staked out territory, and finally drawing lines around collections of points to create the map. As noted by Lynch (1990: 171) the product of these practices, e.g., the published map, “is a hybrid object that is demonstrably mathematical, natural and literary.” Note how in all of these cases the focus of analysis is on the contextually based practices of the participants who are assembling and using these images to accomplish the work that defines their profession.
Though emerging from psychological anthropology, rather than ethnomethodology, Hutchins’ (1995) groundbreaking study of the cognitive practices required to navigate a ship outlines a major perspective for the analysis of both images and seeing as forms of work-relevant practice. Hutchins demonstrates how the practices required to navigate a ship are not situated within the mental life of a single individual, but are instead embedded within a distributed system that encompasses visual tools such as maps and instruments for juxtaposing a landmark and compass bearing within the same visual field, and actors in structurally different positions who use alternative tools and, in part because of this, perform different kinds of cognitive operations, many of which have a strong visual component (e.g., locating landmarks, plotting positions on a map, etc.).
Images in Interaction
All of the work discussed so far takes as its point of departure for the investigation of visual phenomena the task of describing and analyzing the practices used by participants to construct the actions and events that make up their lifeworld. Rather than standing alone as a self-contained analytic domain, visual phenomena are constituted and made meaningful through the way in which they are embedded within this larger set of practices. However, within this common focus, two quite different orders of visual practice have been examined. Research in science studies has investigated the images produced by scientists, and the way in which they visually and mathematically structure the world that is the focus of their inquiry, without however looking in much detail at how scientists attend to each other as living, meaningful bodies, or structure what they are seeing through the organization of talk-in-interaction. By way of contrast, studies of the interactive organization of vision in conversation looked in considerable detail at how participants treat the visual displays of each other’s bodies as consequential, and how this is relevant to the moment-by-moment production of talk, but did not focus much analysis on images in the environment. Clearly all of the phenomena noted (the visible body, participation, gesture, the details of talk and language use, visual structure in the surround, images, maps and other representational practices, the public organization of visual practice within the worklife of a profession, etc.) are relevant. The question arises as to whether it is possible to analyze such disparate phenomena within a coherent analytic framework.
Before turning to studies that have probed such questions several issues must be noted. First, it is clearly not the case that the only acceptable analysis is one that includes this full range of all possible visual phenomena. Both participants and the structures that provide organization for action and events use visible phenomena selectively. Parties speaking over the telephone can see neither each other’s bodies nor events in a common surround. A scientific journal can be read in the absence of the parties who constructed its text and diagrams. More interestingly, within face-to-face interaction participants can continuously shift between actions that invoke, and perhaps require, gaze toward specific events in the surround, and those that make relevant gaze toward no more than each other’s bodies, and even in this more limited case there may be a real issue as to whether it is relevant to attend to everything that a body does, e.g., some gestures made by a speaker may not require gaze toward them from an addressee. There is thus an essential contingency, not only for the analyst but more crucially for the participants themselves, as to what subset of possible visual events are in fact relevant to the organization of the actions of the moment. Moreover, this means that in addition to investigating how different kinds of visible phenomena are organized, the analyst must also take into account how participants show each other what kinds of events they are expected to take into account at a particular moment, for example to indicate that a participant, gesture, or entity in the surround should be gazed at. There is thus not only communication through vision, but also ongoing communication about relevant vision (Goodwin 1981, 1986, in preparation; Streeck 1988).
Second, visual events are quite heterogeneous, not only in what they make visible, but more crucially in their structure. Consider for example the issue of temporality. Both gestures and the displays of postural orientation used to build participation frameworks are performed by the body within interaction. However, while gestures, like the bits of talk they accompany, are typically brief (e.g. they frequently fall within the scope of a single utterance) and display semantic content relevant to the topic of the moment, participation displays frame extended strips of talk and typically provide information about the participants’ orientation rather than the specifics of what is being discussed. Bodily displays with one kind of temporal duration (and information content) are thus embedded within another class of visual displays being made by the body which have a quite different structure.
Third, the structure of visual signs, including their possibilities for propagation through space and time, can be intimately tied to the medium used to construct them. A major theme of Shakespeare’s sonnets focuses on the contrast between the temporally constrained human body, condemned to inevitable decay, and the (limited) possibilities for transcending such corruption provided by language inscribed on the printed page which can remain fresh and alive long after its author and subject have passed into dust. This contrast between the temporal possibilities provided by alternative media (e.g., the body and documents) constitutes an ongoing resource for participants in vernacular settings as they build, through interaction with each other, the events that make up their lifeworld. In addition to the displays made by a fleeting gesture or local participation framework, participants also have access to images and documents which can encompass multiple interactions and quite diverse settings. This arises in part from the specific media used to constitute the signs they contain. Rather than being lodged within an ever changing human body, such documents constitute what Latour (1987: 223) has called immutable mobiles, portable material objects that can carry stable inscriptions of various types from place to place and through time.
However, despite the way in which crucial aspects of the structure of images and documents remain constant in different environments, they are not self-contained visual artifacts that can be analyzed in isolation from the processes of interaction and work practices through which they are made relevant and meaningful. The same image or document can be construed in quite different ways in alternative settings. For example, a schedule listing all arriving and departing flights was a major tool for almost all workgroups at the airport studied by the Xerox PARC workplace project (Brun-Cottan et al. 1991, Goodwin and Goodwin 1996, Suchman 1992), and indeed it linked diverse workers throughout North America into a common web of activity. However, while baggage loaders carefully structured their work to anticipate arriving flights, so that planes could be speedily unloaded, these same arrival times were almost ignored by gate agents looking at the same schedule, but concerned with the departure of passengers. Each work group highlighted the common document in ways relevant to the specific work tasks it faced. Similarly, on the oceanographic ship reported in Goodwin (1995) a map showing where samples would be taken in the Atlantic at the mouth of the Amazon was a major document at all stages of the research project. Before the ship sailed the places where samples could be taken were the focus of intense political debate between different groups of scientists and the Brazilian and American governments; after the project was completed the map provided an infrastructure for graphic displays that could be used in published journal articles to show what the scientists had found about how the waters of the Amazon and the Atlantic interacted with each other, i.e., a way of making visible relevant scientific phenomena; during the voyage itself the map not only provided a common framework for the quite different work of various teams of scientists and the crew navigating the ship, but could also be looked at by lab technicians not able to go to bed for days at a time because of the map’s incessant sampling demands, to locate places where stations were far apart and rest was possible. In brief, though the material form of images and documents gives them an extended temporal scope, and the ability to travel from setting to setting, they cannot be analyzed as self-contained fields of visually organized meaning, but instead stand in a reflexive relationship to the settings and processes of embodied human interaction through which they are constituted as meaningful entities. To explicate such events analysis must deal simultaneously with the quite different structure and temporal organization of both local embodied practice and enduring graphic displays.
Finally, the visual (and other) properties of settings structure environments that shape, on an historical time scale, the activities systematically performed within those settings. A very simple example is provided by the bridge of the oceanographic ship which not only had a window facing forward so the helmsman could steer the ship and watch for trouble, but also a window facing backwards. This was used by a winch operator who had the task of lifting heavy instrument packages in and out of the sea. Though being used here to do science, this arrangement is in fact a systematic solution to a repetitive problem faced by sailors, such as fishermen using nets, who have to maneuver heavy objects while at sea. Solutions found to these tasks, such as the rear facing window with the visual access it provides (as well as the forward window facilitating navigation), are built into the tools that constitute the work environments used by subsequent actors faced with similar tasks. See Hutchins (1995) for illuminating analysis of this process, including tools that visually structure complex mathematical calculations, as well as maps. Both work environments and many of the tools used within them (computer displays, etc.) structure in quite specific ways the embodied visual practices of those who inhabit such settings.
In an attempt to come to terms with such issues Goodwin (in press) has proposed that images in interaction are lodged within endogenous activity systems constituted through the ongoing, changing deployment of multiple semiotic fields which mutually elaborate each other. The term semiotic field is intended to focus on signs-in-their-media, i.e., the way in which what is typically being attended to are sign phenomena of various types (gestures, maps, displays of bodily orientation, etc.) which have variable structural properties that arise in part from the different kinds of materials used to make them visible (e.g., the body, talk, documents, etc.). Bringing signs lodged within different fields into a relationship of mutual elaboration produces locally relevant meaning and action that could not be accomplished by one sign system alone. Consider for example a place on a map indicated by a pointing finger which is being construed in a specific fashion by the accompanying talk. Neither the map as a whole, that is, as a self-sufficient representation, nor the pointing finger in isolation from a) its target (the spot on the map) and b) the construal being provided by the talk, nor the talk alone would be sufficient to constitute the action made visible by the conjoined use of the three semiotic fields, each of which provides resources for specifying how to relevantly see and understand the others (see the brief discussion of the Rodney King data below for a specific example; see Goodwin in preparation for more detailed analysis of pointing). The particular subset of semiotic fields available in a setting that participants orient to as relevant to the construction of the actions of the moment constitutes a contextual configuration. As interaction unfolds contextual configurations can change as new fields are added to, or dropped from, the specific mix being used to constitute the events of the moment.
Thus, as contextual configurations change there is both unfolding public semiotic structure and contingency (and indeed in some circumstances actions can misfire when addressees fail to take into account a relevant semiotic field, such as the sequential organization provided by a prior unheard utterance; see Goodwin in preparation for an example).
Professional Vision
Work settings provide one environment in which the interplay between situated, embodied interaction, and the use of visual images of different types, can be systematically investigated. In many work settings participants face the task of classifying visual phenomena in a way that is relevant to the work they are charged with performing. Frequently they must also construct different kinds of representations of visual structure in the environment that is the focus of their professional scrutiny. We will now briefly examine how such vision is socially organized in two tasks faced by archaeologists: 1) color classification and 2) map making, and then look at how such professional vision was both constructed and contested in the trial of four policemen charged with beating an African American motorist, Mr. Rodney King. The key evidence at the trial was a videotape of the beating.
Color Classification as Historically Structured Professional Practice
As part of the work involved in excavating a site, archaeologists make maps showing relevant structure in the layers of dirt they uncover. In addition to artifacts, such as stone tools, archaeologists are also interested in features, such as the remains of an old hearth or the outlines of the posts that held up a building. Such features are typically visible as color differences in the dirt being examined (e.g., the remains of a cooking fire will be blacker than the surrounding soil, and the holes used for posts will also have a different color from the soil around them). Field archaeologists thus face the task of systematically classifying the color of the dirt they are excavating. The methods they use to accomplish this task constitute a form of professional visual practice. As demonstrated by the discussion of Lynch’s analysis of scientific representation, and the brief description of the oceanographers, crucial work in many different occupations takes the form of classifying and constructing visual phenomena in ways that help shape the objects of knowledge that are the focus of the work of a profession (e.g., architects, sailors plotting courses on charts, air traffic controllers, professors making graphs and overheads for talks and classes, etc.). Such professional vision constitutes a perspicuous site for systematic study of how different kinds of phenomena intersect to organize a community’s practices of seeing.
Goodwin (1996, in press) describes how archaeologists code the color of the dirt they are excavating through use of a Munsell chart. The following shows two archaeologists performing this task, the Munsell page that they are using, and the coding form where they will record their classification:
17 Pam:  En this one. ((Points at color patch))
18       (0.4) ((Jeff moves trowel))
19 Jeff: nuhhh?
20       (1.8)
21 Pam:  Or that one? ((Points at color patch))
Within this scene are a number of different kinds of phenomena relevant to the organization of visual practice, including tools that structure the process of seeing and classification, and documents that organize cognition and interaction in the current setting while linking these processes to larger activities and other settings. These archaeologists are intently examining the color of a tiny sample of dirt because they have been given a coding form to fill out. That form ties their work at this site to a range of other settings, such as the offices and lab of the senior investigator, where the form being filled in here will eventually become part of the permanent record of the excavation, and a component of subsequent analysis. The multivocality of this form, the way in which it displays on a single surface the actions of multiple actors in structurally different positions, is shown visually in vivid fashion by the contrast between the printed coding categories, and the hand written entries of the field workers.
The use of a coding form such as this to organize the perception of nature, events, or people within the discourse of a profession carries with it an array of perceptual and cognitive operations that have far reaching impact. Coding schemes distributed on forms allow a senior investigator to inscribe his or her perceptual distinctions into the work practices of the technicians who code the data. By using such a system a worker views the world from the perspective it establishes. Of all the possible ways that the earth could be looked at, the perceptual work of field workers using this form is focused on determining the exact color of a minute sample of dirt. They engage in active cognitive work, but the parameters of that work have been established by the classification system that is organizing their perception. In so far as the coding scheme establishes an orientation toward the world, a work-relevant way of seeing, it constitutes a structure of intentionality whose proper locus is not the isolated, Cartesian mind, but a much larger organizational system, one that is characteristically mediated through mundane bureaucratic documents such as this form.
Rather than standing alone as self-explicating textual objects, forms are embedded within webs of socially organized, situated practices. In order to make an entry in the slot provided for color an archaeologist must make use of another tool, the set of standard color samples provided by a Munsell chart. This chart incorporates into a portable physical object the results of a long history of scientific investigation of the properties of color.
The Munsell chart being used by the archaeologists contains not just one, but three different kinds of sign systems for describing each point in the color space it provides: 1) a set of carefully controlled color samples arranged in a grid to demonstrate the changes that result from systematic variation of the variables of Hue, Chroma and Value used to define each color (each page displays an ordered set of Value and Chroma variables for a single hue); 2) numeric coordinates for each row and column, the intersection of which specifies each square as a pair of numbers (e.g., 4/6 on the 10YR Hue page); and 3) standard color names such as “dark yellowish brown” (these names are on the left facing page which is not reproduced here). Moreover these systems are not precisely equivalent to each other. For example several color squares can fall within the scope of a single name.
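The relationship between the numeric coordinates and the color names can be made concrete with a minimal sketch. It assumes Python, and the class `MunsellSquare`, the table `COLOR_NAMES` and the function `squares_named` are hypothetical names introduced only for illustration; the two-entry name table is likewise an assumed excerpt, not a transcription of the page the archaeologists are using. The sketch shows only that the written coordinates pick out one square while a single name can cover several squares. The color patch itself, the first of the three sign systems, is precisely what such a written form cannot carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MunsellSquare:
    """One color square, addressed by the chart's numeric coordinates."""
    hue: str     # hue page, e.g. "10YR"
    value: int   # first number of the coordinate pair
    chroma: int  # second number of the coordinate pair

    def notation(self) -> str:
        # Written form, following the standard "Hue Value/Chroma" order.
        return f"{self.hue} {self.value}/{self.chroma}"


# Hypothetical excerpt of a name table, assumed for illustration only.
COLOR_NAMES = {
    MunsellSquare("10YR", 4, 4): "dark yellowish brown",
    MunsellSquare("10YR", 4, 6): "dark yellowish brown",
}


def squares_named(name: str) -> list[str]:
    """Return the notation of every square that falls under one name."""
    return sorted(sq.notation() for sq, n in COLOR_NAMES.items() if n == name)


if __name__ == "__main__":
    print(MunsellSquare("10YR", 4, 6).notation())
    print(squares_named("dark yellowish brown"))
```

The coordinates identify a unique square, whereas the lookup by name returns more than one square, which is one way of seeing why the systems are not precisely equivalent.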
Why does the Munsell page contain multiple, overlapping representations of what is apparently the same visual entity (e.g., a particular choice within a larger set of color categories)? The answer seems to lie in the way that each representation as a semiotic field with its own distinctive properties makes possible alternative operations and actions, and thus fits into different kinds of activities. Both the names and numbered grid coordinates can be written, and thus easily transported from the actual excavation to the other work sites, such as laboratories and journals, that constitute archaeology as a profession. The numbers provide the most precise description, and do not require translation from language to language. However locating the color indexed by the coordinates requires that the classification be read with a Munsell book at hand. By way of contrast the color names can be grasped in a way that is adequate for most practical purposes by any competent speaker of the language used to write the report. The outcome of the activity of color classification initiated by the empty square on the coding form is thus a set of portable linguistic objects that can easily be incorporated into the unfolding chains of inscription that lead step by step from the dirt at the site to reports in the archaeological literature. However, as arbitrary linguistic signs produced in a medium that does not actually make visible color, neither the color names nor the numbers allow direct visual comparison between a sample of dirt and a reference color. This is precisely what the color patches and viewing holes make possible.
In brief, rather than simply specifying unique points in a larger color space, the Munsell chart is used in multiple overlapping activities (comparing a reference color and a patch of dirt as part of the work of classification, transporting those results back to the lab, comparing samples, publishing reports, etc.), and thus represents the “same” entity, a particular color, in multiple ways, each of which makes possible different kinds of operations because of the unique properties of each representational system.
In addition to its various sign systems it also contains a set of circular holes, positioned so that one is adjacent to each color patch. To classify color the archaeologist puts a small sample of dirt on the tip of a trowel, puts the trowel directly under the Munsell page and then moves it from hole to hole until the best match with an adjacent color sample is found. With elegant simplicity the Munsell page with its holes for viewing the sample of dirt on the trowel juxtaposes in a single visual field two quite different kinds of spaces: 1) actual dirt from the site at the archaeologists’ feet is framed by 2) a theoretical space for the rigorous, replicable classification of color. The latter is both a conceptual space, the product of considerable research into properties of color, and an actual physical space instantiated in the orderly modification of variables arranged in a grid on the Munsell page. The pages juxtaposing color patches and viewing holes that allow the dirt to be seen right next to the color sample provide an historically constituted architecture for perception, one that encapsulates in a material object theory and solutions developed by earlier workers at other sites faced with the task of color classification. By juxtaposing unlike spaces, but ones relevant to the accomplishment of a specific cognitive task, the chart creates a new, distinctively human, kind of space. It is precisely here, as bits of dirt are shaped into the work relevant categories of a specific social group, that “nature” is transformed into culture.
How are the resources provided by the chart made visible and relevant within talk-in-interaction? At line 17 Pam moves her hand to the space above the Munsell chart and points to a particular color patch while saying “En this one.” Within the field of action created by the activity of color classification, what Pam does here is not simply an indexical gesture, but a proposal that the indicated color might be the one they are searching for. By virtue of such conditional relevance (Schegloff 1968) it creates a new context in which a reply from Jeff is the expected next action. In line 19 Jeff rejects the proposed color. His move occurs after a noticeable silence in line 18. However that silence is not an empty space, but a place occupied by its own relevant activity. Before a competent answer to Pam’s proposal in line 17 can be made, the dirt being evaluated has to be placed under the viewing hole next to the sample she indicated, so that the two can be compared. During line 18 Jeff moves the trowel to this position. Because of the spatial organization of this activity, specific actions have to be performed before a relevant task, a color comparison, can be competently performed. In brief, in this activity the spatial organization of the tools being worked with, and the sequential organization of talk in interaction interact with each other in the production of relevant action (e.g., getting to a place where one can make an expected answer requires rearrangement of the visual field being scrutinized so that the judgment being requested can be competently performed). Here socially organized vision requires embodied manipulation of the environment being scrutinized.
It is common to talk about structures such as the Munsell chart as “representations.” However exclusive focus on the representational properties of such structures can seriously distort our understanding of how such entities are embedded within the organization of human practice. With its viewholes for scrutinizing samples, the page is not simply a perspicuous representation of current knowledge about the organization of color, but a space designed for the ongoing production of particular kinds of action.
We will now look at how a group of archaeologists make a map. This process will allow us to examine the interface between seeing, writing practices, talk, human interaction and tool use (see Goodwin 1994 for more detailed analysis).
Map Making and the Practices of Seeing it Requires
Maps are central to archaeological practice. The professional seeing required to produce and make use of a visual document, such as a map, encompasses not only the image itself but also the ability to competently see relevant structure in the territory being mapped, mastery of appropriate tools, and on occasion the ability to analyze the work-relevant actions of another’s body. These different kinds of phenomena can be brought together within the temporally unfolding process of human interaction used to accomplish the activity of making a map. In the following, two archaeologists are making a map to record what they have found in a profile of the dirt on the side of one of the square holes they have excavated. Before actually setting pen to paper some relevant events in the dirt, such as the boundary between two different kinds of soil, are highlighted by outlining them with the tip of a trowel. The structure visible in the dirt is then mapped on a sheet of graph paper. Typically this task is done by two participants working together. One uses a pair of rulers (one laid horizontally on the surface, and the other a hand held tape measure used to measure depth beneath the surface) to measure the length and depth coordinates of the points in the dirt that are to be transferred to the map, and then speaks these coordinates as pairs of numbers (e.g., “at fifteen three point two”). The second person plots the points specified on the graph paper, and draws lines between successive measurements. What we find here is a small activity system that encompasses talk, writing, tools and distributed cognition as two parties collaborate to inscribe events they see in the earth onto paper. Here Ann, the party drawing the map, is the senior archaeologist at the site, and Sue, the person making measurements, is her student:
[Figure label: Line Drawn Surface With Trowel]

1-2 Ann: Give me the ground surface over here to about ninety.
4   Ann: No- No- Not
5        at ninety. From you to about ninety.
7   Sue: °Oh
8   Ann: Wherever there's a change in slope.
10  Sue: (0.6) °Mm kay.
11  Ann: See so if its fairly flat
12       I'll need one
13       where it stops being fairly flat.
14  Sue: Okay.
15  Ann: Like right there.
The sequence to be examined begins with a directive. Ann, the writer, tells Sue, the measurer, to “Give me the ground surface over here to about ninety.” However before Sue has produced any numbers, indeed before she has said anything whatsoever, Ann in lines 4 and 5 challenges her, telling her that what she is doing is wrong: “No- No- Not at ninety. From you to about ninety.”
Directives are a classic form of speech action that sociolinguists have used to probe the relationship between language and social structure, and in particular issues of power and gender. Here Ann formats both her directive and her correction in very strong, direct “aggravated” fashion. No forms of mitigation are found in either utterance, and Sue is not given an opportunity to find and correct the trouble on her own. Directives formatted in this fashion have frequently been argued to display a hierarchical relationship, i.e., Ann is treating Sue as someone that she can give direct, unmitigated orders to. And indeed Ann is a professor and Sue is her student.
Issues of power do not however exhaust the social phenomena visible in this sequence. Equally important are a range of cognitive processes that are as socially organized as the relationships between the participants. For example, in that Sue has not produced an answer to the directive, how can Ann see that there is something wrong with a response that has not even occurred yet? Crucial to this process is the phenomenon of conditional relevance first described by Schegloff (1968). Basically a first utterance creates an interpretive environment that will be used to analyze whatever occurs after it. Here no subsequent talk has yet been produced. However, providing an answer in this activity system encompasses more than talk. Before speaking the set of numbers that counts as a proper next bit of talk, Sue must first locate a relevant point in the dirt and measure its coordinates. Both her movement through space, and her use of tools such as a tape measure, are visible events. As Ann finishes her directive Sue is holding the tape measure against the dirt at the left or zero end of the profile. However, just after hearing “ninety” Sue moves both her body and the tape measure to the right, stopping near the “90” mark on the upper ruler. By virtue of the field of interpretation opened up through conditional relevance, Sue’s movement and tool use can now be analyzed by Ann as elements of the activity she has been asked to perform, and found wanting. Sue has moved immediately to ninety instead of measuring the relevant points between zero and ninety. The sequential framework created by a directive in talk thus provides resources for analyzing and evaluating the visible activity of an addressee’s body interacting with a relevant environment.
Additional elements of the cognitive operations and kinds of seeing that Ann requires from Sue in order to make her measurements are revealed as the sequence continues to unfold. Making the relevant measurements presupposes the ability to locate where in the dirt measurements should be made. However Sue’s response calls this presupposition into question and leads to Ann telling her explicitly, in several different ways, what she should look for in order to determine where to measure. After Ann tells Sue to measure points between zero and ninety, Sue does not immediately move to points in that region but instead hesitates for a full second before replying with a weak “°Oh” (line 7). Ann then tells her what she should be looking for: “Wherever there’s a change in slope” (line 8). This description of course presupposes Sue’s ability to find in the dirt what will count as “a change in slope.” Sue again moves her tape measure far to the right. At this point, instead of relying upon talk alone to make explicit the phenomena that she wants Sue to locate, Ann moves into the space that Sue is attending to and points to one place that should be measured while describing more explicitly what constitutes a change in slope: “See so if it’s fairly flat I’ll need one where it stops being fairly flat like right there.”
One of the things that is occurring within this sequence is a progressive expansion of Sue’s understanding as the distinctions she must make to carry out the task assigned to her are explicated and elaborated. In this process of socialization through language there is a growth in intersubjectivity as domains of ignorance that prevent the successful accomplishment of collaborative action are revealed and transformed into practical knowledge, a way of seeing, that is sufficient to get the job at hand done, such that Sue is finally able to understand what Ann is asking her to do (that is to see the scene in front of her in a manner that permits her to make an appropriate, competent response to the directive). It would however be wrong to see the unit within which this intersubjectivity is lodged as simply these two minds coming together in the work at hand. Instead the distinction being explicated, the ability to see, in the very complex perceptual field provided by the landscape they are attending to, those few events that count as points to be transferred to the map, is central to what it means to see the world as an archaeologist, and to use that seeing to build the artifacts, such as this map, which are constitutive of archaeology as a profession. Such seeing would be expected of any competent archaeologist. It is an essential part of what it means to be an archaeologist, and it is these professional practices of seeing that Sue is being held accountable to. The relevant unit for the analysis of the intersubjectivity at issue here (the ability of separate individuals to see a common scene in a congruent, work-relevant fashion) is thus not these individuals as isolated entities, but instead archaeology as a profession, a community of competent practitioners, most of whom have never met each other, but who nonetheless expect each other to be able to see and categorize the world in ways that are relevant to the work, scenes, tools and artifacts that constitute their profession.
The phenomena examined so far provide some demonstration of how what is to be seen in a map, scene, human body or image stands in a reflexive relationship to other semiotic structures that participants are using to constitute visual phenomena as a relevant component of the events and activities that make up their lifeworld. These structures include language, the constitution of action and context provided by sequential organization, and ways of seeing events and using images of different types that are lodged within the practices of particular social communities, such as the profession of archaeology.
Professional Vision in Court
Parties who are not competent members of relevant social communities can lack the ability, and/or the social positioning, to see and articulate visual events in a consequential way. These issues were made dramatically visible in the trial of four Los Angeles policemen who were recorded on videotape administering a beating to an African American motorist, Mister Rodney King, whom they had stopped after a high speed pursuit triggered by a traffic violation. When the tape of the beating was shown on national television there was outrage, and even the head of the Los Angeles police department thought that conviction of the officers was almost automatic. However, at their first trial (they were later tried again in Federal rather than state court for violating Mister King’s civil rights) all four policemen were acquitted, a verdict that triggered an uprising in the city of Los Angeles, with neighborhoods being burned, federal troops being called in, etc. The crucial evidence at the trial was a visual document: the videotape of the beating. Rather than transparently proving the guilt of the policemen who were seen on it beating a man lying prone on the ground, the tape in fact provided the policemen’s lawyers with their evidence for convincing the jury that their clients were not guilty of any wrongdoing. They did this by using language, pointing and expert testimony to structure how the jury saw the events on the tape in a way that exonerated the policemen. In essence they used the tape of the beating to demonstrate that Mr. King was the aggressor, not the policemen, and that the policemen were following proper police practice for subduing a violent, dangerous suspect (see Goodwin 1994 for more detailed analysis of such professional vision). Crucial to their success was their use of another policeman, Sergeant Duke, as an expert witness. It was argued that laymen could not properly see the events on the tape. Instead, the ability to legitimately see what the body of a suspect was doing, such as Mr. King’s as he lay on the ground being beaten, and specifically whether the suspect was being aggressive or compliant, was lodged within the work practices of the social group charged with arresting suspects: the police. The ability to see such a body, and code it in terms of its aggressiveness, was a component of the professional practices that policemen use to code the events that are the focus of their work. In so far as such vision is a public component of the work practices of a particular social group, someone who wasn’t present but who is a member of the profession, a policeman, can make authoritative statements about what can be legitimately seen on the tape.
However, while policemen constitute a socially organized profession, suspects and victims of beatings don’t. Therefore there is no one with the social standing, i.e., membership and mastery of the practices of a relevant social group, to act as an expert witness to articulate what was happening from Mister King’s perspective.
What was to be seen on the tape was structured through the way in which different semiotic fields, such as structure in the stream of speech, pointing which highlighted specific places and phenomena in the image being looked at, and events in the image itself, mutually elaborated each other to provide a construal of events that served the purposes of the party articulating the image. The following provides an example. At the point where we enter this sequence the prosecutor has noted that Mr. King appears to be moving into a position appropriate for handcuffing him, and that one officer is in fact reaching for his handcuffs, i.e., the suspect is being cooperative.
 1 Prosecutor: So uh would you,
 2             again consider this to be:
 3             a nonagressive, movement by Mr. King?
 4 Sgt. Duke:  At this time no I wouldn't. (1.1)
 5 Prosecutor: It is aggressive.
 6 Sgt. Duke:  Yes. It's starting to be. (0.9)
 7             This foot, is laying flat, (0.8)
 8             starting to be a bend. in uh (0.6)
 9             this leg (0.4)
10             in his butt (0.4)
11             The buttocks area has started to rise. (0.7)
12             which would put us,
13             at the beginning of our spectrum again.

[Symbol not reproduced] indicates that Sgt. Duke is pointing on the screen at the body part described in his talk.
By noting the submissive elements in Mr. King’s posture, and the fact that one of the officers is reaching for his handcuffs, the prosecutor has shown that the tape demonstrates that Mr. King is being cooperative. If he can establish this point, hitting Mr. King again would be unjustified, and the officers should be found guilty of the crimes they are charged with. The contested vision being debated here has very high stakes.
To rebut the vision proposed by the prosecutor, Sgt. Duke uses the semantic resources provided by language to code as aggressive extremely subtle body movements of a man lying face down beneath the officers (lines 7-11). Note for example not only line 13’s explicit placement of Mr. King at the very edge, the beginning, of an aggressive spectrum introduced in earlier testimony, but also how very small movements are made much larger by situating them within a prospective horizon through the repeated use of “starting to” (lines 6, 8, 13). The events visible on the tape are structured, enhanced and amplified by the language used to describe them.
This focusing of attention organizes the perceptual field provided by the videotape into a salient figure, the aggressive suspect, who is highlighted against an amorphous background containing nonfocal participants, the officers doing the beating. Such structuring of the materials provided by the image is accomplished not only through talk, but also through gesture. As Sergeant Duke speaks he brings his hand to the screen and points to the parts of Mr. King’s body that he is arguing display aggression. The pointing gesture and the perceptual field which it is articulating mutually elaborate each other. The touchable events on the television screen provide visible evidence for the description constructed through talk. What emerges from Sgt. Duke’s testimony is not just a statement, a static category, but a demonstration built through the active interplay between coding scheme and the image to which it is being applied. As talk and image mutually enhance each other a demonstration that is greater than the sum of its parts emerges, while simultaneously Mr. King, rather than the police officers becomes the focus of attention as the expert’s finger articulating the image delineates what is relevant within it.
By virtue of the category system erected by the defense, the minute rise in Mr. King’s buttocks noted on the tape unleashes a cascade of perceptual inferences that have the effect of exonerating the officers. A rise in Mr. King’s body becomes interpreted as aggression, which in turn justifies the escalation of force. Like other parties, such as the archaeologists, faced with the task of coding a visual scene, the jury was led to engage in intense, minute cognitive scrutiny as they looked at the tape of the beating to decide the issues at stake in the case. However, once the defense coding scheme is accepted as a relevant framework for looking at the tape, the operative perspective for viewing it is no longer a layperson’s reaction to a man lying on the ground being beaten, but instead a micro-analysis of the movements being made by that man’s body to see if it is exhibiting aggression.
In the first trial, though the prosecution disputed the analysis of specific body movements as displays of aggression, the relevance of looking at the tape in terms of such a category system was not challenged. A key difference in the second trial, which led to the conviction of two of the officers, was that there the prosecution gave the jury alternative frameworks for interpreting the events on the tape. These included ways of seeing the movements of Mr. King’s body that Sgt. Duke highlighted as normal reactions of a man to a beating rather than as displays of incipient aggression. In the prosecution’s argument Mr. King “cocks his leg,” not in preparation for a charge, but because his muscles naturally jerk after being hit with a metal club.
The study of the practices used to structure relevant vision in scientific and workplace environments, what Hutchins (1995) has called Cognition in the Wild, has become the focus of considerable research. A major initiative for such studies was provided by Lucy Suchman in the early 1990s when she initiated the Workplace Project while at Xerox PARC. The site chosen for research was ground operations at a mid-sized airport. Documents and images of many different types, and the ability of actors in alternative structural positions to see and analyze events in relevant ways, were crucial to the work of the airport. Phenomena that received extensive study included work-relevant seeing of documents, airplanes and events (Goodwin and Goodwin 1996), the constitution of shared workspaces (Suchman 1996), the study of how a common document coordinated different kinds of work in different work settings, and the practices involved in seeing and shaping phenomena in collaborative work (Suchman and Trigg 1993, Brun-Cottan 1991). In part because of the central role played by visual phenomena in the work being analyzed, the project’s final report was submitted as a videotape (Brun-Cottan et al. 1991). Subsequent analysis growing from this project has focused on the organization of both documents and visual phenomena in a range of occupational settings, such as law firms and the work of architects. In England Christian Heath and his collaborators have investigated the structuring of vision within interaction in a range of settings including the control room for the London Underground, centers for the production of electronic news, art classes, etc. (Heath and Luff 1992, 1996, Heath and Nicholls 1997). In much of this research there is a focus on how core practices for the organization of talk, reference, gesture and other phenomena central to the production of action within human interaction can encompass not only talk but also embodiment in a world populated by work-relevant objects. Hindmarsh and Heath (in press) investigate reference within such a framework. LeBaron (1998), Streeck (1996) and LeBaron and Streeck (in press) examine how gesture emerges from the interaction of working hands acting in the world in settings such as architects’ meetings and auto body shops. Robinson (1998) has provided analysis of how participants in medical interviews organize their interaction by attending to how gaze is shifted from other participants to relevant visual materials in the setting, such as medical records. Whalen (1995) analyzed how the talk of operators responding to emergency 911 calls was organized in part by the task of filling in required information on a computer screen with a specific visual organization. Rogers Hall and Reed Stevens have investigated visual practices in a range of school, scientific and occupational settings (Hall and Stevens 1995; Stevens and Hall in press). Research in Computer Supported Cooperative Work has focused specifically on new forms of visual access created by electronic media. Heath (in press) and Heath and Luff (1993) have done considerable research on interaction mediated through video, demonstrating the crucial ways in which resources available to parties who are actually co-present to each other are not available in media spaces. Yamazaki and his colleagues have explored the systematic problems that arise when particular kinds of directives, such as instructions for how to use CPR to start a heart attack victim’s heart, are given through talk alone, for example over the phone, without access to a relevant visual environment. Patients usually die, since the novice is not able to place his or her hands at the appropriate spot on the patient’s body.
To remedy some of these issues, technologies that incorporate basic resources available for doing reference in face-to-face interaction, such as pointing, have been developed. These include a remote controlled car with a laser that has the ability to move while clearly marking the specific places being pointed at in a remote environment (Yamazaki et al. 1999). Nishizaka (in press) has investigated how participants coordinate gaze both spatially and temporally on electronic documents such as computer screens. Kawatoko (in press) has investigated how lathe workers organize perceptual fields so as to make visible the invisible movements of their cutting tools. Both Kawatoko and Ueno (in press) have examined the organization of vision on many different levels (from documents to systematic placement of objects on the warehouse floor as part of its work flow) in the work practices of a large warehouse. In all of this work, practices for seeing relevant phenomena are systematically embedded within processes of social organization, structures of mutual accountability and the organization of activity.
Within both Conversation Analysis and Ethnomethodology visual phenomena have been analyzed by investigating how they are made meaningful by being embedded within the practices that participants in a variety of settings use to construct the events and actions that make up their lifeworld. This has led to the detailed study of a range of quite different kinds of phenomena, from the interplay between gaze, restarts and grammar in the building of utterances within conversation, to the construction and use of visual representations in scientific practice, to how the ability of lawyers to shape what can be seen in the videotape of policemen beating a suspect can contribute to disruption of the body politic that leaves a city in flames, to the part played by visual practices in both traditional and electronic workplaces. Visual phenomena that have received particular attention include 1) the body as a visible locus for displays of intentional orientation through both gaze and posture; 2) the body as a locus for a variety of different kinds of gesture, from iconic elaboration on what is being said in the stream of speech, to pointing, to the hand as an agent engaged with the world around it; 3) visual documents of many different types used in both scientific practice and the workplace, e.g., maps, graphs, Munsell charts, coding forms, schedules, television screens providing access to distant sites, architectural drawings, computer screens, etc.; and 4) material structure in the environment where action and interaction are situated. This perspective brings together within a common analytic framework both the details of how the visible body is used to build talk and action in moment to moment interaction, and the way in which historically structured visual images and features of a setting participate in that process. Rather than standing alone as self-contained, self-explicating images, visual phenomena become meaningful through the way in which they help elaborate, and are elaborated by, a range of other semiotic fields (sequential organization, structure in the stream of speech, encompassing activities, etc.) that are being used by participants to both construct and make visible to each other relevant actions. The focus of analysis is always on how the participants in a setting themselves display a consequential orientation to visual phenomena (e.g., by shifting gaze after a restart, focusing their work on a Munsell chart, building images as a core component of the practices used to make visible scientific phenomena, etc.). A variety of different methodologies are employed. However, a basic component of many research projects is going to the site where the activities being investigated are actually performed, and examining what the participants are doing there as carefully as possible. Videotape records are frequently most useful because of the way in which they preserve limited but crucial aspects of the spatial and environmental features of a setting, the temporally unfolding organization of talk, the visible displays of participants’ bodies, and changes in relevant phenomena in the setting as relevant courses of action unfold. Analysis typically requires not only viewing the tape, ethnographic records and documents collected in a setting, but also the construction of new visual representations such as transcripts of many different types (note how some in this paper incorporate both detailed transcription of the talk and a variety of different kinds of graphic representations). While this analysis sheds much important new light on how visual phenomena are organized through systematic discursive practice, it is not restricted to vision per se but is instead investigating the more general practices used to build action within situated human interaction.
References Cited
Brun-Cottan, Françoise
1991 Talk in the Work Place: Occupational Relevance. Research on Language and Social Interaction 24:277-295.
Brun-Cottan, Françoise, et al.
1991 The Workplace Project: Designing for Diversity and Change. Video produced by Xerox Palo Alto Research Center.
Chomsky, Noam
1965 Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press.
Goodwin, Charles
1979 The Interactive Construction of a Sentence in Natural Conversation. In Everyday Language: Studies in Ethnomethodology. George Psathas, ed. Pp. 97-121. New York: Irvington Publishers.
Goodwin, Charles
1980a Restarts, Pauses, and the Achievement of Mutual Gaze at Turn-Beginning. Sociological Inquiry 50:272-302.
Goodwin, Charles
1981 Conversational Organization: Interaction Between Speakers and Hearers. New York: Academic Press.
Goodwin, Charles
1984 Notes on Story Structure and the Organization of Participation. In Structures of Social Action. Max Atkinson and John Heritage, eds. Pp. 225-246. Cambridge: Cambridge University Press.
Goodwin, Charles
1986 Gesture as a Resource for the Organization of Mutual Orientation. Semiotica 62(1/2):29-49.
Goodwin, Charles
1994 Professional Vision. American Anthropologist 96(3):606-633.
Goodwin, Charles
1995 Seeing in Depth. Social Studies of Science 25:237-274.
Goodwin, Charles
1996 Practices of Color Classification. Ninchi Kagaku (Cognitive Studies: Bulletin of the Japanese Cognitive Science Society) 3(2):62-82.
Goodwin, Charles
in preparation Pointing as Situated Practice. In Pointing: Where Language, Culture and Cognition Meet. Sotaro Kita, ed.
Goodwin, Charles
in press Action and Embodiment Within Situated Human Interaction. Journal of Pragmatics.
Goodwin, Charles, and Marjorie Harness Goodwin
1987 Concurrent Operations on Talk: Notes on the Interactive Organization of Assessments. IPrA Papers in Pragmatics 1, No.1:1-52.
Goodwin, Charles, and Marjorie Harness Goodwin
1996 Seeing as a Situated Activity: Formulating Planes. In Cognition and Communication at Work. Yrjö Engeström and David Middleton, eds. Pp. 61-95. Cambridge: Cambridge University Press.
Goodwin, Marjorie Harness
1980b Processes of Mutual Monitoring Implicated in the Production of Description Sequences. Sociological Inquiry 50:303-317.
Goodwin, Marjorie Harness, and Charles Goodwin
1986 Gesture and Coparticipation in the Activity of Searching for a Word. Semiotica 62(1/2):51-75.
Hall, Rogers, and Reed Stevens
1995 Making Space: A Comparison of Mathematical Work in School and Professional Design Practices. Sociological Review :118-275.
Haviland, John B.
1993 Anchoring, iconicity, and orientation in Guugu Yimidhirr pointing gestures. Journal of Linguistic Anthropology 3(1):3-45.
Heath, Christian
1986 Body Movement and Speech in Medical Interaction. Cambridge: Cambridge University Press.
Heath, Christian
in press Virtual Looking: Spatial Transformation and Communicative Asymmetries. In Proceedings of the Colloquium on the Semiotics of Space. P. Pelligrino, ed. Geneva: University of Geneva.
Heath, Christian, and Paul Luff
1993 Disembodied Conduct: Interactional Asymmetries in Video-Mediated Communication. In Technology in Working Order. Graham Button, ed. Pp. 35-54. London and New York: Routledge.
Heath, Christian, and Paul Luff
1996 Convergent Activities: Line Control and Passenger Information on the London Underground. In Cognition and Communication at Work. Yrjö Engeström and David Middleton, eds. Pp. 96-129. Cambridge: Cambridge University Press.
Heath, Christian, and Gillian Nicholls
1997 Animating Texts: Selective Readings of News Stories. In Discourse, Tools and Reasoning: Essays on Situated Cognition. Lauren B. Resnick, Roger Säljö, Clotilde Pontecorvo, and Barbara Burge, eds. Pp. 63-86. Berlin, Heidelberg, New York: Springer.
Heath, Christian C., and Paul K. Luff
1992 Crisis and Control: Collaborative Work in London Underground Control Rooms. Journal of Computer Supported Cooperative Work 1(1):24-
Hindmarsh, Jon, and Christian Heath
in press The Interactional Practice of Reference. Journal of Pragmatics.
Hutchins, Edwin
1995 Cognition in the Wild. Cambridge MA: MIT Press.
Kawatoko, Yasuko
in press Organizing Multiple Vision. Mind, Culture and Activity.
Kendon, Adam
1990a Conducting Interaction: Patterns of Behavior in Focused Encounters. Cambridge: Cambridge University Press.
Kendon, Adam
1990b Spatial Organization in Social Encounters: The F-Formation System. In Conducting Interaction: Patterns of Behavior in Focused Encounters. Adam Kendon, ed. Pp. 209-238. Cambridge: Cambridge University Press.
Kendon, Adam
1994 Introduction to the Special Issue: Gesture and Understanding in Social Interaction. Research on Language and Social Interaction 27(3):171-174.
Kendon, Adam
1997 Gesture. Annual Review of Anthropology 26:109-128.
Latour, Bruno
1986 Visualization and Cognition: Thinking with Eyes and Hands. Knowledge and Society: Studies in the Sociology of Culture Past and Present 6:1-40.
Latour, Bruno
1987 Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, MA: Harvard University Press.
Latour, Bruno, and Steve Woolgar
1979 Laboratory Life: The Social Construction of Scientific Facts. London: Sage.
LeBaron, Curtis
1998 Building Communication: Architectural Gestures and the Embodiment of Ideas. Dissertation submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, The University of Texas at Austin.
LeBaron, Curtis D., and Jürgen Streeck
in press Gestures, Knowledge, and the World. In Gestures in Action, Language, and Culture. David McNeill, ed. Cambridge: Cambridge University Press.
Lynch, Michael
1990 The Externalized Retina: Selection and Mathematization in the Visual Documentation of Objects in the Life Sciences. In Representation in Scientific Practice. Michael Lynch and Steve Woolgar, eds. Pp. 153-186. Cambridge MA: MIT Press.
Lynch, Michael, and Steve Woolgar
1990 Introduction: Sociological Contributions to Representational Practice in Science. In Representation in Scientific Practice. Michael Lynch and Steve Woolgar, eds. Pp. 1-18.
McNeill, David
1992 Hand & Mind: What Gestures Reveal about Thought. Chicago: University of Chicago Press.
Nishizaka, Aug
in press Seeing What One Sees: Perception, Emotion and Activity. Mind, Culture and Activity.
Ochs, Elinor
1979 Transcription as Theory. In Developmental Pragmatics. Elinor Ochs and Bambi B. Schieffelin, eds. Pp. 43-72. New York: Academic Press.
Robinson, Jeffrey David
1998 Getting Down to Business: Talk, Gaze, and Body Orientation During Openings of Doctor-Patient Consultations. Human Communication Research 25(1):97-123.
Sacks, Harvey, Emanuel A. Schegloff, and Gail Jefferson
1974 A Simplest Systematics for the Organization of Turn-Taking for Conversation. Language 50:696-735.
Schegloff, Emanuel A.
1968 Sequencing in Conversational Openings. American Anthropologist 70:1075-1095.
Schegloff, Emanuel A.
1998 Body Torque. Social Research 65(3):535-596.
Stevens, Reed, and Rogers Hall
in press Disciplined Perception: Learning to see in Technoscience. In Mathematical Talk and Classroom Learning: What, Why and How. Magdalene Lampert and Merrie Blunk, eds. Cambridge: Cambridge University Press.
Streeck, Jürgen
1988 The Significance of Gesture: How it is Established. IPrA Papers in Pragmatics 2(1):60-83.
Streeck, Jürgen
1993 Gesture as Communication I: Its Coordination with Gaze and Speech. Communication Monographs 60(4):275-299.
Streeck, Jürgen
1994 Gestures as Communication II: The Audience as Co-Author. Research on Language and Social Interaction 27(3):223-238.
Streeck, Jürgen
1996 How to Do Things with Things. Human Studies 19:365-384.
Suchman, Lucy
1992 Technologies of Accountability: Of Lizards and Airplanes. In Technology in Working Order: Studies of Work, Interaction and Technology. Graham Button, ed. Pp. 113-126. London: Routledge.
Suchman, Lucy
1996 Constituting Shared Workspaces. In Cognition and Communication at Work. Yrjö Engeström and David Middleton, eds. Pp. 35-60. Cambridge: Cambridge University Press.
Suchman, Lucy, and Randy Trigg
1993 Artificial Intelligence as Craftwork. In Understanding Practice: Perspectives on Activity and Context. Seth Chaiklin and Jean Lave, eds. Pp. 144-178. Cambridge: Cambridge University Press.
Ueno, Naoki
in press Technologies of Mutual Accountability of Society, Social Organization, and Activity for Collaborative Activity. Mind, Culture and Activity.
Whalen, Jack
1995 A Technology of Order Production: Computer-Aided Dispatch in Public Safety Communications. In Situated Order: Studies in the Social Organization of Talk and Embodied Action. Paul ten Have and George Psathas, eds. Pp. 187-230. Washington D.C.: University Press of America.
Yamazaki, Keiichi, et al.
1999 GestureLaser and GestureLaser Car: Development of an Embodied Space to Support Remote Instruction. In Proceedings of the European Conference on Computer-Supported Cooperative Work, 12-16 September, 1999, Copenhagen, Denmark. Susanne Bødker, Morten Kyng, and Kjeld Schmidt, eds. Pp. 239-259. Dordrecht/Boston/London: Kluwer Academic Publishers.