3 PERCEPTION: Recognizing Patterns and Objects

CHAPTER OUTLINE
Gestalt Approaches to Perception
Bottom-Up Processes
Template Matching
Featural Analysis
Prototype Matching
Top-Down Processes
Perceptual Learning
The Word Superiority Effect
A Connectionist Model of Word Perception
Direct Perception
Disruptions of Perception: Visual Agnosias

Look across the room right now and notice the objects you see. If you are looking out a window, maybe you see some trees or bushes, perhaps a bicycle or car, a person walking or a group of children playing.

What you’ve just done, cognitively speaking, is an amazing achievement: You’ve taken sensory input and interpreted it meaningfully, in a process known as perception. In other words, you have perceived patterns, objects, people, and possibly events in your world. You may not consider this achievement at all remarkable—after all, you do it every day. However, computer scientists trying to create artificially intelligent systems have discovered just how complicated the process of perception is. Neuroscientists have estimated that the areas of our brain responsible for visual processing occupy up to half of the total cortex space (Tarr, 2000).

The central problem of perception is explaining how we attach meaning to the sensory information we receive. In the example just given, you received and somehow interpreted a great deal of sensory information: You “saw” certain objects as trees, people, and so forth. You recognized certain objects—that is, saw them as things you had seen before. The question for cognitive psychologists is how we manage to accomplish these feats so rapidly and (usually) without error.
The vast topic of perception can be subdivided into visual perception, auditory perception, olfactory perception, haptic (touch) perception, and gustatory (taste) perception. For the purposes of this chapter, we will concentrate on visual and auditory perception—in part to keep our discussion manageable and in part because those two are the kinds of perception psychologists study most. From time to time, however, we will also look at examples of other kinds of perception to illustrate different points.

Notice that when you look at an object, you acquire specific bits of information about it, including its location, shape, texture, size, and (for familiar objects) name. Some psychologists—namely, those working in the tradition of James Gibson (1979)—would argue that you also immediately acquire information about the object’s function. Cognitive psychologists seek to describe how people acquire such information and what they then do to process it.

Several related questions suggest themselves. How much of the information we acquire through perception draws on past learning? How much of our perception do we infer, and how much do we receive directly? What specific cognitive processes enable us to perceive objects (and events, and states, and so on)? Where can the line be drawn between perception and sensation, which is the initial reception of information in a specific sensory modality—vision, hearing, olfaction? Where can the line be drawn between perception and other kinds of cognition, such as reasoning or categorization? Clearly, even defining perception so as to answer these questions is a challenge.

For the present, we will adopt what might be called the “classic” approach to defining perception. Figure 3.1 illustrates this approach for visual perception. Out in the real world are objects and events—things to be perceived—such as this book or, as in my earlier example, trees and shrubs.
Each such object is a distal stimulus. For a living organism to process information about these stimuli, it must first receive the information through one or more sensory systems—in this example, the visual system. The reception of information and its registration by a sense organ make up the proximal stimulus. In our earlier example, light waves reflect from the trees and cars to your eyes, in particular to a surface at the back of each eye known as the retina. There, an image of the trees and cars, called the retinal image, is formed. This image is two-dimensional, and its size depends on your distance from the window and the objects beyond (the closer you are, the larger the image). In addition, the image is upside down and is reversed with respect to left and right.

Figure 3.1: Distal stimuli, proximal stimuli, and percepts. [Diagram labels: distal stimulus (book); proximal stimulus (retinal image of book); percept (recognition of object as a book).]

The meaningful interpretation of the proximal stimulus is the percept—your interpretation that the stimuli are trees, cars, people, and so forth. From the upside-down, backward, two-dimensional image, you quickly (almost instantaneously) “see” a set of objects you recognize. You also “recognize” that, say, the giant oak tree is closer to you than are the lilac shrubs, which appear to recede in depth away from you. This
information is not part of the proximal stimulus; somehow, you must interpret the proximal stimulus to know this information.

Although researchers studying perception disagree about much, they agree that percepts are not the same things as proximal stimuli. Consider a simple demonstration of size constancy. Extend your arm away from your body and look at the back of your hand. Now, keeping the back of your hand facing you, slowly bring it toward you a few inches, then away from you. Does your hand seem to be changing size as it moves? Probably not, although the size of the hand in the retinal image is most certainly changing. The point here is that perception involves something other than the formation of retinal images.

Related to perception is a process called pattern recognition. This is the recognition of a particular object, event, and so on, as belonging to a class of objects, events, and so on. Your recognition of the object you are looking at as belonging to the class of things called “shrubs” is an instance of pattern recognition. Because the formation of most percepts involves some classification and recognition, most, if not all, instances of perception involve pattern recognition.

We will begin by considering proposals from the Gestalt school of psychology that perception involves the segmentation, or “parsing,” of visual stimuli into objects and backgrounds (and just how complicated this seemingly easy process is). We will then turn to examine some (mostly) bottom-up models of perception. Then we will examine phenomena that have led many cognitive psychologists to argue that some top-down processes must occur in interaction with bottom-up processing. We will examine some neurological findings pertaining to object perception and will also consider a connectionist model of word perception.

We will also review a very different view: work inspired by J. J.
Gibson (1979) on “direct perception.” Gibson’s view departs from most other theories of perception in that he claims perceivers actually do little “processing” of information, either bottom-up or top-down. Instead, he believes the information available in the world is sufficiently rich that all the perceiver needs to do is detect or “pick up on” that information. We will conclude by looking at some neuropsychological work on patients who have an inability to perceive (but have intact visual abilities) to illustrate just what the process of perception is all about.

GESTALT APPROACHES TO PERCEPTION

When stimuli occur close to one another in space and in time, they may group perceptually into coherent, salient patterns or wholes. Such Gestalts, as they are called, abound in our perceptual world, as when leaves and branches cluster into trees, and when trees merge into forests; when eyes, ears, noses and mouths configure into faces; when musical notes coalesce into chords and melodies; and when countless dots or pixels blend into a photograph.

The resulting wholes may have properties their component parts lack, such as the identity or expression on a face that is unrecognizable from any one part, or the key in which a melody is played that cannot be deduced from any single note. Understanding how parts combine into perceptual wholes was recognized as a central challenge in perceptual theory nearly 100 years ago . . .
—Pomerantz & Portillo, 2011, p. 1331

One of the most important aspects of visual perception has to do with how we interpret stimulus arrays as consisting of objects and backgrounds. Consider, for instance, Figure 3.2.
This stimulus pattern can be seen in two distinct ways: as a landscape with two people standing in the lower right or as a baby framed by black lines. This segregation of the whole display into objects (also called the figure) and the background (also called the ground) is an important process known to cognitive psychologists as figure-ground organization.

COGNITIVE PSYCHOLOGY IN AND OUT OF THE LABORATORY
Reversible figures aren’t just for perceptual psychologists, either! The artist Salvador Dali exploits the existence of reversible figures in his work The Slave Market With Disappearing Bust of Voltaire, shown in Figure 3.3.

The segregation of figure from ground has many consequences. The part of the display seen as figure is seen as having a definite shape, as being some sort of “thing,” and is better remembered than the part of the display interpreted as ground, which is seen as more shapeless, less formed, and farther away in space (Brown & Deffenbacher, 1979). Form perception is a cognitive task most of us perform quickly and easily and thus take for granted. We assume, intuitively, that we perceive objects and backgrounds because there really are objects and backgrounds and all we do is see them.

Figure 3.2: Find the baby in the branches of this tree. This is a clever, modern illustration of a reversible figure: When you see the “baby,” the branches become background; when you see the tree and people, the “baby” disappears into the background.

Figure 3.3: Salvador Dali, The Slave Market With Disappearing Bust of Voltaire. The two nuns standing in the archway at left-center reverse to form a bust of Voltaire. The painting exploits the reversible figures phenomenon.
But consider Figure 3.4. Almost everyone sees this figure as consisting of two triangles, overlaid so as to form a six-pointed star. The corners of the top triangle are typically seen as resting on three colored circles. Now look closely at the figure, in particular at the top triangle. Recall that a triangle is defined as a closed geometric figure that has three sides. Notice that in the figure itself there are no sides. There is only white space that you, the viewer, interpret as a triangle. You, the viewer, are somehow adding the three sides or contours.

Figure 3.4: Subjective, or illusory, contours.

Gregory (1972), who studied this phenomenon (called illusory or subjective contours), believes that this relatively complex display is subject to a simplifying interpretation the perceiver makes without even being aware of making it: A triangle is lying on top of other parts of the figure and blocking them from view. The point here is that this perception is not completely determined by the stimulus display; it requires the perceiver’s active participation.

A number of individuals in the early part of the 20th century—among them Max Wertheimer, Kurt Koffka, and Wolfgang Köhler—were deeply interested in how perceivers come to recognize objects or forms. As we saw in Chapter 1, these researchers, who formed the Gestalt school of psychology, were particularly concerned with how people apprehend whole objects, concepts, or units. The Gestalt psychologists believed that perceivers follow certain laws or principles of organization in coming to their interpretations. They asserted that the whole, or Gestalt, is not the same as the sum of its parts. To put it another way, Gestalt psychologists rejected the claim that we recognize objects by identifying individual features or parts; instead, we see and recognize each object or unit as a whole.

What are the Gestalt principles of perceptual organization that allow us to see these wholes?
The complete list is too long to explore here (see Koffka, 1935), so we will examine only five major principles. The first is the principle of proximity, or nearness. Look at Figure 3.5(A). Notice that you tend to perceive this as a set of rows rather than as a set of columns. This is because the elements within rows are closer than the elements within columns. Following the principle of proximity, we group together things that are nearer to each other.

Figure 3.5(B) illustrates the principle of similarity. Notice that you perceive this display as formed in columns (rather than rows), grouping together those elements that are similar.

Figure 3.5: Gestalt principles of perceptual organization: (A) the principle of proximity; (B) the principle of similarity; (C) and (D) the principle of good continuation; (E) the principle of closure; and (F) the principle of common fate.

A third principle, the principle of good continuation, depicted in Figure 3.5(C), states that we group together objects whose contours form a continuous straight or curved line. Thus we typically perceive Figure 3.5(C) as two intersecting curved lines and not as other logically possible elements, such as those shown in Figure 3.5(D).
We encounter the fourth principle, the principle of closure, when we look at subjective contours in Figure 3.4. Figure 3.5(E) illustrates this principle more exactly. Note that we perceive this display as a rectangle, mentally filling in the gap to see a closed, complete, whole figure.

The fifth principle, the principle of common fate, is difficult to illustrate in a static drawing. The idea is that elements that move together will be grouped together, as depicted in Figure 3.5(F). You can construct a better demonstration of this principle yourself (Matlin, 1988). Take two pieces of transparent plastic (such as report covers cut in half). Glue some scraps of paper on each. Lay one sheet upside down on top of the other, and you will have a hard time telling which sheet of plastic any particular scrap is on. Now move one sheet, holding the other still. You will suddenly see two distinct groups of scraps.

Most of the Gestalt principles are subsumed under a more general law, the law of Prägnanz (Koffka, 1935). This law states that of all the possible ways of interpreting a display, we will tend to select the organization that yields the simplest and most stable shape or form. Thus, simple and symmetric forms are seen more easily than more complicated and asymmetric forms. This law may help to explain our experience of Figure 3.4 with subjective contours. Because the phantom “triangle” forms a simple, symmetric form, we “prefer” to interpret the pattern as if the triangle were there.

In recent work, psychologists James Pomerantz and Mary Portillo (2011) are trying to dig deeper into the principles underlying what makes a Gestalt. They focus on the property of emergence in perception—the idea that “qualitative differences . . . [in a percept] appear as parts are added, such that wholes take on properties that are novel, unpredictable, even surprising” (p. 1331).

To demonstrate the property of emergence, Pomerantz and Portillo (2011) use an odd-quadrant discrimination task, depicted in Figure 3.6.
Consider the top leftmost box (called the base display) containing four letters. The task of the research participant is to identify the stimulus that differs from the other three. In this case, it is the letter B. The second box in the row presents a contextual stimulus (in this case, the letter C) that is added to each stimulus in the base display to produce the stimuli in the composite display (the top rightmost box). Experimenters compare the length of time it takes a participant to correctly identify the “odd” stimulus in the base display (e.g., the B) with the length of time it takes in the composite display (e.g., the BC).

Although there are many good reasons to predict it will take longer with the composite displays (e.g., more information to process, more stimuli to distract attention), with some specific stimuli the opposite result occurs. That is, perception of the “odd stimulus out” is faster in the composite stimulus display than in the base stimulus display (this is called a configural superiority effect, or CSE). In fact, the second and fourth rows of Figure 3.6 yield just such a pattern; the odd stimulus seems to “pop” out more dramatically in the composite display than it does in the base display. Pomerantz and Portillo (2011) believe that CSEs demonstrate Gestalt grouping principles, but in such a way as to make the strength of different principles measurable and comparable.
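The comparison behind a CSE can be written out as simple arithmetic. The sketch below uses made-up response times in milliseconds, not data from Pomerantz and Portillo (2011); the function name and the sign convention (base minus composite, so that a positive value means the composite display was faster) are assumptions chosen for illustration.

```python
from statistics import mean


def configural_superiority(base_rts_ms, composite_rts_ms):
    """Mean base RT minus mean composite RT (hypothetical helper).

    A positive value means the odd stimulus was found faster in the
    composite display, which is the pattern called a CSE.
    """
    return mean(base_rts_ms) - mean(composite_rts_ms)


# Invented correct-response times (ms) for one participant, for illustration only.
base_rts = [910, 880, 940, 905]
composite_rts = [640, 700, 655, 690]

effect = configural_superiority(base_rts, composite_rts)
shows_cse = effect > 0  # True when the composite display is faster on average
```

Because the result is a single number per stimulus set, effects obtained with different context stimuli can be placed side by side, which is the sense in which the strength of grouping principles becomes comparable.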
Figure 3.6: The odd-quadrant discrimination task. Top row shows a schematic odd-quadrant discrimination task, with panels labeled Base, Context, and Composite. Participants only see base and composite displays, not context alone. A, B, and C are symbols standing for any stimulus component. The same base stimuli produce configural superiority effects (CSEs) in rows 2 and 4 but not in rows 3 or 5. This shows that emergent features depend on the context added.
Many researchers of visual perception consider the Gestalt principles fundamental (Tarr, 2000; van den Berg, Kubovy, & Shirillo, 2011). Investigators have demonstrated the use of some Gestalt principles by infants as young as 3 to 6 months (Quinn, Bhatt, Brush, Grimes, & Sharpnack, 2002). Moreover, fMRI studies of visual cortex activity during perception of CSEs are beginning to show neural correlates of the Gestalt grouping principles in action (Kubilius, Wagemans, & Op de Beeck, 2011).

BOTTOM-UP PROCESSES

Psychologists studying perception distinguish between bottom-up and top-down processes. The term bottom-up (or data-driven) essentially means that the perceiver starts with small bits of information from the environment and combines them in various ways to form a percept. A bottom-up model of perception and pattern recognition might describe your seeing edges, rectangular and other shapes, and certain lighted regions and putting this information together to “conclude” you are seeing the scene outside your window. That is, you would form a perception from only the information in the distal stimulus.

In top-down (also called theory-driven or conceptually driven) processing, the perceiver’s expectations, theories, or concepts guide the selection and combination of the information in the pattern-recognition process. For example, a “top-down” description of the scene-outside-your-window example might go something like this: You knew you were in your dorm room and knew from past experience approximately how close to the window the various trees, shrubs, and other objects were. When you looked in that direction, you expected to see trees, shrubs, walkways with people on them, a street with cars going by, and so on. These expectations guided where you looked, what you looked at, and how you put the information together.

In this section, we will focus on bottom-up models.
The idea here is that the system works in one direction, starting from the input and proceeding to a final interpretation. Whatever happens at a given point is unaffected by later processing; the system has no way of going back to an earlier point to make adjustments.

To picture bottom-up processing, imagine a row of students seated at desks. The student in the last seat of the row starts the process by writing a word on a piece of paper and handing the paper to the student in front of her. That student adds some information (maybe another word, maybe an illustration) and, in turn, hands the paper to the student in front of him, and so on, until the paper reaches the student at the front of the row. Students at the front of the row have no opportunity to ask students behind them for any clarification or additional information.

When psychologists speak of bottom-up perceptual processes, they typically have in mind something that takes information about a stimulus (by definition a “lower” level of processing) as input. Bottom-up processes are relatively uninfluenced by expectations or previous learning (the so-called higher-level processes). Posner and Raichle (1994) argue that bottom-up processes involve automatic, reflexive processing that takes place even when the perceiver is passively regarding the information. In this section, we will consider three distinct examples of bottom-up models of perception.
TEMPLATE MATCHING

Figure 3.7 shows a copy of a check. Notice the numbers at the bottom of the check. These numbers encode certain information about a checking account—the account number, the bank that manages it, and so forth. These numbers may look funny to you, but they wouldn’t look at all funny to machines known as check sorters, such as those the Federal Reserve banks use to sort checks and deliver them to the correct banks for payment. These machines “read” the numbers and compare them to previously stored patterns, called templates. The machines “decide” which number is represented by comparing the pattern to these templates, as shown in Figure 3.8. A tour of your local Federal Reserve bank would convince you that this system works most impressively.

Figure 3.7: A sample bank check. Note the numbers at the bottom.

You can think of a template as a kind of stencil—one of the art supplies you probably owned as a child. If you remember, those stencils let you trace as many copies as you wanted of the same thing. Templates work like stencils in reverse. An unknown incoming pattern is compared to all of the templates (stencils) on hand and identified by the template that best matches it.

Figure 3.8: Illustration of template matching. The input “4” is compared either serially or simultaneously with all of the available templates. The match to “4” is the best.

As a model of perception, template matching works this way: Every object, event, or other stimulus that we encounter and want to derive meaning from is compared to some previously stored pattern, or template. The process of perception thus involves comparing incoming information to the templates we have stored and looking for a match. If a number of templates match or come close, we need to engage in further processing to sort out which template is most appropriate.
Notice that this model implies that somewhere in our knowledge base we’ve stored millions of different templates—one for every distinct object or pattern we can recognize.

As may already be apparent to you, template-matching models cannot completely explain how perception works. First, for such a model to provide a complete explanation, we would need to have stored an impossibly large number of templates. Second, as technology develops and our experiences change, we become capable of recognizing new objects such as DVDs, laptop computers, and smartphones. Template-matching models thus have to explain how and when templates are created and how we keep track of an ever-growing number of templates.
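The matching step described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a description of how check sorters or the visual system actually work: the 3-by-5 binary grids, the names TEMPLATES, match_score, and recognize, and the scoring rule (fraction of cells that agree) are all hypothetical choices.

```python
# Hypothetical 3x5 binary "templates" for three digits (assumed for illustration).
TEMPLATES = {
    "1": ["010", "110", "010", "010", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "7": ["111", "001", "010", "010", "010"],
}


def match_score(pattern, template):
    """Return the fraction of grid cells on which pattern and template agree."""
    cells = [
        (p, t)
        for pattern_row, template_row in zip(pattern, template)
        for p, t in zip(pattern_row, template_row)
    ]
    return sum(p == t for p, t in cells) / len(cells)


def recognize(pattern, templates=TEMPLATES):
    """Compare the input to every stored template; return the best label and its score."""
    scores = {label: match_score(pattern, tpl) for label, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


clean_four = ["101", "101", "111", "001", "001"]
noisy_four = ["101", "100", "111", "001", "001"]  # one cell differs from the template

clean_result = recognize(clean_four)
noisy_result = recognize(noisy_four)
```

In this sketch every recognizable pattern needs its own entry in TEMPLATES, so the store grows with each new kind of object, which is the bookkeeping burden the objections above point to.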
A third problem is that people recognize many patterns as more or less the same thing, even when the stimulus patterns differ greatly. Figure 3.9 illustrates this point. I constructed this figure by having 14 people write the sentence “Cognitive psychology rocks!” in their own handwriting. You can read each sentence despite the wide variation in the size, shape, orientation, and spacing of letters. How can a template-matching model explain your recognition that all 14 people have written the “same” sentence? In everyday life, much of the stimulus information we perceive is far from regular, whether because of deliberate alteration, degradation, or an unfamiliar orientation (compare an overturned cup or bicycle with one that is right side up). Is a separate template needed for each variation? And how is the perceiver to know whether an object should be rotated or otherwise adjusted before she tries to match it to a template? Remember, matching information to templates is supposed to tell the perceiver what the object is. The perceiver can’t know ahead of time whether an input pattern should be adjusted before she tries to match it to different templates, because presumably the perceiver does not yet know what the object is!

Figure 3.9: Handwriting samples.

So although some technology uses template matching, we probably don’t rely heavily on such a process in our everyday perception. Template matching works only with relatively clean stimuli when we know ahead of time what templates may be relevant. It does not adequately explain how we perceive as effectively as we typically do the “noisy” patterns and objects—blurred or faint letters, partially blocked objects, sounds against a background of other sounds—that we encounter every day.

FEATURAL ANALYSIS

As I write, I’m staring down at one of my dogs, curled up under the table. I’m able to recognize not only her but also certain parts of her: ears, muzzle, tail, back, paws, chest, and eyes to name just a few.
Some psychologists believe such analysis of a whole into its parts underlies the basic processes used in perception. Instead of processing stimuli as whole units, we might instead break them down into their components, using our recognition of those parts to infer what the whole represents. The parts searched for and recognized are called features. Recognition of a whole object, in this model, thus depends on recognition of its features.

Such a model of perception—called featural analysis—fits nicely with some neurophysiologic evidence. Some studies of the retinas of frogs (Lettvin, Maturana, McCulloch, & Pitts, 1959) involved implanting microelectrodes in individual cells of the retina. Lettvin et al. found that specific kinds of stimuli could cause these cells to fire more frequently. Certain cells responded strongly to borders between light and dark and were called “edge detectors”—edge because they fired when stimulated by a visual boundary between light and dark, detectors because they indicated the presence of a certain type of visual stimulus. Others responded selectively to moving edges, and others, jokingly called “bug detectors,” responded most vigorously when a small, dark dot (much like an insect) moved across the field of vision. Hubel and Wiesel (1962, 1968) later discovered fields in the visual cortexes of cats and monkeys that responded selectively to moving edges or contours in the visual field that had a particular
orientation. In other words, they found evidence of separate “horizontal-line detectors” and “vertical-line detectors,” as well as other distinct detectors.

How does this evidence support featural analysis? Certain detectors appear to scan input patterns, looking for a particular feature. If that feature is present, the detectors respond rapidly. If that feature is not present, the detectors do not respond as strongly. Each detector, then, appears designed to detect the presence of just one kind of feature in an input pattern. That such detectors exist, in the form of either retinal or cortical cells, supports the plausibility of the featural analysis model.

Irving Biederman (1987) proposes a theory of object perception that uses a type of featural analysis that is also consistent with some of the Gestalt principles of perceptual organization discussed earlier. Biederman suggests that when people view objects, they segment them into simple geometric components, called geons. Biederman posits a total of 36 such primitive components, some of which are pictured in Figure 3.10. From this base set of units, he believes, we can construct mental representations of a very large set of common objects. He makes an analogy between object and speech perception: From the 44 phonemes, or basic units of sound, in the English language, we can represent all the possible words in English (a number well into the hundreds of thousands). Likewise, Biederman argues, from the basic set of 36 geons, we can represent the thousands of common objects we can quickly recognize.

Figure 3.10: Some examples of geons.

As evidence for his theory (called “recognition by components”), Biederman offers Figure 3.11, a line drawing of a fictional object probably none of us has ever seen. Nonetheless, we would all show surprising agreement over what the “parts” of the unknown object are: a central “box,” a wavy thing at the lower left, a curved-handled thing on the lower right, and so on.
Biederman believes the same perceptual processes we use to divide this unknown figure into parts are used for more familiar objects. We divide the whole into the parts, or geons (named for “geometrical ions”; Biederman, 1987, p. 118). We pay attention not just to what geons are present but also to the arrangement of geons. As Figure 3.12 shows, the same two geons combined in different ways can yield very different objects.

It is worth noting that not all perception researchers accept the notion of geons as fundamental units of object perception. Tarr and Bülthoff (1995), for example, present a complex but interesting competing proposal.

Figure 3.11: A fictional object.

Other research has provided additional evidence of featural processing in perception. For example, flashing letters on a
computer screen for very brief intervals of time typically results in certain predictable errors. For example, people are much more likely to confuse a G with a C than with an F. Presumably this is because the letters C and G share certain features such