This thesis investigated the features that guide saccadic eye movements in specific tasks: a visual search task, answering questions about an image, and freely viewing images. Current eye-tracking technology was used to record subjects' eye movements as they viewed images.

A classic experiment by Alfred Yarbus demonstrating the influence of task on eye movements was replicated under natural viewing conditions. In this experiment, 17 viewers were each given a set of different instructions before viewing a Russian painting. Eye movement records were compared across tasks, and the instructions a viewer was given affected which regions of the image were fixated. Even though the viewing times in the two experiments differed drastically (3 minutes compared to ~20 seconds), the behavior of the 17 subjects was remarkably similar to the original records published by Yarbus: regions that were 'informative' for the task were fixated. Behavior across the 17 subjects within one task was more similar than behavior across the seven tasks within one subject.

In a second experiment, 23 observers performed a visual search task in images of real-world scenes. Before each trial, the subject was shown a preview image of the target, either pixel-for-pixel exactly as it appeared in the scene ('Extracted Object' condition) or as a cartoon icon representation of the target ('Cartoon Icon' condition). On average, the reaction time to find the target was 3.0 seconds in the Cartoon Icon condition and less than 2.5 seconds in the Extracted Object condition; the increase was caused primarily by the viewer taking longer to initially fixate the target. Perceptual saliency and other feature content of the images at fixated and random locations were compared to gain insight into which features the visual system uses to guide, and expedite, visual search in each of the two conditions.
Several commonly used metrics were applied to evaluate the performance of each of 18 different topographical feature maps. Feature maps that weight image regions according to the color and spatial characteristics of the target performed better than general low-level saliency maps, showing that the visual system can be fine-tuned according to the task. However, a general model of visual attention for search in real-world scenes cannot be created using only low-level stimulus properties.
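The fixated-versus-random comparison described in the abstract can be sketched as an ROC-style analysis: how well the values of a feature map discriminate fixated pixels from randomly sampled ones, where 0.5 is chance. This is a minimal illustrative sketch, not the thesis code; the function name, the toy map, and the fixation list are all assumptions for demonstration.

```python
import numpy as np

def fixation_auc(feature_map, fixations, n_random=1000, seed=0):
    """ROC-style score: probability that the feature-map value at a
    fixated location exceeds the value at a random location
    (ties counted as half, i.e. the Mann-Whitney AUC). 0.5 = chance."""
    rng = np.random.default_rng(seed)
    h, w = feature_map.shape
    # Map values at the fixated (row, col) locations.
    fix_vals = np.array([feature_map[y, x] for (y, x) in fixations])
    # Map values at uniformly sampled random locations.
    rand_vals = feature_map[rng.integers(0, h, n_random),
                            rng.integers(0, w, n_random)]
    greater = (fix_vals[:, None] > rand_vals[None, :]).mean()
    ties = (fix_vals[:, None] == rand_vals[None, :]).mean()
    return greater + 0.5 * ties

# Toy example: a map with one bright region, fixations landing inside it.
fmap = np.zeros((100, 100))
fmap[40:60, 40:60] = 1.0
fixes = [(50, 50), (45, 55), (55, 45)]
score = fixation_auc(fmap, fixes)
```

A map that places high values where viewers actually fixate scores well above 0.5 by this measure, while a map unrelated to fixation behavior hovers near chance; comparing such scores across the 18 candidate maps is one common way to rank them.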

Library of Congress Subject Headings

Saccadic eye movements; Visual perception; Vision

Degree Name

Imaging Science (MS)


Jeff Pelz

Advisor/Committee Member

Roxanne Canosa

Advisor/Committee Member

Carl Salvaggio


Physical copy available from RIT's Wallace Library at QP477.5 .L57 2004


RIT – Main Campus