Pose estimation has become an increasingly important area of computer vision, particularly in human facial recognition and in activity recognition for surveillance applications. Head pose estimation is the process of determining the rotation (roll), pitch, or yaw of a human head. Numerous methods already exist for determining the angular orientation of a face; however, they vary in accuracy, and their computational requirements are often too high for real-time applications. The objective of this thesis is to develop a pose estimation method that is computationally efficient while still maintaining a reasonable degree of accuracy. A feature-based method is presented that determines the yaw angle of a human face using a combination of artificial neural networks and template matching. Feature detection is performed by the artificial neural networks, supported by skin detection and other image enhancement algorithms. Two head models are used. The first, referred to as the Frontal Position Model, determines the pose of the face from the two eyes and the mouth. The second, referred to as the Side Position Model, is used when only one eye is visible and determines pose from a single eye, the nose tip, and the mouth. Together, the two models describe how the positions of facial features change with pose and provide the means to estimate the pose as those features move away from their frontal positions. The effectiveness of the method is evaluated using both manual and automatic feature detection, and further analysis examines how errors in feature detection affect the resulting pose estimate. With correct feature detection, the method estimated facial pose over the range -30 to 30 degrees with an average error of 4.28 degrees for the Frontal Position Model and 5.79 degrees for the Side Position Model.
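The abstract does not give the equations of either head model. As a hedged illustration of how the image positions of two eyes and the mouth can encode yaw, the following Python sketch uses a deliberately simplified geometry that is an assumption and not the thesis's Frontal Position Model: the head is a vertical cylinder viewed under orthographic projection, the mouth lies on the facial midline, and each eye sits a hypothetical angle phi (the parameter `eye_half_angle_deg`) away from the midline on the cylinder surface. Under these assumptions the horizontal eye-to-mouth distances satisfy (d_L - d_R) / (d_L + d_R) = tan(yaw) * tan(phi / 2), so yaw can be recovered without knowing the head radius. The function name and the default of 30 degrees are illustrative only.

```python
import math


def frontal_yaw_deg(left_eye_x, right_eye_x, mouth_x, eye_half_angle_deg=30.0):
    """Estimate yaw (degrees) from the image x-coordinates of both eyes and the mouth.

    Hypothetical simplified model, NOT the thesis's Frontal Position Model:
    - the head is a vertical cylinder seen under orthographic projection,
    - the mouth lies on the facial midline,
    - each eye sits eye_half_angle_deg (phi) from the midline.
    Then (d_left - d_right) / (d_left + d_right) = tan(yaw) * tan(phi / 2).
    Positive yaw means the facial midline has shifted toward +x.
    """
    d_left = mouth_x - left_eye_x    # horizontal distance: left eye -> midline
    d_right = right_eye_x - mouth_x  # horizontal distance: midline -> right eye
    span = d_left + d_right          # projected inter-eye distance
    if span <= 0:
        raise ValueError("expected left_eye_x < right_eye_x")
    half_phi = math.radians(eye_half_angle_deg) / 2.0
    tan_yaw = (d_left - d_right) / (span * math.tan(half_phi))
    return math.degrees(math.atan(tan_yaw))
```

For a perfectly symmetric frontal face, `frontal_yaw_deg(-50.0, 50.0, 0.0)` returns `0.0`. The model only holds while both eyes remain visible; once one eye is occluded, a second model based on other features (in the thesis, the Side Position Model) is needed.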
Intel® Streaming SIMD Extensions (SSE) technology was used to accelerate floating-point operations. The neural networks in the feature detection stage require a large number of floating-point calculations, because the image data must be combined with the network weights and biases. With SSE optimization, the algorithm becomes suitable for processing images in a real-time environment: it detects features and estimates the pose at a rate of seven frames per second on a 1.8 GHz Pentium 4 computer.
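The abstract states that SSE speeds up the neural-network arithmetic but does not show the code. The following Python sketch only emulates the idea and uses hypothetical names: a 128-bit SSE register holds four single-precision floats, so a vectorized weighted sum keeps four partial sums, updates all four with one multiply and one add per group of four inputs, and combines them with a horizontal add at the end. The sigmoid activation is an assumption, since the abstract does not name the activation function. No SIMD instructions are actually issued here, and Python floats are double precision; in C, the inner lane loop would correspond to intrinsics such as `_mm_loadu_ps`, `_mm_mul_ps`, and `_mm_add_ps` from `<xmmintrin.h>`.

```python
import math

LANES = 4  # a 128-bit SSE register holds four 32-bit floats


def neuron_output_4lane(inputs, weights, bias):
    """Weighted sum + bias + sigmoid, accumulated four lanes at a time.

    Illustrative emulation only: the four-element list `acc` plays the role
    of one SSE accumulator register. The sigmoid activation is assumed.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = [0.0] * LANES                       # stands in for one __m128 accumulator
    n_vec = len(inputs) - len(inputs) % LANES  # largest multiple of 4
    for i in range(0, n_vec, LANES):          # "vector" part: 4 products per step
        for lane in range(LANES):
            acc[lane] += inputs[i + lane] * weights[i + lane]
    total = sum(acc)                          # horizontal add of the four lanes
    for i in range(n_vec, len(inputs)):       # scalar tail for leftover elements
        total += inputs[i] * weights[i]
    return 1.0 / (1.0 + math.exp(-(total + bias)))
```

For example, six inputs of `1.0` with weights of `0.5` and a bias of `-3.0` give a net input of zero, so `neuron_output_4lane([1.0] * 6, [0.5] * 6, -3.0)` returns `0.5`. Each group of four products costs one SIMD multiply and one SIMD add instead of four of each, which is why the per-pixel network evaluation benefits from SSE.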

Library of Congress Subject Headings

Human face recognition (Computer science); Computer vision; Neural networks (Computer science)

Degree Name

Computer Engineering (MS)

Department, Program, or Center

Computer Engineering (KGCOE)


Andreas Savakis

Advisor/Committee Member

Juan Cockburn

Advisor/Committee Member

Lawrence Ray


Physical copy available from RIT's Wallace Library at TA1650 .S35 2004


RIT – Main Campus