Introduction

The polygraph, commonly known as a lie detector, has been a tool of intrigue and controversy since its inception in the early 20th century. Initially conceptualized by William M. Marston, a psychologist and the creator of Wonder Woman, the polygraph was developed to measure physiological responses—such as changes in blood pressure, respiration, and electrodermal activity—that are believed to indicate deception. Despite its widespread use by law enforcement agencies, the legal community, and the private sector, the polygraph’s validity and reliability have been subjects of debate. In particular, the development of automated scoring algorithms for polygraph data aims to create reliable and statistically valid classification schemes that minimize both false positive and false negative rates. This blog post delves into the complexities of polygraph data analysis, based on Aleksandra Slavkovic’s detailed evaluation of polygraph data and automated scoring algorithms.

The Polygraph Examination Process

A polygraph examination is divided into three phases: pre-test, in-test, and post-test. During the pre-test, the examiner explains the theory behind the polygraph and formulates the test questions. This phase is crucial, as experienced polygraphers believe that improper conduct during the pre-test can lead to erroneous results. During the in-test phase, the examinee answers a series of yes-or-no questions while their physiological responses are recorded; the same question sequence is typically repeated several times to check for consistency. Finally, in the post-test phase, the examiner reviews the responses with the examinee, which often leads to a confession if the examinee is guilty.

The polygraph measures three main physiological signals: thoracic and abdominal respiration, electrodermal activity, and cardiovascular activity. These signals are recorded as time series and are used to judge whether the examinee is being deceptive or truthful. The premise is that a deceptive person will exhibit stronger physiological responses to relevant questions than to control questions.

Challenges in Polygraph Data Analysis

One of the primary challenges in polygraph data analysis is the enormous variability in the data. Polygraph examinations vary widely in terms of the subject of investigation, test format, structure, and administration. This variability makes it difficult to develop standardized and generalizable statistical procedures for analyzing polygraph data. Moreover, the quality of the datasets and the assumptions underlying data collection are often overlooked, leading to potential biases and inaccuracies in the results.

In her evaluation, Slavkovic analyzed 149 real-life polygraph cases, focusing on the validity of logistic regression when applied to such a highly variable and small dataset. Logistic regression is a statistical model commonly used for classification; here, the goal is to separate deceptive from non-deceptive subjects based on their physiological responses. The study highlighted the complexity of polygraph data and the challenges of developing reliable automated scoring algorithms.

Automated Scoring Algorithms

Automated scoring algorithms aim to minimize examiner variability and bias by using statistical models to classify examinees as deceptive or non-deceptive. Two prominent algorithms, PolyScore and the Computerized Polygraph System (CPS), have been developed for this purpose. PolyScore, developed at the Johns Hopkins University Applied Physics Laboratory, uses logistic regression or neural networks to estimate the probability of deception. CPS, on the other hand, employs discriminant analysis and a Bayesian probability calculation.
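
Neither system's internal equations are published in the report, but the general idea behind a Bayesian probability calculation on a discriminant score can be sketched as follows. The class-conditional distributions, prior, and function name below are hypothetical placeholders for illustration, not values from CPS or the study.

```python
import math

def posterior_deceptive(score, prior_deceptive=0.5,
                        mean_d=1.0, sd_d=1.0,    # hypothetical score distribution for deceptive subjects
                        mean_n=-1.0, sd_n=1.0):  # hypothetical score distribution for non-deceptive subjects
    """Bayes' rule applied to a single univariate discriminant score.

    All parameters are illustrative placeholders, not values from CPS
    or the Slavkovic report.
    """
    def normal_pdf(x, mu, sd):
        return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

    like_d = normal_pdf(score, mean_d, sd_d)      # P(score | deceptive)
    like_n = normal_pdf(score, mean_n, sd_n)      # P(score | non-deceptive)
    evidence = like_d * prior_deceptive + like_n * (1 - prior_deceptive)
    return like_d * prior_deceptive / evidence    # P(deceptive | score)

print(posterior_deceptive(0.8))  # roughly 0.83 with these placeholder parameters
```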

Both algorithms transform raw polygraph signals into a set of features used for classification. For instance, PolyScore detrends, filters, and baselines the signals before extracting features, whereas CPS retains much of the raw signal. Despite their differences, both algorithms heavily rely on electrodermal activity as the most significant predictor of deception.
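
PolyScore's detrending, filtering, and baselining are described only at a high level in the report. The sketch below shows what such preprocessing of one channel might look like with NumPy and SciPy; the sampling rate, filter order, cutoff frequency, and baselining choice are assumptions for illustration, not PolyScore's actual settings.

```python
import numpy as np
from scipy.signal import detrend, butter, filtfilt

def preprocess_channel(raw_signal, fs=30.0, cutoff_hz=1.0):
    """Illustrative preprocessing of one physiological channel.

    fs (sampling rate in Hz) and cutoff_hz are assumed values.
    """
    # Remove slow linear drift across the chart.
    x = detrend(np.asarray(raw_signal, dtype=float))

    # Low-pass filter to suppress high-frequency noise.
    b, a = butter(N=4, Wn=cutoff_hz / (fs / 2.0), btype="low")
    x = filtfilt(b, a, x)

    # Re-baseline so the signal is centered near zero.
    return x - np.median(x)
```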

However, the study revealed that while these computerized systems have the potential to reduce examiner bias, the evidence supporting their effectiveness is limited. The variability in polygraph data, combined with the lack of standardized data collection procedures, poses significant challenges to the development of universally reliable scoring algorithms.

Statistical Models for Classification

In the context of polygraph data, statistical models for classification and prediction involve estimating the probability that a subject is deceptive based on their physiological responses. Logistic regression is a common choice for such classification problems, as it does not assume a specific distribution of the predictor variables and focuses on observations close to the decision boundary between deceptive and non-deceptive subjects.
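
Concretely, if x1 through xk denote the numerical features extracted from a subject's charts, the logistic regression model estimates the probability of deception as

p(deceptive | x) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))),

where the coefficients b0 through bk are estimated from the data; a subject is then classified as deceptive when this estimated probability exceeds a chosen cutoff.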

The process of developing a logistic regression model for polygraph data involves several steps (a brief code sketch follows the list):

  1. Specifying the Predictor Variables: Identifying the features of the data (e.g., changes in blood pressure, respiration, and electrodermal activity) that will be used as predictors.
  2. Choosing the Functional Form: Deciding on the mathematical form of the relationship between the predictor variables and the response variable (deceptive or non-deceptive).
  3. Feature Selection: Selecting the most relevant features from the predictor space for classification.
  4. Model Fitting: Estimating the parameters of the model using the data.
  5. Validation: Assessing the model’s performance using cross-validation or other validation techniques.
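
The report does not publish its code; as a rough sketch of how steps 1 through 4 might look in practice, the example below uses scikit-learn (an assumption, not the study's software) with random placeholder data standing in for the extracted features.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Steps 1-2: the predictors are numerical features extracted from the three
# channels, and the functional form is the standard logistic (logit) model.
# X and y are random placeholders, not the report's 149-case dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(149, 20))      # 149 subjects, 20 candidate features
y = rng.integers(0, 2, size=149)    # 1 = deceptive, 0 = non-deceptive

# Steps 3-4: keep the k most discriminative features, then fit the model.
# Bundling both in a pipeline keeps feature selection inside any later
# cross-validation folds, avoiding information leakage.
model = make_pipeline(SelectKBest(f_classif, k=5),
                      LogisticRegression(max_iter=1000))
model.fit(X, y)

# Step 5 (validation) is illustrated in the cross-validation sketch below.
```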

Despite the robustness of logistic regression, the study found that the high dimensionality of the predictor space and the potential for overfitting posed challenges to model generalization. Cross-validation revealed that models with a large number of predictor variables often performed well on the training data but poorly on the test data, indicating that the separation achieved was illusory.
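
The same point can be demonstrated with a toy example: when the number of predictors approaches the number of subjects, even pure noise can be separated almost perfectly in-sample while cross-validated accuracy stays near chance. The sketch below assumes scikit-learn and uses simulated data; it is not the study's dataset or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 80))    # many predictors, few subjects, no real signal
y = rng.integers(0, 2, size=100)  # arbitrary "deceptive" / "non-deceptive" labels

clf = LogisticRegression(C=1e4, max_iter=5000)  # weak regularization, close to a plain fit
in_sample_accuracy = clf.fit(X, y).score(X, y)            # typically very high
cv_accuracy = cross_val_score(clf, X, y, cv=10).mean()    # typically near 0.5 (chance)
print(in_sample_accuracy, cv_accuracy)
```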

Data Variability and Feature Extraction

One of the most significant sources of variability in polygraph data is the inconsistency in test formats and the lack of standardized question sequences. For instance, the Zone Comparison Test (ZCT) and the Modified General Question Test (MGQT) differ in the number, type, and order of questions asked, adding to the complexity of data analysis. Additionally, the physiological responses themselves exhibit variability due to factors such as gender, age, and individual differences in emotional responsiveness.

To address this variability, the study focused on feature extraction: converting the continuous polygraph readings into a set of numerical features that capture the emotional signals associated with deception. The study employed three main types of features (the first two are sketched in code after the list):

  1. Integrated Differences: The area between the response curves for relevant and control questions.
  2. Latency Differences: The time it takes for a physiological response to reach a certain threshold after a question is asked.
  3. Spectral Properties Differences: High-frequency changes in the physiological signals.
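
The report defines these features at a conceptual level; the sketch below is one plausible implementation of the first two for a single preprocessed channel, with an assumed sampling interval, equal-length response windows, and hypothetical function names.

```python
import numpy as np

DT = 1.0 / 30.0  # assumed sampling interval in seconds; not a value from the report

def integrated_difference(relevant, control, dt=DT):
    """Approximate area between the response curves for a relevant and a
    control question (simple Riemann sum over equal-length windows)."""
    rel = np.asarray(relevant, dtype=float)
    ctl = np.asarray(control, dtype=float)
    return float(np.sum(rel - ctl) * dt)

def latency(response, threshold, dt=DT):
    """Time after question onset at which the response first reaches the
    threshold; returns NaN if the threshold is never reached."""
    x = np.asarray(response, dtype=float)
    above = np.flatnonzero(x >= threshold)
    return float(above[0] * dt) if above.size else float("nan")

def latency_difference(relevant, control, threshold, dt=DT):
    """Latency on the relevant question minus latency on the control question."""
    return latency(relevant, threshold, dt) - latency(control, threshold, dt)
```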

These features were evaluated using logistic regression to determine their effectiveness in distinguishing between deceptive and non-deceptive subjects. The study found that features derived from electrodermal activity and blood volume were the most discriminative, while those from respiration were less effective.

Classification Results and Discussion

The study tested the developed models on an independent test set of 52 subjects, using a probability cutoff of 0.5 to classify subjects as deceptive or non-deceptive. The results showed that the models achieved accuracy rates comparable to those reported by other algorithms, with some models performing better on deceptive subjects and others on non-deceptive subjects. However, the study also highlighted the limitations of the models, particularly in their ability to generalize to new data.
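
As a small illustration of how the 0.5 cutoff converts estimated probabilities into decisions and error rates, the sketch below uses placeholder arrays; it is not the study's scoring code.

```python
import numpy as np

def classify(probabilities, truth, cutoff=0.5):
    """Apply a probability cutoff and report accuracy and error rates.

    probabilities: model-estimated P(deceptive) for each test subject (placeholder)
    truth: 1 = actually deceptive, 0 = actually non-deceptive (placeholder)
    """
    p = np.asarray(probabilities, dtype=float)
    t = np.asarray(truth, dtype=int)
    predicted = (p >= cutoff).astype(int)

    false_positive_rate = np.mean(predicted[t == 0] == 1)  # truthful called deceptive
    false_negative_rate = np.mean(predicted[t == 1] == 0)  # deceptive called truthful
    accuracy = np.mean(predicted == t)
    return accuracy, false_positive_rate, false_negative_rate
```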

One of the key takeaways from the study is the difficulty of capturing all the variability in polygraph data and producing a model that generalizes well. The study suggested that the development of more complex time series analyses or mixed-effects models could improve the accuracy of polygraph data classification. Additionally, the study emphasized the need for more standardized data collection procedures and further research on the underlying theory of polygraph testing.

Conclusion

The evaluation of polygraph data and automated scoring algorithms reveals the complexity and challenges of developing reliable classification models. While automated systems have the potential to reduce examiner bias, the variability in polygraph data and the lack of standardized procedures make it difficult to achieve consistent accuracy. The study underscores the importance of careful data collection, feature extraction, and model validation in the development of polygraph scoring algorithms. Future research should focus on improving the standardization of polygraph examinations and exploring more sophisticated statistical models to enhance the reliability of polygraph testing.

Source:

Slavkovic, A. (2003). Evaluating Polygraph Data. Technical Report No. 766. Department of Statistics, Carnegie Mellon University. Available at https://www.stat.cmu.edu/tr/tr766/tr766.pdf.