# Chemometrics Glossary of Terms

## Education Article

**Published:** Jul 1, 2014 **Channels:** Laboratory Informatics / Chemometrics & Informatics

This glossary is provided with the permission of Bryan Prazen of the Synovec Group, Department of Chemistry, University of Washington.

Please send comments, suggestions etc. to: bryanp@cpac.washington.edu

**Absolute Method** - A method in which characterization is based entirely on physically (absolute) defined standards.{1}

**Accuracy** - The closeness of a measurement or the mean of a group of measurements to the accepted value.

**Analyte** - A chemical species contained within a sample, which is identified or quantified.

**Analysis of Variance (ANOVA)** - A mathematical method for separating the variation of a signal into that corresponding to controlled and uncontrolled sources.
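The sketch below shows a minimal one-way ANOVA in Python with NumPy. It assumes one controlled factor (for example, the instrument used) with replicate results at each level. The function name `one_way_anova` and the example values are hypothetical. The F statistic is the ratio of the between-group mean square (the controlled source) to the within-group mean square (the uncontrolled source).

```python
import numpy as np

def one_way_anova(groups):
    """One-way ANOVA: split total variation into between-group and within-group parts.

    groups: list of 1-D sequences, one per level of the controlled factor
    (for example, replicate results obtained on different instruments).
    Returns (F statistic, between-group df, within-group df).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()

    # Variation attributed to the controlled factor: spread of the group means.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Uncontrolled (random) variation: scatter of replicates about their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = len(groups) - 1
    df_within = all_values.size - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical replicate results from three instruments.
f_stat, df_b, df_w = one_way_anova([[10.1, 10.3, 10.2], [10.8, 10.9, 11.0], [10.2, 10.1, 10.3]])
print(f_stat, df_b, df_w)  # F is about 49 with (2, 6) degrees of freedom
```

Compare F with the critical value of the F distribution for those degrees of freedom (available, for example, from `scipy.stats.f.ppf`). A value well above it suggests that the controlled factor contributes real variation.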

**Autoscale** - A data standardization in which each variable is centered by subtracting its mean and then scaled by dividing by its standard deviation. The mean of an autoscaled variable is 0 and its variance is 1.
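Here is a minimal NumPy sketch of autoscaling. It assumes rows are samples and columns are variables, and that no variable is constant (a zero standard deviation would cause a division by zero). The helper name `autoscale` is illustrative. Using the n - 1 (sample) standard deviation is also an assumption, since some software divides by n instead.

```python
import numpy as np

def autoscale(X):
    """Center each column (variable) on its mean and scale it to unit variance.

    X: 2-D array, rows = samples, columns = variables.
    Returns the autoscaled matrix plus the column means and standard deviations,
    which should be reused to autoscale new (prediction) samples consistently.
    """
    X = np.asarray(X, dtype=float)
    col_mean = X.mean(axis=0)
    col_std = X.std(axis=0, ddof=1)  # sample standard deviation (n - 1 denominator)
    return (X - col_mean) / col_std, col_mean, col_std


# Hypothetical data: 3 samples, 2 variables on very different scales.
Z, col_mean, col_std = autoscale([[1.0, 200.0], [2.0, 220.0], [3.0, 260.0]])
print(Z)
```

Autoscale new samples with the calibration set's `col_mean` and `col_std`, not with their own statistics.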

**Bias** - A measure of how far the average measurement lies from the accepted reference value. Random (chance) errors tend to cancel one another over many measurements; errors due to bias do not. Bias is a systematic error inherent in a method or caused by some artifact or idiosyncrasy of the measurement system.{1}

**Bilinear Instruments** - An instrument whose response function for each sample is a matrix of data and for which, in the absence of noise, the response of each pure component has a mathematical rank of one (unity).
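The sketch below makes the rank-one property concrete by simulating a hyphenated measurement of the kind often cited as bilinear, such as liquid chromatography with diode-array detection. The Gaussian elution profiles, spectra, and mixing coefficients are all invented for illustration.

```python
import numpy as np

# Hypothetical elution (time) profiles and spectra for two pure components.
times = np.linspace(0, 10, 50)
wavelengths = np.linspace(200, 400, 30)
elution_1 = np.exp(-((times - 4.0) ** 2))
elution_2 = np.exp(-((times - 6.0) ** 2))
spectrum_1 = np.exp(-(((wavelengths - 250) / 20) ** 2))
spectrum_2 = np.exp(-(((wavelengths - 320) / 30) ** 2))

# Noise-free bilinear response of each pure component: an outer product, so rank 1.
pure_1 = np.outer(elution_1, spectrum_1)
pure_2 = np.outer(elution_2, spectrum_2)

# A mixture's data matrix is a concentration-weighted sum of the pure responses.
mixture = 0.7 * pure_1 + 1.3 * pure_2

print(np.linalg.matrix_rank(pure_1))   # 1
print(np.linalg.matrix_rank(mixture))  # 2
```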

**Blank** - The measured value obtained when a specified component of a sample is not present during the measurement.{1}

**Blind Sample** - A sample submitted for analysis whose composition is known to the submitter but unknown to the analyst.

**Calibration** - A mathematical model that relates measurements to known properties of the samples; most often, these properties are concentrations.
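The simplest case is a univariate straight-line calibration. This sketch fits the line by least squares and then inverts it to predict an unknown. The helper names and the noise-free standards are hypothetical; a real calibration would also check linearity and residuals. The fitted slope is the sensitivity as defined later in this glossary.

```python
import numpy as np

def fit_linear_calibration(conc, signal):
    """Least-squares line signal = slope * conc + intercept for a set of standards."""
    slope, intercept = np.polyfit(np.asarray(conc, float), np.asarray(signal, float), deg=1)
    return slope, intercept

def predict_concentration(signal, slope, intercept):
    """Invert the calibration line to estimate the concentration of an unknown."""
    return (np.asarray(signal, float) - intercept) / slope


# Hypothetical standards: concentrations (e.g., mg/L) and measured responses.
conc = [0.0, 1.0, 2.0, 4.0, 8.0]
signal = [0.02, 0.52, 1.02, 2.02, 4.02]   # exactly 0.5 * conc + 0.02
slope, intercept = fit_linear_calibration(conc, signal)
print(predict_concentration(1.52, slope, intercept))  # about 3.0
```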

**Calibration Data or Set** - Collection of data used to construct a calibration or classification model.

**Centering** - A shift of a variable's origin by subtracting its mean, so that the centered variable has a mean of zero.

**Chemometrics** -

1. The art of extracting chemically relevant information from data produced in chemical experiments is given the name of chemometrics. (Wold)
2. Chemometrics is the chemical discipline that uses mathematical, statistical and other methods employing formal logic (i) to design or select optimal measurement procedures and experiments, and (ii) to provide maximum relevant chemical information by analyzing chemical data. (Massart)
3. Chemometrics is the science of relating measurements made on a chemical system or process to the state of the system via application of mathematical or statistical methods. (International Chemometrics Society)
4. A cross-disciplinary approach of using mathematical and statistical methods to extract information from chemical data.{2}

**Correlation Coefficient** - A measure of the degree of linear dependence between two vectors. The correlation coefficient can take values between -1 and +1.
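The usual (Pearson) correlation coefficient takes only a few lines of NumPy. The function name is hypothetical, and the result should agree with `np.corrcoef`.

```python
import numpy as np

def correlation_coefficient(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    xc = np.asarray(x, float) - np.mean(x)   # center both vectors
    yc = np.asarray(y, float) - np.mean(y)
    # Cosine of the angle between the centered vectors, in [-1, +1].
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


r = correlation_coefficient([1.0, 2.0, 3.0, 5.0], [2.0, 1.0, 4.0, 3.0])
print(r)
```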

**Cross-Validation** - A calibration model validation process in which a portion of the calibration data is predicted using the remaining portion of the calibration data.
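Below is a minimal leave-one-out cross-validation sketch. It assumes a polynomial (by default straight-line) calibration of a single variable. Leave-one-out is only one scheme; k-fold and venetian-blind splits are also common. The helper name and example data are illustrative.

```python
import numpy as np

def leave_one_out_predictions(x, y, deg=1):
    """Predict each calibration sample from a polynomial model fitted to all the others."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i           # leave sample i out
        coeffs = np.polyfit(x[keep], y[keep], deg)
        preds[i] = np.polyval(coeffs, x[i])     # predict the left-out sample
    return preds


# Hypothetical calibration data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8, 11.3])
preds = leave_one_out_predictions(x, y)
cv_residuals = y - preds
print(cv_residuals)
```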

**Degrees of Freedom (df)** - The number of independent measurements available for parameter estimation. It generally equals the number of measurements minus the number of parameters estimated.{2}

**Detection Limit** - The smallest concentration or amount of some component of interest that can be detected by a single measurement with a stated level of confidence.{1}
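One widely used convention, which is an assumption here and not part of the definition above, estimates the detection limit in concentration units as three times the standard deviation of replicate blank signals divided by the calibration slope. The function name, blank readings, and slope are hypothetical.

```python
import numpy as np

def detection_limit(blank_signals, slope, k=3.0):
    """Concentration-domain detection limit estimated as k * s_blank / slope.

    k = 3 is a common convention, not a universal rule; the confidence level it
    implies depends on the noise distribution and on how s_blank is estimated.
    """
    s_blank = np.std(np.asarray(blank_signals, float), ddof=1)  # sample std of blanks
    return k * s_blank / slope


# Hypothetical replicate blank signals and calibration slope (signal per mg/L).
blank_signals = [0.010, 0.012, 0.008, 0.011, 0.009]
lod = detection_limit(blank_signals, slope=0.5)
print(lod)
```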

**Eigenvalue** - A measure of the magnitude or importance of a derived variable within a multivariate analysis method. For a covariance matrix, it equals the variance of the data along the corresponding eigenvector.

**Eigenvector** - The set of coefficients on the original (response) variables that defines a derived variable in a multivariate analysis method. Eigenvectors are numbered in order of decreasing eigenvalue (variance).
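For a covariance matrix, the link between eigenvalues, eigenvectors, and variance can be checked numerically. This NumPy sketch uses a hypothetical helper name and simulated data. It sorts the eigenpairs by decreasing eigenvalue, and the data projected onto each eigenvector should have a variance equal to the corresponding eigenvalue.

```python
import numpy as np

def covariance_eigen(X):
    """Eigenvalues/eigenvectors of the covariance matrix of X (rows = samples).

    Returned in order of decreasing eigenvalue; eigenvectors are the columns.
    """
    Xc = np.asarray(X, float) - np.mean(X, axis=0)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to decreasing eigenvalue
    return eigvals[order], eigvecs[:, order]


# Simulated correlated data: 20 samples, 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)) @ np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 0.2]])
eigvals, eigvecs = covariance_eigen(X)
scores = (X - X.mean(axis=0)) @ eigvecs      # data projected onto each eigenvector
print(eigvals, scores.var(axis=0, ddof=1))   # the two should match
```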

**Error** - The difference between a result (or the mean of a set of results) and the accepted value of the parameter.

**Factor** - An element of a data reduction in which many measurements are described by a few independent variables.

**Factor Analysis** - A multivariate data reduction method for the detection of data structure and patterns. A large set of variables is expressed in terms of a small number of linearly independent factors.

**Figure of Merit** - A performance characteristic of a method believed to be useful when deciding its applicability for a specific measurement situation. Typical figures of merit include: selectivity, sensitivity, detection limit, precision, and bias.{1}

**Heteroscedastic Noise** - Noise that changes in magnitude between variables.

**Homoscedastic Noise** - Noise that is constant in magnitude between variables.

**Latent Variable** - An unobserved (hidden) variable. Unlike principal components, latent variables need not be orthogonal.

**Limit of Detection** - See Detection Limit.

**Matrix** - A table of scalars.

**Mahalanobis distance** - A statistical distance that takes into account the variance of each variable and the correlations between variables. For a single variable, the squared Mahalanobis distance is the squared distance (between two objects, or between an object and the centroid) divided by the variance.
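Here is a minimal sketch of the squared Mahalanobis distance from the centroid of a data set, using the sample covariance matrix. The function name and data are hypothetical. The single-variable call reproduces the rule in the definition. The two-variable example shows how correlation changes which point counts as "far".

```python
import numpy as np

def mahalanobis_sq(x, data):
    """Squared Mahalanobis distance of point x from the centroid of `data`.

    data: 2-D array, rows = objects, columns = variables. The sample covariance
    matrix accounts for each variable's variance and for their correlations.
    """
    data = np.asarray(data, float)
    diff = np.asarray(x, float) - data.mean(axis=0)
    cov = np.atleast_2d(np.cov(data, rowvar=False))   # keep 2-D even for one variable
    return float(diff @ np.linalg.solve(cov, diff))


# One variable: (6 - 3)**2 / 2.5 = 3.6, the squared distance divided by the variance.
d_single = mahalanobis_sq([6.0], [[1.0], [2.0], [3.0], [4.0], [5.0]])

# Two strongly correlated variables (hypothetical data lying close to y = 2x).
data_2d = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]])
d_along = mahalanobis_sq([6.0, 12.0], data_2d)   # follows the correlation trend
d_across = mahalanobis_sq([6.0, 4.0], data_2d)   # breaks the trend, although closer in Euclidean terms
print(d_single, d_along, d_across)
```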

**Multiple Linear Regression (MLR)** - A calibration technique that models a response as a linear function of multiple variables.
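Below is a minimal MLR sketch using NumPy least squares with an intercept term. The helper names and the exactly linear example data are hypothetical. In practice, MLR needs more samples than variables and is sensitive to collinear variables, which is one motivation for PCR and PLS.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares MLR: y ~ b0 + X @ b (rows of X = samples)."""
    X = np.asarray(X, float)
    design = np.column_stack([np.ones(X.shape[0]), X])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, float), rcond=None)
    return coef[0], coef[1:]

def predict_mlr(X, b0, b):
    """Apply the fitted MLR model to new rows of X."""
    return b0 + np.asarray(X, float) @ b


# Hypothetical data generated exactly from y = 1 + 2*x1 - 0.5*x2.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]])
y = np.array([3.0, 0.5, 2.5, 4.5, 1.5])
b0, b = fit_mlr(X, y)
print(b0, b)
```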

**Multiplicative Error** - An error whose size depends on the value of the measured signal.

**Multiplicative Scatter Correction (MSC)** - A preprocessing tool that corrects for differences in spectroscopic path lengths or other multiplicative variations. The method was developed to correct for large light-scattering effects in reflectance spectroscopy. It is sometimes referred to as multiplicative signal correction.
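A common MSC recipe regresses each spectrum on a reference spectrum (often the mean spectrum) and then removes the fitted offset and slope. This sketch assumes that recipe. The function name `msc` and the simulated spectra are illustrative. Because the simulated spectra differ only by additive and multiplicative effects, MSC should make them identical.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction (rows = spectra, columns = wavelengths).

    Each spectrum x is regressed on a reference spectrum r (by default the mean
    spectrum) as x ~ a + b * r, then corrected as (x - a) / b.
    """
    spectra = np.asarray(spectra, float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)   # b: multiplicative term, a: additive offset
        corrected[i] = (x - a) / b
    return corrected


# Hypothetical NIR-like spectra that differ only by offset and scale.
wavelengths = np.linspace(1100, 2500, 200)
base = np.exp(-((wavelengths - 1700) / 150) ** 2) + 0.2 * np.exp(-((wavelengths - 2100) / 100) ** 2)
spectra = np.vstack([0.10 + 1.2 * base, -0.05 + 0.8 * base, base])
corrected = msc(spectra)
```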

**Multivariate Calibration** - A mathematical model that uses many measured variables simultaneously to predict quantitative information.

**Net analyte signal** -

**Neural Network** - A pattern recognition method based on mimicking the function of the biological neural system.

**Normal Distribution** - A symmetrical distribution with probability density in the form of a Gaussian curve.

**Ordinary Least Squares (OLS)** -

**Outlier** - A value that appears to deviate markedly from the other members of the data set in which it occurs.

**Overfit Model** - An excessively complex model that fits the noise in the calibration samples as well as the signal.

**Partial Least Squares** - A multivariate inverse calibration method built from iterative fitting of bilinear models, using both the measured responses of the standards and their known quantitative values (e.g., concentrations).
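The following is a compact sketch of PLS1 (a single response variable) using the NIPALS algorithm, which is one of several equivalent ways to compute PLS. The data are mean-centered and the helper name `pls1` is hypothetical. The sketch omits scaling and the choice of the number of components. With as many components as variables and a full-rank X, PLS1 reproduces ordinary least squares, which makes a convenient sanity check.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 regression via NIPALS on mean-centered data.

    Returns (intercept, coef) so that y_hat = intercept + X_new @ coef.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # residual (deflated) X and y
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # weights: direction of max covariance with y
        t = E @ w                            # scores
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        q_a = f @ t / tt                     # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - q_a * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # B = W (P^T W)^-1 q
    return y_mean - x_mean @ coef, coef


# Simulated calibration data: 20 samples, 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=20)
b0, b = pls1(X, y, n_components=2)
y_hat = b0 + X @ b
```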

**Pattern Recognition** - A mathematical method that uses measurements made on a set of samples to classify the samples.

**Precision** - The degree of mutual agreement characteristic of independent measurements as the result of repeated application of the process under specified conditions.{1}

**Predicted residual error sum of squares (PRESS)** - The sum of the squared differences between the observed response and the estimated response obtained from a set of cross-validation regression models.
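For a linear least-squares model such as MLR, the leave-one-out PRESS can be computed without refitting by using the leverage (hat-matrix diagonal) identity e_(i) = e_i / (1 - h_ii). The sketch below assumes that model class. For PLS or PCR, PRESS is typically obtained by refitting the model on each cross-validation split instead. The function name and data are hypothetical.

```python
import numpy as np

def press_linear(X, y):
    """Leave-one-out PRESS for an ordinary least-squares model y ~ X b, without refitting.

    Uses e_(i) = e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii is the
    i-th diagonal element of the hat matrix H = X (X^T X)^-1 X^T.
    X should include a column of ones if an intercept is wanted.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))  # diag(X X^+) = diag(H)
    return float(np.sum((residuals / (1.0 - leverage)) ** 2))


# Hypothetical straight-line calibration data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8, 11.3])
X = np.column_stack([np.ones_like(x), x])   # intercept + slope
press = press_linear(X, y)
print(press)
```

Comparing PRESS across models of increasing complexity is one common way to spot underfit and overfit models.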

**Principal Component (PC)** - An orthogonal linear combination of the original variables; each successive principal component captures the maximum variance remaining in the data.

**Principal Component Analysis (PCA)** - A multivariate data reduction method used to detect structure and patterns within data.
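Here is a minimal PCA sketch via the singular value decomposition of mean-centered data, assuming rows are samples. The helper name and simulated data are illustrative. The loadings are the eigenvectors of the covariance matrix, and the captured variances are its eigenvalues.

```python
import numpy as np

def pca(X, n_components):
    """PCA of mean-centered X via SVD (rows = samples, columns = variables).

    Returns scores T, loadings P (columns) and the variance captured by each component.
    """
    Xc = np.asarray(X, float) - np.mean(X, axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                        # loadings
    T = Xc @ P                                     # scores (equal to U * s)
    explained_var = s[:n_components] ** 2 / (Xc.shape[0] - 1)
    return T, P, explained_var


# Simulated data: 15 samples, 3 variables with unequal spread.
rng = np.random.default_rng(2)
X = rng.normal(size=(15, 3)) * np.array([3.0, 1.0, 0.3])
T, P, explained = pca(X, n_components=2)
print(explained)
```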

**Principal Components Regression (PCR)** - Multivariate calibration in which the property of interest is regressed onto the scores of selected principal components of the data matrix.
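A minimal PCR sketch follows the definition directly. It computes principal components of the centered data, regresses the centered property on the selected scores, and then maps the coefficients back to the original variables. The helper name and simulated data are hypothetical. Choosing the number of components (for example, by cross-validation) is left out.

```python
import numpy as np

def pcr(X, y, n_components):
    """Principal components regression of centered y on PCA scores of centered X.

    Returns (intercept, coef) in the original variable space: y_hat = intercept + X_new @ coef.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                               # selected loadings
    T = Xc @ P                                            # scores on the selected PCs
    g, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)   # regression on the scores
    coef = P @ g                                          # back to original variables
    return y_mean - x_mean @ coef, coef


# Simulated calibration data: 25 samples, 4 variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(25, 4))
y = X @ np.array([0.5, 1.5, -1.0, 2.0]) + 0.05 * rng.normal(size=25)
b0, b = pcr(X, y, n_components=2)
```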

**Pseudorank** - The rank of the signal contained in a real data matrix if no noise were present.

**Rank** - The dimension of the largest non-singular square submatrix of a matrix; equivalently, the number of linearly independent rows (or columns).
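The difference between rank and pseudorank is easy to see numerically. The sketch uses simulated two-component data, and the 0.05 threshold is an arbitrary, data-dependent assumption. The noise-free matrix has rank 2, adding small noise makes it full rank, and counting the singular values that stand clearly above the noise recovers the pseudorank.

```python
import numpy as np

# Hypothetical two-component system: 10 mixtures measured at 25 wavelengths.
wl = np.linspace(0.0, 1.0, 25)
S = np.vstack([np.exp(-((wl - 0.3) / 0.1) ** 2),   # pure spectrum of component 1
               np.exp(-((wl - 0.7) / 0.1) ** 2)])  # pure spectrum of component 2
C = np.column_stack([np.linspace(0.1, 1.0, 10),    # concentrations of component 1
                     np.linspace(1.0, 0.2, 10)])   # concentrations of component 2
D_clean = C @ S

rng = np.random.default_rng(0)
D_noisy = D_clean + 1e-3 * rng.normal(size=D_clean.shape)

print(np.linalg.matrix_rank(D_clean))  # 2
print(np.linalg.matrix_rank(D_noisy))  # 10: the noise makes the matrix full rank

singular_values = np.linalg.svd(D_noisy, compute_uv=False)
pseudorank = int(np.sum(singular_values > 0.05))  # threshold is an assumption
print(pseudorank)  # 2
```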

**Reference Material** - A material or substance, one or more properties of which are sufficiently well established to be used for the calibration of an apparatus, the assessment of a measurement method, or for assigning values to materials.{1}

**Reference Method** - An analytical method which has been specified as being capable, by virtue of recognized accuracy, of providing primary reference data.{1}

**Reference Sample** - Samples, including clean matrix spikes, with known analyte concentrations; used to assess analytical accuracy.

**Reference Standard** - A standard, generally of the highest metrological quality available at a given location, from which measurements made at that location are derived.

**Regression** -

**Residuals** - The part of the signal that is not described by the model.

**Sample** - A portion of a population or lot. It may consist of an individual or group of individuals. It may refer to objects, materials, or measurements, conceivable as part of a larger group that could have been considered.{1}

**Scalar** - A single number.

**Selectivity** - A measure of the degree of overlap of signals from different sources.

**Sensitivity** - The signal response to an analyte divided by the concentration of the analyte.

**Signal-to-Noise Ratio** -

**Singular Value Decomposition (SVD)** -

**Standard Addition** - A method in which small increments of a substance under measurement are added to a sample under test to establish a response function, or to determine by extrapolation the amount of a constituent originally present in the sample.{1}
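Below is a minimal sketch of the extrapolation step. It assumes a linear response, equal final volumes for all aliquots, and no blank signal. The function name and data are hypothetical. The fitted line is extrapolated to zero signal, and the magnitude of its x-intercept (intercept / slope) is the original concentration.

```python
import numpy as np

def standard_addition(added_conc, signal):
    """Estimate the original analyte concentration by linear extrapolation.

    added_conc: concentration of analyte added to each aliquot (equal final volumes assumed).
    signal: measured response for each aliquot.
    The line signal = slope * added + intercept reaches zero at added = -intercept / slope.
    """
    slope, intercept = np.polyfit(np.asarray(added_conc, float), np.asarray(signal, float), 1)
    return intercept / slope


# Hypothetical data: original concentration 2.0, sensitivity 0.3 signal units per unit.
added = [0.0, 1.0, 2.0, 3.0]
signal = [0.6, 0.9, 1.2, 1.5]
estimate = standard_addition(added, signal)
print(estimate)  # about 2.0
```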

**Standard deviation** - A measure of the dispersion of a series of results around their mean, expressed as the square root of the variance.

**Standard Error of Prediction (SEP)** -

**Standardization** - A transformation applied to the elements of a data set, such as centering or autoscaling.

**Training Set** - A data set containing measurements on a set of known samples which is used to develop a calibration.

**Tensor** - A mathematical object. Zero-order tensors are scalars, first-order tensors are vectors and second-order tensors are matrices.

**Test Method** - Defined technical procedure for performing a test.

**Transpose of a Matrix** - The interchange of rows and columns, such that element a_ij of the original matrix becomes element a_ji of the transposed matrix.

**Trilinear Data** -

**Underfit Model** - An excessively simple model that fails to capture all of the relevant signal.

**Vector** - A row or column of numbers.

**Variance** - The value approached by the average of the sum of the squares of the deviations of individual measurements from the limiting mean.{1} It is the square of the standard deviation.

{1} J. K. Taylor, *Quality Assurance of Chemical Measurements*; Lewis: Chelsea, MI, 1987.

{2} W. P. Gardiner, *Statistical Analysis Methods for Chemists: A Software-Based Approach*; The Royal Society of Chemistry: Cambridge, 1997.
