Particle Size and Its Impact on Spectroscopic Calibration: Why Wet Materials Demand a Smarter Approach
Spectroscopy — particularly near-infrared (NIR), mid-infrared (MIR), and Raman spectroscopy — has become an indispensable tool across industries ranging from pharmaceuticals and food science to agriculture and petrochemicals. Its appeal is well founded: rapid, non-destructive analysis capable of simultaneously predicting multiple chemical and physical properties from a single scan.
Yet for all its power, spectroscopy is not immune to the physical realities of the materials it analyses. Among the most consequential and frequently underestimated of these realities is particle size. Whether a material is measured in its dry powder form or suspended in a wet, heterogeneous matrix, the distribution and scale of particles fundamentally shape the spectral signal — and by extension, the quality of any calibration model built from it.
This article explores how particle size affects spectroscopic calibrations in both dry and wet materials, why wet systems present a uniquely complex challenge, and why overcoming that challenge increasingly demands sophisticated multivariate and machine learning modelling techniques.
The Physics Behind the Problem
Before examining calibration quality, it is worth understanding why particle size matters so profoundly in spectroscopy.
When light interacts with a particulate material, two competing phenomena occur simultaneously: absorption and scattering. Absorption is the desirable signal — it encodes chemical composition. Scattering, by contrast, is a physical phenomenon driven by the size, shape, and refractive index of particles relative to the wavelength of incident light. When particle size is large or variable, scattering dominates and distorts spectra in ways that are difficult to separate cleanly from meaningful chemical variation.
In the NIR region, where most industrial calibrations operate, scattering effects are particularly pronounced. Particles much smaller than the wavelength scatter light nearly isotropically (the Rayleigh regime), while particles comparable to or larger than the wavelength produce complex, wavelength-dependent scattering behaviour described by Mie theory. When particle size varies across a sample set — as it almost invariably does in real-world materials — the resulting spectral variation is a convolution of chemistry and physics that a naive calibration model cannot disentangle.
The practical consequences include:
Baseline shifts across the spectrum, mimicking or masking genuine absorbance changes
Multiplicative scatter effects, where the apparent magnitude of peaks scales non-linearly with particle size
Broad, distorted peak shapes, reducing the specificity of otherwise characteristic absorption bands
Reduced signal-to-noise ratio in diffuse reflectance, as scattering alters the path length of light through the sample
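As a rough numerical illustration of these scattering regimes, the dimensionless size parameter x = πd/λ is the conventional yardstick: x much less than 1 corresponds to Rayleigh-like behaviour, x near or above 1 to the Mie regime. A minimal sketch (the wavelength and particle diameters are illustrative values, not drawn from the discussion above):

```python
import math

def size_parameter(diameter_nm: float, wavelength_nm: float) -> float:
    """Dimensionless Mie size parameter x = pi * d / lambda."""
    return math.pi * diameter_nm / wavelength_nm

# Illustrative NIR wavelength (1600 nm) and a spread of particle diameters.
wavelength = 1600.0
for d in (50.0, 1600.0, 50000.0):
    x = size_parameter(d, wavelength)
    regime = ("Rayleigh-like (x << 1)" if x < 0.1
              else "Mie regime (x ~ 1)" if x < 10
              else "geometric-optics limit (x >> 1)")
    print(f"d = {d:>8.0f} nm  ->  x = {x:8.2f}  ({regime})")
```

A micron-scale particle at NIR wavelengths sits squarely in the Mie regime, which is why NIR spectra of typical powders and slurries are so sensitive to particle size.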
Particle Size in Dry Materials: A Manageable, If Persistent, Challenge
For dry materials — powders, granules, lyophilised products, and similar solid-state samples — the particle size problem is well studied and, to a meaningful degree, tractable.
How Particle Size Manifests in Dry Spectra
In diffuse reflectance NIR spectroscopy of dry powders, particle size affects the effective path length of light through the sample. Finer particles pack more densely, increasing the number of scattering events and shortening the average path length. This results in spectra that appear less absorbing, with shifted baselines. Conversely, coarser particles yield longer effective path lengths and stronger apparent absorbance — entirely independently of any change in chemical composition.
For a calibration model built on samples with consistent particle size, this effect is systematic and largely absorbed into the model's intercept. The real danger arises when the particle size distribution varies across the calibration set, or between the calibration set and production samples. In those cases, the model conflates physical and chemical variation, producing predictions that drift with milling consistency, humidity-driven agglomeration, or seasonal raw material changes.
Pre-processing Strategies for Dry Materials
The spectroscopy community has developed a robust toolkit of pre-processing methods to correct for particle size effects in dry materials:
Standard Normal Variate (SNV) normalises each spectrum by its own mean and standard deviation, effectively removing multiplicative scatter effects. It is computationally simple and widely effective for homogeneous powders.
Multiplicative Scatter Correction (MSC) regresses each spectrum against a reference spectrum (typically the mean spectrum of the calibration set), then corrects for the estimated scatter component. It is particularly effective when scatter variation is the dominant source of spectral difference.
Savitzky-Golay derivatives suppress baseline variation by transforming spectral absorbance into its first or second derivative, emphasising the position and curvature of peaks rather than their absolute magnitude.
Extended MSC (EMSC) expands on classical MSC by explicitly modelling physical effects (including particle size-related scattering) as polynomial functions of wavelength, providing more flexible correction across the spectral range.
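To make the first two corrections concrete, SNV and classical MSC can each be written in a few lines of NumPy. This is a generic sketch on synthetic spectra (rows of a 2-D array), not tied to any particular instrument:

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum by its own
    mean and standard deviation, removing multiplicative scatter effects."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative Scatter Correction: regress each spectrum against a
    reference (default: the mean spectrum of the set), then remove the
    fitted offset and slope so only chemical variation remains."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        # Fit s ~ a + b * ref by ordinary least squares.
        b, a = np.polyfit(ref, s, deg=1)
        corrected[i] = (s - a) / b
    return corrected

# Synthetic demo: one base spectrum seen under different scatter conditions
# (offset and scale differ; the underlying chemistry is identical).
base = np.sin(np.linspace(0, 3, 200)) + 2.0
raw = np.stack([0.5 * base + 0.1, 1.5 * base - 0.2, base])
print(np.allclose(snv(raw)[0], snv(raw)[1]))  # scatter removed -> True
```

Because the three rows differ only by offset and scale, both corrections map them onto a common spectrum — exactly the behaviour that breaks down in wet systems, as discussed below.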
These methods, combined with careful calibration design that spans realistic particle size variation, allow dry material calibrations to achieve high accuracy and robustness. The challenge is demanding but well understood.
Particle Size in Wet Materials: A Fundamentally Different Problem
Wet materials — slurries, emulsions, dispersions, pastes, biological fluids, fermentation broths, and soil suspensions — present a categorically more difficult challenge. The reasons are both physical and practical, and they compound in ways that render many of the standard dry-material approaches insufficient on their own.
Why Wet Systems Are More Complex
Dynamic and unstable particle distributions. Unlike a dry powder where particle size, once set by milling, remains relatively stable during measurement, wet systems are dynamic. Particles sediment, aggregate, dissolve, swell, and break up. A sample measured at one moment may have a meaningfully different particle size distribution seconds later. This temporal instability introduces spectral variance that has no equivalent in dry material analysis.
Multiple scattering phases. In a dry powder, the scattering medium is essentially air and the solid particles. In a wet system, there are at minimum two phases — a continuous liquid phase and a dispersed particulate phase — each contributing to absorption and scattering independently. In complex food matrices, fermentation media, or agricultural slurries (where gas bubbles, oil droplets, cells, and suspended solids may coexist with the aqueous phase), there may be five or more contributing phases simultaneously.
Overlapping water absorption. Water is a powerful NIR absorber, with broad, temperature-sensitive overtone and combination bands that dominate much of the 1400–1900 nm region. Particle size variation in the solid phase modulates the effective water path length, introducing correlated variation between water absorbance and particle size that is exceedingly difficult to separate with conventional pre-processing.
Temperature-coupled effects. In wet systems, temperature simultaneously affects the viscosity of the continuous phase (which alters particle dynamics), the water absorption spectrum, and the refractive index contrast between phases (which governs scattering efficiency). A calibration model that does not account for this multi-way coupling will degrade across seasonal temperature variation or process upsets.
Non-uniform sample presentation. In dry materials, a sample cup or probe interface with a relatively uniform reflectance geometry is achievable. In wet systems, the sample presented to the probe continuously varies in local composition, particle arrangement, and optical path — even across repeated measurements of the same bulk material.
Why Standard Pre-processing Falls Short
SNV and MSC, the workhorses of dry material scatter correction, assume that scattering variation is globally multiplicative — that is, the shape of the spectrum is preserved and only its scale changes. In wet systems, this assumption frequently breaks down. The scattering behaviour of a polydisperse emulsion is wavelength-dependent in complex, non-linear ways that a simple multiplicative correction cannot capture. Applying SNV to a wet slurry spectrum may correct part of the variance while introducing new artefacts.
Derivative-based methods fare somewhat better at suppressing slow baseline drift, but they amplify noise — a particular liability when the signal-to-noise ratio is already compromised by heavy scattering in a wet matrix.
The fundamental issue is that the physics of light interaction with wet, particulate systems is governed by a multi-parameter scattering regime that pre-processing methods designed for simpler systems were never intended to handle.
The Case for Sophisticated Modelling Techniques
The inadequacy of conventional pre-processing for wet materials is not merely a technical inconvenience — it has real commercial consequences. Calibrations that degrade with particle size shifts lead to incorrect compositional predictions, failed quality checks, and loss of confidence in the technology. Addressing these challenges requires moving beyond pre-processing into the domain of sophisticated modelling.
Locally Weighted Regression and Locally Adaptive Models
Among the most powerful approaches for wet material calibration are locally weighted regression (LWR) and its variants. Rather than building a single global model relating spectra to composition across all samples, LWR identifies, for each new prediction, the subset of calibration samples most similar to it — in spectral space, accounting for particle size effects — and builds a local model from that subset.
This approach is inherently adaptive to particle size variation: a sample with fine particles will draw calibration neighbours with similarly fine particles, and the local model implicitly accounts for the scattering conditions without requiring them to be explicitly parameterised. LWR has shown particular promise in agricultural and food slurry analysis, where particle size distributions span wide ranges and are poorly controlled.
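A minimal numeric sketch of the LWR idea, with NumPy and entirely synthetic data (the names `X_cal`, `y_cal`, and the quadratic toy relationship are illustrative assumptions, not from any specific application): for each query spectrum, select the k nearest calibration spectra, then fit a distance-weighted local linear model.

```python
import numpy as np

def lwr_predict(X_cal, y_cal, x_new, k=40):
    """Locally weighted regression: fit a weighted, lightly regularised
    linear model on the k calibration spectra closest to the query."""
    d = np.linalg.norm(X_cal - x_new, axis=1)
    idx = np.argsort(d)[:k]                      # k nearest neighbours
    Xk, yk = X_cal[idx], y_cal[idx]
    w = 1.0 / (d[idx] + 1e-8)                    # closer neighbours weigh more
    A = np.hstack([Xk, np.ones((k, 1))])         # add an intercept column
    W = np.diag(w)
    reg = 1e-6 * np.eye(A.shape[1])              # small ridge term for stability
    beta = np.linalg.solve(A.T @ W @ A + reg, A.T @ W @ yk)
    return np.append(x_new, 1.0) @ beta

# Toy demo: the target depends non-linearly on one spectral feature,
# standing in for a scattering-driven non-linearity.
rng = np.random.default_rng(1)
X_cal = rng.normal(size=(200, 10))
y_cal = X_cal[:, 0] ** 2 + 0.1 * rng.normal(size=200)
prediction = lwr_predict(X_cal, y_cal, X_cal[0])
```

In practice the local model is often a local PLS rather than the plain weighted least squares shown here, and the neighbour distance metric may itself be tuned to emphasise scatter-insensitive spectral regions.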
Artificial Neural Networks and Deep Learning
Artificial neural networks (ANNs), and more recently convolutional neural networks (CNNs) and other deep learning architectures, are well-suited to learning non-linear relationships between spectra and target properties — including the non-linear interactions between particle size variation and compositional signals that defeat linear models.
A CNN operating directly on raw or minimally pre-processed spectra can, given a sufficiently large and representative calibration set, learn to extract chemical information while implicitly suppressing particle size-related variation. The network architecture can be designed to learn invariant representations — spectral features that are stable across particle size variation — rather than relying on the chemical analyst to specify what those features should be.
The limitation is data hunger: deep learning models require large, diverse calibration sets to generalise well. In wet material applications, assembling such a dataset demands sustained investment in reference analysis — but the payoff in model robustness can be substantial.
Ensemble and Hybrid Models
Ensemble modelling — combining predictions from multiple base models, each trained on a different spectral pre-processing or wavelength subset — provides robustness that no single model can match. If one base model is sensitive to particle size via a particular pre-processing pathway and another is relatively insensitive, their ensemble average tends toward the more robust prediction.
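A compact sketch of that ensemble idea, assuming two base models trained on different pre-processing pathways of the same synthetic spectra (SNV versus a crude first difference; the data and ridge regression are illustrative, not a specific published pipeline):

```python
import numpy as np

def snv(X):
    """Row-wise Standard Normal Variate."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def first_diff(X):
    """Crude first-derivative pre-processing via adjacent differences."""
    return np.diff(X, axis=1)

def ridge_fit(X, y, lam=1e-3):
    """Closed-form ridge regression with an intercept column."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

def ridge_predict(beta, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ beta

# Train one base model per pre-processing pathway, then average predictions.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 30))
y = X[:, :3].sum(axis=1)
pathways = (snv, first_diff)
betas = [ridge_fit(p(X), y) for p in pathways]
preds = [ridge_predict(b, p(X)) for b, p in zip(betas, pathways)]
ensemble = np.mean(preds, axis=0)     # ensemble average of the base models
```

Each pathway discards a different slice of the physical variance, so their average tends to be less sensitive to any single failure mode than either base model alone.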
Hybrid models that explicitly incorporate physical knowledge — for example, parameterising the Mie scattering contribution as a function of estimated particle size, then using that estimate as an auxiliary variable in the chemometric model — are an active area of research. By encoding physical understanding directly into the model architecture, hybrid approaches can generalise more reliably to particle size conditions outside the calibration range.
Orthogonal Signal Correction and Advanced Spectral Filtering
Orthogonal Signal Correction (OSC) and its successors, including External Parameter Orthogonalisation (EPO), work by identifying and removing spectral variance that is orthogonal to the property of interest. In the context of wet materials, EPO can be used to remove spectral variation associated with known physical interferents — including particle size — by collecting spectra at multiple particle size conditions and explicitly modelling the resulting variation as a subspace to be projected out.
EPO is particularly elegant because it is data-driven yet physically motivated: it removes exactly the variation that is known to be physically, rather than chemically, generated. Applied to wet material calibrations, it has demonstrated substantial improvements in prediction accuracy for constituents buried beneath heavy scattering backgrounds.
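The EPO projection itself is short to express. The sketch below assumes a "difference matrix" D built from spectra of the same samples re-measured at several interferent (e.g. particle size) conditions, as described above; the leading principal components of D span the interference subspace, which is then projected out. The toy data are illustrative:

```python
import numpy as np

def epo_projection(D, n_components=2):
    """External Parameter Orthogonalisation: build the projector
    P = I - V V^T, where V holds the top right-singular vectors of the
    interference difference matrix D."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    V = Vt[:n_components].T                  # interference subspace basis
    return np.eye(D.shape[1]) - V @ V.T

# Toy demo: a smooth scatter-like direction contaminates the spectra.
rng = np.random.default_rng(3)
n_wl = 50
chem = rng.normal(size=n_wl)                 # chemical signal direction
interf = np.linspace(1.0, 2.0, n_wl)         # scatter-like interference
interf /= np.linalg.norm(interf)
# D: the same samples re-measured under varying interferent levels.
D = np.outer(rng.normal(size=20), interf)
P = epo_projection(D, n_components=1)
x = 0.7 * chem + 3.0 * interf                # mixed spectrum
x_corr = x @ P                               # interference projected out
print(abs(x_corr @ interf))                  # ~0: interference removed
```

The calibration model is then trained on the projected spectra, so it never sees the interference subspace at all; choosing `n_components` is the key tuning decision in practice.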
Gaussian Process Regression and Probabilistic Models
Gaussian Process Regression (GPR) offers a principled probabilistic framework for calibration that naturally quantifies prediction uncertainty — a critical capability in wet material applications where particle size shifts may move new samples into sparsely calibrated regions of spectral space.
Unlike partial least squares (PLS) or multiple linear regression (MLR), which provide point predictions, GPR returns both a mean prediction and a confidence interval. When a wet sample presents a spectrum that is spectrally distant from the calibration set (as might occur with an unusually coarse grind or an unexpectedly high moisture content), GPR flags this explicitly through widened prediction intervals. This uncertainty quantification enables inline quality systems to distinguish confident predictions from those requiring additional verification.
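A minimal NumPy sketch of the GPR mechanics just described — a posterior mean plus a predictive standard deviation that widens away from the calibration data. The RBF kernel and its hyperparameters are illustrative, not tuned; production work would typically use a dedicated library:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential (RBF) kernel between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / length_scale**2)

def gpr_predict(X_cal, y_cal, X_new, length_scale=1.0, noise=1e-2):
    """GP posterior mean and standard deviation at the query points."""
    K = rbf_kernel(X_cal, X_cal, length_scale) + noise * np.eye(len(X_cal))
    Ks = rbf_kernel(X_new, X_cal, length_scale)
    alpha = np.linalg.solve(K, y_cal)
    mean = Ks @ alpha
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - (Ks * v.T).sum(axis=1)       # prior variance k(x, x) = 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Demo: uncertainty widens for a query far from the calibration spectra.
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))
y = X[:, 0]
near, far = X[:1], X[:1] + 10.0              # in-support vs. out-of-support
_, sd_near = gpr_predict(X, y, near)
_, sd_far = gpr_predict(X, y, far)
print(sd_near[0] < sd_far[0])                # True: distant sample flagged
```

The widened standard deviation for the distant query is precisely the behaviour that lets an inline system route suspect samples to additional verification rather than report a silently wrong value.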
Practical Implications for Calibration Design
The theoretical considerations above carry direct practical implications for anyone developing spectroscopic calibrations for wet materials.
Span particle size deliberately in the calibration set. The calibration set should include samples representing the full realistic range of particle size distributions expected in production — not just the range observed on a convenient set of reference samples. This may require deliberate manipulation of milling or homogenisation conditions during calibration sample preparation.
Monitor particle size as a process variable. Where possible, particle size should be measured concurrently with spectral data during calibration. This enables explicit modelling of the particle size effect and supports the development of hybrid models.
Invest in model maintenance. Wet material calibrations drift more readily than dry material calibrations, because particle size distributions in production shift with raw material changes, process adjustments, and seasonal variation. Robust protocols for model monitoring, outlier detection, and periodic recalibration are not optional — they are the price of reliable performance.
Consider transfer learning for new products. When extending a calibration from one wet material formulation to another, transfer learning approaches — in which a model trained on one material is fine-tuned with a small number of samples from the new material — can dramatically reduce the reference analysis burden while preserving the robustness of the parent model.
Particle size is not a peripheral concern in spectroscopic calibration — it is one of the most consequential physical variables affecting the quality and generalisability of any model. For dry materials, a combination of thoughtful calibration design and well-established pre-processing methods provides a workable solution. For wet materials, the challenge is qualitatively harder: more dynamic, more physically complex, and less amenable to simple correction strategies.
Meeting this challenge is not a matter of applying a single superior algorithm. It requires a systems-level response: physically informed pre-processing, sophisticated modelling architectures that accommodate non-linearity and heteroscedasticity, probabilistic uncertainty quantification, and ongoing model stewardship. The tools exist — locally weighted regression, neural networks, EPO, Gaussian process models, and ensemble methods — but deploying them effectively demands a level of chemometric and physical understanding that goes well beyond fitting a PLS model to a calibration spreadsheet.
The reward for that investment is substantial: calibrations that hold their accuracy as real-world materials vary, that fail transparently rather than silently, and that unlock the true potential of spectroscopy as a real-time, in-line analytical technology.