Classification of Gem Materials Using Machine Learning

ABSTRACT
Gemstones traded in markets globally can have extremely high values due to their physical appearance and, in some cases, their scarcity. Gemological laboratories provide identification services to determine gemstone species, growth origin, provenance, and the history of color- or clarity-enhancing treatments, all characteristics that can significantly affect gemstone values. These determinations are primarily conducted through microscopic analysis of the gem material and by identifying characteristic features in spectroscopic data acquired using nondestructive techniques. In addition, mildly destructive methods such as laser ablation–inductively coupled plasma–mass spectrometry may be employed to determine the trace element composition of a gem material. Traditionally, the diagnostic criteria used to determine geographic provenance or color/clarity treatment history are selected by trained specialists based on careful evaluation of large research datasets. Identifying the specific features that are useful for classification purposes can be time-consuming, and subtle diagnostic features may be overlooked. Machine learning enables rapid exploration of large, complex datasets in new ways. Hence, the application of machine learning algorithms may complement existing microscopic, spectroscopic, and geochemical approaches.
This study examines the application of several machine learning models to gemstone classification problems involving natural alexandrite, laboratory-grown diamonds, and natural saltwater pearls. Using machine learning in a number of test cases, the authors achieved classification error rates of 5% or less for determination of provenance and detection of color treatment and were able to reduce the number of samples classified as indeterminate using conventional techniques by more than 50%.
The market value of diamonds, pearls, and colored stones can be influenced strongly by the species of the material, its physical features (e.g., cut, color, clarity, and size), and its rarity. However, value can also be driven by other factors, such as the geographic provenance of the material, growth origin (natural or laboratory-grown), or its history of color- or clarity-enhancing treatments. Gemological laboratories offer identification services that provide consumers with this type of information, as it can aid in defining the value of a gem. This process also protects consumers by reducing the number of erroneously or fraudulently traded gems, such as laboratory-grown diamonds that are marketed as natural.
At GIA, geographic provenance and treatment history are determined by comparing unclassified gems against large curated databases of samples from the same species, for which the provenance or treatment history is independently known. Microscopic examination is generally the first stage of analysis, identifying diagnostic physical features (e.g., mineral inclusions indicative of certain localities; Palke et al., 2019a,b). The gem material can also be characterized using a variety of nondestructive analytical techniques including photoluminescence (PL), Raman scattering, ultraviolet/visible/near-infrared (UV-Vis-NIR) absorption, and Fourier-transform infrared (FTIR) absorption spectroscopy. GIA’s traditional approaches to comparison of these data include visual inspection of spectra, identifying peaks and bands as well as their relative intensities, and in some cases using automated software for spectral matching (e.g., the WiRE package for Renishaw Raman devices). For diamonds, these features reflect atomic defects in the lattice, which are created or destroyed during growth and color- or clarity-enhancing treatments (Martineau et al., 2004; D’Haenens-Johansson et al., 2022).
For some colored stones and pearls, minimally destructive methods such as laser ablation–inductively coupled plasma–mass spectrometry (LA-ICP-MS) can determine the elemental composition. This technique removes a small amount of material from the stone, which may leave a tiny crater approximately 55 μm in diameter. These chemical data are compared with gem databases using bivariate scatterplots or “selective plotting,” as research has shown that trace element compositions can indicate geographic origin (e.g., Groat et al., 2019; Palke et al., 2019a,b). Selective plotting is an advanced classification technique that can reduce the compositional overlap between gemstones in bivariate diagrams by filtering the reference data to only those samples with similar elemental concentrations as the unclassified stone, considering multiple trace elements simultaneously. For example, unclassified and reference stones might be plotted on a bivariate diagram using magnesium and iron concentrations; the reference stones shown on the plot would be filtered to those with similar compositions of other elements as well, such as titanium or gallium (see Palke et al., 2019a for a detailed overview). This approach reduces the compositional overlap between gemstones of different provenance when applied to alexandrite, ruby, and sapphire (Sun et al., 2019; Palke et al., 2019a,b).
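The filtering step behind selective plotting can be sketched in a few lines. The following Python example is an illustration only, not GIA's implementation: the reference values, the function name select_similar, and the 25% relative tolerance are all assumptions made for the example. It keeps only reference stones whose titanium and gallium concentrations are close to those of the unclassified stone, and the surviving stones would then be plotted on a magnesium versus iron diagram.

```python
# Hypothetical reference stones; all concentrations (ppmw) are invented for illustration.
reference = [
    {"locality": "A", "Mg": 12.0, "Fe": 900.0, "Ti": 30.0, "Ga": 150.0},
    {"locality": "A", "Mg": 9.0, "Fe": 1100.0, "Ti": 35.0, "Ga": 170.0},
    {"locality": "B", "Mg": 11.0, "Fe": 950.0, "Ti": 80.0, "Ga": 155.0},
    {"locality": "B", "Mg": 10.5, "Fe": 1020.0, "Ti": 31.0, "Ga": 40.0},
    {"locality": "C", "Mg": 30.0, "Fe": 400.0, "Ti": 28.0, "Ga": 190.0},
]
unknown = {"Mg": 10.0, "Fe": 1000.0, "Ti": 32.0, "Ga": 160.0}


def select_similar(reference, unknown, filter_elements, rel_tol=0.25):
    """Keep reference stones whose filter elements lie within
    +/- rel_tol (relative) of the unknown stone's concentrations.
    The tolerance window is an assumption, not a published criterion."""
    kept = []
    for stone in reference:
        similar = all(
            abs(stone[element] - unknown[element]) <= rel_tol * unknown[element]
            for element in filter_elements
        )
        if similar:
            kept.append(stone)
    return kept


# Filter on Ti and Ga, then keep only the two plotting variables (Mg, Fe).
kept = select_similar(reference, unknown, filter_elements=["Ti", "Ga"])
plot_points = [(s["locality"], s["Mg"], s["Fe"]) for s in kept]
print(plot_points)
```

In this toy dataset both locality B stones are removed before plotting, one for its high titanium and one for its low gallium, even though they overlap the unknown stone in magnesium and iron.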
To determine geographic provenance or treatment history using these approaches, each group of samples must have sufficiently distinct characteristics to allow separation. In practice, however, a single characteristic is generally not diagnostic enough for determination, and multiple forms of evidence are needed. This becomes much more difficult when samples lack a particular set of characteristic features (e.g., a lack of inclusions). When gemological specialists cannot observe clear diagnostic features, even after careful examination of all data, a classification of “undetermined” may be given.
To address this challenge, we investigate machine learning (ML) as a complementary tool to existing classification approaches. The criteria used for gemstone classification at GIA (physical features, spectroscopic characteristics, and elemental compositions) are selected based on years of detailed examination, through which specialists learn to recognize patterns linking gemstone characteristics to provenance or treatment history. Directing computer algorithms to identify patterns among large datasets, in a manner similar to that of a gem specialist, may provide new insights.
BACKGROUND
Gemological laboratories have acquired spectroscopic and trace element data for millions of diamonds, colored gemstones, and pearls collectively. By carefully inspecting these databases, trained specialists can identify characteristic features that can be used to determine geographic provenance or treatment history for new samples. However, the massive scale of available data, coupled with the complexity of different gem materials and the constantly evolving techniques for growing and treating them (e.g., Eaton-Magaña et al., 2024), makes it time-consuming to identify the most useful classification criteria through manual inspection. As a result, subtle diagnostic features may be overlooked. Computers can rapidly parse large datasets, and some software already provides functions that enable users to match features in acquired data (such as Raman spectra) to extensive reference databases. Hence, computers are powerful tools that complement existing traditional approaches.
Artificial intelligence (AI) is the use of computer systems to emulate the way humans think, process information, and learn. ML is a subfield of AI associated with developing algorithms to learn from datasets in order to identify patterns or solve particular problems without requiring explicit programming (Xu et al., 2021). The application of ML in the physical sciences has increased in recent years, with successful uses in fields including the evaluation of the mantle and exploration for diamonds and kimberlites (Dawson and Stephens, 1975; Griffin et al., 2002; Hardman et al., 2018a,b), as well as mineral chemistry and characterization (Schönig et al., 2021; Hazen and Morrison, 2022). The application of ML to gemological problems—including provenance determination—has also increased over time (e.g., Blodgett and Shen, 2011; Luo et al., 2015; Homkrajae et al., 2019; Krebs et al., 2020; Hardman et al., 2022; Bassoo et al., 2023; Bendinelli et al., 2024).
Applying ML to gemological research can yield significant benefits. For example, while the selective plotting approach surpasses bivariate scatterplots for classifying some colored stones, some samples still cannot be classified. ML allows the comparison of even more variables simultaneously and can reveal subtle classification criteria involving multiple variables with complex relationships that may not be apparent using bivariate scatterplots or selective plotting. Using these criteria, ML models can be automated to assist gemologists by suggesting a classification. This opinion can be quite valuable in making a confident judgment. However, some samples will still exhibit too much compositional overlap to be classifiable even with ML.
This study provides an overview of ML approaches to gem classification, including some that are in active use at GIA. We examine the applications of ML to three distinct gem materials for which GIA offers laboratory report services (alexandrite, laboratory-grown diamonds, and natural saltwater pearls) and discuss the advantages of ML over traditional approaches.
MATERIALS
Alexandrite. Alexandrite, a variety of the mineral chrysoberyl (BeAl2O4), has a unique color-change property when exposed to different light sources (for example, from green or green-blue in daylight to red in incandescent lighting; figure 1, left). This property is caused by trace concentrations of Cr3+ replacing Al3+ in the crystal lattice (Gübelin and Schmetzer, 1982). GIA provides services for determining the geographic origin of alexandrite (figure 1, right).
Alexandrite provenance determination is conducted at GIA using microscopic analysis to identify physical features such as mineral inclusions that may be characteristic of different sampling localities (Sun et al., 2019). Trace element compositions offer additional information. While alexandrite from different localities can have very similar compositions, possibly due to similar geological processes, the compositions can also vary significantly between localities (Sun et al., 2019). Data acquired for unknown stones are compared to curated databases of alexandrite with known provenance, using bivariate scatterplots and selective plotting. As it is uncommon for one or even two chemical variables to separate a group of samples from all others, selective plotting offers significant advantages over bivariate techniques by enabling comparison of more variables simultaneously. Despite these techniques, the provenance of some stones remains unclear.
To test the capability of ML to improve alexandrite provenance determination, the authors compiled the concentrations of the trace elements boron, magnesium, vanadium, chromium, iron, gallium, germanium, and tin for samples from mines located in seven countries. These alexandrites, detailed by Sun et al. (2019), represent a portion of the samples in GIA’s colored stone reference collection, consisting of stones with known sampling locality that were loaned to GIA by individual donors. The number of stones from each sampling locality is given in table 1. Trace element compositions were acquired by LA-ICP-MS, originally reported by Sun et al. (2019) and summarized in table 2. We have compiled data for 10 additional samples with chemical compositions yielding inconclusive provenance determinations using traditional bivariate scatterplot and selective plotting approaches. We calibrated new ML models using samples with known provenance and applied these models to classify the population of “undetermined” alexandrites.
CVD-Grown Diamonds. Diamonds can be grown using chemical vapor deposition (CVD) by placing a diamond substrate within a vacuum chamber at moderate temperatures (approximately 700° to 1300°C) and subatmospheric pressures (generally 20–500 mbar; figure 2); see Arnault et al. (2022) and D’Haenens-Johansson et al. (2022) for reviews on CVD growth. CVD-grown diamonds can be treated after growth under high-pressure, high-temperature (HPHT) or low-pressure, high-temperature (LPHT) conditions to improve their color grade. Several of GIA’s grading reports for laboratory-grown diamonds document any evidence of post-growth treatment.
Lattice defects form during CVD growth and may manifest as peaks or bands in PL spectra (e.g., figure 3; Martineau et al., 2004). The post-growth treatment process may create, destroy, or alter the concentration or distribution of lattice defects (figure 3; Martineau et al., 2004; D’Haenens-Johansson et al., 2022). Therefore, treatment can be detected by interpreting PL spectroscopic data. However, the full suite of potential treatments applied to CVD-grown diamonds is not always disclosed by manufacturers, and these processes can change over time. GIA seeks to identify all forms of post-growth treatment. This study assesses the capability of ML in establishing new criteria for the identification of post-growth treatment in CVD-grown diamonds. PL spectra were compiled for 300 as-grown (untreated) and 1,795 HPHT-treated CVD-grown diamonds (D–Z color range), measured during routine analysis of diamonds submitted by clients to GIA. Each determination of treated or untreated was made by trained specialists based on inspection and comparison of PL spectra and surface fluorescence patterns. These spectra were acquired with 514 nm laser excitation at liquid nitrogen temperature (approximately 77 K) using a Renishaw inVia Raman microscope at 5× magnification. They were previously compiled by Hardman et al. (2022).
Natural Saltwater Pearls. Pearls differ from many other gem materials in that they are formed organically from a mollusk. Shells and pearls are composed of calcium carbonate (CaCO3) together with organic substances called conchiolin, plus a small volume of water. Pearls have a wide variety of trace element compositions that can be inherited from different growth-related processes and the environment in which they formed (Fuge et al., 1993). They can also have nacreous surfaces (featuring a layered structure of aragonite platelets in “brick-and-mortar” formation and commonly with pearly luster) or non-nacreous surfaces (lacking pearly luster; figure 4). Calcite and aragonite are polymorphs, minerals having the same chemical composition (i.e., CaCO3) but different structure. Pearls are produced by a variety of different mollusk species and can form in a variety of freshwater and saltwater environments globally. They can form naturally or be cultured, grown through human intervention. This complex set of features increases the difficulty of evaluating the relationships between pearl compositions and growth conditions.
GIA’s pearl identification services include a determination of natural or cultured identity, mollusk species, saltwater or freshwater formation environment, and the presence of any treatment. ML has been previously applied to determine the provenance of freshwater pearls using trace element compositions (Homkrajae et al., 2019), while pearls formed in saltwater environments have not been tested as thoroughly. To evaluate whether the provenance of saltwater pearls can be determined, the authors compiled trace element compositions measured by LA-ICP-MS for a set of 604 new natural saltwater pearls sampled from Oman, Bahrain, and Kuwait (table 3). Pearls from all three localities were sourced from local pearl divers and classified as C1 samples according to the GIA pearl classification codes (see table 1 in Homkrajae et al., 2021). The Bahrain and Kuwait pearls reportedly originated from Pinctada radiata mollusks from the Persian (Arabian) Gulf, while the Oman pearls were derived from Pinctada margaritifera mollusks in the Gulf of Oman region. Trace element compositions were determined using a Thermo Fisher Scientific iCAP Qc ICP-MS coupled with a New Wave Research UP-213 laser ablation unit using the same analytical parameters as Homkrajae et al. (2019). We analyzed between three and five spots on each sample and report the compositions of six elements (sodium, magnesium, potassium, manganese, strontium, and barium) in table 3.
METHODS
Data Processing. For this study and for statistical analysis in general, datasets often need to be preprocessed from their original form into a format that is better suited to a particular statistical method. Trace element concentrations reported in units of parts per million by weight (ppmw) can be used in comparisons, including in bivariate scatterplots. For some statistical models, however, trace elements with very low concentrations (near or below the instrument’s detection limit) must be identified. We replace these values with the respective detection limit for each element so they can be processed using statistical methods. This reduces uncertainties for values near the analytical detection limit, but it is worth noting that the data distribution will be shifted slightly higher than if all such values were replaced with zero. During trace element analysis of pearls and colored gemstones, GIA typically tests three spots on each sample to ensure data quality and to assess compositional heterogeneity, which in some samples is significant and cannot be characterized with a single analytical spot.
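The detection-limit substitution described above can be sketched as follows. This is an illustrative Python example, not the study's R script; the function name, the element list, and the detection limits are hypothetical values chosen for the example.

```python
def substitute_detection_limits(sample, detection_limits):
    """Replace values that are missing (None) or below the detection
    limit with the detection limit of the corresponding element."""
    cleaned = {}
    for element, value in sample.items():
        limit = detection_limits[element]
        if value is None or value < limit:
            cleaned[element] = limit  # censored value: use the limit itself
        else:
            cleaned[element] = value  # measurable value: keep unchanged
    return cleaned


# Hypothetical detection limits and one analytical spot (all in ppmw).
detection_limits = {"B": 0.1, "Mg": 0.05, "Sn": 0.2}
spot = {"B": 0.02, "Mg": 15.0, "Sn": None}

cleaned_spot = substitute_detection_limits(spot, detection_limits)
print(cleaned_spot)
```

Because every substituted value is greater than zero, the cleaned data can be log transformed without producing undefined values, which is one practical reason to prefer the detection limit over zero.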
Spectroscopic data are often presented graphically, with measured intensity (on the y-axis) for a variety of wavelengths (on the x-axis; e.g., figure 5). The spectra may display bands and/or peaks corresponding to particular defects or features in the material being analyzed. Some spectra may have significant “backgrounds,” appearing as broad bands in certain regions of the spectrum (figure 5). To determine the areas of some peaks, this background must first be removed. To reduce the background effects in this study, a modeled baseline was subtracted from each spectrum (figure 5). Background effects can also introduce noise, or generally weak spectroscopic features unrelated to the sample itself. Under some circumstances, noise can be strong enough to be confused with an actual peak. If noise is inadvertently used to calibrate a statistical model, the model outputs may be unpredictable or meaningless, so it is very important to identify these features. We have screened all spectra in this study to identify actual peaks, but note that distinguishing a weak peak from noise may be difficult or impossible in some cases.
For CVD-grown diamonds, peak heights above background levels can be calculated and compared with other spectra, including those from CVD-grown diamonds previously classified as treated or untreated. However, PL spectra are semiquantitative, and peak heights can vary due to factors such as changes in laser power during analysis. To compare peak heights across different spectra, the peak height can be normalized by dividing it by the height of the diamond Raman line (with a peak position of 552.4 nm when measured with a 514 nm laser; figure 3). This line corresponds to diamond’s intrinsic Raman peak at 1332 cm–1.
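As a worked check of the numbers above, the sketch below converts the 1332 cm–1 Raman shift into a wavelength and normalizes a peak height by the height of the Raman line. It is an illustrative Python sketch rather than the study's R code; the function names and intensity values are hypothetical. A laser wavelength of 514.5 nm reproduces the quoted 552.4 nm position, whereas exactly 514.0 nm would give about 551.8 nm, which suggests that "514 nm" is a nominal value.

```python
def raman_line_wavelength_nm(laser_nm, shift_cm1=1332.0):
    """Wavelength (nm) at which a Stokes Raman line with the given shift
    appears for a given laser wavelength (nm)."""
    laser_wavenumber = 1.0e7 / laser_nm  # convert nm to cm^-1
    return 1.0e7 / (laser_wavenumber - shift_cm1)


def normalized_peak_height(peak_intensity, peak_baseline,
                           raman_intensity, raman_baseline):
    """Height of a PL peak above its baseline, divided by the height of
    the diamond Raman line above its own baseline."""
    raman_height = raman_intensity - raman_baseline
    if raman_height <= 0:
        raise ValueError("Raman line is not resolved above the baseline")
    return (peak_intensity - peak_baseline) / raman_height


print(round(raman_line_wavelength_nm(514.5), 1))  # Raman line position in nm

# Hypothetical intensities (arbitrary units) for one spectrum.
ratio = normalized_peak_height(peak_intensity=1500.0, peak_baseline=300.0,
                               raman_intensity=2700.0, raman_baseline=300.0)
print(ratio)
```

Because both heights scale with laser power in the same way, the ratio stays comparable between spectra even when the absolute intensities differ.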
Data can also be transformed into a different structure to better fit the requirements of particular ML models. In reality, data populations can have nonsymmetrical distributions (figure 6, left), skewing toward high or low values rather than a clear central value (as is expected for normal or Gaussian distributions). For example, iron concentrations in an alexandrite database may have a nonnormal distribution similar to that in figure 6, left. However, some statistical models require that calibration data have approximately symmetrical distributions. Log transformation, which replaces all values in a dataset with their logarithms, can shift a population of nonnormal data closer to normality while preserving the relative order of their values (figure 6, right). In addition, variables in some datasets may have very different scales. For example, some elements may have concentrations at the parts per thousand level and others at the parts per million level. These data can be rescaled to improve the comparability of all data by preserving their relative order but adjusting their distributions to cover approximately the same range.
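A minimal Python sketch of these two transformations is given below. It is not the study's R script: the base-10 logarithm, the min-max form of rescaling, the order of the two steps, and the concentration values are all assumptions made for illustration.

```python
import math


def log_transform(values):
    """Replace each (strictly positive) value with its base-10 logarithm."""
    return [math.log10(v) for v in values]


def rescale_min_max(values):
    """Rescale values linearly onto the range 0 to 1, preserving order."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant variable: no spread to rescale
    return [(v - low) / (high - low) for v in values]


# Hypothetical, strongly right-skewed iron concentrations (ppmw).
iron = [10.0, 100.0, 1000.0, 100000.0]

iron_log = log_transform(iron)
iron_scaled = rescale_min_max(iron_log)
print(iron_log)
print(iron_scaled)
```

After the log step the largest value no longer dominates the range, and after rescaling the variable can be compared directly with other elements processed in the same way.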
To perform statistical analysis in this study, we imported trace element and spectroscopic data files into the R statistical freeware package (v4.3.2; R Core Team, 2024). Using R, all trace element and spectroscopic data were rescaled and log transformed. Example scripts can be accessed in appendix 1.
Model Calibration and Validation. At GIA, traditional approaches to colored stone classification include the use of bivariate scatterplots and selective plotting to compare trace element compositions. For stones with compositions that overlap with multiple populations using these methods, statistical models can add more classification variables to enhance discrimination. For trace elements, this may mean considering larger numbers of elements, as compositional overlap in analyzing the concentrations of two elements may be reduced by using three or more elements.
An initial step is dataset simplification, particularly if the dataset contains many samples, each with a large number of variables. Principal component analysis (PCA) and linear discriminant analysis (LDA) are techniques that can simplify a dataset by projecting the relationships between many variables onto fewer dimensions for easier interpretation (box A). The dataset can then be visualized using bivariate scatterplots with the transformed variables as axes (see box A), or PCA or LDA can be used to preprocess a dataset for subsequent use in more complex ML models.
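To make the idea of projecting many variables onto fewer dimensions concrete, the sketch below computes the first principal component of a small table using power iteration on the covariance matrix. Power iteration is a simple stand-in chosen to keep the example dependency-free; it is not the routine used in the study, and the data are invented.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return the first principal component (unit vector) and the score of
    each row along it. Assumes the starting vector of ones is not
    orthogonal to the leading component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Invented data: the second variable is twice the first, the third is constant.
rows = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
component, scores = first_principal_component(rows)
print(component)
print(scores)
```

Here three variables collapse onto a single axis without loss of information, because all of the variation lies along one direction; real trace element data would need two or more components.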
The choice of ML model depends on the goals of the user (see box B). Some models are extremely interpretable and intuitive, such as decision trees where the logic of decision-making is provided directly to the user for easy application. Other models are more “black box” in nature, involving complex computational, mathematical, or statistical logic hidden from the user. In these cases, the user inputs data and receives an outcome from an opaque computer model. In general, the more complex models are much more difficult to interpret but also more powerful and capable of producing better results. ML models discussed in this study include random forest (RF), support vector machines (SVM), and artificial neural networks (ANN). Box B provides an overview of all three. Previous research has documented the advantages and disadvantages of these models when applied to different problems, with RF in particular performing extremely well in many different fields (e.g., Fernández-Delgado et al., 2014; Bolton et al., 2019).
Feature selection is an application of ML that objectively analyzes a dataset to identify variables that have a strong influence on classification. The “Boruta” algorithm, for instance, ranks all statistically significant variables in a dataset by testing whether their exclusion leads to a significant decrease in model performance (box B; Kursa et al., 2010). This list can then be inspected more closely by a user. For datasets with hundreds or thousands of variables, feature selection saves significant time and effort.
Validation is extremely important. ML models are often reported with error rates, values that indicate their success in making correct classifications. Ideally, models with low error rates will produce more accurate results when testing new samples. One approach to accurately evaluate model quality is k-fold cross-validation, in which a dataset is randomly partitioned into k subsets (folds) of approximately equal size. In each round, one fold is held out for testing and the model is calibrated on the remaining folds; with k = 5, for example, 80% of the data are used for calibration and 20% for testing. This process is repeated k times so that each fold is used for testing once, with k chosen by the user. The error rates of all folds are averaged to provide a measure of model quality. By using a larger k value, the average error rate of all folds will generally be less biased toward the results of individual folds (Hastie et al., 2009). ML models can be recalibrated and retested to improve model quality and reduce error rates.
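The mechanics of k-fold cross-validation can be sketched in plain Python. The example below is illustrative only: a nearest-centroid classifier stands in for the RF, SVM, and ANN models of the study, and the two-class dataset is invented.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Calibrate the stand-in model: one mean vector per class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(col) for col in zip(*members)]
            for label, members in groups.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(row, centroids[label])))


def cross_validated_error(X, y, k=5, seed=0):
    """Average the test error over k folds, each held out exactly once."""
    fold_errors = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(centroids, X[i]) != y[i] for i in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / k


# Invented two-class data with well-separated groups.
X = [[0.1 * i, 0.0] for i in range(10)] + [[10.0 + 0.1 * i, 0.0] for i in range(10)]
y = ["A"] * 10 + ["B"] * 10
folds = k_fold_indices(len(X), k=5)
error = cross_validated_error(X, y, k=5)
print(error)
```

Each sample is used for testing exactly once and for calibration k - 1 times, so the averaged error reflects performance on data the model did not see during calibration.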
An additional measure of quality is classification confidence, which indicates the probability that an individual result produced by the ML model is accurate. Some ML models will output a classification probability with every decision, indicating the model’s confidence in its own decision. When testing an unclassified sample using an ML model, the user can accept or reject the classification based on this probability value. For classification problems with two possible outcomes, a probability value of approximately 50% indicates low certainty for either outcome, possibly due to the unclassified sample having similar characteristics to both groups. The threshold probability that users will set during decision-making is subjective and may be chosen by comparing with the probabilities assigned to stones with known origin. It is important to consider confidence values before accepting ML results.
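The accept-or-reject decision described above can be written as a short rule. The Python sketch below is an illustration; the 0.8 threshold and the probability values are hypothetical and would in practice be chosen by comparison with stones of known origin.

```python
def decide(class_probabilities, threshold=0.8):
    """Accept the most probable class only if its probability reaches the
    threshold; otherwise refer the sample to a specialist."""
    label = max(class_probabilities, key=class_probabilities.get)
    confidence = class_probabilities[label]
    if confidence >= threshold:
        return label, confidence
    return "refer to specialist", confidence


# Hypothetical model outputs for two stones.
clear_case = decide({"treated": 0.95, "untreated": 0.05})
ambiguous_case = decide({"treated": 0.52, "untreated": 0.48})
print(clear_case)
print(ambiguous_case)
```

The second stone has a probability close to 50% for both outcomes, so the rule withholds a classification rather than reporting a near coin-flip result.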
GIA’s research department uses R, Python, and MATLAB software to construct scripts and models that are used to classify gem materials. These software tools are computationally complex, with commands written using scripts, and can be difficult for general users to navigate. Therefore, models calibrated using these tools are exported and integrated into existing software already being used for gemological classification. This reduces the complexity of the ML logic and provides laboratory gemologists with the rapid and concise classification outputs they need.
This study examines GIA’s application of PCA, LDA, RF, ANN, SVM, and Boruta to gem-related classification problems. The authors used R to construct all of the machine learning models and to validate each one using k-fold cross-validation. For alexandrite and natural saltwater pearls, we applied PCA, LDA, RF, ANN, and SVM to build new models for provenance determination. For CVD-grown diamonds, the authors applied the Boruta algorithm to evaluate their spectroscopic features and identify those that could enhance the detection of post-growth treatment.
RESULTS
Alexandrite. PCA was applied to the trace element concentrations for alexandrite in this study; the results are shown in a bivariate scatterplot with principal components 1 and 2 as axes (figure 7). Principal components are linear combinations of the initial variables in a database, serving to summarize and simplify complex datasets; the vectors in figure 7 indicate the relative concentrations of each element and highlight samples with elevated concentrations of each (see box A). The scatterplot reveals separation between several populations of alexandrite from different localities (such as India, Brazil, and Russia), though alexandrite from other localities overlaps to different degrees. Alexandrite from some localities differs in composition from that of other localities; for example, samples from India have relatively elevated vanadium concentrations, and those from Sri Lanka have elevated gallium concentrations (figure 8). Calibrated models using LDA, RF, SVM, and ANN achieved cross-validated error rates <4% for alexandrite provenance determination, within error of one another (table 4). A set of 10 alexandrite samples with elemental compositions previously considered inconclusive using traditional approaches is plotted relative to alexandrite with known provenance using PCA (figure 7) and classified using the four calibrated models (table 5). The inconclusive alexandrite samples fall within areas of the plot where multiple localities overlap. When tested using the four ML models, half of the samples yielded consistent classifications and the remainder produced contradictory results.
CVD-Grown Diamonds. A previous study (Hardman et al., 2022) applied machine learning to the detection of treatment in CVD-grown diamonds using PL spectra with the Raman-normalized peak intensities of 13 different features as variables. The diamond spectra are complex and many other peaks occur, but their value as indicators of treatment has not been assessed. When the Boruta algorithm is applied to feature selection for CVD-grown diamond PL spectra, it produces a ranked list of peaks. Each peak is evaluated for its ability to improve detection of treated or untreated stones when included in the statistical model. We have tabulated the peaks considered most important by Boruta, and these are presented in figure 9 and table 6. For treated CVD-grown diamonds, the majority of these features occur in the range of 520–550 nm in PL spectra.
Natural Saltwater Pearls. There are varying levels of population separation when PCA results are shown using bivariate scatterplots and when pearls are color-coded based on different classification variables (figure 10, A and B). Many pearls reportedly fished from Kuwait and Bahrain (with relatively close spatial proximity) and produced by the same Pinctada radiata mollusk species have similar compositions and overlap; Oman pearls, which were reportedly produced by Pinctada margaritifera mollusks, are separated from Kuwait and Bahrain pearls (figure 10A). The population of Bahrain pearls is broadly bimodal, with one subgroup having strong overlap with Kuwait pearls and the other showing elevated magnesium and manganese concentrations (figure 10A). Moreover, nacreous and non-nacreous pearls differ in composition, with non-nacreous pearls (mostly from Bahrain) having higher magnesium concentrations (figure 10, B and C). The magnesium concentrations of non-nacreous pearls from Bahrain, Kuwait, and Oman largely overlap with each other (figure 10C).
Using models based on LDA, RF, SVM, and ANN, geographic provenance can be determined with model error rates <25%, with RF performing the best (<13% error rate; table 4). For outcomes produced using the RF model, the misclassification of natural saltwater pearls is tabulated in table 7. The most significant misclassifications are between Kuwait and Bahrain pearls: 13% of Bahrain and 18% of Kuwait pearls misclassify, primarily as one another, with very few misclassifications as “Oman.” Conversely, only 4% of pearls from Oman receive an incorrect provenance determination, all as “Bahrain.”
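Per-locality misclassification rates of this kind are read from a confusion matrix, in which each row holds the model's outcomes for pearls of one true origin. The Python sketch below shows the arithmetic; the counts are invented for illustration (100 pearls per locality) and are not the values in table 7.

```python
# Invented confusion matrix: rows are true localities, columns are the
# localities assigned by the model. Counts are NOT taken from table 7.
confusion = {
    "Bahrain": {"Bahrain": 87, "Kuwait": 12, "Oman": 1},
    "Kuwait": {"Bahrain": 17, "Kuwait": 82, "Oman": 1},
    "Oman": {"Bahrain": 4, "Kuwait": 0, "Oman": 96},
}


def misclassification_rates(confusion):
    """Fraction of each true class that was assigned to any other class."""
    rates = {}
    for true_label, row in confusion.items():
        total = sum(row.values())
        rates[true_label] = 1.0 - row[true_label] / total
    return rates


def overall_error(confusion):
    """Fraction of all samples that received an incorrect label."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[label][label] for label in confusion)
    return 1.0 - correct / total


rates = misclassification_rates(confusion)
print(rates)
print(overall_error(confusion))
```

Reading along a row also shows where the errors go: in this invented example nearly all Bahrain and Kuwait errors are exchanges between those two localities.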
DISCUSSION
Alexandrite Provenance Determination. When alexandrite trace element compositions are transformed using PCA, separation between samples from different localities becomes apparent, enabling the broad classification of several alexandrite samples with previously inconclusive provenance (figure 7). PCA indicates several elements that may discriminate alexandrite from different localities, such as high vanadium in Indian alexandrite and gallium concentrations that are elevated in Sri Lankan alexandrite and low in Brazilian samples. These differences are also resolved in box and whisker plots for these elements (figure 8). Alexandrite from Sri Lanka and Madagascar displays overlapping compositions that may be differentiated using additional principal components. PCA also indicates that Brazilian alexandrites are compositionally diverse, with some having higher germanium despite reportedly originating from the same mine (Sun et al., 2019). This underscores the importance of compiling representative databases for accurate provenance determination of unclassified stones. The compositional variability of alexandrite within a particular country may increase as stones from other mines are collected.
A moderately strong positive correlation between the elements boron and gallium for the full dataset is inferred from the vectors with similar length and direction shown in figure 7. The primary advantage of applying PCA to alexandrite is its ability to consider each of these aspects simultaneously, enabling efficient visualization and interpretation of alexandrite compositional data. If alexandrite samples from different localities are formed through distinct geological processes, PCA plots may reveal detectable differences in their compositions.
Machine learning models produced using the LDA, RF, SVM, and ANN methods can determine alexandrite provenance with error rates <4% (table 4). Of the 10 alexandrite samples with inconclusive provenance determinations using traditional approaches, 50% received a consistent classification using all four methods, while others were classified similarly using three different models (table 5). However, some samples received conflicting outcomes or classification probabilities (not shown) that were intermediate. These samples should be referred to a trained specialist before the ML outcome is accepted. Discrepancies between models calibrated using the same dataset may arise from the underlying mathematical properties of each model, some of which are more suitable for particular datasets. Such discrepancies are compounded by the fact that some alexandrite samples have elemental compositions that are transitional between stones from different localities. This may be further complicated by changes in quality and consistency of data collection methods over time, including changes in standard reference materials. Previous studies have shown RF to be a particularly powerful method (Fernández-Delgado et al., 2014; Bolton et al., 2019), a finding consistent with model error rates in this study (table 4). In cases of conflicting outcomes, the RF model results may therefore carry more weight with trained specialists than those of LDA, for example.
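The logic of comparing outcomes across the four models and referring ambiguous samples to a specialist can be sketched as below. The function name, the rule that three of four matching votes count as a majority, and the 0.7 probability cutoff are all assumptions made for illustration; the source does not specify such thresholds.

```python
from collections import Counter


def consensus(predictions, probabilities=None, min_probability=0.7):
    """Summarize agreement among several classifiers for one sample.

    predictions: dict mapping model name -> predicted locality.
    probabilities: optional dict mapping model name -> probability assigned
        to that model's predicted locality.
    min_probability: hypothetical cutoff below which a probability is
        treated as intermediate.
    Returns (label, status); label is None when the sample should be referred.
    """
    votes = Counter(predictions.values())
    label, count = votes.most_common(1)[0]
    n_models = len(predictions)
    intermediate = probabilities is not None and any(
        p < min_probability for p in probabilities.values()
    )
    if intermediate or count < n_models - 1:
        return None, "refer"
    if count == n_models:
        return label, "unanimous"
    return label, "majority"


sample = {"LDA": "India", "RF": "Brazil", "SVM": "Brazil", "ANN": "Brazil"}
print(consensus(sample))
```

A sample flagged "refer" corresponds to the cases of conflicting outcomes or intermediate probabilities that the text recommends sending to a trained specialist.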
For alexandrite from different localities with similar compositions, it can be difficult to accurately determine provenance even with very robust ML models. To reduce model error rates and improve provenance determination for more stones, new compositional variables can be added. While the addition of ML to traditional approaches failed to definitively classify all samples in this study, it reduced the number of “undetermined” results by 50%, using existing data routinely acquired during gemological assessment.
CVD-Grown Diamond Treatment Detection. Machine learning approaches previously applied to detecting treatment in CVD-grown diamonds achieved error rates <5%, using the intensities of 13 peaks in PL spectra as variables in RF models (Hardman et al., 2022). The models described by Hardman et al. (2022) are currently used at GIA to aid in treatment detection for CVD-grown diamonds. For each stone analyzed, a confidence value is provided to the user, reflecting the model’s level of certainty in the classification outcome. This confidence value is important for accurate treatment detection: if ML outcomes are accepted without careful consideration, even an error rate of 1% translates into a large number of misclassifications as more stones are tested.
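The scaling of a small error rate to a large absolute number of misclassifications is simple arithmetic, sketched here. The stone counts are hypothetical, and the second function assumes that errors on different stones are independent, which is a simplification.

```python
def expected_misclassifications(n_stones, error_rate):
    """Expected number of wrong outcomes if every ML result is accepted as-is."""
    return n_stones * error_rate


def prob_at_least_one_error(n_stones, error_rate):
    """Chance of one or more wrong outcomes, assuming independent errors."""
    return 1.0 - (1.0 - error_rate) ** n_stones


# Hypothetical testing volumes at a 1% error rate.
for n in (100, 1_000, 10_000):
    expected = expected_misclassifications(n, 0.01)
    chance = prob_at_least_one_error(n, 0.01)
    print(f"{n:>6} stones: about {expected:.0f} expected errors, "
          f"P(at least one) = {chance:.2f}")
```

Even at a 1% error rate, at least one misclassification becomes more likely than not after only 100 stones, which is why a per-stone confidence value is useful for deciding which outcomes need review.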
Laboratory-grown diamond producers are continually exploring new approaches to post-growth treatment (Eaton-Magaña et al., 2024). These developments may introduce new defect distributions, rendering existing classification criteria obsolete and necessitating the development of new criteria. Beyond the 13 peaks used by Hardman et al. (2022), the spectroscopic signatures of CVD-grown diamonds contain many other peaks, not all of which have been fully characterized. As many peaks occur in different combinations in thousands of available spectra, machine learning offers a powerful tool to mine these data. By applying ML to diamonds with known treatment histories, researchers can identify potential new classification criteria significantly faster.
Applying Boruta to the CVD-grown diamond database in this study produced a ranking of the PL spectra peaks considered statistically useful for treatment detection. For brevity, only the 10 most important peaks identified by the feature selection algorithm are listed (table 6) and shown graphically in figure 9. The majority of these peaks (e.g., 524.3 and 575 nm) were previously used for treatment detection by Hardman et al. (2022). The peak doublet at 596/597 nm has a high ranking and was previously reported as an important indicator of untreated CVD-grown diamonds (Wang et al., 2003; Martineau et al., 2004). Other peaks, such as 536.5 nm, are assigned high importance by Boruta but were not applied directly by Hardman et al. (2022). The 536.5 nm peak occurs in the 520–550 nm spectral range, which contains many previously documented peaks associated with post-growth treatment of CVD-grown diamonds (Wang et al., 2012; Barrie, 2020; Hardman et al., 2022). With further validation and testing, this peak and others identified using feature selection may become new criteria for detection, complementing existing variables. Boruta analysis also provides a sense of comparative importance that is useful for classification. For example, a 524.3 nm peak may be very important for identifying HPHT treatment. In its absence, the relative intensity of the 575 nm peak, which is present in nearly all CVD-grown diamonds and ranked as the second most important variable (table 6), becomes an even more important variable. New variables identified by Boruta must be validated before they are added to new ML models for treatment detection, and future studies can evaluate how much these variables improve classification error rates. Regardless, the feature selection procedure as a whole saves significant time by objectively parsing large datasets and identifying subtle variations in the data that may be difficult for a human to quantify without computer software.
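The core idea behind Boruta, comparing each real variable against shuffled "shadow" copies that carry no information, can be sketched as follows. This is a simplified illustration and not the full Boruta procedure: it uses scikit-learn's impurity-based importances and a raw hit count, and it omits the statistical test and iterative rejection step of the actual algorithm. The data are synthetic, with one column standing in for an informative peak intensity and three columns of pure noise.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_hits(X, y, n_iter=20, seed=0):
    """Count how often each real feature beats the best shuffled shadow feature."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shadow features: each column shuffled independently, breaking any link to y.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        forest = RandomForestClassifier(n_estimators=100, random_state=seed + i)
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        best_shadow = importances[n_features:].max()
        hits += importances[:n_features] > best_shadow
    return hits


# Synthetic example: 0 = untreated, 1 = treated (hypothetical labels).
rng = np.random.default_rng(42)
n_samples = 200
y = np.repeat([0, 1], n_samples // 2)
informative = rng.normal(loc=y * 3.0, scale=1.0)   # differs strongly between classes
noise = rng.normal(size=(n_samples, 3))            # unrelated to the class label
X = np.column_stack([informative, noise])

hits = shadow_feature_hits(X, y)
print("hits per feature (out of 20):", hits)
```

A feature that consistently outranks every shadow feature is a candidate for retention, whereas features that win only occasionally behave like noise. Any variable flagged this way would still need the validation described above.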
Natural Saltwater Pearl Provenance Determination. PCA applied to trace element compositions of natural saltwater pearls reveals that many pearls from Oman are compositionally distinct from those of Kuwait and Bahrain, driven in part by elevated magnesium concentrations (figure 10, A and B). By comparison, many pearls from Kuwait and Bahrain are compositionally similar to each other. The elemental compositions of pearls are the result of complex biological processes related to mollusk species, water chemistry, and nutrient sources (Homkrajae et al., 2019). Therefore, it is difficult to specify which biogenic processes cause the compositional similarity between pearls from Kuwait and Bahrain. For example, while the Kuwait and Bahrain pearls are derived from Pinctada radiata mollusks, their nutrient sources might differ. There is also no guarantee that all mollusks from a single locality and from the same species had similar nutrient sources. This may account for the compositional spread in pearls from each locality (figure 10).
Composition can also vary with pearl structure. For example, non-nacreous pearls have higher magnesium concentrations than nacreous pearls from the same locality (figure 10C). When analyzing pearls using LA-ICP-MS, it is important to accurately record the position of the analysis within the pearl, along with other useful descriptive information about the material (such as the presence or absence of nacre). These observations provide important context for compositional variation displayed in PCA plots.
Despite the compositional overlap of pearls from Bahrain and Kuwait in PCA (figure 10, A and B), the LDA, RF, SVM, and ANN models can still determine the provenance of many pearls (table 4). Each model has a moderate error rate for the classification of pearl provenance (>10%), indicating that many pearls from different localities are compositionally distinguishable. However, some compositional overlap remains unresolved using the current set of elemental variables, even when using ML techniques. These error rates are higher than those observed for the classification of alexandrite (table 4), reflecting the nature of biological and growth environment conditions: some problems cannot be fully resolved even with powerful statistical methods such as RF. Nevertheless, the majority of Oman pearls from this study received a correct classification using RF, whereas classifications for Bahrain and Kuwait were less conclusive (table 7). When finalizing a provenance determination, an ML classification of “Oman” therefore tends to be more reliable than one of “Bahrain” or “Kuwait.” The similar compositions of some pearls from Bahrain and Kuwait may be attributed to the fact that they are produced by the same mollusk species (Pinctada radiata) located within the Persian (Arabian) Gulf, which provides similar biological and growth environment conditions. Conversely, Oman pearls form in Pinctada margaritifera mollusks found in the Gulf of Oman region.
Trace element composition also provides insight into the biogenic processes that formed the pearls. Freshwater pearls have much higher manganese content than saltwater pearls (Homkrajae et al., 2019). Among the natural saltwater pearls in this study, we noted several other variations in elemental compositions that may reflect local environmental conditions, such as varying concentrations of strontium, sodium, potassium, and barium (figure 10, A and B). Pearls harvested over time from a variety of environments may also be sensitive to small- and large-scale climate changes. Consequently, analyzing pearl compositional data using ML may be complicated by temporal changes in environment.
Pearl research is an active area of study at GIA, one that relies on acquiring pearls for which the provenance and culturing history are known and well characterized. Findings from ML will help shape future research projects by identifying important elemental variables and highlighting populations of pearls that would benefit from expanded sampling. Adding more elements into the ML models in this study—if these elements are shown to be free of analytical interferences (compare to Homkrajae et al., 2019)—may further improve the determination of pearl provenance.
CONCLUSIONS
Traditional approaches to gem classification at GIA encompass microscopic examination and the acquisition of trace element or spectroscopic data. Trace element data can be compared using bivariate scatterplots and selective plotting, while spectroscopic data are assessed through visual inspection to pinpoint key classification features. Unclassified materials are compared to curated datasets of samples for which the classification has been independently verified (figure 11). Machine learning complements these approaches, enhancing the determination of provenance and detection of color treatment in gem materials. The ML models in this study can successfully determine the provenance of alexandrite and reduce the number that may otherwise receive an inconclusive outcome. For CVD-grown diamonds, ML identifies peaks in spectroscopic data that offer valuable indicators of post-growth treatment. For natural saltwater pearls, principal component analysis helps to assess biogenic processes related to growth environment, enabling preliminary provenance determination.

Despite high success rates for some classification problems (>95%; table 4), ML models still sometimes provide incorrect answers. For certain natural gemstones, this could be associated with geology: similar geological processes occurring at different localities may result in samples with very similar properties and elemental compositions, complicating provenance determination (e.g., Smith et al., 2022). In these cases, ML does not always provide a confident provenance determination, no matter how robust the approach. If a decision cannot be made after classification by ML and review by a trained specialist, the sample may receive a final decision of “undetermined.” Nevertheless, ML reduces the number of “undetermined” outcomes by leveraging data that are already collected as part of routine gemological practice. These techniques are powerful complements to existing laboratory practices, offering substantial time and cost savings.