
Cite this paper: Vietnam J. Chem., 2020, 58(6), 711-718
Article DOI: 10.1002/vjch.202000017

Identification of rice varieties specialties in Vietnam using Raman spectroscopy

Le Truong Giang1,2*, Pham Quoc Trung1, Dao Hai Yen1

1 Institute of Chemistry, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Hanoi 10000, Vietnam

2 Graduate University of Science and Technology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Hanoi 10000, Vietnam

Received February 20, 2019, Accepted April 17, 2020

Abstract

The characteristics and quality of rice are significantly affected by its variety. However, discriminating between varieties is an urgent but difficult and time-consuming task in Vietnam. In this study, an effective and reliable identification method was established using Raman spectroscopy (RS). Raman spectra of 32 rice samples were acquired from 400 to 1600 cm-1, and the sensitive fundamental vibrations of less polar groups and bonds in rice were analyzed. Initially, the raw Raman spectra were processed by standard normal variate (SNV) correction combined with the Savitzky-Golay (SG) smoothing algorithm. The positive influence of SNV-SGD2 on the ability to classify rice varieties was confirmed by principal component analysis (PCA). Next, three multivariate analysis methods, PCA, hierarchical cluster analysis (HCA), and K-nearest neighbor (KNN), were compared on their ability to classify rice varieties. All three methods classified the four rice varieties very well. PCA identified four main factors, namely starch chains, amylose, amylopectin, and protein content, that distinguish the four rice varieties, whereas HCA only distinguished well between rice with high and low amylopectin content and did not provide the main components.

Keywords: Rice varieties, Raman spectroscopy, PCA, HCA, KNN

1 INTRODUCTION

Rice is an important food for more than half of the world's population. It provides the body with energy and nutrients in the form of carbohydrates, proteins, vitamins, and various trace elements.[1] Vietnam is known as one of the leading rice-exporting countries in the world, with many kinds of high-quality rice such as ST25, Huong Lai, Tam, and Seng Cu. These specialty types of rice have a higher economic value than conventional rice types. In recent years, some traders have changed their product labels and mixed different types of rice for profit. This has seriously affected specialty rice brands and the interests of consumers and businesses. It is therefore of great significance to ensure the reliable identification and classification of products protected by geographical indications.

Over the last decade, several methods have been described for the traceability of rice. These methods include detecting differences in inorganic, organic,


Vietnam Journal of Chemistry, Le Truong Giang et al.

for water-rich samples compared to infrared spectroscopy. For example, Raman spectroscopy has been used to detect organic compounds such as pesticide residues in foods,[8] glucose in blood,[9] and vitamins.[10] Moreover, the adulteration of cooking oil with mineral oil has been detected using Raman and near-infrared spectra. In a study of rice collected from different agricultural areas in Korea, Hwang and colleagues used Raman spectra to determine the geographical origin of rice grains.[11] Currently, there is no specific report on the classification of different varieties of Vietnamese rice. In this research, we aimed to: (i) analyze and compare the characteristics of the Raman spectra of different rice varieties; and (ii) pretreat the Raman spectra and use multivariate analysis methods such as PCA, KNN, and HCA to evaluate and identify rice varieties.

2 MATERIALS AND METHODS

2.1 Materials

A total of 32 samples were collected, comprising 16 Seng Cu rice (MV), 8 Tam rice (T), 4 Ki Deo rice (K), and 4 sticky rice (N) samples. The samples were composed of different species and were cultivated in diverse geographical regions of Vietnam. Each sample was washed with deionized water and then dried at 40 °C until its weight was unchanged. All the rice kernel samples were then ground with a sample mill (LM-3100, Perten, Sweden) to obtain a fine powder.[12]

2.2 Methods

2.2.1 Spectral collection method

A LabRAM HR Evolution instrument (HORIBA Jobin Yvon S.A.S., France) was used to collect the Raman spectra of the rice samples. The instrument conditions were set as follows: 50x objective lens, 20 mW laser power, and 1.5 cm-1 resolution, at room temperature (25 °C) and a relative humidity below 60 %. The excitation wavelength and acquisition time were set at 632.8 nm and 30 s, respectively, with a scanning range from 100 to 1600 cm-1.[12] Each rice sample scan was replicated three times.

2.2.2 Raman spectra pre-processing

Because the spectra of the samples may be recorded over several days, it is very difficult to calibrate the Raman instrument precisely enough to keep the same Raman shift axis, laser power, and spectral resolution (which depends on the gratings). Before multivariate analysis, the Raman spectra should therefore be treated by methods such as mean centering (MC) or multiplicative scatter correction (MSC).[13] In this study, a Savitzky-Golay smoothing filter[14] with second-order polynomial deconvolution (SGD2), combined with the Standard Normal Variate (SNV) method,[15] was applied to the data to obtain the best results. First, SNV was used to normalize the Raman data of rice samples measured at different times. After that, the spectral data were processed to reduce background noise using a second-order polynomial, 100-point S–G smoothing algorithm.
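The two pre-processing steps described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the smoothing uses the closed-form Savitzky-Golay weights for a quadratic fit, and a symmetric 101-point window (`half_width=50`) stands in for the 100-point window mentioned in the text, since a centred window needs an odd number of points. Edge points are simply left unsmoothed.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre one spectrum and scale it to unit standard deviation."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1 in the denominator)
    return [(value - mu) / sd for value in spectrum]


def savgol_weights(half_width):
    """Closed-form Savitzky-Golay smoothing weights for a quadratic fit
    over a symmetric window of 2 * half_width + 1 points."""
    m = half_width
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom for i in range(-m, m + 1)]


def savgol_smooth(spectrum, half_width):
    """Smooth the interior points; the first and last half_width points are left unchanged."""
    weights = savgol_weights(half_width)
    smoothed = list(spectrum)
    for centre in range(half_width, len(spectrum) - half_width):
        window = spectrum[centre - half_width:centre + half_width + 1]
        smoothed[centre] = sum(w * y for w, y in zip(weights, window))
    return smoothed


def preprocess(spectrum, half_width=50):
    """SNV followed by smoothing; half_width=50 (a 101-point window) is an assumption."""
    return savgol_smooth(snv(spectrum), half_width)
```

Applying SNV before smoothing follows the order given in the text; each spectrum is normalized on its own, so spectra recorded on different days end up on a common intensity scale before the filter is applied.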

2.3 Multivariate data analysis

Multivariate analysis is divided into main groups, including exploration methods, calibration methods, and classification methods.[16] In this paper, the exploration methods principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used to analyze the rice distribution. Subsequently, the K-nearest neighbor (KNN) classification method was evaluated to identify the best-fitting model for rice varieties.

2.3.1 Principal component analysis (PCA)

Principal component analysis uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.[12,17] In this study, after pre-processing, all spectral data for the rice samples were subjected to PCA to find patterns in the complex data by reducing its dimensions. Nine principal components (PCs) were found, and a subset of them (referred to as PCk) with a high cumulative contribution rate (> 90 %) was selected. After that, a distribution chart was plotted based on the eigenvalues for the PCk, in which the distances between points represent the magnitude of the difference. The main characteristics of the different rice varieties can then be identified based on the loading graph.
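To make the idea of a principal component concrete, the following sketch extracts the first principal component of a small data set by power iteration on the covariance matrix. It is a minimal illustration under assumptions rather than the procedure used in this study: the function names are hypothetical, only the leading component is computed, and the data are mean-centred but not scaled.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of observations given as rows of variables."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Leading eigenvalue and unit eigenvector of cov by power iteration.
    Assumes the starting vector is not orthogonal to the leading eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return eigenvalue, v


def explained_fraction(cov, eigenvalue):
    """Share of the total variance (trace of cov) carried by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

The eigenvector gives the loadings of the component, and the eigenvalue divided by the trace of the covariance matrix gives its contribution rate, which is the quantity compared against the 90 % threshold.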

2.3.2 Hierarchical cluster analysis (HCA)

2.3.3 K-nearest neighbor (KNN)

KNN, proposed by Fix and Hodges, calculates the distances among all data points. The K closest neighbors are then found by sorting the distance matrix.[19] The K closest data points are analyzed to determine which class label is the most common among the set. KNN performs well in dealing with multiclass problems.[20]

3 RESULTS AND DISCUSSION

3.1 Spectral analysis

The composition of rice is complex and, moreover, unevenly distributed in the grain. Figure 1 gives information about the characteristic bands for the different groups. In the Raman spectra of four randomly chosen rice samples, the main bands included 308, 356, 408, 446, 479, 579, 766, 874, 947, 1001, 1061, 1085, 1127, 1203, 1265, 1341, 1405 and 1460 cm-1. Table 1 assigns each characteristic spectral feature to the vibrational or rotational modes (stretching, bending, torsional fundamental vibrations, etc.) of the different functional groups and to the skeletal information of the ring component.

Characteristic peaks of the glucose unit in starches were found at 446 cm-1, 479 cm-1, and 579 cm-1. The strong band at approximately 479 cm-1 is probably an important skeletal vibration that can reflect the degree of crystallinity in rice starch. The fingerprint region of the Raman spectrum, from 800 to 1500 cm-1, contains highly overlapping and complex vibration modes of different functional groups. Polysaccharides, which are condensed from multiple glucose units, can be assigned by the different vibrational states of glucose in this region, such as the deformation vibration of CH2 (or CH3) at 1460 cm-1, CH bending at 1341 cm-1, COH deformation and CO stretching between 1085 and 1127 cm-1, and C1-H deformation at 874 cm-1.[21] In rice, starch, which is the main component, can be assigned to α-1,4 linkage vibrations (stretching vibration of COC) at 947 cm-1, and a slight shift in position may be associated with the amylopectin α-1,6 linkage.[12] A band near 1265 cm-1 was attributed to a CH2OH deformation vibration, which is closely related to crystalline structures in starch.[22] Other components of rice, such as protein and lipids, were associated with vibrations at 1460 to 1341 cm-1 (CH2 twisting vibration) and with the COH stretching vibration and OH twisting vibration at 1200 to 1000 cm-1.[21] Bands at 1001 and 1061 cm-1 were vibrations that originated from protein side chains.

Figure 1: Raman spectra of selected rice samples

Figure 2a shows the spectral features of sixteen rice samples randomly chosen from the collected spectral libraries. Overall, the bands in the spectra are clearly analogous to each other, which suggests that the samples have similar compositions. However, the band intensities differ between rice samples because amylopectin branching and amylose chain lengths differ among the cultivars and varieties. Since these differences in the spectra cannot be confidently identified by eye, a clearer method of differentiation is needed. Thus, multivariate analysis methods such as PCA, HCA, and KNN were combined with the Raman spectral results to further interpret the data.

3.2 Preprocessing of Raman spectra

Pretreatment is performed to eliminate the effects of unevenness, baseline offset, and noise signals in the spectral data collected for rice samples. In this study, before further spectral processing, all the spectra were pre-processed according to Section 2.2. In the raw Raman spectra (figure 2a), the background signal fluctuates dramatically, from 500 to approximately 3000 counts, and the background noise is relatively large. The opposite was true for the corrected spectra (figure 2b), in which the background interference and baseline drift of the raw spectra have been effectively eliminated.

In this study, PCA was applied to both the original and corrected spectra of rice grains for classification (with six samples selected from the Seng Cu and Tam rice). The results are shown in figure 3. Overall, it is obvious that the rice grains are not classified before the baselines are corrected (figure 3a). In more detail, the PC2 scores differ clearly among samples in the same group: the PC2 scores of MV05 and T01, at 0.365 and -0.183 respectively, differ greatly from those of the other samples in their groups. After baseline correction, in contrast, the samples were classified into two groups, which correspond to the Seng Cu and Tam varieties (figure 3b).

[Figure plot: Raman spectra of samples T05, K01, MV03, and N02; x-axis: Raman shift (cm-1), 200 to 1600; y-axis: intensity (counts), 2000 to 9000; labeled peaks at 579, 766, 874, 947, 1085, and 1341 cm-1]

Table 1: Raman spectral attribution of rice

| Wavenumber (cm-1) | Expression form | Spectral attribution |
|---|---|---|
| 446 | Pyran ring skeleton vibration | Carbohydrate D-glucose |
| 479 | Skeleton vibration peak | Starch |
| 579 | C-O bending vibration; skeletal modes C–C stretch; skeletal modes of the pyranose ring | Glucan |
| 766 | O=C-N deformation vibration and OH deformation | Amide IV |
| 874 | C(1)-H vibration, CH2 deformation, and C-O ring vibration | Amylopectin |
| 947 | Skeletal mode vibrations of α-1,4 glycosidic linkage (C–O–C) | Glycogen and branched-chain starch |
| 1001 | C-O-H stretching vibration | Cyclic amylose starch |
| 1061 | C–C stretching | |
| 1085 | C-O-H bending vibration | Amylose starch |
| 1127 | C-O stretching vibration and C-O-H flexural deformation vibration | Carbohydrate |
| 1265 | Amide III band C-N stretching vibration peak; CH2OH (side chain) related mode | Protein |
| 1341 | CH2 twisting, C–O–H bending | Carbohydrate |
| 1405 | C-C stretching vibration | Glucan |
| 1460 | C-H in-plane bending vibration and CH2 and C-OH bending vibration | Glucan |

Figure 2: Comparing Raman spectra of four rice varieties

Figure 3: Score scatter plot for the first two PCs of rice samples: (a) raw data; (b) after preprocessing using SNV-SGD2

From the above results, it can be suggested that the factors influencing the acquisition of Raman spectra were effectively eliminated by SNV-SGD2, which also increases the ability to classify rice varieties. Therefore, when distinguishing rice varieties by Raman spectra, SNV-SGD2 pre-treatment is necessary.

3.3 Principal Components Analysis

Clearly, it is impossible to distinguish rice varieties based on only one factor, because the differences in the amylose or amylopectin signals between rice samples are not clear. Therefore, it is necessary to evaluate the signals of all rice components, namely amylose, amylopectin, protein, and lipid, in order to discriminate between rice varieties.

Figure 4: Full-scale Raman spectra of four rice varieties after preprocessing by SNV-SGD2

Principal component analysis was used in this study to discriminate among the four rice varieties, with the peak areas in the following characteristic Raman shift ranges as input data: S1 (420-450 cm-1); S2 (470-560 cm-1); S3 (570-580 cm-1); S4 (710-720 cm-1); S5 (860-880 cm-1); S6 (920-980 cm-1); S7 (1000-1200 cm-1) and S8 (1300-1400 cm-1) (figure 4). The results of PCA indicate that the first nine principal components (PCs) explained 100 % of the variance of the data (table 2). PC1 represented 73.62 % of the variance in the Raman spectra, whereas PC2 accounted for 22.21 % and PC3 for 1.74 %. Notably, the cumulative variance from PC1 to PC3 was 97.57 % (> 90 %); hence PC1, PC2, and PC3 were analyzed further. The relationship between the variables and the principal components is shown in equations (1), (2) and (3).

$\mathrm{PC}_1 = 0.35S_1 + 0.37S_2 + 0.38S_3 + 0.33S_4 + 0.26S_5 + 0.37S_6 + 0.38S_7$ (1)

$\mathrm{PC}_2 = -0.21S_1 - 0.22S_2 + 0.32S_4 + 0.52S_5 + 0.69S_8$ (2)

$\mathrm{PC}_3 = 0.72S_1 + 0.13S_2 - 0.41S_4 + 0.32S_8$ (3)
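As a worked illustration of equations (1) to (3), the sketch below evaluates the three component scores for one sample from its band values S1 to S8. The loading coefficients are taken from the equations; the function name and the input values are hypothetical, and it is an assumption that the band areas are supplied in the same scaled form that was used to fit the PCA.

```python
# Loading coefficients copied from equations (1) to (3); bands that do not
# appear in an equation contribute nothing to that component.
LOADINGS = {
    "PC1": {"S1": 0.35, "S2": 0.37, "S3": 0.38, "S4": 0.33,
            "S5": 0.26, "S6": 0.37, "S7": 0.38},
    "PC2": {"S1": -0.21, "S2": -0.22, "S4": 0.32, "S5": 0.52, "S8": 0.69},
    "PC3": {"S1": 0.72, "S2": 0.13, "S4": -0.41, "S8": 0.32},
}


def pc_scores(band_values):
    """Return the PC1 to PC3 scores of one sample.

    band_values maps the band names S1..S8 to scaled peak areas
    (hypothetical inputs, not measured data)."""
    return {
        component: sum(weight * band_values[band] for band, weight in weights.items())
        for component, weights in LOADINGS.items()
    }


# Usage with made-up, purely illustrative band values.
example = {"S1": 1.2, "S2": 0.8, "S3": 0.5, "S4": -0.3,
           "S5": 0.1, "S6": 0.9, "S7": 1.1, "S8": -0.4}
scores = pc_scores(example)
```

Because S8 does not enter PC1 and S3, S6, and S7 do not enter PC2, a sample's position along each axis of the score plot is driven by a different subset of bands.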

Applying PCA to all Raman shifts produced major characteristic bands that contribute significantly to the classification of rice varieties. The main bands that distinguish rice varieties are shown in equations (1) to (3) by the loading factors of each component from PC1 to PC3. The main characteristic bands included 420-560 cm-1, 860-980 cm-1, 1000-1200 cm-1, and 1300-1400 cm-1, with 420-560 cm-1 showing the strongest correlation (the total loading of S1 and S2 in PC1 was 0.72). This result confirmed that the main starch chains are affected by the rice variety. Other detected bands are related to amylose, amylopectin, and protein content. Therefore, the different quantities or structures of amylose, amylopectin, and protein are also the main reference indices for the discrimination of Seng Cu, Tam, Ki Deo, and sticky rice. The score scatter plot for the first two PCs is shown in figure 5, which demonstrates that Seng Cu, Tam, Ki Deo, and sticky rice were grouped in different clusters. The results confirmed that PCA separates the four rice varieties into distinct clusters.

[Figure 4 plot: x-axis: Raman shift (cm-1), 200 to 1400; y-axis: intensity (adj.), 0.2 to 1.4; integration regions S1 to S8 marked]

Table 2: Eigenvalues and contributing ratios of principal components

| PC | Eigenvalue | Percentage of Variance (%) | Cumulative (%) |
|---|---|---|---|
| 1 | 6.6256 | 73.62 | 73.62 |
| 2 | 1.9992 | 22.21 | 95.83 |
| 3 | 0.1563 | 1.74 | 97.57 |
| 4 | 0.0912 | 1.01 | 98.58 |
| 5 | 0.0687 | 0.76 | 99.34 |
| 6 | 0.0285 | 0.32 | 99.66 |
| 7 | 0.0201 | 0.22 | 99.88 |
| 8 | 0.0063 | 0.07 | 99.96 |
| 9 | 0.0040 | 0.04 | 100.00 |
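The percentage and cumulative columns of Table 2 follow directly from the eigenvalues: each eigenvalue is divided by the sum of all nine. The short sketch below reproduces that arithmetic; the eigenvalues are those listed in the table, while the function names are hypothetical.

```python
# Eigenvalues of the nine principal components as listed in Table 2.
EIGENVALUES = [6.6256, 1.9992, 0.1563, 0.0912, 0.0687, 0.0285, 0.0201, 0.0063, 0.0040]


def variance_table(eigenvalues):
    """Return (component index, percent of variance, cumulative percent) rows."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for index, value in enumerate(eigenvalues, start=1):
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((index, percent, cumulative))
    return rows


def components_needed(eigenvalues, threshold_percent):
    """Smallest number of leading components whose cumulative share exceeds the threshold."""
    for index, _, cumulative in variance_table(eigenvalues):
        if cumulative > threshold_percent:
            return index
    return len(eigenvalues)


for index, percent, cumulative in variance_table(EIGENVALUES):
    print(f"PC{index}: {percent:6.2f} %  cumulative {cumulative:6.2f} %")
```

With these eigenvalues the 90 % threshold is already exceeded by the first two components; the text carries three components forward, which raises the cumulative share to about 97.57 %.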

Figure 5: Score scatter plot for the first two PCs of rice grain samples

3.4 Hierarchical cluster analysis (HCA)

One preliminary way to study data is to explore the natural groupings among the samples. HCA was used to perform a preliminary data scan and to uncover the structure residing in the data. The dendrogram in figure 6 shows the clustering pattern of the 32-sample data set. The rice samples were segregated into four clusters: G1, G2, G3, and G4. The G1 cluster included the rice samples belonging to the Seng Cu variety (MV01 to MV16), while the G2 cluster included the Tam rice samples (T01-T08). The G4 cluster consists of the Ki Deo rice samples. Notably, the G3 cluster contained the sticky rice varieties and was split into sub-clusters (G31, G32) when the distance from the cluster center was set at about 100000 (brown line, figure 6). The splitting of the sticky rice samples into sub-clusters may be related to differences in sticky rice species and in the region of the collection sites. The results of the HCA in table 3 show that the distances between clusters are very large (> 100000). Specifically, clusters 1 and 2 are far from clusters 3 and 4, which may indicate that sample groups 1 and 2 are normal rice while groups 3 and 4 are sticky (glutinous) rice with high amylopectin content. From the above results, it can be seen that the HCA algorithm is suitable for grouping the initial data, but it is not strong enough to evaluate and provide the main components that contribute to the classification of rice varieties.

Table 3: Distance between clusters

| Cluster | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 0 | 111996 | 242802 | 476383 |
| 2 | 111996 | 0 | 161564 | 382775 |
| 3 | 242802 | 161564 | 0 | 240148 |
| 4 | 476383 | 382775 | 240148 | 0 |
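The distance matrix in Table 3 can be read more easily once the cluster pairs are ranked. The sketch below, with hypothetical function names, sorts the six inter-cluster distances from the table and reports the nearest neighbour of each cluster. It only inspects the published distances and does not reproduce the clustering itself, whose linkage method is not stated in the text.

```python
# Inter-cluster distances from Table 3, keyed by (cluster, cluster) with the
# smaller cluster number first.
DISTANCES = {
    (1, 2): 111996,
    (1, 3): 242802,
    (1, 4): 476383,
    (2, 3): 161564,
    (2, 4): 382775,
    (3, 4): 240148,
}


def ranked_pairs(distances):
    """Cluster pairs sorted from closest to farthest."""
    return sorted(distances.items(), key=lambda item: item[1])


def nearest_cluster(cluster, distances):
    """Return the cluster closest to the given one."""
    candidates = {pair: d for pair, d in distances.items() if cluster in pair}
    first, second = min(candidates, key=candidates.get)
    return first if second == cluster else second


for (a, b), distance in ranked_pairs(DISTANCES):
    print(f"clusters {a} and {b}: {distance}")
```

Ranking the pairs this way makes it straightforward to check statements about which clusters lie close together against the tabulated values.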

Figure 6: Hierarchical cluster analysis (HCA) dendrogram for concatenated data obtained from Raman spectra of rice samples. Colors indicate grouping proposals


3.5 K-nearest neighbor (KNN)

The K-nearest neighbor method classifies data points by measuring the distances between them. In this study, K is and the distance is cosine distance. PCA-KNN classification models were established by using the variables obtained from the PCA of the original data as the input to the KNN method. The classification results are shown in table 4. The classification results are good; the accuracy is approximately 90 %.
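A minimal sketch of KNN classification with the cosine distance is given below. It is an illustration under assumptions, not the model used in this study: the function names, the toy PC-score vectors, and the choice of k = 3 are all hypothetical.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_points, train_labels, query, k):
    """Majority label among the k training points closest to the query."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: cosine_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy PC-score vectors (hypothetical values) labelled with variety codes.
train_points = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (-1.0, 0.1), (-0.9, -0.1)]
train_labels = ["MV", "MV", "T", "T", "N", "N"]
predicted = knn_predict(train_points, train_labels, (0.8, 0.15), k=3)
```

Because the cosine distance depends only on the direction of a vector, two samples whose score vectors point the same way are treated as close even if their overall magnitudes differ.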

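Table 4: Classification of sample groups by the KNN algorithm

| Sample | Membership | Sample | Membership |
|---|---|---|---|
| MV01 | | N01 | |
| MV02 | | N02 | |
| MV03 | | N03 | |
| MV04 | | N04 | |
| MV05 | | T01 | |
| MV06 | | T02 | |
| MV07 | | T03 | |
| MV08 | | T04 | |
| MV09 | | T05 | |
| MV10 | | T06 | |
| MV11 | | T07 | |
| MV12 | | T08 | |
| MV13 | | K01 | |
| MV14 | | K03 | |
| MV15 | | K02 | |
| MV16 | | K04 | |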

4 CONCLUSIONS

The results described in this study open the possibility of differentiating rice varieties by Raman spectroscopy combined with multivariate analysis methods such as PCA, HCA, and KNN. The spectral information showed that Raman spectroscopy reflected the sensitive fundamental vibrations of less polar groups and bonds in rice. The combination of SNV and SGD2 in Raman spectra preprocessing was confirmed to enhance the ability to classify rice varieties. All three algorithms, PCA, HCA, and KNN, classify rice varieties well, but PCA can also show the characteristic bands that contribute most to the classification. Therefore, the Raman technique is suitable for determining rice varieties in a nondestructive and cost-efficient way, especially as a fast screening tool for rice producers and regulatory authorities.

Acknowledgment. We are grateful for funding support from project TDNDTP.03/19-21.

REFERENCES

1. Bhattacharya S., S. Tyagi, S. Srisuma, D. L. DeMeo, S. D. Shapiro, R. Bueno, E. K. Silverman, J. J. Reilly, T. J. Mariani. Peripheral blood gene expression profiles in COPD subjects, Journal of Clinical Bioinformatics, 2011, 1(1), 12.

2. Maione C., B. L. Batista, A. D. Campiglia, F. Barbosa, R. M. Barbosa. Classification of geographic origin of rice by data mining and inductively coupled plasma mass spectrometry, Computers and Electronics in Agriculture, 2016, 121, 101-107.

3. Tokalıoğlu Ş., B. Çiçek, N. İnanç, G. Zararsız, A. Öztürk. Multivariate Statistical Analysis of Data and ICP-MS Determination of Heavy Metals in Different Brands of Spices Consumed in Kayseri, Turkey, Food Analytical Methods, 2018, 11(9), 2407-2418.

4. Korenaga T. Traceability Studies for Analyzing the Geographical Origin of Rice by Isotope Ratio Mass Spectrometry, Bunseki Kagaku, 2014, 63, 233-244.

5. Monakhova Y., D. Rutledge, A. Roßmann, H.-U. Waiblinger, M. Mahler, M. Ilse, T. Kuballa, D. Lachenmeier. Determination of rice type by 1H NMR spectroscopy in combination with different chemometric tools, Journal of Chemometrics, 2014, 28, 83-92.

6. Sampaio P., A. Soares, A. Castanho, A. S. Almeida, J. Oliveira, C. Brites. Dataset of Near-infrared spectroscopy measurement for amylose determination using PLS algorithms, Data Brief, 2017, 15, 389-396.

7. Wu Z., J. Long, E. Xu, F. Wang, X. Xu, Z. Jin, A. Jiao. A Feasibility Study on the Evaluation of Quality Properties of Chinese Rice Wine Using Raman Spectroscopy, Food Analytical Methods, 2016, 9(5), 1210-1219.

8. Xu M.-L., Y. Gao, X. X. Han, B. Zhao. Detection of Pesticide Residues in Food Using Surface-Enhanced Raman Spectroscopy: A Review, Journal of Agricultural and Food Chemistry, 2017, 65(32), 6719-6726.

9. Pandey R., S. K. Paidi, T. A. Valdez, C. Zhang, N. Spegazzini, R. R. Dasari, I. Barman. Noninvasive Monitoring of Blood Glucose with Raman Spectroscopy, Acc. Chem. Res., 2017, 50(2), 264-272.

10. Junior B. R. A., F. L. F. Soares, J. A. Ardila, L. G. C. Durango, M. R. Forim, R. L. Carneiro. Determination of B-complex vitamins in pharmaceutical formulations by surface-enhanced Raman spectroscopy, Spectrochim. Acta A Mol. Biomol. Spectrosc., 2018, 188, 589-595.

11. Hwang et al. Enhanced Raman spectroscopic discrimination of the geographical origins of rice samples via transmission spectral collection through packed grains, Talanta, 2012, 101, 488-494.

12. Zhu L., J. Sun, G. Wu, Y. Wang, H. Zhang, L. Wang, H. Qian, X. Qi. Identification of rice varieties and determination of their geographical origin in China using Raman spectroscopy, Journal of Cereal Science, 2018, 82, 175-182.

13. Gautam R., S. Vanga, F. Ariese, S. Umapathy. Review of multidimensional data processing approaches for Raman and infrared spectroscopy, EPJ Techniques and Instrumentation, 2015, 2(1).

14. Savitzky A., M. J. E. Golay. Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem., 1964, 36, 1627-1639.

15. Liland K. H., A. Kohler, N. K. Afseth. Model-based pre-processing in Raman spectroscopy of biological samples, Journal of Raman Spectroscopy, 2016, 47(6), 643-650.

16. Granato D., J. S. Santos, G. B. Escher, B. L. Ferreira, R. M. Maggio. Use of principal component analysis (PCA) and hierarchical cluster analysis (HCA) for multivariate association between bioactive compounds and functional properties in foods: A critical perspective, Trends in Food Science & Technology, 2018, 72, 83-90.

17. Murakami K., N. Shinozaki, A. Fujiwara, X. Yuan, A. Hashimoto, H. Fujihashi, H.-C. Wang, M. B. E. Livingstone, S. Sasaki. A Systematic Review of Principal Component Analysis–Derived Dietary Patterns in Japanese Adults: Are Major Dietary Patterns Reproducible Within a Country?, Advances in Nutrition, 2019, 10(2), 237-249.

18. Nielsen F. Hierarchical Clustering, Introduction to HPC with MPI for Data Science, 2016, 195-211.

19. Kataria A., M. D. Singh. A Review of Data Classification Using K-Nearest Neighbour Algorithm, International Journal of Emerging Technology and Advanced Engineering, 2013, 3(6), 354-360.

20. Kanj S., F. Abdallah, T. Denœux, K. Tout. Editing training data for multi-label classification with the k-nearest neighbor rule, Pattern Analysis and Applications, 2016, 19(1), 145-161.

21. Feng X., Q. Zhang, P. Cong, Z. Zhu. Preliminary study on classification of rice and detection of paraffin in the adulterated samples by Raman spectroscopy combined with multivariate analysis, Talanta, 2013, 115, 548-55.

22. Tian F., F. Tan, H. Li. An rapid nondestructive testing method for distinguishing rice producing areas based on Raman spectroscopy and support vector machine, Vibrational Spectroscopy, 2020, 107.

Corresponding author: Le Truong Giang

Institute of Chemistry, Vietnam Academy of Science and Technology 18, Hoang Quoc Viet, Cau Giay, Hanoi 10000, Viet Nam
