Exploiting big data in calculating the Consumer Price Index in Vietnam (case of Ho Chi Minh City).
UNIVERSITY OF ECONOMICS HO CHI MINH CITY
Nguyen Thanh Binh
EXPLOITING BIG DATA IN CALCULATING THE
CONSUMER PRICE INDEX IN VIETNAM
(CASE OF HO CHI MINH CITY)
SUMMARY OF PH.D. DISSERTATION
Ho Chi Minh City, 2023
UNIVERSITY OF ECONOMICS HO CHI MINH CITY
Nguyen Thanh Binh
EXPLOITING BIG DATA IN CALCULATING THE
CONSUMER PRICE INDEX IN VIETNAM
(CASE OF HO CHI MINH CITY)
Major: Statistics; Code: 9460201
SUMMARY OF PH.D. DISSERTATION
SUPERVISORS
Ha Văn Sơn, PhD
Le Thi Thanh Loan, PhD
Ho Chi Minh City, 2023
University of Economics Ho Chi Minh City
Scientific supervisors:
Reviewer 1:
………
Reviewer 2:
………
Reviewer 3:
………
The dissertation will be defended before the University-level Dissertation Evaluation Committee at ……… on ………
The dissertation is available at the following libraries:
CHAPTER 1: INTRODUCTION TO RESEARCH
1.1 Reasons to choose the research topic
The Consumer Price Index is considered one of the most important economic indicators published by national statistical offices (Berry et al., 2019). These agencies typically select a representative sample of the goods and services most frequently consumed by the population and use it to calculate the index. The traditional method of collecting price information through surveys, as currently practiced, has several limitations. Surveys are costly and increasingly difficult to carry out. The growing number of retail chains lengthens data collection, and response rates are declining (Crystal et al., 2019). The method is also subject to sampling errors and to non-sampling errors, because the quality of the collected information depends on the skills and honesty of the interviewers. In Vietnam, the Consumer Price Index is collected, calculated, and published by the General Statistics Office. The statistical information system on consumer prices follows international practices and continues to improve. Even so, calculating the index still faces problems similar to those seen elsewhere in the world.
With the growth of the digital economy worldwide, online transactions are becoming increasingly common, creating a vast and diverse source of price data. This big data source can provide price information that is more timely, covers a wider variety of items, and is collected at a higher frequency (Crystal et al., 2019). The Billion Prices Project at the Massachusetts Institute of Technology is one of the pioneering projects exploiting this source. Its results show that detailed retail price data can be collected remotely at a significantly lower cost than with traditional methods (Cavallo and Rigobon, 2016). Recognizing these advantages, countries have begun experimental research on big data and its application to the Consumer Price Index in official statistics. Notable studies come from Norway (Manik and Albarda, 2015), the UK (Naylor et al., 2015), Belgium (Van Loon and Roels, 2018), France, Sweden, and the Netherlands (Jens, 2019), and the USA (Crystal et al., 2019).
Recognizing the importance of information and communication technology, and of big data in particular, the Prime Minister issued Decision No. 501/QĐ-TTg on May 10, 2018. The decision approved the proposal on the application of information and communication technology in the National Statistical System for the period 2017–2025, with a vision to 2030. One of its goals is to "apply big data technology to modernize, reduce costs, improve quality, and enhance forecasting capabilities for some statistical indicators in the field of price statistics" (Prime Minister, 2018). Aware of the importance and great potential of big data, the General Statistics Office has set out several strategic directions (Nguyễn Bích Lâm, 2016):
- establishing a working group on big data;
- adding "Research on the application of big data" to its Information Technology Application Development Program;
- drafting a proposal on the application of big data in national statistics.
Research on solutions for using big data to calculate the Consumer Price Index in Vietnam is therefore necessary and in line with global trends. For this reason, the author has chosen the topic “Exploiting big data in calculating the Consumer Price Index in Vietnam (the case of Ho Chi Minh City)” as the subject of this thesis.
1.2 Research objectives
Objective 1: To build a process for extracting price information from big data.
Objective 2: To establish procedures and techniques for calculating the Consumer Price Index from big data.
Objective 3: To study the application of the Hedonic regression model to adjust for changes in the quality of goods, or for goods that are no longer available on the market, when calculating the Consumer Price Index.
Objective 4: To analyze the appropriateness of applying the Hedonic regression model for these adjustments.
Objective 5: To provide policy implications for deploying the calculation of the Consumer Price Index from big data.
1.3 Research questions
Research question 1: What is the process for collecting prices from online websites?
Research question 2: What are the process and techniques for calculating the Consumer Price Index from big data?
Research question 3: Can the Hedonic regression model be applied to big data to adjust for changes in the quality of goods, or for goods that are no longer on the market?
Research question 4: Are the Consumer Price Index calculated from big data, and the Hedonic regression model applied to these data, appropriate?
Research question 5: What implications does the research offer for successfully implementing online price collection for calculating the Consumer Price Index in Vietnam?
1.4 Research subjects and scope
Research subjects: big data, the Consumer Price Index, and the Hedonic regression model.
Research scope: The research calculates the Consumer Price Index for Ho Chi Minh City and applies the Hedonic regression model to big data on laptops. Price data were collected and aggregated for 2017 and 2018.
1.5 Research methodology
To achieve the research objectives, the thesis adopts a mixed-methods approach that uses both qualitative and quantitative methods. The qualitative method relies on group discussions, one-on-one discussions, and scientific seminars. These aim to determine whether big data collection is needed to serve the calculation of the Consumer Price Index and how it could be deployed. They also identify the factors that influence laptop prices.
The quantitative research method: Building on the qualitative research, the author constructs a model of the factors affecting laptop prices. Data are collected from websites selling laptops online and cover 974 laptop models, with detailed information such as price, configuration, weight, and brand. Drawing on previous studies and expert input, the author chooses a log-linear (LogLin) model for this study and estimates it by Ordinary Least Squares (OLS). The model is tested as follows:
- t-tests and the F-test for the significance of the coefficients;
- the R-squared coefficient to evaluate model fit;
- the variance inflation factor (VIF) to check for multicollinearity;
- the White test to check for heteroscedasticity.
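The estimation and diagnostics described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the dissertation's model. The data are simulated, and the regressors ram_gb, ssd_gb, weight_kg and brand_a are hypothetical stand-ins for the configuration, weight and brand attributes. The helper names fit_loglin, vif and white_test are also illustrative. p-values are omitted because they require the t, F and chi-square distributions, available for example in scipy.stats.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X already contains an intercept column).
    lstsq also copes with rank-deficient X, e.g. a squared 0/1 dummy."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def fit_loglin(Z, log_price):
    """OLS for ln(price) = b0 + Z b + e.
    Returns coefficients, t-statistics, R^2, overall F-statistic, residuals."""
    n, p = Z.shape
    X = np.column_stack([np.ones(n), Z])
    k = X.shape[1]
    beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
    resid = log_price - X @ beta
    sigma2 = resid @ resid / (n - k)                   # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = r_squared(X, log_price)
    f_stat = (r2 / p) / ((1.0 - r2) / (n - k))         # H0: all slopes are zero
    return beta, beta / se, r2, f_stat, resid


def vif(Z):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    n, p = Z.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
        out.append(1.0 / (1.0 - r_squared(others, Z[:, j])))
    return np.array(out)


def white_test(Z, resid):
    """White's LM statistic n * R^2 from regressing squared residuals on the
    regressors, their squares and cross-products; compare with chi2(df)."""
    n, p = Z.shape
    cols = [np.ones(n)] + [Z[:, j] for j in range(p)]
    cols += [Z[:, i] * Z[:, j] for i in range(p) for j in range(i, p)]
    A = np.column_stack(cols)
    df = np.linalg.matrix_rank(A) - 1   # a duplicated column (dummy^2) adds no df
    return n * r_squared(A, resid ** 2), df


# Simulated laptop data: variables and "true" coefficients are illustrative only.
rng = np.random.default_rng(0)
n = 500
ram_gb = rng.choice([4, 8, 16, 32], size=n).astype(float)
ssd_gb = rng.choice([256, 512, 1024], size=n).astype(float)
weight_kg = rng.uniform(1.0, 2.5, size=n)
brand_a = rng.integers(0, 2, size=n).astype(float)    # hypothetical brand dummy
Z = np.column_stack([ram_gb, ssd_gb, weight_kg, brand_a])
log_price = (15.5 + 0.03 * ram_gb + 0.0004 * ssd_gb - 0.10 * weight_kg
             + 0.15 * brand_a + rng.normal(0.0, 0.1, size=n))

beta, t_stats, r2, f_stat, resid = fit_loglin(Z, log_price)
print("coefficients:", np.round(beta, 4))
print("t:", np.round(t_stats, 1), "R2:", round(r2, 3), "F:", round(f_stat, 1))
print("VIF:", np.round(vif(Z), 2))
print("White LM, df:", white_test(Z, resid))
```

In a log-lin model, a coefficient b_k means that a one-unit increase in characteristic k multiplies the price by roughly exp(b_k), all else equal. As a common rule of thumb, a VIF above 10 signals problematic multicollinearity. A White LM statistic that is large relative to the chi-square(df) critical value points to heteroscedasticity.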
1.6 Data source
The thesis uses price information on all items collected from 29 official websites verified by the Ministry of Information and Communications, which are major and reputable online shopping sites in Vietnam.
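Turning a product listing page into a (name, price) record is the core of price extraction from these websites, and it can be sketched with Python's standard html.parser. This is a minimal sketch under loud assumptions. The markup and the class names product-name and product-price are hypothetical, so each real site needs its own selectors. Pages that render prices with JavaScript are not handled.

```python
import re
from html.parser import HTMLParser


def parse_vnd(text):
    """Convert a VND price string such as '15.990.000₫' or '21,490,000 đ' to an int.
    VND prices have no decimal part, so every non-digit is treated as formatting."""
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no price found in {text!r}")
    return int(digits)


class ProductPriceParser(HTMLParser):
    """Collect (name, price) pairs from elements whose class attribute contains
    'product-name' or 'product-price' (hypothetical class names)."""

    def __init__(self):
        super().__init__()
        self._field = None   # which field the current text belongs to, if any
        self._current = {}   # fields seen so far for the product being read
        self.products = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "product-name" in classes:
            self._field = "name"
        elif "product-price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field is None or not data.strip():
            return
        self._current[self._field] = data.strip()
        if "name" in self._current and "price" in self._current:
            self.products.append(
                (self._current["name"], parse_vnd(self._current["price"]))
            )
            self._current = {}


SAMPLE_HTML = """
<div class="item"><h3 class="product-name">Laptop A 14 inch</h3>
  <span class="product-price">15.990.000₫</span></div>
<div class="item"><h3 class="product-name">Laptop B 15.6 inch</h3>
  <span class="product-price">21,490,000 đ</span></div>
"""

parser = ProductPriceParser()
parser.feed(SAMPLE_HTML)
print(parser.products)
# [('Laptop A 14 inch', 15990000), ('Laptop B 15.6 inch', 21490000)]
```

Normalizing price strings to integers at extraction time means records from different sites can be aggregated without further cleaning. The parser assumes that each name is followed by its price.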
1.7 Contribution of the thesis
Theoretical contribution: The thesis develops a new approach to collecting statistical information, which is one of the most important steps in the 7-step process of statistical information production.
Practical contribution: Collecting data from big data sources will help improve the quality of input data. The thesis makes three further contributions:
- it establishes a process for extracting price information from big data, specifically from online shopping websites in Vietnam;
- it develops a process and techniques for calculating the Consumer Price Index from big data;
- it constructs a Hedonic regression model, based on big data, to adjust for changes in the quality of goods.
CHAPTER 2: LITERATURE REVIEW AND OVERVIEW OF RELATED PREVIOUS STUDIES
2.1 Literature review on price and price index
“The Consumer Price Index is a relative indicator (measured in %) that reflects the trend and the extent of price fluctuations over time of the goods in the representative basket of goods and consumer services” (General Statistics Office, 2018). “The weight used to calculate the Consumer Price Index is the expenditure structure of the groups of items in the total household expenditure, which is compiled from the results of the living standards survey and is fixed for about 5 years” (General Statistics Office, 2018).
To calculate the Consumer Price Index, national statistical agencies must collect data on the prices and quantities of various goods and services. To estimate price changes relative to the base period, they also need weights, known as expenditure shares, which requires data on household expenditure structures. These data are typically obtained through living standards surveys, which most national statistical agencies conduct at irregular intervals (Beegle et al., 2016).
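Aggregating price changes with fixed expenditure shares can be sketched in a few lines. The formula used here is the common Laspeyres-type form I = 100 × Σ w_i × (p_i,t / p_i,0), where the expenditure shares w_i sum to 1. It is an assumption for illustration, not a statement of the General Statistics Office's exact formula. The group names, shares and price levels are made up.

```python
def fixed_weight_index(weights, base_prices, current_prices):
    """Laspeyres-type index: 100 * sum_i w_i * (p_it / p_i0).

    weights: expenditure shares by group (from a household survey), summing to 1.
    base_prices, current_prices: price levels by group in the base and current period.
    """
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 1")
    return 100.0 * sum(
        w * current_prices[g] / base_prices[g] for g, w in weights.items()
    )


# Hypothetical groups, expenditure shares and group price levels
weights = {"food": 0.5, "housing": 0.3, "transport": 0.2}
base = {"food": 100.0, "housing": 100.0, "transport": 100.0}
current = {"food": 104.0, "housing": 102.0, "transport": 95.0}
print(round(fixed_weight_index(weights, base, current), 2))  # 101.6
```

Because the shares stay fixed between survey rounds (about five years in Vietnam, per the definition above), only the price relatives need to be refreshed each month.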
The Consumer Price Index is currently collected, compiled, and published in 196 economies: 37 developed economies (19%) and 159 emerging and developing economies (81%) (Berry et al., 2019). In Vietnam, the statistical agency carries out a consumer price survey in order to calculate and publish the monthly, quarterly, and annual Consumer Price Index (General Statistics Office, 2015). Criteria for assessing the soundness of the Consumer Price Index were proposed from three sources: the survey design of the General Statistics Office, local price collection practice, and the research by Berry et al. (2019). By these criteria, the method of calculating the Consumer Price Index in Vietnam conforms well to international practice:
- the weights of the Consumer Price Index are updated every five years;
- regarding timeliness, the index is published on the 29th of each month;
- Vietnam uses the COICOP classification, in line with the recommendations of international organizations.
2.2 Literature review on Big Data
Daas et al. (2023) define big data as "datasets (extremely large) that may contain both structured and unstructured data and, when analyzed computationally, can reveal patterns, trends, and associations related to the behaviors and interactions of the units included." Struijs et al. (2014) argue that big data is a very diverse and broad research topic. The emergence of big data across many fields has led to a series of groundbreaking innovations. National statistical agencies cannot stay out of this trend, given their role as the primary data providers and the authoritative bodies for official statistics. Letouzé and Jütting (2014) argue that "engaging with big data is not a technical issue but a political obligation." Daas et al. (2023) identified a total of 44 statistical fields that use big data. Six of these have reached official publication, while the rest are experimental outputs produced either once or on a recurring basis.
Among these six fields, price statistics rank first and are the only field in which more than five countries use big data for official statistics. Many countries around the world have now researched and applied big data in price statistics:
- Norway (Manik and Albarda, 2015);
- the UK (Naylor et al., 2015);
- Belgium (Van Loon and Roels, 2018);
- France, Sweden, and the Netherlands (Jens, 2019);
- the USA (Crystal et al., 2019);
- Australia (ESCAP, 2020);
- New Zealand (Statistics New Zealand, 2017);
- Japan (Statistics Bureau of Japan, 2019);
- Malaysia (Mustapa, 2020);
- the Philippines and Indonesia (Bernal et al., 2021).
2.3 Literature review on Hedonic regression model
The Consumer Price Index measures the average change over time in the prices of goods and services by collecting prices for a representative sample of consumer items at retail outlets. A fundamental issue with the goods and services in the index sample is that their characteristics change over time, not just their prices, as retailers introduce new versions and discontinue older ones. These new versions may bring additional benefits or, in some cases, fewer benefits. A change in these benefits is a change in quality.
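One common way to turn item-level scraped prices into an "average price change" is a matched-model Jevons index: the geometric mean of price relatives for products observed in both periods. It is shown here only as an illustration of the problem just described, because a discontinued or newly introduced version simply drops out of the match. The product identifiers and prices are hypothetical, and the dissertation's own elementary-index formula may differ.

```python
import math


def jevons_matched(prices_0, prices_t):
    """Matched-model Jevons index (base period = 100).

    prices_0, prices_t: dict mapping a product identifier to its price in the
    base and current period. Only products present in both periods are used.
    """
    matched = sorted(set(prices_0) & set(prices_t))
    if not matched:
        raise ValueError("no products observed in both periods")
    mean_log_relative = sum(
        math.log(prices_t[i] / prices_0[i]) for i in matched
    ) / len(matched)
    return 100.0 * math.exp(mean_log_relative), matched


# Hypothetical scraped laptop prices in VND: "L3" disappears, "L4" is new.
december = {"L1": 15_000_000, "L2": 20_000_000, "L3": 12_000_000}
january = {"L1": 15_600_000, "L2": 19_000_000, "L4": 13_500_000}
index, used = jevons_matched(december, january)
print(round(index, 2), used)  # 99.4 ['L1', 'L2']
```

The price information on L3 and L4 is lost here, which is the gap that explicit quality adjustment is meant to close.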
To measure price changes accurately, the Consumer Price Index must identify and remove the part of a price change that is due to a change in quality. The International Labour Organization (ILO, 2014) provides guidance on methods for handling this issue, including direct comparison, quantity adjustment, and hedonic pricing. The direct comparison method is used only when the new product is very similar to the old one and can be assumed to have a similar base price. The quantity adjustment method is used when a product undergoes only a permanent change in size, assuming other quality factors remain constant. Price imputation is also often used, as is the traditional solution of temporarily removing an item from the sample when its quality changes. The removal approach is sometimes accepted. However, it distorts the Consumer Price Index if the price changes of new versions differ systematically from those of unchanged items. The Hedonic regression method is therefore appropriate when goods and services undergo significant quality changes. The Consumer Price Index uses hedonic quality adjustment in product categories whose quality tends to change frequently. Such changes arise from seasonal variation, as in apparel, or from innovation and technological progress, as in consumer equipment and electronic devices.
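The two explicit adjustments mentioned above can be sketched as follows, assuming the log-lin specification of Section 1.5. In that specification, a quality difference Δz between the replacement and the old item is valued at exp(Σ b_k Δz_k) and removed from the raw price relative. The coefficient values, characteristics and prices below are hypothetical, not estimates from the dissertation. The function names are illustrative.

```python
import math


def quantity_adjusted_relative(p_old, size_old, p_new, size_new):
    """Quantity adjustment: compare prices per unit of size when only the
    size changed (other quality characteristics assumed constant)."""
    return (p_new / size_new) / (p_old / size_old)


def hedonic_adjusted_relative(p_old, z_old, p_new, z_new, coefs):
    """Hedonic adjustment for a log-lin model ln p = b0 + sum_k b_k z_k.

    The quality difference is valued at exp(sum_k b_k * (z_new_k - z_old_k))
    and divided out of the raw price relative p_new / p_old.
    """
    quality_factor = math.exp(
        sum(b * (z_new[k] - z_old[k]) for k, b in coefs.items())
    )
    return (p_new / p_old) / quality_factor


# Hypothetical size change: a 500 g pack at 30,000 VND becomes 450 g at 28,000 VND
print(round(quantity_adjusted_relative(30_000, 500, 28_000, 450), 4))

# Hypothetical coefficients; the replacement laptop has 8 GB more RAM
coefs = {"ram_gb": 0.03, "ssd_gb": 0.0004}
old = {"ram_gb": 8, "ssd_gb": 512}
new = {"ram_gb": 16, "ssd_gb": 512}
print(round(hedonic_adjusted_relative(15_000_000, old, 16_500_000, new, coefs), 4))
```

In the laptop case, the raw relative is 1.10. Once the RAM upgrade is valued with the assumed coefficient and removed, the quality-adjusted relative falls below 1. The 10% price rise is more than explained by the better configuration.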
The joint IMF-OECD survey conducted in August 2017 provides insights into how quality adjustments are handled in price indices. In total, 43 countries responded to the questionnaire on quality adjustment (Berry et al., 2019):
- 6 economies (14%) make no quality adjustments when replacing products;
- 27 (63%) use different methods depending on the product;
- 8 use a single quality adjustment method in all cases.
The most commonly used methods for quality adjustment in the Consumer Price Index are overlap, Hedonic regression, and judgmental quality adjustment. The Hedonic regression model is thus used by many countries to adjust for the quality of goods and services.
Wells and Restieaux (2014) reviewed the use of Hedonic quality adjustment in calculating the Consumer Price Index in several countries. All 8 countries they studied employed Hedonic regression to adjust for changes in product quality: Australia, Canada, New Zealand, the United States, Germany, Sweden, the United Kingdom, and Switzerland. The overview of previous studies covered 5 countries calculating the Consumer Price Index. Of these, only South Africa made no quality adjustments for goods, while the United States, the United Kingdom, Australia, and Japan all used the Hedonic regression model for quality adjustment in official statistics. This analysis shows the need to apply the Hedonic regression model in calculating the Consumer Price Index.
2.4 Overview of previous studies related to the topic
The dissertation reviewed a total of 29 international and domestic studies related to the topic.
International studies using big data to calculate the Consumer Price Index (11): Cavallo (2013), Naylor et al. (2015), Aparicio and Bertolotto (2016), Haoyang Wu et al. (2017), Harchaoui and Janssen (2018), Konny (2020), Juszczak (2021), Haqqoni and Pramana (2022), Benedetti et al. (2022), Del Prado et al. (2023), and Seitaro Tanimichi and Takuya Shibata (2023).
Domestic studies related to the price index (9): Van Thi Loan (1998), Bui Duy Phu (2006), Nguyen Thi Lien (2008), Dinh Thi Bao Linh (2009), Do Thi Ngoc et al. (2014), Do Thi Ngoc (2014), Vu Thi Thu Thuy (2015), Vu Thi Thu Thuy (2018), and Nguyen Van Thuy and Nguyen Cong Hoan (2018).
International studies on quality adjustment in computer price indices (9): Berndt and Griliches (1990), Nelson et al. (1994), Berndt et al. (1995), Baker (1997), Konijn et al. (2003), Parkhomenko et al. (2017), Zafar and Himpens (2019), Shkolnyi (2021), and Goren and Arslan (2023).
2.5 Research gap
Based on an analysis of the scientific literature on using big data to calculate the Consumer Price Index, the author identified the following gaps:
Firstly, collecting prices and calculating the Consumer Price Index with traditional methods has limitations worldwide (Crystal et al., 2019). In Vietnam, studies on using big data to calculate the index also face numerous constraints. Secondly, quality adjustment is a crucial part of calculating the Consumer Price Index. The index fundamentally requires that the quality and specifications of the goods and services in the basket remain unchanged over the comparison period. The Hedonic regression model is one of the suitable methods for quality adjustment, especially for goods with multiple characteristics. In Vietnam, however, the Hedonic regression model has not yet been studied for integration into the Consumer Price Index calculation process.