2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

HSUM-HC: Integrating Bert-based hidden aggregation to hierarchical classifier for Vietnamese aspect-based sentiment analysis

Tri Cong-Toan Tran, Ho Chi Minh City University of Technology, Vietnam National University Ho Chi Minh City, Ho Chi Minh, Viet Nam, tri.tran.1713657@hcmut.edu.vn
Thien Phu Nguyen, Ho Chi Minh City University of Technology, Vietnam National University Ho Chi Minh City, Ho Chi Minh, Viet Nam, thien.nguyen.phu@hcmut.edu.vn
Thanh-Van Le*, Ho Chi Minh City University of Technology, Vietnam National University Ho Chi Minh City, Ho Chi Minh, Viet Nam, ltvan@hcmut.edu.vn
* Corresponding Author

Abstract—Aspect-Based Sentiment Analysis (ABSA), which aims to identify sentiment polarity towards specific aspects in customers' comments or reviews, has been an attractive topic of research in social listening. In this paper, we construct a specialized model utilizing PhoBert's top-level hidden layers integrated into a hierarchical classifier, taking advantage of these components to propose an effective classification method for the ABSA task. We evaluated our model's performance on two public Vietnamese datasets, and the results show that our implementation outperforms previous models on both datasets.

Index Terms—aspect-based sentiment analysis, PhoBert, BERT, hidden layer aggregation, hierarchical classifier, Vietnamese corpus

I. INTRODUCTION

The fast growth of e-commerce, particularly the B2C (business-to-customer) model, has resulted in a rise in online purchasing habits. It makes day-to-day transactions extremely simple for the general public, and it has ultimately become one of the most popular forms of purchasing, especially during a global pandemic such as COVID-19. Due to the sheer development of social media platforms, customers are encouraged to provide reviews and comments expressing their positive or negative sentiments about the products or services they have experienced. Analyzing a huge amount of data for mining public opinion is a time-consuming and labor-intensive operation. As a result, building an automatic sentiment analysis system can help consumers exploit the quality judgments of others about products of interest. Moreover, such a system supports businesses in managing their reputation, understanding business requirements so that they are well adapted to customers' needs, and avoiding marketing disasters. For this reason, sentiment analysis has become one of the most attractive study fields in machine learning among academic and business researchers in recent years.

There has been interesting prior research on sentiment analysis for Vietnamese text using the VLSP 2016 datasets (https://vlsp.org.vn/vlsp2016/eval/sa). However, sentiment analysis at the whole-review level does not provide enough information, since it assumes the entire review has only one topic and one sentiment, whereas a product can have both pros and cons across many aspects. The challenge of aspect-based sentiment analysis (ABSA) is not only detecting the aspects in a review but also the sentiment attached to each aspect. A review can consist of dozens or hundreds of words about multiple aspects, each with a different sentiment, and determining which sentiment words go with which aspect can be very difficult. With ABSA, reviews about a product can be analyzed in detail, showing the reviewer's opinion on each aspect of that product.

The main problem of ABSA is as follows: given a customer review about a domain (e.g. hotel or restaurant), the goal is to identify the set of (Aspect, Polarity) pairs that fit the opinions mentioned in the review. Each aspect is a pair of an entity and an attribute, and polarity is one of negative, neutral, or positive. For each domain, all possible combinations of entities and attributes are predefined. The ABSA task is divided into two phases: (i) identify pairs of entities and attributes, and (ii) analyze the sentiment polarity towards the corresponding aspect (entity#attribute) identified in the previous phase. For example, the review "Nơi có quang cảnh tuyệt đẹp, đồ ăn ngon phục vụ tệ" (This place has an amazing view, the food is great too but the service is bad) yields the following (Entity#Attribute: Polarity) pairs: (Hotel#Design&Features: Positive), (Food&Drinks#Quality: Positive), (Service#General: Negative).
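To make the task formulation concrete, the sketch below shows one possible way to represent the expected input and output for the example review above. The review text and the three gold (Entity#Attribute: Polarity) pairs come from the example; the container types and variable names are our own illustrative assumptions, not an official data format of the datasets.

    # Hypothetical sketch of the ABSA input/output representation described above.
    review = "Nơi có quang cảnh tuyệt đẹp, đồ ăn ngon phục vụ tệ"

    # Phase (i): aspect detection -> set of predefined entity#attribute categories
    aspects = {"Hotel#Design&Features", "Food&Drinks#Quality", "Service#General"}

    # Phase (ii): polarity classification for each detected aspect
    prediction = {
        "Hotel#Design&Features": "Positive",
        "Food&Drinks#Quality": "Positive",
        "Service#General": "Negative",
    }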
In this paper, we propose a method that uses multiple top-level hidden layers of Bert for classification, combined with an intuitive hierarchical classifier, for the ABSA task. Our results demonstrate that a large model with many hidden layers contains useful information which can be exploited to obtain better results. We achieved the highest scores when applying our method to two Vietnamese ABSA datasets: the VLSP 2018 dataset (https://vlsp.org.vn/vlsp2018/eval/sa) and the UIT ABSA dataset [1].

II. RELATED WORK

In recent years, sentiment analysis has taken off and has been strongly developed through advanced research for social listening. Many corpora and tasks have been created, such as SemEval 2015 (Task 12) [2] and SemEval 2016 (Task 5) [3] for various languages, including English, Chinese, etc. The first public Vietnamese benchmark datasets were released by the VLSP (Vietnamese Language and Speech Processing) community in 2018. The organizers built two benchmark document-level corpora with 4,751 and 5,600 reviews for the restaurant and hotel domains, respectively.

Several interesting methods have been proposed to handle these tasks. The earliest works rely heavily on feature engineering (Wagner et al. [4]; Kiritchenko et al. [5]), using combinations of n-grams and sentiment lexicon features to solve various ABSA tasks in SemEval 2014. Nguyen and Shirai [6], Wang et al. [7], and Tang et al. [8] achieved higher accuracy by improving on neural networks with hierarchical structure, integrating dependency relations and phrases [6], an attention module [7], or a target-dependent mechanism [8]. Ma et al. [9] incorporated useful commonsense knowledge into a deep neural network to further enhance the model. Recently, pre-trained language models trained over large text corpora, such as ELMo (Peters et al. [10]), OpenAI GPT (Radford et al. [11]), and especially BERT (Devlin et al. [12]), have shown their effectiveness in alleviating the effort of feature engineering. Sun et al. [13] proposed four methods of converting the ABSA task, for example into question answering (QA) or natural language inference (NLI), as a sentence-pair classification task, constructing auxiliary sentences and fine-tuning a BERT model to solve it. The sentence pair is created by concatenating the original sentence with an auxiliary sentence generated from the target-aspect pair by one of several methods. Karimi et al. [14] proposed two modules, called Parallel Aggregation and Hierarchical Aggregation, that utilize the hidden layers of the BERT language model to produce deeper semantic representations of input sequences. A prediction and its loss are computed for each of the selected layers, and these losses are then aggregated to produce the final loss of the model.
They used Conditional Random Fields (CRFs) for the sequence labeling task, which yielded better results. In addition, their experiments also show that training BERT for a large number of epochs does not cause the model to overfit.

For a low-resource language such as Vietnamese, there has been comparatively little study of aspect-based sentiment analysis over the years, but progress has been steady. Oanh et al. [15] proposed a BERT-based hierarchical model which integrates the context information of the entity layer into the prediction of the aspect layer, optimizing a global loss function to capture information from all layers. Their model consists of two main components: a Bert component that encodes the context information of the review into a representation vector, and a hierarchical model that takes this representation vector as input and generates multiple outputs (entity; aspect; polarity), one per layer. Thin et al. [16] investigated the performance of various monolingual pre-trained language models compared with multilingual models on the Vietnamese aspect category detection problem. This research showed the effectiveness of PhoBert compared to several models, including XLM-R [17], mBERT [12], and another Vietnamese version of BERT.

III. PROPOSED MODEL

In this section we introduce HSUM-HC, our ABSA approach that inherits the benefits of PhoBert together with hidden layer aggregation and hierarchical classifiers for Vietnamese text (Fig. 1). By analyzing the characteristics of each component, we believe this combination yields a model that is well suited to the ABSA task. PhoBert is a monolingual pre-trained model made specifically for the Vietnamese language. Input sequences are tokenized and fed into the PhoBert model; we then take the top n hidden layers as meaningful context input for the hierarchical aggregation layer. The output of the latter is fed into a hierarchical classifier to predict the set of aspects and their sentiment polarity.

Figure 1: Our HSUM-HC model for the ABSA task.

1) Bert Model: There have been many multilingual pre-trained Bert models that support Vietnamese, but as pointed out by [18], these models have two main problems: little Vietnamese pre-training data and no handling of compound words. PhoBert was made to address these problems; it is also the first monolingual Bert model pre-trained for Vietnamese. PhoBert's pre-training approach is based on RoBerta [19], which aims to improve Bert's pre-training procedure for better performance. The pre-training was done on 20GB of monolingual text (Vietnamese Wikipedia and a Vietnamese news corpus, https://github.com/binhvq/news-corpus) and employs a word segmenter, VnCoreNLP (https://github.com/vncorenlp/VnCoreNLP), to tokenize compound words (e.g. khách_sạn, thức_uống). PhoBert is used as the pre-trained model in our research because we aim to process Vietnamese text for the ABSA task. For fine-tuning, we follow the steps taken when pre-training the model: we use VnCoreNLP for word segmentation, and PhoBert's tokenizer to split sequences into tokens and map tokens to their indices, adding the [CLS] token at the start and the [SEP] token at the end of each sequence. The tokenizer also produces the attention masks and pads sequences to ensure equal length. The list of token ids and attention masks is then input into the Bert model.
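As an illustration of this preprocessing pipeline, the sketch below segments a review with VnCoreNLP and encodes it with PhoBert through the Hugging Face transformers library. The jar path, checkpoint name, and maximum sequence length are assumptions made for the sketch, and PhoBert (being RoBERTa-based) internally uses <s> and </s> as its special tokens in the role of [CLS] and [SEP].

    # Minimal sketch of the word segmentation + tokenization steps described above.
    # Assumes the VnCoreNLP jar is available locally and that "vinai/phobert-large"
    # is the checkpoint in use; both are assumptions for this sketch.
    from vncorenlp import VnCoreNLP
    from transformers import AutoTokenizer, AutoModel

    segmenter = VnCoreNLP("VnCoreNLP-1.1.1.jar", annotators="wseg", max_heap_size="-Xmx500m")
    tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-large")
    phobert = AutoModel.from_pretrained("vinai/phobert-large", output_hidden_states=True)

    review = "Nơi có quang cảnh tuyệt đẹp, đồ ăn ngon phục vụ tệ"

    # 1) Word segmentation so that compound words become single tokens (e.g. quang_cảnh).
    segmented = " ".join(" ".join(words) for words in segmenter.tokenize(review))

    # 2) Subword tokenization: token ids, attention mask, special tokens and padding.
    encoded = tokenizer(segmented, padding="max_length", truncation=True,
                        max_length=256, return_tensors="pt")

    # 3) Forward pass; hidden_states holds the per-layer representations used later.
    outputs = phobert(**encoded)
    hidden_states = outputs.hidden_states  # tuple: embedding output + one tensor per layer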
2) Hidden layer aggregation with hierarchical classifiers: A Bert-based model with a hierarchical classifier was created by Oanh et al. [15] to deal with ABSA. Its architecture is based on how a human annotator would perform the same task manually. It carries out classification in three layers: entity, aspect, and sentiment. The process is to first label the entity (e.g. Hotel, Room, ...), then identify the entity's attribute (e.g. Design, Comfort, ...) to form an aspect, and lastly analyze the sentiment for that aspect in the review. Every layer contributes its output as context to the next layer. With this architecture, ABSA can be solved with a single end-to-end model, without the need for multiple separate classifiers.

In the original Bert-with-hierarchical-classifier implementation of Oanh et al. [15], we observe that some improvements can be made to achieve better performance on this task. Firstly, they used a multilingual Bert model with further training on Vietnamese to create a pre-trained model accustomed to Vietnamese. However, it is still not specialized, since without a word segmenter Vietnamese compound words are not handled properly. We experimented with the model architecture in their paper and found that we could improve the result by around 3% by using PhoBert as the pre-trained model and VnCoreNLP for word segmentation. Secondly, in their implementation only the last hidden layer is used to make the prediction, which means the top layer is treated as the only important one and the information in the previous hidden layers is not utilized. [20] showed that all hidden layers of BERT contain information and that the higher-level layers carry valuable semantic information; thus, we can enhance the Bert-based model by using these layers. For that reason, we implemented the hierarchical hidden-layer aggregation architecture of [14], which adds a BERT layer on top of each of the selected hidden layers. Each output is aggregated with the previous hidden layer and then passes through the hierarchical classifier, and the total loss is the sum of every classifier's losses.

The Binary Cross-Entropy loss function for each layer L_i of the classifier is calculated as follows:

L_i = -\sum_{c=1}^{C} \left[ y_c \log(\sigma(\hat{y}_c)) + (1 - y_c) \log(1 - \sigma(\hat{y}_c)) \right]    (1)

with C being the number of classes for that layer. The loss for each classifier is the sum of the three prediction layers' losses calculated above:

classifier\_loss = L_1 + L_2 + L_3    (2)

The total loss is the sum of all classifiers' losses, with H being the number of classifiers:

total\_loss = \sum_{h=1}^{H} classifier\_loss_h    (3)

With this implementation, we obtain an enhanced model with the goal of achieving the best possible performance on the aspect-based sentiment analysis task: a monolingual pre-trained model for Vietnamese text, a mechanism to exploit this pre-trained model to its full potential, and a hierarchical classifier. Our promising results are presented in detail in the experiment section.
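The sketch below shows one way the aggregation and the loss of Eqs. (1)-(3) could be wired together in PyTorch. It is a simplified approximation and not the authors' released code: the extra encoder blocks are generic nn.TransformerEncoderLayer modules rather than copies of PhoBert's own layers, the way each head is conditioned on the previous level's logits is an assumption, and all label tensors are assumed to be multi-hot float vectors suitable for binary cross-entropy.

    # Approximate PyTorch sketch of HSUM-style aggregation with hierarchical heads.
    import torch
    import torch.nn as nn

    class HierarchicalHead(nn.Module):
        """Entity -> aspect -> polarity heads; later heads also see the earlier logits."""
        def __init__(self, dim, n_entity, n_aspect, n_polarity):
            super().__init__()
            self.entity = nn.Linear(dim, n_entity)
            self.aspect = nn.Linear(dim + n_entity, n_aspect)
            self.polarity = nn.Linear(dim + n_aspect, n_polarity)

        def forward(self, pooled):                       # pooled: [batch, dim]
            e = self.entity(pooled)
            a = self.aspect(torch.cat([pooled, e], dim=-1))
            p = self.polarity(torch.cat([pooled, a], dim=-1))
            return e, a, p

    class HSUMHC(nn.Module):
        """Aggregates the top n_agg hidden layers; one classifier per aggregated layer."""
        def __init__(self, dim, n_agg, n_entity, n_aspect, n_polarity, n_heads=16):
            super().__init__()
            self.blocks = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
                for _ in range(n_agg))
            self.heads = nn.ModuleList(
                HierarchicalHead(dim, n_entity, n_aspect, n_polarity) for _ in range(n_agg))

        def forward(self, top_hidden_states):
            # top_hidden_states: list of n_agg tensors [batch, seq, dim], ordered from
            # the lowest selected PhoBert layer up to the last layer.
            agg = torch.zeros_like(top_hidden_states[0])
            outputs = []
            for block, head, h in zip(self.blocks, self.heads, top_hidden_states):
                agg = block(agg + h)               # new encoder block over the running sum
                outputs.append(head(agg[:, 0]))    # classify from the first token position
            return outputs

    def hsum_hc_loss(outputs, targets):
        # Eq. (2): L1 + L2 + L3 per classifier; Eq. (3): summed over all H classifiers.
        bce = nn.BCEWithLogitsLoss()
        return sum(bce(e, targets["entity"]) + bce(a, targets["aspect"])
                   + bce(p, targets["polarity"]) for e, a, p in outputs)

In the experiments below, dim would be 1024 for PhoBert-large, and n_agg would be 4 or 8 for the HSUM-HC_4 and HSUM-HC_8 settings, respectively.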
IV. EXPERIMENTS

A. Datasets

We evaluated our model on the VLSP 2018 ABSA dataset, which was the first public Vietnamese dataset for the ABSA task. This dataset was collected from user comments on Agoda (https://www.agoda.com/vi-vn) and consists of document-level reviews. The length of each review varies considerably: some are short sentences, while others contain hundreds of words, with the longest containing around 1000 words. We also evaluated our model on the UIT ABSA dataset, which consists of sentence-level reviews with relatively short sentences and only 1.65 aspects per review on average. Its data was collected from mytour (https://mytour.vn). In the construction of both datasets, multiple annotators were employed and the raw data were manually annotated following strict guidelines. The datasets cover the hotel and restaurant domains and are divided into training, development, and testing sets with similar label ratios. There are 34 aspects for the hotel domain, and each review can have a varying number of aspects. Details about the datasets can be seen in Table I and Table II. From the standard deviation of each dataset, it is apparent that the aspect distribution is very uneven, with the most frequent aspect appearing around 2,000 times while the rarest aspects appear only a few times.

Table I: Dataset details for VLSP 2018 ABSA

Type    #Reviews   #Aspects   Avg. Aspects   σ        Avg. Length
train   3000       13948      4.64           439.21   47
dev     2000       7111       3.55           252.34   23
test    600        2584       4.31           84.55    30

Table II: Dataset details for UIT ABSA

Type    #Reviews   #Aspects   Avg. Aspects   σ        Avg. Length
train   7180       11812      1.65           469.00   18.25
dev     795        1318       1.66           52.18    18.54
test    2030       3283       1.62           130.26   18.27

B. Evaluation Metrics

To evaluate the performance of ABSA models, we use micro-averaging. The evaluation is done in two phases: Phase A evaluates the model's ability to detect the aspects of a review, and Phase B evaluates the detection of (aspect, polarity) pairs. Precision, Recall, and F1 score are computed with the following formulas:

Precision = \frac{\sum_{c_i \in C} TP_{c_i}}{\sum_{c_i \in C} (TP_{c_i} + FP_{c_i})}

Recall = \frac{\sum_{c_i \in C} TP_{c_i}}{\sum_{c_i \in C} (TP_{c_i} + FN_{c_i})}

F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}

C. Experimental Setup

As mentioned above, we use VnCoreNLP's segmenter to segment each review before using PhoBert, and then PhoBert's tokenizer to obtain token ids and attention masks and to perform padding. We use Hugging Face's AdamW optimizer together with the constant scheduler with warmup (https://huggingface.co/transformers/main_classes/optimizer_schedules.html); the base learning rate is 2e-5 for the document-level dataset and 5e-6 for the sentence-level dataset. We set the warmup ratio to 0.25 and the batch size to 10, and train each model for 100 epochs. The BERT model we use is PhoBert-large, with 24 Transformer blocks (25 hidden states counting the embedding output) and a hidden size of 1024. We test the performance of two settings: 4-layer aggregation (HSUM-HC_4) and 8-layer aggregation (HSUM-HC_8).
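A minimal sketch of this optimization setup, using the Hugging Face transformers utilities named above, is shown below. Interpreting the warmup ratio as a fraction of the total number of training steps, as well as the function and argument names, are our own assumptions.

    # Sketch of the optimizer/scheduler configuration described above.
    from transformers import AdamW, get_constant_schedule_with_warmup

    def build_optimizer(model, num_train_examples, lr=2e-5,
                        epochs=100, batch_size=10, warmup_ratio=0.25):
        # lr = 2e-5 for the document-level dataset (VLSP 2018), 5e-6 for the
        # sentence-level dataset (UIT ABSA).
        total_steps = epochs * (num_train_examples // batch_size)
        optimizer = AdamW(model.parameters(), lr=lr)
        scheduler = get_constant_schedule_with_warmup(
            optimizer, num_warmup_steps=int(warmup_ratio * total_steps))
        return optimizer, scheduler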
D. Experimental Results and Discussion

We compared our model's performance with previous work on the same datasets. For the UIT ABSA dataset, all results besides ours are the baseline results reported in [1]; these results are taken from the multi-task approach (except for SVM).

1) Experimental Results: Results for the two datasets can be seen in Tables III and IV. Overall, our implementation outperforms previous methods on the same task. For the VLSP 2018 dataset, our model achieved an F1 score of 85.20% for Phase A and 80.08% for Phase B, a significant improvement over previous deep learning models. Notably, compared to [15], our model performs considerably better when applying a hierarchical classifier with a language-specific pre-trained model and hidden layer aggregation, improving by 3.14% in Phase A and 5.39% in Phase B. The F1 score of our model is also 6.04% higher than that of [16], which used PhoBert-base with a linear layer for aspect detection. For the UIT ABSA dataset, our model obtained 80.78% and 75.25% in Phase A and Phase B, respectively, improving on the baseline models in [1] by at least 1.68% in Phase A and 1.56% in Phase B. The results also show that aggregating the top 8 hidden layers gives better performance than aggregating only 4; this is because we are using a large model with more hidden layers, so more layers can contain useful semantic information.

From the results on the UIT ABSA sentence-level dataset, we can see that our implementation can have lower precision but much higher recall than previous models, which leads to a higher F1 score than the deep learning baselines, meaning it outperforms these models overall. This is even more apparent on the document-level dataset, which has longer reviews requiring the model to capture long-range dependencies, and where each review also has more aspects on average; this task can therefore be considered more challenging than the sentence-level one. Nevertheless, our model scores significantly higher at the document level than at the sentence level. This means that our model, instead of being challenged by long sequences and forgetting information, can actually learn from the extra information in these sequences and use it to achieve a better result. We see that our model shows its true potential when put through a more demanding task with more information to learn. Overall, the results show that our implementation is effective in dealing with ABSA, and that all three components, PhoBert, HSUM, and the hierarchical classifier, are essential for improving the model's performance.

Table III: Results on the test set of the VLSP 2018 dataset, hotel domain (Precision / Recall / F1)

Models                  Phase A (Aspect Detection)   Phase B (Aspect, Polarity Detection)
Linear SVM              83 / 58 / 68                 71 / 49 / 58
Multilayer Perceptron   85 / 42 / 56                 80 / 39 / 53
CNN                     82.35 / 59.75 / 69.25        76.53 / 66.04 / 70.90
BiLSTM + CNN            84.03 / 72.52 / 77.85        80.04 / 70.01 / 74.69
PhoBert-based           81.49 / 76.96 / 79.16        -
viBERT                  83.93 / 80.26 / 82.06        -
HSUM-HC_8 (ours)        86.79 / 83.66 / 85.20        84.52 / 76.08 / 80.08
HSUM-HC_4 (ours)        85.59 / 83.39 / 84.67        83.50 / 74.65 / 78.83

Table IV: Results on the test set of the UIT ABSA dataset, hotel domain (Precision / Recall / F1)

Models                  Phase A (Aspect Detection)   Phase B (Aspect, Polarity Detection)
SVM (Multiple)          76.68 / 74.70 / 75.68        69.06 / 67.28 / 68.16
CNN                     78.61 / 74.35 / 76.42        71.48 / 67.61 / 69.49
LSTM + Attention        83.47 / 69.07 / 75.59        76.22 / 63.07 / 69.03
BiLSTM + Attention      82.02 / 72.08 / 76.73        74.68 / 65.63 / 69.86
CNN-LSTM                10.74 / 42.35 / 17.14        07.72 / 30.43 / 12.32
CNN-LSTM + Attention    76.92 / 70.76 / 73.71        69.02 / 63.50 / 66.14
BiLSTM-CNN              77.11 / 78.22 / 77.66        70.23 / 71.23 / 70.72
PhoBert-base            83.46 / 75.18 / 79.10        77.75 / 70.03 / 73.69
HSUM-HC_8 (ours)        80.26 / 81.31 / 80.78        76.87 / 73.71 / 75.25
HSUM-HC_4 (ours)        79.75 / 80.96 / 80.34        76.89 / 72.97 / 74.88

2) Loss and performance curve: In our experiments, we trained our model for a large number of epochs with relatively little data. The training loss curves can be seen in Fig. 2; at first glance, it appears that our model started to overfit very early, since the validation loss kept increasing. However, we observe that this is not the case. Even though the validation loss was increasing, performance still increased slowly, as can be seen in Fig. 3. This behaviour was also observed by [14] and [21], indicating that the model still learns at a slow and steady pace. At some point the performance levels off and the learning process stops. This can be explained by the fact that BERT was pre-trained on an enormous amount of data and therefore does not easily overfit.

Figure 2: The loss curves on the validation and test sets for VLSP 2018 (left) and the UIT ABSA dataset (right).

Figure 3: The F1 curves on the validation and test sets for VLSP 2018 (left) and the UIT ABSA dataset (right).
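One practical implication of this observation, which the paper does not state explicitly, is that checkpoint selection should track the development-set F1 rather than rely on early stopping on the rising validation loss. A minimal sketch of that bookkeeping, with the training and evaluation steps passed in as callables, is shown below.

    # Keep the checkpoint with the best dev F1 instead of stopping early on rising
    # validation loss (an inference from the discussion above, not the authors' code).
    def fit(train_one_epoch, evaluate_dev_f1, save_checkpoint, epochs=100):
        best_f1 = 0.0
        for _ in range(epochs):
            train_one_epoch()                 # one pass over the training data
            dev_f1 = evaluate_dev_f1()        # micro-averaged F1 on the dev set
            if dev_f1 > best_f1:
                best_f1 = dev_f1
                save_checkpoint()             # persist the current best model
        return best_f1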
E. Conclusion

We implemented an effective method that utilizes the hidden layers of Bert with a hierarchical classifier to deal with the Vietnamese ABSA task. We experimented on two datasets at different review levels and significantly outperformed previous methods, achieving state-of-the-art results on both datasets. We find that, since PhoBert-large exposes 25 hidden states, using 8 layers for aggregation gives better performance than the original 4-layer setting. For future work, we plan to apply our model to different domains and languages and to test it with online customer reviews to explore its potential applications.

ACKNOWLEDGMENT

We would like to thank the VLSP 2018 organizers and the UIT NLP Group for providing us with the ABSA datasets.

REFERENCES

[1] D. Van Thin, N. L.-T. Nguyen, T. M. Truong, L. S. Le, and D. T. Vo, "Two new large corpora for Vietnamese aspect-based sentiment analysis at sentence level," ACM Trans. Asian Low-Resour. Lang. Inf. Process., vol. 20, no. 4, May 2021. [Online]. Available: https://doi.org/10.1145/3446678
[2] B. Phạm and S. McLeod, "Consonants, vowels and tones across Vietnamese dialects," International Journal of Speech-Language Pathology, vol. 18, no. 2, pp. 122–134, 2016, PMID: 27172848. [Online]. Available: https://doi.org/10.3109/17549507.2015.1101162
[3] M. Pontiki, D. Galanis, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, M. AL-Smadi, M. Al-Ayyoub, Y. Zhao, B. Qin, O. De Clercq, V. Hoste, M. Apidianaki, X. Tannier, N. Loukachevitch, E. Kotelnikov, N. Bel, S. M. Jiménez-Zafra, and G. Eryiğit, "SemEval-2016 task 5: Aspect based sentiment analysis," in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). San Diego, California: Association for Computational Linguistics, Jun. 2016, pp. 19–30.
[4] J. Wagner, P. Arora, S. Cortes, U. Barman, D. Bogdanova, J. Foster, and L. Tounsi, "DCU: Aspect-based polarity classification for SemEval task 4," in Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). Dublin, Ireland: Association for Computational Linguistics, Aug. 2014, pp. 223–229.
[5] S. Kiritchenko, X. Zhu, C. Cherry, and S. Mohammad, "NRC-Canada-2014: Detecting aspects and sentiment in customer reviews," in Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). Dublin, Ireland: Association for Computational Linguistics, Aug. 2014, pp. 437–442.
[6] T. H. Nguyen and K. Shirai, "PhraseRNN: Phrase recursive neural network for aspect-based sentiment analysis," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics, Sep. 2015, pp. 2509–2514.
[7] Y. Wang, M. Huang, X. Zhu, and L. Zhao, "Attention-based LSTM for aspect-level sentiment classification," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas: Association for Computational Linguistics, Nov. 2016, pp. 606–615.
[8] D. Tang, B. Qin, X. Feng, and T. Liu, "Effective LSTMs for target-dependent sentiment classification," 2016.
[9] Y. Ma, H. Peng, and E. Cambria, "Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, Apr. 2018. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/12048
[10] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, "Deep contextualized word representations," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). New Orleans, Louisiana: Association for Computational Linguistics, Jun. 2018, pp. 2227–2237.
[11] A. Radford and K. Narasimhan, "Improving language understanding by generative pre-training," 2018.
[12] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2019.
[13] C. Sun, L. Huang, and X. Qiu, "Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics, Jun. 2019, pp. 380–385.
[14] A. Karimi, L. Rossi, and A. Prati, "Improving BERT performance for aspect-based sentiment analysis," arXiv preprint arXiv:2010.11731, 2020.
[15] O. T. Tran and V. T. Bui, "A BERT-based hierarchical model for Vietnamese aspect based sentiment analysis," in 2020 12th International Conference on Knowledge and Systems Engineering (KSE), Can Tho, Viet Nam, 2020, pp. 269–274.
[16] D. V. Thin, L. S. Le, V. X. Hoang, and N. L.-T. Nguyen, "Investigating monolingual and multilingual BERT models for Vietnamese aspect category detection," 2021.
[17] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, "Unsupervised cross-lingual representation learning at scale," 2020.
[18] D. Q. Nguyen and A. T. Nguyen, "PhoBERT: Pre-trained language models for Vietnamese," CoRR, vol. abs/2003.00744, 2020. [Online]. Available: https://arxiv.org/abs/2003.00744
[19] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," CoRR, vol. abs/1907.11692, 2019. [Online]. Available: http://arxiv.org/abs/1907.11692
[20] G. Jawahar, B. Sagot, and D. Seddah, "What does BERT learn about the structure of language?" in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, Jul. 2019, pp. 3651–3657.
[21] X. Li, L. Bing, W. Zhang, and W. Lam, "Exploiting BERT for end-to-end aspect-based sentiment analysis," in Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), Hong Kong, China, Nov. 2019, pp. 34–41.