Accepted Papers

List of accepted papers

Authors: Manh Nguyen Duy, Hang Dao Viet, Long Dao Van, Hung Le Quang, Khanh Pham Cong, Oanh Nguyen Thi, Thuy Nguyen Thi and Sang Dinh Viet.

Abstract: Endoscopy is one of the most effective methods for disease diagnoses in the upper gastrointestinal (GI) tract. The paper proposes a unified encoder-decoder model that simultaneously deals with three tasks: anatomical site classification, lesion classification, and lesion segmentation. Furthermore, the model is capable of learning from a mixed training dataset derived from multiple data sources. We report results on our own extensive dataset of 8207 images acquired during routine upper GI endoscopic examinations. The experiments show that our model achieves great classification accuracy and competitive segmentation results compared to the single-task model with the same architecture.

Authors: Joe Hrzich, Gunjan Basra and Talal Halabi.

Abstract: Machine Learning (ML)-based intelligent services are gradually becoming the leading service design and delivery model in edge computing, where user and device data is outsourced to take part in large-scale big data analytics. This paradigm, however, entails challenging security and privacy concerns, which require rethinking the fundamental concepts behind performing ML. For instance, the encryption of sensitive data provides a straightforward solution that ensures data security and privacy. In particular, homomorphic encryption allows arbitrary computation on encrypted data and has gained a lot of attention recently. However, it has not been fully adopted by edge computing-based ML due to its potential impact on classification accuracy and model performance. This paper conducts an experimental evaluation of different types of homomorphic encryption techniques, namely Partial, Somewhat, and Fully Homomorphic encryption, over several ML models, which train on encrypted data and produce classification predictions based on encrypted input data. The results demonstrate two potential directions in the context of ML privacy at the network edge: privacy-preserving training and privacy-preserving classification. The performance of encryption-driven ML models is compared using different metrics such as accuracy and computation time for plaintext vs. encrypted text. This evaluation will guide future research in investigating which ML models perform better over encrypted data.
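
To illustrate what computing on encrypted data means in the partially homomorphic case, here is a minimal toy sketch of the Paillier cryptosystem, a well-known additively homomorphic scheme. It is not taken from the paper: the function names and the tiny default primes are assumptions chosen for readability, and the parameters are far too small to be secure.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation from two small primes (illustration only, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1  # common simple choice of generator
    # mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r coprime to n."""
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def scale_encrypted(pub, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k (mod n)."""
    n, _ = pub
    return pow(c, k, n * n)


pub, priv = keygen()
c1, c2 = encrypt(pub, 17), encrypt(pub, 25)
print(decrypt(pub, priv, add_encrypted(pub, c1, c2)))  # 42
```

Sums and scalar multiples are enough to evaluate a linear model on encrypted features, which is one reason partially homomorphic schemes are often a first candidate for encrypted classification; richer operations need somewhat or fully homomorphic schemes.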

Authors: Bang Giang Le and Viet Cuong Ta.

Abstract: In multi-task reinforcement learning, it is possible to improve the data efficiency of training agents by transferring knowledge from other different but related tasks. Because the experiences from different tasks are usually biased toward the specific task goals, traditional methods rely on Kullback-Leibler regularization to stabilize the transfer of knowledge from one task to the others. In this work, we explore the direction of replacing the Kullback-Leibler divergence with a novel Optimal transport-based regularization. By using the Sinkhorn mapping, we can approximate the Optimal transport distance between the state distributions of tasks. The distance is then used as an amortized reward to regularize the amount of shared information. We evaluate our framework on several grid-based multi-goal navigation tasks to validate the effectiveness of the approach. The results show that our added Optimal transport-based rewards speed up the learning process of agents and outperform several baselines on multi-task learning.
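
As a rough illustration of how a Sinkhorn iteration approximates an optimal transport distance, the sketch below computes the entropy-regularized transport cost between two small discrete distributions. It is a generic textbook version, not the authors' implementation; the function name, the regularization strength `eps`, and the iteration count are assumptions.

```python
import math


def sinkhorn_distance(a, b, cost, eps=0.1, n_iters=500):
    """Entropy-regularized OT between histograms a and b (each sums to 1).

    cost[i][j] is the cost of moving mass from bin i of a to bin j of b.
    Returns (approximate transport cost, transport plan).
    """
    n, m = len(a), len(b)
    # Gibbs kernel: small eps gives a sharper plan, closer to exact OT
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


# Two "state distributions" over two grid cells, unit cost to move between cells
dist, plan = sinkhorn_distance([0.2, 0.8], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
print(round(dist, 3))  # 0.3, since 0.3 of the mass has to move one cell
```

In a multi-task setting, a distance of this kind between the state distributions of two tasks could be turned into a bonus or penalty added to the reward; how exactly the paper amortizes it is not specified in the abstract.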

Authors: Binh Nguyen, Linh Tran Quang and Binh Duong Van.

Abstract: The fast growth of e-commerce markets helps companies bring their products closer to customers and gives users many choices for online shopping. However, it also means that every company needs a proper strategy to retain its customers. As an emerging solution, sentiment analysis of users’ feedback using artificial intelligence is a timely way for business owners to understand their customers and clients, which could help them improve their business against competitors. Therefore, in the scope of our research, we present our results on the task of customer review sentiment analysis using the dataset provided in the Fashion and Beauty Review Rating (a competition hosted on Kaggle), where our solution reached first place with a score of 0.51269 RMSE. Our proposed solution combines deep learning models (Bidirectional Long Short-term Memory, Bidirectional Gated Recurrent Unit, Convolutional Neural Network) and a rule-based method (a method that uses linguistic rules to predict the rating of reviews). We describe the solution in this paper and support it with analysis techniques to provide further insight.

Authors: Xuan Trinh Thị, Nhan Ta Van, Tung Hoang Do Thanh, Cuong Nguyen, Hai Truong Nam and Hung Tran Dang.

Abstract: In recent years, efforts to stratify disease risk based on polygenic risk scores (PRS) have made great strides. Quality control processes and machine learning models are increasingly being improved when applied to genome-wide association study (GWAS) data. However, most tools have to go through many steps, requiring the user to have a thorough understanding of PRS. Therefore, we have built a PRS calculation tool (called PRST) that is capable of self-optimizing the parameters and giving advice to the user to change the parameters in accordance with the data. This tool not only simplifies quality control operations but also provides calculation results according to previously studied models such as PRSice-2 and Penalized Logistic Regression.

Authors: Lam Thu Bui, Van Truong Vu, Van Pham Bich and Viet Anh Phan.

Abstract: This paper proposes a cooperative co-evolutionary approach, named COESDP, to the software defect prediction (SDP) problem. The proposed method consists of three main phases. The first one conducts data preprocessing, including data sampling and cleaning. The second phase utilizes a multi-population co-evolutionary approach (MPCA) to find optimal instance selection solutions. These first two phases help to deal with the imbalanced data challenge of the SDP problem. While the data sampling method aids in the creation of a more balanced data set, MPCA supports the elimination of unnecessary data samples (or instances) and the selection of crucial instances. The output of phase 2 is a set of different optimal solutions. Each solution is a way of selecting instances from which to create a classifier (or weak learner). Phase 3 utilizes an ensemble learning method to combine these weak learners and produce the final result. The proposed algorithm is compared with conventional machine learning algorithms, ensemble learning algorithms, computational intelligence algorithms and another multi-population algorithm on 6 standard SDP datasets. Experimental results show that the proposed method gives better and more stable results in comparison with other methods and that it can tackle the challenge of imbalance in the SDP data.

Authors: Hoang Giang Pham.

Abstract: In many real-world applications, cost factors play a significant role. Costs have been taken into consideration in numerous previous studies in machine learning, especially in building decision trees. This research also considers a cost-sensitive decision tree construction problem, with the assumption that test costs must be paid to obtain the values of the decision attribute and a record must be classified without exceeding the spending cost threshold. Moreover, our problem considers records with multiple condition attributes. We construct a cost-constrained decision tree using a Mixed-Integer formulation, which enables us to identify the optimal trees. The experimental results demonstrate that our formulation satisfactorily handles small data sets with multiple condition attributes under different cost constraints.

Authors: Anh Phan, Dung Le, Van-Anh Tran, Hung Pham, Truong Vu and Lam Bui.

Abstract: This paper introduces a solution to assist visually impaired or blind (VIB) people in independently accessing printed and electronic documents. The highlights of the solution are its cost-effectiveness and accuracy. Extracting text and reading it out to users are performed entirely by a smartphone application. To be usable by VIB people, advanced technologies in image and speech processing are leveraged to enhance the user experience and accuracy in converting images to text. To build accurate optical character recognition (OCR) models with low-quality images, we combine different solutions including 1) generating a large and balanced dataset with various backgrounds, 2) correcting the distortion and direction, and 3) applying the sequence-to-sequence model with transformers as the encoder. For ease of use, the text-to-speech (TTS) model generates voice instructions at every interaction, and the interface is designed and adjusted according to user feedback. A test on a scanned document set has shown the high accuracy of the OCR model, with 98.6% by characters, and the fluency of the TTS model. As indicated by a trial with VIB people, our application can help them read printed documents conveniently, and it is an affordable solution given the popularity of smartphones.

Authors: Khai Dinh Lai, Thai Hoang Le and Thuy Thanh Nguyen.

Abstract: In this research, we study, analyze, and choose a deep learning approach to accurately diagnose lung nodules using the LUNA16 dataset. We focus specifically on the ResNet101 network. The paper’s findings include: (1) demonstrating the efficiency of the ResNet101 network on LUNA16; (2) analyzing the benefits and drawbacks of Attention modules before selecting the best Attention module to integrate into the ResNet101 model for the challenge of classifying lung nodules in CT scans; and (3) comparing the efficacy of the proposed model to prior outcomes to demonstrate the model’s feasibility.

Authors: Trang T. H Tran, Nam Vo, Duong Nguyen, Tien Pham, Nam Nguyen, Quang Vu, Giang Vu and Mai Tran.

Abstract: Genome-wide association studies (GWAS) with millions of genetic markers have proven to be useful for precision medicine applications as a means of advanced calculation to provide a polygenic risk score (PRS). However, existing PRS models have limited transferability across ancestry groups, which restricts their interpretation and application, due to the historical bias of GWAS toward European ancestry. Here we propose an adapted workflow to fine-tune the baseline PRS model to the dataset of the target ancestry. We use the dataset of Vietnamese whole genomes from the 1KVG project and build a PRS model of height prediction for the Vietnamese population. Our best-fit model achieved an increase in R² of 0.152 (corresponding to 29.8%) compared to the null model, which only consists of the metadata.

Authors: Truong Thang, Yutaka Watanobe, Rage Uday Kiran and Incheon Paik.

Abstract: During the Covid-19 pandemic, most schools and companies had to adopt online learning, which is a special kind of e-learning that provides a virtual classroom via a live session for both teachers and learners. However, studies on education in the Covid-19 pandemic show that there should be more effort from researchers as well as governments to effectively support learners. In this paper, we focus on the problem of Quality of Experience (QoE) in online learning. We discuss the enabling technologies of online learning. We also present an extensive review of QoE in video streaming, the key enabling technology of online learning. Finally, the key challenges and potential solutions of QoE management for future online learning are discussed.

Authors: An Mai, Quang Phu Nguyen, Nhat Nguyen and Tuan Tran.

Abstract: In quantitative finance, estimation of the covariance matrix of all assets in a portfolio is extremely important. From this, we can approximately derive the weight of each asset in our investment and evaluate the performance of the target trading strategy via a backtesting system. In this paper, our contribution is twofold. First, taking into account the well-known nonlinear shrinkage estimation approach of Ledoit and Wolf for estimating the covariance matrix of stocks in a general framework, we adapt it to the Vietnam stock market and obtain impressive results compared with other popular methods. In our approach, we also add a transaction cost of 0.3% for the Vietnam market. For the second contribution, we provide a customized backtesting system and demonstrate its effectiveness experimentally on real stock data from HOSE. Our study is expected to be a good reference for investors following a quantitative approach, especially for the Vietnam stock market.

Authors: Van Vu-Thi, Dung Luong-The and Quan Hoang-Van.

Abstract: The popularity of online recommender systems has soared; they are deployed on numerous websites and gather tremendous amounts of user data that are necessary for recommendation purposes. However, this data may pose a severe threat to user privacy if accessed by untrusted parties or used inappropriately. The goal of a privacy-preserving recommender system is to hide user ratings from the system and yet still allow it to make recommendations. A recent example is the privacy-preserving recommender scheme proposed by Pranav Verma et al. Their scheme can ensure the privacy of user ratings against a malicious server, as shown by Mu, Shao, and Miglani. However, this scheme still requires quite high communication and computation costs. This paper proposes an improved protocol using secure multiparty computation. The theoretical and experimental analysis shows that the proposed method is efficient in both computation and communication compared to other methods. Moreover, the new protocol preserves the privacy of the honest users against the miner and up to n-2 corrupted users.
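
The general idea of computing an aggregate over private ratings without revealing any individual rating can be sketched with additive secret sharing, one of the simplest secure multiparty computation building blocks. This is a generic illustration and not the protocol proposed in the paper; the modulus, the function names, and the single-process simulation of all parties are assumptions.

```python
import secrets

# A public prime modulus; all arithmetic on shares is done modulo this value
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate n users jointly computing the sum of their private values.

    User i sends share j of its value to user j. Each user only ever sees
    one random-looking share from every other user, then publishes the sum
    of the shares it received. Only the grand total is revealed.
    """
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]  # row i: shares made by user i
    partial = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial) % MODULUS


ratings = [5, 3, 4, 1]  # one private rating per user for the same item
print(secure_sum(ratings))  # 13
```

In this simplified setting, a single user's value stays hidden as long as at least one other user's shares remain unknown to the colluding parties, which is the flavour of guarantee the abstract states for up to n-2 corrupted users.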

Authors: Tran Nguyen Huong, Le Huu Chung, Lam Nguyen Tung, Tran Hoang-Viet and Pham Ngoc Hung.

Abstract: Automated stub generation is an important problem when testing units which contain calls to other uncompleted functions, as testing and development phases are normally performed in parallel. This paper presents a fully automated method, named AS4UT, for generating stubs used in unit testing of C/C++ projects. The key idea of AS4UT is to consider each function call a mock variable. The idea is realized by adding a Pre-process CFG (control flow graph) phase to the concolic testing method. In this phase, all function calls in the CFG of a unit under test are replaced by their corresponding mock variables. Then, the updated CFG is used as an input for the concolic testing method to generate the required test data set. We have implemented AS4UT in a tool, named AutoStubTesing, and performed experiments with some common functions which call other units. Experimental results show that AS4UT can increase the code coverage of the generated test data set whilst reducing the number of test data and keeping the required time acceptable.

Authors: Thu Trang Hoa and Minh Anh Nguyen.

Abstract: The talent scheduling problem seeks to determine the shooting sequence that minimizes the total cost of the actors involved, which usually accounts for a significant portion of the cost of a real-world movie production. We present an extension of the talent scheduling problem that takes into account the costs of both filming locations and actors. To better model reality, we consider that the rental cost for a filming location can vary across the planning horizon. The objective is to find the shooting sequence, as well as the start day for each scene, that minimizes the total cost, including actor cost and location cost, while ensuring that all scenes are completed within the planning horizon. We first formulate the problem as a mixed-integer linear programming (MILP) model, from which small instances can be solved to optimality by MILP solvers. Next, we develop an iterated local search heuristic that can efficiently solve larger instances. We then provide a new benchmark data set for our new variant of the talent scheduling problem. The computational experiments based upon the new benchmark instances suggest that our heuristic can outperform the MILP model solved by a commercial solver in a reasonable amount of time.
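
To make the objective concrete, the following sketch evaluates the total cost of one candidate schedule. It rests on several simplifying assumptions that are not stated in the abstract: every scene takes exactly one day, an actor is paid for every day from their first to their last scene (the usual convention in the classic talent scheduling problem), and a location is paid for only on the days it is used, at a day-dependent rate. All names are hypothetical.

```python
def schedule_cost(schedule, scene_actors, scene_location, actor_rate, location_rate):
    """Total actor plus location cost of a schedule.

    schedule:       dict scene -> shooting day (0-based)
    scene_actors:   dict scene -> set of actors needed
    scene_location: dict scene -> location name
    actor_rate:     dict actor -> daily fee
    location_rate:  dict location -> list of rental fees indexed by day
    """
    first, last = {}, {}
    for scene, day in schedule.items():
        for actor in scene_actors[scene]:
            first[actor] = min(first.get(actor, day), day)
            last[actor] = max(last.get(actor, day), day)
    # Actors are held (and paid) from their first to their last shooting day
    actor_cost = sum(actor_rate[a] * (last[a] - first[a] + 1) for a in first)
    # Location rental depends on which day the scene is shot
    location_cost = sum(location_rate[scene_location[s]][d] for s, d in schedule.items())
    return actor_cost + location_cost


scene_actors = {"s1": {"A"}, "s2": {"B"}, "s3": {"A"}}
scene_location = {"s1": "park", "s2": "studio", "s3": "park"}
actor_rate = {"A": 10, "B": 5}
location_rate = {"park": [3, 3, 1], "studio": [2, 4, 4]}

print(schedule_cost({"s1": 0, "s2": 1, "s3": 2},
                    scene_actors, scene_location, actor_rate, location_rate))  # 43
print(schedule_cost({"s2": 0, "s1": 1, "s3": 2},
                    scene_actors, scene_location, actor_rate, location_rate))  # 31
```

The second schedule is cheaper because actor A's two scenes are adjacent and the studio is rented on its cheapest day; a local search heuristic would explore such reorderings and compare them with an evaluation function of this kind.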

Authors: Van Quan Nguyen, Viet Hung Nguyen, Tuan Hao Hoang and Nathan Shone.

Abstract: The role of semi-supervised network intrusion detection systems is becoming increasingly important in the ever-changing digital landscape. Despite the boom in commercial and research interest, there are still many concerns over accuracy yet to be addressed. Two of the major limitations contributing to this concern are reliably learning the underlying probability distribution of normal network data and the identification of the boundary between the normal and anomalous data regions in the latent space. Recent research has proposed many different ways to learn the latent representation of normal data in a semi-supervised manner, such as using Clustering-based Autoencoder (CAE) and hybridized approaches of Principal Component Analysis (PCA) and CAE. However, such approaches are still affected by these limitations, predominantly due to an overreliance on feature engineering, or the inability to handle the large data dimensionality. In this paper, we propose a novel Cluster Variational Autoencoder (CVAE) deep learning model to overcome the aforementioned limitations and increase the efficiency of network intrusion detection. This enables a more concise and dominant representation of the latent space to be learnt. The probability distribution learning capabilities of the VAE are fully exploited to learn the underlying probability distribution of the normal network data. This combination enables us to address the limitations discussed. The performance of the proposed model is evaluated using eight benchmark network intrusion datasets: NSL-KDD, UNSW-NB15, CICIDS2017 and five scenarios from CTU13 (CTU13-08, CTU13-09, CTU13-10, CTU13-12 and CTU13-13). The experimental results clearly demonstrate that the proposed method outperforms semi-supervised approaches from existing works.

Authors: Phuong Le-Hong, Thi Thuy Lien Nguyen, Minh Tu Pham and Thanh Hai Vu.

Abstract: This paper presents a multilingual natural language understanding model which is based on BERT and ELECTRA neural networks. The model is pre-trained and fine-tuned on large datasets of four languages including Indonesian, Malaysian, Japanese and Vietnamese. Our fine-tuning method uses an attentional recurrent neural network instead of the common fine-tuning with linear layers. The proposed model is evaluated on several standard benchmark datasets, including intent classification, named entity recognition and sentiment analysis. For Indonesian and Malaysian, our model achieves the same or higher results compared to the existing state-of-the-art IndoNLU and Bahasa ELECTRA models for these languages. For Japanese, our model achieves promising results on sentiment analysis and two-layer named entity recognition. For Vietnamese, our model improves the performance of two sequence labeling tasks including part-of-speech tagging and named entity recognition compared to the state-of-the-art results. The model has been deployed as a core component of the commercial FPT.AI conversational platform, effectively serving many clients in Indonesian, Malaysian, Japanese and Vietnamese markets — the platform has served 62 million API requests in the first five months of 2022 for chatbot services.

Authors: Tien Dung Huynh, Quoc Tuan Vu, Viet Dung Nguyen and Diep Thi Hoang.

Abstract: The approximation technique in MPBoot effectively addresses maximum parsimony phylogenetic bootstrapping, which is an essential task in bioinformatics with diverse applications in evolutionary biology. In this paper, we investigate integrating tree bisection and reconnection (TBR) into MPBoot’s collection of tree rearrangements in order to increase sampling performance in the search space, and we describe the MPBoot-TBR algorithm. Since the size of the TBR neighborhood is cubic in the number of taxa, we offer algorithmic strategies for swiftly evaluating a TBR move, searching quickly in the neighborhood of a specified remove-branch, and hill-climbing using TBR. Furthermore, the framework’s stopping condition is adjusted because TBR helps to converge to an acceptable MP score quicker than subtree pruning and regrafting. In terms of bootstrap accuracy, MPBoot-TBR is comparable to MPBoot. In terms of MP score and computation time on real datasets, MPBoot-TBR outperforms the original MPBoot. We have implemented the proposed methods in the MPBoot-TBR program, the source code of which is accessible at Tien Dung.

Authors: Nhon Do and Hien Nguyen.

Abstract: An Intelligent Problem Solver (IPS) is an intelligent system for solving practical problems in a determined domain by using human knowledge. Thus, the design of the knowledge base and the inference engine of IPS systems is important. This study proposes a general model for knowledge representation by using a kernel ontology combined with other knowledge components, called Integ-Ontology. Based on this model, a model of problems is presented. A reasoning method is also proposed. This method includes the inference processing and techniques using heuristic rules, sample problems and patterns to speed up problem solving. The Integ-Ontology and its reasoning method are applied to design practical IPS systems in solid geometry and Direct Current (DC) Electrical Circuits.

Authors: Duc Tran, Ha Nguyen, Hung Nguyen and Tin Nguyen.

Abstract: Advances in single-cell RNA sequencing (scRNA-seq) technologies have allowed us to study the heterogeneity of cell populations within each tissue. The cell compositions of a tissue from the same host may vary greatly, indicating the condition of the hosts from which the samples are collected. However, the high sequencing cost and the lack of fresh tissues make single-cell approaches less appealing. In many cases, it is practically impossible to generate single-cell data in a large number of subjects, making it challenging to monitor changes in cell type compositions in various diseases. In this article, we introduce a novel approach, named Deconvolution using Weighted Elastic Net (DWEN), that allows researchers to accurately estimate the cell type compositions from bulk data samples without the need to generate single-cell data. It would also allow for the re-analysis of bulk data collected from rare conditions to extract more in-depth cell-type level insights. The approach consists of two modules. The first module constructs the cell type signature matrix from single-cell data, either provided by users or from public repositories. The signature matrix serves as the input of the second module, which estimates the cell type compositions of input bulk samples. In an extensive analysis using 20 datasets generated from scRNA-seq data of different human tissues, we demonstrate that DWEN outperforms current state-of-the-art methods in estimating cell type compositions of bulk samples.

Authors: Minh Tran Binh, Long Nguyen and Dinh Nguyen Duc.

Abstract: Using improvement directions to control the evolution of multi-objective optimization algorithms is an interesting and effective method. Improvement direction techniques often evaluate the geometric properties of the solution set in the objective space and, based on that, adjust the evolutionary process to ensure it is capable of exploration and exploitation. The direction of improvement is usually determined based on the convergence and diversity of the solution population. In fact, the distribution of the solution population can suggest online adjustments of the evolutionary process to overcome the problem of keeping the balance between convergence and diversity. In this study, we identify empty regions in the solution population and use the centers of those areas, which we call bliss points, to direct and adjust the algorithm using the DMEA-II improvement direction to enhance the quality of the algorithm. Experimental results are competitive and show promise for application to multi-objective evolutionary algorithms using other geometric techniques.

Authors: Duong Nguyen Anh, Nam Cao Hai, Son Nguyen Van, Son Ta Cong and Cuong Dinh Manh.

Abstract: There has been an increasing demand for good semantic representations of text in the financial sector when solving natural language processing tasks in Fintech. Previous work has shown that widely used modern language models trained in the general domain often perform poorly in this particular domain. There have been attempts to overcome this limitation by introducing domain-specific language models learned from financial text. However, these approaches suffer from the lack of in-domain data, which is further exacerbated for languages other than English. These problems motivate us to develop a simple and efficient pipeline to extract large amounts of financial text from large-scale multilingual corpora such as OSCAR and C4. We conduct extensive experiments with various downstream tasks in three different languages to demonstrate the effectiveness of our approach across a wide range of standard benchmarks.

Authors: Yanran Zhou, Teeradaj Racharak and Minh Le Nguyen.

Abstract: Due to the variability of human conversation, accurately identifying and distinguishing individual emotions from social media text is challenging. To overcome this limitation, this paper investigates exploiting emojis as a new source of information for emotion recognition in conversation. Emojis have received much interest as a salient feature in social media NLP systems. However, they are less explored in the domain of conversations in social media. This paper examines state-of-the-art emotion recognition algorithms in deep learning and evaluates the impact of supplementing emojis as an additional feature for improving the algorithms. Emojis are transformed into corresponding vectors and combined with text embeddings. We propose two techniques for the combination, at the conversation and sentence levels. Our experiments show that emojis are effective for improving the accuracy of emotion recognition. We also perform a deeper analysis to find the optimal dimension of the emoji embedding in our recognition task.

Authors: Sukrit Jaidee, Konlakorn Wongpatikaseree, Narit Hnoohom, Sumeth Yuenyong and Panida Yomaboot.

Abstract: Currently, several well-known facial datasets have been proposed and used to train artificial intelligence models for facial expression interpretation. However, since each dataset varies in terms of ethnicity and facial expression characteristics, using the currently available datasets does not yield satisfactory results in predicting the emotional expressions of Thai people. As a result, the research team has developed a dataset on the facial expressions of Thai people, which may be used by academics to study and improve facial expression analysis research. The research team created two kinds of datasets: audio datasets and image datasets, each of which includes five classes: positive-active (happy), positive-deactivate (relaxed), neutral, negative-active (anger, stress), and negative-deactivate (sad). This dataset was created using data from 30 volunteers who were analyzed for their emotional expression by a group of psychologists from Mahidol University using standardized procedures. The research team calls this information MU-Corpus.

Authors: Thuy-Anh Nguyen Thi, Thi-Hanh Le, Thi-Thao Le, Thi-Hong Vuong, Xuan-Hieu Phan and Quang-Thuy Ha.

Abstract: Knowledge base completion (KBC) is the task of predicting and filling in missing information based on the existing information in a knowledge base. Recently, one of the most feasible approaches, introduced by V. Kocijan and T. Lukasiewicz (2021), is to transfer knowledge from one collection of information to another without the need for entity or relation matching, but this work has not scaled pre-training to larger models and datasets or investigated the impact of the encoder architecture. In this work, we propose a method that combines the benefits of BERT, fastText, Gated Recurrent Unit (GRU) and Fully Connected (FC) layers to improve the KBC task in the Kocijan and Lukasiewicz model. The experimental results show the effectiveness of our proposed model on several popular datasets such as ReVerb20K, ReVerb45K, FB15K237 and WN18RR.

Authors: Thu-Trang Nguyen and Dinh Hieu Vo.

Abstract: Coincidental correctness is the phenomenon in which test cases execute the faulty statements yet still produce correct/expected outputs. In software testing, this problem is prevalent and has a negative impact on fault localization performance. Although detecting coincidentally correct (CC) tests and mitigating their impacts on localizing faults in non-configurable systems have been studied in depth, handling CC tests in Software Product Line (SPL) systems has been unexplored. To test an SPL system, products are often sampled, and each product is tested individually. The CC test cases that occur in the test suite of a product not only affect the testing results of the corresponding product but also affect the overall testing results of the system. This could negatively affect fault localization performance and slow down the quality assurance process for the system. In this paper, we introduce DeMiC, a novel approach to detect CC tests and mitigate their impacts on localizing variability faults in SPL systems. Our key idea to detect CC tests is that two similar tests tend to examine similar behaviors of the system and should have a similar testing state (i.e., both passed or both failed). If only one of them failed, the other could have passed coincidentally. In addition, we propose several solutions to mitigate the negative impacts of CC tests on variability fault localization at different levels. Our experimental results on more than 2.6M test cases of five widely used SPL systems show that DeMiC can effectively detect CC tests, with 97% accuracy on average. In addition, DeMiC could help to improve the fault localization performance by 61%.
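
The detection idea (a passed test that looks very similar to a failed one is a candidate coincidentally correct test) can be sketched as follows. This is only an illustration of the stated intuition, not DeMiC itself: measuring similarity with the Jaccard index over covered statements, the threshold value, and all names are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of covered statement ids."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def suspect_cc_tests(coverage, outcome, threshold=0.8):
    """Return passed tests whose coverage closely resembles some failed test.

    coverage: dict test name -> set of executed statement ids
    outcome:  dict test name -> "pass" or "fail"
    """
    failed = [t for t, o in outcome.items() if o == "fail"]
    suspects = []
    for test, result in outcome.items():
        if result != "pass":
            continue
        # Similar behaviour but a different verdict: possibly coincidentally correct
        if any(jaccard(coverage[test], coverage[f]) >= threshold for f in failed):
            suspects.append(test)
    return suspects


coverage = {
    "t1": {1, 2, 3, 4, 5},   # failed
    "t2": {1, 2, 3, 4, 5},   # passed, same path as t1
    "t3": {7, 8},            # passed, unrelated path
}
outcome = {"t1": "fail", "t2": "pass", "t3": "pass"}
print(suspect_cc_tests(coverage, outcome))  # ['t2']
```

Flagged tests could then be down-weighted or excluded before computing suspiciousness scores, which is one plausible form of the mitigation step the abstract mentions.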

Authors: Thi Linh Hoang and Viet Cuong Ta. 

Abstract: Graph neural networks (GNNs) operate mainly on the message passing mechanism, in which a node receives information from related nodes to improve its internal representation. However, when the depth of the GNN increases, the message passing mechanism cuts off the high-frequency component of the nodes’ representation, which leads to the over-smoothing issue. In this paper, we propose the usage of cluster-based sampling to reduce the smoothing effect of a high number of layers in a GNN. Given that each node is assigned to a specific region of the embedding space, the cluster-based sampling is expected to propagate this information to the node’s neighbours, thus improving the nodes’ expressivity. Our approach is tested with several popular GNN architectures, and the experiments show that it can reduce the smoothing effect in comparison with the standard approaches, as measured by the Mean Average Distance metric.
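
For readers unfamiliar with the Mean Average Distance (MAD) metric, the sketch below shows one simple way to compute it: the cosine distance between every pair of node representations is averaged per node and then across nodes. This is a simplified global variant written for illustration; the paper may restrict the average to neighbouring or remote node pairs, and the function names are assumptions.

```python
import math


def cosine_distance(x, y):
    """1 - cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return 1.0 - dot / (norm_x * norm_y)


def mean_average_distance(embeddings):
    """Average over nodes of each node's mean cosine distance to all other nodes.

    embeddings: list of equal-length vectors, one per node (at least two nodes).
    Values near 0 indicate that node representations have become indistinguishable.
    """
    n = len(embeddings)
    per_node = []
    for i in range(n):
        dists = [cosine_distance(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        per_node.append(sum(dists) / len(dists))
    return sum(per_node) / n


distinct = [[1.0, 0.0], [0.0, 1.0]]
smoothed = [[1.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
print(mean_average_distance(distinct))            # 1.0
print(round(mean_average_distance(smoothed), 6))  # 0.0
```

A deeper network whose MAD stays clearly above zero has preserved more variation between node representations, which is the sense in which the abstract reports a reduced smoothing effect.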

Authors: Minh-Hieu Do, Lam Nguyen Tung, Hoang-Viet Tran and Hung Pham Ngoc.

Abstract: Templates in C++ are a useful programming method for creating libraries or common classes for data structures or algorithms. This paper presents a fully automated method, named TEC, to generate test data for unit testing of templates in C++ projects. The main idea of TEC is to add a source code pre-processing phase to the concolic testing method. This phase parses the source code of a given template and its enclosing project to find lists of suitable data types for each template parameter. These lists of data types are combined and passed to templates to create concrete source code. Then, TEC uses concolic testing to generate test data for this source code. We have implemented the TEC method and performed experiments with promising results. In addition, we discuss the experimental results in the paper.

Authors: Huu-Tien Dang, Thi-Hai-Yen Vuong and Xuan-Hieu Phan.

Abstract: Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) system. However, building an effective text normalization solution for a real-world TTS system faces two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, and abbreviations, and (2) transforming NSWs such as URLs, email addresses, hashtags, and contact names into pronounceable syllables. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on the NSW type, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset of 5819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split hashtags, emails, URLs, and contact names. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, with the highest F1 of 95.00% reached by the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates: 8.15% with the CRF tagger, 7.11% with the BiLSTM-CNN-CRF tagger, and only 6.67% with the BERT-BiGRU-CRF tagger.
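
Forward lexicon-based maximum matching can be sketched generically as follows. This is a textbook version of the technique, not the authors' implementation; the toy lexicon, the single-character fallback for unknown text, and the function name are assumptions.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring found in `lexicon`;
    if nothing matches, emit a single character and move on (assumed fallback).
    """
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink the window.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical lexicon; a real system would use a Vietnamese syllable lexicon.
lexicon = {"viet", "nam", "vietnam", "new", "news"}
print(forward_max_match("vietnamnews", lexicon))  # ['vietnam', 'news']
```

Because the longest entry wins at each position, "vietnam" is preferred over "viet" followed by "nam", which is the behaviour that makes the method suitable for splitting hashtags and URLs with no spaces.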

Authors: Viet-Trung Tran, Van-Sang Tran, Xuan-Bang Nguyen and The-Trung Tran.

Abstract: Face anti-spoofing has become increasingly critical due to the widespread deployment of face recognition technology. Current approaches mostly focus on presentation attacks and rely on textual and spatio-temporal features in captured facial videos. However, in an environment where end-users manage their own devices, attackers can cheat by using virtual camera sensors and easily bypass sophisticated approaches designed for presentation attacks. In this paper, we propose a novel liveness detection protocol in which users are required to read a randomly generated sequence of words. Our proposed prediction model, LipBERT, a deep visual-linguistic alignment model, is trained to detect whether the captured facial stream conforms to the valid textual sequence. For the experiments, we introduce VNFaceTalking, an extensive dataset of 188,561 samples (around 130 hours in total). Each sample is a video of at most 3 seconds showing a frontal face speaking Vietnamese. Experiments on the VNFaceTalking dataset demonstrate promising results.

Authors: Masaya Taniguchi and Satoshi Tojo.

Abstract: A treebank corpus is a collection of trees that represent the constituency and dependency relations of sentences. We are motivated to extract grammar rules from the treebank, that is, to decompose the tree data structure and find grammar rules. After the extraction, we need to validate the adequacy of the grammar, so we inspect the generative power of the obtained grammar. In this phase, head information is a significant feature for retrieving syntactic features and semantic/predicate structure; however, the head information is missing from the obtained grammar. Hence, we propose to supplement the lost head information with the type-raising rule of categorial grammar (CG). We extend the same issue to combinatory categorial grammar (CCG) and solve it using generalized type-raising. Furthermore, we verify our grammar by a formal proof written in the proof assistant Isabelle/HOL.

Authors: Quoc-Dai Luong Tran, Anh-Cuong Le and Van-Nam Huynh.

Abstract: Conversational agents are becoming more popular and are applied in a wide range of practical application areas. The main task of these agents is not only to generate context-appropriate responses to a given query but also to make the conversation human-like. Thanks to the ability of deep learning based models in natural language modeling, recent studies have made progress in designing conversational agents that can provide more semantically accurate responses. However, the naturalness of such conversations has not been given adequate attention in these studies. This paper aims to incorporate both accuracy and naturalness of conversation as criteria in developing a new model for conversational agents. To this end, inspired by the Turing test and by the adversarial learning strategy, we propose a model based on generative deep neural networks that generates accurate responses optimized by imitating human-generated conversations. Experimental results demonstrate that the proposed models produce more natural and accurate responses, yielding significant gains in BLEU scores.

Authors: Yuhui Yang, Koichi Ota, Wen Gu and Shinobu Hasegawa.

Abstract: This research proposes an automatic region of interest (ROI) prediction architecture based on a deep neural network that estimates learners’ ROIs from the instructor’s behaviors in lecture archives, in order to generate ROI-zoomed videos that fit smaller screens such as those of smart devices. To achieve this goal, we first created a dataset of ROIs from learners’ gaze data recorded while watching the archives, and generated 16,039 ROI labels after clustering and smoothing with the K-means algorithm, based on the gaze point data obtained for one-second video segments. Next, we extracted the instructor’s behaviors as feature maps from the segmented videos, considering Frame Difference, Optical Flow, OpenPose, and temporal information. We then composed an encoder-decoder architecture that combines U-Net and ResNet, with these behavioral features as input, to build a deep neural network model for predicting ROIs. In the experiment, the agreement between the ROI labels and the predicted regions was evaluated by Dice loss for each feature map, and the loss improved from 0.9 with a single image as the baseline to 0.4 with OpenPose and temporal features. The results indicate positive potential for automatic content generation for smart devices through ROI prediction from the instructor’s behaviors.
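
For readers unfamiliar with the evaluation measure, Dice loss is commonly defined as one minus the Dice coefficient between the predicted and reference regions, so lower values mean better agreement. The sketch below uses this standard definition on small masks given as nested lists; the smoothing term `eps` and the function name are assumptions, and the paper's exact formulation may differ.

```python
def dice_loss(pred, target, eps=1e-7):
    """Return 1 - Dice coefficient for two same-shaped 2D masks (nested lists of 0/1 or probabilities)."""
    p = [value for row in pred for value in row]
    t = [value for row in target for value in row]
    intersection = sum(a * b for a, b in zip(p, t))
    dice = (2.0 * intersection + eps) / (sum(p) + sum(t) + eps)
    return 1.0 - dice


label = [[1, 0], [0, 0]]
prediction = [[1, 1], [0, 0]]
# One overlapping cell, three cells in total across both masks: Dice = 2/3, loss is about 0.333.
print(round(dice_loss(prediction, label), 3))
```

A perfect prediction gives a loss near 0 and a fully disjoint one gives a loss near 1, which is why a drop from 0.9 to 0.4 reads as an improvement.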

Authors: Nguyen Huu Khang, Dinh Cong Dat, Le Thi Thuy Hang and Dinh Dien.

Abstract: Cross-lingual Semantic Textual Similarity (STS) is a challenging problem in Natural Language Understanding tasks, especially for low-resource languages like Vietnamese. Currently, one of the state-of-the-art approaches for this problem is to use a distilled multilingual Sentence Transformer model. However, there are few studies on how these models work for English-Vietnamese language pairs. In this paper, we aim to inspect the performance of these models in English-Vietnamese STS tasks. Based on our findings, we propose possible future improvements for this approach.

Authors: Tuan Le Xuan, Hang Pham Thi and Hai Nguyen Do.

Abstract: Detecting and recognizing text in images is a problem that has received a lot of attention due to its high applicability in many fields such as digitization, storage, lookup, and authentication. However, most current research works and products focus on detecting and extracting text from images and pay little attention to analyzing and exploiting the semantics and nuances of the extracted text. In this study, we propose a system to detect, recognize, and classify Vietnamese text in images collected from network sources for the purpose of ensuring network information security and safety. The system receives images containing Vietnamese text as input and uses the CRAFT model to perform background processing and produce the areas of the image that contain text; these text containers are then rearranged in the same order as in the original image, and the text in the image is extracted according to the text containers. Next, we use the VietOCR model to convert these text images into text fragments. Finally, an associative machine learning model determines whether these texts are negative (or even reactionary). Preliminary results show that the proposed model reaches an accuracy of up to 88.0% in detecting and recognizing text and 94% in classifying text nuances on the collected data set.

Authors: Khoa Tan Truong and Thai Hoang Le.

Abstract: The detection and treatment of cancer and other disorders depend on the use of magnetic resonance imaging (MRI) and computed tomography (CT) scans. Compared to CT scans, MRI scans provide sharper pictures. An MRI is preferable to an X-ray or CT scan when the doctor needs to observe soft tissues. In addition, MRI scans of organs and soft tissues, such as damaged ligaments and herniated discs, can be more accurate than CT imaging. However, capturing an MRI typically takes longer than a CT scan. Furthermore, MRI is substantially more expensive than CT because it requires more sophisticated equipment. As a result, it is challenging to gather enough MRI scans to train medical image segmentation models. To address this issue, we suggest using a deep learning network (TarGAN) to reconstruct MRI from CT scans. The generated MRI images can then be used to enrich training data for MRI image segmentation problems down the road.

Authors: Vy Giang Thao, Chi T.K. Huynh and Van Hop Nguyen.

Abstract: The Kanban system is a tool of the Just-in-time approach that many automotive companies have adopted to improve their in-house operations. However, the traditional Kanban system has shown some disadvantages in meeting performance targets in terms of inventory, delivery, quality, and cost. In this paper, we propose a new approach that combines Blockchain with an Electronic Kanban (E-Kanban) system to improve the traditional Kanban system for the pull leveling pattern associated with the parallel information system. The proposed system is simulated to validate the feasible solutions for continuous improvement purposes. A real case of a leading automotive company in Vietnam is investigated to illustrate the proposed system.

Authors: Binh Nguyen, Tung Doan Nguyen Tran, Son Huynh, An Le Tran-Hoai, An Nguyen Trong, Khanh Tran, Nhi Ho, Trung Nguyen and Dang Huynh.

Abstract: Real estate is an enormous and essential field in many countries. Taking advantage of helpful information from real estate advertisement posts can help better understand the market condition and explore other vital insights, especially for the Vietnamese market. It is worth noting that in the representative information of real estate, the address or the location is required information. However, there are different ways to write down the address information in Vietnam. For this reason, detecting the relevant text representing the address information from real estate advertisement posts becomes an essential and challenging task.
This paper investigates the address detecting and parsing task for the Vietnamese language. First, we create a dataset of real estate advertisements having 16 different attributes (entities) of each real estate and assign the correct label for each entity detected during the data annotation process. Then, we propose a practical approach for detecting locations of possible addresses inside one specific real estate advertisement post and then extract the localized address text into four different levels of the address information: City/Province, District/Town, Ward, and Street. Finally, we compare our proposed method with other approaches and achieve the highest accuracy results for all levels as follows: City/Province (0.952), District/Town (0.9482), Ward (0.9225), Street (0.8994), and the combined accuracy of correctly detecting all four levels is 0.8367.

Authors: Duy-Dong Le, Mohamed Saleem Haja Nazmudeen, Anh-Khoa Tran, Minh-Son Dao, Viet-Tiep Mai and Nhat-Ha Su.

Abstract: Air quality index forecasting in big cities is an exciting study area in smart cities and health care in the Internet of Things. Many empirical, academic, and overview papers on machine learning in air quality analysis have been published in recent years. However, most of those studies focus on traditional centralized data processing on a server, and there are few survey studies on applying federated learning to this field. This overview aims to provide newcomers with a broader perspective to inform future research on this topic, especially for the multi-model approach. We have examined over 70 carefully selected papers in this scope and found that multi-model federated learning is the most effective technique for air quality index prediction and deserves further consideration and study.

Authors: Duc Long Vu, Van Su Pham, Minh Tuan Nguyen and Hai Chau Le.

Abstract: Sepsis is a life-threatening condition that is closely related to the response of the human body to an infection inside the tissues and organs. Such a reaction results in the distortion of organ function. In this work, a novel algorithm comprising a random forest model and a combination of 9 genes is proposed for the diagnosis of pediatric sepsis. The proposed algorithm is constructed carefully with a sequential gene selection procedure, which combines pathway enrichment analysis and gene importance computed by the machine learning model to identify the most informative differential gene expression. A cross-validation procedure in combination with different machine learning algorithms is adopted to estimate the diagnosis performance of the gene combinations and machine learning models. The selected gene combinations are then tested separately using various machine learning methods. The validation results, namely an accuracy of 91.79%, a sensitivity of 57.33%, and a specificity of 100%, show that the proposed algorithm has potential for practical application in real clinical environments.

Authors: Le Kim Thu and Le Sy Vinh.

Abstract: The evolutionary process of characters (e.g., nucleotides or amino acids) is heterogeneous among sites of alignments. Applying the same evolutionary model to all sites leads to unreliable results in evolutionary studies. Partitioning alignments into sub-alignments (groups) such that sites in each sub-alignment follow the same model of evolution is a proper and promising approach to adequately handle the heterogeneity among sites. A number of computational methods have been proposed to partition alignments; however, they are unable to properly handle invariant sites. The iterative k-means algorithm was widely used to partition large alignments; unfortunately, it was recently suspended because it always groups all invariant sites into one group, which might distort phylogenetic trees reconstructed from sub-alignments.
In this paper, we improve the iterative k-means algorithm for protein alignments by combining both amino acids and their secondary structures to properly partition invariant sites. The protein secondary structure information helps classify invariant sites into different groups, each of which includes both variant and invariant sites. Experiments on real large protein alignments showed that the new algorithm overcomes the pitfall of grouping all invariant sites into one group and consequently produces better partitioning schemes.

Authors: Tu Le-Xuan, Trung Tran-Quang, Thi Ngoc Hien Doan and Thanh-Hai Tran.

Abstract: 3D hand pose estimation from RGB images suffers from the difficulty of obtaining the depth information. Therefore, a great deal of attention has been spent on estimating 3D hand pose from 2D hand joints. In this paper, we leverage the advantage of spatial-temporal Graph Convolutional Neural Networks and propose LG-Hand, a powerful method for 3D hand pose estimation. Our method incorporates both spatial and temporal dependencies into a single process. We argue that kinematic information plays an important role, contributing to the performance of 3D hand pose estimation. We thereby introduce two new objective functions, Angle and Direction loss, to take the hand structure into account. While Angle loss covers locally kinematic information, Direction loss handles globally kinematic one. Our LG-Hand achieves promising results on the First-Person Hand Action Benchmark (FPHAB) dataset. We also perform an ablation study to show the efficacy of the two proposed objective functions.

Authors: Joonchoul Shin, Wansu Kim, Jusang Lee, Jieun Park and Cheol Young Ock.

Abstract: In machine learning, the frequency of a feature in the training data can be used as the value of that feature; in this case, sparse features are likely to cause overfitting in the weight optimization process. This is called the sparse data problem, and this paper proposes a method that reduces the probability of a weight update the sparser the feature is. We experimented with this method on four Natural Language Processing tasks, and the results showed that it had positive effects on all tasks. On average, this method reduced errors by 8 per 100. It also reduced the number of weight updates; as a result, the learning time was reduced to 81% in the Named Entity Recognition task.

Authors: Ngan Ha Duong and Thuy Anh Ta.

Abstract: In this paper, we study a facility cost optimization problem in a competitive market. Our objective is to distribute an available budget to some newly opened facilities to maximize an expected captured customer demand, assuming that customers will select a facility to visit according to a random utility maximization model. In this work, given the fact that the objective function of this problem is highly non-convex and challenging to solve exactly, we propose a technique to approximate the objective function by piece-wise linear functions, making it possible to reformulate the problem as a mixed-integer linear or conic program, which can further be solved by a commercial solver such as CPLEX. We also explore an outer-approximation algorithm to solve the approximate problem. Computational results are provided to demonstrate the performances of our approaches.

Authors: Viet Tran and Hung Pham.

Abstract: Concolic testing is well known among software quality assurance methods thanks to its fully automated capability of generating test data, executing them, and producing code coverage reports. This paper presents ISDART, an improved version of SDART, one of the most recent advanced methods based on concolic testing, to increase its performance. The key idea of the proposed method is to eliminate the time wasted on generating and executing random test data that do not increase code coverage. Initially, ISDART generates random test data only once. Then, with the code coverage information retrieved from the randomly generated test data, ISDART explores an uncovered test path, transforms it into test path constraints, solves those constraints, and generates new test data from the resulting solution. The process is repeated until no uncovered test path can be found. We have implemented both SDART and ISDART and performed experiments with some common unit functions. The experimental results show that ISDART outperforms SDART in terms of speed for the whole testing process whilst reducing the number of generated test data.

Authors: Nhung Cao, Radek Valasek and Stanislav Ozana.

Abstract: Several fuzzy concepts are involved in relational databases such as the degree of fulfilment of a graded property, the level of importance (or of possibility) of a component in a query, grouping features, or the concept of fuzzy quantifiers. We have recently approached the concepts of excluding features and unavoidable features to construct the extensions of fuzzy relational compositions. The extended compositions include the employment of fuzzy quantifiers as well. In this work, we approach the concept of importance levels of considered features in a particular sense that is intuitively suitable to the classification tasks. Then we propose a direction of incorporating this concept into the existing fuzzy relational compositions. We provide various useful properties related to the new models of the compositions. Furthermore, a simple example of the classification of animals in biology is addressed for the behaviour illustration of the proposed models. Finally, we examine the applicability of the new models to the practical application of the Dragonfly classification, which has been considered previously.

Authors: Suong N. Hoang, Binh Nguyen, Nam Nguyen, Son Luu, Trung Hieu Phan and Hien Nguyen.

Abstract: The explosion of free-text content on social media has brought the exponential propagation of hate speech. Hate speech is well defined in the community guidelines of many popular platforms such as Facebook, TikTok, and Twitter, where any communication that passes judgment on minority or protected groups is considered hateful content. This paper first points out the sophisticated word-play of malicious users in a Vietnamese Hate Speech (VHS) dataset. We propose using Center Loss in the training process to disambiguate the task-based sentence embedding and improve the generalization of the model. Moreover, a task-based lexical attention pooling is proposed to highlight lexicon-level information, which is then combined into the sentence embedding. The experimental results show that the proposed method improves the F1 score on the ViHSD dataset, while the training time and inference speed change insignificantly.

Authors: Thuan Le Duc, Huong Pham Van, Hiep Hoang Van and Khanh Nguyen Kim.

Abstract: This study proposes a new approach for feature selection in the Android malware detection problem based on popularity and contrast in a multi-target approach. The popularity of a feature is built on the frequency of that feature in the sample set. The contrast of features consists of two types: a contrast between malware and benign samples, and a contrast among malware classes. The greater the contrast of a feature between classes, the higher the ability to classify based on this feature. There is a trade-off between the popularity and contrast of features, i.e., as popularity increases, contrast may decrease and vice versa. Therefore, to evaluate the global value of each feature, we use a global evaluation function (global measurement) according to the Pareto multi-objective approach. To evaluate the feature selection method, the selected features are fed into a convolutional neural network (CNN) model, which is tested on a popular Android malware dataset, the AMD dataset. When we removed 1,000 features (500 permission features and 500 API features), accuracy decreased by 0.42% and recall increased by 0.08%.

Authors: Minh Tuan Nguyen, Chi T.K. Huynh and Van Hop Nguyen.

Abstract: The objective of this paper is to propose a hybrid machine learning approach using a so-called Cost-Complexity Pruning Decision Trees algorithm in predicting supply chain risks, particularly, delayed deliveries. The Recursive Feature Elimination with Cross-Validation solution is designed to improve the feature selection function of the Decision Trees classifier. Then, the Two-Phase Cost-Complexity Pruning technique is developed to reduce the overfitting of the tree-based algorithms. A case study of an e-commerce enabler in Vietnam is investigated to illustrate the efficiency of the proposed models. The obtained results show promise in terms of predictive performance.

Authors: Tien Thanh Dam and Thuy Anh Ta.

Abstract: This work concerns a stochastic fractional 0-1 program whose coefficients are assumed to be random and follow a given distribution. To solve such a problem, one would need to sample over the randomness of the coefficients. However, in many situations, the sample size would be limited, which makes it difficult for existing approaches (e.g., the sample average approximation approach) to give good solutions. To deal with this issue, we explore a distributionally robust optimization (DRO) version of the fractional problem. We show that the DRO version can be reformulated as an equivalent variance regularization version and further transformed into a mixed-integer second-order cone program (MISOCP), which an off-the-shelf solver (e.g., CPLEX) can handle. We then report computational experiments comparing our robust method against the conventional sample average approximation (SAA) on synthetic instances. Our results show that our approach is more effective than the SAA approach in protecting the decision-maker against bad scenarios.

Authors: Hang Nguyen, Nam Tran and Bac Le.

Abstract: According to the World Health Organization (WHO), tuberculosis (TB) is the top deadly disease worldwide, especially in developing and underdeveloped countries, due to poverty and limited health resources. Given its severe effects on patient health and its rapid spread, early screening for TB is a highly urgent task. Among the methods of diagnosing tuberculosis, chest X-ray images are often used as resources for clinical diagnosis because of their convenience and low cost. Currently, research on computer-aided diagnostics (CAD) systems uses machine learning to provide doctors with diagnostic, analytical, and disease-monitoring techniques. Recently, graph neural networks (GNNs) have emerged as a research trend; works using GNNs report very high accuracy in many fields. In this paper, we present a study on a solution to automatically diagnose tuberculosis on chest X-ray (CXR) images using the graph neural network method. We classify the CXR dataset into two classes (TB and non-TB). We achieve encouraging results with the proposed model: accuracy of 99.33%, recall of 99.33%, precision of 99.33%, F1-score of 99.33%, and AUC of 99.97%.

Authors: Binh Nguyen, Bao Le Hoang, Hoang Nguyen Minh and Phuong Nhi Nguyen Kieu.

Abstract: Intelligent systems, especially smartphones, have become crucial parts of the world. These devices can handle various human tasks, from long-distance communication to healthcare assistance. Customer feedback on a smartphone plays an integral role in the development process behind this success. This paper presents an improved approach for the Vietnamese Smartphone Feedback Dataset (UIT-ViSFD), which was collected and annotated carefully in 2021 (11,122 comments and their labels), by employing the pretrained PhoBERT model with a proper pre-processing method. In the experiments, we compare the approach with other transformer-based models such as XLM-R, DistilBERT, RoBERTa, and BERT. The experimental results show that the proposed method outperforms the state-of-the-art methods on the UIT-ViSFD corpus. Our model achieves better macro-F1 scores for the Aspect and Sentiment Detection tasks, at 78.76% and 86.03%, respectively. In addition, our approach could improve the results on Aspect-Based Sentiment Analysis datasets in the Vietnamese language.

Authors: Thanh Trong Vu and Hieu Dinh Vo.

Abstract: In order to ensure the quality of software and prevent attacks from hackers on critical systems, static analysis tools are frequently utilized to detect vulnerabilities in the early development phase. However, these tools often report a large number of warnings with a high false-positive rate, which causes many difficulties for developers. In this paper, we introduce VulRG, a novel approach to address this problem. Specifically, VulRG predicts and ranks the warnings based on their likelihood of being true positives. To predict that likelihood, VulRG combines two deep learning models, CNN and BiGRU, to capture the context of each warning in terms of program syntax, control flow, and program dependence. Our experimental results on a real-world dataset of 6,620 warnings show that VulRG’s Recall at Top-50% is 90.9%. This means that using VulRG, about 90% of the vulnerabilities can be found by examining only 50% of the warnings. Moreover, at Top-5%, VulRG can improve on the state-of-the-art approach by 30% in both Precision and Recall.
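
The "Recall at Top-k%" measure reported above can be computed as in the following sketch, which assumes each warning has a predicted score and a ground-truth label (1 for a true positive, 0 for a false alarm). The rounding rule (ceiling) and the function name are assumptions, not details from the paper.

```python
import math


def recall_at_top_percent(scores, labels, percent):
    """Fraction of all true positives that appear in the top `percent`% of warnings ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = math.ceil(len(ranked) * percent / 100)  # assumed rounding rule
    total_true = sum(labels)
    if total_true == 0:
        return 0.0
    found = sum(label for _, label in ranked[:k])
    return found / total_true


# Four warnings, two of them real; only one real warning is ranked in the top half.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(recall_at_top_percent(scores, labels, 50))  # 0.5
```

Under this definition, a Recall at Top-50% of 90.9% means a developer who reviews only the higher-ranked half of the warnings would still see roughly nine out of ten real vulnerabilities.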

Authors: Nam Kieu Dang, Oanh Nguyen Thi, Thuy Nguyen Thi, Hang Dao Viet, Long Dao Van, Trung Tran Quang and Sang Dinh Viet.

Abstract: The goal of Unsupervised Domain Adaptation (UDA) is to transfer the knowledge of a model learned from a source domain with available labels to a target domain without access to labels. However, the performance of UDA can suffer greatly from the domain shift issue, which is caused by the misalignment of the data distributions of the two data sources. Endoscopy can be performed under different light modes, including white-light imaging (WLI), blue-laser imaging (BLI), linked color imaging (LCI), and flexible spectral imaging color enhancement (FICE). However, most current polyp datasets are collected in the WLI mode since it is the most popular one in endoscopy. Therefore, AI models trained on such WLI datasets can degrade strongly when applied to other light modes. To address this issue, this paper proposes a coarse-to-fine UDA method that first coarsely aligns the two data distributions at the input level using the Fourier transform in chromatic space, and then finely aligns them at the feature level using fine-grained adversarial training. The backbone of our model is based on a powerful transformer architecture. Experimental results show that our proposed method effectively addresses the domain shift issue and achieves a substantial performance improvement on cross-mode polyp segmentation for endoscopy.

Authors: Toan Pham Van, Sang Dinh Viet, Linh Doan Bao, Duc Tran Trung and Quan Nguyen Van.

Abstract: Semantic segmentation is an essential task in developing medical image diagnosis systems. However, building an annotated medical dataset is expensive. Thus, semi-supervised methods are significant in this circumstance. In semi-supervised learning, the quality of labels plays a crucial role in model performance. In this work, we present a new pseudo labeling strategy that enhances the quality of the pseudo labels used for training student networks. We follow the multi-stage semi-supervised training approach, which trains a teacher model on a labeled dataset and then uses the trained teacher to render pseudo labels for student training. The key difference between previous methods and ours is that we update the teacher model during the student training process, so the pseudo labels are updated and become more precise as training progresses. We also propose a simple but effective strategy to enhance the quality of pseudo labels using a momentum model, a slowly updated copy of the original model during training. By applying the momentum model combined with re-rendering pseudo labels during student training, we achieved an average Dice score of 84.1% on five datasets (i.e., Kvasir, CVC-ClinicDB, ETIS-LaribPolypDB, CVC-ColonDB, and CVC-300) with only 20% of the dataset used as labeled data. Our results surpass common practice by 3% and even approach fully-supervised results on some datasets. Our source code and pre-trained models are available at
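
A momentum model of the kind mentioned in this abstract is commonly maintained as an exponential moving average (EMA) of the training model's weights. The sketch below shows that generic update on plain lists of floats; the `decay` value and the function name are assumptions, and the authors' released code may implement it differently.

```python
def ema_update(momentum_weights, model_weights, decay=0.99):
    """Move each momentum weight a small step toward the current model weight."""
    return [
        decay * m + (1.0 - decay) * w
        for m, w in zip(momentum_weights, model_weights)
    ]


# The momentum copy trails the model: with decay 0.9 it moves 10% of the gap per step.
momentum = [0.0]
model = [1.0]
for step in range(2):
    momentum = ema_update(momentum, model, decay=0.9)
print(momentum)
```

A decay close to 1 makes the momentum copy change slowly, which tends to smooth out noisy updates before the copy is used to render pseudo labels.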

Authors: Binh Dang and Le-Minh Nguyen.

Abstract: Topic information has been useful for directing semantics in text summarization. In this paper, we present a study of a novel and efficient method, called tBART, for incorporating topic information into the BART model for abstractive summarization. The proposed model inherits the advantages of BART, learns latent topics, and transfers the topic vector of tokens to the context space by an align function. The experimental results illustrate the effectiveness of our proposed method, which significantly outperforms previous methods on two benchmark datasets: XSum and CNN/Daily Mail.

Authors: Xuan Hanh Vu, Xuan Dau Hoang and Thi Hong Hai Chu.

Abstract: Recently, domain generation algorithms (DGAs) have become a popular technique used by many malware families in general and botnets in particular. A DGA allows hacking groups to automatically generate and register domain names for the C&C servers of their botnets, so that they avoid the blacklisting and takedown they would face if they used static domain names and IP addresses. Many types of sophisticated DGA techniques have been developed and used in practice, including character-based DGA, word-based DGA and mixed DGA. These techniques can generate anything from simple domain names made of random character combinations to complex domain names made of combinations of meaningful words, which closely resemble legitimate domain names. This makes it difficult for solutions to monitor and detect botnets in general and DGA botnets in particular. Some solutions are able to efficiently detect character-based DGA domain names, but cannot detect word-based DGA and mixed DGA domain names. In contrast, some recent proposals can effectively detect word-based DGA domain names, but cannot effectively detect domain names of some character-based DGA botnets. This paper proposes a model based on ensemble learning that enables efficient detection of most DGA domain names, including character-based DGA and word-based DGA. The proposed model combines two component models: a character-based DGA botnet detection model and a word-based DGA botnet detection model. The experimental results show that the proposed combined model is able to effectively detect 37/39 DGA botnet families with an average detection rate of over 89%.
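
To make the idea of combining a character-based and a word-based detector concrete, the sketch below flags a domain when either component is confident enough (a simple max rule). This is only one possible combination scheme and is an assumption; the abstract does not say how the two models are combined. The names `ensemble_is_dga`, `toy_char_model`, and `toy_word_model` are hypothetical, and the two toy scorers are crude placeholders for trained models.

```python
from typing import Callable

# A detector maps a domain name to a score in [0, 1]; higher means more DGA-like.
Detector = Callable[[str], float]


def ensemble_is_dga(domain: str, char_model: Detector, word_model: Detector,
                    threshold: float = 0.8) -> bool:
    """Flag the domain if either component detector reaches the threshold."""
    return max(char_model(domain), word_model(domain)) >= threshold


def toy_char_model(domain: str) -> float:
    """Placeholder: names with few vowels look like random character strings."""
    name = domain.split(".")[0]
    vowels = sum(ch in "aeiou" for ch in name)
    return 1.0 - vowels / max(len(name), 1)


def toy_word_model(domain: str) -> float:
    """Placeholder: several hyphen-joined words stand in for word-based DGA."""
    name = domain.split(".")[0]
    return 1.0 if name.count("-") >= 2 else 0.0


random_like = ensemble_is_dga("xkqzvbnt.com", toy_char_model, toy_word_model)
word_like = ensemble_is_dga("green-table-house.net", toy_char_model, toy_word_model)
benign = ensemble_is_dga("example.com", toy_char_model, toy_word_model)
```

In this toy setup the first two domains are flagged (one by each component) and the third is not, which mirrors the motivation in the abstract: each detector covers a family of names the other misses.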

Authors: Phuc-Thinh Nguyen, Mohamed Saleem Haja Nazmudeen and Minh-Son Dao.

Abstract: Regular exercise and scientific eating can support weight control and benefit everyone’s health, especially athletes. In recent years, although much research has been conducted in this field, only small groups of people were studied, and few models have revealed links between weight, speed, and other attributes (e.g., activities, wellbeing, habits) that could yield tips to help people control their weight and running speed. In this research, we propose an approach that uses pattern mining and correlation discovery techniques to discover the most useful attributes over time for forecasting the weight and speed of an athlete for a sports event. Furthermore, we propose Adaptive Learning Models, which can learn from personal and public data to forecast a person’s weight or speed in various groups, such as young adults, middle-aged adults, and female or male members. Based on the above analysis, different approaches to building prediction models of athletes’ weight or running speed are examined on the primary data. Our suggested approach yields encouraging results when tested on public and private data sets.

Authors: Anh Tu Tran, The Dung Luong, Xuan Sang Pham and Luong Tran Thi.

Abstract: The complexity of today’s web applications entails many security risks, mainly targeted attacks on zero-day vulnerabilities. New attacks often defeat the detection capabilities of IDS and web application firewalls (WAFs) based on traditional pattern matching rules. Therefore, there is an urgent need for new-generation WAF systems using machine learning and deep learning technologies. Deep learning models require a large amount of input data to be trained accurately, which makes collecting and labeling data very resource-intensive. In addition, web request data is often sensitive or private and is not intended to be disclosed. This makes it challenging to develop high-accuracy deep learning and machine learning models. This paper proposes a privacy-preserving distributed training process for a deep learning model for web attack detection. The proposed model allows the participants to share the training process to improve the accuracy of the deep model for web attack detection while preserving the privacy of the local data and local model parameters. The proposed model adds noise to the shared parameters to ensure differential privacy: the participants train the local detection model and share intermediate training parameters with some noise, which increases the privacy of the training process. The results evaluated on the CSIC 2010 benchmark dataset show that the detection accuracy is greater than 98%, which is close to that of a model without privacy guarantees and is much higher than the maximum accuracy of any local model trained without sharing.
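
The noise-adding step described in the abstract above can be illustrated with a standard recipe: clip each shared update to a maximum L2 norm, then add Gaussian noise. The sketch below is a minimal illustration under assumptions. The function name `privatize_update`, the clipping step, and the default values of `clip_norm` and `noise_std` are not taken from the paper, and calibrating the noise to a specific privacy budget is omitted.

```python
import math
import random
from typing import List, Optional


def privatize_update(update: List[float], clip_norm: float = 1.0,
                     noise_std: float = 0.1,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip an update to `clip_norm` in L2 norm, then add Gaussian noise."""
    rng = rng or random.Random()
    norm = math.sqrt(sum(x * x for x in update))
    # Scale down only if the update is longer than the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    # Independent zero-mean Gaussian noise on every coordinate.
    return [x + rng.gauss(0.0, noise_std) for x in clipped]


# A participant would share `noisy` instead of its raw local update.
local_update = [3.0, 4.0]
noisy = privatize_update(local_update, clip_norm=1.0, noise_std=0.1,
                         rng=random.Random(0))
```

Clipping bounds how much any single participant can influence the shared parameters, and the noise scale trades accuracy against privacy; both values would need to be chosen for the actual privacy target.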

Authors: Win Shwe Sin Khine, Prarinya Siritanawan and Kazunori Kotani.

Abstract: Interpreting facial expressions is an important task for human beings since they convey their inner feelings through facial expressions. Thus, facial expressions are significant visual signals for recognizing human emotions. They are used in human communication and by machines to build a good interaction system by analyzing human emotional behaviors. Therefore, facial expression recognition is an important study in Human-Computer Interaction. Whereas past work focused on traditional feature extraction methods for facial expression recognition, the current state of the art emphasizes deep learning based approaches. The drawback of deep learning based methods is that they require a massive amount of data. Therefore, in this study, we apply transfer learning to pre-trained deep learning models to recognize facial expressions and compare their results. The experiments were conducted on the Extended Cohn-Kanade Facial Expressions Dataset (CK+), and approximately 92% accuracy in facial expression recognition was obtained with the EfficientNet B0 model. In addition, the discriminative facial expression features of the model were reported.