Summary for 2021-01-22, created on 2021-12-24

Censorship of Online Encyclopedias: Implications for NLP Models arxiv:2101.09294 📈 43

Eddie Yang, Margaret E. Roberts

**Abstract:** While artificial intelligence provides the backbone for many tools people use around the world, recent work has brought to attention that the algorithms powering AI are not free of politics, stereotypes, and bias. While most work in this area has focused on the ways in which AI can exacerbate existing inequalities and discrimination, very little work has studied how governments actively shape training data. We describe how censorship has affected the development of Wikipedia corpuses, text data which are regularly used for pre-trained inputs into NLP algorithms. We show that word embeddings trained on Baidu Baike, an online Chinese encyclopedia, have very different associations between adjectives and a range of concepts about democracy, freedom, collective action, equality, and people and historical events in China than its regularly blocked but uncensored counterpart - Chinese language Wikipedia. We examine the implications of these discrepancies by studying their use in downstream AI applications. Our paper shows how government repression, censorship, and self-censorship may impact training data and the applications that draw from them.

Maximum Likelihood Training of Score-Based Diffusion Models arxiv:2101.09258 📈 28

Yang Song, Conor Durkan, Iain Murray, Stefano Ermon

**Abstract:** Score-based diffusion models synthesize samples by reversing a stochastic process that diffuses data to noise, and are trained by minimizing a weighted combination of score matching losses. The log-likelihood of score-based diffusion models can be tractably computed through a connection to continuous normalizing flows, but log-likelihood is not directly optimized by the weighted combination of score matching losses. We show that for a specific weighting scheme, the objective upper bounds the negative log-likelihood, thus enabling approximate maximum likelihood training of score-based diffusion models. We empirically observe that maximum likelihood training consistently improves the likelihood of score-based diffusion models across multiple datasets, stochastic processes, and model architectures. Our best models achieve negative log-likelihoods of 2.83 and 3.76 bits/dim on CIFAR-10 and ImageNet 32x32 without any data augmentation, on a par with state-of-the-art autoregressive models on these tasks.

Tighter expected generalization error bounds via Wasserstein distance arxiv:2101.09315 📈 20

Borja Rodríguez-Gálvez, Germán Bassi, Ragnar Thobaben, Mikael Skoglund

**Abstract:** In this work, we introduce several expected generalization error bounds based on the Wasserstein distance. More precisely, we present full-dataset, single-letter, and random-subset bounds on both the standard setting and the randomized-subsample setting from Steinke and Zakynthinou [2020]. Moreover, we show that, when the loss function is bounded, these bounds recover from below (and thus are tighter than) current bounds based on the relative entropy and, for the standard setting, generate new, non-vacuous bounds also based on the relative entropy. Then, we show how similar bounds featuring the backward channel can be derived with the proposed proof techniques. Finally, we show how various new bounds based on different information measures (e.g., the lautum information or several $f$-divergences) can be derived from the presented bounds.

Outlining Traceability: A Principle for Operationalizing Accountability in Computing Systems arxiv:2101.09385 📈 19

Joshua A. Kroll

**Abstract:** Accountability is widely understood as a goal for well governed computer systems, and is a sought-after value in many governance contexts. But how can it be achieved? Recent work on standards for governable artificial intelligence systems offers a related principle: traceability. Traceability requires establishing not only how a system worked but how it was created and for what purpose, in a way that explains why a system has particular dynamics or behaviors. It connects records of how the system was constructed and what the system did mechanically to the broader goals of governance, in a way that highlights human understanding of that mechanical operation and the decision processes underlying it. We examine the various ways in which the principle of traceability has been articulated in AI principles and other policy documents from around the world, distill from these a set of requirements on software systems driven by the principle, and systematize the technologies available to meet those requirements. From our map of requirements to supporting tools, techniques, and procedures, we identify gaps and needs separating what traceability requires from the toolbox available for practitioners. This map reframes existing discussions around accountability and transparency, using the principle of traceability to show how, when, and why transparency can be deployed to serve accountability goals and thereby improve the normative fidelity of systems and their development processes.

A Review on Deep Learning in UAV Remote Sensing arxiv:2101.10861 📈 9

Lucas Prado Osco, José Marcato Junior, Ana Paula Marques Ramos, Lúcio André de Castro Jorge, Sarah Narges Fatholahi, Jonathan de Andrade Silva, Edson Takashi Matsubara, Hemerson Pistori, Wesley Nunes Gonçalves, Jonathan Li

**Abstract:** Deep Neural Networks (DNNs) learn representation from data with an impressive capability, and brought important breakthroughs for processing images, time-series, natural language, audio, video, and many others. In the remote sensing field, surveys and literature revisions specifically involving DNNs algorithms' applications have been conducted in an attempt to summarize the amount of information produced in its subfields. Recently, Unmanned Aerial Vehicles (UAV) based applications have dominated aerial sensing research. However, a literature revision that combines both "deep learning" and "UAV remote sensing" thematics has not yet been conducted. The motivation for our work was to present a comprehensive review of the fundamentals of Deep Learning (DL) applied in UAV-based imagery. We focused mainly on describing classification and regression techniques used in recent applications with UAV-acquired data. For that, a total of 232 papers published in international scientific journal databases was examined. We gathered the published material and evaluated their characteristics regarding application, sensor, and technique used. We relate how DL presents promising results and has the potential for processing tasks associated with UAV-based image data. Lastly, we project future perspectives, commentating on prominent DL paths to be explored in the UAV remote sensing field. Our revision consists of a friendly-approach to introduce, commentate, and summarize the state-of-the-art in UAV-based image applications with DNNs algorithms in diverse subfields of remote sensing, grouping it in the environmental, urban, and agricultural contexts.

Will Artificial Intelligence supersede Earth System and Climate Models? arxiv:2101.09126 📈 9

Christopher Irrgang, Niklas Boers, Maike Sonnewald, Elizabeth A. Barnes, Christopher Kadow, Joanna Staneva, Jan Saynisch-Wagner

**Abstract:** We outline a perspective of an entirely new research branch in Earth and climate sciences, where deep neural networks and Earth system models are dismantled as individual methodological approaches and reassembled as learning, self-validating, and interpretable Earth system model-network hybrids. Following this path, we coin the term "Neural Earth System Modelling" (NESYM) and highlight the necessity of a transdisciplinary discussion platform, bringing together Earth and climate scientists, big data analysts, and AI experts. We examine the concurrent potential and pitfalls of Neural Earth System Modelling and discuss the open question whether artificial intelligence will not only infuse Earth system modelling, but ultimately render them obsolete.

Bayesian hierarchical stacking: Some models are (somewhere) useful arxiv:2101.08954 📈 9

Yuling Yao, Gregor Pirš, Aki Vehtari, Andrew Gelman

**Abstract:** Stacking is a widely used model averaging technique that asymptotically yields optimal predictions among linear averages. We show that stacking is most effective when model predictive performance is heterogeneous in inputs, and we can further improve the stacked mixture with a hierarchical model. We generalize stacking to Bayesian hierarchical stacking. The model weights are varying as a function of data, partially-pooled, and inferred using Bayesian inference. We further incorporate discrete and continuous inputs, other structured priors, and time series and longitudinal data. To verify the performance gain of the proposed method, we derive theory bounds, and demonstrate on several applied problems.

SGD-Net: Efficient Model-Based Deep Learning with Theoretical Guarantees arxiv:2101.09379 📈 8

Jiaming Liu, Yu Sun, Weijie Gan, Xiaojian Xu, Brendt Wohlberg, Ulugbek S. Kamilov

**Abstract:** Deep unfolding networks have recently gained popularity in the context of solving imaging inverse problems. However, the computational and memory complexity of data-consistency layers within traditional deep unfolding networks scales with the number of measurements, limiting their applicability to large-scale imaging inverse problems. We propose SGD-Net as a new methodology for improving the efficiency of deep unfolding through stochastic approximations of the data-consistency layers. Our theoretical analysis shows that SGD-Net can be trained to approximate batch deep unfolding networks to an arbitrary precision. Our numerical results on intensity diffraction tomography and sparse-view computed tomography show that SGD-Net can match the performance of the batch network at a fraction of training and testing complexity.

Streaming Models for Joint Speech Recognition and Translation arxiv:2101.09149 📈 8

Orion Weller, Matthias Sperber, Christian Gollan, Joris Kluivers

**Abstract:** Using end-to-end models for speech translation (ST) has increasingly been the focus of the ST community. These models condense the previously cascaded systems by directly converting sound waves into translated text. However, cascaded models have the advantage of including automatic speech recognition output, useful for a variety of practical ST systems that often display transcripts to the user alongside the translations. To bridge this gap, recent work has shown initial progress into the feasibility for end-to-end models to produce both of these outputs. However, all previous work has only looked at this problem from the consecutive perspective, leaving uncertainty on whether these approaches are effective in the more challenging streaming setting. We develop an end-to-end streaming ST model based on a re-translation approach and compare against standard cascading approaches. We also introduce a novel inference method for the joint case, interleaving both transcript and translation in generation and removing the need to use separate decoders. Our evaluation across a range of metrics capturing accuracy, latency, and consistency shows that our end-to-end models are statistically similar to cascading models, while having half the number of parameters. We also find that both systems provide strong translation quality at low latency, keeping 99% of consecutive quality at a lag of just under a second.

The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT arxiv:2101.09115 📈 8

Madhura Pande, Aakriti Budhraja, Preksha Nema, Pratyush Kumar, Mitesh M. Khapra

**Abstract:** Multi-headed attention heads are a mainstay in transformer-based models. Different methods have been proposed to classify the role of each attention head based on the relations between tokens which have high pair-wise attention. These roles include syntactic (tokens with some syntactic relation), local (nearby tokens), block (tokens in the same sentence) and delimiter (the special [CLS], [SEP] tokens). There are two main challenges with existing methods for classification: (a) there are no standard scores across studies or across functional roles, and (b) these scores are often average quantities measured across sentences without capturing statistical significance. In this work, we formalize a simple yet effective score that generalizes to all the roles of attention heads and employs hypothesis testing on this score for robust inference. This provides us the right lens to systematically analyze attention heads and confidently comment on many commonly posed questions on analyzing the BERT model. In particular, we comment on the co-location of multiple functional roles in the same attention head, the distribution of attention heads across layers, and effect of fine-tuning for specific NLP tasks on these functional roles.

Enhanced word embeddings using multi-semantic representation through lexical chains arxiv:2101.09023 📈 8

Terry Ruas, Charles Henrique Porto Ferreira, William Grosky, Fabrício Olivetti de França, Débora Maria Rossi Medeiros

**Abstract:** The relationship between words in a sentence often tells us more about the underlying semantic content of a document than its actual words, individually. In this work, we propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II. These algorithms combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings as building blocks forming a single system. In short, our approach has three main contributions: (i) a set of techniques that fully integrate word embeddings and lexical chains; (ii) a more robust semantic representation that considers the latent relation between words in a document; and (iii) lightweight word embeddings models that can be extended to any natural language task. We intend to assess the knowledge of pre-trained models to evaluate their robustness in the document classification task. The proposed techniques are tested against seven word embeddings algorithms using five different machine learning classifiers over six scenarios in the document classification task. Our results show the integration between lexical chains and word embeddings representations sustain state-of-the-art results, even against more complex systems.

Slot Self-Attentive Dialogue State Tracking arxiv:2101.09374 📈 6

Fanghua Ye, Jarana Manotumruksa, Qiang Zhang, Shenghui Li, Emine Yilmaz

**Abstract:** An indispensable component in task-oriented dialogue systems is the dialogue state tracker, which keeps track of users' intentions in the course of conversation. The typical approach towards this goal is to fill in multiple pre-defined slots that are essential to complete the task. Although various dialogue state tracking methods have been proposed in recent years, most of them predict the value of each slot separately and fail to consider the correlations among slots. In this paper, we propose a slot self-attention mechanism that can learn the slot correlations automatically. Specifically, a slot-token attention is first utilized to obtain slot-specific features from the dialogue context. Then a stacked slot self-attention is applied on these features to learn the correlations among slots. We conduct comprehensive experiments on two multi-domain task-oriented dialogue datasets, including MultiWOZ 2.0 and MultiWOZ 2.1. The experimental results demonstrate that our approach achieves state-of-the-art performance on both datasets, verifying the necessity and effectiveness of taking slot correlations into consideration.

Selfish Sparse RNN Training arxiv:2101.09048 📈 6

Shiwei Liu, Decebal Constantin Mocanu, Yulong Pei, Mykola Pechenizkiy

**Abstract:** Sparse neural networks have been widely applied to reduce the computational demands of training and deploying over-parameterized deep neural networks. For inference acceleration, methods that discover a sparse network from a pre-trained dense network (dense-to-sparse training) work effectively. Recently, dynamic sparse training (DST) has been proposed to train sparse neural networks without pre-training a dense model (sparse-to-sparse training), so that the training process can also be accelerated. However, previous sparse-to-sparse methods mainly focus on Multilayer Perceptron Networks (MLPs) and Convolutional Neural Networks (CNNs), failing to match the performance of dense-to-sparse methods in the Recurrent Neural Networks (RNNs) setting. In this paper, we propose an approach to train intrinsically sparse RNNs with a fixed parameter count in one single run, without compromising performance. During training, we allow RNN layers to have a non-uniform redistribution across cell gates for better regularization. Further, we propose SNT-ASGD, a novel variant of the averaged stochastic gradient optimizer, which significantly improves the performance of all sparse training methods for RNNs. Using these strategies, we achieve state-of-the-art sparse training results, better than the dense-to-sparse methods, with various types of RNNs on Penn TreeBank and Wikitext-2 datasets. Our codes are available at https://github.com/Shiweiliuiiiiiii/Selfish-RNN.

Social and behavioral determinants of health in the era of artificial intelligence with electronic health records: A scoping review arxiv:2102.04216 📈 5

Anusha Bompelli, Yanshan Wang, Ruyuan Wan, Esha Singh, Yuqi Zhou, Lin Xu, David Oniani, Bhavani Singh Agnikula Kshatriya, Joyce E. Balls-Berry, Rui Zhang

**Abstract:** Background: There is growing evidence that social and behavioral determinants of health (SBDH) play a substantial effect in a wide range of health outcomes. Electronic health records (EHRs) have been widely employed to conduct observational studies in the age of artificial intelligence (AI). However, there has been little research into how to make the most of SBDH information from EHRs. Methods: A systematic search was conducted in six databases to find relevant peer-reviewed publications that had recently been published. Relevance was determined by screening and evaluating the articles. Based on selected relevant studies, a methodological analysis of AI algorithms leveraging SBDH information in EHR data was provided. Results: Our synthesis was driven by an analysis of SBDH categories, the relationship between SBDH and healthcare-related statuses, and several NLP approaches for extracting SDOH from clinical literature. Discussion: The associations between SBDH and health outcomes are complicated and diverse; several pathways may be involved. Using Natural Language Processing (NLP) technology to support the extraction of SBDH and other clinical ideas simplifies the identification and extraction of essential concepts from clinical data, efficiently unlocks unstructured data, and aids in the resolution of unstructured data-related issues. Conclusion: Despite known associations between SBDH and disease, SBDH factors are rarely investigated as interventions to improve patient outcomes. Gaining knowledge about SBDH and how SBDH data can be collected from EHRs using NLP approaches and predictive models improves the chances of influencing health policy change for patient wellness, and ultimately promoting health and health equity. Keywords: Social and Behavioral Determinants of Health, Artificial Intelligence, Electronic Health Records, Natural Language Processing, Predictive Model

Machine Learning in LiDAR 3D point clouds arxiv:2101.09318 📈 5

F. Patricia Medina, Randy Paffenroth

**Abstract:** LiDAR point clouds contain measurements of complicated natural scenes and can be used for updating digital elevation models, glacial monitoring, detecting faults and measuring uplift, forest inventory, detecting shoreline and beach volume changes, landslide risk analysis, habitat mapping, and urban development, among others. A very important application is the classification of the 3D cloud into elementary classes. For example, it can be used to differentiate between vegetation, man-made structures, and water. Our goal is to present a preliminary comparison study for the classification of 3D point cloud LiDAR data that includes several types of feature engineering. In particular, we demonstrate that providing context by augmenting each point in the LiDAR point cloud with information about its neighboring points can improve the performance of downstream learning algorithms. We also experiment with several dimension reduction strategies, ranging from Principal Component Analysis (PCA) to neural network-based auto-encoders, and demonstrate how they affect classification performance in LiDAR point clouds. For instance, we observe that combining feature engineering with a dimension reduction method such as PCA improves classification accuracy compared with a straightforward classification on the raw data.

Understanding Visual Saliency in Mobile User Interfaces arxiv:2101.09176 📈 5

Luis A. Leiva, Yunfei Xue, Avya Bansal, Hamed R. Tavakoli, Tuğçe Köroğlu, Niraj R. Dayama, Antti Oulasvirta

**Abstract:** For graphical user interface (UI) design, it is important to understand what attracts visual attention. While previous work on saliency has focused on desktop and web-based UIs, mobile app UIs differ from these in several respects. We present findings from a controlled study with 30 participants and 193 mobile UIs. The results speak to a role of expectations in guiding where users look. A strong bias toward the top-left corner of the display, text, and images was evident, while bottom-up features such as color or size affected saliency less. Classic, parameter-free saliency models showed a weak fit with the data, and data-driven models improved significantly when trained specifically on this dataset (e.g., NSS rose from 0.66 to 0.84). We also release the first annotated dataset for investigating visual saliency in mobile UIs.
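
The NSS figure quoted above refers to Normalized Scanpath Saliency. As a rough illustration only, and not the authors' evaluation code, NSS is commonly computed by z-scoring a predicted saliency map and averaging the normalized values at the fixated locations. The sketch below assumes a flattened saliency map and a list of fixation indices; the function name `nss` and its inputs are hypothetical.

```python
import math


def nss(saliency, fixations):
    """Normalized Scanpath Saliency for a flattened saliency map.

    saliency:  sequence of predicted saliency values, one per pixel (assumed layout).
    fixations: indices into `saliency` where human fixations landed (assumed input).
    """
    if not fixations:
        raise ValueError("at least one fixation is required")
    n = len(saliency)
    mean = sum(saliency) / n
    # Population standard deviation of the saliency map.
    std = math.sqrt(sum((s - mean) ** 2 for s in saliency) / n)
    if std == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    # Average the z-scored saliency at the fixated pixels.
    return sum((saliency[i] - mean) / std for i in fixations) / len(fixations)


if __name__ == "__main__":
    toy_map = [0.0, 1.0, 2.0, 3.0]
    print(nss(toy_map, [3]))      # fixation on the most salient pixel: positive score
    print(nss(toy_map, [0, 3]))   # fixations on lowest and highest pixel: score of zero
```

Higher values mean the model assigns above-average saliency to the places people actually looked; a score near zero indicates chance-level agreement.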

Adversarial Laws of Large Numbers and Optimal Regret in Online Classification arxiv:2101.09054 📈 5

Noga Alon, Omri Ben-Eliezer, Yuval Dagan, Shay Moran, Moni Naor, Eylon Yogev

**Abstract:** Laws of large numbers guarantee that given a large enough sample from some population, the measure of any fixed sub-population is well-estimated by its frequency in the sample. We study laws of large numbers in sampling processes that can affect the environment they are acting upon and interact with it. Specifically, we consider the sequential sampling model proposed by Ben-Eliezer and Yogev (2020), and characterize the classes which admit a uniform law of large numbers in this model: these are exactly the classes that are \emph{online learnable}. Our characterization may be interpreted as an online analogue to the equivalence between learnability and uniform convergence in statistical (PAC) learning. The sample-complexity bounds we obtain are tight for many parameter regimes, and as an application, we determine the optimal regret bounds in online learning, stated in terms of \emph{Littlestone's dimension}, thus resolving the main open question from Ben-David, Pál, and Shalev-Shwartz (2009), which was also posed by Rakhlin, Sridharan, and Tewari (2015).

Image Restoration by Solving IVP arxiv:2101.08987 📈 5

Seobin Park, Tae Hyun Kim

**Abstract:** Recent research on image restoration has achieved great success with the aid of deep learning technologies, but many methods are limited in dealing with super-resolution (SR) in realistic settings. To alleviate this problem, we introduce a new formulation for image super-resolution that handles arbitrary-scale super-resolution. Based on the proposed SR formulation, we can not only super-resolve images at multiple scales, but also find a new way to analyze the performance of the super-resolving process. We demonstrate that the proposed method can generate high-quality images unlike conventional SR methods.

Learning Setup Policies: Reliable Transition Between Locomotion Behaviours arxiv:2101.09391 📈 4

Brendan Tidd, Nicolas Hudson, Akansel Cosgun, Jurgen Leitner

**Abstract:** Dynamic platforms that operate over many unique terrain conditions typically require multiple controllers. To transition safely between controllers, there must be an overlap of states between adjacent controllers. We develop a novel method for training Setup Policies that bridge the trajectories between pre-trained Deep Reinforcement Learning (DRL) policies. We demonstrate our method with a simulated biped traversing a difficult jump terrain, where a single policy fails to learn the task, and switching between pre-trained policies without Setup Policies also fails. We perform an ablation of key components of our system, and show that our method outperforms others that learn transition policies. We demonstrate our method with several difficult and diverse terrain types, and show that we can use Setup Policies as part of a modular control suite to successfully traverse a sequence of complex terrains. We show that using Setup Policies improves the success rate for traversing a single difficult jump terrain (from 1.5% success rate without Setup Policies to 82%), and a sequence of various terrains (from 6.5% without Setup Policies to 29.1%).

Automatic Cerebral Vessel Extraction in TOF-MRA Using Deep Learning arxiv:2101.09253 📈 4

V. de Vos, K. M. Timmins, I. C. van der Schaaf, Y. Ruigrok, B. K. Velthuis, H. J. Kuijf

**Abstract:** Deep learning approaches may help radiologists in the early diagnosis and timely treatment of cerebrovascular diseases. Accurate cerebral vessel segmentation of Time-of-Flight Magnetic Resonance Angiographs (TOF-MRAs) is an essential step in this process. This study investigates deep learning approaches for automatic, fast and accurate cerebrovascular segmentation for TOF-MRAs. The performance of several data augmentation and selection methods for training a 2D and 3D U-Net for vessel segmentation was investigated in five experiments: a) without augmentation, b) Gaussian blur, c) rotation and flipping, d) Gaussian blur, rotation and flipping and e) different input patch sizes. All experiments were performed by patch-training both a 2D and 3D U-Net and predicted on a test set of MRAs. Ground truth was manually defined using an interactive threshold and region growing method. The performance was evaluated using the Dice Similarity Coefficient (DSC), Modified Hausdorff Distance and Volumetric Similarity, between the predicted images and the interactively defined ground truth. The segmentation performance of all trained networks on the test set was found to be good, with DSC scores ranging from 0.72 to 0.83. Both the 2D and 3D U-Net had the best segmentation performance with Gaussian blur, rotation and flipping compared to other experiments without augmentation or only one of those augmentation techniques. Additionally, training on larger patches or slices gave optimal segmentation results. In conclusion, vessel segmentation can be optimally performed on TOF-MRAs using a trained 3D U-Net on larger patches, where data augmentation including Gaussian blur, rotation and flipping was performed on the training data.
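
For readers unfamiliar with the Dice Similarity Coefficient used above, a minimal sketch follows. It is not the authors' code: it assumes binary masks given as flat sequences of 0/1 values, and the function name `dice_coefficient` is hypothetical. The score is twice the overlap between prediction and ground truth, divided by the total number of foreground voxels in both.

```python
def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient between two flat binary masks.

    pred, truth: equal-length sequences of 0/1 (or False/True) values,
    e.g. a flattened predicted vessel mask and its ground truth.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    # Count voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention, not a rule).
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 0, 1, 1]
    print(dice_coefficient(predicted, reference))  # 2 * 2 / (3 + 3)
```

A DSC of 1 means the two masks coincide exactly and 0 means they do not overlap at all, so the 0.72 to 0.83 range reported above indicates substantial but imperfect overlap.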

Nigraha: Machine-learning based pipeline to identify and evaluate planet candidates from TESS arxiv:2101.09227 📈 4

Sriram Rao, Ashish Mahabal, Niyanth Rao, Cauligi Raghavendra

**Abstract:** The Transiting Exoplanet Survey Satellite (TESS) has now been operational for a little over two years, covering the Northern and the Southern hemispheres once. The TESS team processes the downlinked data using the Science Processing Operations Center pipeline and Quick Look pipeline to generate alerts for follow-up. Combined with other efforts from the community, over two thousand planet candidates have been found of which tens have been confirmed as planets. We present our pipeline, Nigraha, that is complementary to these approaches. Nigraha uses a combination of transit finding, supervised machine learning, and detailed vetting to identify with high confidence a few planet candidates that were missed by prior searches. In particular, we identify high signal to noise ratio (SNR) shallow transits that may represent more Earth-like planets. In the spirit of open data exploration we provide details of our pipeline, release our supervised machine learning model and code as open source, and make public the 38 candidates we have found in seven sectors. The model can easily be run on other sectors as is. As part of future work we outline ways to increase the yield by strengthening some of the steps where we have been conservative and discarded objects for lack of a datum or two.

It Takes Two to Tango: Combining Visual and Textual Information for Detecting Duplicate Video-Based Bug Reports arxiv:2101.09194 📈 4

Nathan Cooper, Carlos Bernal-Cárdenas, Oscar Chaparro, Kevin Moran, Denys Poshyvanyk

**Abstract:** When a bug manifests in a user-facing application, it is likely to be exposed through the graphical user interface (GUI). Given the importance of visual information to the process of identifying and understanding such bugs, users are increasingly making use of screenshots and screen-recordings as a means to report issues to developers. However, when such information is reported en masse, such as during crowd-sourced testing, managing these artifacts can be a time-consuming process. As the reporting of screen-recordings in particular becomes more popular, developers are likely to face challenges related to manually identifying videos that depict duplicate bugs. Due to their graphical nature, screen-recordings present challenges for automated analysis that preclude the use of current duplicate bug report detection techniques. To overcome these challenges and aid developers in this task, this paper presents Tango, a duplicate detection technique that operates purely on video-based bug reports by leveraging both visual and textual information. Tango combines tailored computer vision techniques, optical character recognition, and text retrieval. We evaluated multiple configurations of Tango in a comprehensive empirical evaluation on 4,860 duplicate detection tasks that involved a total of 180 screen-recordings from six Android apps. Additionally, we conducted a user study investigating the effort required for developers to manually detect duplicate video-based bug reports and compared this to the effort required to use Tango. The results reveal that Tango's optimal configuration is highly effective at detecting duplicate video-based bug reports, accurately ranking target duplicate videos in the top-2 returned results in 83% of the tasks. Additionally, our user study shows that, on average, Tango can reduce developer effort by over 60%, illustrating its practicality.

A multi-perspective combined recall and rank framework for Chinese procedure terminology normalization arxiv:2101.09101 📈 4

Ming Liang, Kui Xue, Tong Ruan

**Abstract:** Medical terminology normalization aims to map a clinical mention to terminologies from a knowledge base, which plays an important role in analyzing Electronic Health Records (EHR) and many downstream tasks. In this paper, we focus on Chinese procedure terminology normalization. The expressions of terminologies vary, and one medical mention may be linked to multiple terminologies. Previous studies explore methods such as multi-class classification or learning to rank (LTR) to sort the terminologies by literature and semantic information. However, this information is inadequate to find the right terminologies, particularly in multi-implication cases. In this work, we propose a combined recall and rank framework to solve the above problems. This framework is composed of a multi-task candidate generator (MTCG), a keywords attentive ranker (KAR) and a fusion block (FB). MTCG is utilized to predict the mention implication number and recall candidates with semantic similarity. KAR is based on BERT with a keywords attentive mechanism which focuses on keywords such as procedure sites and procedure types. FB merges the similarities from MTCG and KAR to sort the terminologies from different perspectives. Detailed experimental analysis shows our proposed framework achieves a remarkable improvement in both performance and efficiency.

B-DRRN: A Block Information Constrained Deep Recursive Residual Network for Video Compression Artifacts Reduction arxiv:2101.09021 📈 4

Trinh Man Hoang, Jinjia Zhou

**Abstract:** Although video compression ratios keep increasing, video coders such as H.264/AVC, H.265/HEVC, and H.266/VVC still suffer from compression artifacts. In this paper, we design a neural network, called B-DRRN (Deep Recursive Residual Network with Block information), that enhances the quality of compressed frames by leveraging block information. First, an extra network branch is designed to leverage the block information of the coding unit (CU). Moreover, to avoid a large increase in network size, a recursive residual structure and weight-sharing techniques are applied. We also construct a new large-scale dataset with 209,152 training samples. Experimental results show that the proposed B-DRRN can reduce the BD-rate by 6.16% compared to the HEVC standard. By efficiently adding an extra network branch, this work can improve the performance of the main network without increasing the memory needed for storage.

On managing vulnerabilities in AI/ML systems arxiv:2101.10865 📈 3

Jonathan M. Spring, April Galyardt, Allen D. Householder, Nathan VanHoudnos

**Abstract:** This paper explores how the current paradigm of vulnerability management might adapt to include machine learning systems through a thought experiment: what if flaws in machine learning (ML) were assigned Common Vulnerabilities and Exposures (CVE) identifiers (CVE-IDs)? We consider both ML algorithms and model objects. The hypothetical scenario is structured around exploring the changes to the six areas of vulnerability management: discovery, report intake, analysis, coordination, disclosure, and response. While algorithm flaws are well-known in the academic research community, there is no apparent clear line of communication between this research community and the operational communities that deploy and manage systems that use ML. The thought experiments identify some ways in which CVE-IDs may establish some useful lines of communication between these two communities. In particular, it would start to introduce the research community to operational security concepts, which appears to be a gap left by existing efforts.

Expression Recognition Analysis in the Wild arxiv:2101.09231 📈 3

Donato Cafarelli, Fabio Valerio Massoli, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato

**Abstract:** Facial Expression Recognition (FER) is one of the most important topics in Human-Computer Interaction (HCI). In this work we report details and experimental results for a facial expression recognition method based on state-of-the-art methods. We fine-tuned a SeNet deep learning architecture, pre-trained on the well-known VGGFace2 dataset, on the AffWild2 facial expression recognition dataset. The main goal of this work is to define a baseline for a novel method we are going to propose in the near future. This paper is also required by the Affective Behavior Analysis in-the-wild (ABAW) competition in order to evaluate this approach on the test set. The results reported here are on the validation set and relate to the Expression Challenge part (seven basic emotion recognition) of the competition. We will update them as soon as the actual results on the test set are published on the leaderboard.

Sparsistent filtering of comovement networks from high-dimensional data arxiv:2101.09174 📈 3

Arnab Chakrabarti, Anindya S. Chakrabarti

**Abstract:** Network filtering is an important form of dimension reduction to isolate the core constituents of large and interconnected complex systems. We introduce a new technique to filter large dimensional networks arising out of dynamical behavior of the constituent nodes, exploiting their spectral properties. As opposed to the well-known network filters that rely on preserving key topological properties of the realized network, our method treats the spectrum as the fundamental object and preserves spectral properties. Applying asymptotic theory for high dimensional data to the filter, we show that it can be tuned to interpolate between zero filtering and maximal filtering, which induces sparsity and consistency while having the least spectral distance from a linear shrinkage estimator. We apply our proposed filter to covariance networks constructed from financial data, to extract the key subnetwork embedded in the full sample network.

Lexical semantic change for Ancient Greek and Latin arxiv:2101.09069 📈 3

Valerio Perrone, Simon Hengchen, Marco Palma, Alessandro Vatri, Jim Q. Smith, Barbara McGillivray

**Abstract:** Change and its precondition, variation, are inherent in languages. Over time, new words enter the lexicon, others become obsolete, and existing words acquire new senses. Associating a word's correct meaning in its historical context is a central challenge in diachronic research. Historical corpora of classical languages, such as Ancient Greek and Latin, typically come with rich metadata, and existing models are limited by their inability to exploit contextual information beyond the document timestamp. While embedding-based methods feature among the current state-of-the-art systems, they lack interpretative power. In contrast, Bayesian models provide explicit and interpretable representations of semantic change phenomena. In this chapter we build on GASC, a recent computational approach to semantic change based on a dynamic Bayesian mixture model. In this model, the evolution of word senses over time is based not only on distributional information of lexical nature, but also on text genres. We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models. On top of providing a full description of meaning change over time, we show that Bayesian mixture models are highly competitive approaches to detect binary semantic change in both Ancient Greek and Latin.

Rethinking Domain Generalization Baselines arxiv:2101.09060 📈 3

Francesco Cappio Borlino, Antonio D'Innocente, Tatiana Tommasi

**Abstract:** Despite being very powerful in standard learning settings, deep learning models can be extremely brittle when deployed in scenarios different from those on which they were trained. Domain generalization methods investigate this problem, and data augmentation strategies have been shown to be helpful tools to increase data variability, supporting model robustness across domains. In our work we focus on style transfer data augmentation and we present how it can be implemented with a simple and inexpensive strategy to improve generalization. Moreover, we analyze the behavior of current state-of-the-art domain generalization methods when integrated with this augmentation solution: our thorough experimental evaluation shows that their original effect almost always disappears with respect to the augmented baseline. This issue opens new scenarios for domain generalization research, highlighting the need for novel methods able to properly take advantage of the introduced data variability.

DSAL: Deeply Supervised Active Learning from Strong and Weak Labelers for Biomedical Image Segmentation arxiv:2101.09057 📈 3

Ziyuan Zhao, Zeng Zeng, Kaixin Xu, Cen Chen, Cuntai Guan

**Abstract:** Image segmentation is one of the most essential biomedical image processing problems for different imaging modalities, including microscopy and X-ray in the Internet-of-Medical-Things (IoMT) domain. However, annotating biomedical images is knowledge-driven, time-consuming, and labor-intensive, making it difficult to obtain abundant labels at limited cost. Active learning strategies ease the burden of human annotation by querying only a subset of the training data for annotation. Despite receiving attention, most active learning methods still require huge computational costs and use unlabeled data inefficiently. They also tend to ignore the intermediate knowledge within networks. In this work, we propose a deep active semi-supervised learning framework, DSAL, combining active learning and semi-supervised learning strategies. In DSAL, a new criterion based on a deep supervision mechanism is proposed to select informative samples with high uncertainties and low uncertainties for strong labelers and weak labelers, respectively. The internal criterion leverages the disagreement of intermediate features within the deep learning network for active sample selection, which subsequently reduces the computational costs. We use the proposed criteria to select samples for strong and weak labelers to produce oracle labels and pseudo labels simultaneously at each active learning iteration in an ensemble learning manner, which can be examined with an IoMT platform. Extensive experiments on multiple medical image datasets demonstrate the superiority of the proposed method over state-of-the-art active learning methods.

Baseline Pruning-Based Approach to Trojan Detection in Neural Networks arxiv:2101.12016 📈 2

Peter Bajcsy, Michael Majurski

**Abstract:** This paper addresses the problem of detecting trojans in neural networks (NNs) by analyzing systematically pruned NN models. Our pruning-based approach consists of three main steps. First, detect any deviations from the reference look-up tables of model file sizes and model graphs. Next, measure the accuracy of a set of systematically pruned NN models following multiple pruning schemas. Finally, classify an NN model as clean or poisoned by applying a mapping between accuracy measurements and NN model labels. This work outlines a theoretical and experimental framework for finding the optimal mapping over a large search space of pruning parameters. Based on our experiments using Round 1 and Round 2 TrojAI Challenge datasets, the approach achieves average classification accuracy of 69.73% and 82.41%, respectively, with an average processing time of less than 60 s per model. For both datasets random guessing would produce 50% classification accuracy. Reference model graphs and source code are available from GitHub.

Using Finite-State Machines to Automatically Scan Classical Greek Hexameter arxiv:2101.11437 📈 2

Anne-Kathrin Schumann, Christoph Beierle, Norbert Blößner

**Abstract:** This paper presents a fully automatic approach to the scansion of Classical Greek hexameter verse. In particular, the paper describes an algorithm that uses deterministic finite-state automata and local linguistic rules to implement a targeted search for valid spondeus patterns and, in addition, a weighted finite-state transducer to correct and complete partial analyses and to reject invalid candidates. The paper also details the results of an empirical evaluation of the annotation quality resulting from this approach on hand-annotated data. It is shown that a finite-state approach provides quick and linguistically sound analyses of hexameter verses as well as an efficient formalisation of linguistic knowledge. The project code is available (see https://github.com/anetschka/greek_scansion).

Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature arxiv:2101.10537 📈 2

Joseph Marvin Imperial, Ethel Ong

**Abstract:** Proper identification of grade levels of children's reading materials is an important step towards effective learning. Recent studies in readability assessment for the English domain applied modern approaches in natural language processing (NLP) such as machine learning (ML) techniques to automate the process. There is also a need to extract the correct linguistic features when modeling readability formulas. In the context of the Filipino language, limited work has been done [1, 2], especially in considering the language's lexical complexity as main features. In this paper, we explore the use of lexical features towards improving the development of readability identification of children's books written in Filipino. Results show that combining lexical features (LEX) consisting of type-token ratio, lexical density, lexical variation, foreign word count with traditional features (TRAD) used by previous works such as sentence length, average syllable length, polysyllabic words, word, sentence, and phrase counts increased the performance of readability models by almost a 5% margin (from 42% to 47.2%). Further analysis and ranking of the most important features were shown to identify which features contribute the most in terms of reading complexity.
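
Two of the lexical features named in the abstract above, type-token ratio and lexical density, can be illustrated with a minimal Python sketch. The tokenizer, the tiny `FUNCTION_WORDS` list, and the exact formulas are assumptions made for illustration; the abstract does not specify how the authors compute these features.

```python
import re

# Hypothetical, tiny function-word list used only for illustration; a real
# system would rely on a proper Filipino part-of-speech tagger or stop-word list.
FUNCTION_WORDS = {"ang", "ng", "sa", "si", "ni", "at", "na", "ay", "mga"}


def tokenize(text):
    """Lowercase the text and extract runs of letters (a deliberately naive tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word types divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of tokens that are content words, i.e. not in the function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)


tokens = tokenize("Ang bata ay naglalaro sa parke at ang aso ay tumatakbo")
features = {
    "type_token_ratio": type_token_ratio(tokens),
    "lexical_density": lexical_density(tokens),
}
```

In a readability model, values like these would be appended to the traditional features (sentence length, syllable counts, and so on) before training a classifier.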

Gravity Optimizer: a Kinematic Approach on Optimization in Deep Learning arxiv:2101.09192 📈 2

Dariush Bahrami, Sadegh Pouriyan Zadeh

**Abstract:** We introduce Gravity, another algorithm for gradient-based optimization. In this paper, we explain how our novel idea changes parameters to reduce the deep learning model's loss. It has three intuitive hyper-parameters, for which we propose the best values. We also propose an alternative to the moving average. To compare the performance of the Gravity optimizer with two common optimizers, Adam and RMSProp, two VGGNet models were trained on five standard datasets with a batch size of 128 for 100 epochs. Gravity hyper-parameters did not need to be tuned for different models. As explained further in the paper, no overfitting prevention technique was used, in order to investigate the direct impact of the optimizer itself on loss reduction. The obtained results show that the Gravity optimizer has more stable performance than Adam and RMSProp and gives greater values of validation accuracy for datasets with more output classes like CIFAR-100 (Fine).

Multi-hop RIS-Empowered Terahertz Communications: A DRL-based Hybrid Beamforming Design arxiv:2101.09137 📈 2

Chongwen Huang, Zhaohui Yang, George C. Alexandropoulos, Kai Xiong, Li Wei, Chau Yuen, Zhaoyang Zhang, Merouane Debbah

**Abstract:** Wireless communication in the terahertz band (0.1-10 THz) is envisioned as one of the key enabling technologies for future sixth generation (6G) wireless communication systems scaled up beyond massive multiple input multiple output (Massive-MIMO) technology. However, the very high propagation attenuation and molecular absorption of THz frequencies often limit the signal transmission distance and coverage range. Benefiting from the recent breakthrough in reconfigurable intelligent surfaces (RIS) for realizing smart radio propagation environments, we propose a novel hybrid beamforming scheme for multi-hop RIS-assisted communication networks to improve the coverage range at THz-band frequencies. In particular, multiple passive and controllable RISs are deployed to assist the transmissions between the base station (BS) and multiple single-antenna users. We investigate the joint design of the digital beamforming matrix at the BS and the analog beamforming matrices at the RISs, leveraging recent advances in deep reinforcement learning (DRL) to combat the propagation loss. To improve the convergence of the proposed DRL-based algorithm, two algorithms are then designed to initialize the digital beamforming and the analog beamforming matrices using the alternating optimization technique. Simulation results show that our proposed scheme improves the coverage range of THz communications by 50% compared with the benchmarks. Furthermore, it is also shown that our proposed DRL-based method is a state-of-the-art method for solving the NP-hard beamforming problem, especially when the signals in RIS-assisted THz communication networks experience multiple hops.

A Universal Deep Learning Framework for Real-Time Denoising of Ultrasound Images arxiv:2101.09122 📈 2

Simone Cammarasana, Paolo Nicolardi, Giuseppe Patanè

**Abstract:** Ultrasound images are widespread in medical diagnosis for muscle-skeletal, cardiac, and obstetrical diseases, due to the efficiency and non-invasiveness of the acquisition methodology. However, ultrasound acquisition introduces speckle noise in the signal, which corrupts the resulting image and affects both further processing operations and the visual analysis that medical experts conduct to estimate patient diseases. Our main goal is to define a universal deep learning framework for real-time denoising of ultrasound images. We analyse and compare state-of-the-art methods for the smoothing of ultrasound images (e.g., spectral, low-rank, and deep learning denoising algorithms), in order to select the best one in terms of accuracy, preservation of anatomical features, and computational cost. Then, we propose a tuned version of the selected state-of-the-art denoising method (e.g., WNNM), to improve the quality of the denoised images and extend its applicability to ultrasound images. To handle large data sets of ultrasound images with respect to applications and industrial requirements, we introduce a denoising framework that exploits deep learning and HPC tools, and allows us to replicate the results of state-of-the-art denoising methods in real-time execution.

Query Abandonment Prediction with Recurrent Neural Models of Mouse Cursor Movements arxiv:2101.09066 📈 2

Lukas Brückner, Ioannis Arapakis, Luis A. Leiva

**Abstract:** Most successful search queries do not result in a click if the user can satisfy their information needs directly on the SERP. Modeling query abandonment in the absence of click-through data is challenging because search engines must rely on other behavioral signals to understand the underlying search intent. We show that mouse cursor movements make a valuable, low-cost behavioral signal that can discriminate good and bad abandonment. We model mouse movements on SERPs using recurrent neural nets and explore several data representations that do not rely on expensive hand-crafted features and do not depend on a particular SERP structure. We also experiment with data resampling and augmentation techniques that we adopt for sequential data. Our results can help search providers to gauge user satisfaction for queries without clicks and ultimately contribute to a better understanding of search engine performance.

A Few Good Counterfactuals: Generating Interpretable, Plausible and Diverse Counterfactual Explanations arxiv:2101.09056 📈 2

Barry Smyth, Mark T Keane

**Abstract:** Counterfactual explanations provide a potentially significant solution to the Explainable AI (XAI) problem, but good, native counterfactuals have been shown to rarely occur in most datasets. Hence, the most popular methods generate synthetic counterfactuals using blind perturbation. However, such methods have several shortcomings: the resulting counterfactuals (i) may not be valid data-points (they often use features that do not naturally occur), (ii) may lack the sparsity of good counterfactuals (if they modify too many features), and (iii) may lack diversity (if the generated counterfactuals are minimal variants of one another). We describe a method designed to overcome these problems, one that adapts native counterfactuals in the original dataset, to generate sparse, diverse synthetic counterfactuals from naturally occurring features. A series of experiments are reported that systematically explore parametric variations of this novel method on common datasets to establish the conditions for optimal performance.

CMSAOne@Dravidian-CodeMix-FIRE2020: A Meta Embedding and Transformer model for Code-Mixed Sentiment Analysis on Social Media Text arxiv:2101.09004 📈 2

Suman Dowlagar, Radhika Mamidi

**Abstract:** Code-mixing (CM) is a frequently observed phenomenon in which multiple languages are used in an utterance or sentence. CM is mostly practiced on social media platforms and in informal conversations. Sentiment analysis (SA) is a fundamental step in NLP and is well studied for monolingual text. Code-mixing adds a challenge to sentiment analysis due to its non-standard representations. This paper proposes a meta embedding with a transformer method for sentiment analysis on the Dravidian code-mixed dataset. In our method, we used meta embeddings to capture rich text representations. We used the proposed method for the task "Sentiment Analysis for Dravidian Languages in Code-Mixed Text", and it achieved F1 scores of 0.58 and 0.66 on the given Dravidian code-mixed datasets. The code is available on GitHub at https://github.com/suman101112/fire-2020-Dravidian-CodeMix.

Linear Regression with Distributed Learning: A Generalization Error Perspective arxiv:2101.09001 📈 2

Martin Hellkvist, Ayça Özçelikkale, Anders Ahlén

**Abstract:** Distributed learning provides an attractive framework for scaling the learning task by sharing the computational load over multiple nodes in a network. Here, we investigate the performance of distributed learning for large-scale linear regression where the model parameters, i.e., the unknowns, are distributed over the network. We adopt a statistical learning approach. In contrast to works that focus on the performance on the training data, we focus on the generalization error, i.e., the performance on unseen data. We provide high-probability bounds on the generalization error for both isotropic and correlated Gaussian data as well as sub-gaussian data. These results reveal the dependence of the generalization performance on the partitioning of the model over the network. In particular, our results show that the generalization error of the distributed solution can be substantially higher than that of the centralized solution even when the error on the training data is at the same level for both the centralized and distributed approaches. Our numerical results illustrate the performance with both real-world image data as well as synthetic data.

Automatic Volumetric Segmentation of Additive Manufacturing Defects with 3D U-Net arxiv:2101.08993 📈 2

Vivian Wen Hui Wong, Max Ferguson, Kincho H. Law, Yung-Tsun Tina Lee, Paul Witherell

**Abstract:** Segmentation of additive manufacturing (AM) defects in X-ray Computed Tomography (XCT) images is challenging, due to the poor contrast, small sizes and variation in appearance of defects. Automatic segmentation can, however, provide quality control for additive manufacturing. Over recent years, three-dimensional convolutional neural networks (3D CNNs) have performed well in the volumetric segmentation of medical images. In this work, we leverage techniques from the medical imaging domain and propose training a 3D U-Net model to automatically segment defects in XCT images of AM samples. This work not only contributes to the use of machine learning for AM defect detection but also demonstrates for the first time 3D volumetric segmentation in AM. We train and test with three variants of the 3D U-Net on an AM dataset, achieving a mean intersection over union (IoU) value of 88.4%.
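
The IoU metric reported in the abstract above can be sketched for binary volumetric masks as follows. This is a plain-Python illustration on toy nested lists; the function names and the convention that two empty masks score 1.0 are assumptions, not details taken from the paper.

```python
def flatten(volume):
    """Flatten a depth x height x width nested list into a flat list of voxels."""
    return [voxel for plane in volume for row in plane for voxel in row]


def iou(pred, target):
    """Intersection over union of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(pred_volumes, target_volumes):
    """Average the per-volume IoU over a collection of volumes."""
    scores = [iou(flatten(p), flatten(t)) for p, t in zip(pred_volumes, target_volumes)]
    return sum(scores) / len(scores)


# Two toy 2x2x2 volumes: 1 marks a defect voxel, 0 marks background.
pred = [[[1, 1], [0, 0]], [[0, 1], [0, 0]]]
target = [[[1, 0], [0, 0]], [[0, 1], [1, 0]]]
score = iou(flatten(pred), flatten(target))
```

Here the two masks share 2 defect voxels out of 4 voxels marked in either mask, so the IoU is 0.5.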

Adaptively Sparse Regularization for Blind Image Restoration arxiv:2101.09401 📈 1

Ningshan Xu

**Abstract:** Image quality is the basis of image communication and understanding tasks. Image quality is degraded by the blur and noise introduced during imaging, transmission and other processes. Blind image restoration is widely used to improve image quality, where the main goal is to faithfully estimate the blur kernel and the latent sharp image. In this study, based on experimental observation and research, an adaptively sparse regularized minimization method is proposed. High-order gradients are combined with low-order ones to form a hybrid regularization term, and an adaptive operator derived from the image entropy is introduced to maintain good convergence. Extensive experiments were conducted on different blur kernels and images. Compared with existing state-of-the-art blind deblurring methods, our method demonstrates superior recovery accuracy.

On the Local Linear Rate of Consensus on the Stiefel Manifold arxiv:2101.09346 📈 1

Shixiang Chen, Alfredo Garcia, Mingyi Hong, Shahin Shahrampour

**Abstract:** We study the convergence properties of the Riemannian gradient method for solving the consensus problem (for an undirected connected graph) over the Stiefel manifold. The Stiefel manifold is a non-convex set and the standard notion of averaging in the Euclidean space does not work for this problem. We propose Distributed Riemannian Consensus on Stiefel Manifold (DRCS) and prove that it enjoys a local linear convergence rate to global consensus. More importantly, this local rate asymptotically scales with the second largest singular value of the communication matrix, which is on par with the well-known rate in the Euclidean space. To the best of our knowledge, this is the first work showing the equality of the two rates. The main technical challenges include (i) developing a Riemannian restricted secant inequality for convergence analysis, and (ii) identifying the conditions (e.g., suitable step-size and initialization) under which the algorithm always stays in the local region.

AI-Empowered VNF Migration as a Cost-Loss-Effective Solution for Network Resilience arxiv:2101.09343 📈 1

Amina Lejla Ibrahimpasic, Bin Han, Hans D. Schotten

**Abstract:** With the wide deployment of Multi-Access Edge Computing (MEC) in Fifth Generation (5G) mobile networks, virtual network functions (VNFs) can be flexibly migrated between different locations, which significantly enhances network resilience against the degradation in quality of service (QoS) caused by network function outages. A careful balance has to be struck between the loss reduced by VNF migration and the operations cost it generates. Achieving this in practical scenarios with realistic user behavior calls for models of both cost and user mobility. This paper proposes a novel cost model and an AI-empowered approach for rational migration of stateful VNFs, which minimizes the sum of operations cost and potential loss caused by outages, and is capable of dealing with complex, realistic user mobility patterns.

A Novel Genetic Algorithm with Hierarchical Evaluation Strategy for Hyperparameter Optimisation of Graph Neural Networks arxiv:2101.09300 📈 1

Yingfang Yuan, Wenjun Wang, George M. Coghill, Wei Pang

**Abstract:** Graph representation of structured data can facilitate the extraction of stereoscopic features, and it has demonstrated excellent ability when working with deep learning systems, the so-called Graph Neural Networks (GNNs). Choosing a promising architecture for constructing GNNs can be cast as a hyperparameter optimisation problem, a very challenging task due to the size of the underlying search space and the high computational cost of evaluating candidate GNNs. To address this issue, this research presents a novel genetic algorithm with a hierarchical evaluation strategy (HESGA), which combines the full evaluation of GNNs with a fast evaluation approach. In full evaluation, a GNN is represented by a set of hyperparameter values and trained on a specified dataset, and the root mean square error (RMSE) is used to measure the quality of the GNN represented by that set of hyperparameter values (for regression problems). In the proposed fast evaluation process, by contrast, training is interrupted at an early stage, and the difference in RMSE values between the starting and interrupted epochs is used as a fast score, which indicates the potential of the GNN being considered. To coordinate both types of evaluation, the proposed hierarchical strategy uses the fast evaluation at a lower level to recommend candidates to a higher level, where the full evaluation acts as a final assessor to maintain a group of elite individuals. To validate the effectiveness of HESGA, we apply it to optimise two types of deep graph neural networks. The experimental results on three benchmark datasets demonstrate its advantages compared to Bayesian hyperparameter optimization.
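
The fast evaluation idea in the abstract above (scoring a candidate by how much its RMSE drops between the first epoch and the epoch where training is interrupted) can be sketched in Python. The sign convention, the `shortlist` helper, and the toy RMSE histories are assumptions for illustration; the abstract does not give the exact formulation used in HESGA.

```python
import math


def rmse(predictions, targets):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def fast_score(rmse_per_epoch):
    """RMSE at the starting epoch minus RMSE at the interrupted epoch.

    Assumed sign convention: a larger score means a faster early improvement.
    """
    return rmse_per_epoch[0] - rmse_per_epoch[-1]


def shortlist(histories, keep):
    """Rank candidates by fast score and keep the most promising ones.

    `histories` maps a hypothetical candidate name to its RMSE values over
    the few epochs that were run before training was interrupted.
    """
    ranked = sorted(histories, key=lambda name: fast_score(histories[name]), reverse=True)
    return ranked[:keep]


# Toy RMSE histories for three hypothetical hyperparameter settings.
histories = {
    "candidate_a": [1.00, 0.90, 0.85],
    "candidate_b": [1.20, 0.80, 0.50],
    "candidate_c": [0.90, 0.88, 0.87],
}
promising = shortlist(histories, keep=2)
```

Only the shortlisted candidates would then go on to the more expensive full evaluation, which is what makes the hierarchical scheme cheaper than fully training every candidate.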

Traffic Flow Estimation using LTE Radio Frequency Counters and Machine Learning arxiv:2101.09143 📈 1

Forough Yaghoubi, Armin Catovic, Arthur Gusmao, Jan Pieczkowski, Peter Boros

**Abstract:** As the demand for vehicles continues to outpace construction of new roads, it becomes imperative that we implement strategies that improve utilization of existing transport infrastructure. Traffic sensors form a crucial part of many such strategies, giving us valuable insights into road utilization. However, due to the cost and lead time associated with installation and maintenance of traffic sensors, municipalities and traffic authorities look toward cheaper and more scalable alternatives. Due to their ubiquitous nature and wide global deployment, cellular networks offer one such alternative. In this paper we present a novel method for traffic flow estimation using standardized LTE/4G radio frequency performance measurement counters. The problem is cast as a supervised regression task using both classical and deep learning methods. We further apply transfer learning to compensate for the fact that many locations lack traffic sensor data that could be used for training. We show that our approach benefits from applying transfer learning to generalize the solution not only in time but also in space (i.e., various parts of the city). The results are very promising and, unlike competing solutions, our approach utilizes aggregate LTE radio frequency counter data that is inherently privacy-preserving, readily available, and scales globally without any additional network impact.

Deepfakes and the 2020 US elections: what (did not) happen arxiv:2101.09092 📈 1

João Paulo Meneses

**Abstract:** Alarmed by the volume of disinformation that was assumed to have taken place during the 2016 US elections, scholars, politicians and journalists predicted the worst when the first deepfakes began to emerge in 2018. In the end, the 2020 US elections were believed to be the most secure in American history. This paper seeks explanations for an apparent contradiction: we believe that it was precisely the multiplication and conjugation of different types of warnings and fears that created the conditions that prevented malicious political deepfakes from affecting the 2020 US elections. From these warnings, we identified four factors (more active role of social networks, new laws, difficulties in accessing Artificial Intelligence and better awareness of society). But while this formula has proven to be effective in the case of the United States in 2020, it is not correct to assume that it can be repeated in other political contexts.

A novel DL approach to PE malware detection: exploring Glove vectorization, MCC_RCNN and feature fusion arxiv:2101.08969 📈 1

Yuzhou Lin

**Abstract:** In recent years, malware has become more threatening. In response to the increasing number of malware variants, Machine Learning (ML)-based and Deep Learning (DL)-based approaches to heuristic detection have emerged. Nevertheless, the prediction accuracy of both needs to be improved. In response to the above issues in the PE malware domain, we propose DL-based approaches for detection that use static features fed into models. The contributions are as follows: we recapitulate existing malware detection methods. We propose a vectorized representation model of the malware instruction layer and semantic layer based on Glove. We implement a neural network model called MCC_RCNN (Malware Detection and Recurrent Convolutional Neural Network), which combines a CNN and an RNN. Moreover, we provide a description of feature fusion at the static behavior level. Numerical results from several comparative experiments evaluating the Glove-based vectorization, the MCC_RCNN-based classification methodology and the feature fusion stages show that our proposed classification methods obtain a higher prediction accuracy than the baseline methods.

Neural networks for Anatomical Therapeutic Chemical (ATC) classification arxiv:2101.11713 📈 0

Loris Nanni, Alessandra Lumini, Sheryl Brahnam

**Abstract:** Motivation: Automatic Anatomical Therapeutic Chemical (ATC) classification is a critical and highly competitive area of research in bioinformatics because of its potential for expediting drug development and research. Predicting an unknown compound's therapeutic and chemical characteristics according to how these characteristics affect multiple organs/systems makes automatic ATC classification a challenging multi-label problem. Results: In this work, we propose combining multiple multi-label classifiers trained on distinct sets of features, including sets extracted from a Bidirectional Long Short-Term Memory Network (BiLSTM). Experiments demonstrate the power of this approach, which is shown to outperform the best methods reported in the literature, including the state-of-the-art developed by the fast.ai research group. Availability: All source code developed for this study is available at https://github.com/LorisNanni. Contact: loris.nanni@unipd.it

$k$-Neighbor Based Curriculum Sampling for Sequence Prediction arxiv:2101.09313 📈 0

James O'Neill, Danushka Bollegala

**Abstract:** Multi-step ahead prediction in language models is challenging due to the discrepancy between training and test time processes. At test time, a sequence predictor is required to make predictions given past predictions as the input, instead of the past targets that are provided during training. This difference, known as exposure bias, can lead to the compounding of errors along a generated sequence at test time. To improve generalization in neural language models and address compounding errors, we propose *Nearest-Neighbor Replacement Sampling* -- a curriculum learning-based method that gradually changes an initially deterministic teacher policy to a stochastic policy. A token at a given time-step is replaced with a sampled nearest neighbor of the past target with a truncated probability proportional to the cosine similarity between the original word and its top $k$ most similar words. This allows the learner to explore alternatives when the current policy provided by the teacher is sub-optimal or difficult to learn from. The proposed method is straightforward, online and requires little additional memory. We report our findings on two language modelling benchmarks and find that the proposed method further improves performance when used in conjunction with scheduled sampling.
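The replacement step described in this abstract can be illustrated with a rough sketch. It is not the authors' implementation: the toy embeddings, the function names, the linear schedule, and the clipping of negative similarities to zero are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_neighbors(word, embeddings, k):
    """Return the k words most similar to `word`, with their similarities."""
    sims = [(other, cosine(embeddings[word], vec))
            for other, vec in embeddings.items() if other != word]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


def replace_token(word, embeddings, k, replace_prob, rng):
    """With probability `replace_prob`, swap `word` for one of its top-k
    neighbors, sampled in proportion to cosine similarity."""
    if rng.random() >= replace_prob:
        return word  # keep the teacher's target token
    neighbors = top_k_neighbors(word, embeddings, k)
    # Assumption: negative similarities are clipped so weights stay valid.
    weights = [max(sim, 0.0) for _, sim in neighbors]
    if not neighbors or sum(weights) == 0.0:
        return word
    return rng.choices([w for w, _ in neighbors], weights=weights, k=1)[0]


def replacement_schedule(step, total_steps, max_prob=0.5):
    """Hypothetical linear curriculum: the replacement probability grows
    from 0 (deterministic teacher) to `max_prob` (stochastic teacher)."""
    return max_prob * min(step / total_steps, 1.0)


# Toy, hand-made embeddings (hypothetical values).
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}
rng = random.Random(0)
prob = replacement_schedule(step=50, total_steps=100)
sample = replace_token("cat", toy_embeddings, k=2, replace_prob=prob, rng=rng)
```

Early in training the schedule keeps the original targets, so the teacher is deterministic; as the probability rises, the learner increasingly sees similar-but-different tokens in place of the past target.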

Safe Learning Reference Governor: Theory and Application to Fuel Truck Rollover Avoidance arxiv:2101.09298 📈 0

Kaiwen Liu, Nan Li, Ilya Kolmanovsky, Denise Rizzo, Anouck Girard

**Abstract:** This paper proposes a learning reference governor (LRG) approach to enforce state and control constraints in systems for which an accurate model is unavailable. The approach enables the reference governor to gradually improve command tracking performance through learning while enforcing the constraints during learning and after learning is completed. The learning can be performed either on a black-box type model of the system or directly on the hardware. After introducing the LRG algorithm and outlining its theoretical properties, this paper investigates LRG application to fuel truck (tank truck) rollover avoidance. Through simulations based on a fuel truck model that accounts for liquid fuel sloshing effects, we show that the proposed LRG can effectively protect fuel trucks from rollover accidents under various operating conditions.

Estimating $α$-Rank by Maximizing Information Gain arxiv:2101.09178 📈 0

Tabish Rashid, Cheng Zhang, Kamil Ciosek

**Abstract:** Game theory has been increasingly applied in settings where the game is not known outright, but has to be estimated by sampling. For example, meta-games that arise in multi-agent evaluation can only be accessed by running a succession of expensive experiments that may involve simultaneous deployment of several agents. In this paper, we focus on $α$-rank, a popular game-theoretic solution concept designed to perform well in such scenarios. We aim to estimate the $α$-rank of the game using as few samples as possible. Our algorithm maximizes information gain between an epistemic belief over the $α$-ranks and the observed payoff. This approach has two main benefits. First, it allows us to focus our sampling on the entries that matter the most for identifying the $α$-rank. Second, the Bayesian formulation provides a facility to build in modeling assumptions by using a prior over game payoffs. We show the benefits of using information gain as compared to the confidence interval criterion of ResponseGraphUCB (Rowland et al. 2019), and provide theoretical results justifying our method.
