Summary for 2021-08-11, created on 2021-12-18

Natural Language-Guided Programming arxiv:2108.05198 📈 95

Geert Heyman, Rafael Huysegems, Pascal Justen, Tom Van Cutsem

**Abstract:** In today's software world with its cornucopia of reusable software libraries, when a programmer is faced with a programming task that they suspect can be completed through the use of a library, they often look for code examples using a search engine and then manually adapt found examples to their specific context of use. We put forward a vision based on a new breed of developer tools that have the potential to largely automate this process. The key idea is to adapt code autocompletion tools such that they take into account not only the developer's already-written code but also the intent of the task the developer is trying to achieve next, formulated in plain natural language. We call this practice of enriching the code with natural language intent to facilitate its completion natural language-guided programming. To show that this idea is feasible we design, implement and benchmark a tool that solves this problem in the context of a specific domain (data science) and a specific programming language (Python). Central to the tool is the use of language models trained on a large corpus of documented code. Our initial experiments confirm the feasibility of the idea but also make it clear that we have only scratched the surface of what may become possible in the future. We end the paper with a comprehensive research agenda to stimulate additional research in the budding area of natural language-guided programming.
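
To make the envisioned workflow concrete, here is a minimal sketch (not the authors' tool) of how a natural-language intent could be packaged with the already-written code and handed to a code language model; `complete_with_language_model` is a hypothetical stand-in for whatever model backend is plugged in.

```python
def build_prompt(context_code: str, intent: str) -> str:
    """Combine already-written code with a natural-language intent.

    The intent is expressed as a comment so that a code language model
    trained on documented code can condition on it.
    """
    return f"{context_code}\n# {intent}\n"


def complete_with_language_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a code language model backend."""
    raise NotImplementedError("plug in a model backend here")


if __name__ == "__main__":
    context = "import pandas as pd\ndf = pd.read_csv('sales.csv')\n"
    intent = "plot monthly revenue as a bar chart"
    prompt = build_prompt(context, intent)
    print(prompt)  # what the model would be asked to complete
    # suggestion = complete_with_language_model(prompt)
```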

Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather arxiv:2108.05249 📈 46

Martin Hahner, Christos Sakaridis, Dengxin Dai, Luc Van Gool

**Abstract:** This work addresses the challenging task of LiDAR-based 3D object detection in foggy weather. Collecting and annotating data in such a scenario is very time, labor and cost intensive. In this paper, we tackle this problem by simulating physically accurate fog into clear-weather scenes, so that the abundant existing real datasets captured in clear weather can be repurposed for our task. Our contributions are twofold: 1) We develop a physically valid fog simulation method that is applicable to any LiDAR dataset. This unleashes the acquisition of large-scale foggy training data at no extra cost. These partially synthetic data can be used to improve the robustness of several perception methods, such as 3D object detection and tracking or simultaneous localization and mapping, on real foggy data. 2) Through extensive experiments with several state-of-the-art detection approaches, we show that our fog simulation can be leveraged to significantly improve the performance for 3D object detection in the presence of fog. Thus, we are the first to provide strong 3D object detection baselines on the Seeing Through Fog dataset. Our code is available at www.trace.ethz.ch/lidar_fog_simulation.

Logic Explained Networks arxiv:2108.05149 📈 46

Gabriele Ciravegna, Pietro Barbiero, Francesco Giannini, Marco Gori, Pietro Lió, Marco Maggini, Stefano Melacci

**Abstract:** The large and still increasing popularity of deep learning clashes with a major limitation of neural network architectures: their lack of capability to provide human-understandable motivations for their decisions. In situations in which the machine is expected to support the decision of human experts, providing a comprehensible explanation is a feature of crucial importance. The language used to communicate the explanations must be formal enough to be implementable in a machine and friendly enough to be understandable by a wide audience. In this paper, we propose a general approach to Explainable Artificial Intelligence in the case of neural architectures, showing how a mindful design of the networks leads to a family of interpretable deep learning models called Logic Explained Networks (LENs). LENs only require their inputs to be human-understandable predicates, and they provide explanations in terms of simple First-Order Logic (FOL) formulas involving such predicates. LENs are general enough to cover a large number of scenarios. Amongst them, we consider the case in which LENs are directly used as special classifiers with the capability of being explainable, or when they act as additional networks with the role of creating the conditions for making a black-box classifier explainable by FOL formulas. Although supervised learning problems are mostly emphasized, we also show that LENs can learn and provide explanations in unsupervised learning settings. Experimental results on several datasets and tasks show that LENs may yield better classifications than established white-box models, such as decision trees and Bayesian rule lists, while providing more compact and meaningful explanations.

Learning to Hash Robustly, with Guarantees arxiv:2108.05433 📈 9

Alexandr Andoni, Daniel Beaglehole

**Abstract:** The indexing algorithms for the high-dimensional nearest neighbor search (NNS) with the best worst-case guarantees are based on the randomized Locality Sensitive Hashing (LSH), and its derivatives. In practice, many heuristic approaches exist to "learn" the best indexing method in order to speed-up NNS, crucially adapting to the structure of the given dataset. Oftentimes, these heuristics outperform the LSH-based algorithms on real datasets, but, almost always, come at the cost of losing the guarantees of either correctness or robust performance on adversarial queries, or apply to datasets with an assumed extra structure/model. In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms, while optimizing the hashing to the structure of the dataset (think instance-optimal algorithms) for performance on the minimum-performing query. We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically. On the theoretical side, we exhibit a natural setting (dataset model) where our algorithm is much better than the standard theoretical one. On the practical side, we run experiments that show that our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
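
For readers less familiar with the baseline being improved upon, the sketch below shows classical bit-sampling LSH for the Hamming space, the kind of randomized scheme whose worst-case guarantees the paper aims to match while adapting to the data; it is illustrative only and not the authors' learned algorithm.

```python
import numpy as np


def build_lsh_tables(data: np.ndarray, n_tables: int, bits_per_table: int, rng):
    """Classical bit-sampling LSH for binary vectors in Hamming space.

    Each table hashes a point by the values of a random subset of coordinates.
    """
    d = data.shape[1]
    tables = []
    for _ in range(n_tables):
        coords = rng.choice(d, size=bits_per_table, replace=False)
        buckets = {}
        for idx, point in enumerate(data):
            buckets.setdefault(tuple(point[coords]), []).append(idx)
        tables.append((coords, buckets))
    return tables


def query(tables, data, q):
    """Return the nearest candidate (by Hamming distance) among colliding points."""
    candidates = set()
    for coords, buckets in tables:
        candidates.update(buckets.get(tuple(q[coords]), []))
    if not candidates:
        return None
    return min(candidates, key=lambda i: np.count_nonzero(data[i] != q))


rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(1000, 64))
tables = build_lsh_tables(data, n_tables=10, bits_per_table=12, rng=rng)
print(query(tables, data, data[42]))  # typically returns 42 (an exact match)
```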

Estimation of Fair Ranking Metrics with Incomplete Judgments arxiv:2108.05152 📈 9

Ömer Kırnap, Fernando Diaz, Asia Biega, Michael Ekstrand, Ben Carterette, Emine Yılmaz

**Abstract:** There is increasing attention to evaluating the fairness of search system ranking decisions. These metrics often consider the membership of items to particular groups, often identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large-scale systems. In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate even with a very limited number of labeled items. We evaluate our approach using both simulated and real-world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation.

Rethinking Coarse-to-Fine Approach in Single Image Deblurring arxiv:2108.05054 📈 9

Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, Sung-Jea Ko

**Abstract:** Coarse-to-fine strategies have been extensively used for the architecture design of single image deblurring networks. Conventional methods typically stack sub-networks with multi-scale input images and gradually improve sharpness of images from the bottom sub-network to the top sub-network, yielding inevitably high computational costs. Toward a fast and accurate deblurring network design, we revisit the coarse-to-fine strategy and present a multi-input multi-output U-net (MIMO-UNet). The MIMO-UNet has three distinct features. First, the single encoder of the MIMO-UNet takes multi-scale input images to ease the difficulty of training. Second, the single decoder of the MIMO-UNet outputs multiple deblurred images with different scales to mimic multi-cascaded U-nets using a single U-shaped network. Last, asymmetric feature fusion is introduced to merge multi-scale features in an efficient manner. Extensive experiments on the GoPro and RealBlur datasets demonstrate that the proposed network outperforms the state-of-the-art methods in terms of both accuracy and computational complexity. Source code is available for research purposes at https://github.com/chosj95/MIMO-UNet.

Robotic Testbed for Rendezvous and Optical Navigation: Multi-Source Calibration and Machine Learning Use Cases arxiv:2108.05529 📈 8

Tae Ha Park, Juergen Bosse, Simone D'Amico

**Abstract:** This work presents the most recent advances of the Robotic Testbed for Rendezvous and Optical Navigation (TRON) at Stanford University - the first robotic testbed capable of validating machine learning algorithms for spaceborne optical navigation. The TRON facility consists of two 6 degrees-of-freedom KUKA robot arms and a set of Vicon motion track cameras to reconfigure an arbitrary relative pose between a camera and a target mockup model. The facility includes multiple Earth albedo light boxes and a sun lamp to recreate the high-fidelity spaceborne illumination conditions. After the overview of the facility, this work details the multi-source calibration procedure which enables the estimation of the relative pose between the object and the camera with millimeter-level position and millidegree-level orientation accuracies. Finally, a comparative analysis of the synthetic and TRON simulated imageries is performed using a Convolutional Neural Network (CNN) pre-trained on the synthetic images. The result shows a considerable gap in the CNN's performance, suggesting the TRON simulated images can be used to validate the robustness of any machine learning algorithms trained on more easily accessible synthetic imagery from computer graphics.

Large-Scale Modeling of Mobile User Click Behaviors Using Deep Learning arxiv:2108.05342 📈 8

Xin Zhou, Yang Li

**Abstract:** Modeling tap or click sequences of users on a mobile device can improve our understanding of interaction behavior and offers opportunities for UI optimization by recommending the next element the user might want to click on. We analyzed a large-scale dataset of over 20 million clicks from more than 4,000 mobile users who opted in. We then designed a deep learning model that predicts the next element that the user clicks given the user's click history, the structural information of the UI screen, and the current context such as the time of day. We thoroughly investigated the deep model by comparing it with a set of baseline methods based on the dataset. The experiments show that our model achieves 48% and 71% accuracy (top-1 and top-3) for predicting next clicks based on a held-out dataset of test users, significantly outperforming all the baseline methods by a large margin. We discuss a few scenarios for integrating the model in mobile interaction and how users can potentially benefit from the model.

ULTRA: An Unbiased Learning To Rank Algorithm Toolbox arxiv:2108.05073 📈 8

Anh Tran, Tao Yang, Qingyao Ai

**Abstract:** Learning to rank systems have become an important aspect of our daily life. However, the implicit user feedback that is used to train many learning to rank models is usually noisy and suffers from user bias (i.e., position bias). Thus, obtaining an unbiased model from biased feedback has become an important research field for IR. Existing studies on unbiased learning to rank (ULTR) can be grouped into two families: algorithms that attain unbiasedness with logged data (offline learning), and algorithms that achieve unbiasedness by estimating unbiased parameters from real-time user interactions (online learning). While there exist many algorithms from both families, there is no unified way to compare and benchmark them. As a result, it can be challenging for researchers to choose the right technique for their problems, or for people who are new to the field to learn and understand existing algorithms. To solve this problem, we introduce ULTRA, a flexible, extensible, and easily configurable ULTR toolbox. Its key features include support for multiple ULTR algorithms with configurable hyperparameters, a variety of built-in click models that can be used separately to simulate clicks, different ranking model architectures and evaluation metrics, and simple learning to rank pipeline creation. In this paper, we discuss the general framework of ULTR, briefly describe the algorithms in ULTRA, and detail the structure and pipeline of the toolbox. We experimented with all the algorithms supported by ULTRA and show that the toolbox's performance is reasonable. Our toolbox is an important resource for researchers to conduct experiments on ULTR algorithms with different configurations, as well as to test their own algorithms with the supported features.

Variable-Length Music Score Infilling via XLNet and Musically Specialized Positional Encoding arxiv:2108.05064 📈 8

Chin-Jui Chang, Chun-Yi Lee, Yi-Hsuan Yang

**Abstract:** This paper proposes a new self-attention based model for music score infilling, i.e., to generate a polyphonic music sequence that fills in the gap between given past and future contexts. While existing approaches can only fill in a short segment with a fixed number of notes, or a fixed time span between the past and future contexts, our model can infill a variable number of notes (up to 128) for different time spans. We achieve this with three major technical contributions. First, we adapt XLNet, an autoregressive model originally proposed for unsupervised model pre-training, to music score infilling. Second, we propose a new, musically specialized positional encoding called relative bar encoding that better informs the model of notes' position within the past and future context. Third, to capitalize on relative bar encoding, we perform look-ahead onset prediction to predict the onset of a note one time step before predicting the other attributes of the note. We compare our proposed model with two strong baselines and show that our model is superior in both objective and subjective analyses.

DEMix Layers: Disentangling Domains for Modular Language Modeling arxiv:2108.05036 📈 8

Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer

**Abstract:** We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMix layer is a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular: experts can be mixed, added or removed after initial training. Extensive experiments with autoregressive transformer LMs (up to 1.3B parameters) show that DEMix layers reduce test-time perplexity, increase training efficiency, and enable rapid adaptation with little overhead. We show that mixing experts during inference, using a parameter-free weighted ensemble, allows the model to better generalize to heterogeneous or unseen domains. We also show that experts can be added to iteratively incorporate new domains without forgetting older ones, and that experts can be removed to restrict access to unwanted domains, without additional training. Overall, these results demonstrate benefits of explicitly conditioning on textual domains during language modeling.
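
A minimal sketch of the core mechanism, a collection of per-domain expert feedforward networks routed by a domain index, with a parameter-free weighted ensemble for inference; dimensions and routing details are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class DomainExpertLayer(nn.Module):
    """A DEMix-style layer: one feedforward expert per domain.

    At training time the expert matching the input's domain id is used;
    experts can be added or removed after training by editing `self.experts`.
    """

    def __init__(self, d_model: int, d_hidden: int, n_domains: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_domains)
        )

    def forward(self, x: torch.Tensor, domain_id: int) -> torch.Tensor:
        return self.experts[domain_id](x)

    def forward_mixture(self, x: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
        """Parameter-free weighted ensemble over experts (used at inference)."""
        outputs = torch.stack([expert(x) for expert in self.experts])  # (E, ..., d)
        return (weights.view(-1, *([1] * x.dim())) * outputs).sum(dim=0)


layer = DomainExpertLayer(d_model=16, d_hidden=64, n_domains=4)
x = torch.randn(2, 10, 16)  # (batch, sequence, d_model)
out_single = layer(x, domain_id=1)
out_mix = layer.forward_mixture(x, torch.softmax(torch.randn(4), dim=0))
print(out_single.shape, out_mix.shape)
```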

Putting RDF2vec in Order arxiv:2108.05280 📈 7

Jan Portisch, Heiko Paulheim

**Abstract:** The RDF2vec method for creating node embeddings on knowledge graphs is based on word2vec, which, in turn, is agnostic towards the position of context words. In this paper, we argue that this might be a shortcoming when training RDF2vec, and show that using a word2vec variant which respects order yields considerable performance gains especially on tasks where entities of different classes are involved.

Empirical Risk Minimization for Time Series: Nonparametric Performance Bounds for Prediction arxiv:2108.05184 📈 7

Christian Brownlees, Jordi Llorens-Terrazas

**Abstract:** Empirical risk minimization is a standard principle for choosing algorithms in learning theory. In this paper we study the properties of empirical risk minimization for time series. The analysis is carried out in a general framework that covers different types of forecasting applications encountered in the literature. We are concerned with 1-step-ahead prediction of a univariate time series generated by a parameter-driven process. A class of recursive algorithms is available to forecast the time series. The algorithms are recursive in the sense that the forecast produced in a given period is a function of the lagged values of the forecast and of the time series. The relationship between the generating mechanism of the time series and the class of algorithms is unspecified. Our main result establishes that the algorithm chosen by empirical risk minimization achieves asymptotically the optimal predictive performance that is attainable within the class of algorithms.
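
As a toy illustration of the principle (not the paper's general framework), the sketch below selects, by empirical risk minimization of squared one-step-ahead forecast errors, the smoothing parameter of a recursive exponential-smoothing forecaster.

```python
import numpy as np


def recursive_forecasts(y: np.ndarray, alpha: float) -> np.ndarray:
    """1-step-ahead forecasts from exponential smoothing.

    The forecast is recursive: f[t+1] = alpha * y[t] + (1 - alpha) * f[t].
    """
    f = np.empty_like(y)
    f[0] = y[0]
    for t in range(len(y) - 1):
        f[t + 1] = alpha * y[t] + (1 - alpha) * f[t]
    return f


def empirical_risk(y: np.ndarray, alpha: float) -> float:
    f = recursive_forecasts(y, alpha)
    return float(np.mean((y[1:] - f[1:]) ** 2))


rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=500))       # a random-walk-like series
grid = np.linspace(0.01, 0.99, 99)        # the class of candidate algorithms
best_alpha = min(grid, key=lambda a: empirical_risk(y, a))
print(f"ERM-selected smoothing parameter: {best_alpha:.2f}")
```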

Voxel-level Importance Maps for Interpretable Brain Age Estimation arxiv:2108.05388 📈 6

Kyriaki-Margarita Bintsi, Vasileios Baltatzis, Alexander Hammers, Daniel Rueckert

**Abstract:** Brain aging, and more specifically the difference between the chronological and the biological age of a person, may be a promising biomarker for identifying neurodegenerative diseases. For this purpose accurate prediction is important but the localisation of the areas that play a significant role in the prediction is also crucial, in order to gain clinicians' trust and reassurance about the performance of a prediction model. Most interpretability methods are focused on classification tasks and cannot be directly transferred to regression tasks. In this study, we focus on the task of brain age regression from 3D brain Magnetic Resonance (MR) images using a Convolutional Neural Network, termed prediction model. We interpret its predictions by extracting importance maps, which discover the parts of the brain that are the most important for brain age. In order to do so, we assume that voxels that are not useful for the regression are resilient to noise addition. We implement a noise model which aims to add as much noise as possible to the input without harming the performance of the prediction model. We average the importance maps of the subjects and end up with a population-based importance map, which displays the regions of the brain that are influential for the task. We test our method on 13,750 3D brain MR images from the UK Biobank, and our findings are consistent with the existing neuropathology literature, highlighting that the hippocampus and the ventricles are the most relevant regions for brain aging.

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling arxiv:2108.05301 📈 6

Jingyun Liang, Andreas Lugmayr, Kai Zhang, Martin Danelljan, Luc Van Gool, Radu Timofte

**Abstract:** Normalizing flows have recently demonstrated promising results for low-level vision tasks. For image super-resolution (SR), it learns to predict diverse photo-realistic high-resolution (HR) images from the low-resolution (LR) image rather than learning a deterministic mapping. For image rescaling, it achieves high accuracy by jointly modelling the downscaling and upscaling processes. While existing approaches employ specialized techniques for these two tasks, we set out to unify them in a single formulation. In this paper, we propose the hierarchical conditional flow (HCFlow) as a unified framework for image SR and image rescaling. More specifically, HCFlow learns a bijective mapping between HR and LR image pairs by modelling the distribution of the LR image and the remaining high-frequency component simultaneously. In particular, the high-frequency component is conditional on the LR image in a hierarchical manner. To further enhance the performance, other losses such as perceptual loss and GAN loss are combined with the commonly used negative log-likelihood loss in training. Extensive experiments on general image SR, face image SR and image rescaling have demonstrated that the proposed HCFlow achieves state-of-the-art performance in terms of both quantitative metrics and visual quality.

Asymptotic optimality and minimal complexity of classification by random projection arxiv:2108.06339 📈 5

Mireille Boutin, Evzenie Coupkova

**Abstract:** The generalization error of a classifier is related to the complexity of the set of functions among which the classifier is chosen. Roughly speaking, the more complex the family, the greater the potential disparity between the training error and the population error of the classifier. This principle is embodied in layman's terms by Occam's razor principle, which suggests favoring low-complexity hypotheses over complex ones. We study a family of low-complexity classifiers consisting of thresholding the one-dimensional feature obtained by projecting the data on a random line after embedding it into a higher dimensional space parametrized by monomials of order up to k. More specifically, the extended data is projected n-times and the best classifier among those n (based on its performance on training data) is chosen. We obtain a bound on the generalization error of these low-complexity classifiers. The bound is less than that of any classifier with a non-trivial VC dimension, and thus less than that of a linear classifier. We also show that, given full knowledge of the class conditional densities, the error of the classifiers would converge to the optimal (Bayes) error as k and n go to infinity; if only a training dataset is given, we show that the classifiers will perfectly classify all the training points as k and n go to infinity.
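
A minimal sketch of this classifier family, assuming the monomial embedding is done with `sklearn.preprocessing.PolynomialFeatures` and that the best of n random one-dimensional projections (with a grid of thresholds) is selected by training error:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def fit_random_projection_classifier(X, y, k=2, n=200, seed=0):
    """Threshold a random 1-D projection of monomial features of order <= k.

    Among n random directions (and a grid of thresholds), keep the one with
    the lowest training error -- the low-complexity family studied here.
    """
    rng = np.random.default_rng(seed)
    Z = PolynomialFeatures(degree=k, include_bias=False).fit_transform(X)
    best = None
    for _ in range(n):
        w = rng.normal(size=Z.shape[1])
        s = Z @ w
        for thr in np.quantile(s, np.linspace(0.05, 0.95, 19)):
            for sign in (1, -1):
                pred = (sign * (s - thr) > 0).astype(int)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, w, thr, sign)
    return best  # (training error, direction, threshold, sign)


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # a non-linear concept
err, w, thr, sign = fit_random_projection_classifier(X, y, k=2, n=200)
print(f"best training error: {err:.3f}")
```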

Attacks against Ranking Algorithms with Text Embeddings: a Case Study on Recruitment Algorithms arxiv:2108.05490 📈 5

Anahita Samadi, Debapriya Banerjee, Shirin Nilizadeh

**Abstract:** Recently, some studies have shown that text classification tasks are vulnerable to poisoning and evasion attacks. However, little work has investigated attacks against decision-making algorithms that use text embeddings and whose output is a ranking. In this paper, we focus on ranking algorithms for the recruitment process that employ text embeddings to rank applicants' resumes when compared to a job description. We demonstrate both white box and black box attacks that identify text items that, based on their location in embedding space, contribute significantly to increasing the similarity score between a resume and a job description. The adversary then uses these text items to improve the ranking of their resume among others. We tested recruitment algorithms that use the similarity scores obtained from Universal Sentence Encoder (USE) and Term Frequency-Inverse Document Frequency (TF-IDF) vectors. Our results show that in both adversarial settings, on average the attacker is successful. We also found that attacks against TF-IDF are more successful than those against USE.
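
A hedged sketch of the TF-IDF variant of the idea: score candidate terms from the job description by how much appending them to a resume raises the TF-IDF cosine similarity. The texts and the greedy scoring loop are illustrative, not the authors' attack pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def top_terms_to_add(resume: str, job_description: str, k: int = 5):
    """Rank job-description terms by how much appending them to the resume
    increases the TF-IDF cosine similarity with the job description."""
    vec = TfidfVectorizer().fit([resume, job_description])

    def score(text: str) -> float:
        m = vec.transform([text, job_description])
        return cosine_similarity(m[0], m[1])[0, 0]

    baseline = score(resume)
    gains = [(score(resume + " " + term) - baseline, term)
             for term in set(job_description.lower().split())]
    return sorted(gains, reverse=True)[:k]


resume = "software engineer with python and data analysis experience"
jd = "seeking machine learning engineer skilled in python and cloud deployment"
print(top_terms_to_add(resume, jd))
```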

Self-supervised Contrastive Learning for Irrigation Detection in Satellite Imagery arxiv:2108.05484 📈 5

Chitra Agastya, Sirak Ghebremusse, Ian Anderson, Colorado Reed, Hossein Vahabi, Alberto Todeschini

**Abstract:** Climate change has caused reductions in river runoffs and aquifer recharge resulting in an increasingly unsustainable crop water demand from reduced freshwater availability. Achieving food security while deploying water in a sustainable manner will continue to be a major challenge necessitating careful monitoring and tracking of agricultural water usage. Historically, monitoring water usage has been a slow and expensive manual process with many imperfections and abuses. Machine learning and remote sensing developments have increased the ability to automatically monitor irrigation patterns, but existing techniques often require curated and labelled irrigation data, which are expensive and time consuming to obtain and may not exist for impactful areas such as developing countries. In this paper, we explore an end-to-end real world application of irrigation detection with uncurated and unlabeled satellite imagery. We apply state-of-the-art self-supervised deep learning techniques to optical remote sensing data, and find that we are able to detect irrigation with up to nine times better precision, 90% better recall and 40% more generalization ability than the traditional supervised learning methods.
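
For context, a representative self-supervised contrastive objective of the kind used in such pipelines is the SimCLR-style NT-Xent loss sketched below; the paper may use a different formulation, so treat this as an illustrative baseline rather than the authors' exact loss.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """SimCLR-style contrastive loss: two augmented views of the same tile are
    positives, every other tile in the batch is a negative."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, d)
    sim = z @ z.t() / temperature                 # scaled cosine similarities
    n = z1.shape[0]
    sim.fill_diagonal_(float("-inf"))             # exclude trivial self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)


z1 = torch.randn(32, 128)   # embeddings of one augmented view of 32 image tiles
z2 = torch.randn(32, 128)   # embeddings of the second augmented view
print(float(nt_xent_loss(z1, z2)))
```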

Low-level Pose Control of Tilting Multirotor for Wall Perching Tasks Using Reinforcement Learning arxiv:2108.05457 📈 5

Hyungyu Lee, Myeongwoo Jeong, Chanyoung Kim, Hyungtae Lim, Changgue Park, Sungwon Hwang, Hyun Myung

**Abstract:** Recently, the need for unmanned aerial vehicles (UAVs) that can attach to walls has been highlighted. As one way to address this need, research on various tilting multirotors that can increase maneuverability has been conducted. Unfortunately, existing studies on tilting multirotors require considerable amounts of prior information on the complex dynamic model. Meanwhile, reinforcement learning on quadrotors has been studied to mitigate this issue. Yet, it has only been applied to standard quadrotors, whose systems are less complex than those of tilting multirotors. In this paper, a novel reinforcement learning-based method is proposed to control a tilting multirotor in real-world applications, which is the first attempt to apply reinforcement learning to a tilting multirotor. To do so, we propose a novel reward function for a neural network model that takes power efficiency into account. The model is initially trained in a simulated environment and then fine-tuned using real-world data in order to overcome the sim-to-real gap. Furthermore, a novel, efficient state representation with respect to the goal frame that helps the network learn the optimal policy better is proposed. As verified in real-world experiments, our proposed method shows robust controllability by overcoming the complex dynamics of tilting multirotors.

Seven challenges for harmonizing explainability requirements arxiv:2108.05390 📈 5

Jiahao Chen, Victor Storchan

**Abstract:** Regulators have signalled an interest in adopting explainable AI (XAI) techniques to handle the diverse needs for model governance, operational servicing, and compliance in the financial services industry. In this short overview, we review the recent technical literature in XAI and argue that, based on our current understanding of the field, the use of XAI techniques in practice necessitates a highly contextualized approach considering the specific needs of stakeholders for particular business applications.

Deep Learning Classification of Lake Zooplankton arxiv:2108.05258 📈 5

S. P. Kyathanahally, T. Hardeman, E. Merz, T. Kozakiewicz, M. Reyes, P. Isles, F. Pomati, M. Baity-Jesi

**Abstract:** Plankton are effective indicators of environmental change and ecosystem health in freshwater habitats, but collection of plankton data using manual microscopic methods is extremely labor-intensive and expensive. Automated plankton imaging offers a promising way forward to monitor plankton communities with high frequency and accuracy in real-time. Yet, manual annotation of millions of images poses a serious challenge to taxonomists. Deep learning classifiers have been successfully applied in various fields and provided encouraging results when used to categorize marine plankton images. Here, we present a set of deep learning models developed for the identification of lake plankton, and study several strategies to obtain optimal performance, which lead to operational prescriptions for users. To this aim, we annotated into 35 classes over 17,900 images of zooplankton and large phytoplankton colonies, detected in Lake Greifensee (Switzerland) with the Dual Scripps Plankton Camera. Our best models were based on transfer learning and ensembling, which classified plankton images with 98% accuracy and 93% F1 score. When tested on freely available plankton datasets produced by other automated imaging tools (ZooScan, FlowCytobot and ISIIS), our models performed better than previously used models. Our annotated data, code and classification models are freely available online.

Beyond Fairness Metrics: Roadblocks and Challenges for Ethical AI in Practice arxiv:2108.06217 📈 4

Jiahao Chen, Victor Storchan, Eren Kurshan

**Abstract:** We review practical challenges in building and deploying ethical AI at the scale of contemporary industrial and societal uses. Apart from the purely technical concerns that are the usual focus of academic research, the operational challenges of inconsistent regulatory pressures, conflicting business goals, data quality issues, development processes, systems integration practices, and the scale of deployment all conspire to create new ethical risks. Such ethical concerns arising from these practical considerations are not adequately addressed by existing research results. We argue that a holistic consideration of ethics in the development and deployment of AI systems is necessary for building ethical AI in practice, and exhort researchers to consider the full operational contexts of AI systems when assessing ethical risks.

An Approach to Partial Observability in Games: Learning to Both Act and Observe arxiv:2108.05701 📈 4

Elizabeth Gilmour, Noah Plotkin, Leslie Smith

**Abstract:** Reinforcement learning (RL) is successful at learning to play games where the entire environment is visible. However, RL approaches are challenged in complex games like Starcraft II and in real-world environments where the entire environment is not visible. In these more complex games with more limited visual information, agents must choose where to look and how to optimally use their limited visual information in order to succeed at the game. We verify that with a relatively simple model the agent can learn where to look in scenarios with a limited visual bandwidth. We develop a method for masking part of the environment in Atari games to force the RL agent to learn both where to look and how to play the game in order to study where the RL agent learns to look. In addition, we develop a neural network architecture and method for allowing the agent to choose where to look and what action to take in the Pong game. Further, we analyze the strategies the agent learns to better understand how the RL agent learns to play the game.
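
A minimal sketch of the masking idea: keep only a window around the agent's chosen gaze location and zero out the rest of the frame. The window size and frame shape are illustrative assumptions.

```python
import numpy as np


def mask_observation(frame: np.ndarray, gaze_xy, window: int = 20) -> np.ndarray:
    """Zero out everything outside a square window centred on the agent's
    chosen gaze location, simulating a limited visual bandwidth."""
    masked = np.zeros_like(frame)
    x, y = gaze_xy
    h, w = frame.shape[:2]
    top, bottom = max(0, y - window), min(h, y + window)
    left, right = max(0, x - window), min(w, x + window)
    masked[top:bottom, left:right] = frame[top:bottom, left:right]
    return masked


frame = np.random.randint(0, 255, size=(84, 84), dtype=np.uint8)  # Atari-sized frame
partial = mask_observation(frame, gaze_xy=(40, 30), window=15)
print(partial.sum(), frame.sum())  # only the window retains pixel values
```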

Preventing Catastrophic Forgetting and Distribution Mismatch in Knowledge Distillation via Synthetic Data arxiv:2108.05698 📈 4

Kuluhan Binici, Nam Trung Pham, Tulika Mitra, Karianto Leman

**Abstract:** With the increasing popularity of deep learning on edge devices, compressing large neural networks to meet the hardware requirements of resource-constrained devices became a significant research direction. Numerous compression methodologies are currently being used to reduce the memory sizes and energy consumption of neural networks. Knowledge distillation (KD) is among such methodologies and it functions by using data samples to transfer the knowledge captured by a large model (teacher) to a smaller one (student). However, due to various reasons, the original training data might not be accessible at the compression stage. Therefore, data-free model compression is an ongoing research problem that has been addressed by various works. In this paper, we point out that catastrophic forgetting is a problem that can potentially be observed in existing data-free distillation methods. Moreover, the sample generation strategies in some of these methods could result in a mismatch between the synthetic and real data distributions. To prevent such problems, we propose a data-free KD framework that maintains a dynamic collection of generated samples over time. Additionally, we add the constraint of matching the real data distribution in sample generation strategies that target maximum information gain. Our experiments demonstrate that we can improve the accuracy of the student models obtained via KD when compared with state-of-the-art approaches on the SVHN, Fashion MNIST and CIFAR100 datasets.
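
A hedged sketch of two ingredients the abstract describes: the standard knowledge-distillation loss, and a memory of previously generated samples that can be replayed to counter catastrophic forgetting. The reservoir-style buffer is an illustrative stand-in, not the paper's exact sample-collection strategy.

```python
import random
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature: float = 4.0):
    """Standard KD objective: KL divergence between softened teacher and
    student distributions, scaled by T^2."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits / t, dim=1),
        reduction="batchmean",
    ) * (t * t)


class SyntheticMemory:
    """Reservoir-style buffer of previously generated samples, replayed during
    later distillation steps so the student does not forget earlier ones."""

    def __init__(self, capacity: int = 4096):
        self.capacity, self.seen, self.buffer = capacity, 0, []

    def add(self, batch: torch.Tensor):
        for x in batch:
            self.seen += 1
            if len(self.buffer) < self.capacity:
                self.buffer.append(x.detach().cpu())
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.buffer[j] = x.detach().cpu()

    def sample(self, n: int) -> torch.Tensor:
        return torch.stack(random.sample(self.buffer, min(n, len(self.buffer))))


memory = SyntheticMemory()
memory.add(torch.randn(8, 3, 32, 32))   # pretend these came from a generator
replay = memory.sample(4)
loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10))
print(replay.shape, float(loss))
```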

Extracting Semantics from Maintenance Records arxiv:2108.05454 📈 4

Sharad Dixit, Varish Mulwad, Abhinav Saxena

**Abstract:** Rapid progress in natural language processing has led to its utilization in a variety of industrial and enterprise settings, including in its use for information extraction, specifically named entity recognition and relation extraction, from documents such as engineering manuals and field maintenance reports. While named entity recognition is a well-studied problem, existing state-of-the-art approaches require large labelled datasets which are hard to acquire for sensitive data such as maintenance records. Further, industrial domain experts tend to distrust results from black box machine learning models, especially when the extracted information is used in downstream predictive maintenance analytics. We overcome these challenges by developing three approaches built on the foundation of domain expert knowledge captured in dictionaries and ontologies. We develop a syntactic and semantic rules-based approach and an approach leveraging a pre-trained language model, fine-tuned for a question-answering task on top of our base dictionary lookup to extract entities of interest from maintenance records. We also develop a preliminary ontology to represent and capture the semantics of maintenance records. Our evaluations on a real-world aviation maintenance records dataset show promising results and help identify challenges specific to named entity recognition in the context of noisy industrial data.
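
A minimal sketch of the base dictionary-lookup layer the approaches build on, with a toy maintenance dictionary; the entity types and example record are hypothetical.

```python
import re


def dictionary_lookup(text: str, dictionary: dict) -> list:
    """Base-layer entity extraction: dictionary lookup of domain terms,
    preferring longer phrases when spans overlap."""
    entities, taken = [], set()
    for phrase in sorted(dictionary, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text, flags=re.IGNORECASE):
            span = set(range(m.start(), m.end()))
            if span & taken:
                continue  # a longer phrase already covers this span
            taken |= span
            entities.append((m.group(0), dictionary[phrase], m.start(), m.end()))
    return sorted(entities, key=lambda e: e[2])


maintenance_dictionary = {
    "fuel pump": "COMPONENT",
    "hydraulic leak": "FAILURE_MODE",
    "replaced": "ACTION",
}
record = "Hydraulic leak observed near fuel pump; pump seal replaced during check."
print(dictionary_lookup(record, maintenance_dictionary))
```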

Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization arxiv:2108.05449 📈 4

Wei Zhu, Haitian Zheng, Haofu Liao, Weijian Li, Jiebo Luo

**Abstract:** Deep learning algorithms mine knowledge from the training data and thus would likely inherit the dataset's bias information. As a result, the obtained model would generalize poorly and even mislead the decision process in real-life applications. We propose to remove the bias information misused by the target task with a cross-sample adversarial debiasing (CSAD) method. CSAD explicitly extracts target and bias features disentangled from the latent representation generated by a feature extractor and then learns to discover and remove the correlation between the target and bias features. The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator. Moreover, we propose joint content and local structural representation learning to boost mutual information estimation for better performance. We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.

Seismic wave propagation and inversion with Neural Operators arxiv:2108.05421 📈 4

Yan Yang, Angela F. Gao, Jorge C. Castellanos, Zachary E. Ross, Kamyar Azizzadenesheli, Robert W. Clayton

**Abstract:** Seismic wave propagation forms the basis for most aspects of seismological research, yet solving the wave equation is a major computational burden that inhibits the progress of research. This is exacerbated by the fact that new simulations must be performed when the velocity structure or source location is perturbed. Here, we explore a prototype framework for learning general solutions using a recently developed machine learning paradigm called Neural Operator. A trained Neural Operator can compute a solution in negligible time for any velocity structure or source location. We develop a scheme to train Neural Operators on an ensemble of simulations performed with random velocity models and source locations. As Neural Operators are grid-free, it is possible to evaluate solutions on higher resolution velocity models than trained on, providing additional computational efficiency. We illustrate the method with the 2D acoustic wave equation and demonstrate the method's applicability to seismic tomography, using reverse mode automatic differentiation to compute gradients of the wavefield with respect to the velocity structure. The developed procedure is nearly an order of magnitude faster than using conventional numerical methods for full waveform inversion.

Person Re-identification via Attention Pyramid arxiv:2108.05340 📈 4

Guangyi Chen, Tianpei Gu, Jiwen Lu, Jin-An Bao, Jie Zhou

**Abstract:** In this paper, we propose an attention pyramid method for person re-identification. Unlike conventional attention-based methods which only learn a global attention map, our attention pyramid exploits the attention regions in a multi-scale manner because human attention varies with different scales. Our attention pyramid imitates the process of human visual perception which tends to notice the foreground person over the cluttered background, and further focus on the specific color of the shirt with close observation. Specifically, we describe our attention pyramid by a "split-attend-merge-stack" principle. We first split the features into multiple local parts and learn the corresponding attentions. Then, we merge local attentions and stack these merged attentions with the residual connection as an attention pyramid. The proposed attention pyramid is a lightweight plug-and-play module that can be applied to off-the-shelf models. We implement our attention pyramid method in two different attention mechanisms including channel-wise attention and spatial attention. We evaluate our method on four large-scale person re-identification benchmarks including Market-1501, DukeMTMC, CUHK03, and MSMT17. Experimental results demonstrate the superiority of our method, which outperforms the state-of-the-art methods by a large margin with limited computational cost.

Learning to Rearrange Voxels in Binary Segmentation Masks for Smooth Manifold Triangulation arxiv:2108.05269 📈 4

Jianning Li, Antonio Pepe, Christina Gsaxner, Yuan Jin, Jan Egger

**Abstract:** Medical images, especially volumetric images, are of high resolution and often exceed the capacity of standard desktop GPUs. As a result, most deep learning-based medical image analysis tasks require the input images to be downsampled, often substantially, before these can be fed to a neural network. However, downsampling can lead to a loss of image quality, which is undesirable especially in reconstruction tasks, where the fine geometric details need to be preserved. In this paper, we propose that high-resolution images can be reconstructed in a coarse-to-fine fashion, where a deep learning algorithm is only responsible for generating a coarse representation of the image, which consumes moderate GPU memory. For producing the high-resolution outcome, we propose two novel methods: learned voxel rearrangement of the coarse output and hierarchical image synthesis. Compared to the coarse output, the high-resolution counterpart allows for smooth surface triangulation, which can be 3D-printed in the highest possible quality. Experiments of this paper are carried out on the dataset of AutoImplant 2021 (https://autoimplant2021.grand-challenge.org/), a MICCAI challenge on cranial implant design. The dataset contains high-resolution skulls that can be viewed as 2D manifolds embedded in a 3D space. Codes associated with this study can be accessed at https://github.com/Jianningli/voxel_rearrangement.

Towards Practical Learned Indexing arxiv:2108.05117 📈 4

Mihail Stoian, Andreas Kipf, Ryan Marcus, Tim Kraska

**Abstract:** Latest research proposes to replace existing index structures with learned models. However, current learned indexes tend to have many hyperparameters, often do not provide any error guarantees, and are expensive to build. We introduce Practical Learned Index (PLEX). PLEX only has a single hyperparameter $ε$ (maximum prediction error) and offers a better trade-off between build and lookup time than state-of-the-art approaches. Similar to RadixSpline, PLEX consists of a spline and a (multi-level) radix layer. It first builds a spline satisfying the given $ε$ and then performs an ad-hoc analysis of the distribution of spline points to quickly tune the radix layer.
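
A simplified sketch of the error-bounded idea (spline layer only, no radix layer, and not the PLEX construction itself): build piecewise-linear segments over sorted, unique keys so that interpolation is always within ε positions, then answer lookups by interpolating and searching only a ±ε window.

```python
import bisect
import numpy as np


def build_spline(keys: np.ndarray, eps: int):
    """Greedy piecewise-linear 'spline' over sorted unique keys: within each
    segment, linear interpolation predicts every covered position within +/- eps."""
    points, start = [0], 0
    for j in range(1, len(keys)):
        slope = (j - start) / (keys[j] - keys[start])
        pred = start + slope * (keys[start:j + 1] - keys[start])
        if np.max(np.abs(pred - np.arange(start, j + 1))) > eps:
            points.append(j - 1)   # close the previous (verified) segment
            start = j - 1
    points.append(len(keys) - 1)
    return sorted(set(points))


def lookup(keys: np.ndarray, spline: list, q, eps: int):
    """Predict the position of q from the spline, then search only +/- eps keys."""
    i = bisect.bisect_right([keys[p] for p in spline], q) - 1
    i = max(0, min(i, len(spline) - 2))
    a, b = spline[i], spline[i + 1]
    frac = (q - keys[a]) / (keys[b] - keys[a])
    pred = int(round(a + frac * (b - a)))
    lo, hi = max(0, pred - eps), min(len(keys), pred + eps + 1)
    return lo + int(np.searchsorted(keys[lo:hi], q))


keys = np.unique(np.random.default_rng(0).integers(0, 10**7, size=20000))
eps = 32
spline = build_spline(keys, eps)
q = keys[1234]
pos = lookup(keys, spline, q, eps)
print(len(spline), pos, keys[pos] == q)
```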

E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling arxiv:2108.13298 📈 3

Javier Albert, Dmitri Goldenberg

**Abstract:** Promotions and discounts are essential components of modern e-commerce platforms, where they are often used to incentivize customers towards purchase completion. Promotions also affect revenue and may incur a monetary loss that is often limited by a dedicated promotional budget. We study the Online Constrained Multiple-Choice Promotions Personalization Problem, where the optimization goal is to select for each customer which promotion to present in order to maximize purchase completions, while also complying with global budget limitations. Our work formalizes the problem as an Online Multiple Choice Knapsack Problem and extends the existing literature by addressing cases with negative weights and values. We provide a real-time adaptive method that guarantees budget constraint compliance and achieves above 99.7% of the optimal promotional impact on various datasets. Our method is evaluated on a large-scale experimental study at one of the leading online travel platforms in the world.

The Contextual Appointment Scheduling Problem arxiv:2108.05531 📈 3

Nima Salehi Sadghiani, Saeid Motiian

**Abstract:** This study is concerned with the determination of optimal appointment times for a sequence of jobs with uncertain duration. We investigate the data-driven Appointment Scheduling Problem (ASP) when one has $n$ observations of $p$ features (covariates) related to the jobs as well as historical data. We formulate ASP as an Integrated Estimation and Optimization problem using a task-based loss function. We justify the use of contexts by showing that not including them leads to inconsistent decisions, which translates to sub-optimal appointments. We validate our approach through two numerical experiments.

Text Anchor Based Metric Learning for Small-footprint Keyword Spotting arxiv:2108.05516 📈 3

Li Wang, Rongzhi Gu, Nuo Chen, Yuexian Zou

**Abstract:** Achieving the trade-off between small footprint and high accuracy remains challenging for Keyword Spotting (KWS). Recently proposed metric learning approaches improved the generalizability of models for the KWS task, and 1D-CNN based KWS models have achieved the state-of-the-art (SOTA) in terms of model size. However, for metric learning, due to data limitations, the speech anchor is highly susceptible to the acoustic environment and speakers. Also, we note that the 1D-CNN models have limited capability to capture long-term temporal acoustic features. To address the above problems, we propose to utilize text anchors to improve the stability of anchors. Furthermore, a new type of model (LG-Net) is exquisitely designed to promote long-short term acoustic feature modeling based on 1D-CNN and self-attention. Experiments are conducted on Google Speech Commands Dataset version 1 (GSCDv1) and 2 (GSCDv2). The results demonstrate that the proposed text anchor based metric learning method shows consistent improvements over speech anchors on representative CNN-based models. Moreover, our LG-Net model achieves SOTA accuracy of 97.67% and 96.79% on the two datasets, respectively. It is encouraging to see that our lighter LG-Net with only 74k parameters obtains 96.82% KWS accuracy on the GSCDv1 and 95.77% KWS accuracy on the GSCDv2.

ProAI: An Efficient Embedded AI Hardware for Automotive Applications -- a Benchmark Study arxiv:2108.05170 📈 3

Sven Mantowsky, Falk Heuer, Syed Saqib Bukhari, Michael Keckeisen, Georg Schneider

**Abstract:** Developments in the field of Single Board Computers (SBCs) have been increasing for several years. They provide a good balance between computing performance and power consumption, which is usually required for mobile platforms, like applications in vehicles for Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD). However, there is an ever-increasing need for more powerful and efficient SBCs which can run power-intensive Deep Neural Networks (DNNs) in real-time and can also satisfy necessary functional safety requirements such as Automotive Safety Integrity Level (ASIL). ProAI is being developed by ZF mainly to run powerful and efficient applications such as multitask DNNs, and on top of that it also has the required safety certification for AD. In this work, we compare and discuss state-of-the-art SBCs on the basis of a power-intensive multitask DNN architecture called Multitask-CenterNet with respect to performance measures such as FPS and power efficiency. As an automotive supercomputer, ProAI delivers an excellent combination of performance and efficiency, managing nearly twice the number of FPS per watt compared to a modern workstation laptop and almost four times compared to the Jetson Nano. Furthermore, it was also shown that there is still power in reserve for further and more complex tasks on the ProAI, based on the CPU and GPU utilization during the benchmark.

Cervical Optical Coherence Tomography Image Classification Based on Contrastive Self-Supervised Texture Learning arxiv:2108.05081 📈 3

Kaiyi Chen, Qingbin Wang, Yutao Ma

**Abstract:** Background: Cervical cancer seriously affects the health of the female reproductive system. Optical coherence tomography (OCT) emerged as a non-invasive, high-resolution imaging technology for cervical disease detection. However, OCT image annotation is knowledge-intensive and time-consuming, which impedes the training process of deep-learning-based classification models. Purpose: This study aims to develop a computer-aided diagnosis (CADx) approach to classifying in-vivo cervical OCT images based on self-supervised learning. Methods: In addition to high-level semantic features extracted by a convolutional neural network (CNN), the proposed CADx approach leverages unlabeled cervical OCT images' texture features learned by contrastive texture learning. We conducted ten-fold cross-validation on the OCT image dataset from a multi-center clinical study on 733 patients from China. Results: In a binary classification task for detecting high-risk diseases, including high-grade squamous intraepithelial lesion and cervical cancer, our method achieved an area-under-the-curve value of 0.9798 ± 0.0157 with a sensitivity of 91.17 ± 4.99% and a specificity of 93.96 ± 4.72% for OCT image patches; also, it outperformed two out of four medical experts on the test set. Furthermore, our method achieved a 91.53% sensitivity and 97.37% specificity on an external validation dataset containing 287 3D OCT volumes from 118 Chinese patients in a new hospital using a cross-shaped threshold voting strategy. Conclusions: The proposed contrastive-learning-based CADx method outperformed the end-to-end CNN models and provided better interpretability based on texture features, which holds great potential to be used in the clinical protocol of "see-and-treat."

Tracking Hand Hygiene Gestures with Leap Motion Controller arxiv:2109.00884 📈 2

Rashmi Bakshi, Jane Courtney, Damon Berry, Graham Gavin

**Abstract:** The process of hand washing, according to the WHO, is divided into stages with clearly defined two handed dynamic gestures. In this paper, videos of hand washing experts are segmented and analyzed with the goal of extracting their corresponding features. These features can be further processed in software to classify particular hand movements, determine whether the stages have been successfully completed by the user and also assess the quality of washing. Having identified the important features, a 3D gesture tracker, the Leap Motion Controller (LEAP), was used to track and detect the hand features associated with these stages. With the help of sequential programming and threshold values, the hand features were combined together to detect the initiation and completion of a sample WHO Stage 2 (Rub hands Palm to Palm). The LEAP provides accurate raw positional data for tracking single hand gestures and two hands in separation but suffers from occlusion when hands are in contact. Other than hand hygiene the approaches shown here can be applied in other biomedical applications requiring close hand gesture analysis.

IT2CFNN: An Interval Type-2 Correlation-Aware Fuzzy Neural Network to Construct Non-Separable Fuzzy Rules with Uncertain and Adaptive Shapes for Nonlinear Function Approximation arxiv:2108.08704 📈 2

Armin Salimi-Badr

**Abstract:** In this paper, a new interval type-2 fuzzy neural network able to construct non-separable fuzzy rules with adaptive shapes is introduced. To reflect the uncertainty, the shape of the fuzzy sets is considered to be uncertain. Therefore, a new form of interval type-2 fuzzy sets based on a general Gaussian model able to construct different shapes (including triangular, bell-shaped, trapezoidal) is proposed. To consider the interactions among input variables, input vectors are transformed to new feature spaces with uncorrelated variables proper for defining each fuzzy rule. Next, the new features are fed to a fuzzification layer using the proposed interval type-2 fuzzy sets with adaptive shape. Consequently, interval type-2 non-separable fuzzy rules with proper shapes, considering the local interactions of variables and the uncertainty, are formed. For type reduction, the contributions of the upper and lower firing strengths of each fuzzy rule are adaptively selected separately. To train the different parameters of the network, the Levenberg-Marquardt optimization method is utilized. The performance of the proposed method is investigated on clean and noisy datasets to show its ability to consider the uncertainty. Moreover, the proposed paradigm is successfully applied to real-world time-series prediction, regression problems, and nonlinear system identification. According to the experimental results, the performance of our proposed model outperforms other methods with a more parsimonious structure.

Learning to Segment Medical Images from Few-Shot Sparse Labels arxiv:2108.05476 📈 2

Pedro H. T. Gama, Hugo Oliveira, Jefersson A. dos Santos

**Abstract:** In this paper, we propose a novel approach for few-shot semantic segmentation with sparse labeled images. We investigate the effectiveness of our method, which is based on the Model-Agnostic Meta-Learning (MAML) algorithm, in the medical scenario, where the use of sparse labeling and few-shot can alleviate the cost of producing new annotated datasets. Our method uses sparse labels in the meta-training and dense labels in the meta-test, thus making the model learn to predict dense labels from sparse ones. We conducted experiments with four Chest X-Ray datasets to evaluate two types of annotations (grid and points). The results show that our method is the most suitable when the target domain highly differs from source domains, achieving Jaccard scores comparable to dense labels, using less than 2% of the pixels of an image with labels in few-shot scenarios.
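
A hedged sketch of two pieces the abstract implies: a segmentation loss computed only on sparsely annotated pixels, and a single first-order MAML-style inner adaptation step. The toy model and the 2% label mask are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn.functional as F


def sparse_ce_loss(logits, labels, label_mask):
    """Cross-entropy computed only on the sparsely annotated pixels
    (grid or point annotations); unlabeled pixels are ignored."""
    logits = logits.permute(0, 2, 3, 1)[label_mask]   # (n_labeled, n_classes)
    return F.cross_entropy(logits, labels[label_mask])


def maml_inner_step(model, x, y, mask, lr_inner=0.01):
    """One first-order MAML-style adaptation step on a support task with
    sparse labels; returns adapted parameters as a list of tensors."""
    loss = sparse_ce_loss(model(x), y, mask)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return [p - lr_inner * g for p, g in zip(model.parameters(), grads)]


# Toy setup: a tiny segmentation head, a 2-class task, and ~2% labeled pixels.
model = torch.nn.Conv2d(1, 2, kernel_size=3, padding=1)
x = torch.randn(4, 1, 64, 64)
y = torch.randint(0, 2, (4, 64, 64))
mask = torch.rand(4, 64, 64) < 0.02
adapted_params = maml_inner_step(model, x, y, mask)
print(sparse_ce_loss(model(x), y, mask).item(), len(adapted_params))
```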

Deep PET/CT fusion with Dempster-Shafer theory for lymphoma segmentation arxiv:2108.05422 📈 2

Ling Huang, Thierry Denoeux, David Tonnelet, Pierre Decazes, Su Ruan

**Abstract:** Lymphoma detection and segmentation from whole-body Positron Emission Tomography/Computed Tomography (PET/CT) volumes are crucial for surgical indication and radiotherapy. Designing automatic segmentation methods capable of effectively exploiting the information from PET and CT as well as resolving their uncertainty remains a challenge. In this paper, we propose a lymphoma segmentation model using a UNet with an evidential PET/CT fusion layer. Single-modality volumes are trained separately to get initial segmentation maps, and an evidential fusion layer is proposed to fuse the two pieces of evidence using Dempster-Shafer theory (DST). Moreover, a multi-task loss function is proposed: in addition to the use of the Dice loss for PET and CT segmentation, a loss function based on the concordance between the two segmentations is added to constrain the final segmentation. We evaluate our proposal on a database of polycentric PET/CT volumes of patients treated for lymphoma, delineated by experts. Our method gets accurate segmentation results with a Dice score of 0.726, without any user interaction. Quantitative results show that our method is superior to the state-of-the-art methods.
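
For readers unfamiliar with DST, the sketch below applies Dempster's rule of combination to two per-voxel mass functions (one from the PET branch, one from the CT branch); the mass values are illustrative, and the paper's evidential fusion layer is learned rather than hand-set.

```python
from itertools import product


def dempster_combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule of combination for two mass functions defined on
    subsets of a frame of discernment (subsets encoded as frozensets)."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Per-voxel evidence from the PET and CT branches (illustrative numbers):
# masses on {lymphoma}, {background}, and the ignorance set {lymphoma, background}.
L, B = frozenset({"lymphoma"}), frozenset({"background"})
m_pet = {L: 0.6, B: 0.1, L | B: 0.3}
m_ct = {L: 0.4, B: 0.3, L | B: 0.3}
print(dempster_combine(m_pet, m_ct))
```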

Ontology drift is a challenge for explainable data governance arxiv:2108.05401 📈 2

Jiahao Chen

**Abstract:** We introduce the needs for explainable AI that arise from Standard No. 239 from the Basel Committee on Banking Standards (BCBS 239), which outlines 11 principles for effective risk data aggregation and risk reporting for financial institutions. Of these, explainable AI is necessary for compliance in two key aspects: data quality, and appropriate reporting for multiple stakeholders. We describe the implementation challenges for one specific regulatory requirement: that of having a complete data taxonomy that is appropriate for firmwide use. The constantly evolving nature of financial ontologies necessitates a continuous updating process to ensure ongoing compliance.

Controlling the False Split Rate in Tree-Based Aggregation arxiv:2108.05350 📈 2

Simeng Shao, Jacob Bien, Adel Javanmard

**Abstract:** In many domains, data measurements can naturally be associated with the leaves of a tree, expressing the relationships among these measurements. For example, companies belong to industries, which in turn belong to ever coarser divisions such as sectors; microbes are commonly arranged in a taxonomic hierarchy from species to kingdoms; street blocks belong to neighborhoods, which in turn belong to larger-scale regions. The problem of tree-based aggregation that we consider in this paper asks which of these tree-defined subgroups of leaves should really be treated as a single entity and which of these entities should be distinguished from each other. We introduce the "false split rate", an error measure that describes the degree to which subgroups have been split when they should not have been. We then propose a multiple hypothesis testing algorithm for tree-based aggregation, which we prove controls this error measure. We focus on two main examples of tree-based aggregation, one which involves aggregating means and the other which involves aggregating regression coefficients. We apply this methodology to aggregate stocks based on their volatility and to aggregate neighborhoods of New York City based on taxi fares.

A Better Loss for Visual-Textual Grounding arxiv:2108.05308 📈 2

Davide Rigoni, Luciano Serafini, Alessandro Sperduti

**Abstract:** Given a textual phrase and an image, the visual grounding problem is defined as the task of locating the content of the image referenced by the sentence. It is a challenging task that has several real-world applications in human-computer interaction, image-text reference resolution, and video-text reference resolution. In recent years, several works have addressed this problem with heavy and complex models that try to capture visual-textual dependencies better than before. These models are typically constituted by two main components that focus on how to learn useful multi-modal features for grounding and how to improve the predicted bounding box of the visual mention, respectively. Finding the right learning balance between these two sub-tasks is not easy, and the current models are not necessarily optimal with respect to this issue. In this work, we propose a model that, although using a simple multi-modal feature fusion component, is able to achieve a higher accuracy than state-of-the-art models thanks to the adoption of a more effective loss function, based on the class probabilities, which reaches, on the considered datasets, a better learning balance between the two sub-tasks mentioned above.

DeliData: A dataset for deliberation in multi-party problem solving arxiv:2108.05271 📈 2

Georgi Karadzhov, Tom Stafford, Andreas Vlachos

**Abstract:** Dialogue systems research is traditionally focused on dialogues between two interlocutors, largely ignoring group conversations. Moreover, most previous research is focused either on task-oriented dialogue (e.g. restaurant bookings) or user engagement (chatbots), while research on systems for collaborative dialogues is an under-explored area. To this end, we introduce the first publicly available dataset containing collaborative conversations on solving a cognitive task, consisting of 500 group dialogues and 14k utterances. Furthermore, we propose a novel annotation schema that captures deliberation cues and release 50 dialogues annotated with it. Finally, we demonstrate the usefulness of the annotated data in training classifiers to predict the constructiveness of a conversation. The data collection platform, dataset and annotated corpus are publicly available at https://delibot.xyz

Fast predictions of lattice energies by continuous isometry invariants of crystal structures arxiv:2108.07233 📈 1

Jakob Ropers, Marco M Mosca, Olga Anosova, Vitaliy Kurlin, Andrew I Cooper

**Abstract:** Crystal Structure Prediction (CSP) aims to discover solid crystalline materials by optimizing periodic arrangements of atoms, ions or molecules. CSP takes weeks of supercomputer time because of slow energy minimizations for millions of simulated crystals. The lattice energy is a key physical property, which determines thermodynamic stability of a crystal but has no simple analytic expression. Past machine learning approaches to predict the lattice energy used slow crystal descriptors depending on manually chosen parameters. The new area of Periodic Geometry offers much faster isometry invariants that are also continuous under perturbations of atoms. Our experiments on simulated crystals confirm that a small distance between the new invariants guarantees a small difference of energies. We compare several kernel methods for invariant-based predictions of energy and achieve a mean absolute error of less than 5 kJ/mol or 0.05 eV/atom on a dataset of 5679 crystals.
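To make the kernel-based prediction step concrete, here is a minimal sketch using scikit-learn's kernel ridge regression on synthetic feature vectors standing in for the continuous isometry invariants; the feature dimension, kernel, and hyperparameters are arbitrary choices, not the paper's.

```python
# Illustrative only: kernel ridge regression from placeholder invariant vectors
# to lattice energies; the data here are synthetic, not real crystals.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 32))                                 # stand-in invariant vectors
y = X @ rng.normal(size=32) + rng.normal(scale=0.1, size=500)  # synthetic "energies"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.05).fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```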

Local Correlation Clustering with Asymmetric Classification Errors arxiv:2108.05697 📈 1

Jafar Jafarov, Sanchit Kalhan, Konstantin Makarychev, Yury Makarychev

**Abstract:** In the Correlation Clustering problem, we are given a complete weighted graph $G$ with its edges labeled as "similar" and "dissimilar" by a noisy binary classifier. For a clustering $\mathcal{C}$ of graph $G$, a similar edge is in disagreement with $\mathcal{C}$, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to the same cluster. The disagreements vector, $\text{dis}$, is a vector indexed by the vertices of $G$ such that the $v$-th coordinate $\text{dis}_v$ equals the weight of all disagreeing edges incident on $v$. The goal is to produce a clustering that minimizes the $\ell_p$ norm of the disagreements vector for $p\geq 1$. We study the $\ell_p$ objective in Correlation Clustering under the following assumption: Every similar edge has weight in the range of $[\alpha\mathbf{w},\mathbf{w}]$ and every dissimilar edge has weight at least $\alpha\mathbf{w}$ (where $\alpha\leq 1$ and $\mathbf{w}>0$ is a scaling parameter). We give an $O\left((\frac{1}{\alpha})^{\frac{1}{2}-\frac{1}{2p}}\cdot \log\frac{1}{\alpha}\right)$ approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.
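The definitions above translate directly into code; the following sketch computes the disagreements vector and its $\ell_p$ norm for a hand-picked toy graph and clustering (the edges, weights, and clustering are invented for illustration).

```python
# Direct illustration of the definitions in the abstract: for a clustering C,
# compute the disagreements vector dis and its l_p norm on a tiny weighted graph.
import numpy as np

# Edges: (u, v, weight, label) with label "similar" or "dissimilar".
edges = [(0, 1, 1.0, "similar"), (1, 2, 0.5, "dissimilar"),
         (0, 2, 0.8, "similar"), (2, 3, 1.2, "similar")]
cluster_of = {0: "A", 1: "A", 2: "B", 3: "B"}  # an example clustering

dis = np.zeros(4)
for u, v, w, label in edges:
    same_cluster = cluster_of[u] == cluster_of[v]
    disagrees = (label == "similar" and not same_cluster) or \
                (label == "dissimilar" and same_cluster)
    if disagrees:
        dis[u] += w
        dis[v] += w

p = 2
print("dis =", dis, " ||dis||_p =", np.linalg.norm(dis, ord=p))
```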

Correlation Clustering with Asymmetric Classification Errors arxiv:2108.05696 📈 1

Jafar Jafarov, Sanchit Kalhan, Konstantin Makarychev, Yury Makarychev

**Abstract:** In the Correlation Clustering problem, we are given a weighted graph $G$ with its edges labeled as "similar" or "dissimilar" by a binary classifier. The goal is to produce a clustering that minimizes the weight of "disagreements": the sum of the weights of "similar" edges across clusters and "dissimilar" edges within clusters. We study the correlation clustering problem under the following assumption: Every "similar" edge $e$ has weight $\mathbf{w}_e\in[\alpha\mathbf{w}, \mathbf{w}]$ and every "dissimilar" edge $e$ has weight $\mathbf{w}_e\geq \alpha\mathbf{w}$ (where $\alpha\leq 1$ and $\mathbf{w}>0$ is a scaling parameter). We give a $(3 + 2\log_e(1/\alpha))$ approximation algorithm for this problem. This assumption captures well the scenario when classification errors are asymmetric. Additionally, we show an asymptotically matching Linear Programming integrality gap of $\Omega(\log(1/\alpha))$.

Composition Machines: Programming Self-Organising Software Models for the Emergence of Sequential Program Spaces arxiv:2108.05402 📈 1

Damian Arellanes

**Abstract:** We are entering a new era in which software systems are becoming ever larger and more complex, so composing them by manual means is becoming infeasible. To address this challenge, self-organising software models represent a promising direction, since they allow the (bottom-up) emergence of complex computational structures from simple rules. In this paper, we propose an abstract machine, called the composition machine, which allows the definition and the execution of such models. Unlike typical abstract machines, our proposal does not compute individual programs but enables the emergence of multiple programs at once. In particular, we present the machine's semantics and provide examples to demonstrate its operation with well-known rules from the realm of Boolean logic and elementary cellular automata.
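The composition machine itself is not reproduced here, but the abstract's reference to elementary cellular automata can be made concrete: the snippet below runs a few steps of Rule 110, the kind of simple local rule from which complex structure emerges bottom-up.

```python
# Not the composition machine: just an elementary cellular automaton (Rule 110),
# an example of the simple rules the abstract mentions.
RULE = 110
rule_bits = [(RULE >> i) & 1 for i in range(8)]  # output for neighborhood value 0..7

def step(cells):
    n = len(cells)
    return [rule_bits[(cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]]
            for i in range(n)]

cells = [0] * 31
cells[15] = 1  # single seed cell
for _ in range(15):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```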

The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study arxiv:2108.05341 📈 1

Mohammad Masudur Rahman, Foutse Khomh, Shamima Yeasmin, Chanchal K. Roy

**Abstract:** Being light-weight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches heavily depends on their used bug reports. A significant number of bug reports contain only plain natural language texts. According to existing studies, IR-based approaches cannot perform well when they use these bug reports as search queries. However, recent evidence suggests that even these natural language-only reports contain enough good keywords to help localize the bugs successfully. On one hand, these findings suggest that natural language-only bug reports might be a sufficient source for good query keywords. On the other hand, they cast serious doubt on the query selection practices in IR-based bug localization. In this article, we attempt to settle this question by conducting an in-depth empirical study that critically examines the state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, use a Genetic Algorithm-based approach to construct optimal or near-optimal search queries from these bug reports, and then answer three research questions. We confirmed that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural language-only bug reports although they contain such queries. We also demonstrate that optimal queries and non-optimal queries chosen from bug report texts are significantly different in terms of several keyword characteristics, which has led us to actionable insights. Furthermore, we demonstrate 27%--34% improvement in the performance of non-optimal queries through the application of our actionable insights to them.
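As a hypothetical baseline (not one of the ten approaches studied in the article), the sketch below extracts the top TF-IDF keywords of a natural language-only bug report as a candidate search query; the bug reports are invented.

```python
# Hypothetical query-construction baseline: top TF-IDF terms of a bug report.
from sklearn.feature_extraction.text import TfidfVectorizer

bug_reports = [
    "Application crashes when saving a large project file to a network drive",
    "Login button stays disabled after entering valid credentials",
    "Export to PDF produces blank pages for documents with embedded images",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(bug_reports)
terms = vectorizer.get_feature_names_out()

report_idx, top_k = 0, 5
row = tfidf[report_idx].toarray().ravel()
query = [terms[i] for i in row.argsort()[::-1][:top_k]]
print("query:", " ".join(query))
```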

Convergence bounds for nonlinear least squares and applications to tensor recovery arxiv:2108.05237 📈 1

Philipp Trunschke

**Abstract:** We consider the problem of approximating a function in general nonlinear subsets of $L^2$ when only a weighted Monte Carlo estimate of the $L^2$-norm can be computed. Of particular interest in this setting is the concept of sample complexity, the number of samples that are necessary to recover the best approximation. Bounds for this quantity were derived in previous work; they depend primarily on the model class and do not benefit from the regularity of the sought function. This result, however, is only a worst-case bound and cannot explain the remarkable performance of iterative hard thresholding algorithms observed in practice. We reexamine the results of the previous paper and derive a new bound that is able to utilize the regularity of the sought function. A critical analysis of our results allows us to derive a sample-efficient algorithm for the model set of low-rank tensors. The viability of this algorithm is demonstrated by recovering quantities of interest for a classical high-dimensional random partial differential equation.
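For readers unfamiliar with the setting, a weighted Monte Carlo estimate of the $L^2$-norm typically takes the following generic form (notation ours, not necessarily the paper's):

```latex
% Generic weighted empirical L^2-norm: draw x_1,...,x_n i.i.d. from a sampling
% density rho and reweight so the estimate is unbiased for the true norm.
\[
  \|u\|_{n,w}^{2} \;=\; \frac{1}{n}\sum_{i=1}^{n} w(x_i)\,\lvert u(x_i)\rvert^{2},
  \qquad
  w(x) = \frac{(\mathrm{d}\mu/\mathrm{d}x)(x)}{\rho(x)},
  \qquad
  \mathbb{E}\!\left[\|u\|_{n,w}^{2}\right] = \|u\|_{L^2(\mu)}^{2}.
\]
```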

Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering arxiv:2108.05158 📈 1

Donggeon Lee, Seongho Choi, Youwon Jang, Byoung-Tak Zhang

**Abstract:** Video question answering has recently received a lot of attention from multimodal video researchers. Most video question answering datasets are in multiple-choice form. However, a model for the multiple-choice task does not infer the answer; rather, it compares the answer candidates to pick the correct one. This also makes it difficult to extend to other tasks. In this paper, we challenge the existing multiple-choice video question answering by changing it to open-ended video question answering. To tackle open-ended question answering, we use the pretrained GPT2 model. The model is fine-tuned with video inputs and subtitles. An ablation study is performed by converting the existing DramaQA dataset to an open-ended question answering format, and it shows that performance can be improved using video metadata.
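A rough sketch of the text side of such a setup (video features omitted, and the prompt format is our own assumption, not the authors'): condition GPT2 on subtitles plus a question and generate a free-form answer with Hugging Face Transformers.

```python
# Sketch only: open-ended answering from subtitles + question with vanilla GPT2.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Illustrative subtitle and question, not taken from the DramaQA dataset.
subtitles = "Dokyung hands Haeyoung an umbrella as rain starts to fall."
question = "Why does Dokyung give Haeyoung the umbrella?"
prompt = f"Subtitles: {subtitles}\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```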

Prioritized SIPP for Multi-Agent Path Finding With Kinematic Constraints arxiv:2108.05145 📈 1

Zain Alabedeen Ali, Konstantin Yakovlev

**Abstract:** Multi-Agent Path Finding (MAPF) is a long-standing problem in Robotics and Artificial Intelligence in which one needs to find a set of collision-free paths for a group of mobile agents (robots) operating in the shared workspace. Due to its importance, the problem is well-studied and multiple optimal and approximate algorithms are known. However, many of them abstract away from the kinematic constraints and assume that the agents can accelerate/decelerate instantaneously. This complicates the application of the algorithms on the real robots. In this paper, we present a method that mitigates this issue to a certain extent. The suggested solver is essentially a prioritized planner based on the well-known Safe Interval Path Planning (SIPP) algorithm. Within SIPP we explicitly reason about speed and acceleration; thus, the constructed plans directly take the agents' kinematic constraints into account. We suggest a range of heuristic functions for this setting and conduct a thorough empirical evaluation of the suggested algorithm.
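The "safe interval" notion at the core of SIPP can be illustrated in a few lines: given the time intervals during which a location is occupied by other agents, the safe intervals are simply the complementary free intervals (the full planner, priorities, and kinematic reasoning are not shown here).

```python
# Illustration of the safe-interval idea underlying SIPP, not the full planner.
import math

def safe_intervals(blocked, horizon=math.inf):
    """blocked: sorted, non-overlapping (start, end) occupation intervals."""
    intervals, t = [], 0.0
    for start, end in blocked:
        if start > t:
            intervals.append((t, start))
        t = max(t, end)
    if t < horizon:
        intervals.append((t, horizon))
    return intervals

# Cell is occupied during [2, 4) and [7, 9); it is safe to occupy it otherwise.
print(safe_intervals([(2, 4), (7, 9)]))   # [(0.0, 2), (4, 7), (9, inf)]
```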

Overview of the TREC 2020 Fair Ranking Track arxiv:2108.05135 📈 1

Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sergey Feldman, Sebastian Kohlmeier

**Abstract:** This paper provides an overview of the NIST TREC 2020 Fair Ranking track. For 2020, we again adopted an academic search task, where we have a corpus of academic article abstracts and queries submitted to a production academic search engine. The central goal of the Fair Ranking track is to provide fair exposure to different groups of authors (a group fairness framing). We recognize that there may be multiple group definitions (e.g. based on demographics, stature, topic) and hoped for the systems to be robust to these. We expected participants to develop systems that optimize for fairness and relevance for arbitrary group definitions, and did not reveal the exact group definitions until after the evaluation runs were submitted. The track contains two tasks, reranking and retrieval, with a shared evaluation.

Capture Uncertainties in Deep Neural Networks for Safe Operation of Autonomous Driving Vehicles arxiv:2108.05118 📈 1

Liuhui Ding, Dachuan Li, Bowen Liu, Wenxing Lan, Bing Bai, Qi Hao, Weipeng Cao, Ke Pei

**Abstract:** Uncertainties in Deep Neural Network (DNN)-based perception and vehicle's motion pose challenges to the development of safe autonomous driving vehicles. In this paper, we propose a safe motion planning framework featuring the quantification and propagation of DNN-based perception uncertainties and motion uncertainties. Contributions of this work are twofold: (1) A Bayesian Deep Neural network model which detects 3D objects and quantitatively captures the associated aleatoric and epistemic uncertainties of DNNs; (2) An uncertainty-aware motion planning algorithm (PU-RRT) that accounts for uncertainties in object detection and ego-vehicle's motion. The proposed approaches are validated via simulated complex scenarios built in CARLA. Experimental results show that the proposed motion planning scheme can cope with uncertainties of DNN-based perception and vehicle motion, and improve the operational safety of autonomous vehicles while still achieving desirable efficiency.
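The paper builds a Bayesian deep network; as a commonly used approximation of the same idea, the sketch below applies Monte Carlo dropout to a placeholder regression head and reads an epistemic uncertainty estimate off the spread of stochastic forward passes.

```python
# MC dropout as a stand-in for the paper's Bayesian DNN: keep dropout active at
# inference time and use the variance across passes as an uncertainty estimate.
import torch
import torch.nn as nn

head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Dropout(p=0.2),
                     nn.Linear(64, 7))   # e.g. a placeholder 3D box regression head

features = torch.randn(1, 128)           # placeholder backbone features
head.train()                             # keep dropout stochastic

with torch.no_grad():
    samples = torch.stack([head(features) for _ in range(50)])  # 50 stochastic passes

mean_box = samples.mean(dim=0)
epistemic_var = samples.var(dim=0)       # spread across passes ~ model uncertainty
print(mean_box.shape, epistemic_var.mean().item())
```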

Unsupervised Driver Behavior Profiling leveraging Recurrent Neural Networks arxiv:2108.05079 📈 1

Young Ah Choi, Kyung Ho Park, Eunji Park, Huy Kang Kim

**Abstract:** In the era of intelligent transportation, driver behavior profiling has become a beneficial technology as it provides knowledge regarding the driver's aggressiveness. Previous approaches achieved promising driver behavior profiling performance by establishing statistical heuristic rules or supervised learning-based models. Still, these approaches have limits: the practitioner must prepare a labeled dataset, and they cannot classify aggressive behaviors that are not known a priori. To address these drawbacks, we propose a novel approach to driver behavior profiling leveraging an unsupervised learning paradigm. First, we cast the driver behavior profiling problem as anomaly detection. Second, we build recurrent neural networks that predict the next feature vector given a sequence of feature vectors. We train the model with normal driver data only. As a result, our model yields high regression error given a sequence of aggressive driver behavior and low error given a sequence of normal driver behavior. We found that this difference in error between normal and aggressive driver behavior can serve as an adequate flag for driver behavior profiling, and we achieved precise performance in experiments. Lastly, we further analyzed the optimal sequence length for identifying each aggressive driver behavior. We expect the proposed approach to be a useful baseline for unsupervised driver behavior profiling and to contribute to an efficient, intelligent transportation ecosystem.
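A minimal sketch of the described setup, assuming synthetic feature windows and an arbitrary error threshold: an LSTM is trained to predict the next feature vector from a window of past ones, and sequences with high prediction error are flagged as aggressive.

```python
# Next-step prediction as anomaly detection; data and threshold are synthetic.
import torch
import torch.nn as nn

class NextStepLSTM(nn.Module):
    def __init__(self, dim=6, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, dim)

    def forward(self, x):                  # x: (batch, time, dim)
        h, _ = self.lstm(x)
        return self.out(h[:, -1])          # predicted next feature vector

model = NextStepLSTM()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder "normal driving" windows and their next-step targets.
windows = torch.randn(256, 20, 6) * 0.1
targets = torch.randn(256, 6) * 0.1
for _ in range(5):                         # a few illustrative training steps
    optim.zero_grad()
    loss = loss_fn(model(windows), targets)
    loss.backward()
    optim.step()

# Flag a sequence as aggressive if its prediction error exceeds a chosen threshold.
error = loss_fn(model(windows[:1]), targets[:1]).item()
print("aggressive" if error > 0.05 else "normal", round(error, 4))
```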

NI-UDA: Graph Adversarial Domain Adaptation from Non-shared-and-Imbalanced Big Data to Small Imbalanced Applications arxiv:2108.05061 📈 1

Guangyi Xiao, Weiwei Xiang, Huan Liu, Hao Chen, Shun Peng, Jingzhi Guo, Zhiguo Gong

**Abstract:** We propose a new general Graph Adversarial Domain Adaptation (GADA) approach based on semantic knowledge reasoning of class structure for solving the problem of unsupervised domain adaptation (UDA) from big data with non-shared and imbalanced classes to specified small and imbalanced applications (NI-UDA), where non-shared classes mean the label space outside the target domain. Our goal is to leverage prior hierarchy knowledge to enhance domain-adversarially aligned feature representations with graph reasoning. In this paper, to address the two challenges in NI-UDA, we equip adversarial domain adaptation with a Hierarchy Graph Reasoning (HGR) layer and a Source Classifier Filter (SCF). For the sparse-class transfer challenge, our HGR layer aggregates local features to hierarchy graph nodes by node prediction and enhances domain-adversarially aligned features with hierarchy graph reasoning for sparse classes. Our HGR layer learns direct semantic patterns for sparse classes via hierarchy attention in self-attention, non-linear mapping, and graph normalization. Our SCF is proposed to address the challenge of sharing knowledge from non-shared data without negative transfer effects, by filtering low-confidence non-shared data in the HGR layer. Experiments on two benchmark datasets show that our GADA methods consistently improve state-of-the-art adversarial UDA algorithms, e.g. GADA(HGR) greatly improves the f1 of MDD by \textbf{7.19\%} and of GVB-GD by \textbf{7.89\%} on the imbalanced source task in the Meal300 dataset. The code is available at https://gadatransfer.wixsite.com/gada.

MultiTask-CenterNet (MCN): Efficient and Diverse Multitask Learning using an Anchor Free Approach arxiv:2108.05060 📈 1

Falk Heuer, Sven Mantowsky, Syed Saqib Bukhari, Georg Schneider

**Abstract:** Multitask learning is a common approach in machine learning which allows training multiple objectives with a shared architecture. It has been shown that by training multiple tasks together, inference time and compute resources can be saved, while the objectives' performance remains at a similar or even higher level. However, in perception-related multitask networks, only closely related tasks are typically found, such as object detection, instance and semantic segmentation, or depth estimation. Multitask networks with diverse tasks, and their effects on one another with respect to efficiency, are not well studied. In this paper we augment the CenterNet anchor-free approach to train multiple diverse perception-related tasks together, including object detection and semantic segmentation as well as human pose estimation. We refer to this DNN as Multitask-CenterNet (MCN). Additionally, we study different MCN settings for efficiency. The MCN can perform several tasks at once while maintaining, and in some cases even exceeding, the performance values of its corresponding single-task networks. More importantly, the MCN architecture decreases inference time and reduces network size when compared to a composition of single-task networks.
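The general pattern behind such a network, one shared backbone feeding several task-specific heads, can be sketched as follows; this is a generic stand-in, not the authors' MCN architecture, and the backbone and head shapes are placeholders.

```python
# Generic shared-backbone, multi-head pattern (not the actual MCN).
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, num_classes=80, num_keypoints=17):
        super().__init__()
        self.backbone = nn.Sequential(              # placeholder backbone
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.det_head = nn.Conv2d(64, num_classes, 1)   # detection heatmaps
        self.seg_head = nn.Conv2d(64, num_classes, 1)   # semantic logits
        self.pose_head = nn.Conv2d(64, num_keypoints, 1)

    def forward(self, x):
        f = self.backbone(x)                        # computed once, shared by all heads
        return {"det": self.det_head(f), "seg": self.seg_head(f), "pose": self.pose_head(f)}

outputs = MultiHeadNet()(torch.randn(1, 3, 128, 128))
print({k: tuple(v.shape) for k, v in outputs.items()})
```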

Towards Top-Down Just Noticeable Difference Estimation of Natural Images arxiv:2108.05058 📈 1

Qiuping Jiang, Zhentao Liu, Shiqi Wang, Feng Shao, Weisi Lin

**Abstract:** Existing efforts on just noticeable difference (JND) estimation are mainly dedicated to modeling the visibility masking effects of different factors in the spatial and frequency domains, and then fusing them into an overall JND estimate. However, the overall visibility masking effect can be related to more contributing factors beyond those that have been considered in the literature, and it is also difficult to accurately formulate the masking effect even for an individual factor. Moreover, the potential interactions among different masking effects are difficult to characterize with a simple fusion model. In this work, we turn to a dramatically different way to address these problems with a top-down design philosophy. Instead of formulating and fusing multiple masking effects in a bottom-up way, the proposed JND estimation model directly generates a critical perceptual lossless (CPL) image from a top-down perspective and calculates the difference map between the original image and the CPL image as the final JND map. Given an input image, an adaptive critical point (perceptual lossless threshold), defined as the minimum number of spectral components in the Karhunen-Loève Transform (KLT) used for perceptually lossless image reconstruction, is derived by exploiting the convergence characteristics of KLT coefficient energy. Then, the CPL image can be reconstructed via inverse KLT according to the derived critical point. Finally, the difference map between the original image and the CPL image is calculated as the JND map. The performance of the proposed JND model is evaluated with two applications: JND-guided noise injection and JND-guided image compression. Experimental results demonstrate that our proposed JND model achieves better performance than several recent JND models.
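A schematic version of this top-down pipeline, assuming a KLT/PCA over 8x8 image blocks and an arbitrary fixed number of retained components rather than the paper's adaptively derived critical point:

```python
# Schematic only: truncated block-KLT reconstruction as the "CPL" image, and
# the absolute difference to the original as the JND map.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))                       # placeholder grayscale image
blocks = img.reshape(8, 8, 8, 8).transpose(0, 2, 1, 3).reshape(64, 64)  # 64 blocks of 64 px

mean = blocks.mean(axis=0)
centered = blocks - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)   # KLT basis of the blocks

k = 16                                            # arbitrary number of retained components
basis = vt[:k]
cpl_blocks = centered @ basis.T @ basis + mean    # truncated reconstruction
cpl = cpl_blocks.reshape(8, 8, 8, 8).transpose(0, 2, 1, 3).reshape(64, 64)

jnd_map = np.abs(img - cpl)                       # difference map as the JND map
print("mean JND:", jnd_map.mean())
```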

Predicting Molecular Phenotypes with Single Cell RNA Sequencing Data: an Assessment of Unsupervised Machine Learning Models arxiv:2108.05039 📈 1

Anastasia Dunca, Frederick R. Adler

**Abstract:** According to the National Cancer Institute, there were 9.5 million cancer-related deaths in 2018. A challenge in improving treatment is resistance in genetically unstable cells. The purpose of this study is to evaluate unsupervised machine learning on classifying treatment-resistant phenotypes in heterogeneous tumors through analysis of single cell RNA sequencing (scRNAseq) data with a pipeline and evaluation metrics. scRNAseq quantifies mRNA in cells and characterizes cell phenotypes. One scRNAseq dataset was analyzed (tumor/non-tumor cells of different molecular subtypes and patient identifications). The pipeline consisted of data filtering, dimensionality reduction with Principal Component Analysis, projection with Uniform Manifold Approximation and Projection, clustering with nine approaches (Ward, BIRCH, Gaussian Mixture Model, DBSCAN, Spectral, Affinity Propagation, Agglomerative Clustering, Mean Shift, and K-Means), and evaluation. Seven models separated tumor from non-tumor cells and distinguished molecular subtypes, while six models classified patient identification (13 patients were present in the dataset); K-Means, Ward, and BIRCH often ranked highest, with ~80% accuracy on the tumor versus non-tumor task and ~60% for molecular subtype and patient ID. An optimized classification pipeline using the K-Means, Ward, and BIRCH models was evaluated to be most effective for further analysis. In clinical research where there is currently no standard protocol for scRNAseq analysis, clusters generated from this pipeline can be used to understand cancer cell behavior and malignant growth, directly affecting the success of treatment.
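A condensed sketch of this kind of pipeline on synthetic expression counts, using only scikit-learn (UMAP and most of the nine clustering methods are omitted; the data, cluster count, and hyperparameters are made up):

```python
# Toy filter -> PCA -> cluster -> evaluate pipeline on synthetic scRNAseq-like data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, Birch, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
n_cells, n_genes = 300, 2000
labels = rng.integers(0, 3, n_cells)                            # hidden "cell type"
X = rng.poisson(2.0, (n_cells, n_genes)) + labels[:, None] * 2  # toy count matrix

X = np.log1p(X)                                                 # simple normalization
Z = PCA(n_components=20, random_state=0).fit_transform(X)       # dimensionality reduction

for name, algo in [("KMeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
                   ("BIRCH", Birch(n_clusters=3)),
                   ("Ward", AgglomerativeClustering(n_clusters=3, linkage="ward"))]:
    pred = algo.fit_predict(Z)
    print(name, "ARI:", round(adjusted_rand_score(labels, pred), 3))
```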

Parallel algorithms for mining of frequent itemsets arxiv:2108.05038 📈 1

Robert Kessl

**Abstract:** In the last decade, companies have started collecting large amounts of data. Without proper analysis, these data are usually useless. The field of analysing such data is called data mining. Unfortunately, the amount of data is often so large that it does not fit into main memory and the processing time becomes huge; therefore, we need parallel data mining algorithms. One of the most popular and important data mining algorithms is the generation of so-called frequent itemsets. The problem of mining frequent itemsets can be explained with the following example: customers in a store put goods into their baskets; the owner of the store collects the baskets and wants to know the sets of goods that are bought together in at least p% of the baskets. Currently, sequential algorithms for mining frequent itemsets perform quite well; however, parallel algorithms still do not achieve good speedup. In this thesis, we develop a parallel method for mining frequent itemsets that can be used with arbitrary depth-first-search sequential algorithms on a distributed-memory parallel computer. Our method achieves a speedup of ~ 6 on 10 processors. The method is based on an approximate estimation of processor load from a database sample; however, it always computes the set of frequent itemsets from the whole database. We present the theory underlying our method and evaluate the performance of the estimation process.
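The market-basket formulation above can be illustrated with a few lines of plain, single-threaded support counting (this is not the thesis's parallel method, just the problem definition on toy baskets):

```python
# Count support of small itemsets and keep those appearing in >= p% of baskets.
from itertools import combinations
from collections import Counter

baskets = [{"bread", "milk"}, {"bread", "butter", "milk"},
           {"beer", "bread"}, {"milk", "butter"}, {"bread", "milk", "butter"}]
min_support = 0.4                           # p = 40% of baskets

counts = Counter()
for basket in baskets:
    for size in (1, 2):                     # itemsets of size 1 and 2 only
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1

frequent = {s: c / len(baskets) for s, c in counts.items()
            if c / len(baskets) >= min_support}
print(frequent)
```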

Does Explicit Prediction Matter in Deep Reinforcement Learning-Based Energy Management? arxiv:2108.05099 📈 0

Zhaoming Qin, Huaying Zhang, Yuzhou Zhao, Hong Xie, Junwei Cao

**Abstract:** As a model-free optimization and decision-making method, deep reinforcement learning (DRL) has been widely applied to the field of energy management in the energy Internet. However, some DRL-based energy management schemes also incorporate the prediction module used by traditional model-based methods, which seems unnecessary and possibly even harmful. In this work, we implement the standard energy management scheme with prediction using supervised learning and DRL, and the counterpart without prediction using end-to-end DRL. Then, these two schemes are compared in a unified energy management framework. The simulation results demonstrate that the energy management scheme without prediction is superior to the scheme with prediction. This work intends to rectify the misuse of DRL methods in the field of energy management.

Turning Your Strength against You: Detecting and Mitigating Robust and Universal Adversarial Patch Attacks arxiv:2108.05075 📈 0

Zitao Chen, Pritam Dash, Karthik Pattabiraman

**Abstract:** Adversarial patch attacks against image classification deep neural networks (DNNs), which inject arbitrary distortions within a bounded region of an image, can generate adversarial perturbations that are robust (i.e., remain adversarial in physical world) and universal (i.e., remain adversarial on any input). Such attacks can lead to severe consequences in real-world DNN-based systems. This work proposes Jujutsu, a technique to detect and mitigate robust and universal adversarial patch attacks. For detection, Jujutsu exploits the attacks' universal property - Jujutsu first locates the region of the potential adversarial patch, and then strategically transfers it to a dedicated region in a new image to determine whether it is truly malicious. For attack mitigation, Jujutsu leverages the attacks' localized nature via image inpainting to synthesize the semantic contents in the pixels that are corrupted by the attacks, and reconstruct the ``clean'' image. We evaluate Jujutsu on four diverse datasets (ImageNet, ImageNette, CelebA and Place365), and show that Jujutsu achieves superior performance and significantly outperforms existing techniques. We find that Jujutsu can further defend against different variants of the basic attack, including 1) physical-world attack; 2) attacks that target diverse classes; 3) attacks that construct patches in different shapes and 4) adaptive attacks.
