Summary for 2021-10-01, created on 2021-12-16

ResNet strikes back: An improved training procedure in timm arxiv:2110.00476 📈 197

Ross Wightman, Hugo Touvron, Hervé Jégou

**Abstract:** The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimization and data-augmentation strategies have increased the effectiveness of training recipes. In this paper, we re-evaluate the performance of the vanilla ResNet-50 when trained with a procedure that integrates such advances. We share competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work. For instance, with our more demanding training setting, a vanilla ResNet-50 reaches 80.4% top-1 accuracy at resolution 224x224 on ImageNet-val without extra data or distillation. We also report the performance achieved with popular models with our training procedure.
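
Since the recipes and weights ship with timm, here is a minimal sketch of loading a pretrained ResNet-50 together with the evaluation transform it expects; whether `pretrained=True` resolves to the improved-recipe checkpoint depends on the installed timm version, so treat that as an assumption to verify.

```python
# Hedged sketch: load a timm ResNet-50 and build its matching eval transform.
# Which checkpoint `pretrained=True` serves depends on the timm version
# installed (assumption), so verify before quoting the 80.4% figure.
import numpy as np
import torch
import timm
from timm.data import resolve_data_config, create_transform
from PIL import Image

model = timm.create_model("resnet50", pretrained=True).eval()
config = resolve_data_config({}, model=model)   # input size, norm stats, crop pct
transform = create_transform(**config)

img = Image.fromarray(np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8))
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))
print(logits.argmax(dim=-1))                    # predicted ImageNet class index
```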

Batch size-invariance for policy optimization arxiv:2110.00641 📈 114

Jacob Hilton, Karl Cobbe, John Schulman

**Abstract:** We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters. Stochastic gradient descent is well-known to have this property at small batch sizes, via the learning rate. However, some policy optimization algorithms (such as PPO) do not have this property, because of how they control the size of policy updates. In this work we show how to make these algorithms batch size-invariant. Our key insight is to decouple the proximal policy (used for controlling policy updates) from the behavior policy (used for off-policy corrections). Our experiments help explain why these algorithms work, and additionally show how they can make more efficient use of stale data.
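
A minimal sketch of the decoupling described above, assuming per-action log-probabilities are available under the new, proximal (e.g. an EWMA of recent policies), and behavior policies; the paper's exact objective may differ in detail.

```python
# Hedged sketch: the clipping ratio is measured against a proximal policy,
# while the behavior policy only contributes an off-policy importance weight.
import torch

def decoupled_ppo_loss(logp_new, logp_prox, logp_behav, adv, clip_eps=0.2):
    """logp_*: (B,) log-probs of taken actions; adv: (B,) advantage estimates."""
    ratio = torch.exp(logp_new - logp_prox)            # controls update size
    iw = torch.exp(logp_prox - logp_behav).detach()    # corrects for stale data
    surrogate = torch.minimum(ratio * adv,
                              torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
    return -(iw * surrogate).mean()
```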

Powerpropagation: A sparsity inducing weight reparameterisation arxiv:2110.00296 📈 24

Jonathan Schwarz, Siddhant M. Jayakumar, Razvan Pascanu, Peter E. Latham, Yee Whye Teh

**Abstract:** The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well as enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient based training on model sparsity. In this work, we introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models. Exploiting the behaviour of gradient descent, our method gives rise to weight updates exhibiting a "rich get richer" dynamic, leaving low-magnitude parameters largely unaffected by learning. Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Powerpropagation is general, intuitive, cheap and straightforward to implement and can readily be combined with various other techniques. To highlight its versatility, we explore it in two very different settings: Firstly, following a recent line of work, we investigate its effect on sparse training for resource-constrained settings. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark. Secondly, we advocate the use of sparsity in overcoming catastrophic forgetting, where compressed representations allow accommodating a large number of tasks at fixed model capacity. In all cases our reparameterisation considerably increases the efficacy of the off-the-shelf methods.
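
The reparameterisation itself is compact: the effective weight is w = θ|θ|^(α-1). A sketch of a linear layer using it follows (the initialisation scale here is an arbitrary choice, not the paper's).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Linear layer with Powerpropagation weights w = theta * |theta|**(alpha - 1)."""
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.theta = nn.Parameter(torch.randn(out_features, in_features) * 0.05)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # The chain rule multiplies grads w.r.t. theta by ~|theta|**(alpha - 1),
        # so low-magnitude parameters barely move: the "rich get richer" dynamic.
        w = self.theta * self.theta.abs().pow(self.alpha - 1)
        return F.linear(x, w, self.bias)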

Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads arxiv:2110.00459 📈 22

Guin Gilman, Robert J. Walls

**Abstract:** We investigate the performance of the concurrency mechanisms available on NVIDIA's new Ampere GPU microarchitecture under deep learning training and inference workloads. In contrast to previous studies that treat the GPU as a black box, we examine scheduling at the microarchitectural level. We find that the lack of fine-grained preemption mechanisms, robust task prioritization options, and contention-aware thread block placement policies limits the effectiveness of NVIDIA's concurrency mechanisms. In summary, the sequential nature of deep learning workloads and their fluctuating resource requirements and kernel runtimes make it difficult to execute such workloads on current NVIDIA hardware while maintaining consistently high utilization and low, predictable turnaround times.

Reconstruction for Powerful Graph Representations arxiv:2110.00577 📈 12

Leonardo Cotta, Christopher Morris, Bruno Ribeiro

**Abstract:** Graph neural networks (GNNs) have limited expressive power, failing to represent many graph classes correctly. While more expressive graph representation learning (GRL) alternatives can distinguish some of these classes, they are significantly harder to implement, may not scale well, and have not been shown to outperform well-tuned GNNs in real-world tasks. Thus, devising simple, scalable, and expressive GRL architectures that also achieve real-world improvements remains an open challenge. In this work, we show the extent to which graph reconstruction -- reconstructing a graph from its subgraphs -- can mitigate the theoretical and practical problems currently faced by GRL architectures. First, we leverage graph reconstruction to build two new classes of expressive graph representations. Secondly, we show how graph reconstruction boosts the expressive power of any GNN architecture while being a (provably) powerful inductive bias for invariances to vertex removals. Empirically, we show how reconstruction can boost a GNN's expressive power -- while maintaining its invariance to permutations of the vertices -- by solving seven graph property tasks not solvable by the original GNN. Further, we demonstrate how it boosts the performance of state-of-the-art GNNs across nine real-world benchmark datasets.
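
A toy sketch of the reconstruction idea, assuming a generic graph-level `gnn` embedding function: every vertex-deleted subgraph is embedded with the same GNN and the resulting multiset is pooled, which preserves permutation invariance.

```python
import torch

def reconstruction_readout(gnn, adj, feats):
    """gnn(adj, feats) -> graph embedding; adj: (n, n) adjacency; feats: (n, d)."""
    n = adj.shape[0]
    embs = []
    for v in range(n):                      # one vertex-deleted subgraph per node
        keep = [u for u in range(n) if u != v]
        embs.append(gnn(adj[keep][:, keep], feats[keep]))
    return torch.stack(embs).sum(dim=0)     # permutation-invariant pooling

gnn = lambda a, x: torch.tanh(a @ x).mean(dim=0)   # stand-in one-hop GNN
adj = (torch.rand(6, 6) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()
out = reconstruction_readout(gnn, adj, torch.randn(6, 4))
```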

Score-Based Generative Classifiers arxiv:2110.00473 📈 12

Roland S. Zimmermann, Lukas Schott, Yang Song, Benjamin A. Dunn, David A. Klindt

**Abstract:** The tremendous success of generative models in recent years raises the question of whether they can also be used to perform classification. Generative models have been used as adversarially robust classifiers on simple datasets such as MNIST, but this robustness has not been observed on more complex datasets like CIFAR-10. Additionally, on natural image datasets, previous results have suggested a trade-off between the likelihood of the data and classification accuracy. In this work, we investigate score-based generative models as classifiers for natural images. We show that these models not only obtain competitive likelihood values but simultaneously achieve state-of-the-art classification accuracy for generative classifiers on CIFAR-10. Nevertheless, we find that these models are only slightly, if at all, more robust than discriminative baseline models on out-of-distribution tasks based on common image corruptions. Similarly and contrary to prior results, we find that score-based models are prone to worst-case distribution shifts in the form of adversarial perturbations. Our work highlights that score-based generative models are closing the gap in classification accuracy compared to standard discriminative models. While they do not yet deliver on the promise of adversarial and out-of-domain robustness, they provide a different approach to classification that warrants further research.
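
The classification rule itself is just Bayes' rule over class-conditional likelihoods; a toy sketch with Gaussians standing in for per-class score-based models:

```python
import numpy as np
from scipy.stats import multivariate_normal

def generative_classify(x, class_loglik_fns, log_priors):
    """argmax_y log p(x|y) + log p(y); each fn plays the role of a
    class-conditional score-based model's exact log-likelihood."""
    return int(np.argmax(np.array([f(x) for f in class_loglik_fns]) + log_priors))

fns = [multivariate_normal(mean=[0, 0]).logpdf,   # stand-in for class-0 score model
       multivariate_normal(mean=[2, 2]).logpdf]   # stand-in for class-1 score model
print(generative_classify(np.array([1.8, 2.1]), fns, np.log([0.5, 0.5])))
```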

TEACh: Task-driven Embodied Agents that Chat arxiv:2110.00534 📈 11

Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

**Abstract:** Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes. To study this, we introduce TEACh, a dataset of over 3,000 human-human, interactive dialogues to complete household tasks in simulation. A Commander with access to oracle information about a task communicates in natural language with a Follower. The Follower navigates through and interacts with the environment to complete tasks varying in complexity from "Make Coffee" to "Prepare Breakfast", asking questions and getting additional information from the Commander. We propose three benchmarks using TEACh to study embodied intelligence challenges, and we evaluate initial models' abilities in dialogue understanding, language grounding, and task execution.

PhiNets: a scalable backbone for low-power AI at the edge arxiv:2110.00337 📈 10

Francesco Paissan, Alberto Ancilotto, Elisabetta Farella

**Abstract:** In the Internet of Things era, where we see many interconnected and heterogeneous mobile and fixed smart devices, distributing the intelligence from the cloud to the edge has become a necessity. Due to limited computational and communication capabilities, low memory and limited energy budget, bringing artificial intelligence algorithms to peripheral devices, such as the end-nodes of a sensor network, is a challenging task and requires the design of innovative methods. In this work, we present PhiNets, a new scalable backbone optimized for deep-learning-based image processing on resource-constrained platforms. PhiNets are based on inverted residual blocks specifically designed to decouple the computational cost, working memory, and parameter memory, thus exploiting all the available resources. With a YoloV2 detection head and Simple Online and Realtime Tracking, the proposed architecture has achieved state-of-the-art results in (i) detection on the COCO and VOC2012 benchmarks, and (ii) tracking on the MOT15 benchmark. PhiNets reduce the parameter count by 87% to 93% with respect to previous state-of-the-art models (EfficientNetv1, MobileNetv2) and achieve better performance with lower computational cost. Moreover, we demonstrate our approach on a prototype node based on an STM32H743 microcontroller (MCU) with 2MB of internal Flash and 1MB of RAM and achieve power requirements in the order of 10 mW. The code for PhiNets is publicly available on GitHub.

Learning Reward Functions from Scale Feedback arxiv:2110.00284 📈 10

Nils Wilde, Erdem Bıyık, Dorsa Sadigh, Stephen L. Smith

**Abstract:** Today's robots are increasingly interacting with people and need to efficiently learn inexperienced users' preferences. A common framework is to iteratively query the user about which of two presented robot trajectories they prefer. While this minimizes the user's effort, a strict choice does not yield any information on how much one trajectory is preferred. We propose scale feedback, where the user utilizes a slider to give more nuanced information. We introduce a probabilistic model of how users provide feedback and derive a learning framework for the robot. We demonstrate the performance benefit of slider feedback in simulations, and validate our approach in two user studies suggesting that scale feedback enables more effective learning in practice.
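
A hedged sketch of one way to consume such feedback, assuming linear rewards over trajectory features and a particle posterior over reward weights; the paper's probabilistic user model may differ in form.

```python
import numpy as np

def update_posterior(particles, weights, feat_a, feat_b, slider, noise=0.2):
    """particles: (P, d) candidate reward weights; slider in [-1, 1] is the
    user's graded preference for trajectory A over B (assumed noise model)."""
    diff = particles @ (feat_a - feat_b)          # predicted preference strength
    lik = np.exp(-0.5 * ((slider - np.tanh(diff)) / noise) ** 2)
    weights = weights * lik
    return weights / weights.sum()

rng = np.random.default_rng(0)
particles = rng.normal(size=(1000, 4))
weights = np.full(1000, 1e-3)
weights = update_posterior(particles, weights,
                           rng.normal(size=4), rng.normal(size=4), slider=0.6)
```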

OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation arxiv:2110.00704 📈 8

Josiah Wong, Viktor Makoviychuk, Anima Anandkumar, Yuke Zhu

**Abstract:** Learning performant robot manipulation policies can be challenging due to high-dimensional continuous actions and complex physics-based dynamics. This can be alleviated through intelligent choice of action space. Operational Space Control (OSC) has been used as an effective task-space controller for manipulation. Nonetheless, its strength depends on the underlying modeling fidelity, and is prone to failure when there are modeling errors. In this work, we propose OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors by inferring relevant dynamics parameters from online trajectories. OSCAR decomposes dynamics learning into task-agnostic and task-specific phases, decoupling the dynamics dependencies of the robot and the extrinsics due to its environment. This structure enables robust zero-shot performance under out-of-distribution conditions and rapid adaptation to significant domain shifts through additional finetuning. We evaluate our method on a variety of simulated manipulation problems, and find substantial improvements over an array of controller baselines. For more results and information, please visit https://cremebrule.github.io/oscar-web/.

Do Self-Supervised and Supervised Methods Learn Similar Visual Representations? arxiv:2110.00528 📈 8

Tom George Grigg, Dan Busbridge, Jason Ramapuram, Russ Webb

**Abstract:** Despite the success of a number of recent techniques for visual self-supervised deep learning, there has been limited investigation into the representations that are ultimately learned. By leveraging recent advances in the comparison of neural representations, we explore this direction by comparing a contrastive self-supervised algorithm to supervision for simple image data in a common architecture. We find that the methods learn similar intermediate representations through dissimilar means, and that the representations diverge rapidly in the final few layers. We investigate this divergence, finding that these layers strongly fit to their distinct learning objectives. We also find that the contrastive objective implicitly fits the supervised objective in intermediate layers, but that the reverse is not true. Our work particularly highlights the importance of the learned intermediate representations, and raises critical questions for auxiliary task design.

Smooth Normalizing Flows arxiv:2110.00351 📈 8

Jonas Köhler, Andreas Krämer, Frank Noé

**Abstract:** Normalizing flows are a promising tool for modeling probability distributions in physical systems. While state-of-the-art flows accurately approximate distributions and energies, applications in physics additionally require smooth energies to compute forces and higher-order derivatives. Furthermore, such densities are often defined on non-trivial topologies. A recent example is Boltzmann Generators for generating 3D structures of peptides and small proteins. These generative models leverage the space of internal coordinates (dihedrals, angles, and bonds), which is a product of hypertori and compact intervals. In this work, we introduce a class of smooth mixture transformations working on both compact intervals and hypertori. In practice, mixture transformations must be inverted with root-finding methods, which has so far prevented bi-directional flow training. To address this, we show that parameter gradients and forces of such inverses can be computed from forward evaluations via the inverse function theorem. We demonstrate two advantages of such smooth flows: they allow training by force matching to simulation data and can be used as potentials in molecular dynamics simulations.
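
The inverse-function-theorem trick is easy to sketch: invert by bisection without tracking gradients, then recover parameter gradients of the inverse analytically. A toy monotone transform stands in for the paper's mixture transformations.

```python
import torch

def invert_monotone(f, y, lo=0.0, hi=1.0, iters=60):
    """Bisection inverse of a strictly increasing f on [lo, hi]."""
    lo = torch.full_like(y, lo)
    hi = torch.full_like(y, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        go_right = f(mid) < y
        lo = torch.where(go_right, mid, lo)
        hi = torch.where(go_right, hi, mid)
    return 0.5 * (lo + hi)

theta = torch.tensor(1.5, requires_grad=True)
f = lambda x: torch.sigmoid(theta * (x - 0.5))   # smooth, monotone stand-in transform
y = torch.tensor(0.6)

with torch.no_grad():
    x = invert_monotone(f, y)                    # forward pass: root-finding only

# Inverse function theorem: with y fixed, dx/dtheta = -(df/dtheta)/(df/dx) at the root.
x.requires_grad_(True)
out = f(x)
df_dx, = torch.autograd.grad(out, x, retain_graph=True)
df_dtheta, = torch.autograd.grad(out, theta)
dx_dtheta = -df_dtheta / df_dx
```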

Is There More Pattern in Knowledge Graph? Exploring Proximity Pattern for Knowledge Graph Embedding arxiv:2110.00720 📈 7

Ren Li, Yanan Cao, Qiannan Zhu, Xiaoxue Li, Fang Fang

**Abstract:** Modeling relation patterns is the core focus of previous Knowledge Graph Embedding works, which represent how one entity is related to another semantically by some explicit relation. However, a more natural and intuitive relevancy among entities is usually ignored: how close one entity is to another semantically, without considering any explicit relation. We name this semantic phenomenon in knowledge graphs proximity pattern. In this work, we explore the problem of how to define and represent proximity patterns, and how they can be utilized to help knowledge graph embedding. Firstly, we define the proximity of any two entities according to their statistically shared queries, then we construct a derived graph structure and represent the proximity pattern from a global view. Moreover, with the original knowledge graph, we design a Chained couPle-GNN (CP-GNN) architecture to deeply merge the two patterns (graphs) together, which can encode a more comprehensive knowledge embedding. Evaluated on the FB15k-237 and WN18RR datasets, CP-GNN achieves state-of-the-art results for the Knowledge Graph Completion task, and especially boosts the modeling capacity for complex queries that contain multiple answer entities, proving the effectiveness of the introduced proximity pattern.

Personalized Retrogress-Resilient Framework for Real-World Medical Federated Learning arxiv:2110.00394 📈 7

Zhen Chen, Meilu Zhu, Chen Yang, Yixuan Yuan

**Abstract:** Nowadays, deep learning methods with large-scale datasets can produce clinically useful models for computer-aided diagnosis. However, privacy and ethical concerns are increasingly critical, which makes it difficult to collect large quantities of data from multiple institutions. Federated Learning (FL) provides a promising decentralized solution for training models collaboratively by exchanging client models instead of private data. However, the server aggregation of existing FL methods is observed to degrade model performance in real-world medical FL settings, which is termed retrogress. To address this problem, we propose a personalized retrogress-resilient framework to produce a superior personalized model for each client. Specifically, we devise Progressive Fourier Aggregation (PFA) at the server to achieve more stable and effective global knowledge gathering by integrating client models from low-frequency to high-frequency components gradually. Moreover, with an introduced deputy model to receive the aggregated server model, we design a Deputy-Enhanced Transfer (DET) strategy at the client, conducting three steps of Recover-Exchange-Sublimate to improve the personalized local model by transferring the global knowledge smoothly. Extensive experiments on a real-world dermoscopic FL dataset prove that our personalized retrogress-resilient framework outperforms state-of-the-art FL methods and generalizes to an out-of-distribution cohort. The code and dataset are available at https://github.com/CityU-AIM-Group/PRR-FL.

TyXe: Pyro-based Bayesian neural nets for Pytorch arxiv:2110.00276 📈 7

Hippolyt Ritter, Theofanis Karaletsos

**Abstract:** We introduce TyXe, a Bayesian neural network library built on top of Pytorch and Pyro. Our leading design principle is to cleanly separate architecture, prior, inference and likelihood specification, allowing for a flexible workflow where users can quickly iterate over combinations of these components. In contrast to existing packages TyXe does not implement any layer classes, and instead relies on architectures defined in generic Pytorch code. TyXe then provides modular choices for canonical priors, variational guides, inference techniques, and layer selections for a Bayesian treatment of the specified architecture. Sampling tricks for variance reduction, such as local reparameterization or flipout, are implemented as effect handlers, which can be applied independently of other specifications. We showcase the ease of use of TyXe to explore Bayesian versions of popular models from various libraries: toy regression with a pure Pytorch neural network; large-scale image classification with torchvision ResNets; graph neural networks based on DGL; and Neural Radiance Fields built on top of Pytorch3D. Finally, we provide convenient abstractions for variational continual learning. In all cases the change from a deterministic to a Bayesian neural network comes with minimal modifications to existing code, offering a broad range of researchers and practitioners alike practical access to uncertainty estimation techniques. The library is available at https://github.com/TyXe-BDL/TyXe.
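
TyXe's own classes are not reproduced here; below is a raw-Pyro sketch of the workflow the library modularizes, with architecture, prior, likelihood and variational guide specified as separate pieces.

```python
# Not TyXe's API: a minimal raw-Pyro illustration of the same separation of
# concerns (generic PyTorch architecture, prior, likelihood, guide).
import torch
import pyro
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal

class BayesianRegressor(PyroModule):
    def __init__(self):
        super().__init__()
        self.linear = PyroModule[torch.nn.Linear](1, 1)                 # architecture
        self.linear.weight = PyroSample(
            dist.Normal(0., 1.).expand([1, 1]).to_event(2))             # prior
        self.linear.bias = PyroSample(
            dist.Normal(0., 1.).expand([1]).to_event(1))

    def forward(self, x, y=None):
        mean = self.linear(x).squeeze(-1)
        with pyro.plate("data", x.shape[0]):
            pyro.sample("obs", dist.Normal(mean, 0.1), obs=y)           # likelihood
        return mean

model = BayesianRegressor()
guide = AutoNormal(model)                                               # inference
svi = SVI(model, guide, pyro.optim.Adam({"lr": 1e-2}), Trace_ELBO())
x, y = torch.randn(64, 1), torch.randn(64)
for _ in range(100):
    svi.step(x, y)
```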

A Survey of Knowledge Enhanced Pre-trained Models arxiv:2110.00269 📈 7

Jian Yang, Gang Xiao, Yulong Shen, Wei Jiang, Xinyu Hu, Ying Zhang, Jinghui Peng

**Abstract:** Pre-trained models learn contextualized word representations on large-scale text corpora through self-supervised learning, achieving promising performance after fine-tuning. These models, however, suffer from poor robustness and lack of interpretability. Pre-trained models with knowledge injection, which we call knowledge enhanced pre-trained models (KEPTMs), possess deep understanding and logical reasoning and introduce interpretability to some extent. In this survey, we provide a comprehensive overview of KEPTMs for natural language processing. We first introduce the progress of pre-trained models and knowledge representation learning. Then we systematically categorize existing KEPTMs from three different perspectives. Finally, we outline some potential directions of KEPTMs for future research.

Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification arxiv:2110.00685 📈 6

Jiong Zhang, Wei-cheng Chang, Hsiang-fu Yu, Inderjit S. Dhillon

**Abstract:** Extreme multi-label text classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging and semantic search. Recently, transformer based XMC methods, such as X-Transformer and LightXML, have shown significant improvement over other XMC methods. Despite leveraging pre-trained transformer models for text representation, fine-tuning transformer models on a large label space still incurs lengthy computational time even with powerful GPUs. In this paper, we propose a novel recursive approach, XR-Transformer, to accelerate the procedure through recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective function. Empirical results show that XR-Transformer takes significantly less training time compared to other transformer-based XMC models while yielding new state-of-the-art results. In particular, on the public Amazon-3M dataset with 3 million labels, XR-Transformer is not only 20x faster than X-Transformer but also improves the Precision@1 from 51% to 54%.

Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM) arxiv:2110.00684 📈 6

Jie Bu, Arka Daw, M. Maruf, Anuj Karpatne

**Abstract:** A central goal in deep learning is to learn compact representations of features at every layer of a neural network, which is useful for both unsupervised representation learning and structured network pruning. While there is a growing body of work in structured pruning, current state-of-the-art methods suffer from two key limitations: (i) instability during training, and (ii) need for an additional step of fine-tuning, which is resource-intensive. At the core of these limitations is the lack of a systematic approach that jointly prunes and refines weights during training in a single stage, and does not require any fine-tuning upon convergence to achieve state-of-the-art performance. We present a novel single-stage structured pruning method termed DiscriminAtive Masking (DAM). The key intuition behind DAM is to discriminatively prefer some of the neurons to be refined during the training process, while gradually masking out other neurons. We show that our proposed DAM approach has remarkably good performance over various applications, including dimensionality reduction, recommendation system, graph representation learning, and structured pruning for image classification. We also theoretically show that the learning objective of DAM is directly related to minimizing the L0 norm of the masking layer.

Unpacking the Interdependent Systems of Discrimination: Ableist Bias in NLP Systems through an Intersectional Lens arxiv:2110.00521 📈 6

Saad Hassan, Matt Huenerfauth, Cecilia Ovesdotter Alm

**Abstract:** Much of the world's population experiences some form of disability during their lifetime. Caution must be exercised while designing natural language processing (NLP) systems to prevent systems from inadvertently perpetuating ableist bias against people with disabilities, i.e., prejudice that favors those with typical abilities. We report on various analyses based on word predictions of a large-scale BERT language model. Statistically significant results demonstrate that people with disabilities can be disadvantaged. Findings also explore overlapping forms of discrimination related to interconnected gender and race identities.

Belief propagation for permutations, rankings, and partial orders arxiv:2110.00513 📈 6

George T. Cantwell, Cristopher Moore

**Abstract:** Many datasets give partial information about an ordering or ranking by indicating which team won a game, which item a user prefers, or who infected whom. We define a continuous spin system whose Gibbs distribution is the posterior distribution on permutations, given a probabilistic model of these interactions. Using the cavity method we derive a belief propagation algorithm that computes the marginal distribution of each node's position. In addition, the Bethe free energy lets us approximate the number of linear extensions of a partial order and perform model selection.

SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference arxiv:2110.00478 📈 6

Jude Haris, Perry Gibson, José Cano, Nicolas Bohm Agostini, David Kaeli

**Abstract:** Edge computing devices inherently face tight resource constraints, which is especially apparent when deploying Deep Neural Networks (DNN) with high memory and compute demands. FPGAs are commonly available in edge devices. Since these reconfigurable circuits can achieve higher throughput and lower power consumption than general purpose processors, they are especially well-suited for DNN acceleration. However, existing solutions for designing FPGA-based DNN accelerators for edge devices come with high development overheads, given the cost of repeated FPGA synthesis passes, reimplementation of the simulated design in a Hardware Description Language (HDL), and accelerator system integration. In this paper we propose SECDA, a new hardware/software co-design methodology to reduce design time of optimized DNN inference accelerators on edge devices with FPGAs. SECDA combines cost-effective SystemC simulation with hardware execution, streamlining design space exploration and the development process via reduced design evaluation time. As a case study, we use SECDA to efficiently develop two different DNN accelerator designs on a PYNQ-Z1 board, a platform that includes an edge FPGA. We quickly and iteratively explore the system's hardware/software stack, while identifying and mitigating performance bottlenecks. We evaluate the two accelerator designs with four common DNN models, achieving an average performance speedup across models of up to 3.5$\times$ with a 2.9$\times$ reduction in energy consumption over CPU-only inference. Our code is available at https://github.com/gicLAB/SECDA

Sim and Real: Better Together arxiv:2110.00445 📈 6

Shirli Di Castro Shashua, Dotan Di Castro, Shie Mannor

**Abstract:** Simulation is used extensively in autonomous systems, particularly in robotic manipulation. By far the most common approach is to train a controller in simulation, and then use it as a starting point for the real system. We demonstrate how to learn simultaneously from both simulation and interaction with the real environment. We propose an algorithm for balancing the large number of samples from the high-throughput but less accurate simulation and the low-throughput, high-fidelity and costly samples from the real environment. We achieve that by maintaining a replay buffer for each environment the agent interacts with. We analyze such multi-environment interaction theoretically and provide convergence properties through a novel replay buffer analysis. We demonstrate the efficacy of our method on a sim-to-real environment.
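
A minimal sketch of the per-environment buffer idea; the fixed mixing fraction here is an illustrative stand-in for the paper's balancing scheme.

```python
import random

class MixedReplay:
    """One replay buffer per environment, mixed at sampling time."""
    def __init__(self, sim_fraction=0.8):
        self.buffers = {"sim": [], "real": []}
        self.sim_fraction = sim_fraction          # illustrative fixed ratio

    def add(self, env, transition):
        self.buffers[env].append(transition)

    def sample(self, batch_size):
        n_sim = int(batch_size * self.sim_fraction)
        batch = random.choices(self.buffers["sim"], k=n_sim)
        batch += random.choices(self.buffers["real"], k=batch_size - n_sim)
        return batch
```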

Improving Object Permanence using Agent Actions and Reasoning arxiv:2110.00238 📈 6

Ying Siu Liang, Chen Zhang, Dongkyu Choi, Kenneth Kwok

**Abstract:** Object permanence in psychology means knowing that objects still exist even if they are no longer visible. It is a crucial concept for robots to operate autonomously in uncontrolled environments. Existing approaches learn object permanence from low-level perception, but perform poorly on more complex scenarios, like when objects are contained and carried by others. Knowledge about manipulation actions performed on an object prior to its disappearance allows us to reason about its location, e.g., that the object has been placed in a carrier. In this paper we argue that object permanence can be improved when the robot uses knowledge about executed actions and describe an approach to infer hidden object states from agent actions. We show that considering agent actions not only improves rule-based reasoning models but also purely neural approaches, showing its general applicability. Then, we conduct quantitative experiments on a snitch localization task using a dataset of 1,371 synthesized videos, where we compare the performance of different object permanence models with and without action annotations. We demonstrate that models with action annotations can significantly increase performance of both neural and rule-based approaches. Finally, we evaluate the usability of our approach in real-world applications by conducting qualitative experiments with two Universal Robots (UR5 and UR16e) in both lab and industrial settings. The robots complete benchmark tasks for a gearbox assembly and demonstrate the object permanence capabilities with real sensor data in an industrial environment.

Surrogate-Based Black-Box Optimization Method for Costly Molecular Properties arxiv:2110.03522 📈 5

Jules Leguy, Thomas Cauchy, Beatrice Duval, Benoit Da Mota

**Abstract:** AI-assisted molecular optimization is a very active research field as it is expected to provide the next-generation drugs and molecular materials. An important difficulty is that the properties to be optimized rely on costly evaluations. Machine learning methods have been investigated with success to predict these properties, but they show generalization issues in less-known areas of the chemical space. We propose here a surrogate-based black box optimization method to tackle the optimization and machine learning problems jointly. It consists of optimizing the expected improvement of the surrogate of a molecular property using an evolutionary algorithm. The surrogate is defined as a Gaussian Process Regression (GPR) model, learned on a relevant area of the search space with respect to the property to be optimized. We show that our approach can successfully optimize a costly property of interest much faster than a purely metaheuristic approach.
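
A compact sketch of the surrogate loop under stated assumptions: a GPR surrogate scores candidates by expected improvement, with random candidates standing in for the evolutionary proposals and a toy function standing in for the costly property.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gpr, X, f_best, xi=0.01):
    """EI for maximization: (mu - f_best - xi) * Phi(z) + sigma * phi(z)."""
    mu, sigma = gpr.predict(X, return_std=True)
    z = (mu - f_best - xi) / np.maximum(sigma, 1e-9)
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (20, 3))              # stand-in molecular descriptors
y = -(X ** 2).sum(axis=1)                    # stand-in costly property
gpr = GaussianProcessRegressor().fit(X, y)
candidates = rng.uniform(-2, 2, (500, 3))    # the EA would propose these
best = candidates[np.argmax(expected_improvement(gpr, candidates, y.max()))]
```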

Robust and Decomposable Average Precision for Image Retrieval arxiv:2110.01445 📈 5

Elias Ramzi, Nicolas Thome, Clément Rambour, Nicolas Audebert, Xavier Bitot

**Abstract:** In image retrieval, standard evaluation metrics rely on score ranking, e.g. average precision (AP). In this paper, we introduce a method for robust and decomposable average precision (ROADMAP) addressing two major challenges for end-to-end training of deep neural networks with AP: non-differentiability and non-decomposability. Firstly, we propose a new differentiable approximation of the rank function, which provides an upper bound of the AP loss and ensures robust training. Secondly, we design a simple yet effective loss function to reduce the decomposability gap between the AP in the whole training set and its averaged batch approximation, for which we provide theoretical guarantees. Extensive experiments conducted on three image retrieval datasets show that ROADMAP outperforms several recent AP approximation methods and highlight the importance of our two contributions. Finally, using ROADMAP for training deep models yields very good performances, outperforming state-of-the-art results on the three datasets.
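
A generic smooth-rank sketch (not ROADMAP's exact upper bound): replacing the Heaviside step in the rank function with a temperature-scaled sigmoid makes AP differentiable in the scores, so 1 - AP can be used as a loss.

```python
import torch

def smooth_ap(scores, labels, tau=0.01):
    """scores: (N,) query similarities; labels: (N,) 1.0 for relevant items."""
    n = scores.numel()
    # sig[i, j] ~ 1 if scores[j] > scores[i] (soft indicator), 0 on the diagonal
    sig = torch.sigmoid((scores.unsqueeze(0) - scores.unsqueeze(1)) / tau)
    sig = sig * (1 - torch.eye(n, device=scores.device))
    rank_all = 1 + sig.sum(dim=1)                            # soft overall rank
    rank_pos = 1 + (sig * labels.unsqueeze(0)).sum(dim=1)    # soft rank among positives
    pos = labels.bool()
    return (rank_pos[pos] / rank_all[pos]).mean()

scores = torch.randn(8, requires_grad=True)
labels = torch.tensor([1., 0, 1, 0, 0, 1, 0, 0])
loss = 1 - smooth_ap(scores, labels)
loss.backward()
```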

Universal Adversarial Spoofing Attacks against Face Recognition arxiv:2110.00708 📈 5

Takuma Amada, Seng Pei Liew, Kazuya Kakizaki, Toshinori Araki

**Abstract:** We assess the vulnerabilities of deep face recognition systems for images that falsify/spoof multiple identities simultaneously. We demonstrate that, by manipulating the deep feature representation extracted from a face image via imperceptibly small perturbations added at the pixel level using our proposed Universal Adversarial Spoofing Examples (UAXs), one can fool a face verification system into recognizing that the face image belongs to multiple different identities with a high success rate. One characteristic of the UAXs crafted with our method is that they are universal (identity-agnostic); they are successful even against identities not known in advance. For a certain deep neural network, we show that we are able to spoof almost all tested identities (99%), including those not known beforehand (not included in training). Our results indicate that a multiple-identity attack is a real threat and should be taken into account when deploying face recognition systems.

Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning arxiv:2110.00678 📈 5

Toshiko Shibano, Xinyi Zhang, Mia Taige Li, Haejin Cho, Peter Sullivan, Muhammad Abdul-Mageed

**Abstract:** To address the performance gap of English ASR models on L2 English speakers, we evaluate fine-tuning of pretrained wav2vec 2.0 models (Baevski et al., 2020; Xu et al., 2021) on L2-ARCTIC, a non-native English speech corpus (Zhao et al., 2018) under different training settings. We compare **(a)** models trained with a combination of diverse accents to ones trained with only specific accents and **(b)** results from different single-accent models. Our experiments demonstrate the promise of developing ASR models for non-native English speakers, even with small amounts of L2 training data and even without a language model. Our models also excel in the zero-shot setting where we train on multiple L2 datasets and test on a blind L2 test set.

Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models arxiv:2110.00672 📈 5

Robert Wolfe, Aylin Caliskan

**Abstract:** We use a dataset of U.S. first names with labels based on predominant gender and racial group to examine the effect of training corpus frequency on tokenization, contextualization, similarity to initial representation, and bias in BERT, GPT-2, T5, and XLNet. We show that predominantly female and non-white names are less frequent in the training corpora of these four language models. We find that infrequent names are more self-similar across contexts, with Spearman's r between frequency and self-similarity as low as -.763. Infrequent names are also less similar to initial representation, with Spearman's r between frequency and linear centered kernel alignment (CKA) similarity to initial representation as high as .702. Moreover, we find Spearman's r between racial bias and name frequency in BERT of .492, indicating that lower-frequency minority group names are more associated with unpleasantness. Representations of infrequent names undergo more processing, but are more self-similar, indicating that models rely on less context-informed representations of uncommon and minority names which are overfit to a lower number of observed contexts.

ALBU: An approximate Loopy Belief message passing algorithm for LDA to improve performance on small data sets arxiv:2110.00635 📈 5

Rebecca M. C. Taylor, Johan A. du Preez

**Abstract:** Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Additionally, to perform more fine-grained evaluations and comparisons, we use simulations that enable comparisons with the ground truth via Kullback-Leibler divergence (KLD). Using coherence measures for the text corpora and KLD for the simulations, we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets.

Reconstructing group wavelet transform from feature maps with a reproducing kernel iteration arxiv:2110.00600 📈 5

Davide Barbieri

**Abstract:** In this paper we consider the problem of reconstructing an image that is downsampled in the space of its $SE(2)$ wavelet transform, which is motivated by classical models of simple cells receptive fields and feature preference maps in primary visual cortex. We prove that, whenever the problem is solvable, the reconstruction can be obtained by an elementary project and replace iterative scheme based on the reproducing kernel arising from the group structure, and show numerical results on real images.

Unsupervised Motion Representation Learning with Capsule Autoencoders arxiv:2110.00529 📈 5

Ziwei Xu, Xudong Shen, Yongkang Wong, Mohan S Kankanhalli

**Abstract:** We propose the Motion Capsule Autoencoder (MCAE), which addresses a key challenge in the unsupervised learning of motion representations: transformation invariance. MCAE models motion in a two-level hierarchy. In the lower level, a spatio-temporal motion signal is divided into short, local, and semantic-agnostic snippets. In the higher level, the snippets are aggregated to form full-length semantic-aware segments. For both levels, we represent motion with a set of learned transformation invariant templates and the corresponding geometric transformations by using capsule autoencoders of a novel design. This leads to a robust and efficient encoding of viewpoint changes. MCAE is evaluated on a novel Trajectory20 motion dataset and various real-world skeleton-based human action datasets. Notably, it achieves better results than baselines on Trajectory20 with considerably fewer parameters and state-of-the-art performance on the unsupervised skeleton-based action recognition task.

Optic Disc Segmentation using Disk-Centered Patch Augmentation arxiv:2110.00512 📈 5

Saeid Motevali, Aashis Khanal, Rolando Estrada

**Abstract:** The optic disc is a crucial diagnostic feature in the eye since changes to its physiognomy are correlated with the severity of various ocular and cardiovascular diseases. While identifying the bulk of the optic disc in a color fundus image is straightforward, accurately segmenting its boundary at the pixel level is very challenging. In this work, we propose disc-centered patch augmentation (DCPA) -- a simple, yet novel training scheme for deep neural networks -- to address this problem. DCPA achieves state-of-the-art results on full-size images even when using small neural networks, specifically a U-Net with only 7 million parameters as opposed to the original 31 million. In DCPA, we restrict the training data to patches that fully contain the optic nerve. In addition, we also train the network using dynamic cost functions to increase its robustness. We tested DCPA-trained networks on five retinal datasets: DRISTI, DRIONS-DB, DRIVE, AV-WIDE, and CHASE-DB. The first two had available optic disc ground truth, and we manually estimated the ground truth for the latter three. Our approach achieved state-of-the-art F1 and IOU results on four datasets (95% F1, 91% IOU on DRISTI; 92% F1, 84% IOU on DRIVE; 83% F1, 71% IOU on AV-WIDE; 83% F1, 71% IOU on CHASE-DB) and competitive results on the fifth (95% F1, 91% IOU on DRIONS-DB), confirming its generality. Our open-source code and ground-truth annotations are available at: https://github.com/saeidmotevali/fundusdisk

Topologically-Informed Atlas Learning arxiv:2110.00429 📈 5

Thomas Cohn, Nikhil Devraj, Odest Chadwicke Jenkins

**Abstract:** We present a new technique that enables manifold learning to accurately embed data manifolds that contain holes, without discarding any topological information. Manifold learning aims to embed high dimensional data into a lower dimensional Euclidean space by learning a coordinate chart, but it requires that the entire manifold can be embedded in a single chart. This is impossible for manifolds with holes. In such cases, it is necessary to learn an atlas: a collection of charts that collectively cover the entire manifold. We begin with many small charts, and combine them in a bottom-up approach, where charts are only combined if doing so will not introduce problematic topological features. When it is no longer possible to combine any charts, each chart is individually embedded with standard manifold learning techniques, completing the construction of the atlas. We show the efficacy of our method by constructing atlases for challenging synthetic manifolds; learning human motion embeddings from motion capture data; and learning kinematic models of articulated objects.

Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation arxiv:2110.00329 📈 5

Zheng Li, Xiang Li, Lingfeng Yang, Jian Yang, Zhigeng Pan

**Abstract:** Knowledge distillation usually transfers the knowledge from a pre-trained cumbersome teacher network to a compact student network, which follows the classical teacher-teaching-student paradigm. Based on this paradigm, previous methods mostly focus on how to efficiently train a better student network for deployment. Different from the existing practices, in this paper, we propose a novel student-helping-teacher formula, Teacher Evolution via Self-Knowledge Distillation (TESKD), where the target teacher (for deployment) is learned with the help of multiple hierarchical students by sharing the structural backbone. The diverse feedback from multiple students allows the teacher to improve itself through the shared feature representations. The effectiveness of our proposed framework is demonstrated by extensive experiments with various network settings on two standard benchmarks including CIFAR-100 and ImageNet. Notably, when trained together with our proposed method, ResNet-18 achieves 79.15% and 71.14% accuracy on CIFAR-100 and ImageNet, outperforming the baseline results by 4.74% and 1.43%, respectively. The code is available at: https://github.com/zhengli427/TESKD.

CCS-GAN: COVID-19 CT-scan classification with very few positive training images arxiv:2110.01605 📈 4

Sumeet Menon, Jayalakshmi Mangalagiri, Josh Galita, Michael Morris, Babak Saboury, Yaacov Yesha, Yelena Yesha, Phuong Nguyen, Aryya Gangopadhyay, David Chapman

**Abstract:** We present a novel algorithm that is able to classify COVID-19 pneumonia from CT scan slices using a very small sample of training images exhibiting COVID-19 pneumonia in tandem with a larger number of normal images. This algorithm is able to achieve high classification accuracy using as few as 10 positive training slices (from 10 positive cases), which to the best of our knowledge is one order of magnitude fewer than the next closest published work at the time of writing. Deep learning with extremely small positive training volumes is a very difficult problem and has been an important topic during the COVID-19 pandemic, because for quite some time it was difficult to obtain large volumes of COVID-19 positive images for training. Algorithms that can learn to screen for diseases using few examples are an important area of research. We present the Cycle Consistent Segmentation Generative Adversarial Network (CCS-GAN). CCS-GAN combines style transfer with pulmonary segmentation and relevant transfer learning from negative images in order to create a larger volume of synthetic positive images for the purposes of improving diagnostic classification performance. A VGG-19 classifier combined with CCS-GAN was trained using a small sample of positive image slices ranging from at most 50 down to as few as 10 COVID-19 positive CT-scan images. CCS-GAN achieves high accuracy with few positive images and thereby greatly reduces the barrier of acquiring large training volumes in order to train a diagnostic classifier for COVID-19.

Bilevel stochastic methods for optimization and machine learning: Bilevel stochastic descent and DARTS arxiv:2110.00604 📈 4

Tommaso Giovannelli, Griffin Kent, Luis Nunes Vicente

**Abstract:** Two-level stochastic optimization formulations have become instrumental in a number of machine learning contexts such as neural architecture search, continual learning, adversarial learning, and hyperparameter tuning. Practical stochastic bilevel optimization problems become challenging in optimization or learning scenarios where the number of variables is high or there are constraints. The goal of this paper is twofold. First, we aim at promoting the use of bilevel optimization in large-scale learning and we introduce a practical bilevel stochastic gradient method (BSG-1) that requires neither lower level second-order derivatives nor system solves (and dismisses any matrix-vector products). Our BSG-1 method is close to first-order principles, which allows it to achieve a performance better than those that are not, such as DARTS. Second, we develop bilevel stochastic gradient descent for bilevel problems with lower level constraints, and we introduce a convergence theory that covers the unconstrained and constrained cases and abstracts as much as possible from the specifics of the bilevel gradient calculation.
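
A hedged numpy sketch of the alternation in the spirit of BSG-1: a few lower-level SGD steps, then an upper-level step whose hypergradient uses only first-order quantities via a rank-one surrogate for the second-order terms (this is a reading of the idea, not the paper's verbatim update).

```python
import numpy as np

def bilevel_sgd(grad_fx, grad_fy, grad_gx, grad_gy, x, y,
                steps=200, inner=5, ax=0.01, ay=0.05):
    """Upper level: min_x f(x, y*(x)); lower level: y*(x) = argmin_y g(x, y)."""
    for _ in range(steps):
        for _ in range(inner):                   # advance the lower-level problem
            y = y - ay * grad_gy(x, y)
        gy, fy = grad_gy(x, y), grad_fy(x, y)
        lam = (gy @ fy) / max(gy @ gy, 1e-12)    # rank-one adjoint estimate
        x = x - ax * (grad_fx(x, y) - lam * grad_gx(x, y))
    return x, y

# Toy check: g = ||y - x||^2 (so y* = x) and f = ||y||^2 + ||x||^2, minimized at 0.
gfx = lambda x, y: 2 * x
gfy = lambda x, y: 2 * y
ggx = lambda x, y: -2 * (y - x)
ggy = lambda x, y: 2 * (y - x)
x, y = bilevel_sgd(gfx, gfy, ggx, ggy, np.ones(3), np.zeros(3))
```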

Conditional Deep Gaussian Processes: empirical Bayes hyperdata learning arxiv:2110.00568 📈 4

Chi-Ken Lu, Patrick Shafto

**Abstract:** It is desirable to combine the expressive power of deep learning with Gaussian Processes (GPs) in one expressive Bayesian learning model. Deep kernel learning showed success in adopting a deep network for feature extraction followed by a GP used as the function model. Recently, it was suggested that, albeit training with marginal likelihood, the deterministic nature of the feature extractor might lead to overfitting, while replacement with a Bayesian network seemed to cure it. Here, we propose the conditional Deep Gaussian Process (DGP) in which the intermediate GPs in the hierarchical composition are supported by hyperdata and the exposed GP remains zero mean. Motivated by the inducing points in sparse GPs, the hyperdata also play the role of function supports, but are hyperparameters rather than random variables. We follow our previous moment matching approach to approximate the marginal prior for conditional DGP with a GP carrying an effective kernel. Thus, as in empirical Bayes, the hyperdata are learned by optimizing the approximate marginal likelihood which implicitly depends on the hyperdata via the kernel. We show the equivalence with deep kernel learning in the limit of dense hyperdata in latent space. However, the conditional DGP and the corresponding approximate inference enjoy the benefit of being more Bayesian than deep kernel learning. Preliminary extrapolation results demonstrate expressive power from the depth of hierarchy by exploiting the exact covariance and hyperdata learning, in comparison with GP kernel composition, DGP variational inference and deep kernel learning. We also address the non-Gaussian aspect of our model as well as a way of upgrading to full Bayesian inference.

A Graph-theoretic Algorithm for Small Bowel Path Tracking in CT Scans arxiv:2110.00466 📈 4

Seung Yeon Shin, Sungwon Lee, Ronald M. Summers

**Abstract:** We present a novel graph-theoretic method for small bowel path tracking. It is formulated as finding the minimum cost path between given start and end nodes on a graph that is constructed based on the bowel wall detection. We observed that a trivial solution with many short-cuts is easily produced even with the wall detection, where the tracked path penetrates indistinct walls around the contact between different parts of the small bowel. Thus, we propose to include must-pass nodes in finding the path to better cover the entire course of the small bowel. The proposed method does not entail training with ground-truth paths, while the previous methods do. We acquired ground-truth paths that are all connected from start to end of the small bowel for 10 abdominal CT scans, which enables the evaluation of path tracking for the entire course of the small bowel. The proposed method showed clear improvements in terms of several metrics compared to the baseline method. With the proposed method, the maximum length of path tracked without an error is above 800 mm per scan on average.
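
Once an order over the must-pass nodes is fixed, the construction reduces to concatenating minimum-cost segments; a toy sketch follows (the paper builds the graph and nodes from wall detections, and determining the order is part of its formulation).

```python
import networkx as nx

def track_path(G, start, end, must_pass):
    """Minimum-cost path through must-pass nodes in a given visiting order."""
    waypoints = [start] + list(must_pass) + [end]
    full = [start]
    for a, b in zip(waypoints, waypoints[1:]):
        seg = nx.shortest_path(G, a, b, weight="cost")
        full += seg[1:]                    # drop the repeated junction node
    return full

G = nx.grid_2d_graph(5, 5)                 # toy stand-in for the wall-based graph
nx.set_edge_attributes(G, 1.0, "cost")
print(track_path(G, (0, 0), (4, 4), [(2, 0), (2, 4)]))
```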

MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation arxiv:2110.00464 📈 4

Jonas Heylen, Mark De Wolf, Bruno Dawagne, Marc Proesmans, Luc Van Gool, Wim Abbeloos, Hazem Abdelkawy, Daniel Olmeda Reino

**Abstract:** Monocular 3D object detection has recently shown promising results, however there remain challenging problems. One of those is the lack of invariance to different camera intrinsic parameters, which can be observed across different 3D object datasets. Little effort has been made to exploit the combination of heterogeneous 3D object datasets. In contrast to general intuition, we show that more data does not automatically guarantee better performance, but rather, methods need to have a degree of 'camera independence' in order to benefit from large and heterogeneous training data. In this paper we propose a category-level pose estimation method based on instance segmentation, using camera independent geometric reasoning to cope with the varying camera viewpoints and intrinsics of different datasets. Every pixel of an instance predicts the object dimensions, the 3D object reference points projected in 2D image space and, optionally, the local viewing angle. Camera intrinsics are only used outside of the learned network to lift the predicted 2D reference points to 3D. We surpass camera independent methods on the challenging KITTI3D benchmark and show the key benefits compared to camera dependent methods.

Guiding Evolutionary Strategies by Differentiable Robot Simulators arxiv:2110.00438 📈 4

Vladislav Kurenkov, Bulat Maksudov

**Abstract:** In recent years, Evolutionary Strategies were actively explored in robotic tasks for policy search as they provide a simpler alternative to reinforcement learning algorithms. However, this class of algorithms is often claimed to be extremely sample-inefficient. On the other hand, there is a growing interest in Differentiable Robot Simulators (DRS) as they potentially can find successful policies with only a handful of trajectories. But the resulting gradient is not always useful for first-order optimization. In this work, we demonstrate how the DRS gradient can be used in conjunction with Evolutionary Strategies. Preliminary results suggest that this combination can reduce the sample complexity of Evolutionary Strategies by 3x-5x in both simulation and the real world.
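
One way to realize the combination, sketched under stated assumptions: perturbations are drawn partly isotropically and partly along the (possibly imperfect) DRS gradient direction, in the spirit of guided ES; the reward and gradient here are toy stand-ins.

```python
import numpy as np

def guided_es_step(theta, reward, drs_grad, sigma=0.05, alpha=0.5, pairs=16, lr=0.02):
    """One antithetic ES step whose perturbations are tilted toward drs_grad."""
    d = theta.size
    g = drs_grad / (np.linalg.norm(drs_grad) + 1e-9)
    grad_est = np.zeros(d)
    for _ in range(pairs):
        eps = (np.sqrt(alpha / d) * np.random.randn(d)        # isotropic component
               + np.sqrt(1 - alpha) * np.random.randn() * g)  # gradient-aligned component
        grad_est += (reward(theta + sigma * eps) - reward(theta - sigma * eps)) * eps
    return theta + lr * grad_est / (2 * sigma * pairs)

reward = lambda th: -np.sum(th ** 2)    # toy stand-in for an episode return
theta = np.ones(10)
for _ in range(200):
    theta = guided_es_step(theta, reward, drs_grad=-2 * theta)  # toy DRS gradient
```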

GAN-based Reactive Motion Synthesis with Class-aware Discriminators for Human-human Interaction arxiv:2110.00380 📈 4

Qianhui Men, Hubert P. H. Shum, Edmond S. L. Ho, Howard Leung

**Abstract:** Creating realistic characters that can react to the users' or another character's movement can hugely benefit computer graphics, games and virtual reality. However, synthesizing such reactive motions in human-human interactions is a challenging task due to the many different ways two humans can interact. While there is a body of successful research on adapting the generative adversarial network (GAN) to synthesizing single-human actions, there is very little on modelling human-human interactions. In this paper, we propose a semi-supervised GAN system that synthesizes the reactive motion of a character given the active motion from another character. Our key insights are two-fold. First, to effectively encode the complicated spatial-temporal information of a human motion, we empower the generator with a part-based long short-term memory (LSTM) module, such that the temporal movement of different limbs can be effectively modelled. We further include an attention module such that the temporal significance of the interaction can be learned, which enhances the temporal alignment of the active-reactive motion pair. Second, as the reactive motion of different types of interactions can be significantly different, we introduce a discriminator that not only tells if the generated movement is realistic or not, but also tells the class label of the interaction. This allows the use of such labels in supervising the training of the generator. We experiment with the SBU and the HHOI datasets. The high quality of the synthetic motion demonstrates the effective design of our generator, and the discriminability of the synthesis also demonstrates the strength of our discriminator.

DCT based Fusion of Variable Exposure Images for HDRI arxiv:2110.00312 📈 4

Vivek Ramakarishnan, Dnyaneshwar Jageshwar Pete

**Abstract:** Combining images with different exposure settings is of prime importance in the field of computational photography. Both transform-domain and filtering-based approaches are possible for fusing multiple exposure images to obtain the well-exposed image. We propose a Discrete Cosine Transform (DCT)-based approach for fusing multiple exposure images. The input image stack is processed in the transform domain by an averaging operation, and the inverse transform is performed on the averaged coefficients to generate the fused multiple-exposure image. Our experimental observations lead us to the conjecture that the obtained DCT coefficients are indicators of well-exposedness, contrast and saturation, as specified in the traditional exposure-fusion approach, and that the averaging assigns equal weights to the DCT coefficients in this non-parametric, non-pyramidal approach to fusing the multiple exposure stack.
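
The pipeline is a few lines; a grayscale sketch using SciPy's DCT follows. Note that, by linearity of the DCT, averaging coefficients matches pixel-domain averaging; the transform domain supplies the interpretation in terms of well-exposedness, contrast and saturation.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_fuse(stack):
    """stack: (K, H, W) aligned variable-exposure images as floats."""
    coeffs = np.mean([dctn(img, norm="ortho") for img in stack], axis=0)
    return idctn(coeffs, norm="ortho")   # inverse transform of the averaged coefficients

stack = np.random.rand(3, 128, 128)      # stand-in for a real exposure stack
fused = dct_fuse(stack)
```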

From SLAM to Situational Awareness: Challenges and Survey arxiv:2110.00273 📈 4

Hriday Bavle, Jose Luis Sanchez-Lopez, Eduardo F. Schmidt, Holger Voos

**Abstract:** The knowledge that an intelligent and autonomous mobile robot has and is able to acquire of itself and the environment, namely the situation, limits its reasoning, decision-making, and execution skills to efficiently and safely perform complex missions. Situational awareness is a basic capability of humans that has been deeply studied in fields like Psychology, Military, Aerospace, and Education, but it has barely been considered in robotics, which has focused on ideas such as sensing, perception, sensor fusion, state estimation, localization and mapping, and spatial AI. In our research, we connected the broad multidisciplinary existing knowledge on situational awareness with its counterpart in mobile robotics. In this paper, we survey state-of-the-art robotics algorithms, analyze the situational awareness aspects that they cover, and discuss their missing points. We found that existing robotics algorithms still miss many important aspects of situational awareness. As a consequence, we conclude that these missing features are limiting the performance of robotic situational awareness, and further research is needed to overcome this challenge. We see this as an opportunity, and provide our vision for future research on robotic situational awareness.

Inductive Representation Learning in Temporal Networks via Mining Neighborhood and Community Influences arxiv:2110.00267 📈 4

Meng Liu, Yong Liu

**Abstract:** Network representation learning aims to generate an embedding for each node in a network, which facilitates downstream machine learning tasks such as node classification and link prediction. Current work mainly focuses on transductive network representation learning, i.e., generating fixed node embeddings, which is ill-suited to many real-world applications. We therefore propose a new inductive network representation learning method called MNCI, which mines neighborhood and community influences in temporal networks. We propose an aggregator function that integrates neighborhood influence with community influence to generate node embeddings at any time. We conduct extensive experiments on several real-world datasets and compare MNCI with state-of-the-art baseline methods on various tasks, including node classification and network visualization. The experimental results show that MNCI outperforms the baselines.
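
As a rough illustration of the kind of aggregator the abstract describes, the toy function below combines a node's own embedding with the mean of its neighbors' embeddings (neighborhood influence) and a community embedding (community influence). The weighting scheme and names are hypothetical; MNCI's actual aggregator is more involved.

```python
import numpy as np

def aggregate(node_emb, neighbor_embs, community_emb, w_n=0.5, w_c=0.5):
    """Toy aggregator: node embedding plus weighted neighborhood and
    community influences. w_n and w_c are hypothetical hyperparameters."""
    neigh = neighbor_embs.mean(axis=0)
    return node_emb + w_n * neigh + w_c * community_emb

rng = np.random.default_rng(0)
v = rng.normal(size=16)               # current node embedding
neighbors = rng.normal(size=(5, 16))  # embeddings of temporal neighbors
community = rng.normal(size=16)       # embedding of the node's community
print(aggregate(v, neighbors, community).shape)  # (16,)
```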

Breast Cancer Diagnosis in Two-View Mammography Using End-to-End Trained EfficientNet-Based Convolutional Network arxiv:2110.01606 📈 3

Daniel G. P. Petrini, Carlos Shimizu, Rosimeire A. Roela, Gabriel V. Valente, Maria A. A. K. Folgueira, Hae Yong Kim

**Abstract:** Some recent studies have described deep convolutional neural networks that diagnose breast cancer in mammograms with performance similar or even superior to that of human experts. One of the best techniques, by Shen et al. (2019), consists of two transfer-learning stages. The first uses a model trained on natural images to create a "patch classifier" that categorizes small subimages. The second uses the patch classifier to scan the whole mammogram and create the "single-view whole-image classifier". We propose a third transfer-learning stage to obtain a "two-view classifier" that uses the two mammographic views: bilateral craniocaudal and mediolateral oblique. We use the modern EfficientNet as the basis of our model, and train the entire system end-to-end on the CBIS-DDSM dataset. To ensure statistical robustness, we test our system twice, using (a) 5-fold cross-validation and (b) the original training/test division of the dataset. Our technique reached an AUC of 0.934 using 5-fold cross-validation (sensitivity and specificity are 85.13% at the equal error rate of the ROC). Using the original dataset division, our technique achieved an AUC of 0.8483, which, as far as we know, is the largest AUC reported for this problem.
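
A two-view classifier of this kind can be sketched as two backbones whose pooled features are concatenated before a joint head. The layer sizes and the use of torchvision's efficientnet_b0 (available in recent torchvision versions) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torchvision

class TwoViewClassifier(nn.Module):
    """Hedged sketch of a two-view mammogram classifier: one EfficientNet
    backbone per view (CC and MLO), with pooled features concatenated and
    fed to a joint head."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.cc = torchvision.models.efficientnet_b0().features
        self.mlo = torchvision.models.efficientnet_b0().features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(2 * 1280, num_classes)  # 1280 = B0 feature width

    def forward(self, x_cc, x_mlo):
        f_cc = self.pool(self.cc(x_cc)).flatten(1)
        f_mlo = self.pool(self.mlo(x_mlo)).flatten(1)
        return self.head(torch.cat([f_cc, f_mlo], dim=1))

model = TwoViewClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 2])
```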

Improving Zero-shot Multilingual Neural Machine Translation for Low-Resource Languages arxiv:2110.00712 📈 3

Chenyang Li, Gongxu Luo

**Abstract:** Although multilingual Neural Machine Translation (NMT), which extends Google's multilingual NMT, can perform zero-shot translation, and the iterative self-learning algorithm can improve zero-shot translation quality, the approach faces two problems: the multilingual NMT model is prone to generating the wrong target language when performing zero-shot translation; and the self-learning algorithm, which uses beam search to generate synthetic parallel data, reduces the diversity of the generated source language and amplifies the impact of the same noise during iterative learning. In this paper, we propose a tagged-multilingual NMT model and improve the self-learning algorithm to handle these two problems. Firstly, we extend Google's multilingual NMT model by adding target tokens to the target languages, which associates the start tag with the target language to ensure that the source language is translated into the required target language. Secondly, we improve the self-learning algorithm by replacing beam search with random sampling, which increases the diversity of the generated data and makes it better cover the true data distribution. Experimental results on IWSLT show that the adjusted tagged-multilingual NMT gains 9.41 and 7.85 BLEU points over the multilingual NMT on the 2010 and 2017 Romanian-Italian test sets, respectively. Similarly, it gains 9.08 and 7.99 BLEU points on Italian-Romanian zero-shot translation. Furthermore, the improved self-learning algorithm outperforms the conventional self-learning algorithm on zero-shot translations.
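
The tagging idea is easy to illustrate: Google-style multilingual NMT prepends a language token to the source, and the tagged-multilingual variant additionally marks the target side so the start token is tied to the required output language. The "<2xx>" token format below is an illustrative convention, not necessarily the paper's exact one.

```python
def tag_pair(src, tgt, tgt_lang):
    """Tag a parallel sentence pair: the source token tells the model
    which language to emit (Google-style); tagging the target side as
    well associates the start token with the target language."""
    return f"<2{tgt_lang}> {src}", f"<2{tgt_lang}> {tgt}"

src, tgt = tag_pair("Buna ziua", "Buongiorno", "it")
print(src)  # <2it> Buna ziua
print(tgt)  # <2it> Buongiorno
```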

Multi-view SA-LA Net: A framework for simultaneous segmentation of RV on multi-view cardiac MR Images arxiv:2110.00682 📈 3

Sana Jabbar, Syed Talha Bukhari, Hassan Mohy-ud-Din

**Abstract:** We propose a multi-view SA-LA model for simultaneous segmentation of the RV on short-axis (SA) and long-axis (LA) cardiac MR images. The multi-view SA-LA model is a multi-encoder, multi-decoder architecture based on the U-Net model. One encoder-decoder pair segments the RV on SA images and the other pair on LA images. The multi-view SA-LA model assembles an extremely rich set of synergistic features, at the root of the encoder branch, by combining feature maps learned from matched SA and LA cardiac MR images. Segmentation performance is further enhanced by (1) incorporating the spatial context of the LV as a prior and (2) performing deep supervision in the last three layers of the decoder branch. The multi-view SA-LA model was extensively evaluated on the MICCAI 2021 Multi-Disease, Multi-View, and Multi-Centre RV Segmentation Challenge dataset (M&Ms-2021), which consists of multi-phase, multi-view cardiac MR images of 360 subjects acquired at four clinical centers with three different vendors. On the challenge cohort (160 subjects), the proposed multi-view SA-LA model achieved a Dice score of 91% and a Hausdorff distance of 11.2 mm on short-axis images, and a Dice score of 89.6% and a Hausdorff distance of 8.1 mm on long-axis images. Moreover, the multi-view SA-LA model exhibited strong generalization to unseen RV-related pathologies, including dilated right ventricle (DSC: SA 91.41%, LA 89.63%) and tricuspid regurgitation (DSC: SA 91.40%, LA 90.40%), with low variance (DSC std.: SA < 5%, LA < 6%).

Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration arxiv:2110.00653 📈 3

Yan Sun, Wenjun Xiong, Faming Liang

**Abstract:** Deep learning has powered recent successes of artificial intelligence (AI). However, the deep neural network, as the basic model of deep learning, suffers from issues such as local traps and miscalibration. In this paper, we provide a new framework for sparse deep learning, which addresses these issues in a coherent way. In particular, we lay down a theoretical foundation for sparse deep learning and propose prior annealing algorithms for learning sparse neural networks. The former successfully brings the sparse deep neural network into the framework of statistical modeling, enabling prediction uncertainty to be correctly quantified. The latter is asymptotically guaranteed to converge to the global optimum, ensuring the validity of downstream statistical inference. Numerical results indicate the superiority of the proposed method over existing ones.

Motion Planning for Autonomous Vehicles in the Presence of Uncertainty Using Reinforcement Learning arxiv:2110.00640 📈 3

Kasra Rezaee, Peyman Yadmellat, Simon Chamorro

**Abstract:** Motion planning under uncertainty is one of the main challenges in developing autonomous driving vehicles. In this work, we focus on the uncertainty in sensing and perception resulting from a limited field of view, occlusions, and finite sensing range. This problem is often tackled by considering hypothetical hidden objects in occluded areas or beyond the sensing range in order to guarantee passive safety. However, this may result in conservative planning and expensive computation, particularly when numerous hypothetical objects need to be considered. We propose a reinforcement learning (RL) based solution that manages uncertainty by optimizing for the worst-case outcome. This approach is in contrast to traditional RL, where agents try to maximize the average expected reward. The proposed approach builds on distributional RL, with policy optimization maximizing a lower bound on the stochastic outcomes. This modification can be applied to a range of RL algorithms; as a proof of concept, we apply it to two different algorithms, Soft Actor-Critic and DQN. The approach is evaluated on two challenging scenarios: pedestrians crossing under occlusion, and curved roads with a limited field of view. The algorithm is trained and evaluated using the SUMO traffic simulator. The proposed approach yields much better motion planning behavior than conventional RL algorithms and behaves comparably to human driving styles.
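
With a quantile-based distributional critic, the worst-case-oriented objective can be illustrated by scoring each action with the mean of its lowest quantiles rather than the mean over all of them. The sketch below is a hedged illustration of that idea; the risk parameter k and the tensor shapes are assumptions, not the paper's exact formulation.

```python
import torch

def lower_bound_value(quantiles, k):
    """Score each action by the mean of its k lowest return quantiles
    (a lower-bound objective), instead of the usual mean over all
    quantiles. quantiles: [batch, actions, n_quantiles], ascending."""
    return quantiles[..., :k].mean(dim=-1)

q = torch.sort(torch.randn(4, 3, 32), dim=-1).values  # fake quantile-net output
risk_averse_action = lower_bound_value(q, k=8).argmax(dim=-1)
mean_action = q.mean(dim=-1).argmax(dim=-1)
print(risk_averse_action, mean_action)  # the two policies can disagree
```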

Delayed rejection Hamiltonian Monte Carlo for sampling multiscale distributions arxiv:2110.00610 📈 3

Chirag Modi, Alex Barnett, Bob Carpenter

**Abstract:** The efficiency of Hamiltonian Monte Carlo (HMC) can suffer when sampling a distribution with a wide range of length scales, because the small step sizes needed for stability in high-curvature regions are inefficient elsewhere. To address this we present a delayed rejection variant: if an initial HMC trajectory is rejected, we make one or more subsequent proposals each using a step size geometrically smaller than the last. We extend the standard delayed rejection framework by allowing the probability of a retry to depend on the probability of accepting the previous proposal. We test the scheme in several sampling tasks, including multiscale model distributions such as Neal's funnel, and statistical applications. Delayed rejection enables up to five-fold performance gains over optimally-tuned HMC, as measured by effective sample size per gradient evaluation. Even for simpler distributions, delayed rejection provides increased robustness to step size misspecification. Along the way, we provide an accessible but rigorous review of detailed balance for HMC.

Algorithm Fairness in AI for Medicine and Healthcare arxiv:2110.00603 📈 3

Richard J. Chen, Tiffany Y. Chen, Jana Lipkova, Judy J. Wang, Drew F. K. Williamson, Ming Y. Lu, Sharifa Sahai, Faisal Mahmood

**Abstract:** In the current development and deployment of many artificial intelligence (AI) systems in healthcare, algorithm fairness is a challenging problem in delivering equitable care. Recent evaluations of AI models stratified across racial sub-populations have revealed enormous inequalities in how patients are diagnosed, given treatments, and billed for healthcare costs. In this perspective article, we summarize the intersectional field of fairness in machine learning in the context of current issues in healthcare, and outline how algorithmic biases (e.g., in image acquisition, genetic variation, and intra-observer labeling variability) arise in current clinical workflows and lead to healthcare disparities. Lastly, we review emerging strategies for mitigating bias via decentralized learning, disentanglement, and model explainability.

Natural language understanding for logical games arxiv:2110.00558 📈 3

Adrian Groza, Cristian Nitu

**Abstract:** We developed a system able to automatically solve logical puzzles posed in natural language. Our solution is composed of a parser and an inference module. The parser translates the text into first-order logic (FOL), while the MACE4 model finder is used to compute the models of the given FOL theory. We also empower our software agent with the capability to provide Yes/No answers to natural language questions related to each puzzle. Moreover, in line with Explainable Artificial Intelligence (XAI), the agent can justify its answer by providing a graphical representation of the proof. The advantage of using reasoning for Natural Language Understanding (NLU) instead of machine learning is that the user can obtain an explanation of the reasoning chain. We illustrate how the system performs on various types of natural language puzzles, including 382 knights-and-knaves puzzles. These features, together with an overall success rate of 80.89%, make the proposed solution an improvement over similar solvers for natural language understanding in the puzzle domain.
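
The reasoning target can be illustrated with a brute-force propositional solver for a single knights-and-knaves puzzle. This toy stands in for, and is much weaker than, the authors' FOL-plus-MACE4 pipeline.

```python
from itertools import product

# Puzzle: A says "B is a knave"; B says "A and I are of different kinds".
# Knights always tell the truth; knaves always lie.
def consistent(a_knight, b_knight):
    a_statement = not b_knight          # "B is a knave"
    b_statement = a_knight != b_knight  # "we are of different kinds"
    # A knight's statement must be true; a knave's must be false.
    return (a_statement == a_knight) and (b_statement == b_knight)

kind = lambda k: "knight" if k else "knave"
for a, b in product([True, False], repeat=2):
    if consistent(a, b):
        print(f"A is a {kind(a)}, B is a {kind(b)}")
# Unique model: A is a knave, B is a knight.
```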

LEMON: Explainable Entity Matching arxiv:2110.00516 📈 3

Nils Barlaug

**Abstract:** State-of-the-art entity matching (EM) methods are hard to interpret, and there is significant value in bringing explainable AI to EM. Unfortunately, most popular explainability methods do not work well out of the box for EM and need adaptation. In this paper, we identify three challenges of applying local post hoc feature attribution methods to entity matching: cross-record interaction effects, non-match explanations, and variation in sensitivity. We propose our novel model-agnostic and schema-flexible method LEMON that addresses all three challenges by (i) producing dual explanations to avoid cross-record interaction effects, (ii) introducing the novel concept of attribution potential to explain how two records could have matched, and (iii) automatically choosing explanation granularity to match the sensitivity of the matcher and record pair in question. Experiments on public datasets demonstrate that the proposed method is more faithful to the matcher and does a better job of helping users understand the decision boundary of the matcher than previous work. Furthermore, user studies show that the rate at which human subjects can construct counterfactual examples after seeing an explanation from our proposed method increases from 54% to 64% for matches and from 15% to 49% for non-matches compared to explanations from a standard adaptation of LIME.

Instance Segmentation Challenge Track Technical Report, VIPriors Workshop at ICCV 2021: Task-Specific Copy-Paste Data Augmentation Method for Instance Segmentation arxiv:2110.00470 📈 3

Jahongir Yunusov, Shohruh Rakhmatov, Abdulaziz Namozov, Abdulaziz Gaybulayev, Tae-Hyong Kim

**Abstract:** Copy-Paste has proven to be a very effective data augmentation for instance segmentation, improving model generalization. We used a task-specific Copy-Paste data augmentation method to achieve good performance on the instance segmentation track of the 2nd VIPriors workshop challenge. We also applied additional data augmentation techniques, including RandAugment and GridMask. Our segmentation model is the HTC detector with a CBSwin-B backbone and CBFPN, with some tweaks. The model was trained in multi-scale mode with a random sampler on the 6x schedule and tested in single-scale mode. By combining these techniques, we achieved 0.398 AP@0.50:0.95 on the validation set and 0.433 AP@0.50:0.95 on the test set. Finally, we reached 0.477 AP@0.50:0.95 on the test set by adding the validation set to the training data. Source code is available at https://github.com/jahongir7174/VIP2021.
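
The core Copy-Paste operation is simple to sketch: paste a source instance mask onto a destination image and update the destination's instance masks for the resulting occlusion. Real pipelines add random selection, placement, scaling, and boundary blending; the version below is a minimal NumPy illustration, not the authors' task-specific variant.

```python
import numpy as np

def copy_paste(dst_img, dst_masks, src_img, src_mask):
    """Paste one source instance (boolean mask) onto a destination image,
    occluding destination pixels under the pasted object and appending
    the new instance mask."""
    out = dst_img.copy()
    out[src_mask] = src_img[src_mask]
    new_masks = [m & ~src_mask for m in dst_masks] + [src_mask.copy()]
    return out, new_masks

dst = np.zeros((64, 64, 3), dtype=np.uint8)            # destination image
src = np.full((64, 64, 3), 255, dtype=np.uint8)        # source image
obj = np.zeros((64, 64), dtype=bool); obj[20:40, 20:40] = True
img, masks = copy_paste(dst, [np.zeros((64, 64), dtype=bool)], src, obj)
print(img[30, 30], len(masks))  # [255 255 255] 2
```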

Phonology Recognition in American Sign Language arxiv:2110.00453 📈 3

Federico Tavella, Aphrodite Galata, Angelo Cangelosi

**Abstract:** Inspired by recent developments in natural language processing, we propose a novel approach to sign language processing based on phonological properties validated by American Sign Language users. Taking advantage of datasets composed of phonological data and recordings of people signing, we use a pretrained deep model based on mesh reconstruction to extract the 3D coordinates of the signers' keypoints. Then, we train standard statistical and deep machine learning models to assign phonological classes to each temporal sequence of coordinates. Our paper introduces the idea of exploiting the phonological properties manually assigned by sign language users to classify videos of people performing signs by regressing a 3D mesh. We establish a new baseline for this problem based on the statistical distribution of 725 different signs. Our best-performing models achieve micro-averaged F1-scores of 58% for the major location class and 70% for the sign type, using statistical and deep learning algorithms, compared to corresponding baselines of 35% and 39%.

SAM: A Self-adaptive Attention Module for Context-Aware Recommendation System arxiv:2110.00452 📈 3

Jiabin Liu, Zheng Wei, Zhengpin Li, Xiaojun Mao, Jian Wang, Zhongyu Wei, Qi Zhang

**Abstract:** Recently, textual information has been shown to play a positive role in recommendation systems. However, most existing methods focus only on representation learning of textual information in ratings, while the potential selection bias induced by the textual information is ignored. In this work, we propose a novel and general self-adaptive module, the Self-adaptive Attention Module (SAM), which adjusts for selection bias by capturing contextual information based on its representation. The module can be embedded into recommendation systems that contain learning components for contextual information. Experimental results on three real-world datasets demonstrate the effectiveness of our proposal, and state-of-the-art models equipped with SAM significantly outperform the original ones.

Arbitrary Marginal Neural Ratio Estimation for Simulation-based Inference arxiv:2110.00449 📈 3

François Rozet, Gilles Louppe

**Abstract:** In many areas of science, complex phenomena are modeled by stochastic parametric simulators, often featuring high-dimensional parameter spaces and intractable likelihoods. In this context, performing Bayesian inference can be challenging. In this work, we present a novel method that enables amortized inference over arbitrary subsets of the parameters, without resorting to numerical integration, which makes interpretation of the posterior more convenient. Our method is efficient and can be implemented with arbitrary neural network architectures. We demonstrate the applicability of the method to parameter inference of binary black hole systems from gravitational wave observations.

Divergence-Regularized Multi-Agent Actor-Critic arxiv:2110.00304 📈 3

Kefan Su, Zongqing Lu

**Abstract:** Entropy regularization is a popular method in reinforcement learning (RL). Although it has many advantages, it alters the RL objective and makes the converged policy deviate from the optimal policy of the original Markov Decision Process. Although divergence regularization has been proposed to address this problem, it cannot be trivially applied to cooperative multi-agent reinforcement learning (MARL). In this paper, we investigate divergence regularization in cooperative MARL and propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC). Mathematically, we derive the update rule of DMAC, which is naturally off-policy, guarantees monotonic policy improvement, and is not biased by the regularization. DMAC is a flexible framework that can be combined with many existing MARL algorithms. We evaluate DMAC in a didactic stochastic game and the StarCraft Multi-Agent Challenge, and empirically show that DMAC substantially improves the performance of existing MARL algorithms.

Lightweight Transformer in Federated Setting for Human Activity Recognition arxiv:2110.00244 📈 3

Ali Raza, Kim Phuc Tran, Ludovic Koehl, Shujun Li, Xianyi Zeng, Khaled Benzaidi

**Abstract:** Human Activity Recognition (HAR) remains a challenging problem. It will mainly be used in eldercare and healthcare as an assistive technology when combined with other technologies such as the Internet of Things (IoT). HAR can be achieved with the help of sensors, smartphones, or images. Deep neural network techniques such as artificial neural networks, convolutional neural networks, and recurrent neural networks have been used for HAR in both centralized and federated settings. However, these techniques have certain limitations: RNNs are hard to parallelize, while CNNs are limited by sequence length and are computationally expensive. In this paper, to address these challenges, we present a novel inertial-sensor-based one-patch transformer that combines the strengths of RNNs and CNNs for human activity recognition. We also design a testbed to collect real-time human activity data, which is used to train and test the proposed transformer. Our experiments show that the proposed transformer outperforms state-of-the-art CNN- and RNN-based classifiers in both federated and centralized settings. Moreover, the proposed transformer is computationally inexpensive, as it uses far fewer parameters than existing state-of-the-art CNN- and RNN-based classifiers. It is thus more suitable for federated learning, as it incurs lower communication and computational costs.
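
A minimal transformer classifier for windows of inertial-sensor readings might look as follows; the sizes, mean-pooling choice, and use of PyTorch's built-in encoder (batch_first requires PyTorch >= 1.9) are illustrative assumptions rather than the paper's exact one-patch design.

```python
import torch
import torch.nn as nn

class SensorTransformer(nn.Module):
    """Sketch of a small transformer classifier for windows of
    inertial-sensor readings (e.g. 3-axis accelerometer + gyroscope)."""
    def __init__(self, n_channels=6, d_model=64, n_classes=6, seq_len=128):
        super().__init__()
        self.embed = nn.Linear(n_channels, d_model)
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                # x: [batch, seq_len, channels]
        h = self.encoder(self.embed(x) + self.pos)
        return self.head(h.mean(dim=1))  # mean-pool over time

model = SensorTransformer()
print(model(torch.randn(8, 128, 6)).shape)  # torch.Size([8, 6])
```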

Real-Time Predictive Maintenance using Autoencoder Reconstruction and Anomaly Detection arxiv:2110.01447 📈 2

Sean Givnan, Carl Chalmers, Paul Fergus, Sandra Ortega, Tom Whalley

**Abstract:** Rotary machine breakdown detection systems are outdated and dependent upon routine testing to discover faults. This is costly and often reactive in nature. Real-time monitoring offers a solution for detecting faults without the need for manual observation. However, manual interpretation for threshold-based anomaly detection is often subjective and varies between industrial experts. This approach is rigid and prone to a large number of false positives. To address this issue, we propose a Machine Learning (ML) approach to model normal working operation and detect anomalies. The approach extracts key features from signals representing known normal operation to model machine behaviour and automatically identify anomalies. The ML model learns generalisations and generates thresholds based on fault severity. This provides engineers with a traffic-light system, where green is normal behaviour, amber is cause for concern, and red signifies a machine fault. This scale allows engineers to undertake early intervention measures at the appropriate time. The approach is evaluated on windowed real machine sensor data covering normal and abnormal behaviour. The results demonstrate that it is possible to detect anomalies within the amber range and raise alarms before machine failure.
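
The traffic-light logic can be sketched as percentile thresholds computed from reconstruction errors on known-normal data; the 95th/99.9th-percentile cut-offs below are illustrative choices, not the paper's calibrated severity thresholds.

```python
import numpy as np

def traffic_light(errors, train_errors):
    """Map autoencoder reconstruction errors to a traffic-light state,
    using thresholds derived from errors on known-normal training data."""
    amber = np.percentile(train_errors, 95)
    red = np.percentile(train_errors, 99.9)
    return np.where(errors >= red, "red",
           np.where(errors >= amber, "amber", "green"))

rng = np.random.default_rng(1)
train_err = rng.exponential(0.1, size=10_000)  # errors on normal data
new_err = np.array([0.05, 0.4, 1.5])           # e.g. ||x - decoder(encoder(x))||
print(traffic_light(new_err, train_err))       # ['green' 'amber' 'red']
```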

Deep Learning for Rain Fade Prediction in Satellite Communications arxiv:2110.00695 📈 2

Aidin Ferdowsi, David Whitefield

**Abstract:** Line-of-sight satellite systems, unmanned aerial vehicles, high-altitude platforms, and microwave links that operate on frequency bands such as the Ka-band or higher are extremely susceptible to rain. Rain fade forecasting for these systems is therefore critical, because it allows a system to switch between ground gateways proactively before a rain fade event and maintain seamless service. Although empirical, statistical, and fade slope models can predict rain fade to some extent, they typically require statistical measurements of rain characteristics in a given area and cannot be generalized to large-scale systems. Furthermore, such models typically predict near-future rain fade events but are incapable of forecasting far into the future, making proactive resource management more difficult. In this paper, a deep learning (DL)-based architecture is proposed that forecasts future rain fade using satellite and radar imagery data as well as link power measurements. The data preprocessing and architectural design are thoroughly explained, and multiple experiments are conducted. Experiments show that the proposed DL architecture outperforms current state-of-the-art machine learning-based algorithms in rain fade forecasting in both the near and long term. Moreover, the results indicate that radar data with weather condition information is more effective for short-term prediction, while satellite data with cloud movement information is more effective for long-term prediction.

Learning through atypical "phase transitions" in overparameterized neural networks arxiv:2110.00683 📈 2

Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, Gabriele Perugini, Riccardo Zecchina

**Abstract:** Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that escape the bias-variance predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from the statistical physics of disordered systems to analytically study the computational fallout of overparameterization in nonconvex neural network models. As the number of connection weights increases, we follow the changes of the geometrical structure of different minima of the error loss function and relate them to learning and generalisation performance. We find that there exists a gap between the SAT/UNSAT interpolation transition, where solutions begin to exist, and the point where algorithms start to find solutions, i.e. where accessible solutions appear. This second phase transition coincides with the discontinuous appearance of atypical solutions that are locally extremely entropic, i.e., flat regions of the weight space that are particularly solution-dense and have good generalization properties. Although exponentially rare compared to typical solutions (which are narrower and extremely difficult to sample), entropic solutions are accessible to the algorithms used in learning. We can characterize the generalization error of different solutions and optimize the Bayesian prediction for data generated from a structurally different network. Numerical tests on observables suggested by the theory confirm that the scenario extends to realistic deep networks.

A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data arxiv:2110.00681 📈 2

Xiaowen Cao, Li Xing, Elham Majd, Hua He, Junhua Gu, Xuekui Zhang

**Abstract:** Background: Single-cell RNA sequencing (scRNA-seq) yields valuable insights about gene expression and gives critical information about complex tissue cellular composition. In the analysis of single-cell RNA sequencing, the annotation of cell subtypes is often done manually, which is time-consuming and irreproducible. Garnett is a cell-type annotation software based on the elastic net method. Besides cell-type annotation, supervised machine learning methods can also be applied to predict other cell phenotypes from genomic data. Despite the popularity of such applications, no existing study has systematically investigated the performance of these supervised algorithms across various sizes of scRNA-seq data sets. Methods and Results: This study evaluates 13 popular supervised machine learning algorithms for classifying cell phenotypes, using published real and simulated data sets with diverse cell sizes. The benchmark contains two parts. In the first, we use real data sets to assess the algorithms' computing speed and cell phenotype classification performance, evaluated using AUC statistics, F1-score, precision, recall, and false-positive rate. In the second, we evaluate gene selection performance using published simulated data sets with a known list of real genes. Conclusion: The study outcomes show that ElasticNet with interactions performs best in small and medium data sets, with NB another appropriate choice for medium data sets. In large data sets, XGB works excellently. Ensemble algorithms were not significantly superior to individual machine learning methods. Adding interactions to ElasticNet helps, and the improvement was significant in small data sets.

ML4C: Seeing Causality Through Latent Vicinity arxiv:2110.00637 📈 2

Haoyue Dai, Rui Ding, Yuanyuan Jiang, Shi Han, Dongmei Zhang

**Abstract:** Supervised Causal Learning (SCL) aims to learn causal relations from observational data by accessing previously seen datasets associated with ground-truth causal relations. This paper presents a first attempt at addressing a fundamental question: What are the benefits of supervision, and how does it help? Starting from the observation that SCL is no better than random guessing if the learning target is non-identifiable a priori, we propose a two-phase paradigm for SCL that explicitly considers structure identifiability. Following this paradigm, we tackle SCL on discrete data and propose ML4C. The core of ML4C is a binary classifier with a novel learning target: it classifies whether an Unshielded Triple (UT) is a v-structure or not. Starting from an input dataset with the corresponding skeleton provided, ML4C orients each UT once it is classified as a v-structure. These v-structures are together used to construct the final output. To address the fundamental question of SCL, we propose a principled method for ML4C featurization: we exploit the vicinity of a given UT (i.e., the neighbors of the UT in the skeleton), and derive features by considering the conditional dependencies and structural entanglement within the vicinity. We further prove that ML4C is asymptotically perfect. Last but not least, thorough experiments conducted on benchmark datasets demonstrate that ML4C remarkably outperforms other state-of-the-art algorithms in terms of accuracy, robustness, tolerance, and transferability. In summary, ML4C shows promising results validating the effectiveness of supervision for causal learning.

Factored couplings in multi-marginal optimal transport via difference of convex programming arxiv:2110.00629 📈 2

Quang Huy Tran, Hicham Janati, Ievgen Redko, Rémi Flamary, Nicolas Courty

**Abstract:** Optimal transport (OT) theory underlies many emerging machine learning (ML) methods that solve a wide range of tasks such as generative modeling, transfer learning, and information retrieval. These works, however, usually build upon the traditional OT setup with two distributions, leaving the more general multi-marginal OT formulation somewhat unexplored. In this paper, we study the multi-marginal OT (MMOT) problem and unify several popular OT methods under its umbrella by promoting structural information on the coupling. We show that incorporating such structural information into MMOT results in an instance of a difference of convex (DC) programming problem, allowing us to solve it numerically. Despite the high computational cost of the latter procedure, the solutions provided by DC optimization are usually of comparable quality to those obtained using currently employed optimization schemes.

Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum arxiv:2110.00625 📈 2

Guojing Cong, Tianyi Liu

**Abstract:** The momentum method has been used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many nice properties. We propose a momentum method for such model-averaging approaches. At the individual learner level, traditional stochastic gradient descent is applied. At the meta level (global learner level), one momentum term is applied, which we call block momentum. We analyze the convergence and scaling properties of such momentum methods. Our experimental results show that block momentum not only accelerates training, but also achieves better results.
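
A hedged sketch of one meta-step: each learner runs K plain SGD steps from the shared weights, the averaged block update is treated as a pseudo-gradient, and a single momentum buffer is applied at the global level. The exact update rule in the paper may differ.

```python
import numpy as np

def block_momentum_step(w, grads_per_learner, v, lr=0.1, beta=0.9):
    """One meta-step of K-step averaging with block momentum (sketch).
    grads_per_learner: for each learner, its K local gradients."""
    local_weights = []
    for grads in grads_per_learner:
        w_i = w.copy()
        for g in grads:
            w_i -= lr * g                    # plain SGD at the learner level
        local_weights.append(w_i)
    delta = w - np.mean(local_weights, axis=0)  # averaged block update
    v = beta * v + delta                        # momentum at the meta level
    return w - v, v

w, v = np.ones(3), np.zeros(3)
fake = [[np.ones(3)] * 4 for _ in range(2)]  # 2 learners, K=4 steps each
w, v = block_momentum_step(w, fake, v)
print(w)  # moved by the momentum-smoothed averaged update
```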

Predicting erectile dysfunction after treatment for localized prostate cancer arxiv:2110.00615 📈 2

Hajar Hasannejadasl, Cheryl Roumen, Henk van der Poel, Ben Vanneste, Joep van Roermund, Katja Aben, Petros Kalendralis, Biche Osong, Lambertus Kiemeney, Inge Van Oort, Renee Verwey, Laura Hochstenbach, Esther J. Bloemen- van Gurp, Andre Dekker, Rianne R. R. Fijten

**Abstract:** While the 10-year survival rate for localized prostate cancer patients is very good (>98%), side effects of treatment may limit quality of life significantly. Erectile dysfunction (ED) is a common burden associated with increasing age as well as prostate cancer treatment. Although many studies have investigated the factors affecting ED after prostate cancer treatment, only a few have investigated whether ED can be predicted before the start of treatment. The advent of machine learning (ML) based prediction tools in oncology offers a promising approach to improve the accuracy of prediction and the quality of care. Predicting ED may aid shared decision-making by making the advantages and disadvantages of certain treatments clear, so that a treatment tailored to an individual patient can be chosen. This study aimed to predict ED at 1 and 2 years post-diagnosis based on patient demographics, clinical data, and patient-reported outcomes (PROMs) measured at diagnosis.

SMATE: Semi-Supervised Spatio-Temporal Representation Learning on Multivariate Time Series arxiv:2110.00578 📈 2

Jingwei Zuo, Karine Zeitouni, Yehia Taher

**Abstract:** Learning from Multivariate Time Series (MTS) has attracted widespread attention in recent years. In particular, label shortage is a real challenge for classification tasks on MTS, given their complex dimensional and sequential structure. Unlike self-training and positive unlabeled learning, which rely on distance-based classifiers, in this paper we propose SMATE, a novel semi-supervised model for learning an interpretable spatio-temporal representation from weakly labeled MTS. We empirically validate the learned representation on 30 public datasets from the UEA MTS archive, comparing it with 13 state-of-the-art baseline methods for fully supervised tasks and four baselines for semi-supervised tasks. The results show the reliability and efficiency of our proposed method.

Predicting Consumer Purchasing Decision in The Online Food Delivery Industry arxiv:2110.00502 📈 2

Batool Madani, Hussam Alshraideh

**Abstract:** The transformation of food delivery businesses to online platforms has gained considerable attention in recent years, owing to customizable ordering experiences, easy payment methods, fast delivery, and more. Competition between online food delivery providers has intensified as they seek a wider range of customers, so they need a better understanding of their customers' needs and the ability to predict their purchasing decisions. Machine learning has a significant impact on companies' bottom lines; it is used to construct models and strategies in industries that rely on big data and need a system to evaluate it quickly and effectively. Predictive modeling is a type of machine learning that uses regression algorithms, analytics, and statistics to estimate the probability of an event. Incorporating predictive models helps online food delivery providers understand their customers. In this study, a dataset collected from 388 consumers in Bangalore, India was used to predict their purchasing decisions. Four prediction models are considered: CART and C4.5 decision trees, random forest, and rule-based classifiers, and their accuracy in providing the correct class label is evaluated. The findings show that all models perform similarly, but C4.5 outperforms them all with an accuracy of 91.67%.

Probabilistic Robust Autoencoders for Anomaly Detection arxiv:2110.00494 📈 2

Yariv Aizenbud, Ofir Lindenbaum, Yuval Kluger

**Abstract:** Empirical observations often include anomalies (or outliers) that contaminate the data. Accurate identification of anomalous samples is crucial for the success of downstream data analysis tasks. To automatically identify anomalies, we propose a new type of autoencoder (AE), which we term the Probabilistic Robust autoencoder (PRAE). PRAE is designed to simultaneously remove outliers and identify a low-dimensional representation for the inlier samples. We first describe the Robust AE (RAE) as a model that aims to split the data into inlier samples, from which a low-dimensional representation is learned via an AE, and anomalous (outlier) samples, which are excluded as they do not fit the low-dimensional representation. The RAE minimizes the reconstruction error of the AE while attempting to incorporate as many observations as possible. This can be realized by subtracting from the reconstruction term an $\ell_0$ norm counting the number of selected observations. Since the $\ell_0$ norm is not differentiable, we propose two probabilistic relaxations of the RAE approach and demonstrate that they can effectively identify anomalies. We prove that the solution to PRAE is equivalent to the solution of RAE, and demonstrate using extensive simulations that PRAE is on par with state-of-the-art methods for anomaly detection.

Preconditioned Plug-and-Play ADMM with Locally Adjustable Denoiser for Image Restoration arxiv:2110.00493 📈 2

Mikael Le Pendu, Christine Guillemot

**Abstract:** Plug-and-Play optimization recently emerged as a powerful technique for solving inverse problems by plugging a denoiser into a classical optimization algorithm. The denoiser accounts for the regularization and therefore implicitly determines the prior knowledge on the data, replacing typical handcrafted priors. In this paper, we extend the concept of plug-and-play optimization to use denoisers that can be parameterized for non-constant noise variance. To that end, we introduce a preconditioning of the ADMM algorithm, which mathematically justifies the use of such an adjustable denoiser. We additionally propose a procedure for training a convolutional neural network for high-quality non-blind image denoising that also allows pixel-wise control of the noise standard deviation. We show that our pixel-wise adjustable denoiser, along with a suitable preconditioning strategy, can further improve the plug-and-play ADMM approach for several applications, including image completion, interpolation, demosaicing, and Poisson denoising.
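
The plug-and-play ADMM loop itself is compact. The sketch below solves a Gaussian denoising problem with a closed-form prox for the data term and a Gaussian-smoothing filter standing in for the learned, pixel-wise adjustable CNN denoiser of the paper; it omits the paper's preconditioning.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pnp_admm_denoise(y, sigma2=0.01, rho=1.0, iters=30):
    """Plug-and-Play ADMM for Gaussian denoising (sketch).
    x-step: closed-form prox of the quadratic data term;
    z-step: the plugged-in denoiser; u-step: scaled dual update."""
    x = y.copy()
    z = y.copy()
    u = np.zeros_like(y)
    for _ in range(iters):
        x = (y / sigma2 + rho * (z - u)) / (1.0 / sigma2 + rho)
        z = gaussian_filter(x + u, sigma=1.0)  # plug-and-play denoiser
        u = u + x - z
    return x

rng = np.random.default_rng(2)
clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
noisy = clean + 0.1 * rng.normal(size=clean.shape)
print(np.mean((noisy - clean) ** 2),
      np.mean((pnp_admm_denoise(noisy) - clean) ** 2))  # MSE drops
```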

New Evolutionary Computation Models and their Applications to Machine Learning arxiv:2110.00468 📈 2

Mihai Oltean

**Abstract:** Automatic Programming is one of the most important areas of computer science research today. Hardware speed and capability have increased exponentially, but software is years behind. The demand for software has also increased significantly, but it is still written the old-fashioned way: by humans. There are multiple problems when the work is done by humans: cost, time, and quality. It is costly to pay humans, it is hard to keep them satisfied for a long time, it takes a lot of time to teach and train them, and the quality of their output is in most cases low (in software, mostly due to bugs). The real advances in human civilization appeared during the industrial revolutions. Before the first revolution, most people worked in agriculture; today, only a small percentage of people work in this field. A similar revolution must take place in computer programming, otherwise we will have as many people working in this field as we once had working in agriculture. How do people know how to write computer programs? Very simply: by learning. Can we do the same for software? Can we get software to learn how to write software? It seems that this is possible (to some degree), and the term for it is Machine Learning. It was first coined in 1959 by Arthur Samuel, the first person who made a computer perform a serious learning task. However, things are not as easy as with humans (well, truth be told, for some humans it is impossible to learn how to write software). So far we do not have software that can learn perfectly to write software. We have particular cases where some programs do better than humans, but the examples are sporadic at best. Learning from experience is difficult for computer programs. Instead of trying to simulate how humans teach humans to write computer programs, we can simulate nature.

Predicting Flat-Fading Channels via Meta-Learned Closed-Form Linear Filters and Equilibrium Propagation arxiv:2110.00414 📈 2

Sangwoo Park, Osvaldo Simeone

**Abstract:** Predicting fading channels is a classical problem with a vast array of applications, including as an enabler of artificial intelligence (AI)-based proactive resource allocation for cellular networks. Under the assumption that the fading channel follows a stationary complex Gaussian process, as for Rayleigh and Rician fading models, the optimal predictor is linear, and it can be directly computed from the Doppler spectrum via standard linear minimum mean squared error (LMMSE) estimation. However, in practice, the Doppler spectrum is unknown, and the predictor has only access to a limited time series of estimated channels. This paper proposes to leverage meta-learning in order to mitigate the requirements in terms of training data for channel fading prediction. Specifically, it first develops an offline low-complexity solution based on linear filtering via a meta-trained quadratic regularization. Then, an online method is proposed based on gradient descent and equilibrium propagation (EP). Numerical results demonstrate the advantages of the proposed approach, showing its capacity to approach the genie-aided LMMSE solution with a small number of training data points.

Learning of Inter-Label Geometric Relationships Using Self-Supervised Learning: Application To Gleason Grade Segmentation arxiv:2110.00404 📈 2

Dwarikanath Mahapatra

**Abstract:** Segmentation of Prostate Cancer (PCa) tissues from Gleason-graded histopathology images is vital for accurate diagnosis. Although deep learning (DL) based segmentation methods achieve state-of-the-art accuracy, they rely on large datasets with manual annotations. We propose a method to synthesize PCa histopathology images by learning the geometric relationship between different disease labels using self-supervised learning. We use a weakly supervised segmentation approach that uses the Gleason score to segment the diseased regions, and the resulting segmentation map is used to train a Shape Restoration Network (ShaRe-Net) to predict missing mask segments in a self-supervised manner. Using DenseUNet as the backbone generator architecture, we incorporate latent variable sampling to inject diversity into the image generation process and thus improve robustness. Experiments on multiple histopathology datasets demonstrate the superiority of our method over competing image synthesis methods for segmentation tasks. Ablation studies show the benefits of integrating geometry and diversity in generating high-quality images, and our self-supervised approach with limited class-labeled data achieves performance similar to fully supervised learning.

Cellular traffic offloading via Opportunistic Networking with Reinforcement Learning arxiv:2110.00397 📈 2

Lorenzo Valerio, Raffaele Bruno, Andrea Passarella

**Abstract:** The widespread diffusion of mobile phones is triggering an exponential growth of mobile data traffic that is likely to cause, in the near future, considerable traffic overload issues even in last-generation cellular networks. Offloading part of the traffic to other networks is considered a very promising approach and, in particular, in this paper, we consider offloading through opportunistic networks of users' devices. However, the performance of this solution strongly depends on the pattern of encounters between mobile nodes, which should therefore be taken into account when designing offloading control algorithms. In this paper, we propose an adaptive offloading solution based on the Reinforcement Learning framework and we evaluate and compare the performance of two well-known learning algorithms: Actor-Critic and Q-Learning. More precisely, in our solution the controller of the dissemination process, once trained, is able to select a proper number of content replicas to be injected into the opportunistic network to guarantee the timely delivery of contents to all interested users. We show that our system based on Reinforcement Learning is able to automatically learn a very efficient strategy to reduce the traffic on the cellular network, without relying on any additional context information about the opportunistic network. Our solution achieves a higher level of offloading with respect to other state-of-the-art approaches, in a range of different mobility settings. Moreover, we show that a more refined learning solution, based on the Actor-Critic algorithm, is significantly more efficient than a simpler solution based on Q-learning.

Online Primal-Dual Algorithms with Predictions for Packing Problems arxiv:2110.00391 📈 2

Nguyen Kim Thang, Christoph Durr

**Abstract:** The domain of online algorithms with predictions has been extensively studied for different applications such as scheduling, caching (paging), clustering, and ski rental. Recently, Bamas et al., aiming for a unified method, provided a primal-dual framework for linear covering problems; they extended the online primal-dual method by incorporating predictions in order to achieve performance beyond worst-case analysis. In this paper, we continue this line of research and present a framework for designing algorithms with predictions for non-linear packing problems. We illustrate the applicability of our framework on submodular maximization, and in particular ad-auction maximization, for which we give the optimal bound and provide supporting experiments.

Learn to Communicate with Neural Calibration: Scalability and Generalization arxiv:2110.00272 📈 2

Yifan Ma, Yifei Shen, Xianghao Yu, Jun Zhang, S. H. Song, Khaled B. Letaief

**Abstract:** The conventional design of wireless communication systems typically relies on established mathematical models that capture the characteristics of different communication modules. Unfortunately, such a design cannot be easily and directly applied to future wireless networks, which will be characterized by large-scale ultra-dense deployments whose design complexity scales exponentially with the network size. Furthermore, such networks will vary dynamically in significant ways, making it intractable to develop comprehensive analytical models. Recently, deep learning-based approaches have emerged as potential alternatives for designing complex and dynamic wireless systems. However, existing learning-based methods have a limited ability to scale with the problem size and to generalize across varying network settings. In this paper, we propose a scalable and generalizable neural calibration framework for future wireless system design, where a neural network is adopted to calibrate the input of conventional model-based algorithms. Specifically, the backbone of a traditional time-efficient algorithm is integrated with deep neural networks to achieve high computational efficiency while enjoying enhanced performance. The permutation equivariance property, induced by the topological structure of wireless systems, is furthermore utilized to develop a generalizable neural network architecture. The proposed neural calibration framework is applied to solve challenging resource management problems in massive multiple-input multiple-output (MIMO) systems. Simulation results show that the proposed neural calibration approach enjoys significantly improved scalability and generalization compared with existing learning-based methods.

The Complexity of Learning Approval-Based Multiwinner Voting Rules arxiv:2110.00254 📈 2

Ioannis Caragiannis, Karl Fehrs

**Abstract:** We study the PAC learnability of multiwinner voting, focusing on the class of approval-based committee scoring (ABCS) rules. These are voting rules applied on profiles with approval ballots, where each voter approves some of the candidates. ABCS rules adapt positional scoring rules in single-winner voting by assuming that each committee of $k$ candidates collects from each voter a score, that depends on the size of the voter's ballot and on the size of its intersection with the committee. Then, committees of maximum score are the winning ones. Our goal is to learn a target rule (i.e., to learn the corresponding scoring function) using information about the winning committees of a small number of sampled profiles. Despite the existence of exponentially many outcomes compared to single-winner elections, we show that the sample complexity is still low: a polynomial number of samples carries enough information for learning the target committee with high confidence and accuracy. Unfortunately, even simple tasks that need to be solved for learning from these samples are intractable. We prove that deciding whether there exists some ABCS rule that makes a given committee winning in a given profile is a computationally hard problem. Our results extend to the class of sequential Thiele rules, which have received attention due to their simplicity.

Spiking Hyperdimensional Network: Neuromorphic Models Integrated with Memory-Inspired Framework arxiv:2110.00214 📈 2

Zhuowen Zou, Haleh Alimohamadi, Farhad Imani, Yeseong Kim, Mohsen Imani

**Abstract:** Recently, brain-inspired computing models have shown great potential to outperform today's deep learning solutions in terms of robustness and energy efficiency. In particular, Spiking Neural Networks (SNNs) and HyperDimensional Computing (HDC) have shown promising results in enabling efficient and robust cognitive learning. Despite this success, the two brain-inspired models have different strengths. While SNNs mimic the physical properties of the human brain, HDC models the brain at a more abstract, functional level. Their design philosophies demonstrate complementary patterns that motivate their combination. With the help of classical psychological models of memory, we propose SpikeHD, the first framework that fundamentally combines spiking neural networks and hyperdimensional computing. SpikeHD generates a scalable and strong cognitive learning system that better mimics brain functionality. SpikeHD exploits spiking neural networks to extract low-level features while preserving the spatial and temporal correlation of raw event-based spike data. Then, it utilizes HDC to operate over the SNN output by mapping the signal into high-dimensional space, learning the abstract information, and classifying the data. Our extensive evaluation on a set of benchmark classification problems shows that, compared to an SNN architecture alone, SpikeHD (1) significantly enhances learning capability by exploiting two-stage information processing, (2) provides substantial robustness to noise and failure, and (3) reduces the network size and the number of parameters required to learn complex information.

Natural Computational Architectures for Cognitive Info-Communication arxiv:2110.06339 📈 1

Gordana Dodig-Crnkovic

**Abstract:** A recent comprehensive overview of 40 years of research in cognitive architectures (Kotseruba and Tsotsos 2020) evaluates the modelling of core cognitive abilities in humans, but only marginally addresses biologically plausible approaches based on natural computation. This mini-review presents a set of perspectives and approaches that have shaped the development of biologically inspired computational models in the recent past and that can lead to the development of biologically more realistic cognitive architectures. To describe the continuum of natural cognitive architectures, from basal cellular to human-level cognition, we use an evolutionary info-computational framework, in which natural/physical/morphological computation leads to the evolution of increasingly complex cognitive systems. Forty years ago, when the first cognitive architectures were proposed, the understanding of cognition, embodiment, and evolution was different, as was the state of the art in information physics, bioinformatics, information chemistry, computational neuroscience, complexity theory, self-organization, the theory of evolution, and information and computation. Novel developments support a constructive interdisciplinary framework for cognitive architectures in the context of computing nature, where interactions between constituents at different levels of organization lead to the complexification of agency and increased cognitive capacities. We identify several important research questions for further investigation that can increase the understanding of cognition in nature and inspire new developments in cognitive technologies. Recently, basal cell cognition has attracted considerable interest for its possible applications in medicine, new computing technologies, and micro- and nanorobotics.

One Timestep is All You Need: Training Spiking Neural Networks with Ultra Low Latency arxiv:2110.05929 📈 1

Sayeed Shafayet Chowdhury, Nitin Rathi, Kaushik Roy

**Abstract:** Spiking Neural Networks (SNNs) are energy-efficient alternatives to commonly used deep neural networks (DNNs). Through event-driven information processing, SNNs can considerably reduce the compute requirements of DNNs while achieving comparable performance. However, high inference latency is a significant hindrance to the edge deployment of deep SNNs. Computation over multiple timesteps not only increases latency and the overall energy budget, due to the higher number of operations, but also incurs the memory access overhead of fetching membrane potentials, both of which lessen the energy benefits of SNNs. To overcome this bottleneck and leverage the full potential of SNNs, we propose an Iterative Initialization and Retraining method for SNNs (IIR-SNN) to perform single-shot inference in the temporal axis. The method starts with an SNN trained with T timesteps (T>1). Then, at each stage of latency reduction, the network trained at the previous stage with a higher timestep count is used as the initialization for subsequent training with a lower timestep count. This acts as a compression method, as the network is gradually shrunk in the temporal domain. In this paper, we use direct input encoding and choose T=5, since, per the literature, this is the minimum latency required to achieve satisfactory performance on ImageNet. The proposed scheme allows us to obtain SNNs with down to unit latency, requiring a single forward pass during inference. We achieve top-1 accuracies of 93.05%, 70.15%, and 67.71% on CIFAR-10, CIFAR-100, and ImageNet, respectively, using VGG16 with just 1 timestep. In addition, IIR-SNNs perform inference with 5-2500X reduced latency compared to other state-of-the-art SNNs, while maintaining comparable or even better accuracy. Furthermore, compared with standard DNNs, the proposed IIR-SNNs provide 25-33X higher energy efficiency while being comparable to them in classification performance.

Complex Spin Hamiltonian Represented by Artificial Neural Network arxiv:2110.00724 📈 1

Hongyu Yu, Changsong Xu, Feng Lou, L. Bellaiche, Zhenpeng Hu, Xingao Gong, Hongjun Xiang

**Abstract:** The effective spin Hamiltonian method is widely adopted to simulate and understand the behavior of magnetism. However, the magnetic interactions of some systems, such as itinerant magnets, are too complex to be described by any explicit function, which prevents an accurate description of magnetism in such systems. Here, we put forward a machine learning (ML) approach, applying an artificial neural network (ANN) and a local spin descriptor to develop effective spin potentials for any form of interaction. The constructed Hamiltonians include an explicit Heisenberg part and an implicit non-linear ANN part. Such a method successfully reproduces artificially constructed models and sufficiently describes the itinerant magnetism of bulk Fe3GeTe2. Our work paves a new way for investigating complex magnetic phenomena (e.g., skyrmions) in magnetic materials.

One-Bit Matrix Completion with Differential Privacy arxiv:2110.00719 📈 1

Zhengpin Li, Zheng Wei, Xiaojun Mao, Jian Wang

**Abstract:** Matrix completion is a prevailing collaborative filtering method for recommendation systems that requires the data offered by users to provide personalized service. However, due to insidious attacks and unexpected inference, the release of user data often raises serious privacy concerns. Most of the existing solutions focus on improving the privacy guarantee for general matrix completion. As a special case, in recommendation systems where the observations are binary, one-bit matrix completion covers a broad range of real-life situations. In this paper, we propose a novel framework for one-bit matrix completion under the differential privacy constraint. In this framework, we develop several perturbation mechanisms and analyze the privacy-accuracy trade-off offered by each mechanism. The experiments conducted on both synthetic and real-world datasets demonstrate that our proposed approaches can maintain high-level privacy with little loss of completion accuracy.
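
One classic perturbation mechanism for binary observations is randomized response: keep each bit with probability e^ε/(1+e^ε) and flip it otherwise, which satisfies ε-differential privacy per entry. The paper develops several mechanisms and their privacy-accuracy trade-offs; the sketch below shows this standard mechanism purely for illustration.

```python
import numpy as np

def randomized_response(bits, epsilon):
    """Perturb a 0/1 array with randomized response: each bit is kept
    with probability e^eps / (1 + e^eps), flipped otherwise, giving
    eps-differential privacy per entry."""
    p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    flip = np.random.default_rng(0).random(bits.shape) >= p_keep
    return np.where(flip, 1 - bits, bits)

ratings = np.random.default_rng(1).integers(0, 2, size=(5, 4))
print(randomized_response(ratings, epsilon=1.0))  # privatized observations
```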

Multi-lane Cruising Using Hierarchical Planning and Reinforcement Learning arxiv:2110.00650 📈 1

Kasra Rezaee, Peyman Yadmellat, Masoud S. Nosrati, Elmira Amirloo Abolfathi, Mohammed Elmahgiubi, Jun Luo

**Abstract:** Competent multi-lane cruising requires using lane changes and within-lane maneuvers to achieve good speed and maintain safety. This paper proposes a design for autonomous multi-lane cruising that combines a hierarchical reinforcement learning framework with a novel state-action space abstraction. While the proposed solution follows the classical hierarchy of behavior decision, motion planning, and control, it introduces a key intermediate abstraction within the motion planner that discretizes the state-action space according to high-level behavioral decisions. We argue that this design allows principled modular extension of motion planning, in contrast to using either monolithic behavior cloning or a large set of hand-written rules. Moreover, we demonstrate that our state-action space abstraction allows trained models to be transferred, without retraining, from a simulated environment with virtually no dynamics to one with significantly more realistic dynamics. Together, these results suggest that our proposed hierarchical architecture is a promising way to apply reinforcement learning to complex multi-lane cruising in the real world.
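
As a loose illustration of the kind of hierarchy described, the sketch below conditions a discretized set of motion options on a high-level behavior choice; the behavior set and motion options are invented for illustration and are not the paper's actual abstraction.

```python
from enum import Enum

class Behavior(Enum):
    KEEP_LANE = 0
    CHANGE_LEFT = 1
    CHANGE_RIGHT = 2

# The high-level decision conditions which discretized motion options are legal.
MOTION_OPTIONS = {
    Behavior.KEEP_LANE:    ["hold_speed", "speed_up", "slow_down"],
    Behavior.CHANGE_LEFT:  ["merge_gap_front", "merge_gap_rear", "abort"],
    Behavior.CHANGE_RIGHT: ["merge_gap_front", "merge_gap_rear", "abort"],
}

def act(behavior_policy, motion_policy, obs):
    """Two-level policy: pick a behavior, then a motion option within it."""
    behavior = behavior_policy(obs)                  # e.g. an RL policy
    options = MOTION_OPTIONS[behavior]               # abstracted action space
    return behavior, motion_policy(obs, options)     # choose one legal option
```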

How To Not Drive: Learning Driving Constraints from Demonstration arxiv:2110.00645 📈 1

Kasra Rezaee, Peyman Yadmellat

**Abstract:** We propose a new scheme to learn motion planning constraints from human driving trajectories. Behavioral and motion planning are key components of an autonomous driving system. Behavioral planning is responsible for the high-level decision making required to follow traffic rules and interact with other road participants. The motion planner's role is to generate feasible, safe trajectories for a self-driving vehicle to follow. The trajectories are generated by optimizing a cost function based on metrics related to smoothness, movability, and comfort, subject to a set of constraints derived from the planned behavior, safety considerations, and feasibility. A common practice is to design the cost function and constraints manually. Recent work has investigated learning the cost function from human driving demonstrations. While effective, the practical applicability of such approaches to autonomous driving remains questionable. In contrast, this paper focuses on learning driving constraints, which can be used as an add-on module to existing autonomous driving solutions. To learn the constraints, the planning problem is formulated as a constrained Markov Decision Process whose elements are assumed known, except for the constraints. The constraints are then learned by modeling the distribution of expert trajectories and estimating the probability that an optimal trajectory belongs to the learned distribution. The proposed scheme is evaluated on the NGSIM dataset, yielding less than a 1% rate of collisions and out-of-road maneuvers when the learned constraints are used in an optimization-based motion planner.
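
The core idea, learning the distribution of expert trajectories and treating membership in its high-probability region as a constraint, can be sketched with a simple density estimator; the Gaussian KDE and threshold below are illustrative stand-ins for whatever model the paper actually uses.

```python
import numpy as np
from scipy.stats import gaussian_kde

# expert_trajs: (n_demos, d) flattened feature vectors of human trajectories.
expert_trajs = np.random.randn(500, 4)           # placeholder demonstrations
density = gaussian_kde(expert_trajs.T)           # learned expert distribution

# Constraint: a candidate trajectory is admissible if its density is at
# least that of the 5th percentile of the demonstrations themselves.
threshold = np.percentile(density(expert_trajs.T), 5)

def satisfies_learned_constraint(candidate):
    """True if the candidate trajectory lies in the expert's typical set."""
    return density(candidate.reshape(-1, 1))[0] >= threshold
```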

STRONG: Synchronous and asynchronous RObust Network localization, under Non-Gaussian noise arxiv:2110.00594 📈 1

Claudia Soares, João Gomes

**Abstract:** Real-world network applications must cope with failing nodes, malicious attacks, or nodes facing corrupted data, i.e., data classified as outliers. Our work addresses these concerns in the scope of the sensor network localization problem where, despite the abundance of technical literature, prior research has seldom considered outlier data. We propose robust, fast, and distributed network localization algorithms that are resilient to high-power noise but also precise under regular Gaussian noise. We use a Huber M-estimator, thus obtaining a robust (but nonconvex) optimization problem. We convexify and reformulate the problem to allow for distributed robust localization algorithms: a synchronous distributed method with optimal convergence rate and an asynchronous one with proven convergence guarantees. A major highlight of our contribution is that we pay no price for provable distributed computation, in accuracy, communication cost, or convergence speed. Simulations showcase the superior performance of our algorithms, both in the presence of outliers and under regular Gaussian noise: our method exceeds the accuracy of alternative approaches, distributed and centralized, even under heavy additive and multiplicative outlier noise.
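
As a minimal illustration of the Huber M-estimator at the heart of the method, the toy snippet below applies it to range-measurement residuals in a single-node localization; it is a sketch, not the paper's full convexified, distributed formulation.

```python
import numpy as np

def huber(r, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for outliers."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def localize(anchors, ranges, x0, lr=0.05, iters=500, delta=1.0):
    """Toy gradient descent on the Huber cost of range residuals."""
    x = x0.astype(float)
    for _ in range(iters):
        d = np.linalg.norm(x - anchors, axis=1)          # predicted ranges
        r = d - ranges                                    # residuals
        psi = np.clip(r, -delta, delta)                   # Huber influence
        grad = ((psi / np.maximum(d, 1e-9))[:, None] * (x - anchors)).sum(0)
        x -= lr * grad
    return x
```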

Weight Vector Tuning and Asymptotic Analysis of Binary Linear Classifiers arxiv:2110.00567 📈 1

Lama B. Niyazi, Abla Kammoun, Hayssam Dahrouj, Mohamed-Slim Alouini, Tareq Al-Naffouri

**Abstract:** Unlike its intercept, a linear classifier's weight vector cannot be tuned by a simple grid search. Hence, this paper proposes weight vector tuning of a generic binary linear classifier through the parameterization of a decomposition of the discriminant by a scalar that controls the trade-off between conflicting informative and noisy terms. By varying this parameter, the original weight vector is modified in a meaningful way. Applying this method to a number of linear classifiers under a variety of data dimensionality and sample size settings reveals that the classification performance loss due to non-optimal native hyperparameters can be compensated for by weight vector tuning. This yields computational savings, as the proposed method reduces to tuning a scalar, whereas tuning the native hyperparameter may involve repeated weight vector generation along with its burden of optimization, dimensionality reduction, etc., depending on the classifier. It is also found that weight vector tuning significantly improves the performance of Linear Discriminant Analysis (LDA) under high estimation noise. Building on this second finding, we conduct an asymptotic study of the misclassification probability of the parameterized LDA classifier in the growth regime where the data dimensionality and sample size are comparable. Using random matrix theory, the misclassification probability is shown to converge to a quantity that is a function of the true statistics of the data. Additionally, an estimator of the misclassification probability is derived. Finally, computationally efficient tuning of the parameter using this estimator is demonstrated on real data.
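
As a hedged illustration of the idea, the sketch below sweeps a scalar over an assumed decomposition of the weight vector into informative and noisy components; the exact decomposition in the paper is classifier-specific and not reproduced here.

```python
import numpy as np

def tune_weight_vector(w_info, w_noise, X_val, y_val, intercept=0.0,
                       alphas=np.linspace(0.0, 2.0, 41)):
    """Grid-search a scalar alpha in w(alpha) = w_info + alpha * w_noise.

    w_info / w_noise: assumed decomposition of the trained weight vector
    into informative and noisy terms (illustrative, classifier-specific).
    """
    best_alpha, best_acc = None, -np.inf
    for a in alphas:
        w = w_info + a * w_noise
        pred = np.sign(X_val @ w + intercept)       # +/-1 class labels
        acc = np.mean(pred == y_val)
        if acc > best_acc:
            best_alpha, best_acc = a, acc
    return best_alpha, best_acc
```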

A Cramér Distance perspective on Non-crossing Quantile Regression in Distributional Reinforcement Learning arxiv:2110.00535 📈 1

Alix Lhéritier, Nicolas Bondoux

**Abstract:** Distributional reinforcement learning (DRL) extends the value-based approach by using a deep convolutional network to approximate the full distribution over future returns instead of only the mean, providing a richer signal that leads to improved performance. Quantile-based methods like QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance; however, because of biases in the gradients, the quantile regression loss is used for training instead, which guarantees the same minimizer while enjoying unbiased gradients. Recently, monotonicity constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under monotonicity constraints, the squared Cramér and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a novel non-crossing neural architecture that achieves good training performance using a novel algorithm to compute the Cramér distance, yielding significant improvements over QR-DQN on a number of games from the standard Atari 2600 benchmark.
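
One common way to enforce non-crossing quantiles, possibly in the spirit of but not necessarily identical to the architecture proposed here, is to predict a first quantile plus non-negative increments and take a cumulative sum:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonCrossingQuantileHead(nn.Module):
    """Monotone quantile outputs: q_1 <= q_2 <= ... <= q_N by construction."""
    def __init__(self, in_dim, n_quantiles):
        super().__init__()
        self.base = nn.Linear(in_dim, 1)                   # first quantile
        self.deltas = nn.Linear(in_dim, n_quantiles - 1)   # raw increments

    def forward(self, h):
        q0 = self.base(h)
        inc = F.softplus(self.deltas(h))         # non-negative increments
        return torch.cat([q0, q0 + torch.cumsum(inc, dim=-1)], dim=-1)

# Example: 32-dim features, 51 monotone quantiles per state-action.
head = NonCrossingQuantileHead(32, 51)
print(head(torch.randn(4, 32)).shape)  # torch.Size([4, 51])
```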

SOUL: An Energy-Efficient Unsupervised Online Learning Seizure Detection Classifier arxiv:2110.02169 📈 0

Adelson Chua, Michael I. Jordan, Rikky Muller

**Abstract:** Implantable devices that record neural activity and detect seizures have been adopted to issue warnings or trigger neurostimulation to suppress epileptic seizures. Typical seizure detection systems rely on high-accuracy offline-trained machine learning classifiers that require manual retraining when seizure patterns change over long periods of time. For an implantable seizure detection system, a low-power, at-the-edge, online learning algorithm can be employed to dynamically adapt to neural signal drifts, thereby maintaining high accuracy without external intervention. This work proposes SOUL: a Stochastic-gradient-descent-based Online Unsupervised Logistic regression classifier. After an initial offline training phase, continuous online unsupervised classifier updates are applied in situ, which improves sensitivity in patients with drifting seizure features. SOUL was tested on two human electroencephalography (EEG) datasets: the CHB-MIT scalp EEG dataset and a long (>100 hours) NeuroVista intracranial EEG dataset. It achieved average sensitivities of 97.5% and 97.9% on the two datasets, respectively, at >95% specificity. Sensitivity improved by up to 8.2% on long-term data compared to a typical seizure detection classifier. SOUL was fabricated in TSMC's 28 nm process, occupies 0.1 mm², and achieves 1.5 nJ/classification energy efficiency, which is at least 24x more efficient than the state of the art.
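
Below is a heavily simplified sketch of the classifier family named in the title: SGD-based logistic regression that keeps adapting online by using its own confident predictions as pseudo-labels. The actual on-chip update rule, features, and confidence gating are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineLogistic:
    """Logistic regression with unsupervised in-situ SGD updates (sketch)."""
    def __init__(self, n_features, lr=0.01, conf=0.9):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr, self.conf = lr, conf

    def predict_proba(self, x):
        return sigmoid(self.w @ x + self.b)

    def update(self, x):
        """Use the model's own confident output as a pseudo-label."""
        p = self.predict_proba(x)
        if p > self.conf or p < 1.0 - self.conf:    # confidence gate
            y = float(p > 0.5)                       # pseudo-label
            err = p - y                              # logistic gradient term
            self.w -= self.lr * err * x
            self.b -= self.lr * err
```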

Prediction of Energy Consumption for Variable Customer Portfolios Including Aleatoric Uncertainty Estimation arxiv:2110.02166 📈 0

Oliver Mey, André Schneider, Olaf Enge-Rosenblatt, Yesnier Bravo, Pit Stenzel

**Abstract:** Using hourly energy consumption data recorded by smart meters, retailers can estimate the day-ahead energy consumption of their customer portfolio. Deep neural networks are especially suited for this task, as a huge amount of historical consumption data is available from smart meter recordings for model training. Probabilistic layers further enable estimating the uncertainty of the consumption forecasts. Here, we propose a method to calculate hourly day-ahead energy consumption forecasts that include an estimate of the aleatoric uncertainty. To respect the statistical properties of energy consumption values, the aleatoric uncertainty is modeled using lognormal distributions whose parameters are calculated by deep neural networks. As a result, predictions of the hourly day-ahead energy consumption of single customers are represented by random variables drawn from lognormal distributions obtained as output from the neural network. We further demonstrate how these random variables for single customers can be aggregated into probabilistic forecasts of customer portfolios of arbitrary composition.
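
A minimal sketch of the probabilistic output described, with a network emitting the parameters of a lognormal distribution per customer-hour, trained with the negative log-likelihood, and portfolio forecasts obtained by sampling and summing; layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class LognormalForecaster(nn.Module):
    """Predicts (mu, sigma) of a lognormal consumption distribution."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2))

    def forward(self, x):
        mu, log_sigma = self.net(x).unbind(-1)
        return mu, log_sigma.exp()

def lognormal_nll(y, mu, sigma):
    """Negative log-likelihood of consumption y under LogNormal(mu, sigma)."""
    dist = torch.distributions.LogNormal(mu, sigma)
    return -dist.log_prob(y).mean()

# Portfolio forecast: sample each customer's lognormal and sum the draws.
model = LognormalForecaster(in_dim=24)
x = torch.randn(100, 24)                    # 100 customers' feature vectors
mu, sigma = model(x)
samples = torch.distributions.LogNormal(mu, sigma).sample((1000,))
portfolio = samples.sum(dim=1)              # (1000,) aggregated draws
print(portfolio.mean(), portfolio.quantile(0.95))
```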

A review of Generative Adversarial Networks (GANs) and its applications in a wide variety of disciplines -- From Medical to Remote Sensing arxiv:2110.01442 📈 0

Ankan Dash, Junyi Ye, Guiling Wang

**Abstract:** We look into Generative Adversarial Networks (GANs), their prevalent variants, and their applications in a number of sectors. GANs combine two neural networks that compete against one another in a zero-sum game, allowing them to create much crisper, more discrete outputs. GANs can be used for image processing, video generation, and prediction, among other computer vision applications. GANs can also be utilised for a variety of science-related activities, including protein engineering, astronomical data processing, remote sensing image dehazing, and crystal structure synthesis. Other notable fields where GANs have made gains include finance, marketing, fashion design, sports, and music. Therefore, in this article we provide a comprehensive overview of the applications of GANs in a wide variety of disciplines. We first cover the theory supporting GANs, GAN variants, and the metrics used to evaluate GANs. Then we present how GANs and their variants can be applied in twelve domains, ranging from STEM fields, such as astronomy and biology, to business fields, such as marketing and finance, and to the arts, such as music. As a result, researchers from other fields may grasp how GANs work and apply them to their own studies. To the best of our knowledge, this article provides the most comprehensive survey of GAN applications in different fields.
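
For reference, the zero-sum game mentioned above is usually formalized as the standard GAN minimax objective, in which the discriminator D and the generator G optimize the same value function in opposite directions:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```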

A Deep Learning Approach To Dead-Reckoning Navigation For Autonomous Underwater Vehicles With Limited Sensor Payloads arxiv:2110.00661 📈 0

Ivar Bjørgo Saksvik, Alex Alcocer, Vahid Hassani

**Abstract:** This paper presents a deep learning approach to aid dead-reckoning (DR) navigation using a limited sensor suite. A Recurrent Neural Network (RNN) was developed to predict the relative horizontal velocities of an Autonomous Underwater Vehicle (AUV) using data from an IMU, a pressure sensor, and control inputs. The RNN is trained on experimental data in which a Doppler velocity log (DVL) provided ground-truth velocities. The predicted relative velocities were fed into a dead-reckoning algorithm to approximate north and east positions. The studies in this paper were twofold: I) experimental data from a Long-Range AUV were investigated, with datasets from a series of surveys in Monterey Bay, California (US) used to train and test the RNN; II) the second study explored datasets generated by a simulated autonomous underwater glider, with environmental variables, e.g., ocean currents, implemented in the simulation to reflect real ocean conditions. The proposed neural network approach to DR navigation was compared to the onboard navigation system and to ground-truth simulated positions.
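
Dead reckoning from the predicted velocities reduces to rotating body-frame velocities into the navigation frame and integrating. A minimal sketch, assuming the RNN supplies surge/sway velocity estimates and the IMU a heading:

```python
import numpy as np

def dead_reckon(u, v, heading, dt, x0=0.0, y0=0.0):
    """Integrate body-frame velocities (u: surge, v: sway) to north/east.

    u, v, heading: arrays of per-step RNN velocity predictions (m/s) and
    IMU heading (rad); dt: timestep (s). Returns north/east tracks (m).
    """
    vn = u * np.cos(heading) - v * np.sin(heading)   # north velocity
    ve = u * np.sin(heading) + v * np.cos(heading)   # east velocity
    north = x0 + np.cumsum(vn) * dt
    east = y0 + np.cumsum(ve) * dt
    return north, east
```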

Evolved neuromorphic radar-based altitude controller for an autonomous open-source blimp arxiv:2110.00646 📈 0

Marina González-Álvarez, Julien Dupeyroux, Federico Corradi, Guido de Croon

**Abstract:** Robotic airships offer significant advantages in terms of safety, mobility, and extended flight times. However, their highly restrictive weight constraints pose a major challenge regarding the available computational power to perform the required control tasks. Spiking neural networks (SNNs) are a promising research direction for addressing this problem. By mimicking the biological process for transferring information between neurons using spikes or impulses, they allow for low power consumption and asynchronous event-driven processing. In this paper, we propose an evolved altitude controller based on an SNN for a robotic airship that relies solely on the sensory feedback provided by an airborne radar. Starting from the design of a lightweight, low-cost, open-source airship, we also present an SNN-based controller architecture, an evolutionary framework for training the network in a simulated environment, and a control scheme for narrowing the gap with reality. The system's performance is evaluated through real-world experiments, demonstrating the advantages of our approach by comparing it with an artificial neural network and a linear controller. The results show accurate tracking of the altitude command with efficient control effort.
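
The evolutionary framework can be pictured as a simple population loop over controller parameters evaluated in simulation. Below is a generic evolution-strategy sketch under an assumed helper `simulate_episode` and a flat SNN parameter vector; it is not the authors' exact setup.

```python
import numpy as np

def evolve_controller(simulate_episode, n_params, pop=50, elite=10,
                      gens=100, sigma=0.1, rng=np.random.default_rng(0)):
    """Simple evolutionary search over SNN controller parameters (sketch).

    simulate_episode(theta) -> fitness, e.g. negative altitude-tracking
    error of the airship in the simulator (assumed helper).
    """
    mean = np.zeros(n_params)
    for g in range(gens):
        population = mean + sigma * rng.standard_normal((pop, n_params))
        fitness = np.array([simulate_episode(t) for t in population])
        elites = population[np.argsort(fitness)[-elite:]]  # best individuals
        mean = elites.mean(axis=0)                         # recombine
    return mean
```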

A survey on active noise control techniques -- Part I: Linear systems arxiv:2110.00531 📈 0

Lu Lu, Kai-Li Yin, Rodrigo C. de Lamare, Zongsheng Zheng, Yi Yu, Xiaomin Yang, Badong Chen

**Abstract:** Active noise control (ANC) is an effective way for reducing the noise level in electroacoustic or electromechanical systems. Since its first introduction in 1936, this approach has been greatly developed. This paper focuses on discussing the development of ANC techniques over the past decade. Linear ANC algorithms, including the celebrated filtered-x least-mean-square (FxLMS)-based algorithms and distributed ANC algorithms, are investigated and evaluated. Nonlinear ANC (NLANC) techniques, such as functional link artificial neural network (FLANN)-based algorithms, are pursued in Part II. Furthermore, some novel methods and applications of ANC emerging in the past decade are summarized. Finally, future research challenges regarding the ANC technique are discussed.
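
As an illustration of the celebrated FxLMS family surveyed here, below is a minimal single-channel sketch in which the reference signal is filtered through an estimate of the secondary path before the LMS update; the path model, step size, and sign conventions are illustrative.

```python
import numpy as np

def fxlms(x, d, s_hat, L=32, mu=0.01):
    """Single-channel filtered-x LMS (FxLMS) sketch.

    x: reference noise; d: disturbance at the error microphone; s_hat: FIR
    estimate of the secondary path (also used as the "true" path here).
    Returns the residual error signal after adaptive cancellation.
    """
    M = len(s_hat)
    w = np.zeros(L)                           # adaptive control filter
    xf = np.convolve(x, s_hat)[:len(x)]       # reference filtered through s_hat
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(max(L, M) - 1, len(x)):
        xb = x[n - L + 1:n + 1][::-1]         # newest-first reference buffer
        y[n] = w @ xb                         # anti-noise sample
        ys = s_hat @ y[n - M + 1:n + 1][::-1]   # anti-noise through the path
        e[n] = d[n] - ys                      # residual at error microphone
        w += mu * e[n] * xf[n - L + 1:n + 1][::-1]  # FxLMS weight update
    return e
```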
